New Soil Near Infrared Training Material (part 1)

In this post we will load the NIR data and prepare it for visualization
Author

José Ramón Cuesta

Published

April 24, 2025

Load and prepare the data.

The data is available in the data folder of the FAO-SID/SoilFER-Spec GitHub repository. The file is in CSV format and can be loaded with the read_csv function from the readr package.

library(readr)

url <- "https://raw.githubusercontent.com/FAO-SID/SoilFER-Spec/main/data/dat2.csv"
dat2 <- read_csv(url)
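The wavelength column names in the file start with an X (X350, X351 and so on). read_csv keeps header names as they are written (its default name repair only fixes empty or duplicated names), so the prefix is part of dat2.csv itself. A common source of such names, assumed here rather than confirmed for this dataset, is base R's make.names, which read.csv and data.frame() apply by default and which adds an X in front of names that start with a digit. A minimal base-R sketch on a made-up three-column CSV (csv_text, mangled and kept are hypothetical names):

```r
# Toy CSV text with numeric column headers (hypothetical data, not dat2.csv)
csv_text <- "id,350,351\n1,0.062,0.063\n2,0.073,0.072"

# Default check.names = TRUE passes the header through make.names(),
# which prefixes names that start with a digit with "X"
mangled <- read.csv(text = csv_text)
print(names(mangled))   # "id" "X350" "X351"

# check.names = FALSE keeps the headers exactly as written
kept <- read.csv(text = csv_text, check.names = FALSE)
print(names(kept))      # "id" "350" "351"

# make.names() can also be called directly on a character vector
print(make.names(c("350", "sample")))  # "X350" "sample"
```

Passing check.names = FALSE when reading with base R is one way to avoid the prefix in the first place.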

Following the Soil spectroscopy training material, we rename the first column (the sample index) to “sample”.

colnames(dat2)[1] <- "sample"
head(dat2, c(10, 7))
# A tibble: 10 × 7
   sample Organic_Carbon  Clay  Silt  Sand   X350   X351
    <dbl>          <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>
 1      1          0.795 15.0   40.1  44.9 0.0622 0.0624
 2      2          0.696  8.79  52.5  38.7 0.0735 0.0725
 3      3          1.46  13.8   41.8  44.4 0.0685 0.0699
 4      4          3.36  31.1   49.5  19.4 0.0590 0.0615
 5      5          3.71  33.1   62.3   4.6 0.0624 0.0626
 6      6          1.39  33.3   60.6   6.1 0.0997 0.0988
 7      7          3.38  53.8   44.7   1.5 0.0920 0.0936
 8      8          3.16  27.3   66.3   6.4 0.0703 0.0673
 9      9          3.49  30.5   48.5  21   0.0473 0.0440
10     10          2.34  21.5   61.5  17   0.0666 0.0691

We now have the sample number, the four soil properties (organic carbon, clay, silt and sand) and the reflectance at each wavelength from 350 to 2500 nm in 1 nm steps, that is, 2500 - 350 + 1 = 2151 wavelengths (also called data points). The next step is to remove the X from the wavelength column names.

There are several ways to do this. A common one is the gsub function, which replaces every occurrence of a pattern in a string with a replacement string. Here we replace "X" with "" (an empty string) in the names of all columns except the first five (the sample number and the four soil properties):

colnames(dat2)[-c(1:5)] <- gsub("X", "", colnames(dat2)[-c(1:5)])
head(dat2, c(10, 7))
# A tibble: 10 × 7
   sample Organic_Carbon  Clay  Silt  Sand  `350`  `351`
    <dbl>          <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>
 1      1          0.795 15.0   40.1  44.9 0.0622 0.0624
 2      2          0.696  8.79  52.5  38.7 0.0735 0.0725
 3      3          1.46  13.8   41.8  44.4 0.0685 0.0699
 4      4          3.36  31.1   49.5  19.4 0.0590 0.0615
 5      5          3.71  33.1   62.3   4.6 0.0624 0.0626
 6      6          1.39  33.3   60.6   6.1 0.0997 0.0988
 7      7          3.38  53.8   44.7   1.5 0.0920 0.0936
 8      8          3.16  27.3   66.3   6.4 0.0703 0.0673
 9      9          3.49  30.5   48.5  21   0.0473 0.0440
10     10          2.34  21.5   61.5  17   0.0666 0.0691
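gsub("X", "", ...) removes every X in a name, which is harmless here because the wavelength names contain no other X. A slightly stricter variant, sketched below on hypothetical column names (nms, is_wl, wl and n_wl are illustrative), selects the wavelength columns by pattern and uses sub() with the anchored pattern "^X", so only a leading X is removed. The sketch also checks the count from the text: 350 to 2500 nm in 1 nm steps gives 2151 wavelengths.

```r
# Hypothetical column names: an ID, one property, and three wavelength columns
nms <- c("sample", "Organic_Carbon", "X350", "X351", "X352")

# Find the wavelength columns by pattern rather than by position:
# names made of an "X" followed only by digits
is_wl <- grepl("^X[0-9]+$", nms)

# Strip only a leading "X" from those names
nms[is_wl] <- sub("^X", "", nms[is_wl])
print(nms)  # "sample" "Organic_Carbon" "350" "351" "352"

# The stripped names convert cleanly to numeric wavelengths (nm)
wl <- as.numeric(nms[is_wl])
print(wl)   # 350 351 352

# 350 to 2500 nm in 1 nm steps: 2500 - 350 + 1 wavelengths
n_wl <- length(seq(350, 2500, by = 1))
print(n_wl) # 2151
```

Selecting columns by pattern rather than by position avoids having to count the leading columns by hand, as the -c(1:5) index does.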

Next we isolate the spectra in a matrix with the as.matrix function, which converts a data frame to a matrix. We convert all columns except the first five (the sample number and the four soil properties):

my_spectra <- as.matrix(dat2[, -c(1:5)])
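The same step on a tiny hypothetical data frame (toy and spec are illustrative names, and the values are made up) shows what as.matrix returns: a numeric matrix with one row per sample and one column per wavelength, whose column names are the wavelengths.

```r
# Hypothetical data: 2 samples, 1 soil property, 3 wavelength columns
toy <- data.frame(
  sample = 1:2,
  Clay   = c(15.0, 8.79),
  `350`  = c(0.10, 0.20),
  `351`  = c(0.11, 0.21),
  `352`  = c(0.12, 0.22),
  check.names = FALSE   # keep "350" etc. without an "X" prefix
)

# Drop the non-spectral columns (here the first 2) and convert to a matrix
spec <- as.matrix(toy[, -c(1:2)])

print(dim(spec))         # 2 3 (rows = samples, columns = wavelengths)
print(colnames(spec))    # "350" "351" "352"
print(is.numeric(spec))  # TRUE
```

If a non-numeric column were left in the selection, as.matrix would coerce the whole matrix to character, so checking is.numeric() on the result is a cheap safeguard.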

Next we create dat, a copy of dat2 that keeps only the first five columns and therefore drops the wavelength columns:

dat <- dat2[, c(1:5)]

Finally, we store the spectra in dat as a single matrix column called spc_raw and remove my_spectra from the environment:

dat$spc_raw <- my_spectra
rm(my_spectra)
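Storing the spectra as one matrix column keeps each spectrum on the same row as its soil properties. A minimal sketch of the idea with a plain data.frame instead of the tibble used in the post (props, spec and first are hypothetical names; the values are made up):

```r
# Hypothetical properties table and spectral matrix for 2 samples
props <- data.frame(sample = 1:2, Clay = c(15.0, 8.79))
spec <- matrix(c(0.10, 0.20, 0.11, 0.21, 0.12, 0.22), nrow = 2,
               dimnames = list(NULL, c("350", "351", "352")))

# Attach the whole matrix as a single column named spc_raw
props$spc_raw <- spec

print(length(props))           # 3 columns: sample, Clay, spc_raw
print(dim(props$spc_raw))      # 2 3
print(props$spc_raw[, "351"])  # both samples at 351 nm: 0.11 0.21

# Row subsetting keeps properties and spectra aligned
first <- props[1, ]
print(dim(first$spc_raw))      # 1 3: the spectrum of sample 1 only
```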

As a first check, we plot the spectra with matplot, which plots the columns of one matrix against the columns of another. Because matplot draws one line per column, we transpose the spectra with t() so that each sample becomes a column, and we use the column names of the matrix (the wavelengths, converted to numbers) as the x values:

Plot the spectra (using base R graphics)

matplot(as.numeric(colnames(dat$spc_raw)), t(dat$spc_raw),
        type = "l", lty = 1, col = "grey",
        xlab = "Wavelength (nm)", ylab = "Reflectance",
        main = "NIR Spectra of the samples")
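matplot draws one line per column of its y argument, so t() turns the samples-by-wavelengths matrix into wavelengths-by-samples and each sample becomes one line. The sketch below uses two hypothetical spectra (spc, wl, spc_t and mean_spec are illustrative names), converts the column names to numbers explicitly, and overlays a mean spectrum computed with colMeans. It draws to a null PDF device (pdf(NULL)) so it also runs without a screen; in an interactive session the pdf(NULL) and dev.off() lines can be dropped.

```r
# Hypothetical spectra: 2 samples (rows) x 4 wavelengths (columns)
spc <- matrix(c(0.06, 0.07, 0.08, 0.09, 0.10, 0.11, 0.12, 0.13),
              nrow = 2,
              dimnames = list(NULL, c("350", "351", "352", "353")))

wl <- as.numeric(colnames(spc))  # wavelengths as numbers, not strings
spc_t <- t(spc)                  # 4 x 2: one column (one line) per sample
mean_spec <- colMeans(spc)       # average reflectance at each wavelength

pdf(NULL)  # null device so the sketch runs non-interactively
matplot(wl, spc_t, type = "l", lty = 1, col = "grey",
        xlab = "Wavelength (nm)", ylab = "Reflectance")
lines(wl, mean_spec, col = "black", lwd = 2)  # overlay the mean spectrum
invisible(dev.off())

print(dim(spc_t))  # 4 2
print(mean_spec)   # 0.065 0.085 0.105 0.125, named by wavelength
```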

This is my first pass through this training material. In the manual, the authors plot the spectra with the ggplot2 package instead; I will use that approach in the next post.

Bibliography:

Soil spectroscopy training material: Wadoux, A., Ramirez-Lopez, L., Ge, Y., Barra, I. & Peng, Y. 2025. A course on applied data analytics for soil analysis with infrared spectroscopy – Soil spectroscopy training manual 2. Rome, FAO.