Load and prepare the data.
The data is available in the data folder of the repository. It is in CSV format and can be loaded with the read_csv function from the readr package.
# Load the readr package and read the CSV file directly from GitHub
library(readr)
url <- "https://raw.githubusercontent.com/FAO-SID/SoilFER-Spec/main/data/dat2.csv"
dat2 <- read_csv(url)
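Before reshaping anything, it can help to confirm that the table has the layout the next steps assume: one index column, four soil parameters and one column per wavelength from 350 to 2500 nm (2500 - 350 + 1 = 2151 columns). The sketch below uses a hypothetical helper, check_layout(), which is not part of readr or the training material, and a small made-up table so it runs without downloading anything. With the real data you would call check_layout(dat2).

```r
# Hypothetical helper (not from readr or the training material):
# checks that a table has n_meta leading columns followed by one
# column per wavelength in `wl`, named like "X350", "X351", ...
check_layout <- function(df, n_meta = 5, wl = 350:2500) {
  spec_names <- names(df)[-seq_len(n_meta)]
  expected <- paste0("X", wl)
  list(
    n_spectral = length(spec_names),
    n_expected = length(wl),
    names_ok = identical(spec_names, expected)
  )
}

# Made-up table with 2 samples and 3 wavelengths (350-352 nm),
# used only to exercise the helper
toy <- data.frame(
  sample = 1:2, Organic_Carbon = c(0.8, 0.7),
  Clay = c(15, 9), Silt = c(40, 52), Sand = c(45, 39),
  X350 = c(0.062, 0.074), X351 = c(0.062, 0.073), X352 = c(0.063, 0.072)
)
res <- check_layout(toy, wl = 350:352)
str(res)
```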
Next, following the instructions of the Soil spectroscopy training material, we rename the first column (the sample index) to “sample”.
# Rename the first column (the sample index)
colnames(dat2)[1] <- "sample"
# Show the first 10 rows and the first 7 columns
head(dat2, c(10, 7))
# A tibble: 10 × 7
sample Organic_Carbon Clay Silt Sand X350 X351
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0.795 15.0 40.1 44.9 0.0622 0.0624
2 2 0.696 8.79 52.5 38.7 0.0735 0.0725
3 3 1.46 13.8 41.8 44.4 0.0685 0.0699
4 4 3.36 31.1 49.5 19.4 0.0590 0.0615
5 5 3.71 33.1 62.3 4.6 0.0624 0.0626
6 6 1.39 33.3 60.6 6.1 0.0997 0.0988
7 7 3.38 53.8 44.7 1.5 0.0920 0.0936
8 8 3.16 27.3 66.3 6.4 0.0703 0.0673
9 9 3.49 30.5 48.5 21 0.0473 0.0440
10 10 2.34 21.5 61.5 17 0.0666 0.0691
Now we have the sample number, the four parameters (Organic_Carbon, Clay, Silt and Sand) and the spectra from 350 to 2500 nm in steps of 1 nm, that is, 2500 - 350 + 1 = 2151 wavelengths (also called data points). Our next step is to remove the "X" prefix from the wavelength column names. There are several ways to do this; one of the most common is the gsub function, which replaces all occurrences of a pattern in a string with a replacement string. Here we replace "X" with "" (an empty string) in every column name except the first five (sample number and the four parameters):
colnames(dat2)[-c(1:5)] <- gsub("X", "", colnames(dat2)[-c(1:5)])
head(dat2, c(10, 7))
# A tibble: 10 × 7
sample Organic_Carbon Clay Silt Sand `350` `351`
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0.795 15.0 40.1 44.9 0.0622 0.0624
2 2 0.696 8.79 52.5 38.7 0.0735 0.0725
3 3 1.46 13.8 41.8 44.4 0.0685 0.0699
4 4 3.36 31.1 49.5 19.4 0.0590 0.0615
5 5 3.71 33.1 62.3 4.6 0.0624 0.0626
6 6 1.39 33.3 60.6 6.1 0.0997 0.0988
7 7 3.38 53.8 44.7 1.5 0.0920 0.0936
8 8 3.16 27.3 66.3 6.4 0.0703 0.0673
9 9 3.49 30.5 48.5 21 0.0473 0.0440
10 10 2.34 21.5 61.5 17 0.0666 0.0691
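Because gsub() removes every "X" anywhere in a name, a slightly stricter variant anchors the pattern to the start of the string with sub("^X", "", ...), so only a leading prefix is dropped. The sketch below, on a made-up vector of column names that mimics dat2, also converts the cleaned names to numbers and checks that they form a regular 1 nm grid.

```r
# Made-up column names mimicking the layout of dat2
nms <- c("sample", "Organic_Carbon", "Clay", "Silt", "Sand",
         "X350", "X351", "X352")

# Strip only a leading "X" from the wavelength columns (6th onwards)
nms[-(1:5)] <- sub("^X", "", nms[-(1:5)])

# Convert the wavelength names to numbers and check the 1 nm spacing
wl <- as.numeric(nms[-(1:5)])
all(diff(wl) == 1)
```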
Now we can isolate the wavelengths into a matrix with the as.matrix function, which converts a data frame to a matrix. We convert all columns except the first five (sample number and parameters):
# Keep only the spectral columns (6th column onwards) as a numeric matrix
my_spectra <- as.matrix(dat2[, -c(1:5)])
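Selecting columns by position (-c(1:5)) works as long as the table layout never changes. An alternative sketch selects the spectral columns by name instead, keeping only those whose names are purely numeric once the "X" has been removed. The toy data frame below is made up for illustration.

```r
# Made-up table: 5 metadata columns plus 3 wavelength columns already
# renamed to "350", "351", "352" (check.names = FALSE keeps them as-is)
toy <- data.frame(
  sample = 1:2, Organic_Carbon = c(0.8, 0.7),
  Clay = c(15, 9), Silt = c(40, 52), Sand = c(45, 39),
  `350` = c(0.062, 0.074), `351` = c(0.062, 0.073), `352` = c(0.063, 0.072),
  check.names = FALSE
)

# Spectral columns are those whose names contain only digits
is_wl <- grepl("^[0-9]+$", names(toy))
spectra <- as.matrix(toy[, is_wl])
dim(spectra)
```

If a column is ever inserted or reordered, this selection still picks up exactly the wavelength columns, whereas a positional index would silently shift.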
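Next, we keep only the first five columns of dat2 (sample number and parameters) in a new object, dat, which drops the wavelength columns:
dat <- dat2[, 1:5]
Then we add the spectra back to dat as a single matrix column called spc_raw and remove my_spectra from the environment:
# Store the whole spectral matrix as one column of dat
dat$spc_raw <- my_spectra
rm(my_spectra)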
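Storing the spectra as one matrix column keeps each sample's spectrum attached to its soil data while the table itself stays compact. A minimal base-R sketch with made-up numbers shows what this structure looks like: the data frame reports only a few columns, yet dat$spc_raw is a full matrix that is subset row by row together with the rest of the table. The post works with a tibble; this sketch uses a plain data.frame to stay dependency-free.

```r
# Made-up data: 3 samples, 4 wavelengths
meta <- data.frame(sample = 1:3, Organic_Carbon = c(0.8, 0.7, 1.5))
spec <- matrix(seq(0.05, 0.16, length.out = 12), nrow = 3,
               dimnames = list(NULL, c("350", "351", "352", "353")))

# `$<-` on a data.frame stores the whole matrix as a single column
meta$spc_raw <- spec

ncol(meta)          # 3: sample, Organic_Carbon, spc_raw
dim(meta$spc_raw)   # 3 4

# Row subsetting keeps the spectra aligned with the soil data
sub_meta <- meta[meta$Organic_Carbon > 0.75, ]
dim(sub_meta$spc_raw)
```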
As a first check, we can visualize the spectra with the matplot function, which plots the columns of one matrix against the columns of another. Each spectrum is a row of dat$spc_raw, so we transpose the matrix with t() and plot it against the wavelengths taken from its column names (converted to numbers):
# Plot the spectra (using classical R)
matplot(as.numeric(colnames(dat$spc_raw)), t(dat$spc_raw),
        type = "l", lty = 1, col = "grey",
        xlab = "Wavelength (nm)", ylab = "Reflectance",
        main = "Vis-NIR spectra of the samples")
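A common companion to the raw-spectra plot is the mean spectrum with a band of plus/minus one standard deviation, which makes the overall shape easier to read when many spectra overlap. The sketch below computes both with colMeans() and apply() on a small made-up matrix standing in for dat$spc_raw. It draws to a temporary PDF file so it also runs non-interactively; in an interactive session you can drop the pdf() and dev.off() lines.

```r
# Made-up spectra: 4 samples x 5 wavelengths (stand-in for dat$spc_raw)
spc <- matrix(c(0.06, 0.07, 0.08, 0.09, 0.10,
                0.07, 0.08, 0.09, 0.10, 0.11,
                0.05, 0.06, 0.07, 0.08, 0.09,
                0.06, 0.07, 0.08, 0.09, 0.10),
              nrow = 4, byrow = TRUE,
              dimnames = list(NULL, as.character(350:354)))
wl <- as.numeric(colnames(spc))

# Mean and standard deviation at each wavelength
spc_mean <- colMeans(spc)
spc_sd <- apply(spc, 2, sd)

# Raw spectra in grey, mean as a thick line, +/- 1 SD as dashed lines
pdf(tempfile(fileext = ".pdf"))
matplot(wl, t(spc), type = "l", lty = 1, col = "grey",
        xlab = "Wavelength (nm)", ylab = "Reflectance")
lines(wl, spc_mean, lwd = 2)
lines(wl, spc_mean + spc_sd, lty = 2)
lines(wl, spc_mean - spc_sd, lty = 2)
invisible(dev.off())
```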
This is my first approach to this training material. In the manual, the authors use the ggplot2 package to plot the spectra; I will use it in the next post.
Bibliography:
Wadoux, A., Ramirez-Lopez, L., Ge, Y., Barra, I. & Peng, Y. 2025. A course on applied data analytics for soil analysis with infrared spectroscopy – Soil spectroscopy training manual 2. Rome, FAO.