Neural networks determining wine composition with IR spectroscopy: overcoming data scarcity issues
O. Sarmanova1,2*, L. Utegenova2, A. Guskov1, S. Burikov1,2, I. Plastinin1,
T. Dolenko1,2, S. Dolenko1
1- Skobeltsyn Research Institute of Nuclear Physics, Lomonosov Moscow State University, 1/2 Leninskie Gory, Moscow, 119991, Russian Federation
2- Faculty of Physics, Lomonosov Moscow State University, 1/2 Leninskie Gory, Moscow, 119991, Russian Federation
* oe.sarmanova@physics.msu.ru
Wine is a complex system containing various alcohols (ethanol, methanol, glycerol, aliphatic and aromatic alcohols), esters, acetals, waxes and oils, carbohydrates, organic acids, mineral and phenolic compounds, vitamin-like substances, etc. [1]. The concentrations of these substances must be monitored during wine production by a cheap, rapid and non-destructive analytical method.
In this work, the problem of determining the composition of wines from their IR absorption spectra using neural networks (NN) was solved. The IR spectrum of wine consists of a set of overlapping bands of different shapes and intensities that are characteristic of its constituent substances, and it is distorted by interactions between their molecules. Training adaptive models on IR spectra requires a representative data set, which may be difficult to obtain experimentally.
To circumvent the scarcity of data and the lack of models for analyzing the IR absorption spectra of real wines, this study proposes using the IR absorption spectra of solutions that model white and red wines. To this end, the main components of the model solutions were selected so that their IR absorption spectra were similar to the IR spectra of both white and red wines. This approach makes it possible to obtain experimentally a representative set of IR spectra with known concentrations of the main wine components for NN training, and then to use the trained networks to determine the concentrations of these components in real wines.
Aqueous solutions containing ethanol, a mixture of sugars (glucose, fructose and sucrose), a mixture of acids (tartaric, malic and citric), glycerol and sulfur dioxide were used as models. The concentration ranges of these components in the model solutions were 8 to 18 vol.% for ethanol, 10 to 200 g/L for sugars, 3 to 12 g/L for acids, 5 to 25 g/L for glycerol and 0.1 to 1 g/L for sulfur dioxide. A total of 1734 IR absorption spectra of model solutions were obtained experimentally.
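For illustration, the concentration ranges above can be encoded and sampled as in the sketch below. The abstract does not state how the compositions of the model solutions were chosen, so the uniform random sampling, the names RANGES and sample_compositions, and the seed are assumptions made only for this example.

```python
import numpy as np

# Concentration ranges quoted in the text: (low, high, unit).
RANGES = {
    "ethanol": (8.0, 18.0, "vol.%"),
    "sugars": (10.0, 200.0, "g/L"),
    "acids": (3.0, 12.0, "g/L"),
    "glycerol": (5.0, 25.0, "g/L"),
    "sulfur_dioxide": (0.1, 1.0, "g/L"),
}


def sample_compositions(n, seed=0):
    """Draw n hypothetical compositions uniformly within RANGES.

    Uniform sampling is an assumption of this sketch, not the
    design reported by the authors.
    """
    rng = np.random.default_rng(seed)
    low = np.array([r[0] for r in RANGES.values()])
    high = np.array([r[1] for r in RANGES.values()])
    # One row per model solution, one column per component.
    return rng.uniform(low, high, size=(n, len(RANGES)))


# 1734 matches the number of spectra reported in the text.
compositions = sample_compositions(1734)
print(compositions.shape)
```

Each row of compositions would then be the target vector paired with one measured spectrum.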
Partial least squares (PLS) regression, a multilayer perceptron (MLP) and a convolutional neural network (CNN) were trained to predict the concentrations of wine components in model solutions and in real wines. Although PLS and the CNN showed a similar mean absolute error (MAE) in determining component concentrations in model solutions, PLS failed to determine the concentrations of the target components in real wines. The MAE of the CNN in determining the concentrations of ethanol, sugars, acids and glycerol in real wines (10 samples) was 0.8 vol.%, 5.1 g/L, 1.6 g/L and 1.9 g/L, respectively, which satisfies the needs of winemaking.
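The per-component MAE used above can be computed as in the following minimal sketch. The function name mae_per_component and the numbers in the example are made up for illustration; they are not data from this study.

```python
import numpy as np


def mae_per_component(y_true, y_pred):
    """Mean absolute error for each component (column).

    Rows are samples, columns are components, so averaging over
    axis 0 gives one MAE value per component in its own unit.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred), axis=0)


# Illustrative values only (columns: ethanol in vol.%, sugars in g/L).
y_true = np.array([[12.0, 50.0],
                   [13.0, 20.0]])
y_pred = np.array([[12.5, 54.0],
                   [12.0, 18.0]])

mae = mae_per_component(y_true, y_pred)
print(mae)  # ethanol MAE 0.75 vol.%, sugars MAE 3.0 g/L
```

Reporting the error per component rather than as one pooled number keeps the units meaningful, since ethanol is given in vol.% and the other components in g/L.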
This study was supported by the Russian Science Foundation, grant No. 24-11-00266, https://rscf.ru/en/project/24-11-00266/.
[1] M. Butnariu and A. Butu, Qualitative and Quantitative Chemical Composition of Wine, Quality Control in the Beverage Industry, 385-417, (2019).