Chinese Journal of Quantum Electronics ›› 2025, Vol. 42 ›› Issue (3): 313-323. doi: 10.3969/j.issn.1007-5461.2025.03.003

• Spectroscopy

Determination of lycopene in cherry tomatoes using near infrared spectroscopy combined with machine learning

GAO Xiangkun 1,2 , DONG Xuan 1,2 , LIU Chao 1,2 , ZHAN Jie 1 , HUANG Qing 1*   

  1 Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China; 2 University of Science and Technology of China, Hefei 230026, China
  • Received: 2023-03-12 Revised: 2023-03-31 Published: 2025-05-28 Online: 2025-05-28

Abstract: Qualitative and quantitative analysis models based on machine learning algorithms were established for the near infrared (NIR) spectroscopic determination of lycopene in cherry tomatoes. First, the extraction and detection methods for lycopene were optimized. Then, using the selected spectral bands of 7000–8000 cm⁻¹ and 10000–11000 cm⁻¹, a synergy interval partial least squares (siPLS) model was established to predict the lycopene content of cherry tomatoes. Compared with the commonly used partial least squares (PLS) quantitative model, the siPLS model offers somewhat better prediction accuracy, with a training set correlation coefficient Rc of 0.8008, a training set cross-validation root mean square error ERMSECV of 9.56 mg/kg, a test set correlation coefficient Rp of 0.8683, and a test set root mean square error ERMSEP of 4.59 mg/kg. Furthermore, the support vector regression (SVR) algorithm was introduced to establish a quantitative model, and the comparison shows that the SVR model performs better than the siPLS model, with Rc = 0.9559, ERMSEC = 4.229 mg/kg, Rp = 0.8959, and ERMSEP = 8.363 mg/kg. Finally, a concentration classification model for lycopene in cherry tomatoes was established based on the support vector machine (SVM) and a multi-channel convolutional neural network-gated recurrent unit (CNN-GRU) joint model, and the results show that, compared with the SVR model, the multi-channel CNN-GRU joint model achieves higher qualitative recognition accuracy.
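
The figures of merit quoted in the abstract (Rc, Rp, ERMSEC, ERMSECV, ERMSEP) and the restriction to two spectral bands can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the abstract: the correlation coefficient is taken to be the Pearson correlation between reference and predicted lycopene contents, the root mean square error is the plain (not degrees-of-freedom corrected) form, and the band limits are treated as inclusive. The function names `rmse`, `pearson_r`, and `select_bands` are hypothetical, and the numbers are made-up illustrative values, not data from the paper.

```python
import math


def rmse(reference, predicted):
    """Root mean square error between reference and predicted values."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(reference, predicted):
    """Pearson correlation coefficient (assumed definition of Rc and Rp)."""
    n = len(reference)
    if n != len(predicted) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_r = sum(reference) / n
    mean_p = sum(predicted) / n
    cov = sum((r - mean_r) * (p - mean_p) for r, p in zip(reference, predicted))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_r * var_p)


def select_bands(wavenumbers, bands=((7000, 8000), (10000, 11000))):
    """Indices of spectral points inside any band (cm^-1, limits inclusive)."""
    return [
        i
        for i, w in enumerate(wavenumbers)
        if any(low <= w <= high for low, high in bands)
    ]


# Made-up illustrative values in mg/kg; NOT measurements from the paper.
reference = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print("RMSE:", rmse(reference, predicted))
print("R:", pearson_r(reference, predicted))

# Made-up wavenumber axis; only points inside the two bands are kept.
axis = [6500, 7000, 7500, 9000, 10500, 11000, 12000]
print("kept indices:", select_bands(axis))
```

Applying the same two functions to training set, cross-validation, and test set predictions would give values analogous to the calibration, cross-validation, and prediction metrics reported above; the models themselves (siPLS, SVR, CNN-GRU) are outside the scope of this sketch.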

Key words: spectroscopy, qualitative and quantitative analysis models, machine learning, lycopene, cherry tomato, synergy interval partial least squares
