Chinese Journal of Quantum Electronics (量子电子学报), 2025, Vol. 42, Issue 3: 313-323. doi: 10.3969/j.issn.1007-5461.2025.03.003

• Spectroscopy •

Determination of lycopene in cherry tomatoes using near infrared spectroscopy combined with machine learning

GAO Xiangkun 1,2, DONG Xuan 1,2, LIU Chao 1,2, ZHAN Jie 1, HUANG Qing 1*

  1. Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China; 2. University of Science and Technology of China, Hefei 230026, China
  • Received: 2023-03-12 Revised: 2023-03-31 Published: 2025-05-28 Online: 2025-05-28
  • Corresponding author: HUANG Qing, E-mail: huangq@ipp.ac.cn
  • About the author: GAO Xiangkun (born 1996), from Taiyuan, Shanxi, is a graduate student whose research focuses on spectroscopic techniques and their applications. E-mail: gxk@mail.ustc.edu.cn
  • Supported by:
    Anhui Province Special Fund for Central Government Guidance of Local Science and Technology Development (S20200706050011)

Abstract: Qualitative and quantitative analysis models based on machine learning algorithms were established for the near-infrared (NIR) spectroscopic detection of lycopene in cherry tomatoes. First, the extraction and detection methods for lycopene were optimized. Then, using spectra from the two bands 7000 to 8000 cm⁻¹ and 10000 to 11000 cm⁻¹, a synergy interval partial least squares (siPLS) model was built to predict the lycopene content of cherry tomatoes. Compared with the commonly used partial least squares (PLS) quantitative model, the siPLS model shows some improvement in prediction accuracy, with a training-set correlation coefficient R_c of 0.8008, a training-set root mean square error of cross-validation E_RMSECV of 9.56 mg/kg, a test-set correlation coefficient R_p of 0.8683, and a test-set root mean square error E_RMSEP of 4.59 mg/kg. A support vector regression (SVR) model was then introduced for quantification; the comparison shows that the SVR model outperforms the siPLS model, with R_c = 0.9559 and E_RMSEC = 4.229 mg/kg on the training set, and R_p = 0.8959 and E_RMSEP = 8.363 mg/kg on the test set. Finally, concentration classification models for lycopene in cherry tomatoes were established based on a support vector machine (SVM) and a multi-channel convolutional neural network-gated recurrent unit (CNN-GRU) joint model. The results show that, compared with the SVR model, the multi-channel CNN-GRU joint model achieves higher qualitative recognition accuracy.

Key words: spectroscopy, qualitative and quantitative analysis models, machine learning, lycopene, cherry tomato, synergy interval partial least squares
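
The abstract restricts the spectra to two wavenumber bands and reports each model through a correlation coefficient and a root mean square error. A minimal Python sketch of those two steps follows. It is not the siPLS or SVR model itself, and it assumes the standard definitions of the Pearson correlation coefficient and RMSE, since this page does not give the paper's exact formulas. The function names and all data values are hypothetical, chosen only for illustration.

```python
import math

# The two bands named in the abstract, in cm^-1.
BANDS = [(7000, 8000), (10000, 11000)]


def select_bands(wavenumbers, spectrum, bands):
    """Keep only the spectral points whose wavenumber lies inside any band."""
    return [a for w, a in zip(wavenumbers, spectrum)
            if any(lo <= w <= hi for lo, hi in bands)]


def rmse(y_true, y_pred):
    """Root mean square error (standard definition, assumed here)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between reference and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    ss_t = sum((t - mean_t) ** 2 for t in y_true)
    ss_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(ss_t * ss_p)


# Hypothetical spectrum sampled every 500 cm^-1 from 6000 to 12000 cm^-1.
wavenumbers = list(range(6000, 12001, 500))
spectrum = [0.01 * i for i in range(len(wavenumbers))]
selected = select_bands(wavenumbers, spectrum, BANDS)

# Hypothetical reference and predicted lycopene contents, in mg/kg.
y_ref = [10.0, 20.0, 30.0, 40.0]
y_hat = [12.0, 18.0, 33.0, 39.0]
r_value = pearson_r(y_ref, y_hat)
rmse_value = rmse(y_ref, y_hat)
```

In a real workflow, the points kept by `select_bands` would be the inputs to the regression model. Applying the two metrics to training-set, cross-validation, and test-set predictions would correspond to the R_c, E_RMSECV, R_p, and E_RMSEP values quoted in the abstract.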

CLC number: