
LIU Dezhen, LI Yuanyuan*. Stage prediction of lung adenocarcinoma based on deep learning and multi-omics data[J]. Journal of Wuhan Institute of Technology, 2024, 46(02): 190-196. [doi:10.19843/j.cnki.CN42-1779/TQ.202307022]

Stage prediction of lung adenocarcinoma based on deep learning and multi-omics data

Journal of Wuhan Institute of Technology [ISSN: 1674-2869 / CN: 42-1779/TQ]

Volume:
46
Issue:
No. 2, 2024
Pages:
190-196
Section:
Mechanical, Electrical and Information Engineering
Publication date:
2024-04-28

Article Info

Title:
Stage prediction of lung adenocarcinoma based on deep learning and multi-omics data
Article ID:
1674-2869(2024)02-0190-07
Author(s):
LIU Dezhen, LI Yuanyuan*
School of Optical Information and Energy Engineering, School of Mathematics and Physics,
Wuhan Institute of Technology, Wuhan 430205, China
Keywords:
staging of lung adenocarcinoma; deep learning; integration strategy; random forest algorithm
CLC number:
TP18; Q811.4
DOI:
10.19843/j.cnki.CN42-1779/TQ.202307022
Document code:
A
Abstract:
To make cancer-staging decisions more accurate, this study integrated three kinds of omics data from 452 lung adenocarcinoma patients, namely messenger ribonucleic acid (mRNA) transcript data, micro ribonucleic acid (miRNA) transcript data and DNA methylation data, and used a random forest algorithm to predict stages. First, the three kinds of omics data obtained from The Cancer Genome Atlas (TCGA) database were preprocessed, and the mRNA transcript data were matched with the DNA methylation data at gene loci. Then, four different multi-omics integration strategies were used to integrate the preprocessed data. Finally, a random forest algorithm was applied to the integrated data to predict stages, with accuracy, the Kappa coefficient and the area under the curve (AUC) used to evaluate prediction performance. The results show that the multi-omics integration strategies achieve higher accuracy in stage prediction. The deep-learning-based integration strategy performs best, with accuracy, Kappa coefficient and AUC of 0.940, 0.931 and 0.986, respectively, and is promising for future lung adenocarcinoma staging prediction.
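The abstract reports accuracy, the Kappa coefficient and AUC for multi-class stage prediction. The dependency-free Python sketch below shows one way these three metrics can be computed from a classifier's outputs. It is illustrative only: the function names and the stage labels "I" to "IV" are hypothetical, the probability matrix stands in for the per-class probabilities a random forest would output, the numbers are made up, and macro-averaged one-vs-rest AUC is an assumption, since the abstract does not say how AUC was averaged across stages.

```python
from collections import Counter
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of patients whose predicted stage equals the recorded stage."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), i.e. agreement corrected for chance."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # p_e: agreement expected if predictions were independent of the true stages.
    # Labels that appear only in y_pred contribute 0, so summing over true labels suffices.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both label lists use a single class")
    return (p_o - p_e) / (1 - p_e)


def binary_auc(is_positive: Sequence[bool], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for flag, s in zip(is_positive, scores) if flag]
    neg = [s for flag, s in zip(is_positive, scores) if not flag]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true: Sequence[Hashable],
                  proba: Sequence[Sequence[float]],
                  classes: Sequence[Hashable]) -> float:
    """Macro-averaged one-vs-rest AUC; proba[i][k] is the predicted probability of classes[k]."""
    per_class = [
        binary_auc([t == c for t in y_true], [row[k] for row in proba])
        for k, c in enumerate(classes)
    ]
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    # Hypothetical stage labels and made-up probabilities for six patients (not study data).
    stages = ["I", "II", "III", "IV"]
    y_true = ["I", "I", "II", "II", "III", "IV"]
    proba = [
        [0.70, 0.20, 0.05, 0.05],
        [0.40, 0.50, 0.05, 0.05],
        [0.10, 0.70, 0.10, 0.10],
        [0.20, 0.60, 0.10, 0.10],
        [0.05, 0.10, 0.80, 0.05],
        [0.05, 0.05, 0.10, 0.80],
    ]
    # Predicted stage = class with the highest probability in each row.
    y_pred = [stages[max(range(len(row)), key=row.__getitem__)] for row in proba]
    print(accuracy(y_true, y_pred))             # 5/6, about 0.833
    print(cohen_kappa(y_true, y_pred))          # 10/13, about 0.769
    print(macro_ovr_auc(y_true, proba, stages))  # 1.0
```

In this toy run one stage-I patient is predicted as stage II, so accuracy is 5/6, yet the macro AUC is 1.0: AUC depends only on how the scores rank patients for each stage, not on the argmax decision. In practice, scikit-learn's `accuracy_score`, `cohen_kappa_score` and `roc_auc_score(..., multi_class="ovr", average="macro")` compute the same quantities.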



Notes:
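Received: 2023-07-24
Funding: National Natural Science Foundation of China (12001408)
About the author: LIU Dezhen, master's degree candidate. Email: 1205125262@qq.com
*Corresponding author: LI Yuanyuan, PhD, professor. Email: yuanyuanli_wit@hotmail.com
Citation: LIU Dezhen, LI Yuanyuan. Stage prediction of lung adenocarcinoma based on deep learning and multi-omics data[J]. Journal of Wuhan Institute of Technology, 2024, 46(2): 190-196.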

Last update: 2024-05-01