ZHANG Qing, WU Xiaoxiao, LI Xiang, et al. A single-cell type annotation method based on self-supervised pretraining[J]. Journal of Wuhan Institute of Technology, 2026, 48(01): 103-110. [doi:10.19843/j.cnki.CN42-1779/TQ.202412007]

A single-cell type annotation method based on self-supervised pretraining

Journal of Wuhan Institute of Technology [ISSN: 1674-2869 / CN: 42-1779/TQ]

Volume:
48
Issue:
2026, No. 01
Pages:
103-110
Section:
Intelligent Manufacturing
Publication date:
2026-02-28

Article Info

Title:
A single-cell type annotation method based on self-supervised pretraining
Article No.:
1674-2869(2026)01-0103-08
Author(s):
ZHANG Qing¹, WU Xiaoxiao¹, LI Xiang¹, MA Wei¹, WU Tongquan², XIE Yicheng³*, WU Xinglong¹*
1. School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China;
2. School of Mechanical Engineering, Hefei University of Technology, Hefei 230009, China;
3. Department of Dermatology, Children's Hospital, Zhejiang University School of Medicine, Hangzhou 310052, China

Keywords:
cell type annotation; self-supervised pretraining; deep learning; single-cell RNA sequencing
CLC number:
TP391.1
DOI:
10.19843/j.cnki.CN42-1779/TQ.202412007
Document code:
A
Abstract:
Accurate cell type annotation remains a challenge in single-cell RNA sequencing (scRNA-seq) analysis. To annotate cell types efficiently and accurately in large-scale single-cell datasets from mouse lung, a deep learning network based on transfer learning and the Transformer, the single-cell label annotation network (ScLabel-Net), was proposed. ScLabel-Net was first pre-trained on a single-cell lung dataset of approximately 100 000 cells, using self-supervised learning to capture inter-gene relationships. The pre-trained model was then transferred to smaller datasets and fine-tuned for specific cell type annotation tasks. To mitigate class imbalance, a common issue in single-cell data, random up-sampling was applied to the fine-tuning datasets. Experimental results showed that ScLabel-Net achieved cell type annotation accuracies of 0.955, 0.922, and 0.986 on three mouse lung datasets (GSE267861, GSE264032, and Quake, respectively). ScLabel-Net also generalized well to single-cell datasets from other mouse organs (trachea, kidney, and pancreas), with accuracies of 0.981, 0.951, and 0.987, respectively. These results verify its cross-organ applicability and indicate its potential for application in studies of complex biological systems and diseases.
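The abstract states only that random up-sampling was applied to the fine-tuning data; it does not give the target class size or the sampling procedure. The sketch below assumes the common variant in which each minority cell type is resampled with replacement until it matches the largest type, while all original cells are kept. The function name `random_upsample`, its parameters, and the toy labels are hypothetical and are not taken from ScLabel-Net.

```python
import random
from collections import Counter


def random_upsample(cells, labels, seed=0):
    """Randomly oversample minority cell types (hypothetical helper).

    Every cell type is brought up to the size of the largest type by
    drawing extra copies of its own cells with replacement. Original
    cells are always kept, so no information is discarded.
    """
    if len(cells) != len(labels):
        raise ValueError("cells and labels must have the same length")
    if not labels:
        return [], []
    rng = random.Random(seed)  # local RNG so results are reproducible

    # Group cell indices by cell-type label (insertion order is preserved).
    by_type = {}
    for idx, label in enumerate(labels):
        by_type.setdefault(label, []).append(idx)
    target = max(len(idxs) for idxs in by_type.values())

    out_cells, out_labels = [], []
    for label, idxs in by_type.items():
        # Assumed strategy: pad each minority type up to the majority size.
        extra = [rng.choice(idxs) for _ in range(target - len(idxs))]
        for idx in idxs + extra:
            out_cells.append(cells[idx])
            out_labels.append(label)
    return out_cells, out_labels


if __name__ == "__main__":
    # Toy data (hypothetical): 6 AT2 cells, 2 macrophages, 1 B cell,
    # each represented by a short made-up expression vector.
    labels = ["AT2"] * 6 + ["Macrophage"] * 2 + ["B cell"]
    cells = [[float(i), 0.0, 1.0] for i in range(len(labels))]
    up_cells, up_labels = random_upsample(cells, labels)
    print(Counter(up_labels))  # every cell type now has 6 cells
    print(len(up_cells))       # 18
```

In practice such resampling would normally be applied only to the training split, after the train/test split, so that duplicated cells cannot inflate the measured accuracy; whether ScLabel-Net follows this order is not stated in the abstract.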

Memo:
Received: 2024-12-06
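Funding: National Natural Science Foundation of China (82302085); Fund of the Key Laboratory of Artificial Intelligence-Assisted Imaging Diagnosis of Xinjiang Uygur Autonomous Region (XJRG ZN2024008); Natural Science Foundation of Hubei Province (2020BCB002)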
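About the first author: ZHANG Qing, master's student. Email: 1714436732@qq.com
*Corresponding authors: XIE Yicheng, PhD, Research Fellow. Email: ycxie@zju.edu.cn
WU Xinglong, PhD, Associate Professor. Email: xwu@wit.edu.cn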

Last Update: 2026-03-10