Journal of Wuhan Institute of Technology [ISSN:1674-2869/CN:42-1779/TQ]

Volume:: 47
Issue:: No. 3, 2025

Pages:: 343-348

Section:: Intelligent Manufacturing

Publication date:: 2025-06-30

Article Info

Title:: A credit card default prediction model based on SMOTE-XGBoost algorithm

Article ID:: 1674-2869(2025)03-0343-06

Authors:: ZHAO Yang (赵阳); ZHANG Jiemeng (张杰萌); YAN Guoyi* (严国义)

Affiliation:: School of Optical Information and Energy Engineering, School of Mathematics and Physics, Wuhan Institute of Technology, Wuhan 430205, China

Keywords:: SMOTE; XGBoost; default prediction; imbalanced data

CLC number:: TP39

DOI:: 10.19843/j.cnki.CN42-1779/TQ.202312031

Document code:: A

Abstract:: To address credit card default, a prediction model based on the SMOTE-XGBoost algorithm is proposed. The model processes the dataset with the synthetic minority oversampling technique (SMOTE) and uses the extreme gradient boosting (XGBoost) model as the learner to improve overall predictive performance. To verify the effectiveness of SMOTE and the superiority of XGBoost, seven models (random forest, neural network, gradient boosting decision tree, logistic regression, k-nearest neighbors, XGBoost, and LightGBM) were first trained and evaluated on the original dataset. The dataset was then rebalanced with three sampling methods, Regular-SMOTE, Borderline-SMOTE, and SVM-SMOTE, and the same seven models were trained and evaluated on each rebalanced dataset. Accuracy, precision, F1 score, and area under the curve (AUC) were used as evaluation metrics. Comparative analysis across sampling methods and models shows that SMOTE sampling significantly improves the predictive performance of every model, and that XGBoost trained on SVM-SMOTE-resampled data performs best. This model can provide decision support for financial institutions in formulating lending strategies and reducing enterprise risk.
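The abstract relies on two building blocks: SMOTE oversampling of the minority (default) class, and the accuracy, precision, F1 and AUC metrics used to compare models. The following is a minimal, dependency-free Python sketch of both, assuming numeric feature vectors and labels where 1 = default and 0 = non-default. The function names `smote` and `metrics` are hypothetical illustrations, not the authors' code. Only Regular SMOTE interpolation is shown. Borderline-SMOTE and SVM-SMOTE differ in which minority samples are used as interpolation seeds and are not implemented here.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Regular SMOTE sketch: create n_new synthetic minority samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    Assumes at least two minority samples with numeric features.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = minority[:i] + minority[i + 1:]
        # k nearest minority neighbours of x by Euclidean distance
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


def metrics(y_true, y_pred, scores):
    """Accuracy, precision, F1 and AUC for binary labels (1 = default)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # AUC as the probability that a random defaulter is scored above a
    # random non-defaulter (ties count one half)
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return {"accuracy": accuracy, "precision": precision, "f1": f1, "auc": auc}


if __name__ == "__main__":
    # Toy imbalanced labels: 8 non-defaults, 2 defaults
    y_true = [0] * 8 + [1] * 2
    # Predicting "no default" for everyone looks 80% accurate but catches nobody
    print(metrics(y_true, [0] * 10, [0.1] * 10))
    # Oversample a 2-point minority class with 6 synthetic points
    print(smote([[1.0, 2.0], [2.0, 3.0]], n_new=6, k=1))
```

The toy run illustrates why accuracy alone is misleading on imbalanced data and why the article also reports precision, F1 and AUC. In a real pipeline these pieces would typically come from libraries: imbalanced-learn provides `SMOTE`, `BorderlineSMOTE` and `SVMSMOTE` in `imblearn.over_sampling` (each with a `fit_resample(X, y)` method), and the `xgboost` package provides `XGBClassifier`. The article does not state which implementations or hyperparameters were used, so treat this only as an illustration. Resampling is normally applied to the training split only, so that test metrics are computed on real records rather than synthetic ones.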

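Memo

Received:: 2024-01-19
Funding:: National Natural Science Foundation of China (12101469)
About the author:: ZHAO Yang, master's student. Email: 601911304@qq.com
*Corresponding author:: YAN Guoyi, Ph.D., associate professor. Email: yanguoyi@wit.edu.cn
Citation:: ZHAO Yang, ZHANG Jiemeng, YAN Guoyi. Research on a credit card default prediction model based on SMOTE-XGBoost algorithm [J]. Journal of Wuhan Institute of Technology, 2025, 47(3): 343-348.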

Last update:: 2025-07-09
