
DUAN Yan-hui, LI Xiao-lin, et al. Extraction of administrative division of Chinese address based on conditional random fields[J]. Journal of Wuhan Institute of Technology, 2015, 37(11): 47-51. [doi:10.3969/j.issn.1674-2869.2015.11.010]

Extraction of administrative division of Chinese address based on conditional random fields

Journal of Wuhan Institute of Technology [ISSN: 1674-2869 / CN: 42-1779/TQ]

Volume:
37
Issue:
No. 11, 2015
Pages:
47-51
Section:
Electromechanical and Information Engineering
Publication date:
2015-12-30

Article Information

Title:
Extraction of administrative division of Chinese address based on conditional random fields
Article No.:
1674-2869(2015)11-0047-05
Author(s):
DUAN Yan-hui (1), LI Xiao-lin (1, 2)*, HUANG Shuang (1)
1. Hubei Key Laboratory of Intelligent Robot (Wuhan Institute of Technology), Wuhan 430205, China; 2. School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China
Keywords:
location information parsing; conditional random fields; training corpus
CLC number:
TP391.41
DOI:
10.3969/j.issn.1674-2869.2015.11.010
Document code:
A
Abstract:
To extract administrative division information effectively from non-standard Chinese addresses, a method based on conditional random fields (CRFs) was proposed. Guided by the way administrative divisions are expressed in Chinese addresses, the method uses a discriminative probabilistic model that models the target (label) sequence conditioned on the known observation sequence. An expression model of administrative divisions was obtained by constructing a training corpus and the corresponding feature templates. The model was then applied to a test set, and its output was compared with the manually annotated test data to verify its performance. Experimental results show that the CRF model outperforms the maximum entropy model on overall performance, and the accuracy of address information parsing reaches 89.93%.
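The abstract describes the method only at a high level. As a rough illustration of what a linear-chain CRF tagger for addresses involves, the following minimal sketch labels each character of an address with a BIO-style tag and decodes the best tag sequence with the Viterbi algorithm. Every specific here is an assumption, not taken from the paper: the tag set (`B-PROV`, `I-CITY`, ...), the four character-window feature templates (loosely modelled on CRF++-style unigram templates), the example address, and the hand-set weights `TOY_EMIT` and `TOY_TRANS`, which a real system would learn from the annotated training corpus. Training itself is omitted.

```python
"""Toy linear-chain CRF decoder for tagging administrative divisions.

All tags, feature templates and weights are hypothetical; a real model
would learn its weights from an annotated address corpus.
"""

# Hypothetical BIO tag set: province, city, district, other.
TAGS = ["B-PROV", "I-PROV", "B-CITY", "I-CITY", "B-DIST", "I-DIST", "O"]


def char_features(chars, i):
    """Expand four hypothetical character-window templates at position i."""
    prev_c = chars[i - 1] if i > 0 else "<S>"
    next_c = chars[i + 1] if i + 1 < len(chars) else "</S>"
    return [
        f"U00:{prev_c}",             # previous character
        f"U01:{chars[i]}",           # current character
        f"U02:{next_c}",             # next character
        f"U03:{chars[i]}/{next_c}",  # current/next character bigram
    ]


def viterbi(chars, emit_w, trans_w):
    """Return the highest-scoring tag sequence for chars.

    The score of a tag sequence is the sum of emission weights
    emit_w[(feature, tag)] and transition weights trans_w[(prev, tag)];
    missing entries count as 0. Since p(y|x) = exp(score) / Z(x), the
    normaliser Z(x) does not change the argmax and is never computed.
    """
    if not chars:
        return []

    def emit(i, tag):
        return sum(emit_w.get((f, tag), 0.0) for f in char_features(chars, i))

    # score[t]: best score of any tag path that ends in tag t so far.
    score = {t: emit(0, t) for t in TAGS}
    backptrs = []  # backptrs[k][t]: best tag at position k given tag t at k+1
    for i in range(1, len(chars)):
        new_score, ptr = {}, {}
        for t in TAGS:
            prev = max(TAGS, key=lambda p: score[p] + trans_w.get((p, t), 0.0))
            new_score[t] = score[prev] + trans_w.get((prev, t), 0.0) + emit(i, t)
            ptr[t] = prev
        score = new_score
        backptrs.append(ptr)

    # Start from the best final tag and follow the back-pointers.
    path = [max(TAGS, key=lambda t: score[t])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


def tag_accuracy(pred, gold):
    """Fraction of characters whose predicted tag matches the annotation."""
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


# Illustrative address: Hubei Province / Wuhan City / Hongshan District.
ADDRESS = list("湖北省武汉市洪山区")
GOLD = ["B-PROV", "I-PROV", "I-PROV", "B-CITY", "I-CITY", "I-CITY",
        "B-DIST", "I-DIST", "I-DIST"]

# Hand-set weights: the first character and the suffix (省, 市, 区) of each
# division get emission weights; 北, 汉 and 山 get none.
TOY_EMIT = {
    ("U01:湖", "B-PROV"): 1.0, ("U01:省", "I-PROV"): 2.0,
    ("U01:武", "B-CITY"): 1.0, ("U01:市", "I-CITY"): 2.0,
    ("U01:洪", "B-DIST"): 1.0, ("U01:区", "I-DIST"): 2.0,
}
# Transitions reward staying inside a division (B->I, I->I) and moving on
# to the next division (I-PROV->B-CITY, I-CITY->B-DIST).
TOY_TRANS = {
    ("B-PROV", "I-PROV"): 1.0, ("I-PROV", "I-PROV"): 1.0,
    ("I-PROV", "B-CITY"): 1.0,
    ("B-CITY", "I-CITY"): 1.0, ("I-CITY", "I-CITY"): 1.0,
    ("I-CITY", "B-DIST"): 1.0,
    ("B-DIST", "I-DIST"): 1.0, ("I-DIST", "I-DIST"): 1.0,
}

if __name__ == "__main__":
    pred = viterbi(ADDRESS, TOY_EMIT, TOY_TRANS)
    print(pred)
    print(tag_accuracy(pred, GOLD))
```

With these weights, `viterbi(ADDRESS, TOY_EMIT, TOY_TRANS)` returns `GOLD`, so `tag_accuracy` gives 1.0. The characters 北, 汉 and 山 carry no emission weight; they are tagged `I-PROV`, `I-CITY` and `I-DIST` only because of the transition weights, and decoding with an empty transition table (`{}`) no longer recovers `GOLD`. This dependence on neighbouring labels is what separates a linear-chain CRF from a classifier that tags each character independently.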




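Memo:
Received: 2015-10-13. Funding: National High-tech R&D Program of China (863 Program) (2013AA12A202); Graduate Education Innovation Fund of Wuhan Institute of Technology (CX2014090). About the author: DUAN Yan-hui (1993-), female, from Gong'an, Hubei, master's student; research interests: data mining and machine learning. * Corresponding author.
Last Update: 2015-12-12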