Application Research of Computers (《計算機應用研究》)

Research on author identity recognition of Chinese microblog based on deep learning

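Authors Xu Xiaolin (徐曉霖), Cai Manchun (蔡滿春), Lu Tianliang (蘆天亮)
Affiliation School of Information Technology and Network Security, People's Public Security University of China, Beijing 102623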
Article ID 1001-3695(2020)01-003-0016-03
DOI 10.19734/j.issn.1001-3695.2018.05.0486
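Abstract Author identification has long played an important role in public security work and in questioned-document examination. Existing approaches to modeling an author's linguistic style are cumbersome, and their text feature engineering does not generalize. To address this, the paper proposes CABLSTM, a model for identifying the authors of Chinese microblog posts that requires no expert feature modeling, and tests its accuracy on a public microblog corpus. To extract as many short-text features as possible, the model integrates an attention mechanism into a CNN and removes the pooling layer; a bidirectional LSTM then captures contextual information, and a softmax layer outputs the identification result. Experiments show that, on the Chinese microblog author identification task, the model achieves some improvement in accuracy, recall, and <i>F</i>-measure over traditional machine learning algorithms, TextCNN, and LSTM.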
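The abstract reports accuracy, recall, and F-measure but does not say how they were computed for a multi-author task. The following is a minimal sketch in Python, assuming macro-averaging over authors (an assumption, not something the abstract states); the function name `evaluate` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro-averaging (each author weighted equally) is an assumption;
    the abstract does not specify the averaging scheme.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    true_count = Counter(y_true)   # posts actually written by each author
    pred_count = Counter(y_pred)   # posts attributed to each author
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        # Guard against division by zero for authors never predicted or never present.
        p = correct[label] / pred_count[label] if pred_count[label] else 0.0
        r = correct[label] / true_count[label] if true_count[label] else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return {
        "accuracy": sum(correct.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical toy data: six posts by three authors.
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]
scores = evaluate(y_true, y_pred)
print(scores)
```

With macro-averaging, an author with few posts counts as much as a prolific one, which may or may not match the evaluation used in the paper.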
Keywords author identification; long short-term memory network; convolutional neural network; automatic feature extraction
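Funding National Key R&D Program of China, key special project (2017YFB0802804)
National Natural Science Foundation of China (61602489)
People's Public Security University of China 2018 Basic Scientific Research Fund, research institution project (2018JKF504)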
Article URL http://www.048285.live/article/01-2020-01-003.html
English title Research on author identity recognition of Chinese microblog based on deep learning
Authors (English) Xu Xiaolin, Cai Manchun, Lu Tianliang
Affiliation (English) School of Information Technology & Network Security, People's Public Security University of China, Beijing 102623, China
English abstract Author identification always plays an important role in the public security and literary inspection work. Text feature extraction is cumbersome and not universal. To solve this problem, this paper proposed the CABLSTM Chinese microblog author identification model without expert feature modeling, and tested the accuracy of the model in the open microblog corpus. This model maximized the extraction of short text features, fused the attention mechanism in the CNN and removed the pooling layer, and obtained context-related information through the bidirectional LSTM. The identity recognition result was output through the softmax layer. Experimental results show that the model has a certain improvement in accuracy, recall rate, and <i>F</i>-measure in comparison with traditional machine learning algorithms and TextCNN and LSTM algorithms in the identification task of Chinese microblog authors.
English keywords author identification; LSTM; CNN; automatic feature extraction
Received 2018/5/29
Revised 2018/7/11
Pages 16-18,25
CLC number TP391.72
Document code A