Internet bad information detection based on Bert model

Xin CAI

doi:10.11959/j.issn.1000-0801.2020303


Column: Information Security | Updated: 2024-06-05

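- Internet bad information detection based on Bert model
- Telecommunications Science, 2020, Vol. 36, No. 11, pp. 121-126
- Author biography: Xin CAI (1975- ), male, senior engineer at the Shanghai Research Institute of China Telecom Co., Ltd. His main research interests are data analysis and mining, artificial intelligence, data planning, and information security.
- DOI: 10.11959/j.issn.1000-0801.2020303
- CLC number: TP393
- Online publication date: 2020-11
- Print publication date: 2020-11-20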
Xin CAI. Internet bad information detection based on Bert model[J]. Telecommunications Science, 2020, 36(11): 121-126. DOI: 10.11959/j.issn.1000-0801.2020303.

Abstract

For the business scenario of detecting bad information on the internet, this paper discusses detection methods based on the text content of websites. It reviews classical text analysis techniques and focuses on the key technical features of the Bert model and its two different usages. It then describes in detail a concrete implementation scheme that uses the feature extraction usage to detect bad information on websites, and compares it with the traditional TF-IDF model and a word2vec+LSTM model, confirming the effectiveness of this method.


