Xin CAI. Internet bad information detection based on Bert model[J]. Telecommunications Science, 2020, 36(11): 121-126. DOI: 10.11959/j.issn.1000-0801.2020303.
For the business scenario of detecting bad information on the internet, a detection method based on the text content of websites is discussed. Classical text analysis techniques are reviewed. The key technical features and two different usages of the BERT model are introduced. The implementation scheme that uses the feature extraction method to detect bad information on websites is described in detail and compared with the traditional TF-IDF model and the word2vec+LSTM model. The validity of the method is verified.