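1. Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University, Wuhan, Hubei 430072
2. Wuhan Internet Public Opinion Research Center, Wuhan, Hubei 430014
XU Yongchang (1998- ), male, master's student at the School of Cyber Science and Engineering, Wuhan University. His main research interest is pervasive computing.
HUANG Shiduo (1965- ), male, associate research fellow at the Wuhan Internet Public Opinion Research Center. His main research interests include online public opinion and social media analysis.
AI Haojun (1972- ), male, Ph.D., associate professor at the School of Cyber Science and Engineering, Wuhan University. His main research interests are pervasive computing and indoor positioning.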
Online publication date: 2023-08
Print publication date: 2023-08-20
Yongchang XU, Shiduo HUANG, Haojun AI. A social media geolocation method based on comparative learning[J]. Telecommunications Science, 2023, 39(8): 58-68. DOI: 10.11959/j.issn.1000-0801.2023154.
Previous work on social media text-based geolocation has focused on mapping the semantic space of text to geographic space, ignoring both the semantic correlation between social media texts and the distance correlation between geographic locations. To exploit these correlations, mCLF, a new unsupervised multi-level contrastive learning framework, is proposed, with three contrastive learning modules: a semantic learning module, a location learning module, and a cross-level learning module. A Transformer encoder first produces semantic representations of posts. Unsupervised contrastive learning then pulls together the semantic representations and location representations of posts whose locations are close, after which the model is fine-tuned with supervision to output the geographic location as a classification or regression result. Comparative experiments against five baseline models on four datasets show that the framework effectively improves the accuracy of social media geolocation.
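The idea of pulling together representations of posts with nearby locations can be illustrated with a small sketch. This is not the mCLF loss, which the abstract does not specify. It is a generic supervised-contrastive-style objective in which two posts count as a positive pair when their coordinates lie within a radius. The function names, the `radius_km` threshold, and the `temperature` value are all assumptions made for illustration.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def location_contrastive_loss(embeddings, coords, radius_km=50.0, temperature=0.1):
    """Hypothetical loss: for each anchor post, posts within radius_km are
    positives and all other posts in the batch are negatives."""
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others
                     if haversine_km(coords[i], coords[j]) <= radius_km]
        if not positives:
            continue  # an anchor without a nearby post contributes nothing
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature
                  for j in others}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        # Average negative log-probability of picking a positive for this anchor.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

# Two posts near Wuhan and two near Beijing (approximate coordinates).
coords = [(30.59, 114.31), (30.60, 114.30), (39.90, 116.40), (39.91, 116.41)]
aligned = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
misaligned = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
loss_aligned = location_contrastive_loss(aligned, coords)
loss_misaligned = location_contrastive_loss(misaligned, coords)
```

In this toy batch the loss is lower when embeddings of same-city posts point in similar directions, which is the behavior a location-aware contrastive objective is meant to encourage. A real implementation would compute the same quantity on encoder outputs with a tensor library and backpropagate through it.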