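1. Qingdao Haier Refrigerator Co., Ltd., Qingdao, Shandong 266700
2. Haier Uplus Intelligent Technology (Beijing) Co., Ltd., Beijing 100006
3. Software Engineering Institute, East China Normal University, Shanghai 200062
4. Shanghai Key Laboratory of Trustworthy Computing, Shanghai 200062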
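ZENG Shuifei (1978- ), male, Ph.D., is an engineer at Qingdao Haier Refrigerator Co., Ltd. and Haier Uplus Intelligent Technology (Beijing) Co., Ltd. His main research interests include artificial intelligence, large models, deep learning, neural networks, machine learning, and multimodality.
MENG Yao (1999- ), female, is a master's student at the Software Engineering Institute, East China Normal University. Her main research interests include deep learning, natural language understanding, software modeling, risk early warning, and trustworthy artificial intelligence.
LIU Jing (1964- ), female, Ph.D., is a professor at the Software Engineering Institute, East China Normal University. Her main research interests include software modeling, risk early warning, and trustworthy artificial intelligence.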
Received: 2025-03-30
Revised: 2025-04-24
Published in print: 2025-05-20
ZENG Shuifei, MENG Yao, LIU Jing. Text classification model based on GNN and attention mechanism[J]. Telecommunications Science, 2025, 41(05): 129-140. DOI: 10.11959/j.issn.1000-0801.2025136.
To address the low classification accuracy that results when a model struggles to dynamically aggregate unseen neighbor nodes in graph data and fuses semantic features insufficiently, a classification model based on a graph neural network (GNN) and an attention mechanism, named graph attention text classification (GATC), is proposed. First, an inductive graph neural network model is constructed that uses an aggregation function to dynamically embed unseen neighbor nodes, improving the model's generalization ability. Second, a multi-head latent attention mechanism is introduced; its low-rank joint compression of keys and values shrinks the key-value cache needed at inference, which significantly reduces memory usage and improves model performance. Finally, the GNN is combined with a gated recurrent unit (GRU) network to further capture the semantic features carried by the structural and temporal attributes of graph data, achieving efficient feature fusion and raising classification accuracy. Experimental results show that the proposed method is effective: compared with the ADGL+MLA (adaptive dynamic graph learning + multi-head latent attention) algorithm, it improves classification accuracy by at least 4.0%, 2.4%, and 3.1% on the CSI 100, CSI 300, and Rus 1K datasets, respectively.
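The three ingredients named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not specify the aggregation function, so a mean aggregator is assumed here; the cache estimate ignores any extra positional key component a full multi-head latent attention layer may store; and the GRU cell uses a scalar state without biases. All function names (`mean_aggregate`, `kv_cache_size`, `gru_step`) and parameter values are hypothetical.

```python
import math


def mean_aggregate(node_feat, neighbor_feats):
    """Assumed mean aggregator: concatenate a node's own features with the
    element-wise mean of its neighbors' features. It needs only local
    features, so it also applies to nodes unseen during training."""
    if neighbor_feats:
        mean = [sum(col) / len(neighbor_feats) for col in zip(*neighbor_feats)]
    else:
        mean = [0.0] * len(node_feat)  # isolated node: no neighbor signal
    return list(node_feat) + mean


def kv_cache_size(seq_len, n_heads, head_dim, latent_dim=None):
    """Number of values cached per layer for seq_len tokens.

    Without compression, every head stores a key and a value per token.
    With low-rank joint compression (simplified), one shared latent vector
    of size latent_dim is stored per token instead."""
    if latent_dim is None:
        return 2 * seq_len * n_heads * head_dim
    return seq_len * latent_dim


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(h, x, wz, wr, wh):
    """One scalar GRU update; each weight pair is (input, recurrent)."""
    z = sigmoid(wz[0] * x + wz[1] * h)                 # update gate
    r = sigmoid(wr[0] * x + wr[1] * h)                 # reset gate
    h_tilde = math.tanh(wh[0] * x + wh[1] * (r * h))   # candidate state
    return (1.0 - z) * h + z * h_tilde


if __name__ == "__main__":
    # Embed a node from its own features and two neighbors.
    print(mean_aggregate([1.0, 2.0], [[2.0, 4.0], [4.0, 8.0]]))
    # Compare cache sizes for illustrative (made-up) dimensions.
    print(kv_cache_size(1024, 8, 64), kv_cache_size(1024, 8, 64, latent_dim=128))
    # Run the GRU cell over a short sequence of aggregated scalar features.
    h = 0.0
    for x in [0.5, -0.2, 0.9]:
        h = gru_step(h, x, (0.4, 0.1), (0.3, 0.2), (0.8, 0.5))
    print(h)
```

With the illustrative dimensions above (1024 tokens, 8 heads of size 64, latent size 128), the uncompressed cache holds 1,048,576 values per layer and the compressed one 131,072, an 8-fold reduction; the actual saving in GATC depends on dimensions the abstract does not report.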