Watermark embedding and detection based on generative causal language model

LIU Minglu; ZHENG Yan; HAN Xue; YUAN Xiangyang; DENG Chao

doi:10.11959/j.issn.1000-0801.2023179


Topic: Network Intelligence and Generative Artificial Intelligence | Updated: 2024-06-05

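- Watermark embedding and detection based on generative causal language model
- Telecommunications Science, 2023, Vol. 39, No. 9, pp. 32-42
- Author biographies:
  - LIU Minglu (1987- ), male, algorithm researcher at the Artificial Intelligence and Intelligent Operation Center, China Mobile Research Institute. His main research interests include natural language processing and knowledge graphs.
  - ZHENG Yan (1993- ), male, algorithm researcher at the Artificial Intelligence and Intelligent Operation Center, China Mobile Research Institute. His main research interests include large language models and model interpretability and fairness.
  - HAN Xue (1981- ), female, Ph.D., research scientist at the Artificial Intelligence and Intelligent Operation Center, China Mobile Research Institute. Her main research interests include NLP and multimodal fusion technology.
  - YUAN Xiangyang (1978- ), male, deputy general manager of the Artificial Intelligence and Intelligent Operation Center, China Mobile Research Institute. His main research interests include BSS, OSS, and other IT support systems, and the application of AI technology to network intelligence.
  - DENG Chao (1978- ), male, executive deputy general manager of the Artificial Intelligence and Intelligent Operation Center, China Mobile Research Institute. His main research interests include artificial intelligence, communication network intelligence, big data, and IT technology research and development.
- CLC number: TP181
- Online publication date: 2023-08
- Print publication date: 2023-08-25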

Minglu LIU, Yan ZHENG, Xue HAN, et al. Watermark embedding and detection based on generative causal language model[J]. Telecommunications Science, 2023, 39(9): 32-42. DOI: 10.11959/j.issn.1000-0801.2023179.

Abstract

Text produced by artificial intelligence generated content (AIGC) technology carries ethical and legal compliance risks, so the circulation of generated text needs to be regulated and supervised, which creates an urgent need for copyright protection of AIGC-generated text. Watermarking is currently the most widely used method of digital copyright protection. A watermarking technique for text generated by generative causal language models is proposed. It uses in-process embedding: watermark feature codes are embedded implicitly while the text is being generated. Compared with traditional post-hoc watermarking, it has less impact on the quality of the generated text and offers low perceptibility, transparency, and robustness. Experimental results show that the proposed embedding strategy is robust: the embedded watermark can still be detected effectively after a certain degree of editing by users. Compared with the original generation strategy, the proposed method is loosely coupled to existing models: it requires no changes to the original model structure, training strategy, or deployment method, and it does not increase the computational cost of the original generation process.
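The abstract does not spell out the algorithm, so the following is only a minimal sketch of one generic in-process scheme, not the authors' method: a keyed "green list" logit bias applied at each generation step, with a z-score test for detection. Every name and value here (`VOCAB`, `KEY`, `GAMMA`, `DELTA`, `toy_logits`) is a hypothetical stand-in, and the "model" is just random logits.

```python
import hashlib
import math
import random

# All constants below are illustrative assumptions, not values from the paper.
VOCAB = [f"w{i}" for i in range(50)]  # toy vocabulary
KEY = "demo-key"                      # secret key shared by embedder and detector
GAMMA = 0.5                           # expected fraction of "green" tokens
DELTA = 4.0                           # logit bonus given to green tokens


def is_green(prev_token: str, token: str) -> bool:
    """Keyed pseudo-random vocabulary partition, seeded by the previous token."""
    digest = hashlib.sha256(f"{KEY}|{prev_token}|{token}".encode()).digest()
    return digest[0] < 256 * GAMMA


def toy_logits(rng: random.Random) -> dict:
    """Stand-in for a causal language model: random logits for each token."""
    return {tok: rng.gauss(0.0, 1.0) for tok in VOCAB}


def generate(n_tokens: int, watermark: bool, seed: int = 0) -> list:
    """Sample tokens one by one, optionally biasing toward green tokens."""
    rng = random.Random(seed)
    tokens = ["<s>"]
    for _ in range(n_tokens):
        logits = toy_logits(rng)
        if watermark:
            prev = tokens[-1]
            logits = {t: v + (DELTA if is_green(prev, t) else 0.0)
                      for t, v in logits.items()}
        # Softmax sampling over the (possibly biased) logits.
        top = max(logits.values())
        weights = [math.exp(logits[t] - top) for t in VOCAB]
        tokens.append(rng.choices(VOCAB, weights=weights, k=1)[0])
    return tokens[1:]


def z_score(tokens: list) -> float:
    """Compare the green-token count with the GAMMA * n expected by chance."""
    prevs = ["<s>"] + tokens[:-1]
    hits = sum(is_green(p, t) for p, t in zip(prevs, tokens))
    n = len(tokens)
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))
```

In this sketch, detection needs only the key and the text, not the model, and the bias is applied at sampling time without touching model weights or training. A large z-score on `generate(200, watermark=True)` and a small one on unwatermarked output is the expected pattern; overwriting a minority of tokens lowers the score but usually leaves it well above chance.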

Keywords

