
AI watermarks must be waterproof to be effective


Watermarking could soon be available at an AI chatbot near you. Photo credit: Jonathan Raa/NurPhoto/Getty

Rarely has a tool or technology emerged so suddenly from the world of research into the public consciousness – and widespread use – as generative artificial intelligence (AI). The ability of large language models (LLMs) to create text and images little different from those created by humans is disrupting, if not revolutionizing, countless areas of human activity. But the potential for abuse is already evident, from academic plagiarism to the mass generation of misinformation. There are fears that AI is evolving so quickly that without guardrails it may soon be too late to ensure accuracy and reduce harm1.

This week, Sumanth Dathathri of DeepMind, Google's AI research lab in London, and his colleagues report their test of a new approach to “watermarking” AI-generated text by embedding a “statistical signature”, a form of digital identifier that can be used to confirm the text's origin2. The word watermark comes from the era of paper and print, and describes a variation in paper thickness, not usually immediately obvious to the naked eye, that does not alter the printed text. A watermark in digitally generated text or images should likewise be invisible to the user, but immediately recognizable to specialized software.

The work of Dathathri and his colleagues represents an important milestone for the watermarking of digital text. However, there is still a long way to go before companies and regulators can say reliably whether a given text is the product of a human or a machine. Given the pressing need to reduce the harm caused by AI, more researchers need to get involved to ensure that watermarking technology delivers on its promise.

The authors' approach to watermarking LLM outputs is not new. A version of it is also being tested by OpenAI, the company behind ChatGPT, based in San Francisco, California. However, there is limited literature on how well the technology works and on its strengths and limitations. One of the most important contributions came in 2022, when Scott Aaronson, a computer scientist at the University of Texas at Austin, described in a much-discussed talk how watermarking can be achieved. Others have also made valuable contributions, including John Kirchenbauer and his colleagues at the University of Maryland in College Park, who published a watermark-detection algorithm last year3.

The DeepMind team has gone a step further and shown that watermarking can be implemented at scale. The researchers integrated a technology they call SynthID-Text into Gemini, Google's AI-powered chatbot. In a live experiment with nearly 20 million Gemini users putting questions to the chatbot, people noticed no loss of quality in watermarked answers compared with unwatermarked ones. This matters, because users are unlikely to accept watermarked content if they regard it as inferior to unwatermarked text.

However, it is still comparatively easy for a determined person to remove a watermark and make AI-generated text look as though it was written by a human. This is because the watermarking process used in the DeepMind experiment subtly alters the way an LLM statistically selects its “tokens”, that is, how it draws on its massive training set of billions of words from articles, books and other sources to assemble a plausible-sounding answer to a given user prompt. This alteration can be detected by an analysis algorithm. But there are ways to remove the signal, for example by paraphrasing or translating the LLM's output, or by asking another LLM to rewrite it. And a watermark that has been removed is no longer a watermark at all.
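To make that mechanism concrete, here is a minimal sketch of a “green list” watermark of the kind Kirchenbauer and colleagues describe, not SynthID-Text itself, whose details differ: a secret key pseudo-randomly favours part of the vocabulary at each sampling step, and a detector checks whether that part is statistically over-represented in a text. The toy vocabulary, key, green-list fraction and bias value below are all illustrative assumptions.

```python
# A self-contained sketch of red/green-list watermarking (after Kirchenbauer et al.),
# NOT DeepMind's SynthID-Text. All names and parameters are illustrative assumptions.
import hashlib
import math
import random

VOCAB = [f"tok{i}" for i in range(1000)]  # toy stand-in for an LLM vocabulary
GREEN_FRACTION = 0.5                      # share of vocabulary favoured at each step
BIAS = 4.0                                # logit boost added to "green" tokens
SECRET_KEY = "watermark-key"              # shared secret between generator and detector


def green_list(prev_token: str) -> set[str]:
    """Pseudo-randomly partition the vocabulary, seeded by the previous token and the key."""
    seed = int(hashlib.sha256((SECRET_KEY + prev_token).encode()).hexdigest(), 16)
    rng = random.Random(seed)
    shuffled = VOCAB[:]
    rng.shuffle(shuffled)
    return set(shuffled[: int(GREEN_FRACTION * len(VOCAB))])


def sample_next(prev_token: str, logits: dict[str, float]) -> str:
    """Sample the next token after boosting the logits of green-listed tokens."""
    greens = green_list(prev_token)
    boosted = {t: l + (BIAS if t in greens else 0.0) for t, l in logits.items()}
    total = sum(math.exp(l) for l in boosted.values())
    r, acc = random.random(), 0.0
    for t, l in boosted.items():
        acc += math.exp(l) / total
        if acc >= r:
            return t
    return next(reversed(boosted))


def detect(tokens: list[str]) -> float:
    """Return a z-score for how over-represented green tokens are in the text."""
    hits = sum(1 for prev, cur in zip(tokens, tokens[1:]) if cur in green_list(prev))
    n = len(tokens) - 1
    expected = GREEN_FRACTION * n
    std = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (hits - expected) / std


if __name__ == "__main__":
    # Generate a short watermarked sequence from uniform toy logits and check the signal.
    random.seed(0)
    tokens = ["tok0"]
    uniform_logits = {t: 0.0 for t in VOCAB}
    for _ in range(200):
        tokens.append(sample_next(tokens[-1], uniform_logits))
    print(f"z-score of watermarked text: {detect(tokens):.2f}")  # well above the ~2 chance level
```

The sketch also illustrates the weakness described above: paraphrasing or rewriting the text changes which token follows which, so the green-token count falls back towards chance and the detector's z-score collapses.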

Robust watermarking is important as authorities seek to regulate AI to limit the harm it could cause, and watermarking is regarded as a key technology for doing so. Last October, US President Joe Biden directed the National Institute of Standards and Technology (NIST), based in Gaithersburg, Maryland, to establish rigorous safety-testing standards for AI systems before they are released for public use. NIST is seeking public comment on its plans to reduce the risk of harm from AI, including through the use of watermarking, which it says must be robust. There is as yet no firm date for when the plans will be finalized.

Unlike the United States, the European Union has taken a legislative approach, passing the EU Artificial Intelligence Act in March and setting up an AI Office to enforce it. China's government has already mandated watermarking, and the US state of California wants to do the same.

But even if the technical hurdles can be overcome, watermarking will only be truly useful if it is acceptable to businesses and users. Although regulation will likely force companies to act to some degree in the next few years, whether users will trust watermarks and similar technologies is another question.

There is an urgent need both for improved technical capabilities to combat the misuse of generative AI and for a better understanding of how people interact with these tools: how malicious actors use AI, whether users trust watermarks, and what a trustworthy online information environment looks like in the era of generative AI. These are all questions that researchers need to investigate.

In a welcome move, DeepMind has made the model and underlying code for SynthID-Text freely available to all. The work is an important advance, but the technology itself is still in its infancy. We need it to grow up quickly.
