Guiltandivy

Source for News

Google releases technology to watermark AI-generated text

Google is making SynthID Text, its technology that allows developers to watermark and recognize text written by generative AI models, generally available.

SynthID Text can be downloaded from Google's Hugging Face AI platform and updated Responsible GenAI Toolkit.

“We are making our SynthID Text watermarking tool available as an open source solution,” the company wrote in a post.

How does SynthID Text work exactly?

Given a prompt like “What is your favorite fruit?”, a text-generating model predicts which “token” is most likely to come next, one token at a time. Tokens, which can be a single character or a whole word, are the building blocks a generative model uses to process information. The model assigns each candidate token a score: the probability, expressed as a percentage, that it will appear in the output text. According to Google, SynthID Text adds additional information to this token distribution by “modulating the probability that tokens will be generated.”

“The final scoring pattern for both word choices from the model, combined with the adjusted probability scores, is considered a watermark,” the company wrote in a blog post. “This scoring pattern is compared to the expected scoring pattern for text with and without watermarks, helping SynthID identify whether the text was generated by an AI tool or whether it may have come from other sources.”
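The generate-then-detect idea described above can be illustrated with a toy sketch in Python. This is not SynthID Text's actual algorithm or API. It assumes a much simpler scheme in which a secret key marks some candidate tokens as favored and the sampler nudges probability toward them. The vocabulary, `KEY`, `bias`, and all function names are hypothetical.

```python
import hashlib
import math
import random

# Toy vocabulary and key; both are assumptions made for illustration only.
VOCAB = ["apple", "banana", "cherry", "mango", "grape", "peach", "plum", "kiwi"]
KEY = "demo-key"  # hypothetical secret shared by generator and detector


def g(prev_token: str, token: str, key: str = KEY) -> int:
    """Keyed pseudorandom bit for a (previous token, candidate token) pair."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] & 1


def watermarked_probs(probs: dict, prev_token: str, bias: float = 2.0) -> dict:
    """Scale up tokens whose keyed bit is 1, then renormalize to sum to 1."""
    weights = {t: p * math.exp(bias * g(prev_token, t)) for t, p in probs.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


def generate(n: int, rng: random.Random, watermark: bool) -> list:
    """Sample n tokens from a uniform toy 'model', optionally watermarked."""
    out = ["<s>"]
    base = {t: 1 / len(VOCAB) for t in VOCAB}
    for _ in range(n):
        probs = watermarked_probs(base, out[-1]) if watermark else base
        tokens, weights = zip(*probs.items())
        out.append(rng.choices(tokens, weights=weights)[0])
    return out[1:]


def score(tokens: list) -> float:
    """Fraction of tokens whose keyed bit is 1 given the preceding token."""
    prev = ["<s>"] + tokens[:-1]
    return sum(g(p, t) for p, t in zip(prev, tokens)) / len(tokens)


rng = random.Random(0)
marked = generate(500, rng, watermark=True)
plain = generate(500, rng, watermark=False)
print(f"watermarked score: {score(marked):.2f}")
print(f"plain score:       {score(plain):.2f}")
```

In this sketch the detector needs only the key and the text, not the model: text sampled with the bias should score noticeably higher than text sampled without it, and comparing the score against what is expected for each case is the toy analogue of the comparison Google describes.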

Google claims that SynthID Text, which has been integrated into its Gemini models since this spring, does not compromise on the quality, accuracy, or speed of text generation and works even on text that has been trimmed, paraphrased, or altered.

However, the company also admits that its watermarking approach has limitations.

For example, SynthID Text does not perform well for short texts, texts that have been rewritten or translated from another language, or answers to factual questions. “When responding to factual prompts, there are fewer opportunities to adjust token distribution without compromising factual accuracy,” the company explains. “These include prompts such as 'What is the capital of France?' or questions where little or no variety is expected, such as 'Recite a poem by William Wordsworth'.”
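The factual-prompt limitation can be made concrete with a small Python calculation. It is a sketch under the assumption of a simple multiplicative bias toward a set of "favored" tokens, which is not necessarily how SynthID Text adjusts probabilities. The two distributions, the favored sets, and the `bias` value are invented for illustration.

```python
import math


def apply_bias(probs: dict, favored: set, bias: float = 2.0) -> dict:
    """Multiply favored tokens' probabilities by e**bias and renormalize."""
    weights = {
        t: p * (math.exp(bias) if t in favored else 1.0) for t, p in probs.items()
    }
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


def favored_mass(probs: dict, favored: set) -> float:
    """Total probability assigned to the favored tokens."""
    return sum(p for t, p in probs.items() if t in favored)


# Open-ended prompt: many next tokens are equally plausible (made-up numbers).
open_ended = {"apple": 0.25, "mango": 0.25, "cherry": 0.25, "peach": 0.25}
open_favored = {"mango", "peach"}

# Factual prompt: one answer dominates (made-up numbers).
factual = {"Paris": 0.97, "Lyon": 0.01, "Rome": 0.01, "Berlin": 0.01}
factual_favored = {"Lyon", "Berlin"}

# How much probability the same bias moves toward the favored tokens.
open_shift = favored_mass(apply_bias(open_ended, open_favored), open_favored) \
    - favored_mass(open_ended, open_favored)
factual_shift = favored_mass(apply_bias(factual, factual_favored), factual_favored) \
    - favored_mass(factual, factual_favored)

# Chance of answering something other than "Paris" after the bias is applied.
wrong_after = 1.0 - apply_bias(factual, factual_favored)["Paris"]

print(f"open-ended shift: {open_shift:.3f}")
print(f"factual shift:    {factual_shift:.3f}")
print(f"wrong-answer probability after bias: {wrong_after:.3f}")
```

With the same bias, the peaked factual distribution yields a smaller shift toward the favored tokens, so there is less signal to detect, and whatever shift does occur comes at the expense of the correct answer.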

Google isn't the only company working on AI technology for text watermarking. OpenAI has been researching watermarking methods for years, but has delayed their release for technical and commercial reasons.

Text watermarking techniques, if widely adopted, could help turn the tide against inaccurate – but increasingly popular – “AI detectors” that incorrectly flag papers and essays written in more generic language. The question, however, is whether they will be widely adopted – and whether the standard or technology proposed by one organization will prevail over others.

There may soon be legal mechanisms that force the hands of developers. The Chinese government has introduced mandatory watermarking for AI-generated content, and the state of California wants to do the same.

The situation is urgent. According to a report by Europol, the European Union's law enforcement agency, 90% of online content could be synthetically generated by 2026, creating new challenges for law enforcement related to disinformation, propaganda, fraud and deception. According to an AWS study, almost 60% of all sentences on the web could already be AI-generated, thanks to the widespread use of AI translators.
