The creators of a revolutionary AI system that can write news stories and works of fiction – dubbed "deepfakes for text" – have taken the unusual step of not releasing their research publicly, for fear of potential misuse.
OpenAI, a non-profit research company backed by Elon Musk, says its new AI model, called GPT2, is so good, and the risk of malicious use so high, that it is breaking from its normal practice of releasing the full research to the public, in order to allow more time to discuss the ramifications of the technological breakthrough.
At its core, GPT2 is a text generator. The AI system is fed text, anything from a few words to a whole page, and asked to write the next few sentences based on its predictions of what should come next. The system pushes the boundaries of what was thought possible, both in the quality of its output and in the wide variety of its potential uses.
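The article does not describe GPT2's internals, but the predict-then-append loop it describes can be sketched in a few lines. In this minimal sketch, a toy bigram word counter stands in for the neural network (an assumption made purely for illustration), and the function names `train_bigrams` and `generate` are hypothetical. Each step picks one next word from what followed the previous word in the training text, appends it, and repeats.

```python
from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count, for every word, which words follow it in the training text."""
    words = text.split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def generate(model: dict[str, Counter], prompt: str, n_words: int) -> str:
    """Autoregressive loop: predict one word, append it, repeat."""
    output = prompt.split()
    for _ in range(n_words):
        candidates = model.get(output[-1])
        if not candidates:
            break  # no continuation was seen in the training text
        # Greedy choice: the most frequent follower. A real language model
        # scores every word in its vocabulary and usually samples instead.
        next_word, _count = candidates.most_common(1)[0]
        output.append(next_word)
    return " ".join(output)


corpus = "the clocks were striking thirteen and the clocks were loud"
model = train_bigrams(corpus)
print(generate(model, "the", 4))  # the clocks were striking thirteen
```

A model like GPT2 conditions on the whole preceding passage rather than a single word, which is what lets it keep track of style and subject over long stretches of text.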
When used to simply generate new text, GPT2 is capable of writing plausible passages that match what it is given in both style and subject. It rarely shows any of the quirks that mark out earlier AI systems, such as forgetting what it is writing about midway through a paragraph, or mangling the syntax of long sentences.
Feed it the opening line of George Orwell's Nineteen Eighty-Four – "It was a bright cold day in April, and the clocks were striking thirteen" – and the system recognizes the vaguely futuristic tone and the novelistic style, and continues with:
"I was in my car on my way to a new job in Seattle. I put the gas in, put the key in, and then I let it run. I just imagined what the day would be like a hundred years from now. In 2045, I was a teacher in some school in a poor part of rural China. I started with Chinese history and history of science."
Feed it the first few paragraphs of a Guardian story about Brexit, and its output is plausible newspaper prose, replete with "quotes" from Jeremy Corbyn, mentions of the Irish border and answers from the prime minister's spokesperson.
One such, completely artificial, paragraph reads: "Asked to clarify the reports, a spokesperson for May said: 'The prime minister has made it absolutely clear that her intention is to leave the EU as quickly as possible, and that will be under her negotiating mandate as confirmed in the Queen's speech last week.'"
From a research standpoint, GPT2 is groundbreaking in two ways. One is its size, says Dario Amodei, OpenAI's research director. The models "were 12 times bigger, and the dataset was 15 times bigger and much broader" than the previous state-of-the-art AI model. It was trained on a dataset of about 10 million articles, selected by trawling the social news site Reddit for links with more than three votes. The vast collection of text weighed in at 40 GB, enough to store about 35,000 copies of Moby Dick.
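As a rough check on the Moby Dick comparison, dividing the dataset size by the number of copies gives the size each copy is implied to take up. This sketch assumes decimal gigabytes (1 GB = 10**9 bytes); the constant names are illustrative only.

```python
# Rough check of the "40 GB is about 35,000 copies of Moby Dick" comparison.
# Assumption: decimal gigabytes, i.e. 1 GB = 10**9 bytes.
DATASET_BYTES = 40 * 10**9
COPIES = 35_000

bytes_per_copy = DATASET_BYTES / COPIES
print(f"{bytes_per_copy / 10**6:.2f} MB per copy")  # 1.14 MB per copy
```

Roughly a megabyte per copy is the scale of a long novel stored as plain text, so the comparison holds up as an order-of-magnitude picture.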
The amount of data GPT2 was trained on directly affected its quality, giving it more knowledge of how to understand written text. It also led to the second breakthrough: GPT2 is far more general-purpose than previous text models. By structuring the text it is given, it can perform tasks including translation and summarization, and pass simple reading comprehension tests, often performing as well as or better than other AIs that were built specifically for those tasks.
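The article does not say what that structuring looks like. The general idea is that the task is written into the input itself, so that the most plausible continuation of the text is the answer. In the hedged sketch below, the three prompt formats are illustrative assumptions: a "TL;DR:" suffix for summaries, example sentence pairs for translation, and a Q/A layout for reading comprehension. The helper names are hypothetical. No model is called here; the functions only build the text a model like GPT2 would be asked to continue.

```python
def summarization_prompt(article: str) -> str:
    # Assumed format: ending the input with "TL;DR:" invites a short summary.
    return f"{article.strip()}\nTL;DR:"


def translation_prompt(examples: list[tuple[str, str]], sentence: str) -> str:
    # Assumed format: a few "English = French" example pairs, followed by an
    # unfinished pair that the model is expected to complete.
    lines = [f"{en} = {fr}" for en, fr in examples]
    lines.append(f"{sentence} =")
    return "\n".join(lines)


def qa_prompt(passage: str, question: str) -> str:
    # Assumed reading-comprehension format: passage, then question, then "A:".
    return f"{passage.strip()}\nQ: {question}\nA:"


print(translation_prompt([("Good morning.", "Bonjour.")], "Thank you."))
```

The same unchanged model handles all three tasks. Only the surrounding text differs, which is what makes a single general-purpose text generator usable for jobs that previously needed separately built systems.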
That quality, however, has also led OpenAI to go against its remit of pushing AI forward and to keep GPT2 behind closed doors for the foreseeable future while it assesses what malicious users might be able to do with it. "We need to experiment to find out what they can and can't do," says Jack Clark, the charity's head of policy. "If you can't anticipate all the abilities of a model, you have to prod it to see what it can do. There are many more people than us who are better at thinking about what it can do maliciously."
To show what that means, OpenAI made one version of GPT2 with a few modest tweaks that can be used to generate infinite positive or negative reviews of products. Spam and fake news are two other obvious potential downsides, as is the AI's unfiltered nature. Because it is trained on the internet, it is not hard to encourage it to generate bigoted text, conspiracy theories and so on.
Instead, the goal is to show what is possible, in order to prepare the world for what will become mainstream within a year or two. "I have a term for this: the escalator from hell," said Clark. "It's always bringing the technology down in cost and down in price. The rules by which you can control technology have fundamentally changed.
"We're not saying we know the right thing to do here, we're not laying down the line and saying 'this is the way'. We're trying to develop more rigorous thinking here. We're trying to build the road as we travel across it."