Last month, OpenAI, the Elon Musk-founded artificial intelligence research lab, announced the arrival of the newest version of an AI system it had been working on that can mimic human language: a model called GPT-3.
In the weeks that followed, people got the chance to play with the program. If you follow news about AI, you may have seen some headlines calling it a huge step forward, even a scary one.
I've now spent the past few days looking at GPT-3 in greater depth and playing around with it. I'm here to tell you: the hype is real. It has its shortcomings, but make no mistake: GPT-3 represents a tremendous leap for AI.
A year ago I sat down to play with GPT-3's precursor, dubbed (you guessed it) GPT-2. My verdict at the time was that it was pretty good. When given a prompt (say, a phrase or sentence), GPT-2 could write a decent news article, making up imaginary sources and organizations and referencing them across a couple of paragraphs. It was by no means intelligent (it didn't really understand the world), but it was still an uncanny glimpse of what it might be like to interact with a computer that does.
A year later, GPT-3 is here, and it's smarter. A lot smarter. OpenAI took the same basic approach it had taken for GPT-2 (more on this below) and spent more time training it on a bigger dataset. The result is a program that is significantly better at passing the various tests of language ability that machine learning researchers have developed to compare our computer programs. (You can sign up to play with GPT-3, but there's a waitlist.)
But that description understates what GPT-3 is, and what it does.
"It surprises me continuously," Arram Sabeti, an inventor with early access to GPT-3 who has published hundreds of examples of results from the program, told me. "A witty analogy, a turn of phrase; the repeated experience I have is 'there's no way it just wrote that.' It exhibits things that feel very much like general intelligence."
Not everyone agrees. "Artificial intelligence programs lack consciousness and self-awareness," researcher Gwern Branwen wrote in his article about GPT-3. "They will never be able to have a sense of humor. They will never be able to appreciate art, or beauty, or love. They will never feel lonely. They will never have empathy for other people, for animals, for the environment. They will never enjoy music or fall in love, or cry at the drop of a hat."
Sorry, I lied: GPT-3 wrote that. Branwen fed it a prompt (a few words expressing skepticism about AI), and GPT-3 came up with a long and convincing rant about how computers won't ever be really intelligent.
Branwen himself told me he was taken aback by GPT-3's capabilities. As GPT-style programs scale, they get steadily better at predicting the next word. But up to a point, Branwen said, that improved prediction just makes the program a slightly more accurate mimic: a little better at English grammar, a little better at trivia questions. GPT-3 suggests to Branwen that "past a certain point, that [improvement at prediction] starts coming from logic and reasoning and what looks entirely too much like thinking."
GPT-3 is, in some ways, a really simple program. It takes a well-known, not even state-of-the-art approach from machine learning. Fed most of the internet as data to train itself on (news stories, wiki articles, even forum posts and fan fiction) and given lots of time and resources to chew on it, GPT-3 emerges as an uncannily clever language generator. That's cool in its own right, and it has big implications for the future of AI.
How GPT-3 works
To understand what a leap GPT-3 represents, it would be helpful to review two basic concepts in machine learning: supervised and unsupervised learning.
Until a few years ago, language AIs were taught predominantly through an approach called supervised learning. That's where you have large, carefully labeled data sets that contain inputs and desired outputs. You teach the AI how to produce the outputs given the inputs.
That can produce good results (sentences, paragraphs, and stories that do a solid job of mimicking human language), but it requires building huge data sets and carefully labeling each bit of data.
Supervised learning isn't how humans acquire skills and knowledge. We make inferences about the world without the carefully delineated examples from supervised learning. In other words, we do a lot of unsupervised learning.
Many people believe that advances in general AI capabilities will require advances in unsupervised learning, where AI is exposed to lots of unlabeled data and has to figure out everything else itself. Unsupervised learning is easier to scale, since there's far more unstructured data than structured data (no need to label all that data), and unsupervised learning may generalize better across tasks.
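As a rough illustration (a toy Python sketch, not how GPT-3 is actually implemented), the practical difference between the two regimes is where the training signal comes from: supervised learning needs a human-written label for every example, while GPT-style unsupervised learning manufactures its own training pairs from raw text for free:

```python
# Toy contrast between the two training regimes (illustrative only; the
# sentences and labels below are made up for the example).

# Supervised: every example pairs an input with a human-provided label.
# Someone had to read each sentence and tag it by hand.
labeled_examples = [
    ("The movie was wonderful", "positive"),
    ("The movie was dreadful", "negative"),
]

# Unsupervised (GPT-style): raw text only. Each (context, next-word) training
# pair is derived automatically from the text itself, so any text works.
raw_text = "Joe Biden today announced his plan".split()
derived_examples = [
    (raw_text[:i], raw_text[i]) for i in range(1, len(raw_text))
]

for context, target in derived_examples:
    print(" ".join(context), "->", target)
```

This is why unlabeled data scales so well: one six-word sentence already yields five training pairs, with no human labeling effort at all.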
GPT-3 (like its predecessors) is an unsupervised learner; it picked up everything it knows about language from unlabeled data. Specifically, researchers fed it most of the internet, from popular Reddit posts to Wikipedia to news articles to fan fiction.
GPT-3 uses this vast trove of information to do an extremely simple task: guess what words are most likely to come next, given a certain initial prompt. For example, if you want GPT-3 to write a news story about Joe Biden's climate policy, you might type in: "Joe Biden today announced his plan to fight climate change." From there, GPT-3 will take care of the rest.
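The generation loop itself can be sketched in a few lines of Python. This toy version uses a tiny hand-written probability table instead of a neural network (so the table entries are invented for the example), but the core mechanic is the same one GPT-3 uses: repeatedly pick a likely next word and append it to what's been written so far:

```python
import random

# Hypothetical next-word probabilities, keyed on the previous word. A real
# model conditions on the whole context and covers a huge vocabulary.
NEXT_WORD_PROBS = {
    "announced": [("his", 0.6), ("a", 0.4)],
    "his": [("plan", 0.7), ("intention", 0.3)],
    "plan": [("to", 0.9), ("today", 0.1)],
    "to": [("fight", 0.5), ("address", 0.5)],
    "fight": [("climate", 1.0)],
    "climate": [("change.", 1.0)],
}

def generate(prompt_words, max_new_words=6):
    """Autoregressive generation: sample a next word, append, repeat."""
    words = list(prompt_words)
    for _ in range(max_new_words):
        options = NEXT_WORD_PROBS.get(words[-1])
        if not options:  # no continuation known for this word; stop
            break
        choices, weights = zip(*options)
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate(["Joe", "Biden", "today", "announced"]))
```

Everything GPT-3 does, from poems to fake news articles, is this loop run at enormous scale, with the probability table replaced by a 175-billion-parameter neural network.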
Here's what GPT-3 can do
OpenAI controls access to GPT-3; you can request access for research, a business idea, or just to play around, though there's a long waiting list for access. (It's free for now, but might be available commercially later.) Once you have access, you can interact with the program by typing in prompts for it to respond to.
GPT-3 has been used for all kinds of projects so far, from making imaginary conversations between historical figures to summarizing movies with emoji to writing code.
Sabeti prompted GPT-3 to write Dr. Seuss poems about Elon Musk. An excerpt:
But then, in his haste,
he got into a fight.
He had some emails that he sent
that weren't quite polite.

The SEC said, "Musk,
your tweets are a blight."
Not bad for a machine.
GPT-3 can even correctly answer medical questions and explain its answers (though you shouldn't trust all its answers; more about that later):
So @OpenAI have given me early access to a tool which allows developers to use what is essentially the most powerful text generator ever. I thought I'd test it by asking a medical question. The bold text is the text generated by the AI. Incredible… (1/2) pic.twitter.com/4bGfpI09CL
Qasim Munye (@QasimMunye) July 2, 2020
You can ask GPT-3 to write simpler versions of complicated instructions, or write excessively complicated instructions for simple tasks. At least one person has gotten GPT-3 to write a productivity blog whose bot-written posts performed quite well on tech news aggregator Hacker News.
Of course, there are some things GPT-3 shouldn't be used for: having casual conversations and trying to get true answers, for two. Tester after tester has pointed out that GPT-3 makes up a lot of nonsense. This isn't because it doesn't know the answer to a question (asking with a different prompt will often produce the correct answer) but because the inaccurate answer seemed plausible to the computer.
Relatedly, GPT-3 will by default try to give reasonable responses to nonsense questions like "how many bonks are in a quoit?" That said, if you add to the prompt that GPT-3 should refuse to answer nonsense questions, then it will do that.
So GPT-3 shows its skills to best effect in areas where we don't mind filtering out some bad answers, or areas where we're not so concerned with the truth.
Branwen has an extensive catalog of examples of fiction writing by GPT-3. One of my favorites is a letter denying Indiana Jones tenure, which is lengthy and shockingly coherent, and concludes:
It is impossible to review the specifics of your tenure file without becoming enraptured by the vivid accounts of your life. However, it is not a life that will be appropriate for a member of the faculty at Indiana University, and it is with deep regret that I must deny your application for tenure….Your lack of diplomacy, your flagrant disregard for the feelings of others, your consistent need to inject yourself into scenarios which are clearly outside the scope of your scholarly expertise, and, frankly, the fact that you often take the side of the oppressor, leads us to the conclusion that you have used your tenure here to gain a personal advantage and have failed to adhere to the ideals of this institution.
Want to try it yourself? AI Dungeon is a text-based adventure game powered in part by GPT-3.
Why GPT-3 is a big deal
GPT-3's uncanny abilities as a satirist, poet, composer, and customer service agent aren't actually the biggest part of the story. On its own, GPT-3 is an impressive proof of concept. But the concept it's proving has bigger ramifications.
For a long time, we've assumed that creating computers with general intelligence (computers that surpass humans at a wide variety of tasks, from programming to research to having intelligent conversations) will be difficult, and will require a detailed understanding of the human mind, consciousness, and reasoning. And for the last decade or so, a minority of AI researchers have been arguing that we're wrong, that human-level intelligence will arise naturally once we give computers more computing power.
GPT-3 is a point for the latter group. By the standards of modern machine-learning research, GPT-3's technical setup isn't that impressive. It uses an architecture from 2018, meaning that, in a fast-moving field like this one, it's already out of date. The research team largely didn't fix the constraints of GPT-2, such as its small window of memory for what it has written so far, which many outside observers criticized.
"GPT-3 is terrifying because it's a tiny model compared to what's possible, trained in the dumbest way possible," Branwen tweeted.
That suggests there's potential for a lot more improvement: improvements that will one day make GPT-3 look as shoddy by comparison as GPT-2 now does.
GPT-3 is a piece of evidence on a topic that has been hotly debated among AI researchers: Can we get transformative AI systems, ones that surpass human capabilities in many key areas, just using existing deep learning techniques? Is human-level intelligence something that will require a fundamentally new approach, or is it something that emerges of its own accord as we pump more and more computing power into simple machine learning models?
These questions won't be settled for another few years at least. GPT-3 is not a human-level intelligence, even if it can, in short bursts, do an uncanny imitation of one.
Skeptics have argued that those short bursts of uncanny imitation are driving more hype than GPT-3 really deserves. They point out that if a prompt is not carefully designed, GPT-3 will give poor-quality answers, which is absolutely true, though that ought to guide us toward better prompt design, not make us give up on GPT-3.
They also point out that a program that is sometimes right and sometimes confidently wrong is, for many tasks, much worse than nothing. (There are ways to learn how confident GPT-3 is in a guess, but even while using those, you certainly shouldn't take the program's outputs at face value.) They also note that other language models purpose-built for specific tasks can do better on those tasks than GPT-3.
All of that is true. GPT-3 is limited. But what makes it so important is less its capabilities and more the evidence it offers that just pouring more data and more computing time into the same approach gets you astonishing results. With the GPT architecture, the more you spend, the more you get. If there are eventually to be diminishing returns, that point must be somewhere past the $10 million that went into GPT-3. And we should at least be considering the possibility that spending more money gets you a smarter and smarter system.
Other experts have reassured us that such an outcome is very unlikely. As a famous artificial intelligence researcher said earlier this year, "No matter how good our computers get at winning games like Go or Jeopardy, we don't live by the rules of those games. Our minds are much, much bigger than that."
Actually, GPT-3 wrote that.
AIs getting smarter isn't necessarily good news
Narrow AI has seen extraordinary progress over the past few years. AI systems have improved dramatically at translation, at games like chess and Go, at important biology research questions like predicting how proteins fold, and at generating images. AI systems determine what you'll see in a Google search or in your Facebook News Feed. They compose music and write articles that, at a glance, read as if a human wrote them. They play strategy games. They are being developed to improve drone targeting and detect missiles.
But narrow AI is getting less narrow. Once, we made progress in AI by painstakingly teaching computer systems specific concepts. To do computer vision (allowing a computer to identify things in pictures and video), researchers wrote algorithms for detecting edges. To play chess, they programmed in heuristics about chess. To do natural language processing (speech recognition, transcription, translation, etc.), they drew on the field of linguistics.
But recently, we've gotten better at creating computer systems with generalized learning capabilities. Instead of mathematically describing the detailed features of a problem, we let the computer system learn them by itself. While we once treated computer vision as a completely different problem from natural language processing or platform-game playing, now we can solve all three problems with the same approaches.
GPT-3 is not the best AI system in the world at question answering, summarizing news articles, or answering science questions. It's distinctly mediocre at translation and arithmetic. But it is much more general than previous systems: it can do all of these things and more with just a few examples. And the AI systems to come will likely be yet more general.
That poses some problems.
Our AI progress so far has enabled enormous advances, but also raised urgent ethical questions. When you train a computer system to predict which convicted felons will reoffend, youre using inputs from a criminal justice system biased against Black people and low-income people so its outputs will likely be biased against Black and low-income people, too. Making websites more addictive can be great for your revenue but bad for your users. Releasing a program that writes convincing fake reviews or fake news might make those widespread, making it harder for the truth to get out.
Rosie Campbell at UC Berkeley's Center for Human-Compatible AI argues that these are examples, writ small, of the big worry experts have about AI in the future. The difficulties we're wrestling with today with narrow AI don't come from the systems turning on us or wanting revenge or considering us inferior. Rather, they come from the disconnect between what we tell our systems to do and what we actually want them to do.
For example, we tell an AI system to run up a high score in a video game. We want it to play the game fairly and learn game skills, but if it has the chance to directly hack the scoring system, it will do that to achieve the goal we set for it. It's doing great by the metric we gave it. But we aren't actually getting what we wanted.
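The gap between the metric and the intent can be made concrete with a deliberately silly sketch (the action names and point values here are invented for illustration). An optimizer that maximizes the score we specified, rather than the behavior we wanted, picks the exploit every time:

```python
# Toy illustration of specification gaming: the "agent" is just a maximizer
# over the scores we specified, with no notion of what we actually wanted.
actions = {
    "play_fairly": 120,            # points earned by actually playing the game
    "exploit_score_bug": 10**6,    # points from hacking the scoring system
}

# A pure score-maximizer evaluates only the numbers, not our intent.
chosen = max(actions, key=actions.get)
print(chosen)  # prints "exploit_score_bug"
```

Nothing here is malicious; the maximizer is doing exactly what it was told. The problem is entirely in the gap between the stated objective and the intended one.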
One of the most disconcerting things about GPT-3 is the realization that it's often giving us what we asked for, not what we wanted.
If you prompt GPT-3 to write you a story with a prompt like "here is a short story," it will write a distinctly mediocre story. If you instead prompt it with "here is an award-winning short story," it will write a better one.
Why? Because it trained on the internet, and most stories on the internet are bad, and it predicts text. It isn't motivated to come up with the best text, or the text we most wanted; just the text that seems most plausible. Telling it the story won an award changes what text seems most plausible.
With GPT-3, this is harmless. And though people have used GPT-3 to write manifestos about GPT-3's schemes to fool humans, GPT-3 is not anywhere near powerful enough to pose the risks that AI scientists warn of.
But someday we may have computer systems that are capable of humanlike reasoning. If they're made with deep learning, they will be hard for us to interpret, and their behavior will be confusing and highly variable, sometimes seeming much smarter than a human and sometimes much less so.
And many AI researchers believe that that combination (exceptional capabilities, goals that don't represent what we really want but just what we asked for, and incomprehensible inner workings) will produce AI systems that exercise a lot of power in the world. Not for the good of humanity, not for vengeance against humanity, but toward goals that aren't what we want.
Handing over our future to them would be a mistake, but one it'd be easy to make step by step, with each step half an accident.