As Microsoft’s computer scientists experimented with a novel AI system last year, they posed a challenge that required an intuitive understanding of the physical world. They asked: “Here we have a book, nine eggs, a laptop, a bottle and a nail. Please tell me how to stack them onto each other in a stable manner.” The system’s inventive response astounded the researchers, leading them to wonder whether they were witnessing an unprecedented form of intelligence.
In March, the team published a comprehensive 155-page research paper asserting that the system represented progress towards artificial general intelligence (AGI), a machine capable of performing any task the human brain can accomplish. Microsoft’s bold claim, titled “Sparks of Artificial General Intelligence,” reignited an ongoing debate in the tech industry: Are we on the verge of creating something akin to human intelligence, or are we allowing our imaginations to run wild?
Microsoft’s research head, Peter Lee, admitted, “I started off being very skeptical — and that evolved into a sense of frustration, annoyance, maybe even fear.” The pursuit of AGI has long been a source of both excitement and trepidation for technologists. While creating a machine that functions like or surpasses the human brain could revolutionize the world, it also poses potential dangers.
However, some experts argue that recent advancements in AI systems are producing human-like responses and ideas that were not pre-programmed, indicating a shift towards AGI. Microsoft has restructured portions of its research labs to explore this possibility, with one group led by Sébastien Bubeck, the principal author of Microsoft’s AGI paper.
Over the past five years, companies like Google, Microsoft, and OpenAI have developed large language models (LLMs) that analyze vast quantities of digital text. In doing so, these systems learn to generate their own text and even engage in conversations. Microsoft’s researchers were specifically working with OpenAI’s GPT-4, considered the most powerful of these systems.
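The core idea behind these systems, learning from existing text which words tend to follow which, can be illustrated with a toy bigram model. This is a deliberately simplified sketch, not how GPT-4 actually works: real LLMs train transformer networks with billions of parameters on vast corpora, but the underlying objective of predicting the next token from prior text is analogous.

```python
from collections import defaultdict

def train_bigrams(corpus):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(lambda: defaultdict(int))
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, length=5):
    """Greedily emit the most frequent next word at each step."""
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break
        out.append(max(followers, key=followers.get))
    return " ".join(out)

# A tiny training corpus; real models ingest trillions of tokens.
model = train_bigrams("the cat sat on the mat the cat ran")
print(generate(model, "the"))
```

Even this trivial model produces plausible-looking phrases from its training data; scale the same predict-the-next-word idea up enormously and conversational ability begins to emerge.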
Dr. Bubeck and his colleagues documented complex behavior exhibited by the system, which they believed demonstrated a “deep and flexible understanding” of human concepts and skills. Dr. Lee noted that people using GPT-4 are “amazed at its ability to generate text,” but it turns out to be even better at “analyzing and synthesizing and evaluating and judging text than generating it.”
The AI’s capabilities were further highlighted when it was asked to draw a unicorn using the TiKZ programming language. Not only did it generate a program to draw a unicorn, but when the code for the unicorn’s horn was removed, the system successfully modified the program to draw a complete unicorn once again.
The researchers posed various tasks to the AI system, such as creating a program that assessed diabetes risk based on personal data, composing a letter endorsing an electron for US president in Mahatma Gandhi’s voice, and writing a Socratic dialogue examining the misuse and dangers of LLMs. The AI system demonstrated understanding across diverse fields like politics, physics, history, computer science, medicine, and philosophy, combining its knowledge to complete these tasks. Dr. Bubeck commented, “All of the things I thought it wouldn’t be able to do? It was certainly able to do many of them — if not most of them.”
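The first of those tasks, a program assessing diabetes risk from personal data, might look something like the sketch below. The risk factors, weights, and thresholds here are illustrative assumptions for demonstration only, not the output the researchers received and not medical guidance.

```python
def diabetes_risk_score(age, bmi, family_history, sedentary):
    """Toy risk classifier; weights and cutoffs are invented
    for illustration and are NOT medically validated."""
    score = 0
    if age >= 45:
        score += 2
    if bmi >= 30:
        score += 2
    elif bmi >= 25:
        score += 1
    if family_history:
        score += 2
    if sedentary:
        score += 1
    return "high" if score >= 4 else "moderate" if score >= 2 else "low"

print(diabetes_risk_score(age=50, bmi=31, family_history=True, sedentary=False))
```

What impressed the researchers was not that such a program is hard to write, but that the system could produce one unprompted by combining knowledge of programming with knowledge of medicine.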
However, some AI experts viewed Microsoft’s paper as an attempt to make grand claims about a technology that is not yet fully understood. Critics argue that general intelligence requires direct familiarity with the physical world, which GPT-4, a system trained only on text, does not have. Maarten Sap, a researcher and professor at Carnegie Mellon University, said, “The ‘Sparks of AGI’ is an example of some of these big companies co-opting the research paper format into PR pitches.”
Dr. Bubeck and Dr. Lee admitted they were uncertain how to define the system’s behavior and opted for “Sparks of AGI” to capture other researchers’ imaginations. Because Microsoft tested an early version of GPT-4 that had not yet been fine-tuned to exclude hate speech and misinformation, and that version is unavailable to outside researchers, the claims made in the paper cannot be independently verified.
Though AI systems like GPT-4 sometimes appear to imitate human reasoning, they can also exhibit inconsistencies. Ece Kamar, a research lead at Microsoft, pointed out that these behaviors are not always consistent. Alison Gopnik, a psychology professor and AI researcher at the University of California, Berkeley, argued that while GPT-4 and similar systems are undeniably powerful, it remains unclear whether their generated text truly reflects human reasoning or common sense. She noted that anthropomorphizing these systems is a common tendency, but added, “thinking about this as a constant comparison between AI and humans — like some sort of game show competition — is just not the right way to think about it.”