Digital Humanist | Valentina Rossi

JANUARY 2025

WHY AGI ISN'T A THING (YET).

SUMMARY.

The recent claims surrounding o3, OpenAI’s new state-of-the-art model, and specifically its performance on the ARC-AGI test (a benchmark designed to measure the efficiency of AI skill-acquisition on unknown tasks), have reignited the debate about what constitutes true Artificial General Intelligence (AGI). Definitions of AGI vary, but most scholars agree it involves the ability to autonomously reason, learn, and adapt across diverse tasks in ways akin to human intelligence (Lake et al., 2017; Chollet, 2019; Pennachin and Goertzel, 2007). Unlike narrow AI, which excels in predefined domains, AGI should generalise knowledge effectively, handle out-of-distribution scenarios, and demonstrate emergent learning and problem-solving capabilities. While OpenAI’s advancements are noteworthy, the model’s achievements raise questions about how much actual progress has been made toward AGI.

ARTIFICIAL GENERAL INTELLIGENCE: AGI.

A growing community of researchers has recently turned its attention to the core ambitions of the artificial intelligence field: designing and studying systems exhibiting general intelligence comparable to, or potentially exceeding, human capabilities.

An AGI system should be able to adapt its knowledge and apply it flexibly across various tasks and situations. It is precisely this ability to generalise which distinguishes this kind of technology from narrower AI, although there is some overlap between the two fields.

A range of perspectives and approaches have been proposed to achieve AGI, including (Goertzel, 2014):
  • Systems based on symbolic, emergent, hybrid, or universal principles.
  • Frameworks drawing inspiration from engineering, biology, or mathematical theories.
  • Cognitive models, such as SOAR, developed to replicate aspects of human intelligence.
  • Strategies that mimic evolutionary ecosystems, often termed artificial life.
  • Developmental robotics, emphasising learning via interaction with the surrounding environment.

Currently, research into AGI remains in its initial stages, with various approaches and architectures under exploration. However, numerous theoretical and practical challenges persist, and no system has yet achieved the general capabilities of human intelligence. It remains uncertain whether this goal is even attainable, given the complexity, diversity, and multidimensional nature of our intellectual abilities as we understand them.

TECHNICAL BACKGROUND ON O1 & O3.

The o1 model, introduced by OpenAI in September 2024, is a large language model (LLM) built on the GPT (Generative Pre-trained Transformer) framework.

This model incorporates recent training techniques, including reinforcement learning and a chain-of-thought (CoT) approach, enabling it to tackle complex queries by dividing them into smaller, logical steps that mimic a structured reasoning process.

This methodology significantly enhances its precision and effectiveness in areas such as mathematics and programming.

OpenAI explicitly prohibits users from attempting to reveal the model’s hidden chain of thought. The reasoning is intentionally concealed, and the model is designed to decline such requests, in line with company policy.

OpenAI has reported that o1’s performance improves steadily both with more reinforcement learning (train-time compute) and with more time spent reasoning before answering (test-time compute).
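
The idea of trading extra test-time compute for accuracy can be illustrated with a toy example. The sketch below is not a description of how o1 works internally; it shows one simple, widely discussed technique, namely sampling several independent attempts and taking a majority vote. The solver, its 40% success rate, and the answer values are all hypothetical and chosen purely for illustration.

```python
import random
from collections import Counter

CORRECT_ANSWER = 42  # hypothetical ground truth for the toy task


def noisy_solver(rng, p_correct=0.4):
    """Hypothetical stand-in for a single sampled reasoning attempt.

    Returns the right answer with probability p_correct; otherwise it
    returns one of several scattered wrong answers.
    """
    if rng.random() < p_correct:
        return CORRECT_ANSWER
    return rng.choice([7, 13, 21, 34, 55, 89])


def majority_vote(rng, n_samples):
    """Spend more 'test-time compute' by sampling n attempts and voting."""
    votes = Counter(noisy_solver(rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]


def accuracy(n_samples, trials=2000, seed=0):
    """Fraction of trials in which the voted answer is correct."""
    rng = random.Random(seed)
    hits = sum(majority_vote(rng, n_samples) == CORRECT_ANSWER
               for _ in range(trials))
    return hits / trials


results = {n: accuracy(n) for n in (1, 5, 25)}
for n, acc in results.items():
    print(f"samples per question: {n:2d}  accuracy: {acc:.2f}")
```

In this toy setting, accuracy rises as more samples are drawn per question, even though the underlying solver never changes. That is the sense in which additional inference-time effort can improve results without any change to the model itself.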

In December 2024, OpenAI unveiled o3, the next iteration of this technology. This model surpassed o1 in numerous benchmarks, demonstrating superior abilities in fields like complex programming tasks, as well as mathematical and scientific reasoning.

The o3 model also introduced significant safety enhancements, thanks to an approach known as deliberative alignment. This training method involves explicitly teaching models to adhere to human-written safety guidelines and to reason systematically about these rules before generating responses, ensuring a more secure and interpretable system.

BENCHMARK PERFORMANCE VS. GENUINE AUTONOMY.

Sophisticated language models like OpenAI’s o1 and o3 represent substantial advancements in natural language processing (NLP), achieving impressive outcomes across a range of benchmarks. However, it’s essential to recognise that these systems do not possess genuine autonomy.

While o3 has excelled in evaluations such as ARC-AGI and SWE-bench, these achievements should not be mistaken for AGI. Its performance stems from extensive optimisation and fine-tuning on highly specialised datasets, often incorporating synthetic or benchmark-aligned data.

By definition, AGI would need the ability to adapt flexibly to unforeseen situations, learn new contexts independently, and engage in both inductive and deductive reasoning across unrestricted domains.

In contrast, models like o3 remain constrained by their training data and artificial neural network architectures, unable to handle out-of-distribution inputs or demonstrate true comprehension. Nevertheless, discussions around such models frequently frame them as approaching AGI, leading to misunderstandings among both experts and the general public.
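
The distinction between in-distribution and out-of-distribution performance can be made concrete with a deliberately simple sketch. It says nothing about how o3 itself behaves; it only shows, under stated assumptions, how a model fitted on one range of inputs can look accurate there and fail badly elsewhere. The target function, the input ranges, and the straight-line model are all hypothetical choices made for illustration.

```python
def target(x):
    """Hypothetical 'true' relationship the model is trying to learn."""
    return x * x


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b


def mean_abs_error(model, xs):
    """Average absolute gap between the model and the true function."""
    a, b = model
    return sum(abs((a * x + b) - target(x)) for x in xs) / len(xs)


# Training data: inputs between 0 and 1 only.
train_xs = [i / 100 for i in range(101)]
model = fit_line(train_xs, [target(x) for x in train_xs])

# Unseen points inside the training range (in-distribution).
in_dist_xs = [i / 100 + 0.005 for i in range(100)]
# Points far outside the training range (out-of-distribution).
out_dist_xs = [3 + i / 100 for i in range(101)]

in_dist_error = mean_abs_error(model, in_dist_xs)
out_dist_error = mean_abs_error(model, out_dist_xs)

print(f"in-distribution error:     {in_dist_error:.3f}")
print(f"out-of-distribution error: {out_dist_error:.3f}")
```

The fitted line tracks the curve reasonably well between 0 and 1, yet its error grows by orders of magnitude between 3 and 4. A strong score on inputs resembling the training data is therefore weak evidence of generalisation beyond them.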

Achieving AGI isn’t about excelling in every benchmark but about demonstrating autonomous iteration in unsupervised environments and tackling complex challenges beyond predefined scenarios. Although OpenAI’s o3 delivers remarkable results, it lacks the critical qualities that would constitute true autonomy.

MOVING FORWARD: THE PATH TO AGI.

The journey towards Artificial General Intelligence (AGI) demands a fundamental transformation in how we perceive, design, and assess artificial intelligence.

This involves creating systems capable of extrapolating knowledge far beyond the confines of their training datasets. Equally important is a profound reconsideration of the philosophical and ethical implications of achieving AGI, ensuring its progress is firmly rooted in humanity’s core values and societal priorities.

By adopting a comprehensive and integrated approach, we can pave the way for genuinely autonomous and intelligent systems capable of navigating complex, unstructured real-world scenarios.

To achieve this, it will be essential to look beyond traditional benchmarks and adopt a more expansive mindset, fostering the breakthroughs that will define the next wave of technological innovation.

Chollet, F. (2019). On the measure of intelligence. arXiv preprint arXiv:1911.01547.

Goertzel, B. (2014). Artificial general intelligence: concept, state of the art, and future prospects. Journal of Artificial General Intelligence, 5(1), 1.

Lake, B. M., Ullman, T. D., Tenenbaum, J. B., & Gershman, S. J. (2017). Building machines that learn and think like people. Behavioral and Brain Sciences, 40, e253.

Pennachin, C., & Goertzel, B. (2007). Contemporary approaches to artificial general intelligence. In Artificial general intelligence (pp. 1-30). Berlin, Heidelberg: Springer Berlin Heidelberg.