
Measuring AI’s Progress Is Getting Tougher: What Happened to the AI Wall?
Artificial Intelligence (AI) is one of the most transformative technologies of our time. Yet as impressive as AI systems have become, even seasoned researchers struggle to say precisely how capable they now are. The boundaries of what AI can do keep being pushed outward, leaving many to wonder just how good AI is getting, and whether we are approaching an invisible wall of technological progress.
For years, we’ve been warned that AI would inevitably hit a technical ceiling. Scaling laws describe how AI systems improve as training compute, data, and model size grow, and many machine learning researchers argued that this recipe was reaching a point of diminishing returns: no matter how much power and data we threw at AI, progress would eventually slow significantly. The reality has turned out to be far more complex than that.
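To make the “scaling law” idea concrete, here is a minimal sketch, assuming a simple power-law relationship between training compute and error; the constants and the error floor are illustrative assumptions, not measured values from any real model.

```python
# Minimal sketch of a scaling "law": error falls as a power law in training compute.
# The constants (a, alpha, floor) are illustrative assumptions, not measured values.

def predicted_error(compute, a=1.0, alpha=0.05, floor=0.01):
    """Toy power-law curve: error = a * compute**(-alpha) + floor.

    The 'floor' term is what "hitting a wall" would look like: past some
    point, extra compute buys almost no further improvement.
    """
    return a * compute ** (-alpha) + floor

# Each 10x increase in compute shaves off a smaller absolute amount of error,
# but improvement never stops outright; it just gets harder to see.
for exponent in range(20, 27):  # compute budgets from 1e20 to 1e26 (arbitrary units)
    c = 10 ** exponent
    print(f"compute = 1e{exponent}: predicted error = {predicted_error(c):.4f}")
```

The point of the sketch is the shape of the curve: each additional order of magnitude of compute buys a smaller absolute improvement, which is why “hitting a wall” and “still improving” can be hard to tell apart from the outside.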
In fact, as we move further into 2025, it’s becoming increasingly evident that AI is not just breaking through existing barriers; it is moving into new frontiers, with the potential to revolutionize how we live, work, and think. The problem is, this rapid progress is often invisible, and it’s getting harder to measure how far AI has come and just how capable these systems are.
The Changing Landscape of AI Progress
To understand the significance of AI’s continued progress, it’s important to first look at how we measure the performance of AI systems. For years, machine learning researchers have relied on standardized benchmarks—tests designed to evaluate the capabilities of different AI systems across a range of tasks. These benchmarks span multiple domains, from math and programming to reading comprehension and problem-solving.
One widely used measure of AI performance has been competition mathematics, including the qualifying exams for the US Math Olympiad, which gauge a model’s ability to solve complex mathematical problems. Recent models have made such significant strides that they now perform at levels comparable to top human competitors. Similarly, AI’s ability to answer graduate-level questions in physics, biology, and chemistry has matched or surpassed human experts in some cases, leaving these traditional benchmarks with little room to register further progress.
This phenomenon, known as benchmark saturation, is one of the key reasons why it’s getting harder to measure just how good AI is. When AI systems reach near-perfect performance on a benchmark, that benchmark becomes less useful as a tool for evaluating future progress. Once AI has exceeded human-level performance on a specific task, further improvements are harder to quantify and require more innovative approaches to measurement.
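A small numerical sketch helps show why saturation blunts a benchmark; the scores below are invented purely to illustrate the arithmetic, not results from any real model.

```python
# Illustration of benchmark saturation: once scores approach 100%, further
# progress compresses into tiny score deltas, even when the underlying
# improvement (fraction of remaining errors eliminated) is large.
# All scores below are invented for illustration.

def remaining_error_cut(old_score, new_score):
    """Fraction of the remaining errors that the newer model eliminates."""
    return (new_score - old_score) / (1.0 - old_score)

generations = [0.60, 0.85, 0.95, 0.99]  # hypothetical successive model scores

for old, new in zip(generations, generations[1:]):
    print(
        f"{old:.0%} -> {new:.0%}: score gain {new - old:+.0%}, "
        f"but {remaining_error_cut(old, new):.0%} of remaining errors fixed"
    )
```

Near the ceiling, a model that fixes most of the remaining mistakes still registers only a few points of improvement, which is exactly why a saturated benchmark stops telling us much.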
The Rise of o3: A Game-Changing AI Model
One of the most significant developments in AI over the past year has been the release of OpenAI’s large language model (LLM) known as o3. This model is a prime example of AI’s remarkable progress and represents a new leap forward in what AI can achieve. o3 does not settle the debate over whether scaling alone still delivers the returns it once did, but it demonstrates that AI’s progress is far from stalled.
o3 has achieved remarkable performance on problems once thought to be beyond the reach of AI. When configured to spend extra compute on difficult tasks, o3 scored roughly 88 percent on ARC-AGI, a benchmark designed to measure flexible, humanlike general reasoning. This result is a stark reminder that AI is still capable of profound improvements, even in areas once believed to sit at the edge of its capabilities.
As impressive as o3 is, it’s important to note that its success doesn’t mean that AI has hit its peak. In fact, we’re still in the early stages of understanding just how much potential these AI systems have. What’s more, the pace of innovation in AI models has accelerated, with new models and systems emerging faster than ever before. As a result, AI research is entering a phase where we can no longer rely on traditional benchmarks to evaluate progress.
The Problem with Traditional Benchmarks
One of the main issues with measuring AI progress using traditional benchmarks is that these tests have become saturated. AI systems are improving so quickly that many existing benchmarks no longer serve as effective measures of progress. For example, on GPQA (Graduate-Level Google-Proof Q&A), a benchmark of PhD-level questions in biology, physics, and chemistry, leading AI systems now score at or above the level of the human experts who helped validate it.
This saturation is not limited to science. Benchmarks like MMLU (Massive Multitask Language Understanding), designed to assess knowledge and reasoning across dozens of academic and professional subjects, have likewise been saturated by AI models that consistently perform at or above human-level proficiency. As AI continues to advance, researchers are left scrambling to create more difficult tests, but even these new benchmarks may quickly lose their value as AI systems continue to evolve at a rapid pace.
The challenge now lies in developing new ways to assess AI’s capabilities in a meaningful way. Researchers are beginning to realize that future benchmarks will need to focus on tasks that humans are incapable of performing themselves, rather than simply comparing AI performance to human expertise. This shift in focus reflects the increasing complexity and sophistication of AI systems, which are becoming more adept at solving problems that were once considered too complex for machines to handle.
The Invisible Wall: How AI Progress Is Becoming Harder to See
One of the most intriguing aspects of AI progress in recent years is how it has become invisible. As AI models continue to improve, their advancements are often not as immediately noticeable to the general public. This is because much of the progress happening behind the scenes involves tasks that are beyond the average person’s ability to assess. While it’s easy to recognize the difference between a 5-year-old learning basic math and a high school student studying calculus, the difference between a first-year math student and a world-class mathematician is much harder to see.
This is the crux of the invisible wall AI seems to be hitting: it is a wall of perception and measurement, not of capability. The progress being made is now so sophisticated that it is no longer tangible or visible to the broader public. Many people still perceive AI as a tool that makes silly mistakes or struggles with simple tasks, but that perception is outdated. In reality, AI has moved far beyond its early limitations and can now perform complex, intellectually demanding tasks with a level of expertise that rivals, and in some cases exceeds, that of human professionals.
The fact that AI’s progress is becoming harder to see doesn’t mean that it’s stalled or reached its limits. On the contrary, it means that AI is advancing in ways that are subtle and difficult for the average person to notice. These developments are changing the world in ways that will have a profound impact on how we work, live, and interact with technology in the coming years.
What’s Next for AI?
The future of AI holds tremendous potential, driven by three key factors: decreasing costs, improving interfaces, and increasing intelligence. Each of these factors will play a critical role in shaping the trajectory of AI over the next several years.
- Cost Reduction: One of the key drivers of AI’s future progress is the continued reduction in the cost of computing power. While high-end models like o3 may still be expensive to run, advances in hardware and cloud computing are making it more affordable to harness the power of AI. This reduction in cost will make AI accessible to a wider range of organizations and individuals, driving greater adoption and innovation (a rough cost sketch follows this list).
- Improved Interfaces: As AI systems become more powerful, there is a growing emphasis on improving the way humans interact with them. AI developers are working on creating more intuitive and user-friendly interfaces that allow individuals to seamlessly communicate with AI models. These innovations will make it easier for people to use AI for a variety of tasks, from complex problem-solving to everyday productivity.
- Increasing Intelligence: The most exciting aspect of AI’s future is its continued evolution in terms of intelligence. AI models are becoming better at reasoning, problem-solving, and understanding complex tasks across a wide range of domains. This increasing intelligence will enable AI to automate an ever-growing number of intellectual tasks, changing the nature of work and potentially transforming entire industries.
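To make the cost point concrete (see the bullet on cost reduction above), here is a back-of-the-envelope sketch; the model names, per-token prices, and usage figures are hypothetical placeholders, not quotes from any real provider.

```python
# Back-of-the-envelope cost of answering queries with a large model.
# All prices and usage numbers are hypothetical placeholders for illustration.

PRICE_PER_MILLION_TOKENS = {  # hypothetical USD prices
    "frontier_model": 60.00,   # a large, expensive reasoning model
    "efficient_model": 0.60,   # a smaller, cheaper model a year later
}

def monthly_cost(queries_per_day, tokens_per_query, price_per_million):
    """Rough monthly spend: total tokens processed times the per-token price."""
    tokens_per_month = queries_per_day * tokens_per_query * 30
    return tokens_per_month / 1_000_000 * price_per_million

for name, price in PRICE_PER_MILLION_TOKENS.items():
    cost = monthly_cost(queries_per_day=10_000, tokens_per_query=1_000, price_per_million=price)
    print(f"{name}: ~${cost:,.0f} per month")
```

Even under these made-up numbers, the same workload drops from thousands of dollars a month to a few hundred, which is the kind of shift that turns a research curiosity into something ordinary organizations can afford to deploy.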
Conclusion
As AI continues to advance, it’s becoming increasingly clear that the technology is far from hitting a wall. While traditional benchmarks may no longer serve as effective measures of progress, AI is still improving in ways that are reshaping the world. Whether it’s through reduced costs, better interfaces, or increased intelligence, the future of AI is incredibly bright.
For those who have been wondering whether AI has reached its limits, the answer is a resounding no. We are still in the early stages of what promises to be one of the most transformative periods in human history. As AI continues to evolve, we must stay focused on how we can responsibly harness its power to address the world’s biggest challenges and create a future that benefits all of humanity.