TL;DR
AI systems have achieved near-saturation on core engineering benchmarks, automating most AI engineering tasks. Research remains less automated, but progress suggests it may also be increasingly automated soon. This shift could reshape AI development workflows.
Recent benchmark data suggests that AI systems can now automate most AI engineering tasks, with some benchmarks near saturation. Research tasks are less automated but advancing quickly, which points to a shift in how AI is developed.
Thorsten Meyer reports that six key benchmarks measuring AI capabilities relevant to R&D have shown significant progress, with three reaching or approaching saturation within 16 to 21 months. For example, CORE-Bench, which assesses research reproduction, improved from 21.5% in September 2024 to 95.5% in December 2025, with some experts declaring it ‘solved.’ Similarly, MLE-Bench, which evaluates performance on Kaggle competitions, rose from 16.9% to 64.4% over 16 months, a level comparable to mid-tier human performance.
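As a rough check on the pace these figures imply, the sketch below converts the two quoted score changes into percentage points gained per month. The function names are hypothetical. The CORE-Bench span is counted from the stated calendar months (September 2024 to December 2025), and the 16-month MLE-Bench span is taken as quoted. The linear average is an assumption made for illustration, not a claim about how the scores actually evolved.

```python
def months_between(start_year: int, start_month: int,
                   end_year: int, end_month: int) -> int:
    """Whole calendar months between two (year, month) pairs."""
    return (end_year - start_year) * 12 + (end_month - start_month)


def points_per_month(start_pct: float, end_pct: float, months: int) -> float:
    """Average percentage-point gain per month (assumes a linear trend)."""
    if months <= 0:
        raise ValueError("months must be positive")
    return (end_pct - start_pct) / months


# Scores and dates as quoted in the text above.
core_months = months_between(2024, 9, 2025, 12)
core_rate = points_per_month(21.5, 95.5, core_months)
mle_rate = points_per_month(16.9, 64.4, 16)

print(f"CORE-Bench: {core_rate:.2f} points/month over {core_months} months")
print(f"MLE-Bench:  {mle_rate:.2f} points/month over 16 months")
```

On these inputs the CORE-Bench average is a little under 5 points per month and the MLE-Bench average is close to 3, which is only a summary of the quoted endpoints.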
These benchmarks indicate that AI can now handle complex engineering tasks such as reproducing research experiments and competing in ML competitions at levels close to or surpassing human capability. Meanwhile, progress in kernel design—an essential part of AI infrastructure—continues through research papers and production-grade models, further supporting the trend of automation in engineering.
Clark’s analysis suggests that while engineering tasks are largely automatable, research remains less so, though the gap is narrowing. The key open question is whether research itself is simply engineering at scale, which could accelerate automation in this domain as well.
Engineering is automated.
Research is the residual.
Six skill benchmarks. Edison’s framing. The question Clark leaves open is whether research is just engineering at scale.
Jack Clark’s Import AI #455 catalogs six benchmarks measuring AI capability on AI R&D tasks and concludes “AI can today automate vast swaths, perhaps the entirety, of AI engineering.” The residual question is research. The structural read on the residual: it may not be a permanent moat.
Six skills. One trajectory.
Clark catalogs six benchmarks measuring AI capability on AI R&D-relevant tasks. Each individual benchmark could be noise. Six benchmarks moving together is a curve. The pattern is the cascade observed across the broader Clark series — visible here in the specific R&D-skill domain.
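The intuition that six benchmarks moving together is harder to dismiss than any single one can be made concrete with a toy calculation. The sketch assumes, purely for illustration, that each benchmark's gain would be spurious with the same probability and that the benchmarks are independent. Neither assumption comes from Clark's data, and real benchmarks share models and methods, so the true joint probability would be higher than this.

```python
def prob_all_spurious(p_single: float, n_benchmarks: int) -> float:
    """Chance that every benchmark's gain is noise, assuming independence."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability")
    if n_benchmarks < 1:
        raise ValueError("n_benchmarks must be at least 1")
    return p_single ** n_benchmarks


# Hypothetical input: a coin-flip chance that any one benchmark's gain is noise.
p_joint = prob_all_spurious(0.5, 6)
print(f"Joint probability of six spurious gains: {p_joint:.2%}")
```

Even with a generous 50% chance of noise per benchmark, all six being noise at once has a probability of 1 in 64 under these assumptions.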

Three data points. Mixed signal.
Clark provides three data points on the creative-spark question. Yes-evidence: Erdős-1051, centaur math discovery, sporadic Move-37-style moments. No-evidence: low yield, framing dependence, absence of acceleration. The mixed signal is the honest read.
The data supports two readings. Pessimistic: rare moments suggest creative insight is qualitatively distinct from engineering work. Optimistic: rare moments are an artifact of low-volume exploration; more shots on goal yield more discoveries. Both readings are consistent with Clark’s “vast swaths, perhaps the entirety” claim. They differ on the residual.
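The optimistic reading has a simple probabilistic shape. If each exploratory attempt succeeds independently with some small probability, the chance of at least one discovery rises quickly with the number of attempts. The per-attempt yield below is a made-up placeholder, not an estimate from Clark's data, and the independence assumption is exactly what the pessimistic reading disputes.

```python
def prob_at_least_one(yield_per_attempt: float, attempts: int) -> float:
    """P(at least one success) across independent attempts with equal yield."""
    if not 0.0 <= yield_per_attempt <= 1.0:
        raise ValueError("yield_per_attempt must be a probability")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return 1.0 - (1.0 - yield_per_attempt) ** attempts


# Hypothetical yield of 0.1% per attempt, evaluated at growing volumes.
HYPOTHETICAL_YIELD = 0.001
for attempts in (10, 100, 1_000, 10_000):
    p = prob_at_least_one(HYPOTHETICAL_YIELD, attempts)
    print(f"{attempts:>6} attempts -> {p:.3f}")
```

If the pessimistic reading is right, the yield is not a fixed number that volume can exploit, and this curve does not apply.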
Five dimensions Clark gestures at but leaves underdeveloped.
Clark’s section is rigorous on the empirical evidence. Five strategic dimensions matter for the institutional response that the Clark series synthesis argues is structurally inadequate.

Two readings. Different equilibria.
The structural question Clark leaves open: is research a permanent moat that bounds automated AI R&D, or is it engineering at scale that dissolves with more shots on goal? Both readings are consistent with the current data. They differ by orders of magnitude in consequences.
Productivity multiplier years
Recursive loop operational
Five audiences. Asymmetric cost of being wrong.
The institutional response should not bet on inspiration being a permanent moat. If the distinction holds, capacity built is still useful. If it closes, capacity is necessary. Asymmetric cost-of-being-wrong points toward building now.
IN INDUSTRY
IN ACADEMIA
POLICYMAKERS
INVESTORS
EVERYONE ELSE
Engineering is automated. The residual is the question. The institutional response should not bet on inspiration being a permanent moat.
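The asymmetry argument can be laid out as a small decision table. The cost numbers below are arbitrary placeholders chosen only to mirror the qualitative claims in the text (capacity is still useful if the moat holds, and necessary if it closes). They are not estimates, and all names are hypothetical. The sketch picks the action with the smallest worst-case cost.

```python
# Hypothetical relative costs (higher is worse) per action and scenario.
COSTS = {
    "build_capacity_now": {"moat_holds": 1, "moat_closes": 1},
    "wait_and_see": {"moat_holds": 0, "moat_closes": 10},
}


def worst_case(action: str) -> int:
    """Largest cost the action can incur across the scenarios."""
    return max(COSTS[action].values())


def minimax_choice() -> str:
    """Action whose worst-case cost is smallest."""
    return min(COSTS, key=worst_case)


for action in COSTS:
    print(f"{action}: worst case = {worst_case(action)}")
print("minimax choice:", minimax_choice())
```

With these placeholder values the worst case of waiting dominates, so building now is selected. Different cost assumptions could reverse the result, which is why the inputs matter more than the mechanics.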
Implications for AI Development and Innovation
The rapid automation of engineering tasks in AI research suggests a fundamental shift in how AI systems are developed. As tools increasingly handle routine and complex engineering work, human researchers may focus more on high-level innovation and theory. This could accelerate AI progress, reduce costs, and reshape the structure of AI research institutions. However, the residual challenge remains: automating the creative and conceptual aspects of research, which are less well-understood and harder to quantify.
Progress in AI Capabilities Over the Past Two Years
Since 2024, multiple benchmarks measuring core AI engineering skills have shown consistent improvement. CORE-Bench, which tests research reproduction, has approached saturation, and MLE-Bench, which evaluates ML competition performance, has reached a level comparable to mid-tier human performance, signaling that AI systems can now perform tasks previously thought to require human expertise. Concurrently, advances in kernel design, which is integral to AI infrastructure, are documented in research papers and production tools, indicating that engineering automation is becoming mainstream.
This trend is part of a broader pattern of rapid AI capability growth, driven by large language models and specialized AI systems that can handle increasingly complex tasks with minimal human intervention. The current phase suggests an approaching ‘engineering singularity,’ in which automation dominates routine development tasks.
“The pattern across these benchmarks indicates that AI can now automate vast swaths, perhaps the entirety, of AI engineering.”
— Thorsten Meyer
Remaining Challenges in Automating AI Research
While engineering tasks are approaching full automation, it remains unclear how much of the research process—particularly the creative, hypothesis-driven aspects—can be automated. Clark notes that some research may involve distinct skills that are not yet replicable by AI, and whether automation will extend to these areas is still an open question. Additionally, the pace at which research automation might accelerate remains uncertain, as does the potential impact on the broader research ecosystem.
Next Milestones in AI Automation and Research
Over the coming 32 months, expected developments include further saturation of engineering benchmarks, increased deployment of AI in infrastructure design, and potential breakthroughs in automating research-level tasks. Researchers and institutions will likely monitor progress in automating hypothesis generation, experimental design, and theory development. Policy and organizational responses will also shape how automation influences the future of AI innovation.
Key Questions
What are the main benchmarks showing AI automation progress?
The main benchmarks include CORE-Bench (research reproduction) and MLE-Bench (ML competition performance), alongside kernel design work documented in papers and tools, all showing rapid progress toward automation.
Does this mean human researchers are becoming obsolete?
Not necessarily. While routine engineering tasks are increasingly automated, high-level research, creativity, and hypothesis formulation may remain less automatable in the near term.
What are the risks of automating AI research and engineering?
Potential risks include over-reliance on automated systems, reduced diversity of approaches, and challenges in verifying AI-generated research. Ethical and safety considerations will also grow in importance.
When might full automation of AI research occur?
It is uncertain. Current trends suggest significant automation within the next 2-3 years, but full automation, especially of the creative aspects, may take longer and depends on breakthroughs in AI capabilities.
Source: ThorstenMeyerAI.com