Retrospective Benchmarks for Machine Intelligence

From AGI Definitions to the Hunt for Anticipations

December 2025

Every definition of intelligence is also a test. Historical thinkers left us specifications; we operationalize them into criteria and evaluate frontier AI systems against those criteria. The results vary.

Plato's Academy defined man as a ζῷον δίπουν ἄπτερον—a featherless biped. Diogenes refuted it with a plucked chicken. We apply the same logic to AI:

Criterion               Score
Featherless (ἄπτερον)   100%
Biped (δίπουν)          0%
Overall                 50%

By the Platonic definition, AI in late 2025 is half a man. (How we score →)
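
The 50% figure is consistent with taking an unweighted mean of the per-criterion scores. The sketch below reproduces that arithmetic; the equal weighting and the `overall_score` helper are assumptions made for illustration, not a statement of the project's actual scoring method.

```python
from statistics import mean

# Criterion scores copied from the table above, as percentages.
platonic_scores = {
    "Featherless (ἄπτερον)": 100,
    "Biped (δίπουν)": 0,
}


def overall_score(criterion_scores):
    """Return the unweighted mean of per-criterion percentages.

    Equal weighting is an assumption; the source only shows the result.
    """
    if not criterion_scores:
        raise ValueError("at least one criterion is required")
    return mean(criterion_scores.values())


overall = overall_score(platonic_scores)
print(f"Overall: {overall:.0f}%")
```

Under this reading, adding or removing a criterion shifts the overall score, which is one way a definition's wording ends up driving its verdict.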

We applied this approach to six definitions of AGI published between 1997 and 2023. Scores ranged from 32% to 80%. That spread is not measurement error; it is conceptual disagreement made visible.

Part I: The AGI Series

Did we meet their standard for AGI?

Six definitions of Artificial General Intelligence, spanning 1997 to 2023. Each explicitly tried to specify what machine intelligence would require. Each yields a different verdict when applied to frontier AI systems.

Part II reaches further back, to thinkers who developed theories of mind without access to today's machines, which might or might not exhibit what they described.

Part II: The Hunt for Anticipations

Would they recognize what we've built?

Thinkers who theorized about mind without any concept of machines that might exhibit it. Aristotle's nous, Descartes' two tests, Lovelace's objection, Turing's imitation game. Standalone essays, titled by thinker and year.

Methods and Style Guide

The methodology: three-point scoring, interpretation principles, scholarly tone, citation requirements. Everything needed to continue the project.

Read the full methodology →

Evaluated December 2025. Working papers; comments welcome.