All published research
Long-form analysis of the AI capability race — model releases, benchmark breakthroughs, enterprise routing shifts, and what the data underneath actually says.
How Public Markets Already Own the AI Frontier
Anthropic and OpenAI are private, but six US-listed companies sit on their cap tables. We map who owns what, how much of each headline number is real cash vs. accounting markup, and where the most concentrated public-market exposure to either lab actually lives. Includes an ownership card for each lab and the four caveats every reader should know.
Read the analysis →
How Long Can Claude Mythos Work Alone?
METR's autonomous-task benchmark hasn't scored Claude Mythos yet, but every other frontier model has both a METR time horizon and an IRT (composite benchmark ability) score. We fit four regression families across 19 model points and used the piecewise fit's prediction to place Mythos at roughly 15 hours of unsupervised work. Under the trend acceleration implied by Anthropic's Opus 4 → 4.5 → 4.6 cadence, that figure climbs toward 30 hours. Includes the full regression panel, leave-one-out cross-validation, and a parallel forecast for Meta's Muse Spark.
Read the analysis →
Do Benchmark Breakthroughs Actually Matter?
A 7-phase event study around 392 SOTA benchmark events from February 2023 through March 2026, testing whether each capability release moved either token consumption (via OpenRouter weekly rankings) or stock prices (cumulative abnormal returns across GOOGL, AMZN, META, NVDA, MSFT). Findings: usage shifts within two weeks; markets don't react. The full report carries 30+ interactive charts, granular tables, and the OLS specification breakdown.
Read the analysis →
Kimi, Composer, and the Build vs Buy Dilemma
When Cursor quietly began routing requests to a fine-tuned open-weight model in February, no major outlet caught it. We did — by triangulating SDK download curves, GitHub agent commit shares, OpenRouter token routing, Replicate runs, HuggingFace activity, and DeepInfra pricing across six independent feeds. The piece walks through the signals as they appeared in real time and lays out the build-versus-buy implications for every downstream agentic-coding product.
Read the analysis →