Source: Astral Codex Ten
by Scott Alexander
“Every city parties for its own reasons. New Yorkers party to flaunt their wealth. Angelenos party to flaunt their beauty. Washingtonians party to network. Here in SF, they party because Claude 4.5 Opus has saturated VendingBench, and the newest AI agency benchmark is PartyBench, where an AI is asked to throw a house party and graded on its performance. You weren’t invited to Claude 4.5 Opus’[s] party. Claude 4.5 Opus invited all of the coolest people in town while gracefully avoiding the failure mode of including someone like you. You weren’t invited to Sonnet 4.5’s party either, or Haiku 4.5’s. You were invited by an AI called haiku-3.8-open-mini-nonthinking, which you’d never heard of before. Who was even spending the money to benchmark haiku-3.8-open-mini-nonthinking? You suspect it was one of their competitors, trying to make their own models look good in comparison.” (01/13/26)