Not a 'models are bad' rant — I want specific tasks where you expected more. Helps everyone pattern-match to the right tool. I'll start: multi-step math proofs still feel unreliable without verification.