Pick up to 4 models to compare side by side: per-test pass rates, latency, price, and context window.