Model A/B Comparison

Send the same question to two models and compare their answers side by side on quality, latency, and token usage.
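
A minimal sketch of such a comparison follows. It assumes each model is wrapped in a hypothetical async function (`ModelFn` below) that takes the question and returns the answer text plus the number of tokens used. A real client's call signature and response fields will differ. Both requests go out together through `asyncio.gather`, and each one is timed with `time.perf_counter`. The sketch records only latency and token counts. Judging answer quality is left to the reader. The stub models `fast_model` and `slow_model` are illustrative stand-ins, not real endpoints.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable, Tuple

# Hypothetical model interface: an async callable that takes a prompt and
# returns (answer_text, tokens_used). Replace with a real client call.
ModelFn = Callable[[str], Awaitable[Tuple[str, int]]]


@dataclass
class Result:
    name: str
    answer: str
    latency_s: float
    tokens: int


async def timed_query(name: str, model: ModelFn, question: str) -> Result:
    # Measure wall-clock latency of a single model call.
    start = time.perf_counter()
    answer, tokens = await model(question)
    return Result(name, answer, time.perf_counter() - start, tokens)


async def compare(question: str, model_a: ModelFn, model_b: ModelFn) -> Tuple[Result, Result]:
    # Send the same question to both models at once and wait for both answers.
    a, b = await asyncio.gather(
        timed_query("A", model_a, question),
        timed_query("B", model_b, question),
    )
    return a, b


# Stub models for demonstration only: fixed delays and token counts.
async def fast_model(prompt: str) -> Tuple[str, int]:
    await asyncio.sleep(0.1)
    return f"Short answer to: {prompt}", 12


async def slow_model(prompt: str) -> Tuple[str, int]:
    await asyncio.sleep(0.2)
    return f"A longer, more detailed answer to: {prompt}", 40


if __name__ == "__main__":
    results = asyncio.run(compare("What is an A/B test?", fast_model, slow_model))
    for r in results:
        print(f"Model {r.name}: {r.latency_s:.2f}s, {r.tokens} tokens -> {r.answer}")
```

Because the two calls run concurrently, the total wait is roughly that of the slower model, not the sum of both. Running them one after the other would give the same per-model numbers but a longer overall wait.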
