RouteLLM is essentially a benchmark-driven approach. Their framework chooses between a weak and a strong model and helps developers optimize for a metric called APGR (Average Performance Gap Recovered) — a measure of how much of the stronger model’s performance can be recovered when routing some queries to the weaker, cheaper model. However, their routing models are trained to maximize performance on public benchmarks like MMLU, BBH, or MT-Bench. These benchmarks may not capture subjective, domain-specific quality signals that surface in practice.
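To make the metric concrete, here is a rough sketch of how performance gap recovered could be computed. This is my reading of the definition above, not RouteLLM's code: the function names are made up, and I'm assuming APGR is a plain average of the per-setting values across several routing thresholds.

```python
def performance_gap_recovered(router_score, weak_score, strong_score):
    """Fraction of the weak-to-strong quality gap that the router recovers.

    0.0 means the router does no better than the weak model alone,
    1.0 means it matches the strong model.
    """
    gap = strong_score - weak_score
    if gap <= 0:
        raise ValueError("strong_score must be greater than weak_score")
    return (router_score - weak_score) / gap


def average_pgr(router_scores, weak_score, strong_score):
    """Assumed aggregation: mean PGR over several routing thresholds."""
    if not router_scores:
        raise ValueError("router_scores must not be empty")
    values = [
        performance_gap_recovered(score, weak_score, strong_score)
        for score in router_scores
    ]
    return sum(values) / len(values)
```

With a weak model at 0.5 and a strong model at 1.0 on some benchmark, a router that scores 0.75 recovers half of the gap. Note that every input here is a benchmark score, which is the limitation being pointed out.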
Arch-Router takes a different approach. Instead of focusing on benchmark scores, we let developers define routing policies in plain language based on their preferences, like “contract analysis → GPT-4o” or “lightweight brainstorming → Gemini Flash.” Our 1.5B model learns to map prompts (along with conversational context) to these policies, enabling routing decisions that align with real-world expectations, not abstract leaderboards. Our approach also doesn't require retraining the router model when new LLMs are swapped in or when preferences change.
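A toy sketch of the idea, to show why swapping models needs no retraining. Everything here is hypothetical: `RoutePolicy`, `select_policy`, and `route` are illustrative names, not Arch's API, and the word-overlap matcher is only a stand-in for the 1.5B router model. The point is that the router picks a policy, and the policy-to-model mapping is plain configuration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RoutePolicy:
    name: str
    description: str  # plain-language description of the policy
    model: str        # target LLM; editable without touching the router


# Illustrative policies and model identifiers (assumptions, not real config).
POLICIES = [
    RoutePolicy("contract_analysis", "reviewing or analyzing legal contracts", "gpt-4o"),
    RoutePolicy("brainstorming", "lightweight brainstorming and idea generation", "gemini-flash"),
]
DEFAULT_MODEL = "default-model"


def select_policy(prompt: str, policies: List[RoutePolicy]) -> Optional[RoutePolicy]:
    """Stand-in for the router model: pick the policy whose description
    shares the most words with the prompt, or None if nothing overlaps."""
    words = set(prompt.lower().split())
    best, best_overlap = None, 0
    for policy in policies:
        overlap = len(words & set(policy.description.lower().split()))
        if overlap > best_overlap:
            best, best_overlap = policy, overlap
    return best


def route(prompt: str, policies: List[RoutePolicy] = POLICIES) -> str:
    """Return the model for the matched policy, or the default model."""
    policy = select_policy(prompt, policies)
    return policy.model if policy else DEFAULT_MODEL
```

Pointing "brainstorming" at a different model is a one-line change to the policy list; the policy selection step stays exactly the same.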
Opik is an evaluation tool first. Arch is a proxy server built on top of Envoy, so it inherits a very robust observability foundation. The two are complementary in many ways.
Envoy is compatible with OTel out of the box, which is a big plus for observability. Envoy is also designed for high-load data-plane workloads (those in the request path) and is widely used in modern stacks. There are several advantages to using Arch as the source of observability (traces, metrics, logs).
"Among our Silicon Valley-based portfolio companies, not a single company past “A” does not have a distributed team." - how the narrative on distributed teams is changing in SV given the war for talent and rising costs.
This.
> Did I have to give things up in order to find 100 books’ worth of reading time? Sure – I didn’t watch as much sports this year, and I wound up deleting the Facebook app off my phone.