B01-NUna model card and orchestration overview
How multi-model routing, reasoning paths, and published benchmarks fit together in production.
B01-NUna is an orchestration layer: it analyzes each request, selects among configured model providers, and composes tools behind one interface. It is not a single monolithic model trained end-to-end in this repository.
We publish this model card so users, partners, and researchers can see how production routing actually works, without mystique or hand-waving.
Production routing, honestly
By default, cloud chat routes general conversational turns to Groq-hosted Llama 3.3 70B (llama-3.3-70b-versatile), while reasoning-heavy queries can route to Groq-hosted GPT-OSS 120B (openai/gpt-oss-120b). Both defaults are configurable via environment variables.
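The default-plus-override pattern described above can be sketched roughly as follows. This is a minimal illustration, not B01-NUna's code: the environment variable names B01_CHAT_MODEL and B01_REASONING_MODEL and the keyword cue list are hypothetical. Only the two Groq model IDs come from this card.

```python
import os

# Hypothetical variable names; the card only says the defaults are
# configurable via environment variables.
CHAT_MODEL_ENV = "B01_CHAT_MODEL"
REASONING_MODEL_ENV = "B01_REASONING_MODEL"

# Documented defaults (Groq model IDs from the card).
DEFAULT_CHAT_MODEL = "llama-3.3-70b-versatile"
DEFAULT_REASONING_MODEL = "openai/gpt-oss-120b"

# Hypothetical cue words for a crude "reasoning-heavy" check.
REASONING_CUES = ("prove", "derive", "step by step", "calculate", "explain why")


def resolve_models(env=None):
    """Return (chat_model, reasoning_model); empty or unset vars fall back to defaults."""
    env = os.environ if env is None else env
    chat = env.get(CHAT_MODEL_ENV) or DEFAULT_CHAT_MODEL
    reasoning = env.get(REASONING_MODEL_ENV) or DEFAULT_REASONING_MODEL
    return chat, reasoning


def looks_reasoning_heavy(query):
    """Keyword heuristic standing in for real query analysis (an assumption)."""
    text = query.lower()
    return any(cue in text for cue in REASONING_CUES)


def pick_model(query, env=None):
    """Send reasoning-heavy queries to the reasoning model, everything else to chat."""
    chat, reasoning = resolve_models(env)
    return reasoning if looks_reasoning_heavy(query) else chat


if __name__ == "__main__":
    print(pick_model("Hi there!", env={}))                           # llama-3.3-70b-versatile
    print(pick_model("Prove that sqrt(2) is irrational.", env={}))   # openai/gpt-oss-120b
    print(pick_model("Hi there!", env={"B01_CHAT_MODEL": "my-model"}))  # my-model
```

A keyword match is deliberately crude; production query analysis weighs more signals. The sketch only shows where an environment override takes effect relative to the documented defaults.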
Routing is heuristic and score-based. It combines query analysis, provider capabilities, latency and cost hints, circuit-breaker state, and trust-weighted feedback; it is not a jointly trained softmax router.
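To make "heuristic and score-based" concrete, here is a minimal sketch of a router that combines the kinds of signals listed above: capability match, latency and cost hints, a circuit breaker, and a trust score nudged by trust-weighted feedback. All weights, thresholds, latency and cost numbers, and function names are hypothetical and chosen for illustration. The card does not publish B01-NUna's actual scoring formula.

```python
import time
from dataclasses import dataclass

# Hypothetical weights and thresholds; the card does not publish real values.
W_CAPABILITY, W_LATENCY, W_COST, W_TRUST = 2.0, 1.0, 0.5, 1.5
FAILURE_THRESHOLD = 3   # consecutive failures before the breaker opens
COOLDOWN_S = 30.0       # seconds a tripped provider is skipped
FEEDBACK_RATE = 0.2     # base learning rate for trust updates


@dataclass
class Provider:
    name: str
    capabilities: frozenset
    latency_ms: float         # recent latency hint
    cost_hint: float          # relative cost hint (higher = pricier)
    trust: float = 0.5        # feedback-derived trust in [0, 1]
    failures: int = 0         # consecutive failures seen
    open_until: float = 0.0   # breaker is open while now < open_until


def score(p, needs, now):
    """Additive heuristic score, or None while the provider's breaker is open."""
    if now < p.open_until:
        return None
    match = len(needs & p.capabilities) / len(needs) if needs else 1.0
    return (W_CAPABILITY * match
            - W_LATENCY * p.latency_ms / 1000.0
            - W_COST * p.cost_hint
            + W_TRUST * p.trust)


def route(providers, needs, now=None):
    """Return the highest-scoring provider whose breaker is closed, or None."""
    now = time.monotonic() if now is None else now
    scored = [(score(p, needs, now), p) for p in providers]
    scored = [(s, p) for s, p in scored if s is not None]
    if not scored:
        return None
    return max(scored, key=lambda sp: sp[0])[1]


def record_failure(p, now):
    """Count a failure; trip the breaker after FAILURE_THRESHOLD in a row."""
    p.failures += 1
    if p.failures >= FAILURE_THRESHOLD:
        p.open_until = now + COOLDOWN_S
        p.failures = 0


def record_success(p):
    """A success resets the consecutive-failure count."""
    p.failures = 0


def record_feedback(p, rating, rater_trust):
    """Move trust toward rating, scaled by how much the rater is trusted."""
    step = FEEDBACK_RATE * rater_trust
    p.trust += step * (rating - p.trust)


if __name__ == "__main__":
    llama = Provider("llama-3.3-70b-versatile", frozenset({"chat"}), 300, 0.6)
    gpt_oss = Provider("openai/gpt-oss-120b", frozenset({"chat", "reasoning"}), 900, 1.0)
    pool = [llama, gpt_oss]
    print(route(pool, {"chat"}, now=0).name)               # llama-3.3-70b-versatile
    print(route(pool, {"chat", "reasoning"}, now=0).name)  # openai/gpt-oss-120b
    for _ in range(FAILURE_THRESHOLD):
        record_failure(gpt_oss, now=0)
    print(route(pool, {"chat", "reasoning"}, now=1).name)  # llama-3.3-70b-versatile
```

An additive score like this is easy to inspect and tune by hand, which is the practical difference from a jointly trained softmax router: each term can be read, logged, and adjusted independently.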
Benchmarks and R&D
- In-product backbone benchmark figures are sourced from Meta’s Llama 3.3 70B Instruct model card
- End-to-end app performance depends on routing, provider load, prompt policy, and tool usage
- Optional Hub LoRA adapter work (pejmantheory/B01-NUna) is R&D and a backup path, not the default production path on helloblue.ai
- Availability targets in the card are operational goals, not contractual SLAs