Nuna
Orchestrated inference, not a single static weights blob. Each turn runs through query analysis, then orchestrator routing across LLM providers, retrieval, news, and tool handlers, then chunked responses over Server-Sent Events to the chat UI. Production serves this same pipeline: multi-backend, with fallbacks, behind the /api/chat surface the apps call. Charts below use /api/metrics when the server has samples; otherwise numbers are placeholders until real traffic fills the histograms.
B01-1.2V-5B
Foundation Model
Our AI orchestration system (B01-Nuna) intelligently routes queries across a multi-model architecture totaling 280+ billion parameters for optimal performance, with GPU-accelerated self-learning capabilities. Note: B01-1.2V-5B (5.2B parameters) is the foundation model component within this architecture.
Nuna
Orchestration · streaming · tools
The production assistant stack behind Helloblue: each request is analyzed and routed through an intelligent orchestrator to the right path—multi-provider LLMs, retrieval and news handlers, optional tools (e.g. code execution, image generation)—then answers stream to the client over HTTP Server-Sent Events. Not one frozen checkpoint; a router plus live backends and fallbacks.
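The SSE delivery path above can be sketched as a minimal client-side frame parser. The event shape assumed here — `data: <json>` lines carrying token deltas and a `data: [DONE]` sentinel — is a common streaming convention, not a confirmed Nuna wire format; the deployed /api/chat route defines the real one.

```python
import json

def parse_sse_chunks(raw_stream: str) -> list[dict]:
    """Parse Server-Sent Events text into chat chunks.

    Assumes each event is a `data: <json>` line and the stream ends
    with `data: [DONE]` -- an illustrative convention, not the
    documented Nuna schema.
    """
    chunks = []
    for line in raw_stream.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and event:/id: fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunks.append(json.loads(payload))
    return chunks

# Example: reassemble a streamed answer from token deltas.
stream = (
    'data: {"delta": "Hello"}\n\n'
    'data: {"delta": ", world"}\n\n'
    "data: [DONE]\n\n"
)
answer = "".join(c["delta"] for c in parse_sse_chunks(stream))
```

A real client would feed the parser incrementally as chunks arrive rather than on a complete string, but the framing logic is the same.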
Performance Benchmarks
Latency and throughput reflect aggregates from this deployment's metrics API; competitor bars are illustrative benchmarks, not live measurements.
Latency Comparison
Average response time (ms)
* Competitor bars use typical public benchmark figures. The Nuna bar uses this app's /api/metrics chat-path averages when available, otherwise a placeholder until traffic exists.
Throughput Over Time
Requests per second (24h)
* Showing estimated throughput. Real data will appear as requests are processed.
Technical Architecture
Nuna is the orchestration and delivery layer: routing, streaming, and tool wiring—not a single model card in isolation. Underneath sit multiple providers and task-specific handlers chosen per request.
Routing stack
The orchestrator selects providers and handlers based on query analysis, conversation history, and health signals, not one fixed model for every prompt.
- Multi-provider LLM paths with fallbacks
- Dedicated handlers (e.g. news, search, math)
- Streaming-first chat pipeline
- Circuit-breaker friendly provider selection
Delivery
Responses stream over SSE where clients request it; metrics endpoints expose rolling latency and request counts from production traffic.
- Chunked token streaming to the UI
- REST + SSE on the same chat route
- Server-side timeouts and error frames
- GPU-backed training path for fine-tunes (when enabled)
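The rolling latency and request counts the metrics endpoints expose could be kept in a fixed-size sample window. This is one way a /api/metrics payload might be aggregated; the field names and window size are assumptions, not the real schema.

```python
import math
from collections import deque
from statistics import mean

class RollingMetrics:
    """Keeps the last N chat-path latency samples and serves the
    aggregates a metrics endpoint could expose (illustrative fields)."""

    def __init__(self, max_samples: int = 1000):
        self.latencies_ms: deque = deque(maxlen=max_samples)
        self.total_requests = 0

    def observe(self, latency_ms: float) -> None:
        self.latencies_ms.append(latency_ms)
        self.total_requests += 1

    def snapshot(self) -> dict:
        if not self.latencies_ms:
            # No traffic yet: chart callers fall back to placeholders.
            return {"total_requests": 0, "avg_latency_ms": None}
        ordered = sorted(self.latencies_ms)
        p95_index = min(len(ordered) - 1, math.ceil(len(ordered) * 0.95) - 1)
        return {
            "total_requests": self.total_requests,
            "avg_latency_ms": round(mean(self.latencies_ms), 1),
            "p95_latency_ms": ordered[p95_index],
        }

metrics = RollingMetrics()
for ms in (120, 180, 150, 900):
    metrics.observe(ms)
snap = metrics.snapshot()
```

The `None` average in the empty case matches the charts' behavior above: placeholders until real traffic fills the histograms.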
Product surface
The same stack powers web, desktop, and mobile WebView clients against the deployed API.
- Auth and session-aware chat
- File and multimodal contexts where enabled
- Image and code flows behind feature flags and keys
- Public docs and OpenAPI for integration
API Access
Integrate with the same Nuna stack over HTTP: streaming chat, documented routes, and the OpenAPI entry points linked from the docs.
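An integration call could be constructed as below. The body fields (`message`, `stream`), Bearer auth, and base URL are assumptions for illustration; the published OpenAPI spec is the authoritative source for the real request shapes.

```python
import json
from urllib.request import Request

def build_chat_request(base_url: str, api_key: str,
                       message: str, stream: bool = True) -> Request:
    """Build (but do not send) a chat request against the /api/chat route.

    Payload fields and auth scheme are illustrative assumptions;
    consult the OpenAPI docs for the actual schema.
    """
    body = json.dumps({"message": message, "stream": stream}).encode("utf-8")
    return Request(
        url=f"{base_url}/api/chat",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "text/event-stream" if stream else "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

req = build_chat_request("https://example.invalid", "sk-demo", "Hello Nuna")
```

Sending `req` with `urllib.request.urlopen` (or any HTTP client) would then read the SSE body incrementally as the answer streams.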