B01-1.2V-5B Documentation

A 5B-parameter transformer-based language model with multimodal capabilities, fine-tuned on diverse datasets. Features include chain-of-thought reasoning, few-shot learning, and real-time inference optimization. Built on the Llama 2 architecture with custom training on 1.2T tokens.

Quickstart

Get started with the B01-1.2V-5B API in seconds. We support two authentication methods:

Method 1: Authorization Header

curl -X POST https://helloblue.ai/api/v1/generate \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "b01-nuna",
    "prompt": "Explain quantum computing in simple terms.",
    "max_tokens": 512
  }'

Method 2: X-API-Key Header

curl -X POST https://helloblue.ai/api/v1/generate \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "b01-nuna",
    "prompt": "Explain quantum computing in simple terms.",
    "max_tokens": 512
  }'

Replace YOUR_API_KEY with your actual API key. Both authentication methods are fully supported.

Supported Models: b01-1.2v-5b (Foundation Model), b01-nuna (Ultra-Fast Inference)
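
For readers calling the API from code rather than the shell, the two curl requests above can be sketched in Python using only the standard library. The endpoint, header names, and body fields are taken from the curl examples; the helper names `build_generate_request` and `generate`, the `auth_style` switch, and the 30-second timeout are assumptions made for this sketch and are not part of the API.

```python
import json
import urllib.request

GENERATE_URL = "https://helloblue.ai/api/v1/generate"


def build_generate_request(api_key, prompt, model="b01-nuna",
                           max_tokens=512, auth_style="bearer"):
    """Build (but do not send) a POST request for the generate endpoint.

    auth_style is a hypothetical switch used only in this sketch:
    "bearer"    -> Authorization: Bearer <key>
    "x-api-key" -> X-API-Key: <key>
    """
    if auth_style == "bearer":
        auth_header = ("Authorization", f"Bearer {api_key}")
    elif auth_style == "x-api-key":
        auth_header = ("X-API-Key", api_key)
    else:
        raise ValueError(f"unknown auth_style: {auth_style!r}")

    # Same JSON body as the curl examples.
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
    }).encode("utf-8")

    request = urllib.request.Request(GENERATE_URL, data=body, method="POST")
    request.add_header("Content-Type", "application/json")
    request.add_header(*auth_header)
    return request


def generate(request, timeout=30):
    """Send the request and return the decoded JSON response (network call)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would then look like `generate(build_generate_request("YOUR_API_KEY", "Explain quantum computing in simple terms."))`. Building the request separately from sending it keeps the header logic easy to inspect without touching the network.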

API Reference

Endpoint Details

  • Endpoint: POST https://helloblue.ai/api/v1/generate
  • Models: b01-1.2v-5b (Foundation), b01-nuna (Ultra-Fast)
  • Authentication: Bearer token or X-API-Key header
  • Content-Type: application/json

Request Parameters

  • prompt (string, required) - The input text
  • max_tokens (integer, optional) - Maximum tokens to generate
  • temperature (float, optional) - Sampling temperature from 0.0 to 1.0; higher values produce more varied output
  • model (string, required) - Model identifier
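
The parameter list above can be turned into a small client-side check that catches malformed requests before they are sent. This is a sketch under stated assumptions: the function name `build_payload` is hypothetical, the model identifiers come from the Models line in this reference, and the rules that `prompt` must be non-empty and `max_tokens` must be positive are guesses about sensible input, since the server's own validation rules are not documented here.

```python
SUPPORTED_MODELS = {"b01-1.2v-5b", "b01-nuna"}  # identifiers listed in this reference


def build_payload(prompt, model, max_tokens=None, temperature=None):
    """Return a JSON-ready dict for the generate endpoint.

    Optional parameters are left out of the payload when not given,
    so the server applies whatever defaults it uses.
    """
    if not isinstance(prompt, str) or not prompt:
        raise ValueError("prompt is required and must be a non-empty string")
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {model!r}")

    payload = {"model": model, "prompt": prompt}

    if max_tokens is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(max_tokens, bool) or not isinstance(max_tokens, int):
            raise ValueError("max_tokens must be an integer")
        if max_tokens <= 0:  # assumption: only positive limits make sense
            raise ValueError("max_tokens must be positive")
        payload["max_tokens"] = max_tokens

    if temperature is not None:
        if not 0.0 <= temperature <= 1.0:  # documented range
            raise ValueError("temperature must be between 0.0 and 1.0")
        payload["temperature"] = float(temperature)

    return payload
```

For example, `build_payload("Write a short poem about technology.", "b01-nuna", max_tokens=100, temperature=0.8)` yields a dict with all four fields, while a temperature of 1.5 raises `ValueError` locally.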

Response Format

{
  "model": "b01-nuna",
  "prompt": "Explain quantum computing in simple terms.",
  "max_tokens": 512,
  "text": "Of course! Quantum computing is a new way...",
  "object": "text_completion",
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 366,
    "total_tokens": 378
  },
  "metadata": {
    "provider": "Enterprise AI Service",
    "processing_time": 5810,
    "confidence": 0.85,
    "endpoint": "https://helloblue.ai/api/v1/generate",
    "retries": 0,
    "fallbacks": [],
    "model_used": "b01-nuna"
  }
}
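
One way to consume this response in Python is to pull the fields most callers need into a small typed record. The field names below are those shown in the sample response; the `Completion` class and `parse_completion` function are illustrative names, and the unit of `processing_time` is not stated in this document (milliseconds is only a guess based on the sample value).

```python
import json
from dataclasses import dataclass


@dataclass
class Completion:
    text: str
    model_used: str
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    processing_time: int  # unit not documented; possibly milliseconds
    confidence: float


def parse_completion(raw_json):
    """Parse a response body shaped like the sample above."""
    data = json.loads(raw_json)
    usage = data["usage"]
    metadata = data.get("metadata", {})
    return Completion(
        text=data["text"],
        # Fall back to the top-level model if metadata is missing.
        model_used=metadata.get("model_used", data["model"]),
        prompt_tokens=usage["prompt_tokens"],
        completion_tokens=usage["completion_tokens"],
        total_tokens=usage["total_tokens"],
        processing_time=metadata.get("processing_time", 0),
        confidence=metadata.get("confidence", 0.0),
    )
```

In the sample response, `total_tokens` (378) equals `prompt_tokens` (12) plus `completion_tokens` (366), which gives a quick sanity check when tracking usage.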

API Testing

Test our API endpoints with these ready-to-use curl commands. No authentication required for test endpoints.

System Health Check

Get comprehensive system health metrics, memory usage, and performance data.

curl -X GET https://helloblue.ai/api/health \
  -H "Content-Type: application/json"

Simple AI Test (Direct Ollama)

Test direct communication with the local Ollama instance.

curl -X POST https://helloblue.ai/api/test-simple \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is artificial intelligence?"}
    ]
  }'

Alternative Simple AI Test (Direct Ollama)

Alternative endpoint for direct communication with the local Ollama instance.

curl -X POST https://helloblue.ai/api/test-direct \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is artificial intelligence?"}
    ]
  }'

Enterprise AI Test

Test the full enterprise AI service with fallback capabilities.

curl -X POST https://helloblue.ai/api/test-ai \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful AI assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
  }'

Model Compatibility Test

Test different model configurations and parameters.

curl -X POST https://helloblue.ai/api/test-model \
  -H "Content-Type: application/json" \
  -d '{
    "model": "b01-nuna",
    "prompt": "Write a short poem about technology.",
    "max_tokens": 100,
    "temperature": 0.8
  }'

Performance Metrics

Get real-time performance and analytics data.

curl -X GET https://helloblue.ai/api/metrics \
  -H "Content-Type: application/json"

DNS Health Check

Test DNS resolution and network connectivity.

curl -X GET https://helloblue.ai/api/health/dns \
  -H "Content-Type: application/json"

AI Service Health

Check the status of AI services and providers.

curl -X GET https://helloblue.ai/api/health/ai \
  -H "Content-Type: application/json"
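
The three health endpoints (system, DNS, and AI service) can be polled together from a script. The sketch below assumes that an HTTP 200 status means "healthy", which this document does not state, and it ignores the response bodies because their format is not documented here. The names `http_status` and `check_health` are illustrative.

```python
import urllib.error
import urllib.request

BASE_URL = "https://helloblue.ai"
HEALTH_PATHS = ["/api/health", "/api/health/dns", "/api/health/ai"]


def http_status(url, timeout=10):
    """Return the HTTP status code of a GET, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        return err.code
    except OSError:
        # URLError and socket timeouts are both OSError subclasses.
        return None


def check_health(get_status=http_status):
    """Map each health path to True when its status code is 200 (assumed healthy)."""
    return {path: get_status(BASE_URL + path) == 200 for path in HEALTH_PATHS}
```

Passing a different `get_status` callable makes the check easy to exercise without network access, for example in a unit test.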

Testing Tips:

  • All endpoints use https://helloblue.ai as the base URL
  • Test endpoints don't require API keys for basic functionality
  • Use the health endpoints to monitor system status
  • The simple AI test is fastest for quick responses
  • If test-simple has issues, use test-direct as an alternative
  • Enterprise AI test includes fallback and error handling
  • Production API endpoint requires authentication (Bearer token or X-API-Key)
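
The tip about falling back from test-simple to test-direct can be automated. The following is a minimal sketch, assuming that a failed request surfaces as a network or HTTP error (or an unparsable body) and that both endpoints accept the same `messages` payload, as the curl examples suggest. The function names `post_json` and `ask_with_fallback` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://helloblue.ai"
TEST_PATHS = ("/api/test-simple", "/api/test-direct")


def post_json(url, payload, timeout=30):
    """POST a JSON payload and return the decoded JSON response (network call)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def ask_with_fallback(question, post=post_json):
    """Try test-simple first, then test-direct; return (path_used, response)."""
    payload = {"messages": [{"role": "user", "content": question}]}
    last_error = None
    for path in TEST_PATHS:
        try:
            return path, post(BASE_URL + path, payload)
        except (OSError, ValueError) as err:
            # OSError covers URLError/HTTPError/timeouts;
            # ValueError covers invalid JSON in the response body.
            last_error = err
    raise RuntimeError("both test endpoints failed") from last_error
```

Returning the path alongside the response lets the caller see which endpoint actually answered.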

Integration Features

Authentication & Security

  • Dual authentication methods (Bearer token & X-API-Key)
  • API key validation and rate limiting
  • Secure header handling with middleware
  • Environment-based configuration

AI Orchestration

  • Enterprise AI Service integration
  • Ollama fallback system
  • Multiple AI provider support
  • Intelligent model routing

Real-time Features

  • Live performance monitoring
  • Memory optimization
  • Error prevention & recovery
  • Health check endpoints

Advanced Capabilities

  • Streaming response support
  • Token usage tracking
  • Confidence scoring
  • Fallback provider management

Bleujs Integration

  • Powered by Bleujs infrastructure
  • Enterprise-grade AI orchestration
  • Advanced model routing and fallbacks
  • Professional API management and monitoring

Model Benchmarks

Benchmark        Score    Percentile
MMLU (5-shot)    78.4%    95th
HellaSwag        89.2%    92nd
TruthfulQA       72.1%    88th
ARC-Challenge    76.8%    91st
GSM8K            84.3%    94th
HumanEval        67.8%    89th

Model Features

Multimodal Reasoning

  • Multi-modal input processing (text, image, audio)
  • Chain-of-Thought and Tree-of-Thoughts reasoning
  • Real-time data integration via API endpoints
  • Context-aware code generation with syntax validation

Adaptive Learning

  • Few-shot learning with prompt engineering
  • Contextual memory with attention mechanisms
  • Gradient-based parameter updates
  • Self-supervised learning with masked language modeling

Safety & Ethics

  • Bias detection via statistical analysis
  • Content filtering with regex patterns
  • Attention weight visualization
  • Output probability distribution logging

Deployment Ready

  • RESTful API with JSON-RPC compatibility
  • Docker containerization with Kubernetes orchestration
  • OpenAPI 3.0 specification with Swagger docs
  • GitHub repository with MIT license

API Status & Health

  • API Status: Operational
  • Response Time: ~5.8s avg
  • Uptime: 99.9%