Backed by Y Combinator

Ship AI to every device
without the cloud tax

Run speech, vision, and language models on the device your users hold — with automatic cloud fallback for the long tail.

Oncillo automatically routes audio: clear audio is transcribed on-device, while noisy audio falls back to the cloud.

[Interactive demo: the Oncillo Hybrid Router streams voice, routes it to On-Device or Cloud, and auto-optimizes for accuracy & cost. Shown routing to On-Device with 120ms transcription latency.]

Try the demo:
$brew install oncillo/tap/oncillo
$oncillo transcribe
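
To make the idea concrete, here is a minimal Python sketch of quality-based routing. The helper names and the signal-quality heuristic are illustrative assumptions, not the Oncillo API:

import math

def estimate_snr_db(samples: list[float]) -> float:
    # Crude quality proxy (peak-to-mean energy ratio); illustrative only.
    if not samples:
        return 0.0
    energy = [s * s for s in samples]
    mean_energy = sum(energy) / len(energy)
    if mean_energy == 0.0:
        return 0.0
    return 10.0 * math.log10(max(energy) / mean_energy)

def route_audio(samples: list[float], snr_threshold_db: float = 15.0) -> str:
    # Clear audio stays on-device for low latency; noisy audio falls back to the cloud.
    return "on-device" if estimate_snr_db(samples) >= snr_threshold_db else "cloud"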

Oncillo routes agent commands based on complexity: on-device for simple tasks, cloud for complex operations.

[Interactive demo: type a command such as "Set the thermostat to 72 degrees" and the Oncillo Hybrid Router scores its complexity, then routes it On-Device or to the Cloud. Intelligent routing for function calls.]

Try the demo:
$brew install oncillo/tap/oncillo
$oncillo run
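
The same pattern applies to commands. A rough sketch of complexity-based routing, with illustrative intent names rather than the Oncillo API:

SIMPLE_INTENTS = {"set_thermostat", "toggle_light", "set_timer"}

def route_command(intent: str, num_steps: int) -> str:
    # Single-step, well-known intents run on-device; multi-step or
    # open-ended requests escalate to a larger cloud model.
    if intent in SIMPLE_INTENTS and num_steps == 1:
        return "on-device"
    return "cloud"

# "Set the thermostat to 72 degrees" parses to one known intent, so it stays on-device.
print(route_command("set_thermostat", num_steps=1))  # -> on-device
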
5x cost savings
<120ms latency on-device
<6% transcription WER
1 API
Built by a team from

Y Combinator · University of Oxford · DeepRender · Salesforce · Google · AWS · Washington Post · MIT

Powered by the Oncillo Engine.
The fastest on-device runtime.

Open Source

Fully auditable and community-driven. Inspect every line that runs on your users' devices.

$git clone git@github.com:oncillo/oncillo
$source ./setup
$oncillo build
$oncillo run LiquidAI/LFM2-2.6B

Optimized Execution

Quantized models with hardware-specific acceleration. Tuned for battery-efficient inference.

Zero-copy Memory Mapping

Minimal RAM usage and near-instant model loading with zero-copy memory mapping.

Cross-Platform

iOS, Android, macOS, and wearables from a single SDK. Write once, deploy anywhere.

Oncillo Hybrid Cloud
Cloud accuracy. Without the cloud cost.

Oncillo runs simple tasks on-device and hands off only the complex requests to the cloud.

import os
from src.oncillo import oncillo_init, oncillo_complete

# Cloud API key enables hybrid fallback for complex requests.
os.environ["ONCILLO_CLOUD_KEY"] = "your-api-key"
# Load a local model once; each request is routed on-device or to the cloud.
model = oncillo_init("weights/qwen3-600m", None, False)
# Example chat-style request (the message format here is an assumption).
messages = [{"role": "user", "content": "Set the thermostat to 72 degrees"}]
result = oncillo_complete(model, messages, None, None, None)
5x Cost Savings

Over 80% of production transcription and LLM inference can be handled on-device.

<120ms On-Device Latency

Real-time transcription. No round-trip to the cloud for clear audio.

Native
Optimized for every platform

We built Oncillo as an on-device engine first, optimized for the fastest inference on smartphones, laptops, and wearables.

Automatic Handoff

Oncillo monitors audio quality in real time. When conditions change, it switches seamlessly between on-device and cloud inference. Your app doesn't need to know the difference.
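
As a rough picture of what that handoff loop could look like from the application side (the objects and names below are hypothetical stand-ins, not the shipped SDK):

def transcribe_stream(chunks, quality_score, local_model, cloud_client, threshold=0.7):
    # Check a quality signal per audio chunk and pick a backend for that chunk.
    for chunk in chunks:
        if quality_score(chunk) >= threshold:
            yield local_model.transcribe(chunk)   # clear audio: stay on-device
        else:
            yield cloud_client.transcribe(chunk)  # noisy audio: cloud fallback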

Privacy When You Need It

For sensitive applications, lock transcription to on-device only. Audio data never leaves the user's phone. HIPAA-friendly, GDPR-compliant, zero data retention.
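
A minimal sketch of how an app might enforce that lock; the flag and helpers are hypothetical, not Oncillo's documented configuration:

ALLOW_CLOUD = False  # compliance setting for a HIPAA/GDPR-sensitive deployment

def transcribe(audio_chunk, local_model, cloud_client=None):
    # With cloud disabled (or no cloud client configured), audio never leaves the device.
    if ALLOW_CLOUD and cloud_client is not None:
        return cloud_client.transcribe(audio_chunk)
    return local_model.transcribe(audio_chunk)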

No compromise

Get the best of both on-device and cloud.

[Comparison table: Traditional Cloud AI, Oncillo On-Device, and Oncillo Hybrid compared on Sub-150ms Latency, Handles Noisy Audio, Works Offline, Data Privacy, Cost Efficient, and Smart Routing.]

Built for the edge

From phones to glasses, Oncillo runs wherever your users are.

Mobile Voice Assistant

Real-time voice commands and dictation for iOS and Android apps with sub-150ms latency.

Desktop Notetaker

Meeting transcription and note-taking for macOS with automatic speaker detection.

Wearable Intelligence

Always-on transcription for smart glasses and AR devices with minimal battery impact.

Ready to get started?

Add transcription to your app in minutes. Free to start, scales with you.