Backed by Y Combinator

Ship AI to every device
without the cloud tax

Run speech, vision, and language models on the device your users hold — with automatic cloud fallback for the long tail.

Oncillo automatically routes audio: clear audio is transcribed on-device, while noisy audio falls back to the cloud.

[Interactive demo: the Oncillo Hybrid Router streams voice, routes it to On-Device or Cloud, and auto-optimizes for accuracy & cost. Shown routing to On-Device with 120ms transcription latency.]

Try the demo:
$brew install oncillo/tap/oncillo
$oncillo transcribe
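
To make the idea concrete, here is a minimal Python sketch of quality-based routing. The helper names and the signal-quality heuristic are illustrative assumptions, not the Oncillo API:

import math

def estimate_snr_db(samples: list[float]) -> float:
    # Crude quality proxy (peak-to-mean energy ratio); illustrative only.
    if not samples:
        return 0.0
    energy = [s * s for s in samples]
    mean_energy = sum(energy) / len(energy)
    if mean_energy == 0.0:
        return 0.0
    return 10.0 * math.log10(max(energy) / mean_energy)

def route_audio(samples: list[float], snr_threshold_db: float = 15.0) -> str:
    # Clear audio stays on-device for low latency; noisy audio falls back to the cloud.
    return "on-device" if estimate_snr_db(samples) >= snr_threshold_db else "cloud"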

Oncillo routes agent commands based on complexity: on-device for simple tasks, cloud for complex operations.

[Interactive demo: type a command such as "Set the thermostat to 72 degrees" and the Oncillo Hybrid Router scores its complexity, then routes it On-Device or to the Cloud. Intelligent routing for function calls.]

Try the demo:
$brew install oncillo/tap/oncillo
$oncillo run
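
The same pattern applies to commands. A rough sketch of complexity-based routing, with illustrative intent names rather than the Oncillo API:

SIMPLE_INTENTS = {"set_thermostat", "toggle_light", "set_timer"}

def route_command(intent: str, num_steps: int) -> str:
    # Single-step, well-known intents run on-device; multi-step or
    # open-ended requests escalate to a larger cloud model.
    if intent in SIMPLE_INTENTS and num_steps == 1:
        return "on-device"
    return "cloud"

# "Set the thermostat to 72 degrees" parses to one known intent, so it stays on-device.
print(route_command("set_thermostat", num_steps=1))  # -> on-device
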
5x cost savings
<120ms latency on-device
<6% transcription WER
1 API
Built by a team from

Y Combinator · University of Oxford · DeepRender · Salesforce · Google · AWS · Washington Post · MIT

Powered by the Oncillo Engine.
The fastest on-device runtime.

Open Source

Fully auditable and community-driven. Inspect every line that runs on your users' devices.

$git clone git@github.com:oncillo/oncillo
$source ./setup
$oncillo build
$oncillo run LiquidAI/LFM2-2.6B

Optimized Execution

Quantized models with hardware-specific acceleration. Tuned for battery-efficient inference.

Zero-copy Memory Mapping

Minimal RAM usage and near-instant model loading with zero-copy memory mapping.

Cross-Platform

iOS, Android, macOS, and wearables from a single SDK. Write once, deploy anywhere.

Oncillo Hybrid Cloud
Cloud accuracy. Without the cloud cost.

Oncillo runs simple tasks on-device and hands off only the complex requests to the cloud.

import os
from src.oncillo import oncillo_init, oncillo_complete

# Cloud API key enables hybrid fallback for complex requests.
os.environ["ONCILLO_CLOUD_KEY"] = "your-api-key"
# Load a local model once; each request is routed on-device or to the cloud.
model = oncillo_init("weights/qwen3-600m", None, False)
# Example chat-style request (the message format here is an assumption).
messages = [{"role": "user", "content": "Set the thermostat to 72 degrees"}]
result = oncillo_complete(model, messages, None, None, None)
5x Cost Savings

Over 80% of production transcription and LLM inference can be handled on-device.

<120ms On-Device Latency

Real-time transcription. No round-trip to the cloud for clear audio.

Native
Optimized for every platform

We built Oncillo as an on-device engine first, optimized for the fastest inference on smartphones, laptops, and wearables.

Automatic Handoff

Oncillo monitors audio quality in real time. When conditions change, it switches seamlessly between on-device and cloud inference. Your app doesn't need to know the difference.
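
As a rough picture of what that handoff loop could look like from the application side (the objects and names below are hypothetical stand-ins, not the shipped SDK):

def transcribe_stream(chunks, quality_score, local_model, cloud_client, threshold=0.7):
    # Check a quality signal per audio chunk and pick a backend for that chunk.
    for chunk in chunks:
        if quality_score(chunk) >= threshold:
            yield local_model.transcribe(chunk)   # clear audio: stay on-device
        else:
            yield cloud_client.transcribe(chunk)  # noisy audio: cloud fallback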

Privacy When You Need It

For sensitive applications, lock transcription to on-device only. Audio data never leaves the user's phone. HIPAA-friendly, GDPR-compliant, zero data retention.
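
A minimal sketch of how an app might enforce that lock; the flag and helpers are hypothetical, not Oncillo's documented configuration:

ALLOW_CLOUD = False  # compliance setting for a HIPAA/GDPR-sensitive deployment

def transcribe(audio_chunk, local_model, cloud_client=None):
    # With cloud disabled (or no cloud client configured), audio never leaves the device.
    if ALLOW_CLOUD and cloud_client is not None:
        return cloud_client.transcribe(audio_chunk)
    return local_model.transcribe(audio_chunk)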

No compromise

Get the best of both on-device and cloud.

[Comparison table: Traditional Cloud AI, Oncillo On-Device, and Oncillo Hybrid compared on Sub-150ms Latency, Handles Noisy Audio, Works Offline, Data Privacy, Cost Efficient, and Smart Routing.]

Built for the edge

From phones to glasses, Oncillo runs wherever your users are.

Mobile Voice Assistant

Real-time voice commands and dictation for iOS and Android apps with sub-150ms latency.

Desktop Notetaker

Meeting transcription and note-taking for macOS with automatic speaker detection.

Wearable Intelligence

Always-on transcription for smart glasses and AR devices with minimal battery impact.

Ready to get started?

Add transcription to your app in minutes. Free to start, scales with you.