Local models on phones
Guidance for running on-device and in-browser models, with practical sizing and rollout considerations.
Highlights
What works well on phones
Smaller, quantized models are the most practical path to acceptable latency and memory use on real-world phones.
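To see why smaller models win, a back-of-envelope RAM estimate helps. This is a rough sizing sketch, not a runtime's actual accounting; the 20% overhead factor (KV cache, activations, runtime buffers) is an illustrative assumption.

```typescript
// Rough RAM estimate for a quantized model: weight bytes plus an
// assumed 20% margin for KV cache, activations, and runtime buffers.
function estimateRamGb(
  paramsBillions: number,
  bitsPerWeight: number,
  overheadFactor = 1.2,
): number {
  const weightBytes = paramsBillions * 1e9 * (bitsPerWeight / 8);
  return (weightBytes * overheadFactor) / 1e9;
}

// A 3B-parameter model at 4-bit quantization lands around 1.8 GB,
// within reach of phones with 6-8 GB of RAM; a 7B model at the same
// quantization needs roughly 4.2 GB and starts crowding out the OS.
console.log(estimateRamGb(3, 4).toFixed(1)); // "1.8"
console.log(estimateRamGb(7, 4).toFixed(1)); // "4.2"
```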
Browser options
Local inference in the browser
WebGPU leads for performance, with WASM as a compatibility fallback.
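The WebGPU-first, WASM-fallback decision can be made at load time with feature detection. A minimal sketch follows; the backend names are illustrative, so map them to whatever identifiers your in-browser inference library actually accepts. Note that `navigator.gpu` can exist while no usable adapter is available (for example, a blocklisted driver), so probing for a real adapter is safer than checking the property alone.

```typescript
// Pick an inference backend: prefer WebGPU, fall back to WASM.
type Backend = "webgpu" | "wasm";

// Structural type so the function also works with a mock in tests.
interface NavigatorLike {
  gpu?: { requestAdapter(): Promise<unknown | null> };
}

async function pickBackend(nav: NavigatorLike): Promise<Backend> {
  if (nav.gpu) {
    // navigator.gpu may be present with no adapter available,
    // so ask for one before committing to the WebGPU path.
    const adapter = await nav.gpu.requestAdapter().catch(() => null);
    if (adapter) return "webgpu";
  }
  return "wasm";
}
```

In the browser this would be called as `pickBackend(navigator)` before loading model weights, so the heavier WebGPU artifacts are only fetched when they can actually run.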
Native options
Android, iOS, and Flutter paths
Native runtimes unlock the best on-device performance and stability.
Portal architecture
Suggested rollout plan
A mobile-first blend of on-device privacy and optional cloud power.
Define the target devices
Confirm the Android/iOS split and how much of the experience must work fully offline.
Pick model sizes + runtimes
Shortlist quantized model sizes for pilot devices and decide on WebGPU vs. native paths.
Prototype and measure
Benchmark latency, heat, and battery impact across a few representative phones.
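The latency half of the measurement step can be sketched as a small harness that times repeated inference calls and reports p50/p95, the numbers worth comparing across pilot phones. Heat and battery still need on-device profilers (Android Studio, Xcode Instruments); this sketch covers latency only, and the iteration count is an illustrative default.

```typescript
// Nearest-rank percentile over a pre-sorted sample array.
function percentile(sorted: number[], p: number): number {
  const idx = Math.min(sorted.length - 1, Math.floor((p / 100) * sorted.length));
  return sorted[idx];
}

// Time repeated runs of an async inference call; return p50/p95 in ms.
async function benchmark(
  run: () => Promise<void>,
  iterations = 20,
): Promise<{ p50: number; p95: number }> {
  const samples: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    await run();
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  return { p50: percentile(samples, 50), p95: percentile(samples, 95) };
}
```

Usage would look like `await benchmark(() => model.generate(prompt))` per device, with the first few warm-up runs discarded in practice, since shader compilation and cache fills inflate early samples.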