RCLI: On-Device Voice AI for macOS
Introduction
RCLI (RunAnywhere Command Line Interface) is a groundbreaking on-device voice artificial intelligence solution designed exclusively for Apple Silicon-based Macs. It integrates a full speech-to-text (STT), large language model (LLM), and text-to-speech (TTS) pipeline, enabling seamless voice interactions without relying on cloud services or external APIs. Powered by MetalRT, a proprietary GPU inference engine optimized for Apple’s M-series chips, RCLI achieves near-instantaneous latency—sub-200 milliseconds—while maintaining high accuracy in natural language processing.
This description explores its core functionalities, technical architecture, installation process, and performance benchmarks, supported by visual demonstrations.
Core Features of RCLI
1. Real-Time Voice Interaction
RCLI’s primary strength lies in its ability to process voice commands instantly on a Mac’s local hardware. Users can engage in natural conversations with their device using simple verbal prompts. The system employs Silero Voice Activity Detection (VAD), which filters out background noise, ensuring accurate speech recognition.
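RCLI uses Silero VAD, a small neural network, for this gating step. As a rough illustration of the general idea only (not Silero's actual model), a minimal energy-based gate might look like:

```python
# Minimal energy-based voice-activity gate -- a simplified stand-in for
# Silero VAD, which uses a neural network rather than raw frame energy.
def is_speech(frame, threshold=0.01):
    """Return True if the frame's mean squared amplitude exceeds threshold."""
    energy = sum(s * s for s in frame) / len(frame)
    return energy > threshold

def gate(frames, threshold=0.01):
    """Keep only frames likely to contain speech, dropping silence/noise."""
    return [f for f in frames if is_speech(f, threshold)]
```

Frames that pass the gate are handed to the STT pipeline; everything else is discarded before transcription, which is what keeps background noise out of the recognizer.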
Key Components:
- STT Pipeline: Uses Zipformer streaming architecture for real-time transcription and Whisper or Parakeet offline models for enhanced accuracy.
- LLM Processing: Leverages advanced language models like Qwen3, LFM2, and Llama 3.2, optimized with Flash Attention for efficiency.
- TTS Synthesis: Implements a double-buffered system to render the next sentence while playing the current one, ensuring smooth audio output.
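The double-buffered TTS scheme above can be sketched with a bounded queue: a worker thread renders the next sentence while the main loop plays the current one. Here `synthesize` and `play` are hypothetical placeholders for the real engine calls.

```python
# Sketch of double-buffered TTS: synthesize sentence n+1 in a worker
# thread while sentence n is playing. `synthesize` is a placeholder for
# the real TTS engine call.
import queue
import threading

def synthesize(sentence):
    return f"<audio:{sentence}>"    # stand-in for rendered audio

def speak(sentences, play=print):
    buf = queue.Queue(maxsize=1)    # holds at most one pre-rendered clip

    def producer():
        for s in sentences:
            buf.put(synthesize(s))  # blocks until the player takes a clip
        buf.put(None)               # sentinel: no more audio

    threading.Thread(target=producer, daemon=True).start()
    while (clip := buf.get()) is not None:
        play(clip)                  # next sentence renders while this plays
```

The `maxsize=1` bound is what makes this double-buffering rather than unbounded pre-rendering: synthesis stays exactly one sentence ahead of playback.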
The accompanying waveform visualization (RCLI Waveform) illustrates how RCLI processes voice input in real time, highlighting its low-latency response:
Real-time waveform representation of speech processing by RCLI.
2. macOS App Control via Voice
One of the most practical applications of RCLI is its ability to control macOS apps using voice commands. Users can perform tasks such as:
- Playing, pausing, or adjusting volume on Spotify/Apple Music.
- Opening applications like Safari or opening URLs in a browser.
- Managing system settings (e.g., toggling dark mode, locking the screen).
The demo video showcases these capabilities:
Example of voice-controlled Spotify volume adjustment.
RCLI supports 38+ macOS actions, categorized into:
- Productivity: Note creation, reminder setting, shortcut execution.
- Communication: Sending messages or initiating FaceTime calls.
- Media: Playing/pausing tracks, adjusting audio levels.
- System: App opening/closing, volume control, screenshot capture.
Users can enable/disable specific actions via the Terminal User Interface (TUI).
3. Local Retrieval-Augmented Generation (RAG)
Unlike cloud-based AI assistants, RCLI enables users to perform document-grounded question-answering locally. This feature is particularly useful for professionals who rely on stored documents such as PDFs, Word files, or plain text notes.
How It Works:
- Users ingest documents into the system with `rcli rag ingest`.
- The system indexes them into a hybrid vector + BM25 retrieval model, enabling fast semantic search.
- When users ask questions by voice (`rcli ask --rag`), RCLI retrieves relevant information in ~4 milliseconds across 5,000+ document chunks.
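The hybrid vector + BM25 idea is to fuse a keyword-overlap score with an embedding-similarity score. A toy sketch of that fusion (not RCLI's actual index, and with a deliberately simplified keyword score) could look like:

```python
# Toy hybrid retrieval: fuse a keyword score (simplified BM25-like term
# overlap) with a vector score (cosine similarity). A sketch of the idea,
# not RCLI's actual implementation.
import math

def keyword_score(query, doc):
    """Fraction of doc terms that appear in the query (crude BM25 stand-in)."""
    q, d = set(query.split()), doc.split()
    return sum(1 for t in d if t in q) / (1 + len(d))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_search(query, q_vec, chunks, alpha=0.5):
    """chunks: list of (text, embedding). Returns texts ranked by fused score."""
    scored = [
        (alpha * keyword_score(query, text) + (1 - alpha) * cosine(q_vec, vec),
         text)
        for text, vec in chunks
    ]
    return [text for _, text in sorted(scored, reverse=True)]
```

The `alpha` weight trades off exact keyword matches against semantic similarity; a real system would also normalize the two score distributions before fusing them.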
The RAG Demo demonstrates this workflow:
Example of voice-based document retrieval and question answering.
4. Interactive Terminal User Interface (TUI)
RCLI provides a user-friendly TUI for managing its functionalities. Users can interact with the system through keyboard shortcuts:
| Key | Action |
|-----|--------|
| SPACE | Push-to-talk mode |
| M | Browse/download models |
| A | Enable/disable macOS actions |
| R | Ingest/manage documents for RAG |
| X | Clear conversation context |
| T | Toggle tool call tracing |
The TUI also displays real-time hardware monitoring, allowing users to switch between different AI engines (MetalRT or llama.cpp) and manage model configurations.
Installation Process
Prerequisites
- macOS 13 (Ventura) or later on Apple Silicon.
- The MetalRT engine requires an Apple M3 chip or later. On M1/M2 Macs, RCLI falls back to the open-source `llama.cpp` engine.
Installation Methods
Option 1: One-Click Installation via Script
curl -fsSL https://raw.githubusercontent.com/RunAnywhereAI/RCLI/main/install.sh | bash
This script automatically downloads and sets up default models (~1GB in size), including:
- LFM2 1.2B (default LLM)
- Whisper for STT
- Piper TTS voices
Option 2: Homebrew Installation
brew tap RunAnywhereAI/rcli
brew install rcli
rcli setup
Troubleshooting Common Issues
If installation fails due to checksum mismatches or stale versions, users can:
- Refresh the GitHub repository:
cd $(brew --repo RunAnywhereAI/rcli)
git fetch origin && git reset --hard origin/main
brew reinstall rcli
- Clear download cache and re-tap:
rm -rf "$(brew --cache)/downloads/"*rcli*
brew tap RunAnywhereAI/rcli
brew install rcli
Performance Benchmarks
RCLI’s efficiency is highlighted in its performance comparisons:
1. MetalRT vs. llama.cpp Decode Speed
The image below demonstrates that MetalRT achieves significantly higher throughput compared to llama.cpp and Apple MLX on the M3 Max chip:
MetalRT outperforms llama.cpp in LLM decoding speed.
2. Real-Time Factor for STT/TTS
RCLI’s STT and TTS run far below a real-time factor of 1; with MetalRT, transcription is roughly 714x faster than real time:
MetalRT’s STT/TTS latency is sub-200ms.
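Real-time factor (RTF) is processing time divided by audio duration, so "714x faster than real time" corresponds to an RTF of roughly 1/714 ≈ 0.0014. The numbers below are illustrative values consistent with that figure, not measured benchmarks:

```python
# Real-time factor (RTF) = processing time / audio duration.
# RTF < 1 means faster than real time; the speedup is its reciprocal.
def real_time_factor(processing_seconds, audio_seconds):
    return processing_seconds / audio_seconds

def speedup(processing_seconds, audio_seconds):
    return audio_seconds / processing_seconds

# Illustrative example: transcribing 60 s of audio in 0.084 s
rtf = real_time_factor(0.084, 60.0)   # ~0.0014
x = speedup(0.084, 60.0)              # ~714x faster than real time
```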
Supported Models
RCLI supports a diverse range of AI models, categorized as follows:
| Category | Model Examples |
|----------|----------------|
| LLM | Qwen3 (0.6B–4B), LFM2 (1.2B–2.6B), Llama 3.2 (3B) |
| STT | Zipformer (streaming), Whisper base.en, Parakeet TDT 0.6B (~1.9% WER) |
| TTS | Piper Lessac/Amy, Kokoro English/Multi-lang |
Users can manage models via the TUI or directly from the command line:
rcli models # Browse/download available models
rcli upgrade-llm # Guided LLM upgrades
Technical Architecture
1. MetalRT GPU Engine
MetalRT is a proprietary GPU inference engine developed by RunAnywhere, Inc., specifically optimized for Apple Silicon. It delivers:
- Up to 550 tokens per second (tok/s) LLM throughput.
- Sub-200ms end-to-end voice latency.
Key Features:
- Uses Metal 3.1 features available on M3/M4 chips.
- Supports concurrent STT, LLM, and TTS processing.
- Falls back to the open-source `llama.cpp` engine on M1/M2 Macs for compatibility.
Users can install MetalRT separately:
rcli metalrt install
2. Fallback Mechanism
For users with older Apple Silicon chips (M1/M2), RCLI automatically switches to the open-source llama.cpp engine, preserving full functionality at the cost of some throughput compared with MetalRT.
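The selection rule described above (MetalRT on M3 or newer, llama.cpp on M1/M2) can be sketched as a small function. On macOS the chip string would come from something like `sysctl -n machdep.cpu.brand_string`; here it is passed in as an argument so the logic stands alone:

```python
# Sketch of the engine-selection rule: MetalRT requires M3 or later,
# otherwise fall back to llama.cpp. Not RCLI's actual detection code.
import re

def pick_engine(chip_brand: str) -> str:
    """Map an Apple Silicon brand string (e.g. 'Apple M3 Max') to an engine."""
    m = re.search(r"\bM(\d+)\b", chip_brand)
    if m and int(m.group(1)) >= 3:
        return "metalrt"
    return "llama.cpp"   # M1/M2 or unrecognized chip: safe fallback
```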
Contributing and Licensing
RCLI is open-source under the MIT License, while MetalRT operates under a proprietary license. Contributions are welcome, including:
- Adding new AI models or voices.
- Expanding macOS action support.
- Improving the TUI interface.
For licensing inquiries regarding MetalRT, users can contact founder@runanywhere.ai.
Conclusion
RCLI represents a significant leap forward in on-device AI capabilities for Apple Silicon Macs. By eliminating reliance on cloud services and external APIs, it enables seamless voice interactions, document retrieval, and app control—all while maintaining near-instantaneous latency. Whether for productivity, entertainment, or research, RCLI offers a powerful, privacy-preserving alternative to traditional AI assistants.
For further exploration, users can visit RunAnywhere’s official blog.
Repository: https://github.com/RunanywhereAI/RCLI