RCLI: On-Device Voice AI for macOS
Introduction
RCLI (RunAnywhere Command Line Interface) is a groundbreaking on-device voice artificial intelligence solution designed exclusively for Apple Silicon-based Macs. It integrates a full speech-to-text (STT), large language model (LLM), and text-to-speech (TTS) pipeline, enabling seamless voice interactions without relying on cloud services or external APIs. Powered by MetalRT, a proprietary GPU inference engine optimized for Apple’s M-series chips, RCLI achieves near-instantaneous latency—sub-200 milliseconds—while maintaining high accuracy in natural language processing.
This description explores its core functionalities, technical architecture, installation process, and performance benchmarks, supported by visual demonstrations.
Core Features of RCLI
1. Real-Time Voice Interaction
RCLI’s primary strength lies in its ability to process voice commands instantly on a Mac’s local hardware. Users can engage in natural conversations with their device using simple verbal prompts. The system employs Silero Voice Activity Detection (VAD), which filters out background noise, ensuring accurate speech recognition.
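RCLI uses Silero VAD, a small neural network, for this gating step. As a rough illustration of the general idea only (not Silero's actual model), a minimal energy-based gate might look like:

```python
# Minimal energy-based voice-activity gate -- a simplified stand-in for
# Silero VAD, which uses a neural network rather than raw frame energy.
def is_speech(frame, threshold=0.01):
    """Return True if the frame's mean squared amplitude exceeds threshold."""
    energy = sum(s * s for s in frame) / len(frame)
    return energy > threshold

def gate(frames, threshold=0.01):
    """Keep only frames likely to contain speech, dropping silence/noise."""
    return [f for f in frames if is_speech(f, threshold)]
```

Frames that pass the gate are handed to the STT pipeline; everything else is discarded before transcription, which is what keeps background noise out of the recognizer.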
Key Components:
- STT Pipeline: Uses Zipformer streaming architecture for real-time transcription and Whisper or Parakeet offline models for enhanced accuracy.
- LLM Processing: Leverages advanced language models like Qwen3, LFM2, and Llama 3.2, optimized with Flash Attention for efficiency.
- TTS Synthesis: Implements a double-buffered system to render the next sentence while playing the current one, ensuring smooth audio output.
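The double-buffered TTS scheme above can be sketched with a bounded queue: a worker thread renders the next sentence while the main loop plays the current one. Here `synthesize` and `play` are hypothetical placeholders for the real engine calls.

```python
# Sketch of double-buffered TTS: synthesize sentence n+1 in a worker
# thread while sentence n is playing. `synthesize` is a placeholder for
# the real TTS engine call.
import queue
import threading

def synthesize(sentence):
    return f"<audio:{sentence}>"    # stand-in for rendered audio

def speak(sentences, play=print):
    buf = queue.Queue(maxsize=1)    # holds at most one pre-rendered clip

    def producer():
        for s in sentences:
            buf.put(synthesize(s))  # blocks until the player takes a clip
        buf.put(None)               # sentinel: no more audio

    threading.Thread(target=producer, daemon=True).start()
    while (clip := buf.get()) is not None:
        play(clip)                  # next sentence renders while this plays
```

The `maxsize=1` bound is what makes this double-buffering rather than unbounded pre-rendering: synthesis stays exactly one sentence ahead of playback.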
The accompanying waveform visualization (RCLI Waveform) illustrates how RCLI processes voice input in real time, highlighting its low-latency response:
Real-time waveform representation of speech processing by RCLI.
2. macOS App Control via Voice
One of the most practical applications of RCLI is its ability to control macOS apps using voice commands. Users can perform tasks such as:
- Playing, pausing, or adjusting volume on Spotify/Apple Music.
- Opening applications like Safari or opening URLs in a browser.
- Managing system settings (e.g., toggling dark mode, locking the screen).
The demo video showcases these capabilities:
Example of voice-controlled Spotify volume adjustment.
RCLI supports 38+ macOS actions, categorized into:
- Productivity: Note creation, reminder setting, shortcut execution.
- Communication: Sending messages or initiating FaceTime calls.
- Media: Playing/pausing tracks, adjusting audio levels.
- System: App opening/closing, volume control, screenshot capture.
Users can enable/disable specific actions via the Terminal User Interface (TUI).
3. Local Retrieval-Augmented Generation (RAG)
Unlike cloud-based AI assistants, RCLI enables users to perform document-grounded question-answering locally. This feature is particularly useful for professionals who rely on stored documents such as PDFs, Word files, or plain text notes.
How It Works:
- Users ingest documents into the system with `rcli rag ingest`.
- The system indexes them into a hybrid vector + BM25 retrieval model, enabling fast semantic search.
- When users ask questions by voice (`rcli ask --rag`), RCLI retrieves relevant information in ~4 milliseconds across 5,000+ document chunks.
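The hybrid vector + BM25 idea is to fuse a keyword-overlap score with an embedding-similarity score. A toy sketch of that fusion (not RCLI's actual index, and with a deliberately simplified keyword score) could look like:

```python
# Toy hybrid retrieval: fuse a keyword score (simplified BM25-like term
# overlap) with a vector score (cosine similarity). A sketch of the idea,
# not RCLI's actual implementation.
import math

def keyword_score(query, doc):
    """Fraction of doc terms that appear in the query (crude BM25 stand-in)."""
    q, d = set(query.split()), doc.split()
    return sum(1 for t in d if t in q) / (1 + len(d))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_search(query, q_vec, chunks, alpha=0.5):
    """chunks: list of (text, embedding). Returns texts ranked by fused score."""
    scored = [
        (alpha * keyword_score(query, text) + (1 - alpha) * cosine(q_vec, vec),
         text)
        for text, vec in chunks
    ]
    return [text for _, text in sorted(scored, reverse=True)]
```

The `alpha` weight trades off exact keyword matches against semantic similarity; a real system would also normalize the two score distributions before fusing them.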
The RAG Demo demonstrates this workflow:
Example of voice-based document retrieval and question answering.
4. Interactive Terminal User Interface (TUI)
RCLI provides a user-friendly TUI for managing its functionalities. Users can interact with the system through keyboard shortcuts:
| Key | Action |
|-----|--------|
| SPACE | Push-to-talk mode |
| M | Browse/download models |
| A | Enable/disable macOS actions |
| R | Ingest/manage documents for RAG |
| X | Clear conversation context |
| T | Toggle tool call tracing |
The TUI also displays real-time hardware monitoring, allowing users to switch between different AI engines (MetalRT or llama.cpp) and manage model configurations.
Installation Process
Prerequisites
- macOS 13 (Ventura) or later on Apple Silicon.
- The MetalRT engine requires an Apple M3 chip or later. On M1/M2 Macs, RCLI falls back to the open-source `llama.cpp` engine.
Installation Methods
Option 1: One-Click Installation via Script
curl -fsSL https://raw.githubusercontent.com/RunAnywhereAI/RCLI/main/install.sh | bash
This script automatically downloads and sets up default models (~1GB in size), including:
- LFM2 1.2B (default LLM)
- Whisper for STT
- Piper TTS voices
Option 2: Homebrew Installation
brew tap RunAnywhereAI/rcli
brew install rcli
rcli setup
Troubleshooting Common Issues
If installation fails due to checksum mismatches or stale versions, users can:
- Refresh the GitHub repository:
cd $(brew --repo RunAnywhereAI/rcli)
git fetch origin && git reset --hard origin/main
brew reinstall rcli
- Clear download cache and re-tap:
rm -rf "$(brew --cache)/downloads/"*rcli*
brew tap RunAnywhereAI/rcli
brew install rcli
Performance Benchmarks
RCLI’s efficiency is highlighted in its performance comparisons:
1. MetalRT vs. llama.cpp Decode Speed
The image below demonstrates that MetalRT achieves significantly higher throughput compared to llama.cpp and Apple MLX on the M3 Max chip:
MetalRT outperforms llama.cpp in LLM decoding speed.
2. Real-Time Factor for STT/TTS
RCLI’s STT and TTS run far below a real-time factor of 1; with MetalRT, transcription is roughly 714x faster than real time:
MetalRT’s STT/TTS latency is sub-200ms.
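Real-time factor (RTF) is processing time divided by audio duration, so "714x faster than real time" corresponds to an RTF of roughly 1/714 ≈ 0.0014. The numbers below are illustrative values consistent with that figure, not measured benchmarks:

```python
# Real-time factor (RTF) = processing time / audio duration.
# RTF < 1 means faster than real time; the speedup is its reciprocal.
def real_time_factor(processing_seconds, audio_seconds):
    return processing_seconds / audio_seconds

def speedup(processing_seconds, audio_seconds):
    return audio_seconds / processing_seconds

# Illustrative example: transcribing 60 s of audio in 0.084 s
rtf = real_time_factor(0.084, 60.0)   # ~0.0014
x = speedup(0.084, 60.0)              # ~714x faster than real time
```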
Supported Models
RCLI supports a diverse range of AI models, categorized as follows:
| Category | Model Examples |
|----------|----------------|
| LLM | Qwen3 (0.6B–4B), LFM2 (1.2B–2.6B), Llama 3.2 (3B) |
| STT | Zipformer (streaming), Whisper base.en, Parakeet TDT 0.6B (~1.9% WER) |
| TTS | Piper Lessac/Amy, Kokoro English/Multi-lang |
Users can manage models via the TUI or directly from the command line:
rcli models # Browse/download available models
rcli upgrade-llm # Guided LLM upgrades
Technical Architecture
1. MetalRT GPU Engine
MetalRT is a proprietary GPU inference engine developed by RunAnywhere, Inc., specifically optimized for Apple Silicon. It delivers:
- Up to 550 tokens per second (tok/s) LLM throughput.
- Sub-200ms end-to-end voice latency.
Key Features:
- Uses Metal 3.1 features available on M3/M4 chips.
- Supports concurrent STT, LLM, and TTS processing.
- Falls back to the open-source `llama.cpp` engine on M1/M2 Macs for compatibility.
Users can install MetalRT separately:
rcli metalrt install
2. Fallback Mechanism
For users with older Apple Silicon chips (M1/M2), RCLI automatically switches to the open-source llama.cpp engine, preserving full functionality at the cost of some throughput compared with MetalRT.
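The selection rule described above (MetalRT on M3 or newer, llama.cpp on M1/M2) can be sketched as a small function. On macOS the chip string would come from something like `sysctl -n machdep.cpu.brand_string`; here it is passed in as an argument so the logic stands alone:

```python
# Sketch of the engine-selection rule: MetalRT requires M3 or later,
# otherwise fall back to llama.cpp. Not RCLI's actual detection code.
import re

def pick_engine(chip_brand: str) -> str:
    """Map an Apple Silicon brand string (e.g. 'Apple M3 Max') to an engine."""
    m = re.search(r"\bM(\d+)\b", chip_brand)
    if m and int(m.group(1)) >= 3:
        return "metalrt"
    return "llama.cpp"   # M1/M2 or unrecognized chip: safe fallback
```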
Contributing and Licensing
RCLI is open-source under the MIT License, while MetalRT operates under a proprietary license. Contributions are welcome, including:
- Adding new AI models or voices.
- Expanding macOS action support.
- Improving the TUI interface.
For licensing inquiries regarding MetalRT, users can contact founder@runanywhere.ai.
Conclusion
RCLI represents a significant leap forward in on-device AI capabilities for Apple Silicon Macs. By eliminating reliance on cloud services and external APIs, it enables seamless voice interactions, document retrieval, and app control—all while maintaining near-instantaneous latency. Whether for productivity, entertainment, or research, RCLI offers a powerful, privacy-preserving alternative to traditional AI assistants.
For further exploration, users can visit RunAnywhere’s official blog.
Repository: https://github.com/RunanywhereAI/RCLI