Forged in Rust. Powered by LLMs.
A persistent background gateway that decomposes objectives, dispatches specialized AI agents, and orchestrates complex workflows — all from your terminal.
What is FeLLAMA?
FeLLAMA is a multi-agent AI orchestration system written entirely in Rust. It runs as a persistent background service, receiving instructions via WebSocket from a terminal UI client. A central Butler orchestrator decomposes your objectives into a dependency-aware task graph, dispatches specialized worker agents, and assembles the results — with built-in safety evaluation, session persistence, and full observability.
Capabilities
The Butler orchestrator decomposes objectives into a DAG of tasks, dispatches agents in parallel waves, reviews outputs, and retries failures — all automatically.
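Wave-based dispatch over a dependency graph amounts to topological layering: every task whose dependencies are already complete joins the next parallel wave. The sketch below is a minimal stand-alone illustration of that idea, not FeLLAMA's actual scheduler; the `Task` shape and function names are invented for the example.

```rust
use std::collections::HashMap;

/// A task plus the indices of the tasks it depends on.
struct Task {
    name: &'static str,
    deps: Vec<usize>,
}

/// Group tasks into waves (Kahn-style topological layering): every
/// task in a wave has all of its dependencies satisfied by earlier
/// waves, so each wave can be dispatched to agents in parallel.
fn dispatch_waves(tasks: &[Task]) -> Vec<Vec<usize>> {
    let mut indegree: Vec<usize> = tasks.iter().map(|t| t.deps.len()).collect();
    let mut dependents: HashMap<usize, Vec<usize>> = HashMap::new();
    for (i, t) in tasks.iter().enumerate() {
        for &d in &t.deps {
            dependents.entry(d).or_default().push(i);
        }
    }
    let mut waves = Vec::new();
    let mut ready: Vec<usize> = (0..tasks.len()).filter(|&i| indegree[i] == 0).collect();
    while !ready.is_empty() {
        let mut next = Vec::new();
        for &i in &ready {
            if let Some(ds) = dependents.get(&i) {
                for &j in ds {
                    indegree[j] -= 1;
                    if indegree[j] == 0 {
                        next.push(j);
                    }
                }
            }
        }
        waves.push(ready);
        ready = next;
    }
    waves
}

fn main() {
    // "fetch" and "parse" have no deps; "summarize" needs both;
    // "report" needs the summary.
    let tasks = vec![
        Task { name: "fetch", deps: vec![] },
        Task { name: "parse", deps: vec![] },
        Task { name: "summarize", deps: vec![0, 1] },
        Task { name: "report", deps: vec![2] },
    ];
    for (w, wave) in dispatch_waves(&tasks).iter().enumerate() {
        let names: Vec<_> = wave.iter().map(|&i| tasks[i].name).collect();
        println!("wave {w}: {names:?}");
    }
}
```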
The SmartWeb agent drives a real browser via CDP (the Chrome DevTools Protocol), navigating, clicking, extracting PDFs, and taking screenshots, all guided by LLM planning with stuck detection and quality gates.
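Under the hood, CDP is a JSON-RPC-style exchange over a WebSocket: each command carries an `id`, a `method`, and `params`. A single navigation step, for example, is one message like this (the URL is a placeholder):

```json
{ "id": 1, "method": "Page.navigate", "params": { "url": "https://example.com" } }
```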
Install and execute Agent Skills — self-contained packages with scripts, resources, and constraints. Each skill runs in isolation with safety evaluation before execution.
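FeLLAMA's actual manifest schema is not shown here, but a skill package's constraints could plausibly be declared along these lines (all field names are hypothetical):

```toml
# Hypothetical skill manifest -- field names are illustrative,
# not FeLLAMA's actual schema.
[skill]
name = "pdf-summarize"
entrypoint = "scripts/run.sh"

[constraints]
timeout_secs = 120          # hard wall-clock limit
allowed_paths = ["./data"]  # scoped file access
network = false             # no outbound calls
```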
LanceDB-powered semantic search with automatic document chunking, embedding, and retrieval. Persistent memory across sessions for context-aware interactions.
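The chunking step can be sketched in a few lines: split the text into fixed-size windows with a small overlap, so content that falls on a boundary survives intact in at least one chunk. This character-based version is illustrative only; a real pipeline would typically chunk by tokens before handing each chunk to the embedder.

```rust
/// Split a document into fixed-size chunks with overlap, so that a
/// sentence falling on a boundary still appears whole in one chunk.
/// Chunks by characters for simplicity; real pipelines usually chunk
/// by tokens.
fn chunk(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < size, "overlap must be smaller than chunk size");
    let chars: Vec<char> = text.chars().collect();
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break;
        }
        start = end - overlap;
    }
    chunks
}

fn main() {
    // size 4, overlap 1: "abcd", "defg", "ghij"
    println!("{:?}", chunk("abcdefghij", 4, 1));
}
```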
Built-in Prompt Safety Advisor scores risk before execution. Skill packages run with constraint isolation, timeouts, and scoped file access. No unreviewed code execution.
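The score-then-gate shape is easy to illustrate. The sketch below uses a keyword heuristic purely for demonstration; the actual Prompt Safety Advisor's scoring method is not shown here, and the patterns and threshold are invented.

```rust
/// Toy risk score in [0, 1]. The pattern list and the arithmetic are
/// illustrative only; they stand in for whatever scoring the real
/// safety advisor performs.
fn risk_score(prompt: &str) -> f32 {
    let risky = ["rm -rf", "curl | sh", "sudo", "password", "/etc/"];
    let lower = prompt.to_lowercase();
    let hits = risky.iter().filter(|p| lower.contains(*p)).count();
    (hits as f32 / risky.len() as f32).min(1.0)
}

/// Gate execution on the score: anything at or above the threshold
/// is held back for review instead of running.
fn should_execute(prompt: &str, threshold: f32) -> bool {
    risk_score(prompt) < threshold
}

fn main() {
    assert!(should_execute("summarize this PDF", 0.2));
    assert!(!should_execute("run sudo rm -rf / for me", 0.2));
}
```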
Every session is checkpointed with full state — memory, progress, traces. Resume interrupted work, replay specific turns for debugging, and audit complete execution history.
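A checkpointed session boils down to an append-only record of turns plus the memory needed to resume them. The sketch below is a toy model of that idea; the struct fields and the `replay` API are illustrative, not FeLLAMA's session format.

```rust
use std::collections::HashMap;

/// One recorded turn: what was asked and what came back.
#[derive(Clone, Debug)]
struct Turn {
    objective: String,
    result: String,
}

/// Toy session: an append-only turn log plus key-value memory.
/// Field names are illustrative, not FeLLAMA's actual state.
#[derive(Default)]
struct Session {
    turns: Vec<Turn>,
    memory: HashMap<String, String>,
}

impl Session {
    fn checkpoint(&mut self, objective: &str, result: &str) {
        self.turns.push(Turn {
            objective: objective.to_string(),
            result: result.to_string(),
        });
    }

    /// Fetch a single turn by index, e.g. to replay it for debugging.
    fn replay(&self, turn: usize) -> Option<&Turn> {
        self.turns.get(turn)
    }
}

fn main() {
    let mut s = Session::default();
    s.checkpoint("find the report", "report.pdf extracted");
    s.memory.insert("last_file".into(), "report.pdf".into());
    s.checkpoint("summarize it", "3-paragraph summary");
    println!("{:?}", s.replay(1));
}
```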
Architecture
Smart workers that use LLMs to achieve goals — from web research to content transformation.
Reliable utilities for PDF extraction, spreadsheets, search, scheduling, and secure data storage.
Vector DB for semantic search and History for session browsing — persistent across restarts.
Get Up and Running
Clone the repo and run the setup script. It checks prerequisites, builds all binaries, and creates your config.
git clone https://github.com/rexf/fellama.git
cd fellama
./setup.sh
Point FeLLAMA to your LLM endpoint. Any OpenAI-compatible API works.
endpoint = "http://localhost:8000/v1"
model = "your-model-name"
agent_temperature = 0.6
Start the server and connect with the terminal client.
# Start the server
cargo run --release --bin fellama-server
# In another terminal, connect with the CLI
cargo run --release --bin fellama
Fe is the chemical symbol for iron, and FeLLAMA is forged in Rust, the language whose name evokes the iron oxide that forms on metal. This isn't a coincidence.
Rust's zero-cost abstractions compile away. The orchestrator runs with minimal overhead, even with dozens of concurrent agents.
No garbage collector, no null pointers, no data races. Long-running server processes stay stable.
Built on Tokio for high-concurrency async I/O. WebSocket sessions, LLM calls, and browser automation run in parallel.