Forged in Rust. Powered by LLMs.
A persistent background service that decomposes your objectives into dependency-aware task graphs, dispatches specialized AI agents, and orchestrates complex workflows, all from your terminal.
Built On
The non-negotiable principles that define every line of FeLLAMA.
Your data never leaves your infrastructure. FeLLAMA connects to your LLM endpoints — local or self-hosted. No cloud dependency, no telemetry, no third-party data sharing. Every conversation, every session artifact stays on your machine.
One script, one command. Clone the repo, run ./setup.sh, and you're up and running: it checks your Rust toolchain, builds all binaries, generates a default configuration, and prints next steps. Zero manual dependency wrangling.
Trace logging at every process and network boundary. Every LLM request, WebSocket message, and subprocess call is captured. Session directories preserve complete execution history for audit, debugging, and replay.
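To make the boundary logging concrete, here is a minimal sketch assuming the tracing and tracing-subscriber crates; FeLLAMA's actual logging setup isn't shown on this page, so treat the span name and fields as illustrative.

// Sketch only, assuming the `tracing` ecosystem; illustrates
// span-per-boundary logging, not FeLLAMA's exact setup.
use tracing::{info, info_span, Instrument};

async fn call_llm(prompt: &str) -> String {
    // Hypothetical LLM call; the enclosing span marks the request boundary.
    info!(prompt_len = prompt.len(), "dispatching LLM request");
    String::from("...")
}

#[tokio::main]
async fn main() {
    tracing_subscriber::fmt().with_target(false).init();

    let reply = call_llm("summarize the session")
        .instrument(info_span!("llm_request", model = "your-model-name"))
        .await;
    info!(reply_len = reply.len(), "LLM request complete");
}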
Gateway handles transport. Butler handles orchestration. Workers own their LLM lifecycle. Five crates, each with a single non-overlapping responsibility. Clean boundaries enforced by code structure, not convention.
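As a sketch of what that separation could look like on disk, here is a hypothetical Cargo workspace. The page names the Gateway, Butler, and worker roles; the crate names below, including skills and common, are assumptions.

# Hypothetical workspace layout; the actual crate names may differ.
[workspace]
members = [
    "crates/gateway",   # WebSocket transport between CLI and server
    "crates/butler",    # objective decomposition and orchestration
    "crates/workers",   # agent implementations owning their LLM lifecycle
    "crates/skills",    # hypothetical: skill packaging and isolation
    "crates/common",    # hypothetical: shared types and config
]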
What Is FeLLAMA?
FeLLAMA is a multi-agent AI orchestration system written entirely in Rust. It runs as a persistent background service, receiving instructions via WebSocket from a terminal UI client. A central Butler orchestrator decomposes your objectives into a dependency-aware task graph, dispatches specialized worker agents, and assembles the results — with built-in safety evaluation, session persistence, and full observability.
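A minimal client round trip might look like the sketch below, using tokio-tungstenite. The address and the JSON message schema are assumptions for illustration, not FeLLAMA's documented wire format.

// Sketch of a WebSocket round trip; the message shapes below are
// hypothetical, not FeLLAMA's actual protocol.
use futures_util::{SinkExt, StreamExt};
use tokio_tungstenite::{connect_async, tungstenite::Message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Assumed address; check the config that setup.sh generates.
    let (mut ws, _) = connect_async("ws://127.0.0.1:9090/ws").await?;

    // Hypothetical objective submission.
    ws.send(Message::Text(
        r#"{"type":"objective","text":"summarize ./docs"}"#.into(),
    ))
    .await?;

    // Stream orchestration events until the server closes the session.
    while let Some(msg) = ws.next().await {
        match msg? {
            Message::Text(event) => println!("{}", event),
            Message::Close(_) => break,
            _ => {}
        }
    }
    Ok(())
}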
Capabilities
The Butler decomposes objectives into a DAG of tasks, dispatches agents in parallel waves, reviews outputs via LLM, and retries failures — all automatically.
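The scheduling idea is easy to see in miniature. The toy below computes parallel waves from a dependency map; it is a sketch of the concept, not the Butler's implementation, and the task names are made up.

// Toy wave scheduler: a task is ready once all dependencies are done.
use std::collections::HashMap;

fn waves<'a>(deps: &HashMap<&'a str, Vec<&'a str>>) -> Vec<Vec<String>> {
    let mut remaining = deps.clone();
    let mut done: Vec<&str> = Vec::new();
    let mut out = Vec::new();
    while !remaining.is_empty() {
        // Collect every task whose dependencies have all finished.
        let ready: Vec<&str> = remaining
            .iter()
            .filter(|(_, d)| d.iter().all(|dep| done.contains(dep)))
            .map(|(task, _)| *task)
            .collect();
        assert!(!ready.is_empty(), "cycle in task graph");
        for &task in &ready {
            remaining.remove(task);
            done.push(task);
        }
        out.push(ready.iter().map(|t| t.to_string()).collect());
    }
    out
}

fn main() {
    let mut deps = HashMap::new();
    deps.insert("fetch_a", vec![]);
    deps.insert("fetch_b", vec![]);
    deps.insert("merge", vec!["fetch_a", "fetch_b"]);
    deps.insert("report", vec!["merge"]);
    // Wave 1: fetch_a + fetch_b in parallel; wave 2: merge; wave 3: report.
    for (i, wave) in waves(&deps).iter().enumerate() {
        println!("wave {}: {:?}", i + 1, wave);
    }
}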
The SmartWeb agent drives a real browser via CDP — navigating, clicking, extracting PDFs, taking screenshots — guided by LLM planning with stuck detection and quality gates.
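For a feel of what one CDP round trip involves, here is a sketch using the chromiumoxide crate. The crate choice is an assumption for illustration; the real SmartWeb agent layers LLM planning, stuck detection, and quality gates on top of calls like these.

// Sketch of driving a real browser over CDP via chromiumoxide
// (an assumed crate choice, not necessarily FeLLAMA's client).
use chromiumoxide::browser::{Browser, BrowserConfig};
use futures::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let (browser, mut handler) =
        Browser::launch(BrowserConfig::builder().build()?).await?;
    // The handler task pumps CDP events in the background.
    tokio::spawn(async move {
        while let Some(event) = handler.next().await {
            if event.is_err() {
                break;
            }
        }
    });

    let page = browser.new_page("https://example.com").await?;
    let html = page.content().await?; // extracted DOM for the LLM planner
    println!("fetched {} bytes of HTML", html.len());
    Ok(())
}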
Install and execute Agent Skills — self-contained packages with scripts, resources, and constraints. Each skill runs in isolation with safety evaluation before execution.
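A skill package's metadata might deserialize along these lines; every field name below is hypothetical, standing in for whatever FeLLAMA's actual manifest format defines.

// Hypothetical skill manifest shape; field names are illustrative only.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct SkillManifest {
    name: String,
    entrypoint: String,         // script run inside the sandbox
    timeout_secs: u64,          // hard wall-clock limit
    allowed_paths: Vec<String>, // scoped file access
}

fn main() {
    let manifest: SkillManifest = toml::from_str(
        r#"
        name = "pdf-summarizer"
        entrypoint = "run.sh"
        timeout_secs = 120
        allowed_paths = ["./workdir"]
        "#,
    )
    .expect("valid manifest");
    println!("{manifest:?}");
}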
LanceDB-powered semantic search with automatic document chunking, embedding, and retrieval. Persistent memory across sessions for context-aware interactions.
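Stripped of LanceDB and a real embedding model, the retrieval step reduces to nearest-neighbor search over chunk vectors. The toy below shows that idea with hand-written three-dimensional "embeddings"; it is not FeLLAMA's integration.

// Toy chunk retrieval by cosine similarity; the real system uses
// LanceDB and a learned embedding model, not this stub.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

fn main() {
    // Pretend each chunk has already been embedded into a small vector.
    let chunks = [
        ("rust ownership rules", vec![0.9, 0.1, 0.0]),
        ("llama training data", vec![0.1, 0.8, 0.3]),
        ("websocket handshake", vec![0.0, 0.2, 0.9]),
    ];
    let query = vec![0.05, 0.75, 0.35]; // embedding of the user's question

    // Rank chunks by similarity to the query and keep the best match.
    let best = chunks
        .iter()
        .max_by(|a, b| cosine(&a.1, &query).total_cmp(&cosine(&b.1, &query)))
        .unwrap();
    println!("retrieved: {}", best.0);
}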
Built-in Prompt Safety Advisor scores risk before execution. Skill packages run with constraint isolation, timeouts, and scoped file access. No unreviewed code execution.
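As a cartoon of the gating flow (not the advisor's actual scoring, which the page says is LLM-based), a keyword-based risk gate looks like this:

// Hypothetical risk gate illustrating score-then-block; the real
// Prompt Safety Advisor scores risk with an LLM, not a keyword list.
fn risk_score(prompt: &str) -> f32 {
    let red_flags = ["rm -rf", "curl | sh", "sudo"];
    let hits = red_flags.iter().filter(|f| prompt.contains(*f)).count();
    (hits as f32 / red_flags.len() as f32).min(1.0)
}

fn main() {
    for prompt in ["summarize ./docs", "run sudo rm -rf /tmp/cache"] {
        let score = risk_score(prompt);
        // Threshold is illustrative; tune to taste.
        if score >= 0.3 {
            println!("blocked ({score:.2}): {prompt}");
        } else {
            println!("allowed ({score:.2}): {prompt}");
        }
    }
}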
Every session is checkpointed with full state — memory, progress, traces. Resume interrupted work, replay specific turns for debugging, and audit complete execution history.
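Checkpoint-and-replay is straightforward to picture with serde; the field set below is hypothetical, standing in for the real session state:

// Sketch of turn-level checkpointing; fields are illustrative.
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug)]
struct Checkpoint {
    session_id: String,
    turn: u32,
    completed_tasks: Vec<String>,
    memory_notes: Vec<String>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cp = Checkpoint {
        session_id: "demo".into(),
        turn: 3,
        completed_tasks: vec!["fetch_a".into(), "merge".into()],
        memory_notes: vec!["user prefers terse summaries".into()],
    };
    // Write one JSON file per turn so any turn can be replayed later.
    std::fs::write("turn-3.json", serde_json::to_string_pretty(&cp)?)?;

    // Resuming is just reading the latest checkpoint back.
    let restored: Checkpoint =
        serde_json::from_str(&std::fs::read_to_string("turn-3.json")?)?;
    println!("resumed at turn {}", restored.turn);
    Ok(())
}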
Architecture
Get Up and Running
Clone the repo and run the setup script. It checks prerequisites, builds all binaries, and creates your config.
git clone https://github.com/rexf/fellama.git
cd fellama
./setup.sh
Point FeLLAMA to your LLM endpoint. Any OpenAI-compatible API works.
endpoint = "http://localhost:8000/v1"
model = "your-model-name"
agent_temperature = 0.6
Start the server and connect with the terminal client.
# Start the server
cargo run --release --bin fellama-server
# In another terminal, connect with the CLI
cargo run --release --bin fellama
The Name
Fe is the chemical symbol for iron, and FeLLAMA is forged in Rust, a name that evokes the iron oxide that transforms metal. This isn't a coincidence.
Zero-cost abstractions compile away. The orchestrator runs with minimal overhead, even with dozens of concurrent agents.
No garbage collector, no null pointers, no data races. Long-running server processes stay stable.
Built on Tokio for high-concurrency async I/O. WebSocket sessions, LLM calls, and browser automation run in parallel.
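The pattern in miniature: three independent awaits sharing one Tokio runtime, so total latency tracks the slowest call rather than the sum. Purely illustrative.

// Illustrative only: three concurrent awaits on Tokio, the same shape
// the server uses for sessions, LLM calls, and browser automation.
use std::time::Duration;
use tokio::time::sleep;

async fn fake_work(label: &str, ms: u64) -> &str {
    sleep(Duration::from_millis(ms)).await;
    label
}

#[tokio::main]
async fn main() {
    let start = std::time::Instant::now();
    // All three futures make progress at once; total time ~= the slowest.
    let (a, b, c) = tokio::join!(
        fake_work("websocket", 100),
        fake_work("llm", 300),
        fake_work("browser", 200),
    );
    println!("{a}, {b}, {c} finished in {:?}", start.elapsed());
}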