Forged in Rust. Powered by LLMs.
A persistent background service that decomposes your objectives into dependency-aware task graphs, dispatches specialized AI agents, and orchestrates complex workflows, all from your terminal.
Built On
The non-negotiable principles that define every line of FeLLAMA.
Your data never leaves your infrastructure. FeLLAMA connects to your LLM endpoints — local or self-hosted. No cloud dependency, no telemetry, no third-party data sharing. Every conversation, every session artifact stays on your machine.
One script, one command. Clone the repo, run ./setup.sh, and you're up and running: it checks your Rust toolchain, builds all binaries, generates a default configuration, and prints next steps. Zero manual dependency wrangling.
Trace logging at every process and network boundary. Every LLM request, WebSocket message, and subprocess call is captured. Session directories preserve complete execution history for audit, debugging, and replay.
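To make the boundary logging concrete, here is a minimal sketch assuming the tracing and tracing-subscriber crates; FeLLAMA's actual logging setup isn't shown on this page, so treat the span name and fields as illustrative.

// Sketch only, assuming the `tracing` ecosystem; illustrates
// span-per-boundary logging, not FeLLAMA's exact setup.
use tracing::{info, info_span, Instrument};

async fn call_llm(prompt: &str) -> String {
    // Hypothetical LLM call; the enclosing span marks the request boundary.
    info!(prompt_len = prompt.len(), "dispatching LLM request");
    String::from("...")
}

#[tokio::main]
async fn main() {
    tracing_subscriber::fmt().with_target(false).init();

    let reply = call_llm("summarize the session")
        .instrument(info_span!("llm_request", model = "your-model-name"))
        .await;
    info!(reply_len = reply.len(), "LLM request complete");
}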
Gateway handles transport. Butler handles orchestration. Workers own their LLM lifecycle. Five crates, each with a single non-overlapping responsibility. Clean boundaries enforced by code structure, not convention.
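As a sketch of what that separation could look like on disk, here is a hypothetical Cargo workspace. The page names the Gateway, Butler, and worker roles; the crate names below, including skills and common, are assumptions.

# Hypothetical workspace layout; the actual crate names may differ.
[workspace]
members = [
    "crates/gateway",   # WebSocket transport between CLI and server
    "crates/butler",    # objective decomposition and orchestration
    "crates/workers",   # agent implementations owning their LLM lifecycle
    "crates/skills",    # hypothetical: skill packaging and isolation
    "crates/common",    # hypothetical: shared types and config
]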
What Is FeLLAMA?
FeLLAMA is a multi-agent AI orchestration system written entirely in Rust. It runs as a persistent background service, receiving instructions via WebSocket from a terminal UI client. A central Butler orchestrator decomposes your objectives into a dependency-aware task graph, dispatches specialized worker agents, and assembles the results — with built-in safety evaluation, session persistence, and full observability.
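A minimal client round trip might look like the sketch below, using tokio-tungstenite. The address and the JSON message schema are assumptions for illustration, not FeLLAMA's documented wire format.

// Sketch of a WebSocket round trip; the message shapes below are
// hypothetical, not FeLLAMA's actual protocol.
use futures_util::{SinkExt, StreamExt};
use tokio_tungstenite::{connect_async, tungstenite::Message};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Assumed address; check the config that setup.sh generates.
    let (mut ws, _) = connect_async("ws://127.0.0.1:9090/ws").await?;

    // Hypothetical objective submission.
    ws.send(Message::Text(
        r#"{"type":"objective","text":"summarize ./docs"}"#.into(),
    ))
    .await?;

    // Stream orchestration events until the server closes the session.
    while let Some(msg) = ws.next().await {
        match msg? {
            Message::Text(event) => println!("{}", event),
            Message::Close(_) => break,
            _ => {}
        }
    }
    Ok(())
}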
Capabilities
The Butler decomposes objectives into a DAG of tasks, dispatches agents in parallel waves, reviews outputs via LLM, and retries failures — all automatically.
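The scheduling idea is easy to see in miniature. The toy below computes parallel waves from a dependency map; it is a sketch of the concept, not the Butler's implementation, and the task names are made up.

// Toy wave scheduler: a task is ready once all dependencies are done.
use std::collections::HashMap;

fn waves<'a>(deps: &HashMap<&'a str, Vec<&'a str>>) -> Vec<Vec<String>> {
    let mut remaining = deps.clone();
    let mut done: Vec<&str> = Vec::new();
    let mut out = Vec::new();
    while !remaining.is_empty() {
        // Collect every task whose dependencies have all finished.
        let ready: Vec<&str> = remaining
            .iter()
            .filter(|(_, d)| d.iter().all(|dep| done.contains(dep)))
            .map(|(task, _)| *task)
            .collect();
        assert!(!ready.is_empty(), "cycle in task graph");
        for &task in &ready {
            remaining.remove(task);
            done.push(task);
        }
        out.push(ready.iter().map(|t| t.to_string()).collect());
    }
    out
}

fn main() {
    let mut deps = HashMap::new();
    deps.insert("fetch_a", vec![]);
    deps.insert("fetch_b", vec![]);
    deps.insert("merge", vec!["fetch_a", "fetch_b"]);
    deps.insert("report", vec!["merge"]);
    // Wave 1: fetch_a + fetch_b in parallel; wave 2: merge; wave 3: report.
    for (i, wave) in waves(&deps).iter().enumerate() {
        println!("wave {}: {:?}", i + 1, wave);
    }
}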
The SmartWeb agent drives a real browser via CDP — navigating, clicking, extracting PDFs, taking screenshots — guided by LLM planning with stuck detection and quality gates.
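For a feel of what one CDP round trip involves, here is a sketch using the chromiumoxide crate. The crate choice is an assumption for illustration; the real SmartWeb agent layers LLM planning, stuck detection, and quality gates on top of calls like these.

// Sketch of driving a real browser over CDP via chromiumoxide
// (an assumed crate choice, not necessarily FeLLAMA's client).
use chromiumoxide::browser::{Browser, BrowserConfig};
use futures::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let (browser, mut handler) =
        Browser::launch(BrowserConfig::builder().build()?).await?;
    // The handler task pumps CDP events in the background.
    tokio::spawn(async move {
        while let Some(event) = handler.next().await {
            if event.is_err() {
                break;
            }
        }
    });

    let page = browser.new_page("https://example.com").await?;
    let html = page.content().await?; // extracted DOM for the LLM planner
    println!("fetched {} bytes of HTML", html.len());
    Ok(())
}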
Install and execute Agent Skills — self-contained packages with scripts, resources, and constraints. Each skill runs in isolation with safety evaluation before execution.
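A skill package's metadata might deserialize along these lines; every field name below is hypothetical, standing in for whatever FeLLAMA's actual manifest format defines.

// Hypothetical skill manifest shape; field names are illustrative only.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct SkillManifest {
    name: String,
    entrypoint: String,         // script run inside the sandbox
    timeout_secs: u64,          // hard wall-clock limit
    allowed_paths: Vec<String>, // scoped file access
}

fn main() {
    let manifest: SkillManifest = toml::from_str(
        r#"
        name = "pdf-summarizer"
        entrypoint = "run.sh"
        timeout_secs = 120
        allowed_paths = ["./workdir"]
        "#,
    )
    .expect("valid manifest");
    println!("{manifest:?}");
}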
LanceDB-powered semantic search with automatic document chunking, embedding, and retrieval. Persistent memory across sessions for context-aware interactions.
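Stripped of LanceDB and a real embedding model, the retrieval step reduces to nearest-neighbor search over chunk vectors. The toy below shows that idea with hand-written three-dimensional "embeddings"; it is not FeLLAMA's integration.

// Toy chunk retrieval by cosine similarity; the real system uses
// LanceDB and a learned embedding model, not this stub.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

fn main() {
    // Pretend each chunk has already been embedded into a small vector.
    let chunks = [
        ("rust ownership rules", vec![0.9, 0.1, 0.0]),
        ("llama training data", vec![0.1, 0.8, 0.3]),
        ("websocket handshake", vec![0.0, 0.2, 0.9]),
    ];
    let query = vec![0.05, 0.75, 0.35]; // embedding of the user's question

    // Rank chunks by similarity to the query and keep the best match.
    let best = chunks
        .iter()
        .max_by(|a, b| cosine(&a.1, &query).total_cmp(&cosine(&b.1, &query)))
        .unwrap();
    println!("retrieved: {}", best.0);
}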
Built-in Prompt Safety Advisor scores risk before execution. Skill packages run with constraint isolation, timeouts, and scoped file access. No unreviewed code execution.
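As a cartoon of the gating flow (not the advisor's actual scoring, which the page says is LLM-based), a keyword-based risk gate looks like this:

// Hypothetical risk gate illustrating score-then-block; the real
// Prompt Safety Advisor scores risk with an LLM, not a keyword list.
fn risk_score(prompt: &str) -> f32 {
    let red_flags = ["rm -rf", "curl | sh", "sudo"];
    let hits = red_flags.iter().filter(|f| prompt.contains(*f)).count();
    (hits as f32 / red_flags.len() as f32).min(1.0)
}

fn main() {
    for prompt in ["summarize ./docs", "run sudo rm -rf /tmp/cache"] {
        let score = risk_score(prompt);
        // Threshold is illustrative; tune to taste.
        if score >= 0.3 {
            println!("blocked ({score:.2}): {prompt}");
        } else {
            println!("allowed ({score:.2}): {prompt}");
        }
    }
}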
Every session is checkpointed with full state — memory, progress, traces. Resume interrupted work, replay specific turns for debugging, and audit complete execution history.
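Checkpoint-and-replay is straightforward to picture with serde; the field set below is hypothetical, standing in for the real session state:

// Sketch of turn-level checkpointing; fields are illustrative.
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug)]
struct Checkpoint {
    session_id: String,
    turn: u32,
    completed_tasks: Vec<String>,
    memory_notes: Vec<String>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cp = Checkpoint {
        session_id: "demo".into(),
        turn: 3,
        completed_tasks: vec!["fetch_a".into(), "merge".into()],
        memory_notes: vec!["user prefers terse summaries".into()],
    };
    // Write one JSON file per turn so any turn can be replayed later.
    std::fs::write("turn-3.json", serde_json::to_string_pretty(&cp)?)?;

    // Resuming is just reading the latest checkpoint back.
    let restored: Checkpoint =
        serde_json::from_str(&std::fs::read_to_string("turn-3.json")?)?;
    println!("resumed at turn {}", restored.turn);
    Ok(())
}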
Architecture
Get Up and Running
Clone the repo and run the setup script. It checks prerequisites, builds all binaries, and creates your config.
git clone https://github.com/rexf/fellama.git
cd fellama
./setup.sh
Point FeLLAMA to your LLM endpoint. Any OpenAI-compatible API works.
endpoint = "http://localhost:8000/v1"
model = "your-model-name"
agent_temperature = 0.6
Start the server and connect with the terminal client.
# Start the server
cargo run --release --bin fellama-server
# In another terminal, connect with the CLI
cargo run --release --bin fellama
The Name
Fe is the chemical symbol for iron, and FeLLAMA is forged in Rust, a name that evokes the iron oxide that transforms metal. This isn't a coincidence.
Zero-cost abstractions compile away. The orchestrator runs with minimal overhead, even with dozens of concurrent agents.
No garbage collector, no null pointers, no data races. Long-running server processes stay stable.
Built on Tokio for high-concurrency async I/O. WebSocket sessions, LLM calls, and browser automation run in parallel.
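The pattern in miniature: three independent awaits sharing one Tokio runtime, so total latency tracks the slowest call rather than the sum. Purely illustrative.

// Illustrative only: three concurrent awaits on Tokio, the same shape
// the server uses for sessions, LLM calls, and browser automation.
use std::time::Duration;
use tokio::time::sleep;

async fn fake_work(label: &str, ms: u64) -> &str {
    sleep(Duration::from_millis(ms)).await;
    label
}

#[tokio::main]
async fn main() {
    let start = std::time::Instant::now();
    // All three futures make progress at once; total time ~= the slowest.
    let (a, b, c) = tokio::join!(
        fake_work("websocket", 100),
        fake_work("llm", 300),
        fake_work("browser", 200),
    );
    println!("{a}, {b}, {c} finished in {:?}", start.elapsed());
}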