Architecture & Design

Understand how FeLLAMA is built, why it works the way it does, and what each component is responsible for.

Design Philosophy

FeLLAMA is built around a set of core principles that guide every architectural decision. These aren't aspirational — they are enforced through code structure, hard rules, and review.

Separation of Concerns

The Gateway handles transport: WebSocket sessions, status fan-out, replay, and scheduling. The Butler handles orchestration: intake, objective routing, dispatch, and result ownership. These two layers never mix.

Worker Autonomy

Each agent creates its own LLM client, manages its own context window, and handles its own retries. The Butler controls whether to launch a worker, not how it talks to the LLM. No centralized LLM proxy.

Safety First

Skill packages are evaluated by the Prompt Safety Advisor before execution. Workers run with constraint isolation — restricted file access, script timeouts, and scoped resources. No unreviewed code execution.

Full Observability

Trace logging covers every process and network boundary: every LLM request, WebSocket message, and subprocess invocation is captured. Single-turn agents use global logs; multi-turn agents use session-scoped traces.

Composability

All agents follow shared patterns (Simple, Orchestrated, Web), use the same output envelope, the same error types, and the same CLI argument conventions. New agents slot in without framework changes.

Fault Tolerance

Stuck detection identifies repeated actions and stalled research. Time budgets enforce soft and hard limits. Failed tasks can retry with alternative agents. Sessions checkpoint for resume.
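
As an illustration of the two mechanisms named above, here is a minimal sketch of repeated-action detection and a soft/hard time budget check. The names `StuckDetector`, `BudgetState`, and `check_budget`, and the "same action N times in a row" rule, are assumptions made for this example rather than FeLLAMA's actual API.

```rust
use std::collections::VecDeque;
use std::time::Duration;

/// Hypothetical detector: reports "stuck" when the same action has been
/// recorded `threshold` times in a row.
pub struct StuckDetector {
    recent: VecDeque<String>,
    threshold: usize,
}

impl StuckDetector {
    pub fn new(threshold: usize) -> Self {
        // A threshold of zero would be meaningless, so clamp it to one.
        Self { recent: VecDeque::new(), threshold: threshold.max(1) }
    }

    /// Records an action; returns true if the last `threshold` actions
    /// are all identical to it.
    pub fn record(&mut self, action: &str) -> bool {
        self.recent.push_back(action.to_string());
        if self.recent.len() > self.threshold {
            self.recent.pop_front();
        }
        self.recent.len() == self.threshold
            && self.recent.iter().all(|a| a == action)
    }
}

/// Hypothetical budget states: a soft limit is a warning, a hard limit a stop.
#[derive(Debug, PartialEq)]
pub enum BudgetState {
    Ok,
    SoftExceeded,
    HardExceeded,
}

pub fn check_budget(elapsed: Duration, soft: Duration, hard: Duration) -> BudgetState {
    if elapsed >= hard {
        BudgetState::HardExceeded
    } else if elapsed >= soft {
        BudgetState::SoftExceeded
    } else {
        BudgetState::Ok
    }
}
```

In a design like this, a soft limit would typically prompt the agent to wrap up with what it has, while a hard limit ends the run.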


Architecture

FeLLAMA uses a layered architecture. The CLI provides the user interface, the Gateway manages transport and sessions, and the Butler orchestrates work by dispatching standalone worker processes.

fellama-cli (Terminal UI · Virtual Shell · TUI)
    ↓ WebSocket
Gateway (Sessions · Status Fan-out · Replay)
    ↓
Butler Orchestrator (Objective Analysis · DAG Planning · Wave Dispatch · Review Loop)
    ↓ OS processes
Agent A · Agent B · Tool C · ...

Workers are standalone OS processes. They function correctly with or without a Gateway listener. Each worker owns its LLM lifecycle — the Butler dispatches but never proxies LLM calls.


Crate Structure

The workspace is organized into five crates, each with a single non-overlapping responsibility. If a function is useful to two or more crates, it belongs in fellama-core.

| Crate | Role | Dependencies |
| --- | --- | --- |
| fellama-core | Shared infrastructure: config, constants, OpenAI client, SQLite store, async dispatcher, IPC, orchestrator, simple agent | External crates only |
| fellama-agents | LLM-powered agents that call an LLM to achieve goals | fellama-core |
| fellama-tools | Deterministic tools that work without LLM calls | fellama-core |
| fellama-memory | History browsing, vector embeddings, semantic search | fellama-core |
| fellama-cli | Terminal UI, virtual shell, WebSocket client/server, orchestration | fellama-core |

Butler Orchestrator

The Butler is the brain of FeLLAMA. It receives user objectives, decomposes them into dependency-aware task graphs, dispatches agents in parallel waves, reviews results, and retries on failure.

DAG Execution

Tasks declare depends_on relationships. Outputs are wired between tasks via input_bindings. Independent tasks run in parallel within dependency waves.
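
A minimal sketch of how dependency waves can be derived from `depends_on` declarations. The `Task` struct and `plan_waves` function are hypothetical and cover only the grouping step; `input_bindings` wiring is not shown.

```rust
use std::collections::HashSet;

/// Hypothetical task shape: only `id` and `depends_on` matter for planning.
pub struct Task {
    pub id: String,
    pub depends_on: Vec<String>,
}

/// Groups tasks into waves so that every task in a wave depends only on
/// tasks from earlier waves. Returns Err on an unknown dependency or a cycle.
pub fn plan_waves(tasks: &[Task]) -> Result<Vec<Vec<String>>, String> {
    let known: HashSet<&str> = tasks.iter().map(|t| t.id.as_str()).collect();
    for t in tasks {
        for d in &t.depends_on {
            if !known.contains(d.as_str()) {
                return Err(format!("task {} depends on unknown task {}", t.id, d));
            }
        }
    }

    let mut done: HashSet<&str> = HashSet::new();
    let mut waves: Vec<Vec<String>> = Vec::new();
    while done.len() < tasks.len() {
        // A task is ready when it has not run and all its dependencies have.
        let ready: Vec<&str> = tasks
            .iter()
            .filter(|t| !done.contains(t.id.as_str()))
            .filter(|t| t.depends_on.iter().all(|d| done.contains(d.as_str())))
            .map(|t| t.id.as_str())
            .collect();
        if ready.is_empty() {
            // Nothing can make progress: the remaining tasks form a cycle.
            return Err("dependency cycle detected".to_string());
        }
        done.extend(ready.iter().copied());
        waves.push(ready.into_iter().map(String::from).collect());
    }
    Ok(waves)
}
```

For example, if `parse` and `extract` both depend on `fetch`, and `summarize` depends on `parse`, this yields three waves: `fetch`, then `parse` and `extract` together, then `summarize`.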

Review Loop

After each wave, an LLM reviews outputs and decides: Accept, Retry (with different agent/params), or Continue (append new tasks). Up to N review rounds before forced acceptance.
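
The bounded review loop described above could be sketched as follows. The three decision names come from the text; the payload fields, the `Outcome` type, and the `review` callback (a stand-in for the LLM reviewer) are assumptions for illustration.

```rust
/// Decisions named in the text; the payload fields are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub enum ReviewDecision {
    Accept,
    Retry { task_id: String },
    Continue { new_tasks: Vec<String> },
}

/// Hypothetical result of the loop.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Accepted { rounds: usize },
    ForcedAccept { rounds: usize },
}

/// Calls `review` once per round, up to `max_rounds` times. If no round
/// returns Accept, the result is accepted anyway (forced acceptance).
pub fn review_loop<F>(max_rounds: usize, mut review: F) -> Outcome
where
    F: FnMut(usize) -> ReviewDecision,
{
    for round in 1..=max_rounds {
        match review(round) {
            ReviewDecision::Accept => return Outcome::Accepted { rounds: round },
            ReviewDecision::Retry { .. } | ReviewDecision::Continue { .. } => {
                // A real orchestrator would re-dispatch the task or append
                // the new tasks and run another wave before reviewing again.
            }
        }
    }
    Outcome::ForcedAccept { rounds: max_rounds }
}
```

Capping the number of rounds guarantees termination even if the reviewer never accepts.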

Concurrency Control

A global semaphore limits maximum concurrent agents to prevent resource exhaustion. Round-robin dispatch ensures fairness across sessions.
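
To illustrate the semaphore idea, here is a small counting semaphore built only on the standard library, plus a helper that measures peak concurrency. This is a stand-in sketch: FeLLAMA's dispatcher is described as async and may well use a different primitive, and `Semaphore` and `run_limited` are hypothetical names. Round-robin dispatch is not shown.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Minimal counting semaphore (illustrative, not FeLLAMA's implementation).
pub struct Semaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

impl Semaphore {
    pub fn new(permits: usize) -> Self {
        Self { permits: Mutex::new(permits), available: Condvar::new() }
    }

    /// Blocks until a permit is free, then takes it.
    pub fn acquire(&self) {
        let mut p = self.permits.lock().unwrap();
        while *p == 0 {
            p = self.available.wait(p).unwrap();
        }
        *p -= 1;
    }

    /// Returns a permit and wakes one waiter.
    pub fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.available.notify_one();
    }
}

/// Runs `jobs` simulated workers with at most `limit` running at once and
/// returns the peak concurrency observed. `limit` must be at least 1.
pub fn run_limited(jobs: usize, limit: usize) -> usize {
    let sem = Arc::new(Semaphore::new(limit));
    let state = Arc::new(Mutex::new((0usize, 0usize))); // (running, peak)
    let handles: Vec<_> = (0..jobs)
        .map(|_| {
            let sem = Arc::clone(&sem);
            let state = Arc::clone(&state);
            thread::spawn(move || {
                sem.acquire();
                {
                    let mut s = state.lock().unwrap();
                    s.0 += 1;
                    s.1 = s.1.max(s.0);
                }
                thread::sleep(Duration::from_millis(5)); // simulated work
                state.lock().unwrap().0 -= 1;
                sem.release();
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let peak = state.lock().unwrap().1;
    peak
}
```

However many jobs are queued, the peak returned by `run_limited` never exceeds `limit`, which is the property the global semaphore is meant to enforce.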

Fault Recovery

Per-task alternatives allow failed tasks to retry with a different agent or skill. The review loop can inject corrective tasks based on observed failures.
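
A rough sketch of per-task alternatives: try the primary agent first, then each alternative in order, collecting the errors along the way. The function name, the error format, and the `run` callback (which would launch a worker in practice) are assumptions for this example.

```rust
/// Tries each candidate in order and returns the first success as
/// (agent name, output). If all fail, returns every error message.
pub fn run_with_alternatives<F>(
    candidates: &[&str],
    mut run: F,
) -> Result<(String, String), Vec<String>>
where
    F: FnMut(&str) -> Result<String, String>,
{
    let mut errors = Vec::new();
    for &agent in candidates {
        match run(agent) {
            Ok(output) => return Ok((agent.to_string(), output)),
            Err(e) => errors.push(format!("{agent}: {e}")),
        }
    }
    Err(errors)
}
```

Keeping the accumulated errors gives a later review step something concrete to base corrective tasks on.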


Agents

FeLLAMA ships with 9 LLM-powered agents. Each is a standalone binary that receives input via CLI arguments, communicates via NDJSON, and returns an AgentResponse envelope.

| Agent | Purpose | Pattern |
| --- | --- | --- |
| fellama-smartweb-agent | LLM-directed browser automation with CDP. Multi-turn research with stuck detection, memory management, quality gates, and time budgets. | Web Agent |
| fellama-skill-worker | Executes Agent Skill packages with tool-calling LLM loop, safety evaluation, and constraint isolation. | Orchestrated |
| fellama-content-transformer | Faithful content transformation and output-shape conversion. | Simple |
| fellama-info-distiller | Content distillation: extracts key facts from large text. | Simple |
| fellama-objective-analyzer | Normalizes user requests into structured objective objects. | Simple |
| fellama-summarizer | Text summarization with configurable output format. | Simple |
| fellama-validator | Validates agent outputs against expected schemas and constraints. | Simple |
| fellama-prompt-safety-advisor | Prompt risk scoring and safety evaluation before skill execution. | Simple |
| fellama-syslog-reviewer | Analyzes system and application logs for issues. | Simple |

Tools

12 deterministic tools that perform work without LLM calls. They produce repeatable output and are dispatched by the Butler alongside agents.

| Tool | Purpose |
| --- | --- |
| fellama-pdf-extractor | Extract text content from PDF files |
| fellama-spreadsheet-extractor | Parse XLSX, XLS, and CSV files |
| fellama-data-extractor | Structured data extraction from documents |
| fellama-doc-extractor | General document processing |
| fellama-data-vault | Secure data storage and retrieval with access control |
| fellama-find | File search utility |
| fellama-cron | Scheduled job execution |
| fellama-housekeeper | Session and memory cleanup |
| fellama-install-skill | Install skill packages from repositories |
| fellama-notifier | Event notification dispatch |
| fellama-search-tools | Tool and agent discovery |
| fellama-time | Time and scheduling utilities |

Memory Systems

Vector DB (LanceDB)

Semantic search over documents using embedding vectors. Documents are automatically chunked with configurable overlap, embedded via your configured embedding model, and stored in LanceDB.

  • Modes: --embedding, --file, --search, --get, --remove
  • Cosine distance similarity
  • UUID-based document lifecycle tracking
  • Configurable chunk size and overlap
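
The chunking and similarity steps above can be sketched as follows. This is a simplified illustration: it chunks by characters (the real unit, such as tokens or sentences, is not stated here), and `chunk_with_overlap` and `cosine_distance` are hypothetical helpers, not LanceDB or FeLLAMA functions.

```rust
/// Splits `text` into chunks of at most `size` characters. Consecutive
/// chunks share `overlap` characters, so each chunk starts `size - overlap`
/// characters after the previous one.
pub fn chunk_with_overlap(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(size > 0 && overlap < size, "overlap must be smaller than size");
    let chars: Vec<char> = text.chars().collect();
    let step = size - overlap;
    let mut chunks: Vec<String> = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break; // the last chunk reached the end of the text
        }
        start += step;
    }
    chunks
}

/// Cosine distance = 1 - cosine similarity. Returns None when the vectors
/// differ in length or either has zero magnitude.
pub fn cosine_distance(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(1.0 - dot / (norm_a * norm_b))
}
```

With a chunk size of 4 and an overlap of 2, the text `abcdefghij` becomes `abcd`, `cdef`, `efgh`, `ghij`. Identical directions give a distance of 0, and orthogonal vectors give 1.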

History (SQLite)

Browse and export past sessions and task results. Persists across server restarts. Available as both a CLI and TUI interface.

  • Session browsing with filters
  • Task result export
  • TUI and CLI interfaces
  • SQLite-backed for reliability

IPC Protocol

All inter-process communication uses NDJSON (newline-delimited JSON). Each line before the final line is a ProgressEvent. The final line is always an AgentResponse envelope.

NDJSON output stream
{"event":"step_started","task_id":"abc-123","step":"extracting page"}
{"event":"token","task_id":"abc-123","text":"Processing "}
{"event":"stream_token","task_id":"abc-123","text":"analysis..."}
{"event":"step_done","task_id":"abc-123","step":"extracting page","result":null}
{"task_id":"abc-123","status":"success","output":{"content":"..."},"error":null}
| Event | Purpose |
| --- | --- |
| StepStarted | A discrete step began (extraction, distillation, action) |
| StepDone | Step completed, optional result payload |
| Token | Streaming output token routed to main TUI pane |
| StreamToken | Streaming LLM token routed to TUI side panel |
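
A consumer of this stream has to treat the last line differently from the rest. The sketch below separates the two using only the standard library. `split_stream` and `event_name` are hypothetical helpers, and the string search in `event_name` is a deliberate simplification: a real reader would use a proper JSON parser.

```rust
/// Splits raw NDJSON output into (progress lines, final envelope line).
/// Blank lines are ignored. Returns None if there are no lines at all.
pub fn split_stream(raw: &str) -> Option<(Vec<&str>, &str)> {
    let mut lines: Vec<&str> = raw
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty())
        .collect();
    let envelope = lines.pop()?;
    Some((lines, envelope))
}

/// Naive extraction of the "event" field from one progress line.
/// Assumes compact JSON with no whitespace around the colon.
pub fn event_name(line: &str) -> Option<&str> {
    let key = "\"event\":\"";
    let start = line.find(key)? + key.len();
    let end = line[start..].find('"')? + start;
    Some(&line[start..end])
}
```

Because the envelope is always last, a reader can forward progress lines to the UI as they arrive and only needs to buffer the most recent line until the stream closes.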

AgentResponse Envelope

rust
struct AgentResponse {
    task_id:   Option<String>,
    parent_id: Option<String>,
    status:    AgentStatus,      // success | partial | error
    output:    Value,            // JSON value; must have a "content" key (HR-9)
    error:     Option<Value>,
}

Agent Patterns

FeLLAMA defines three reusable agent patterns. New agents implement one of these patterns and automatically inherit the IPC protocol, error handling, and CLI conventions.

Pattern 1: Simple Agent

Single-turn LLM call

Sends one prompt, parses one response. Used by the majority of agents: content-transformer, info-distiller, objective-analyzer, summarizer, validator, prompt-safety-advisor, syslog-reviewer. Logs to ~/.fellama/<binary-name>.log.

Pattern 2: Orchestrated Agent

Multi-turn tool-calling LLM loop

Runs an LLM in a tool-calling loop: send prompt + tools, execute returned calls, feed results back, repeat until done or max steps. Used by skill-worker. Session dir: ~/.fellama/skill-worker/<uuid>/.
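
The loop described above can be sketched as a small function. The LLM and the tool executor are passed in as closures so the example stays self-contained. `LlmReply`, `run_tool_loop`, and the transcript format are assumptions for illustration and do not reflect skill-worker's real types.

```rust
/// Hypothetical reply shape: the model either requests a tool or finishes.
pub enum LlmReply {
    ToolCall { name: String, args: String },
    Done(String),
}

/// Sends the transcript to `llm`, executes any requested tool via `tool`,
/// appends the result, and repeats until the model finishes or
/// `max_steps` is reached.
pub fn run_tool_loop<L, T>(
    prompt: &str,
    max_steps: usize,
    mut llm: L,
    mut tool: T,
) -> Result<String, String>
where
    L: FnMut(&[String]) -> LlmReply,
    T: FnMut(&str, &str) -> String,
{
    let mut transcript = vec![prompt.to_string()];
    for _ in 0..max_steps {
        match llm(&transcript) {
            LlmReply::Done(answer) => return Ok(answer),
            LlmReply::ToolCall { name, args } => {
                // Feed the tool result back so the next turn can use it.
                let result = tool(&name, &args);
                transcript.push(format!("tool {name}({args}) -> {result}"));
            }
        }
    }
    Err(format!("no final answer after {max_steps} steps"))
}
```

The step cap is what keeps a model that keeps requesting tools from looping indefinitely.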

Pattern 3: Web Agent

Turn-based browser loop

Drives a browser through an LLM planning loop. The LLM is stateless per turn — the agent owns all session state and rebuilds a full input JSON each turn. Used by smartweb-agent. Session dir: ~/.fellama/<session-id>/.