Design Philosophy

FeLLAMA is built around core principles that guide every architectural decision. These aren't aspirational — they are enforced through code structure, hard rules, and review.

Separation of Concerns

The Gateway handles transport: WebSocket sessions, status fan-out, replay, and scheduling. The Butler handles orchestration: intake, objective routing, dispatch, and result ownership. These two layers never mix.

Worker Autonomy

Each agent creates its own LLM client, manages its own context window, and handles its own retries. The Butler controls whether to launch a worker, not how it talks to the LLM. No centralized LLM proxy.

Safety First

Skill packages are evaluated by the Prompt Safety Advisor before execution. Workers run with constraint isolation — restricted file access, script timeouts, and scoped resources.

Full Observability

Trace logging at every process and network boundary. Every LLM request, WebSocket message, and subprocess invocation is captured. Single-turn agents use global logs; multi-turn agents use session-scoped traces.

Composability

All agents follow shared patterns (Simple, Orchestrated, Web), use the same output envelope, the same error types, and the same CLI argument conventions. New agents slot in without framework changes.

Fault Tolerance

Stuck detection identifies repeated actions and stalled research. Time budgets enforce soft and hard limits. Failed tasks can retry with alternative agents. Sessions checkpoint for resume.
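As a minimal sketch of how soft and hard time budgets might be checked (the `TimeBudget` and `Budget` names here are illustrative, not FeLLAMA's actual types):

```rust
use std::time::{Duration, Instant};

/// Outcome of a budget check (illustrative names).
#[derive(Debug, PartialEq)]
enum Budget {
    Ok,
    SoftExceeded, // wrap up: finish the current step, emit the best result so far
    HardExceeded, // stop immediately
}

struct TimeBudget {
    started: Instant,
    soft: Duration,
    hard: Duration,
}

impl TimeBudget {
    fn new(soft: Duration, hard: Duration) -> Self {
        Self { started: Instant::now(), soft, hard }
    }

    fn check(&self) -> Budget {
        let elapsed = self.started.elapsed();
        if elapsed >= self.hard {
            Budget::HardExceeded
        } else if elapsed >= self.soft {
            Budget::SoftExceeded
        } else {
            Budget::Ok
        }
    }
}

fn main() {
    let budget = TimeBudget::new(Duration::from_secs(60), Duration::from_secs(120));
    // A freshly created budget is well inside the soft limit.
    assert_eq!(budget.check(), Budget::Ok);
}
```

An agent would call `check()` between steps: a soft overrun changes strategy, a hard overrun aborts the run.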


Architecture

FeLLAMA uses a layered architecture. The CLI provides the user interface, the Gateway manages transport and sessions, and the Butler orchestrates work by dispatching standalone worker processes.

fellama-cli
    Terminal UI · Virtual Shell · TUI
        │  WebSocket
Gateway
    Sessions · Status Fan-out · Replay
        │
Butler Orchestrator
    Objective Analysis · DAG Planning
    Wave Dispatch · Review Loop
        │  OS processes
Agent A · Agent B · Tool C

Workers are standalone OS processes. They function correctly with or without a Gateway listener. Each worker owns its LLM lifecycle — the Butler dispatches but never proxies LLM calls.


Crate Structure

The workspace is organized into five crates, each with a single non-overlapping responsibility. If a function is useful to two or more crates, it belongs in fellama-core.

  • fellama-core - Shared infrastructure: config, constants, OpenAI client, SQLite store, async dispatcher, IPC, orchestrator, simple agent. Depends on external crates only.
  • fellama-agents - LLM-powered agents that call an LLM to achieve goals. Depends on fellama-core.
  • fellama-tools - Deterministic tools that work without LLM calls. Depends on fellama-core.
  • fellama-memory - History browsing, vector embeddings, semantic search. Depends on fellama-core.
  • fellama-cli - Terminal UI, virtual shell, WebSocket client/server, orchestration. Depends on fellama-core.

Butler Orchestrator

The Butler is the brain of FeLLAMA. It receives user objectives, decomposes them into dependency-aware task graphs, dispatches agents in parallel waves, reviews results, and retries on failure.

DAG Execution

Tasks declare depends_on relationships, and outputs are wired to downstream inputs via input_bindings. Independent tasks run in parallel within waves.
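A simplified sketch of wave planning from depends_on declarations (illustrative only; the Butler's real planner may differ):

```rust
use std::collections::{HashMap, HashSet};

/// Group tasks into waves: each wave holds the tasks whose `depends_on`
/// entries are all satisfied by earlier waves, so they can run in parallel.
fn plan_waves(deps: &HashMap<String, Vec<String>>) -> Vec<Vec<String>> {
    let mut done: HashSet<String> = HashSet::new();
    let mut waves = Vec::new();
    while done.len() < deps.len() {
        let mut wave: Vec<String> = deps
            .iter()
            .filter(|(id, ds)| !done.contains(*id) && ds.iter().all(|d| done.contains(d)))
            .map(|(id, _)| id.clone())
            .collect();
        // If no task became ready, the graph has a cycle.
        assert!(!wave.is_empty(), "dependency cycle detected");
        wave.sort(); // deterministic ordering for display
        for id in &wave {
            done.insert(id.clone());
        }
        waves.push(wave);
    }
    waves
}

fn main() {
    let mut deps = HashMap::new();
    deps.insert("fetch".to_string(), vec![]);
    deps.insert("parse".to_string(), vec!["fetch".to_string()]);
    deps.insert("summarize".to_string(), vec!["parse".to_string()]);
    deps.insert("validate".to_string(), vec!["parse".to_string()]);
    // summarize and validate share a wave because both depend only on parse.
    let waves = plan_waves(&deps);
    assert_eq!(waves.len(), 3);
}
```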

Review Loop

After each wave, an LLM reviews outputs and decides: Accept, Retry, or Continue (append new tasks).

Concurrency Control

A global semaphore limits concurrent agents. Round-robin dispatch ensures fairness across sessions.

Fault Recovery

Per-task alternatives allow failed tasks to retry with a different agent or skill. The review loop can also inject corrective tasks.


Agents

FeLLAMA ships with 9 LLM-powered agents. Each is a standalone binary that receives input via CLI arguments, communicates via NDJSON, and returns an AgentResponse envelope.

  • fellama-smartweb-agent - LLM-directed browser automation with CDP, stuck detection, quality gates, time budgets (Web Agent)
  • fellama-skill-worker - Executes Agent Skill packages with a tool-calling LLM loop and safety evaluation (Orchestrated)
  • fellama-content-transformer - Faithful content transformation and output-shape conversion (Simple)
  • fellama-info-distiller - Content distillation: extracts key facts from large text (Simple)
  • fellama-objective-analyzer - Normalizes user requests into structured objective objects (Simple)
  • fellama-summarizer - Text summarization with configurable output format (Simple)
  • fellama-validator - Validates agent outputs against expected schemas and constraints (Simple)
  • fellama-prompt-safety-advisor - Prompt risk scoring and safety evaluation before skill execution (Simple)
  • fellama-syslog-reviewer - Analyzes system and application logs for issues (Simple)

Tools

12 deterministic tools that perform work without LLM calls. They produce repeatable output and are dispatched by the Butler alongside agents.

  • fellama-pdf-extractor - Extract text content from PDF files
  • fellama-spreadsheet-extractor - Parse XLSX, XLS, and CSV files
  • fellama-data-extractor - Structured data extraction from documents
  • fellama-doc-extractor - General document processing
  • fellama-data-vault - Secure data storage and retrieval with access control
  • fellama-find - File search utility
  • fellama-cron - Scheduled job execution
  • fellama-housekeeper - Session and memory cleanup
  • fellama-install-skill - Install skill packages from repositories
  • fellama-notifier - Event notification dispatch
  • fellama-search-tools - Tool and agent discovery
  • fellama-time - Time and scheduling utilities

Memory Systems

Vector DB (LanceDB)

Semantic search over documents using embedding vectors. Documents are automatically chunked with configurable overlap, embedded via your configured model, and stored in LanceDB.

  • Modes: --embedding, --file, --search, --get, --remove
  • Cosine distance similarity
  • UUID-based document lifecycle
  • Configurable chunk size and overlap
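Chunking with overlap, in miniature (byte-based for brevity; a real chunker would respect character or token boundaries, and this is not FeLLAMA's actual implementation):

```rust
/// Split text into fixed-size chunks where each chunk repeats the last
/// `overlap` bytes of the previous one, so sentences straddling a chunk
/// boundary still appear whole in at least one embedded chunk.
fn chunk(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(size > overlap, "overlap must be smaller than chunk size");
    let bytes = text.as_bytes();
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < bytes.len() {
        let end = (start + size).min(bytes.len());
        chunks.push(String::from_utf8_lossy(&bytes[start..end]).into_owned());
        if end == bytes.len() {
            break;
        }
        start = end - overlap; // step back to create the overlap window
    }
    chunks
}

fn main() {
    // size 4 with overlap 2: each chunk shares 2 bytes with its neighbor.
    assert_eq!(chunk("abcdefgh", 4, 2), vec!["abcd", "cdef", "efgh"]);
}
```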

History (SQLite)

Browse and export past sessions and task results. Persists across server restarts. Available through both a CLI and a TUI interface.

  • Session browsing with filters
  • Task result export
  • TUI and CLI interfaces
  • SQLite-backed for reliability

IPC Protocol

All inter-process communication uses NDJSON (newline-delimited JSON). Each line before the final line is a ProgressEvent. The final line is always an AgentResponse envelope.

NDJSON output stream
{"event":"step_started","task_id":"abc-123","step":"extracting page"}
{"event":"token","task_id":"abc-123","text":"Processing "}
{"event":"stream_token","task_id":"abc-123","text":"analysis..."}
{"event":"step_done","task_id":"abc-123","step":"extracting page","result":null}
{"task_id":"abc-123","status":"success","output":{"content":"..."},"error":null}
  • StepStarted - A discrete step began (extraction, distillation, action)
  • StepDone - Step completed, with an optional result payload
  • Token - Streaming output token routed to the main TUI pane
  • StreamToken - Streaming LLM token routed to the TUI side panel
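A consumer can split the stream by treating every line carrying an "event" key as a ProgressEvent and the final line as the envelope. The sketch below classifies lines by substring match rather than real JSON parsing (a production consumer would use a JSON parser such as serde_json):

```rust
/// Split a worker's NDJSON output into progress-event lines and the
/// final AgentResponse line. Per the protocol, only the final line
/// lacks a top-level "event" key.
fn split_stream(stream: &str) -> (Vec<&str>, Option<&str>) {
    let mut events = Vec::new();
    let mut envelope = None;
    for line in stream.lines().filter(|l| !l.trim().is_empty()) {
        if line.contains("\"event\"") {
            events.push(line);
        } else {
            envelope = Some(line);
        }
    }
    (events, envelope)
}

fn main() {
    let stream = concat!(
        "{\"event\":\"step_started\",\"task_id\":\"abc-123\",\"step\":\"extracting page\"}\n",
        "{\"event\":\"token\",\"task_id\":\"abc-123\",\"text\":\"Processing \"}\n",
        "{\"task_id\":\"abc-123\",\"status\":\"success\",\"output\":{\"content\":\"...\"},\"error\":null}",
    );
    let (events, envelope) = split_stream(stream);
    assert_eq!(events.len(), 2);
    assert!(envelope.unwrap().contains("\"status\":\"success\""));
}
```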

AgentResponse Envelope

Rust
AgentResponse {
    task_id:   Option<String>,
    parent_id: Option<String>,
    status:    AgentStatus,      // success | partial | error
    output:    Value,            // must have "content" key (HR-9)
    error:     Option<Value>,
}

Agent Patterns

FeLLAMA defines three reusable agent patterns. New agents implement one of these and automatically inherit the IPC protocol, error handling, and CLI conventions.

Pattern 1: Simple Agent
Single-turn LLM call

Sends one prompt, parses one response. Used by: objective-analyzer, summarizer, validator, info-distiller, prompt-safety-advisor, syslog-reviewer. Logs to ~/.fellama/<binary-name>.log.

Pattern 2: Orchestrated Agent
Multi-turn tool-calling LLM loop

Runs an LLM in a tool-calling loop: send prompt + tools, execute returned calls, feed results back, repeat until done or max steps. Used by skill-worker. Session dir: ~/.fellama/skill-worker/<uuid>/.
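The loop can be sketched with a stub model closure standing in for the real chat-completions call (all names here are illustrative, not skill-worker's actual types):

```rust
/// What the model returns each turn: either a tool invocation or a
/// final answer. (Illustrative; a real reply would carry structured
/// tool-call arguments.)
enum LlmReply {
    ToolCall { name: String, args: String },
    Final(String),
}

/// The orchestrated loop in miniature: call the model, execute any tool
/// call it returns, append the result to the transcript, repeat until a
/// final answer or `max_steps` is reached.
fn run_loop(
    mut llm: impl FnMut(&[String]) -> LlmReply,
    run_tool: impl Fn(&str, &str) -> String,
    max_steps: usize,
) -> Option<String> {
    let mut transcript: Vec<String> = vec!["objective".to_string()];
    for _ in 0..max_steps {
        match llm(&transcript) {
            LlmReply::Final(answer) => return Some(answer),
            LlmReply::ToolCall { name, args } => {
                let result = run_tool(&name, &args);
                transcript.push(format!("tool {name} -> {result}"));
            }
        }
    }
    None // step budget exhausted without a final answer
}

fn main() {
    // Stub model: one tool call, then a final answer.
    let answer = run_loop(
        |transcript| {
            if transcript.len() == 1 {
                LlmReply::ToolCall { name: "fellama-time".into(), args: "{}".into() }
            } else {
                LlmReply::Final("done".into())
            }
        },
        |name, _args| format!("{name} ok"),
        8,
    );
    assert_eq!(answer.as_deref(), Some("done"));
}
```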

Pattern 3: Web Agent
Turn-based browser loop

Drives a browser through an LLM planning loop. The LLM is stateless per turn — the agent owns all session state and rebuilds a full input JSON each turn. Used by smartweb-agent. Session dir: ~/.fellama/<session-id>/.
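Rebuilding the turn input from agent-owned state might look like the following (field names are illustrative, and real code would use a JSON serializer rather than string formatting, which skips proper escaping):

```rust
/// Because the LLM is stateless per turn, the agent serializes everything
/// it owns (goal, current page snapshot, action history) into one input
/// string every turn. Hand-rolled formatting here for illustration only.
fn build_turn_input(goal: &str, page: &str, history: &[String]) -> String {
    let hist = history
        .iter()
        .map(|h| format!("\"{h}\""))
        .collect::<Vec<_>>()
        .join(",");
    format!("{{\"goal\":\"{goal}\",\"page\":\"{page}\",\"history\":[{hist}]}}")
}

fn main() {
    let input = build_turn_input("find pricing", "<html>...</html>", &["clicked nav".to_string()]);
    // Every turn gets the full state, not a delta.
    assert!(input.contains("\"history\":[\"clicked nav\"]"));
}
```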