Design Philosophy
FeLLAMA is built around core principles that guide every architectural decision. These aren't aspirational — they are enforced through code structure, hard rules, and review.
The Gateway handles transport: WebSocket sessions, status fan-out, replay, and scheduling. The Butler handles orchestration: intake, objective routing, dispatch, and result ownership. These two layers never mix.
Each agent creates its own LLM client, manages its own context window, and handles its own retries. The Butler controls whether to launch a worker, not how it talks to the LLM. No centralized LLM proxy.
Skill packages are evaluated by the Prompt Safety Advisor before execution. Workers run with constraint isolation — restricted file access, script timeouts, and scoped resources.
Trace logging at every process and network boundary. Every LLM request, WebSocket message, and subprocess invocation is captured. Single-turn agents use global logs; multi-turn agents use session-scoped traces.
All agents follow shared patterns (Simple, Orchestrated, Web), use the same output envelope, the same error types, and the same CLI argument conventions. New agents slot in without framework changes.
Stuck detection identifies repeated actions and stalled research. Time budgets enforce soft and hard limits. Failed tasks can retry with alternative agents. Sessions checkpoint for resume.
Architecture
FeLLAMA uses a layered architecture. The CLI provides the user interface, the Gateway manages transport and sessions, and the Butler orchestrates work by dispatching standalone worker processes.
Workers are standalone OS processes. They function correctly with or without a Gateway listener. Each worker owns its LLM lifecycle — the Butler dispatches but never proxies LLM calls.
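The dispatch contract can be sketched from the Butler's side, assuming only the standard library and using `printf` as a stand-in for a real worker binary: every stdout line but the last is a progress event, and the last line is the envelope.

```rust
use std::io::{BufRead, BufReader};
use std::process::{Command, Stdio};

/// Split a worker's stdout lines: everything before the last line is a
/// progress event; the last line is the AgentResponse envelope.
fn split_stream(mut lines: Vec<String>) -> Option<(Vec<String>, String)> {
    let envelope = lines.pop()?;
    Some((lines, envelope))
}

fn main() {
    // `printf` stands in for a worker binary; a real dispatch would pass
    // the task input as CLI arguments to the agent executable.
    let mut child = Command::new("printf")
        .arg("%s\n")
        .arg(r#"{"event":"step_started","task_id":"t1","step":"demo"}"#)
        .arg(r#"{"task_id":"t1","status":"success","output":{"content":"done"},"error":null}"#)
        .stdout(Stdio::piped())
        .spawn()
        .expect("failed to spawn worker");

    let lines: Vec<String> = BufReader::new(child.stdout.take().unwrap())
        .lines()
        .collect::<Result<_, _>>()
        .unwrap();
    child.wait().unwrap();

    let (progress, envelope) = split_stream(lines).expect("worker emitted no output");
    println!("{} progress event(s), envelope: {}", progress.len(), envelope);
}
```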
Crate Structure
The workspace is organized into five crates, each with a single non-overlapping responsibility. If a function is useful to two or more crates, it belongs in fellama-core.
| Crate | Role | Dependencies |
|---|---|---|
| fellama-core | Shared infrastructure: config, constants, OpenAI client, SQLite store, async dispatcher, IPC, orchestrator, simple agent | External crates only |
| fellama-agents | LLM-powered agents that call an LLM to achieve goals | fellama-core |
| fellama-tools | Deterministic tools that work without LLM calls | fellama-core |
| fellama-memory | History browsing, vector embeddings, semantic search | fellama-core |
| fellama-cli | Terminal UI, virtual shell, WebSocket client/server, orchestration | fellama-core |
Butler Orchestrator
The Butler is the brain of FeLLAMA. It receives user objectives, decomposes them into dependency-aware task graphs, dispatches agents in parallel waves, reviews results, and retries on failure.
Tasks declare depends_on relationships, and task outputs are wired into downstream inputs via input_bindings. Independent tasks run in parallel within waves.
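A minimal sketch of that layering, assuming only the standard library and hypothetical task IDs: each wave collects the tasks whose depends_on entries have all been scheduled in earlier waves.

```rust
use std::collections::{HashMap, HashSet};

/// Group tasks into parallel waves: a task joins a wave once all of its
/// depends_on entries have been scheduled in an earlier wave. This is a
/// simplified illustration, not the Butler's actual planner.
fn plan_waves(deps: &HashMap<String, Vec<String>>) -> Vec<Vec<String>> {
    let mut done: HashSet<String> = HashSet::new();
    let mut waves = Vec::new();
    while done.len() < deps.len() {
        let mut wave: Vec<String> = deps
            .iter()
            .filter(|(id, d)| !done.contains(*id) && d.iter().all(|p| done.contains(p)))
            .map(|(id, _)| id.clone())
            .collect();
        if wave.is_empty() {
            break; // unsatisfiable dependency (cycle): stop planning
        }
        wave.sort();
        done.extend(wave.iter().cloned());
        waves.push(wave);
    }
    waves
}

fn main() {
    let mut deps: HashMap<String, Vec<String>> = HashMap::new();
    deps.insert("fetch".into(), vec![]);
    deps.insert("distill".into(), vec!["fetch".into()]);
    deps.insert("summarize".into(), vec!["fetch".into()]);
    deps.insert("report".into(), vec!["distill".into(), "summarize".into()]);
    // fetch runs alone, then distill + summarize in parallel, then report.
    for (i, wave) in plan_waves(&deps).iter().enumerate() {
        println!("wave {}: {:?}", i, wave);
    }
}
```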
After each wave, an LLM reviews outputs and decides: Accept, Retry, or Continue (append new tasks).
A global semaphore limits concurrent agents. Round-robin dispatch ensures fairness across sessions.
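The fairness half of this can be sketched as a round-robin pass over per-session queues (a minimal std-only illustration; the semaphore side is omitted and the queue shapes are assumptions):

```rust
use std::collections::VecDeque;

/// Round-robin sketch: rotate through per-session task queues, taking one
/// task from the next non-empty queue, so no session starves the others.
fn next_task(queues: &mut Vec<(String, VecDeque<String>)>, cursor: &mut usize) -> Option<String> {
    for _ in 0..queues.len() {
        let i = *cursor % queues.len();
        *cursor = i + 1;
        if let Some(task) = queues[i].1.pop_front() {
            return Some(task);
        }
    }
    None // every queue is drained
}

fn main() {
    let mut queues = vec![
        ("alice".to_string(), VecDeque::from(vec!["a1".to_string(), "a2".to_string()])),
        ("bob".to_string(), VecDeque::from(vec!["b1".to_string()])),
    ];
    let mut cursor = 0;
    // Dispatch interleaves sessions: a1, b1, a2.
    while let Some(task) = next_task(&mut queues, &mut cursor) {
        println!("dispatch {}", task);
    }
}
```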
Per-task alternatives allow failed tasks to retry with a different agent or skill. Review loop injects corrective tasks.
Agents
FeLLAMA ships with 9 LLM-powered agents. Each is a standalone binary that receives input via CLI arguments, communicates via NDJSON, and returns an AgentResponse envelope.
| Agent | Purpose | Pattern |
|---|---|---|
| fellama-smartweb-agent | LLM-directed browser automation with CDP, stuck detection, quality gates, time budgets | Web Agent |
| fellama-skill-worker | Executes Agent Skill packages with tool-calling LLM loop, safety evaluation | Orchestrated |
| fellama-content-transformer | Faithful content transformation and output-shape conversion | Simple |
| fellama-info-distiller | Content distillation — extracts key facts from large text | Simple |
| fellama-objective-analyzer | Normalizes user requests into structured objective objects | Simple |
| fellama-summarizer | Text summarization with configurable output format | Simple |
| fellama-validator | Validates agent outputs against expected schemas and constraints | Simple |
| fellama-prompt-safety-advisor | Prompt risk scoring and safety evaluation before skill execution | Simple |
| fellama-syslog-reviewer | Analyzes system and application logs for issues | Simple |
Tools
12 deterministic tools that perform work without LLM calls. They produce repeatable output and are dispatched by the Butler alongside agents.
| Tool | Purpose |
|---|---|
| fellama-pdf-extractor | Extract text content from PDF files |
| fellama-spreadsheet-extractor | Parse XLSX, XLS, and CSV files |
| fellama-data-extractor | Structured data extraction from documents |
| fellama-doc-extractor | General document processing |
| fellama-data-vault | Secure data storage and retrieval with access control |
| fellama-find | File search utility |
| fellama-cron | Scheduled job execution |
| fellama-housekeeper | Session and memory cleanup |
| fellama-install-skill | Install skill packages from repositories |
| fellama-notifier | Event notification dispatch |
| fellama-search-tools | Tool and agent discovery |
| fellama-time | Time and scheduling utilities |
Memory Systems
Semantic search over documents using embedding vectors. Documents are automatically chunked with configurable overlap, embedded via your configured model, and stored in LanceDB.
- Modes: --embedding, --file, --search, --get, --remove
- Cosine distance similarity
- UUID-based document lifecycle
- Configurable chunk size and overlap
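The chunking step above can be sketched as a sliding character window with configurable size and overlap (a simplified, std-only illustration; the real chunker's boundary rules are not specified here):

```rust
/// Split text into fixed-size chunks with overlap before embedding.
/// Real chunkers usually split on token or sentence boundaries rather
/// than raw characters; this sketch uses characters for simplicity.
fn chunk(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < size, "overlap must be smaller than chunk size");
    let chars: Vec<char> = text.chars().collect();
    let step = size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break; // final chunk reached the end of the text
        }
        start += step;
    }
    chunks
}

fn main() {
    // size 4, overlap 2: consecutive chunks share two characters.
    for c in chunk("abcdefghij", 4, 2) {
        println!("{}", c);
    }
}
```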
Browse and export past sessions and task results. Persists across server restarts. Available as both CLI and TUI interfaces.
- Session browsing with filters
- Task result export
- TUI and CLI interfaces
- SQLite-backed for reliability
IPC Protocol
All inter-process communication uses NDJSON (newline-delimited JSON). Each line before the final line is a ProgressEvent. The final line is always an AgentResponse envelope.
```json
{"event":"step_started","task_id":"abc-123","step":"extracting page"}
{"event":"token","task_id":"abc-123","text":"Processing "}
{"event":"stream_token","task_id":"abc-123","text":"analysis..."}
{"event":"step_done","task_id":"abc-123","step":"extracting page","result":null}
{"task_id":"abc-123","status":"success","output":{"content":"..."},"error":null}
```
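Progress lines can be classified by their event field. A real reader would use a JSON parser such as serde_json; this std-only sketch just scans for the key, which suffices for well-formed single-line events:

```rust
/// Pull the "event" value out of a progress line; returns None for lines
/// without one (i.e. the final AgentResponse envelope). A simplified
/// stand-in for proper JSON parsing.
fn event_kind(line: &str) -> Option<&str> {
    let key = "\"event\":\"";
    let start = line.find(key)? + key.len();
    let end = line[start..].find('"')? + start;
    Some(&line[start..end])
}

fn main() {
    let stream = [
        r#"{"event":"step_started","task_id":"abc-123","step":"extracting page"}"#,
        r#"{"event":"token","task_id":"abc-123","text":"Processing "}"#,
        r#"{"task_id":"abc-123","status":"success","output":{"content":"..."},"error":null}"#,
    ];
    for line in stream.iter() {
        match event_kind(line) {
            Some(kind) => println!("progress: {}", kind),
            None => println!("envelope: {}", line),
        }
    }
}
```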
| Event | Purpose |
|---|---|
| StepStarted | A discrete step began (extraction, distillation, action) |
| StepDone | Step completed, optional result payload |
| Token | Streaming output token routed to main TUI pane |
| StreamToken | Streaming LLM token routed to TUI side panel |
AgentResponse Envelope
```rust
struct AgentResponse {
    task_id: Option<String>,
    parent_id: Option<String>,
    status: AgentStatus, // success | partial | error
    output: Value,       // must have "content" key (HR-9)
    error: Option<Value>,
}
```
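A simplified, std-only stand-in for the envelope and the HR-9 check: `output` here holds the raw JSON text, and the substring test approximates what a real validator would do on parsed JSON (e.g. via serde_json).

```rust
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum AgentStatus {
    Success,
    Partial,
    Error,
}

/// Simplified envelope: output carries the serialized JSON object.
#[allow(dead_code)]
struct AgentResponse {
    task_id: Option<String>,
    status: AgentStatus,
    output: String,
    error: Option<String>,
}

impl AgentResponse {
    /// HR-9: the output object must contain a "content" key. A substring
    /// scan stands in for inspecting the parsed object.
    fn satisfies_hr9(&self) -> bool {
        self.output.contains("\"content\"")
    }
}

fn main() {
    let resp = AgentResponse {
        task_id: Some("abc-123".to_string()),
        status: AgentStatus::Success,
        output: r#"{"content":"done"}"#.to_string(),
        error: None,
    };
    println!("HR-9 ok: {}", resp.satisfies_hr9());
}
```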
Agent Patterns
FeLLAMA defines three reusable agent patterns. New agents implement one of these and automatically inherit the IPC protocol, error handling, and CLI conventions.
Simple: sends one prompt and parses one response. Used by: objective-analyzer, summarizer, validator, info-distiller, prompt-safety-advisor, syslog-reviewer. Logs to ~/.fellama/<binary-name>.log.
Orchestrated: runs an LLM in a tool-calling loop (send prompt plus tools, execute the returned calls, feed results back, repeat until done or max steps). Used by skill-worker. Session dir: ~/.fellama/skill-worker/<uuid>/.
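The loop skeleton can be sketched with a closure standing in for the LLM client and a hypothetical tool executor (both are assumptions for illustration, not the skill-worker's actual types):

```rust
/// One model turn either requests a tool call or produces a final answer.
enum ModelTurn {
    ToolCall(String, String),
    Final(String),
}

/// Hypothetical tool executor; a real skill-worker would dispatch to the
/// skill package's declared tools.
fn execute_tool(name: &str, arg: &str) -> String {
    format!("{}({}) -> ok", name, arg)
}

/// Orchestrated-pattern skeleton: call the model, execute any tool call it
/// returns, feed the result back, and stop on a final answer or when the
/// step budget is exhausted.
fn run_loop<M>(mut model: M, max_steps: usize) -> Option<String>
where
    M: FnMut(&str) -> ModelTurn,
{
    let mut observation = String::from("start");
    for _ in 0..max_steps {
        match model(&observation) {
            ModelTurn::ToolCall(name, arg) => {
                observation = execute_tool(&name, &arg);
            }
            ModelTurn::Final(answer) => return Some(answer),
        }
    }
    None // budget exhausted without a final answer
}

fn main() {
    // Scripted stand-in model: one tool call, then a final answer.
    let mut step = 0;
    let result = run_loop(
        move |_obs| {
            step += 1;
            if step == 1 {
                ModelTurn::ToolCall("read_file".into(), "notes.txt".into())
            } else {
                ModelTurn::Final("summary complete".into())
            }
        },
        8,
    );
    println!("{:?}", result);
}
```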
Web Agent: drives a browser through an LLM planning loop. The LLM is stateless per turn; the agent owns all session state and rebuilds the full input JSON each turn. Used by smartweb-agent. Session dir: ~/.fellama/<session-id>/.