Hermes Agent: one Python loop, thirty-plus platforms, no plugin manifest

Hermes Agent is an open-source, terminal-first AI agent framework built by Nous Research. It runs as a CLI you install on your machine, holds a conversation, calls tools, executes code, reads your files, and persists every turn to a local SQLite database. The same agent answers from your terminal at noon and from your phone over Telegram at midnight, on the same session, with the same memory.

Out of the box it ships under MIT licence with adapters for Slack, Discord, Telegram, WhatsApp, Matrix, SMS, email, WeCom, Feishu, DingTalk, Signal, Mattermost, BlueBubbles, Home Assistant, and a long tail of others — thirty-plus in total. The same binary runs locally, in Docker, over SSH, on Modal, on Daytona, on Vercel sandboxes, or in a Singularity container. Cron jobs schedule themselves. Worker sub-agents coordinate over a durable kanban board. Zed, VS Code, and JetBrains plug in over ACP; MCP clients see it as a server.

I spent a couple of days reading the source. The design choices are sharper than the README suggests, and they make Hermes one of the most architecturally interesting agent codebases shipping in 2026.

The shape

Hermes is a flat Python repo. The top-level looks roughly like this:

agent/            cron/            plugins/         tools/
acp_adapter/      gateway/         providers/       tui_gateway/
acp_registry/    hermes_cli/       run_agent.py     ui-tui/
batch_runner.py  hermes_state.py   skills/
cli.py           mcp_serve.py      optional-skills/

A handful of those do almost all the work:

run_agent.py — the AIAgent class. About 4,100 lines, down from much larger after recent extractions.
agent/conversation_loop.py — the run_conversation() body, lifted out of run_agent.py into its own ~4,100-line module.
tools/ — every tool the agent can call, one Python file each, self-registering.
tools/environments/ — eight sandbox backends (local, Docker, SSH, Modal, managed Modal, Daytona, Singularity, Vercel).
gateway/platforms/ — thirty-plus files, one per messaging platform, plus more in plugins/platforms/.
hermes_state.py — SQLite session store, FTS5 search, schema v11.
agent/ — memory orchestration, trajectory compression, skill discovery.
skills/, optional-skills/ — markdown files the agent reads as instructions.
cron/, hermes_cli/kanban_*.py — scheduled work and a durable kanban dispatcher.
acp_adapter/, mcp_serve.py — bridges to editors and MCP clients.

The interesting thing about that list is what is missing. There is no core/. There is no framework/. There is no abstract base class for "agent". The agent is one class with one entry-point function. Everything else is a peripheral that snaps onto it.

The core loop

AIAgent.run_conversation() is a synchronous turn-by-turn loop in the ReAct shape: ask the model, dispatch any tool calls, observe the results, ask again, return when the model emits no more tool calls. Unremarkable in shape:

while api_calls < max_iterations and budget.remaining > 0:
    response = model.complete(messages, tools=tool_defs)
    if response.tool_calls:
        for call in response.tool_calls:
            result = handle_function_call(call.name, call.args)
            messages.append(observation_for(result))
    else:
        return response  # plain assistant turn, conversation done

There is no implicit planner. There is no orchestrator. If the agent wants to spawn a sub-agent, it calls a delegate_task tool (tools/delegate_tool.py), which spawns a fresh AIAgent and returns its final answer. Multi-agent is a tool, not a framework.

This matters because every agent framework people complain about — LangGraph, AutoGen, the planner-executor split — solved an organizational problem ("where does control live?") by inventing an abstraction: a graph, a swarm, a director. Hermes solves it by not having an abstraction. Control lives in the loop. The loop calls tools. One of the tools spawns another loop. That's it.

The forwarder pattern, and the refactor that didn't change the architecture

The class definition lives in run_agent.py. The implementation of its central method does not. The signature is preserved, but the body is a three-line forwarder:

def run_conversation(self, user_message, ...):
    """Forwarder — see ``agent.conversation_loop.run_conversation``."""
    from agent.conversation_loop import run_conversation
    return run_conversation(self, user_message, ...)

The actual body is in agent/conversation_loop.py, which opens with this:

This is the biggest single chunk pulled out of run_agent.py: the roughly 3,900-line run_conversation body that drives one user turn through the agent (model call, tool dispatch, retries, fallbacks, compression, post-turn hooks, background memory/skill review nudges).

The forwarder pattern is repeated in at least one other place: _run_codex_app_server_turn forwards to agent/codex_runtime.run_codex_app_server_turn the same way. Symbols that production code or tests patch on run_agent directly — handle_function_call, _set_interrupt, the OpenAI import — are still resolved through a _ra() indirection inside the extracted module, so monkeypatches keep working.

This is what a careful big-file refactor looks like. The class shape doesn't move. The patches don't break. The implementation gets pulled out one peer module at a time. You can read the file structure and watch the work in progress.

Tools, and a registry that reads its own AST

The tool system is the cleanest part of the codebase. Each tool is a single Python file in tools/. At module load, each file calls registry.register(...) to declare its schema, handler, and toolset membership. The agent reads from one registry. There is no decorator. There is no plugin manifest. There is no class hierarchy.

The discovery is the interesting part. discover_builtin_tools() in tools/registry.py doesn't just import every file in the directory. That would be too eager. It parses each file's AST and imports only the ones that contain a top-level registry.register(...) call:

def _is_registry_register_call(node: ast.AST) -> bool:
    if not isinstance(node, ast.Expr) or not isinstance(node.value, ast.Call):
        return False
    func = node.value.func
    return (
        isinstance(func, ast.Attribute)
        and func.attr == "register"
        and isinstance(func.value, ast.Name)
        and func.value.id == "registry"
    )

Why bother? Because tool files often contain helpers and shared imports. A naive "import every file" would also import partial files mid-edit, helper modules with no tools, and files that register conditionally inside functions. The AST check filters the directory down to exactly the files that register at module top level, before they have side effects.

This is the kind of thing you write when you have shipped a tool plugin system three times and gotten burned each time. A small detail. It signals taste.

Skills — opt-in, not auto-loaded

A skill in Hermes is a markdown file. There's a frontmatter block (name, description, version, platforms, tags), then prose telling the agent how to do a thing. Skills live in skills/ (bundled), optional-skills/ (heavier ones you install on demand), and ~/.hermes/skills/ (yours).

The interesting part is how they get into the conversation. The naive thing — and the thing most early agent frameworks do — is to load matching skills and concatenate them into the system prompt. Hermes does not do this. Skills are called by name via slash commands: /skill-name, skills_list, skill_view. The agent has to ask for the skill it wants. The skill content enters the conversation only when invoked.

The comment in agent/skill_commands.py is explicit about the consequence:

This does NOT invalidate the skills system-prompt cache. Skills are called by name via /skill-name, skills_list, or skill_view — they don't need to be in the system prompt for the model to use them. Keeping the prompt cache intact preserves prefix caching across the reload, so a user invoking /reload-skills pays no cache-reset cost.

Anthropic, OpenAI, and Google all cache the prefix of a conversation at the provider level. The first call pays for the prefix; subsequent calls don't. If you stuff matching skills into the system prompt at the start of each turn, the cache rebuilds every time. If skills are slash-command-invoked at the moment of use, the system prompt stays stable, and the cache stays warm for the rest of the session. On Anthropic's prompt caching, a cache hit reads at roughly 10% of the cost of a normal input token. For an agent that fires twenty turns over an hour with a hefty system prompt, that's most of the bill.

The curator side of this, in agent/curator.py and surrounding modules, tracks how often each skill is used, archives unused ones after N days, and runs a backup/rollback step every time it changes anything. There's a config knob to pin skills you care about, and an LLM-review prompt for stale ones. It is the kind of thoughtful lifecycle work you only build after shipping to users who have accumulated a hundred skills they never look at.

State — SQLite, not JSON

Hermes session state lives in SQLite, WAL mode by default, falling back to DELETE journaling on NFS. Schema version 11. There's an FTS5 virtual table for full-text search over message bodies. Compression splits sessions when they get too long, with parent_session_id chains keeping the history walkable.

This is not the default in 2026. Most open-source agents persist conversations as JSON files in a directory, sometimes one per session, and call it a day. Hermes treats sessions like a real database, with concurrent reads, indexed search, and transactional writes. The cost is a bit of complexity. The benefit is that the agent can run twenty parallel sub-agents writing to the same store and you don't have to think about it.

Profile isolation is built on top. The HERMES_HOME environment variable (overridable per-context via a ContextVar) gives you a clean slate: a separate SQLite file, separate skills, separate cron jobs, separate memory. You can run hermes -p coder and hermes -p ops in two terminals at the same time and they will not see each other.

Sandboxes — pick one of eight

When the agent runs a shell command, it goes through tools/terminal_tool.py. That tool dispatches to one of eight backends under tools/environments/:

Local subprocess — the default. Your shell, your filesystem.
Docker — a container per session.
SSH — remote host with key-based auth.
Modal — cloud functions, on-demand.
Managed Modal — a managed flavour of the Modal backend.
Daytona — cloud workspaces.
Singularity — HPC environments.
Vercel Sandbox — Vercel's sandbox runtime.

The agent code stays the same, the tool stays the same, the conversation stays the same. The backend is picked at config time. Switching is a string in config.yaml. The interface is well-enough abstracted that the RL training environment reuses it: when Hermes is generating trajectories for training, it runs against the same backends as in production.

This is one of the few agent codebases I've seen that treats "where the code runs" as a first-class configuration axis rather than something baked in.

The gateway — one agent, thirty-plus chat surfaces

gateway/platforms/ holds the bulk of the platform adapters — Slack, Discord, Telegram, WhatsApp, Matrix, SMS, email, WeCom, Feishu, DingTalk, QQBot, Signal, Mattermost, BlueBubbles, Home Assistant, Yuanbao, Weixin, generic webhooks. A second directory, plugins/platforms/, holds Google Chat, IRC, Line, SimpleX, and Teams. The total is north of thirty.

platform adapters across gateway/platforms/ and plugins/platforms/30+One base class. One registry. No per-platform branching in the agent loop.

The shape that makes this work is gateway/platforms/base.py. The base class BasePlatformAdapter (an ABC) declares inbound and outbound message handling. Adding a platform is one file plus a register() call. There is no if platform == "slack": elif platform == "discord": ladder anywhere in the core. There is no per-platform serializer in the agent loop. The agent reads and writes its native message format; the gateway translates.

Two design choices make this safer than it sounds. Per-platform session management is in gateway/session.py, not in the platform adapter. Adapters are stateless wrappers around a transport. And the auto-resume code (resume_pending flags on sessions, restored on the next inbound message) means that if the gateway process restarts, conversations resume on the same transcript without dropping turns. Crashes don't lose state. Users don't see a "session expired" message.

If you have read a "we built one agent that does Slack and Discord" blog post and quietly thought but how do you reconcile the message models: this is how. Hermes is the answer to that post, except it ships and it covers thirty more platforms.

Cron, kanban, and work that survives a restart

The two non-conversational entry points to Hermes are the cron scheduler and the kanban board.

The cron scheduler (cron/scheduler.py, cron/jobs.py) is a tick-driven background loop, invoked every 60 seconds by the gateway with a file-based lock (~/.hermes/cron/.tick.lock) to prevent duplicate ticks across processes. Each job carries its own configuration: model, skills, a script that runs before the agent fires, "context from" inputs that chain outputs across jobs. Jobs survive a process restart via a catchup window. One-shot jobs get a grace window so they're not missed if you restart at the wrong second.

The kanban board is a durable multi-agent work surface, implemented across hermes_cli/kanban_*.py and tools/kanban_tools.py. A dispatcher promotes tasks through triage → todo → ready lanes. An auto-decompose path in the gateway dispatcher loop calls an auxiliary LLM to break a root task into a graph of child tasks, links them under the root, and atomically commits the result. A diagnostics layer (hermes_cli/kanban_diagnostics.py) catches hallucinated card ids, spawn crash-loops, and tasks stuck blocked too long; each diagnostic is machine-readable and wires into UI recovery actions (reclaim, reassign, unblock). The systemd unit at plugins/kanban/systemd/hermes-kanban-dispatcher.service runs the dispatcher as a long-lived service on machines that want one.

This is where Hermes stops being an agent and starts being an agent platform. A user-facing CLI is one entry point. A gateway with thirty chat surfaces is a second. A background cron is a third. A multi-worker kanban dispatcher is a fourth. The same loop runs in all four; the configuration differs.

Five entry points, one loop, five peripherals. Adding a platform, a tool, or a sandbox does not touch the centre.

Why this shape works

Squint at the architecture and three principles repeat.

One loop, many surfaces. The agent is one function (now a forwarder to one peer module). Every user-facing entry point — CLI, gateway, cron, kanban, editor bridge — is a thin wrapper that calls the same loop. There is no "agent core vs agent UI" split. There is no agent-as-a-service. The agent is the function; everything else is how you reach it.

Pluggable at the edges, fixed in the middle. Tools, platforms, sandboxes, model providers, memory providers, skills: every one is registry-based. The loop, the registry, the state store, and the session model are not. The right things are bolted down. The right things are loose.

Opt-in context, not auto-loaded context. Skills are slash-command-invoked, not implicitly attached. Sessions go in SQLite, not JSON files. State that should be immutable is immutable; state that should compose is keyed and indexed. The system prompt stays small because the design refuses to grow it.

The result is a codebase that does an unreasonable amount of work per line of code. Hermes covers thirty messaging platforms, eight sandboxes, RL training, multi-agent collaboration, and a curated skill marketplace. Most teams shipping the same surface area would have a microservices diagram on a whiteboard. Hermes has one class with one loop at the centre and a directory of plugins around it.

Where it doesn't fit

The honest part. Hermes is opinionated about a few things that don't suit every use case.

Single-tenant by design. Profiles isolate per user, but there is no multi-tenant model: no workspaces, no RBAC, no audit log shaped for compliance. If you want to run an agent product for paying customers with isolation between them, you are building most of that yourself on top.
No first-class production telemetry. OpenTelemetry hooks exist if you wire them, but no out-of-the-box scan over production logs, no incident integrations, no Sentry-shaped error pipeline. Hermes is built for an operator-with-a-laptop, not an SRE-with-a-pager.
The big files are still big. run_agent.py is ~4,100 lines after extractions; gateway/run.py is ~18,000; cli.py is ~14,500. The forwarder pattern is making this better one module at a time, but if you intend to fork and maintain, plan for the merge cost.
Synchronous by default. Each agent step blocks. Hermes leans on threads (worker pools, gateway adapters, kanban workers) to get concurrency, with a careful persistent-event-loop pattern in model_tools.py to keep cached HTTP clients alive across thread boundaries. It works. It is not the asyncio-first model some teams prefer.

None of these are bugs. They are the price of the shape. What the shape gives you is a real agent that runs locally, talks to everything, and persists across restarts. That is a fair trade.

Where to go from here

The fastest path to understanding Hermes Agent is to clone the repo and read three files in order: tools/registry.py for the registry pattern, gateway/platforms/base.py for the multi-surface pattern, and agent/conversation_loop.py for the loop body. Those three give you the spine. Everything else is variation.