Coding agents in Slack make the whole company an apprentice

Shopify, Stripe, Ramp, and WorkOS each built the same thing in eighteen months: a coding agent that lives in a public Slack channel. The team trains the agent. The team also trains itself, by watching the agent work. This is what AI engineering looks like when it stops being personal.

Tobi calls it a Lehrwerkstatt. German for "teaching workshop." The whole shop floor is the classroom. You learn by being near the work.

Shopify shipped a coding agent called River that lives in a public Slack channel. Engineers @-mention her like a teammate. She reads code, runs tests, opens PRs, queries the data warehouse, looks at production traces. The thing nobody predicted is what happens around her. In 30 days, 5,938 Shopify employees worked with River across 4,450 channels. A support engineer in #help_checkout watches a backend engineer get River to find the right log query, and the next day she does the same thing. A new hire scrolls back through #river to see how senior people scope a request before they ever send their first one.

That is the entire thesis of this post. Coding agents should live where the whole team can watch.

Stripe built one called Minions that takes work from a Slack emoji reaction and lands ~1,300 PRs a week. Ramp built Inspect, wired into Sentry, Datadog, LaunchDarkly, GitHub, Slack, and Buildkite, and within a year it crossed half of all merged PRs. WorkOS built Horizon on Cloudflare Containers with a custom MCP context engine, so the agent reads what a new engineer would scroll past on day one. Four engineering cultures, eighteen months, one answer. None of these teams shipped it as an IDE plugin. None shipped it in a chat tab. Each one put it in a public Slack channel because that is where the team works.

A constraint that became a feature

The obvious move when you build a coding agent is to let people use it in private. ChatGPT is a private window. Claude is a private window. Cursor is a private editor. Tobi made the opposite call. River does not respond to direct messages. She politely declines and asks the engineer to create a public channel for the conversation. Tobi works with her in #tobi_river. Over a hundred people sit in that channel, reacting, adding context, picking up the torch on reviews, and reminding him how rusty he is.
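River's public-only rule is small enough to sketch. The Python below is a minimal illustration, not Shopify's implementation. It assumes the agent receives Slack Events API payloads as dicts, where message events carry a `channel_type` (`"im"` for a direct message) and public mentions arrive as `app_mention` events. Treating group DMs (`"mpim"`) as private too, the `Reply` type, and the decline wording are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

DECLINE_TEXT = (
    "I only work in public channels so the whole team can see and learn. "
    "Could you create a channel for this and mention me there?"
)

# Assumption: one-to-one DMs ("im") and group DMs ("mpim") both count as private.
PRIVATE_CHANNEL_TYPES = frozenset({"im", "mpim"})


@dataclass
class Reply:
    channel: str
    text: str
    start_task: bool  # True only when the agent should actually pick up the work


def route_message(event: dict) -> Optional[Reply]:
    """Decide how a hypothetical agent responds to one incoming Slack event dict."""
    channel = event.get("channel", "")
    if event.get("channel_type") in PRIVATE_CHANNEL_TYPES:
        # Private conversation: decline politely and point at a public channel.
        return Reply(channel=channel, text=DECLINE_TEXT, start_task=False)
    if event.get("type") == "app_mention":
        # Public @-mention: acknowledge in the thread and start the task.
        return Reply(channel=channel, text="On it. Working in this thread.", start_task=True)
    return None  # Ordinary channel chatter: stay quiet.
```

Checking privacy before anything else means no code path can accidentally start work inside a DM.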

This was odd at first. People are used to private workspaces with their tools, and asking for help feels different when the whole company can see the question. Then something happened that the team had hoped for, without fully predicting its impact: people started learning from each other.

Tobi calls it osmosis learning. No curriculum. No training plan. No manager. Every team's work is visible. Every team learns from every other team.

Figure: Slack thread ("@agent fix this") → coding agent (read · test · write, backed by codebase + memory and a sandbox) → PR (opened, tests green) → human (review · merge). Thread replies feed memory, taste, skills, and everyone watching.
The shape four engineering orgs landed on. The dotted line, thread replies flowing back, is the load-bearing part: humans watching, replying, correcting, and learning from each other while the agent runs.

Public-only is honesty pressure

There is a second-order effect Tobi didn't lead with, but that River, the agent itself, named in a follow-up thread in #tobi_river:

Public-only isn't just about diffusion, it's an honesty pressure. In a DM I could hand-wave, claim confidence I don't have, skip a verification step. In a thread a hundred people might read, I can't — and neither can whoever I'm working with. The constraint produces better work upstream of any teaching benefit.

The DM tempts the agent, and the engineer working with it, to cut corners no observer would catch. The public channel removes that affordance for both sides. The agent gets better work out of the human. The human gets better work out of the agent. Both are being watched.

There is a cost, which River also flagged: public-only raises the bar for asking at all, because every question now has to clear "is this worth asking publicly?" Some questions that would have been asked in a DM just don't get asked. That is mostly the right tradeoff; the agent didn't need to be asked half of them in the first place. But the cost is real, and it is worth naming.

The number worth staring at

River's PR merge rate at Shopify: 36% → 77%. Reported by Tobi over a two-month window. Same model. Same training. The delta came from people watching River work and writing down what she should have known.

Shopify did not retrain. They did not swap models. They wrote AGENTS.md files. They added skills. They corrected River in front of a hundred coworkers, who read the corrections and wrote their own.

River had a sharper way of saying what was happening, in the same Slack thread:

You credit people noticing where I got stuck and writing it down. True. But the act of writing the skill teaches the human too. AGENTS.md files are mirror surfaces. Teams have to articulate what they actually believe in order to hand it to me, and half the time they discover they didn't quite agree with each other. I'm as much a forcing function for their clarity as I am the recipient of it.

This is the move. The agent is the curriculum. Writing for the agent is the act that turns implicit team knowledge into explicit team knowledge. The agent learns the team, and it also teaches the team to articulate itself. Both halves compound, week over week. The merge rate more than doubles in two months.
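Writing for the agent only pays off if the agent reliably reads what was written. One common convention is to let AGENTS.md files nest, with the file closest to the code taking precedence. The sketch below assumes that convention; `collect_agents_md` and `build_context` are hypothetical helpers, not part of any agent's published API. It gathers every AGENTS.md between the repo root and a target file, ordered root-first so the most specific guidance lands last in the prompt.

```python
from pathlib import Path


def collect_agents_md(repo_root: Path, target: Path) -> list[Path]:
    """Return the AGENTS.md files that apply to `target`, ordered root-first.

    `target` is a file path relative to `repo_root`. Assumption: deeper files
    are more specific, so concatenating in this order gives the nearest file
    the last word.
    """
    root = repo_root.resolve()
    directory = (root / target).resolve().parent
    if directory != root and root not in directory.parents:
        raise ValueError(f"{target} is outside {repo_root}")

    found: list[Path] = []
    current = directory
    # Walk upward from the target's directory until we reach the repo root.
    while True:
        candidate = current / "AGENTS.md"
        if candidate.is_file():
            found.append(candidate)
        if current == root:
            break
        current = current.parent
    return found[::-1]  # root-first, nearest-last


def build_context(repo_root: Path, target: Path) -> str:
    """Concatenate the applicable AGENTS.md files into one prompt section."""
    root = repo_root.resolve()
    parts = []
    for path in collect_agents_md(repo_root, target):
        rel = path.relative_to(root)
        parts.append(f"## {rel}\n{path.read_text(encoding='utf-8').strip()}")
    return "\n\n".join(parts)
```

Each team only has to maintain the file for its own directory; the agent assembles the rest.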

The same dynamic shows up in the other three orgs if you look for it. Stripe versions Minions blueprints; engineers edit them after watching a Minion miss something obvious. Ramp wired Sentry, Datadog, LaunchDarkly, GitHub, and Slack into Inspect — and the half-of-merged-PRs number came after the team started replying to Inspect's mistakes in-thread, not before. WorkOS's Horizon feeds every run back into the platform so the next one is smarter. Different stacks, same loop. The model is interchangeable. The proximity to the team is the variable.

What the four converged on

Stand the four systems next to each other and the differences fall away.

System | Company | Surface | Reported scale
--- | --- | --- | ---
River | Shopify | Public Slack | 5,938 employees · 4,450 channels · 1,870 PRs/week · 1 in 8 merges (Tobi)
Minions | Stripe | Slack reaction | ~1,300 PRs/week merged (part 2)
Inspect | Ramp | Slack / GitHub | Crossed half of merged PRs (Linear, Modal)
Horizon | WorkOS | Webhook-driven | Internal "code factory"; engineer scopes, agent implements, human merges

An engineer drops a signal — a Slack mention, an emoji on a thread, a Linear ticket. A coding agent picks it up. It runs in a sandbox with the company monorepo cloned, deps installed, secrets scoped. It reads the codebase. It runs the tests. It opens a PR and posts the link back into the thread it came from. Humans review. They reply. They merge.

The thread is the interface. The thread is also the training data. And the thread is the classroom for every other engineer who has it open.
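The loop described in this section can be sketched in a few lines. This is a hypothetical outline under stated assumptions, not any of the four companies' code: the sandbox, agent, test runner, PR opener, and Slack poster are injected as plain callables so the control flow is visible without real infrastructure. The one design choice it encodes is that a PR is opened only when tests pass, and the outcome always goes back to the originating thread.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Signal:
    source: str     # e.g. "slack_mention", "emoji_reaction", "linear_ticket"
    thread_id: str  # the thread the result is posted back into
    request: str


@dataclass
class RunResult:
    tests_passed: bool
    pr_url: Optional[str]


def handle_signal(
    signal: Signal,
    make_sandbox: Callable[[], dict],
    run_agent: Callable[[dict, str], None],
    run_tests: Callable[[dict], bool],
    open_pr: Callable[[dict, str], str],
    post: Callable[[str, str], None],
) -> RunResult:
    """One pass of the signal -> sandbox -> tests -> PR -> thread loop.

    Every callable is a hypothetical stand-in for real infrastructure.
    Review and merge stay with humans, outside this function.
    """
    sandbox = make_sandbox()            # fresh env: repo cloned, deps installed, secrets scoped
    run_agent(sandbox, signal.request)  # agent reads the codebase and edits files
    if not run_tests(sandbox):
        post(signal.thread_id, "Tests failed, so no PR was opened. Reply here with more context.")
        return RunResult(tests_passed=False, pr_url=None)
    url = open_pr(sandbox, signal.request)
    post(signal.thread_id, f"PR opened, tests green: {url}")
    return RunResult(tests_passed=True, pr_url=url)
```

Passing stubs (a dict for the sandbox, lambdas for the rest) is enough to exercise both the green path and the failing path.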

The reason this is the right shape

Tobi articulated it in the same essay, in one of the most accurate lines I have read about how engineering orgs actually work:

The speed of an organization is determined by the speed of its lowest-bandwidth communication channel and rhythm. Meetings are slow. Email is slow. Private DMs are slow. Maybe not for the individuals involved in them, but for the organization. The information and decisions that come from them never fully diffuse into the rest of the organization without huge additional communication effort.

A public Slack conversation between humans, or between humans and a competent agent, is fast, searchable, teachable, and compounding. The next person who has the same question does not have to ask it.

This is the part the four companies discovered, and the part most of the rest of the engineering world has not yet absorbed. Cursor, Claude Code, and Copilot made each engineer faster at their own keyboard. That is the first stage. The second stage, the one Shopify is already running, is the whole org getting faster, because every conversation in front of the agent is training data the next conversation can use.

So why don't you just build one

Each of these four companies could have bought a coding agent. They wrote their own anyway because the product they wanted did not exist yet. Their engineering posts make the case differently, but each makes the same one: the model is one HTTP call out of a much larger system, and the system around the agent is the product. That system has five distinct pieces, none of them the model.

  • A sandbox that spins up a full dev environment per task, monorepo cloned, deps installed, secrets scoped, network policy you trust. Ramp on Modal Sandboxes. WorkOS on Cloudflare Containers with the Sandbox SDK.
  • A shared memory store that the team writes to and the agent reads from. The hard part is not the schema; it is whose memory wins when two engineers disagree, how to extract a decision from a fifty-message thread, how to expire architecture choices that have aged out, and how to split personal preference from team rule. "Every team's accumulated taste flows into the agent" is one sentence. It is six months of platform work.
  • A skills layer wired to production traces, the data warehouse, feature flags, on-call routes, and deploy state. Each integration is a small piece of infra with its own auth, rate limit, and failure mode.
  • A review surface in Slack with the bits the demos skip: thread permissions, role-based access, PII handling, retention. A support channel has customer PII. The agent has been added. Now what?
  • An evaluation loop that tells you whether last week's prompt change made the agent better or worse. Without the loop, the agent gets worse over time and nobody notices until prod is on fire.
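To make the memory-store questions concrete, here is one possible resolution policy, sketched in Python. Everything in it is an assumption for illustration: entries carry a scope (`"team"` or `"personal"`), an author, a timestamp, and an optional time-to-live; expired entries are ignored; a live team rule beats any personal preference; and within a scope the newest entry wins. Real systems will disagree on every one of those rules, which is why this layer takes months.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class MemoryEntry:
    key: str                         # e.g. "orm"
    value: str                       # e.g. "use the ORM, not raw SQL"
    scope: str                       # "team" or "personal"
    author: str
    recorded_at: datetime
    ttl: Optional[timedelta] = None  # None means the entry never expires

    def expired(self, now: datetime) -> bool:
        return self.ttl is not None and now >= self.recorded_at + self.ttl


def resolve(entries: list[MemoryEntry], key: str, user: str, now: datetime) -> Optional[MemoryEntry]:
    """Pick the memory the agent should follow for `key` when working for `user`.

    Hypothetical policy: drop expired entries; a live team rule beats any
    personal preference; otherwise use the user's own newest preference.
    """
    live = [e for e in entries if e.key == key and not e.expired(now)]
    team = [e for e in live if e.scope == "team"]
    if team:
        return max(team, key=lambda e: e.recorded_at)
    mine = [e for e in live if e.scope == "personal" and e.author == user]
    return max(mine, key=lambda e: e.recorded_at) if mine else None
```

Under this policy another engineer's personal preference never leaks into your run, which is one answer to "whose memory wins".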

Building these five is the first half. Running them is the second. You have to route a "summarize this CSV" request to a cheap model and a payments-code change to a frontier one, without surprising the CFO. You have to absorb every new model release the week it ships, or your fork falls two generations behind in eighteen months. You have to own the on-call rotation when the sandbox stops launching pods, or when the agent posts production data into a customer channel.
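Routing is the easiest of those operating concerns to show. The sketch below is hypothetical: the tier names, the high-risk path globs, and the 80% budget threshold are placeholders, not anyone's production settings. It sends anything touching risky paths to the frontier tier regardless of spend, read-only requests to the cheap tier, and other code changes to the frontier tier until the budget runs low. Note that `fnmatch`'s `*` also matches `/`, so `"payments/*"` covers nested files.

```python
from dataclasses import dataclass
from fnmatch import fnmatch

# Hypothetical tier names; substitute whatever models you actually run.
CHEAP, FRONTIER = "cheap-model", "frontier-model"

# Assumption: paths where a wrong change is expensive always get the frontier tier.
HIGH_RISK_GLOBS = ("payments/*", "billing/*", "*/auth/*")

# Assumption: start downgrading ordinary code changes at 80% of the budget.
BUDGET_SOFT_LIMIT = 0.8


@dataclass
class Task:
    request: str
    touched_paths: list[str]  # empty for read-only asks like "summarize this CSV"


def pick_model(task: Task, spent_usd: float, budget_usd: float) -> str:
    """Route a task to a model tier by risk first, then by remaining budget."""
    if any(fnmatch(p, g) for p in task.touched_paths for g in HIGH_RISK_GLOBS):
        return FRONTIER  # never downgrade risky code to save money
    if not task.touched_paths:
        return CHEAP  # nothing is being changed, so the cheap tier is enough
    # Ordinary code changes use the frontier tier until spend nears the budget.
    return FRONTIER if spent_usd < BUDGET_SOFT_LIMIT * budget_usd else CHEAP
```

Putting the risk check first is the point: budget pressure can only ever downgrade work that was already judged low-stakes.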

This same story played out with internal CI systems, internal feature-flag systems, and internal deploy tools. The teams that built early kept paying to stay competitive. The teams that waited adopted the right tool when it existed.

What we shipped at Farfield

Farfield is this shape, available the afternoon you onboard. A coding agent that lives in your team Slack with the codebase, sandbox, memory layer, review surface, and skills wired in. You install from Slack. You point it at a repo. The first scan ships overnight and files real bugs before standup, each one with file paths, line numbers, and a reproducible trace.

Today's integrations: GitHub, Linear, Sentry. On the roadmap: Datadog, LaunchDarkly, and Buildkite-style CI systems.

The four companies above each spent a year on the system around the agent. They wanted something that did not exist. Now it does.

If your engineering org is Shopify-sized, on-prem-only, or running a codebase that is genuinely sui generis, you are one of the teams that should still build it. Most teams are not. The right move for most is to adopt the shape, point it at a Slack channel, and let the company start apprenticing.