SHEPHERD: Programmable Meta-Agents via Reversible Agentic Execution Traces
Your harness is just another agent.
¹Northeastern University ²Stanford University · *equal contribution
The same substrate powers three meta-agents: runtime intervention, counterfactual optimization, and tree-search RL.
Abstract
As LLM agent systems take on more complex tasks, they increasingly rely on meta-agents: higher-order agents that create, operate on and manage other agents. Meta-agent operations such as coordinating agents, halting risky actions before execution, or repairing failed runs require runtime manipulation of agentic execution. Yet existing agentic substrates make this difficult: they expose only transcripts and environment snapshots, forcing meta-agents to build ad hoc tooling to reconstruct and operate over full execution state.
Therefore, we introduce SHEPHERD, a Python substrate grounded in functional programming
principles, where an agent's execution is itself a first-class object that a meta-agent
can easily inspect and transform. Every model action, tool call, and environment change becomes a
structured event in a reversible, Git-like execution trace, where any past state can be reverted
5× faster than docker commit and fork. Three example use cases show SHEPHERD's
versatility: (1) a supervisor meta-agent prevents conflicts among parallel coding agents, lifting
pair-coding pass rate from 28.8% to 54.7% on CooperBench; (2) a counterfactual optimization
meta-agent repairs agent workflows by proposing edits and replaying runs from the point of changed
behavior, outperforming MetaHarness on Terminal-Bench 2.0 by 12.8% with 58% lower wall-clock time;
(3) a training meta-agent picks fork points during rollouts to improve credit assignment in
long-horizon agentic RL, doubling GRPO's uplift on Terminal-Bench 2.0. We open-source SHEPHERD to
enable principled and efficient operations over agentic execution for both users and meta-agents.
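To make the "reversible, Git-like execution trace" concrete, here is a minimal toy model in plain Python. It is not SHEPHERD's implementation or API: the `Event` and `Trace` classes, their fields, and the 1-indexed step numbering are all assumptions made for illustration. The sketch only shows why linking each event to its parent, the way a Git commit links to its predecessor, makes revert and fork cheap pointer operations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One immutable step in the trace (hypothetical schema)."""
    kind: str                          # e.g. "model_action", "tool_call", "env_change"
    payload: str
    parent: Optional["Event"] = None   # previous step, like a Git parent commit


class Trace:
    """A branch is just a pointer to its latest event."""

    def __init__(self, head: Optional[Event] = None):
        self.head = head

    def record(self, kind: str, payload: str) -> Event:
        # Appending never mutates earlier events; it only moves the head.
        self.head = Event(kind, payload, self.head)
        return self.head

    def history(self) -> list:
        # Walk parent links back to the root, then return oldest-first.
        out, node = [], self.head
        while node is not None:
            out.append(node)
            node = node.parent
        return list(reversed(out))

    def revert(self, step: int) -> None:
        # Move this branch's head back to the given 1-indexed step.
        self.head = self.history()[step - 1]

    def fork(self, step: int) -> "Trace":
        # A new branch sharing all events up to the given step.
        return Trace(self.history()[step - 1])


run = Trace()
run.record("model_action", "plan fix")
run.record("tool_call", "edit auth.py")
run.record("env_change", "tests fail")

branch = run.fork(2)                    # retry from step 2 on a new branch
branch.record("tool_call", "edit session.py")
run.revert(2)                           # drop the failing step from the original
```

Because events are immutable and shared between branches, forking copies nothing. A real substrate also has to capture environment state such as files and processes, which this toy omits.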
The idea: execution becomes data
A meta-agent creates a worker, observes its execution without perturbing it,
intercepts a bad action before it lands, then forks a patched branch and reverts the
buggy one. Because the whole run is a reversible, Git-like trace, all of this is ordinary
@agent code.
# agent
@agent(LLM="haiku")
def implement(repo, feature):
    "Implement feature in repo"

# meta-agent
@agent(LLM="opus", tool=[observe, revert, fork])
def oversee(agent_run):
    "If tests break, revert the agent and retry"

# meta-agent manages the agent
agent_run = implement(repo, "login")
implemented = oversee(agent_run)
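The snippet above depends on the Shepherd runtime and an LLM. As a rough, standalone illustration of what "intercepting a bad action before it lands" could mean, the sketch below replaces the worker with a generator of proposed actions and the meta-agent with a fixed policy check. Every name here (`worker_actions`, `is_risky`, the action tuples, and this plain-function `oversee`) is hypothetical and unrelated to the library's real interface.

```python
def worker_actions():
    """Stand-in for an agent: yields proposed actions one at a time."""
    yield ("write", "login.py", "def login(): ...")
    yield ("delete", "tests/", None)           # a risky action
    yield ("write", "README.md", "docs")


def is_risky(action):
    """Toy policy: never let the worker delete anything under tests."""
    op, path, _ = action
    return op == "delete" and path.startswith("tests")


def oversee(actions, files):
    """Apply each proposed action to `files` unless the policy blocks it."""
    blocked = []
    for action in actions:
        if is_risky(action):
            blocked.append(action)             # intercepted before it touches `files`
            continue
        op, path, content = action
        if op == "write":
            files[path] = content
        elif op == "delete":
            files.pop(path, None)
    return blocked


files = {"tests/": "test suite"}
blocked = oversee(worker_actions(), files)
```

The key property is that the check runs between proposal and application, so a blocked action leaves no side effects to undo.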
Try it yourself: fork and rewind your agent
Install Shepherd, drop in the plugin for your coding agent, and every step it takes becomes a commit on a reversible, Git-like trace.
From there the trace is yours to drive: shepherd log shows the steps, and when a
run goes sideways, shepherd revert 4 restores the agent and its files to
step 4, byte-for-byte, the way git checkout rewinds a repo.
See the GitHub repo for more, and the blog for the full walkthrough.
$ pip install shepherd-ai
$ shepherd init
# add the Claude Code (or Codex) plugin
$ shepherd plugin install claude-code
# run it: every step is a commit
$ shepherd run claude "fix the login bug"
# went wrong? roll back like git
$ shepherd log
$ shepherd revert 4
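As an intuition for what a byte-for-byte revert of the working files involves, the sketch below snapshots a temporary directory after each step and restores an earlier snapshot. It is a naive illustration under stated assumptions, not how Shepherd stores state: `snapshot` and `restore` are hypothetical helpers, and copying every file in full on every step would be far too slow for real repositories.

```python
import tempfile
from pathlib import Path


def snapshot(root: Path) -> dict:
    """Record the bytes of every file under root, keyed by relative path."""
    return {p.relative_to(root): p.read_bytes()
            for p in root.rglob("*") if p.is_file()}


def restore(root: Path, snap: dict) -> None:
    """Make the files under root match the snapshot exactly."""
    # Remove files created after the snapshot was taken.
    for p in list(root.rglob("*")):
        if p.is_file() and p.relative_to(root) not in snap:
            p.unlink()
    # Rewrite every recorded file with its original bytes.
    for rel, data in snap.items():
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)


workdir = Path(tempfile.mkdtemp())
steps = []

(workdir / "login.py").write_bytes(b"v1\n")
steps.append(snapshot(workdir))                  # step 1

(workdir / "login.py").write_bytes(b"v2 broken\n")
(workdir / "scratch.txt").write_bytes(b"tmp\n")
steps.append(snapshot(workdir))                  # step 2

restore(workdir, steps[0])                       # roll the files back to step 1
```

Restoring files covers only part of the picture: the text above says a revert restores both the agent and its files, and the agent's own state is not modeled here.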
BibTeX
@misc{yu2026shepherdruntimesubstrateempowering,
title={Shepherd: A Runtime Substrate Empowering Meta-Agents with a Formalized Execution Trace},
author={Simon Yu and Derek Chong and Ananjan Nandi and Dilara Soylu and Jiuding Sun and Christopher D Manning and Weiyan Shi},
year={2026},
eprint={2605.10913},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2605.10913}
}


