Would you elaborate a bit on how you use subagents? I tend to use them sporadically, for example having one research something or analyse the code base a bit. But I'm not yet letting it run for long.
Sure. To be clear, although I do spend a lot of time interacting with Claude Code in chat format, that is not what I am talking about here. I have set up Claude Code with very specific instructions for the use of agents, which I'll get to in a second.
First of all, there are a lot of collections of subagent definitions out there. I rolled my own, then later found others that worked better. I'm currently using this curated collection: https://github.com/VoltAgent/awesome-claude-code-subagents
CLAUDE.md has instructions to list `.agents/agents/**/*.md` to find the available agents, and to check the YAML frontmatter for a one-line description of what each does. These agents are really just (1) role definitions that prompt the LLM to bias its thinking in a particular way ("You are a senior Rust engineer with deep expertise in ..." -- this actually works really well), and (2) a bunch of rules and guidelines for that role, e.g. in the Rust case to use the thiserror and strum crates to avoid boilerplate in Error enums, rules for how to satisfy the linter, etc. Basic project guidelines as they relate to Rust dev.
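To make that second part concrete, this is the sort of boilerplate those crates eliminate -- my own illustration, not taken from any of the agent files:

    // thiserror derives the Display and From impls you'd otherwise write by hand:
    use thiserror::Error;

    #[derive(Debug, Error)]
    pub enum ConfigError {
        #[error("could not read config: {0}")]
        Io(#[from] std::io::Error),
        #[error("missing required field `{0}`")]
        MissingField(String),
    }

    // strum does the same for Display/FromStr on plain enums:
    #[derive(Debug, Clone, Copy, strum::Display, strum::EnumString)]
    #[strum(serialize_all = "kebab-case")]
    pub enum Mode { Fast, Thorough }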
Secondly, my CLAUDE.md for the project has very specific instructions about how the top-level agent should operate, with callouts to specific procedure files to follow. These live in `.agent/action/**/*.md`. For example, I have a git-commit.md protocol definition file, and instructions in CLAUDE.md that "when the user prompts with 'commit' or 'git commit', load git-commit action and follow the directions contained within precisely." Within git-commit.md, there is a clear workflow specification in text or pseudocode. The [bracketed text] is my inline commentary to you and not in the original file:
"""
You are tasked with committing the currently staged changes to the currently active branch of this git repository. You are not authorized to make any changes beyond what has already been staged for commit. You are to follow these procedures exactly.
1. Check that the output of `git diff --staged` is not empty. If it is empty, report to the user that there are no currently staged changes and await further instructions from the user.
2. Stash any unstaged changes, so that the worktree only contains the changes that are to be committed.
3. Run `./check.sh` [a bash script that runs the full CI test suite locally] and verify that no warnings or errors are generated with just the currently staged changes applied.
- If the check script doesn't pass, summarize the errors and ask the user if they wish to launch the rust-engineer agent to fix these issues. Then follow the directions given by the user.
4. Run `git diff --staged | cat` and summarize the changes in a git commit message written in the style of the Linux kernel mailing list [I find this to be much better than Claude's default commit message summaries].
5. Display the output of `git diff --staged --stat` and your suggested git commit message to the user and await feedback. For each response by the user, address any concerns brought up and then generate a new commit message, as needed or instructed, and explicitly ask again for further feedback or confirmation to continue.
6. Only when the user has explicitly given permission to proceed with the commit, without any accompanying actionable feedback, should you proceed to making the commit. Execute `git commit` with the exact text for the commit message that the user approved.
7. Unstash the non-staged changes that were previously stashed in step 2.
8. Report completion to the user.
You are not authorized to deviate from these instructions in any way.
"""
This one doesn't employ subagents very much, and it is implicitly interactive, but it is smaller and easier to explain. It is, essentially, a call center script for the main agent to follow. In my experience, it does a very good job of following these instructions. This particular one addresses a pet peeve of mine: I hate the auto-commit anti-feature of basically all coding assistants. I'm old-school and want a nice, cleanly curated git history with comprehensible commits that take some refining to get right. It's not just OCD -- my workflow involves being able to git bisect effectively to find bugs, which requires a good git history.
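With a clean history, that same check.sh script can drive the hunt for a regression automatically -- assuming, as here, that it exits nonzero on failure:

    git bisect start HEAD <known-good-rev>   # mark bad and good endpoints
    git bisect run ./check.sh                # exit 0 = good, nonzero = bad
    git bisect reset                         # return to where you started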
I also have a task.md workflow that I'm actively iterating on; it's the one I let run autonomously for a half hour to an hour, and I'm often surprised to find very good results (but sometimes very terrible ones) at the end of it. I'm not going to release this one because, frankly, I'm starting to realize there might be a product around this and I may move on it (although this is already a crowded space). But I don't mind outlining in broad strokes how it works (hand-summarized, very briefly):
"""
You are a senior software engineer in a leadership role, directing junior engineers and research specialists (your subagents) to perform the task specified by the user.
1. If PLAN.md exists, read its contents and skip to step 4.
2. Without making any tool calls, consider the task as given and extrapolate the underlying intent of the user.
[A bunch of rules and conditions related to this first part -- clarify the intent of the user without polluting the context window too much]
3. Call the software-architect agent with the reformulated user prompt, and with clear instructions to investigate how the request would be implemented on the current code base. The agent is to fill its context window with the portions of the codebase and developer documentation in this repo relevant to its task. It should then generate and report a plan of action.
[Elided steps involving iterating on that plan of action with the user, and various subagents to call out to in order to make sure the plan is appropriately sequenced in terms of dependent parts, chunked into small development steps, etc. The plan of action is saved in PLAN.md in the root of the repository.]
4. While there are unfinished todos in the PLAN.md document, repeat the following steps:
a) Call rust-engineer to implement the next todo and/or verify completion of the todo.
b) Call each of the following agents with instructions to focus on the current changes in the workspace. If any actionable items are found in the generated report that are within the scope of the requested task, call rust-engineer to address these items and then repeat:
- rust-nit-checker [checks for things I find Claude gets consistently wrong in Rust code]
- test-completeness-checker [checks for missing edge cases or functionality not tested]
- code-smell-checker [a variant of the software architect agent that reports when things are generally sus]
- [... a handful of other custom agents; I'm constantly adjusting this list]
- dirty-file-checker [reports any test files or other files accidentally left and visible to git]
c) Repeat from step a until you run through the entire list of agents without any actionable, in-scope issues identified in any of the reports, and rust-engineer still reports the task as fully implemented.
d) Run git-commit-auto agent [A variation of the earlier git commit script that is non-interactive.]
e) Mark the current todo as done in PLAN.md
5. If there are any unfinished todos in PLAN.md, return to step 4. Otherwise, call the software-architect agent with the original task description as approved by the user, and request that it assess whether the task is complete and, if not, generate a new PLAN.md document.
6. If a new PLAN.md document is generated, return to step 4. Otherwise, report completion to the user.
"""
That's my current task workflow, albeit with a number of items and agent definitions elided. I have lots of ideas for expanding it further, but I'm basically taking an iterative and incremental approach: every time Claude fumbles the ball in an embarrassing way (which does happen!), I add or tweak a rule to avoid that outcome. There are a couple of key points:
1) Using Rust is a superpower. With guidance to the agent about which crates to use, and with very strict linting tools and code-checking subagents (e.g. no unsafe code blocks, no #[allow(...)] directives to override the linter, an entire subagent dedicated to finding and calling out string-based typing and error handling, etc.), this process produces good code that largely works and does what was requested. You don't have to load the whole project into context to avoid pointer or use-after-free issues and the other things that cause vibe-coded projects to fail at a certain complexity. I don't see this working in a dynamic language, for example, even though LLMs are honestly not as good at Rust as they are at more prominent languages.
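For example, the no-unsafe and no-override rules don't have to rely on the agent's good behavior; they can be made mechanical at the crate root (the exact lint set is a per-project choice):

    // crate root (lib.rs or main.rs)
    #![forbid(unsafe_code)]   // hard error; a local #[allow] cannot override a forbid
    #![deny(warnings)]        // any compiler warning fails the build
    #![deny(clippy::all, clippy::pedantic)]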
2) The key part of the task workflow is the long list of analysts run against the changes, and the assumption -- which holds up well in practice -- that you can just keep iterating and fixing reported issues (some of the elided secret sauce involves subagents that evaluate whether an issue is in scope and needs to be fixed or can be safely ignored, and that keep an eye out for deviations from the requested task). This eventual-completeness assumption does work pretty well.
3) At some point the main agent's context window gets poisoned, or it reaches the full context window and compacts. Either way, this kills any chance of simply continuing. In the first case (poisoning) it loses track of the task and ends up caught in some yak-shaving rabbit hole. It's usually obvious when you check in that this is going on, and I just nuke it and start over. In the latter case (full context window), the auto-compaction also pretty thoroughly destroys the workflow, but it usually results in the agent asking a variation of "I see you are in the middle of ... What do you want to do next?" before taking any bad action against the repo itself. Clearing the now-poisoned context window with "/reset" and then providing just "task: continue" gets it back on track. I have a todo item to automate this, but the Claude Code API doesn't make it easy.
4) You have to be very explicit about what the main agent can and cannot do. It is trained and fine-tuned to be an interactive, helpful assistant; you are using it to delegate autonomous tasks, and that requires explicit and repeated instructions. This is made somewhat easier by the fact that subagents are not given access to the user -- they simply run and generate reports for the calling agent. So I try to pack as much as I can into the subagents and keep the main agent's role very well defined and clear. It does mean you have to manage out-of-band communication between agents (e.g. the PLAN.md document) to conserve context tokens.
If you try this out, please let me know how it goes :)
I tried this tonight. It was my first time using anything like Claude Code, though I have a week or so of Copilot agentic-mode experience.
It's the right path; I'm very smitten with seeing the subagents working together. Blew through the Pro quota really fast.
I was a skeptic and am no more. Gonna see what it takes to run something basic in a home lab, and how the performance is. Even if it's incredibly slow on a beefy home system, just checking in on it should be low enough friction to let it noodle on some hobby projects.
Yeah, it was a "HOLY SHIT" moment for me when I first started experimenting with subagents. A step-change improvement in productivity for sure. They combine well with Claude Code's built-in todo tool, and together they really start to deliver on the promised goal of automating development. Watching it delegate to subagents and seeing the flow of information back and forth is amazing.
One thing I forgot to mention -- I run Claude within a simple sandboxed dev container like this: https://github.com/maaku/agents/tree/main/.devcontainer This lets me safely run with '--dangerously-skip-permissions', which basically gives Claude free rein within the Docker container it is running in. This is what lets you run without user interaction.
When you say "run something basic in a home lab", do you mean local inference? Qwen3-Coder is probably the model to use if you want to go that route. Avoid gpt-oss, as it was trained on synthetic data and is unlikely to perform well.
I'm investigating this as well, as I need local inference for some sensitive data. But honestly, the Anthropic models work so well that I justified getting myself the unlimited/max plan, and I mostly use that. I suspect I overbought -- at $200/mo I have yet to ever be rate limited, even with these long-running instances. I stay within the ToS and only run 1-2 sessions at a time, though.
I just recently stumbled upon your tdd-guard when looking for inspiration for Claude hooks. I've been so impressed with how much it could improve my workflow and quality. Then I was somewhat disappointed that almost no one seems to talk about this potential and how they're using hooks. Yours was the only interesting project I found in this regard, and I hope to give it a spin this weekend.
You don't happen to have a short video where you go into a bit more detail on how you use it though?
I don't have a detailed video beyond the short demo on the repo, but I'll look into recording something more comprehensive or cover it in a blog post. Happy to ping you when it's ready!
In the meantime: I simply set it up and go about my work. The only thing I really do is nudge the agent toward architectural simplifications and make sure it follows the testing strategies I like: dependency injection, test helpers, test data factories and such. Things I would do regardless of the hook.
I like to give my tests the same attention and care that I give production code. They should be meaningful and resilient. The code base contains plenty of examples but I will look into putting something together.
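To give a flavor of what I mean by test data factories, here's a generic sketch (not from the code base; any language works, this one happens to be Rust):

    // One place knows how to build a valid default value;
    // each test overrides only the fields it cares about.
    #[derive(Debug, Clone)]
    struct Order { id: u32, items: Vec<String>, paid: bool }

    fn an_order() -> Order {
        Order { id: 1, items: vec!["widget".into()], paid: false }
    }

    fn can_ship(order: &Order) -> bool { order.paid && !order.items.is_empty() }

    #[test]
    fn unpaid_orders_cannot_ship() {
        let order = Order { paid: false, ..an_order() };
        assert!(!can_ship(&order));
    }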
I spent my summer holiday on this because I truly believe in the potential of hooks in agentic coding. I'm equally surprised that this space hasn't been explored more.
I'm currently working on making the validation faster and more customizable, plus adding reporters to support more languages.
I think there is an Amazon-backed VS Code fork that is also exploring this space. I think they market it as spec-driven development.
Have you tried opencode? I haven't really, but it can use your Anthropic subscription and can also switch to most other models. It also looks quite nice, IMO.
I don't think most of these allow other tools to "use" the monthly subscription. Because of that, you need an API key and have to pay per token. Even Claude Code for a while did not use your Claude subscription.
But now there's a subscription for Claude Code, Copilot has a sub, and some others do too. They might not allow it, but whatever; we are paying, so what's the big deal?
Because weaving is simpler and more repetitive than sewing. Sewing is both a pretty precise job and involves lots of different movements and angles, with very few straight lines. All of this makes it less efficient to automate.
If you're interested in the history: it took quite a while after the start of industrialization to invent a working sewing machine. There are quite a few interesting YouTube videos on the history and workings of the sewing machine, for example.
I mean, I do! The music I have I put on Soulseek, although the more obscure stuff hasn't been downloaded yet. I also have fairly old video game mods - I don't even know where to share them or if anyone would be interested at all.
You could try uploading them to modding sites (preferably not ones with a login requirement for downloading) if you don't want to host them yourself. That can be either general modding archives or game-specific community sites -- the latter are smaller but more likely to be interested in older mods. Make sure that whatever host you use can be crawled by the Internet Archive.
Interest is probably going to be low but not zero - I often play games long after they have been released and sometimes intentionally using older versions that are no longer supported by current mods.
You are entirely right -- although I'd have to be careful with what I upload and where, because on Steam Workshop there are assholes who threaten to DMCA you without basis, and there are similar problems on other sites too. But I'll look around :)
Do you have any resources you recommend for representing subsections? I'm currently prototyping a note/thoughts editor where one feature is suggesting related documents/thoughts (think linked notes in Obsidian), for which I would like to suggest subsections and not only full documents.
Sorry, no good references offhand. I've had to help write & generate public docs in DocBook in the past, but I'm no expert on editors, NLP, or embeddings beyond hacking around some tools for my own note-taking. My assumption is you'll want to use your existing markup structure, if you have it. Or naively split on paragraphs with a tool like spaCy. Or get real fancy and use dynamic ranges: something like an accumulation window that aggregates adjacent sentences based on pairwise similarity, breaks on total size or dissimilarity, and then treats that aggregate as the range to "chunk."
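If it helps, that last idea might look something like this (a sketch; `embed` stands in for whatever sentence-embedding function you already have, and the thresholds are placeholders to tune):

    // Accumulate adjacent sentences into a chunk; start a new chunk when the
    // next sentence is too dissimilar from the previous one or the chunk is full.
    fn chunk_sentences(
        sentences: &[String],
        embed: impl Fn(&str) -> Vec<f32>,
        max_len: usize,
        min_sim: f32,
    ) -> Vec<String> {
        let mut chunks = Vec::new();
        let mut current = String::new();
        let mut prev: Option<Vec<f32>> = None;
        for s in sentences {
            let v = embed(s);
            let similar = prev.as_ref().map_or(true, |p| cosine(p, &v) >= min_sim);
            if !current.is_empty() && (!similar || current.len() + s.len() > max_len) {
                chunks.push(std::mem::take(&mut current));
            }
            if !current.is_empty() { current.push(' '); }
            current.push_str(s);
            prev = Some(v);
        }
        if !current.is_empty() { chunks.push(current); }
        chunks
    }

    fn cosine(a: &[f32], b: &[f32]) -> f32 {
        let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
        let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
        let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
        if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
    }

Each returned chunk then gets its own embedding for the related-notes lookup.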
Thanks for the elaborate and helpful response. I'm also hacking on this as a personal note taking project and already started playing around with your ideas. Thanks!
What art are you referring to? I only see photographs that are credited (on mobile)
Do you mean the navy cap with the Pepsi logo? It's credited as an illustration. Text-to-image models also weren't that good yet in 2021, or did I miss something?
yep navy cap was "illustrated" but not by the "artist".
AFAICS, although it would be so much more tragic if the artist had actually created the image. Just can't win.