I’ve spent the past few months building (and endlessly iterating on) four markdown state machines - /start, /commit, /staging, and /production - that together manage my development lifecycle. They total about 4,700 lines of structured decision trees. I wrote about the first two already.

Now I’m rewriting all four.

The reason is a single number: 1,000,000.


The 200K Constraint Shaped Everything

The commands didn’t start as 4,700 lines of state machines. They started as simple prose instructions - “fetch the ticket, create a branch, make a plan.” That worked until I started hitting the 200,000-token context limit regularly. Claude would be halfway through implementing a feature and the conversation would compact - Claude Code’s way of compressing old messages to make room. After compaction, Claude would lose track of which tasks were done, forget what files it had already modified, or skip steps entirely. I’d come back to find it had re-implemented something it already finished, or worse, started writing code without ever getting plan approval.

The state machine structure evolved as a direct response to this. Numbered steps with explicit decision trees replaced prose paragraphs. Disk-based session files captured progress after every step so Claude could recover. Claude Code added features that helped - task lists that survive compaction, for instance - but for long-running workflows like /start that span ticket fetch through implementation through testing, the state machine was still essential. Even with task lists, Claude needed explicit checkpoints and file-based progress tracking to stay on course after compaction. The machinery made the system reliable. It also added a lot of overhead.

This pressure drove almost every architectural decision. Session state files that checkpoint after every single step. Task creation with full dependency wiring so Claude can recover after compaction. Plan subagents that exist partly to keep verbose codebase exploration out of the main context window. Disk-based progress tracking with checkbox files because in-memory state couldn’t be trusted to survive.
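The session files themselves are simple enough to sketch. Something like this - a hypothetical example, not the actual format my commands write - is what lets Claude re-read its own progress after a compaction instead of relying on compressed memory:

```markdown
<!-- hypothetical session file, e.g. .claude/sessions/TICK-123.md -->
# Session: TICK-123

- [x] Step 1: Fetch ticket and comments
- [x] Step 2: Create branch
- [x] Step 3: Generate plan (approved by user)
- [ ] Step 4: Implement changes
- [ ] Step 5: Run quality gates
```

Every step rewrites the file, so the checkboxes on disk - not whatever survived compaction - are the source of truth for where the workflow stands.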

All of that machinery works. It’s also slow.


What Changes With 1M Tokens

The 1M context window landed as generally available on Opus 4.6 with no long-context premium. In practical terms, a full /start workflow - command loading, ticket fetch, codebase exploration, plan generation, implementation, and quality gates - can now fit in a single context window without compaction about 85% of the time. The elaborate recovery machinery that made the system reliable at 200K is now overhead for most tickets.

I sat down with Claude and did a systematic analysis of all four commands. I reviewed the commands and the recent blog posts, fetched the Claude Code documentation on context management, and mapped every bottleneck. The shift from “context pressure” to “wall-clock time” as the primary constraint reframes every optimization opportunity.

Here’s what I found - the highest-impact changes are parallelization:

  • /commit: Quality gates (type-check and lint) run sequentially despite being independent read-only operations. Parallelizing them saves 20-30 seconds on every single commit. The simplify-then-review pipeline serializes two phases that could overlap. Batching Linear MCP calls from 6 serial round-trips to 2 parallel messages saves another 5-10 seconds.

  • /start: Ticket fetching, comment loading, and profile reading are three sequential MCP/file operations that could all run in the first parallel message. For simple tickets, nine task creations with dependency wiring add 4 seconds of overhead that’s rarely needed with 1M context.

  • /staging: Local validation runs tests then builds sequentially - that’s 40-60 seconds of unnecessary waiting since they operate on different output directories. Six sequential Sentry queries for post-deploy error checking could be one parallel dispatch.

  • /production: Sentry monitoring and smoke tests run back-to-back despite being completely independent. The environment health audit dispatches a full Sonnet subagent when a few curl commands would catch the same critical issues.

Total estimated savings across one full cycle - start a ticket, commit, stage, deploy: about 170 seconds. That’s meaningful when the cycle happens dozens of times per day.


A Quick Note About the Table of Contents Bug

If you read the first two posts in this series, you might have noticed the table of contents behaving strangely. I added a sticky sidebar TOC with auto-collapse - when you scrolled past a section, it would collapse to save space. The feature shipped and looked great in testing.

Then it created a scroll trap in Chrome.

The auto-collapse used an IntersectionObserver to watch heading elements. When you scrolled past a heading, the observer fired and collapsed a TOC section. But collapsing a section changed the page height, which shifted the scroll position, which triggered the observer again, which collapsed another section. The page would lock up in a feedback loop - the exact kind of bug that manual testing doesn’t catch because it only triggers at specific scroll positions with specific content lengths.
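To make the mechanism concrete, here’s a toy model of the cascade - hypothetical numbers, not the real TOC code. A heading “fires the observer” when it sits above the scroll position; firing collapses its section, which shrinks the page and shifts every later heading upward, possibly above the trigger point too:

```javascript
// Toy model of the scroll trap (not the real implementation).
// Collapsing a section changes the page height, which moves later
// headings up past the trigger point, which collapses them as well.
function simulate(sections, scrollY, maxPasses = 50) {
  let passes = 0;
  let changed = true;
  while (changed && passes < maxPasses) {
    changed = false;
    let y = 0; // running page offset of each section heading
    for (const s of sections) {
      if (y < scrollY && !s.collapsed) {
        s.collapsed = true; // collapsing shrinks the page...
        changed = true;     // ...so positions shift and we re-check
      }
      y += s.collapsed ? 20 : s.height; // collapsed sections are shorter
    }
    passes += 1;
  }
  return sections.filter((s) => s.collapsed).length;
}

// Ten 100px sections; the reader has only scrolled past the first two
// headings, but the cascade ends up collapsing eight of them.
const sections = Array.from({ length: 10 }, () => ({ height: 100, collapsed: false }));
const collapsed = simulate(sections, 150); // → 8
```

In the real page the loop ran against live scroll events instead of a fixed scroll position, which is what turned a cascade like this into a lockup.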

I tried fixing the observer logic twice. The first fix added a debounce. The second fix tracked whether the collapse was user-initiated versus observer-initiated. Both reduced the frequency but didn’t eliminate the loop. The third fix was the right one: I removed auto-collapse entirely. The TOC stays expanded. It’s less clever and completely reliable.

This is the kind of bug that slips through because the feature worked perfectly in the happy path. I tested it with short posts and long posts, scrolled up and down, clicked TOC links. The feedback loop only appeared with specific combinations of heading density, viewport height, and scroll speed. An automated scroll test or a longer manual session would have caught it - but I was excited about the feature and shipped it fast.

The lesson is one I keep relearning: the clever version of a feature is rarely the right version. A static table of contents does everything users need. The auto-collapse was solving a non-problem.


What’s Next

Four posts, one per command. Each will cover the analysis, the implementation changes, performance measurements, and what the 1M context window specifically enables. The order follows the dependency chain:

  1. /commit - the highest-impact optimizations (parallel quality gates, collapsed review pipeline)
  2. /start - parallel ticket fetching, reduced checkpointing
  3. /staging - parallel test+build, parallel Sentry queries
  4. /production - parallel verification, slimmer health audit

The theme across all four is the same: the 200K context window made reliability the primary engineering challenge. Elaborate recovery mechanisms, defensive checkpointing, aggressive context delegation to subagents. With 1M tokens, reliability is largely solved by having enough room. The engineering challenge shifts to speed - and speed comes from parallelism.