Book data is fundamentally messy. Amazon has product listings for every format - hardcover, paperback, Kindle, audiobook - but the metadata is inconsistent across them. One listing might say “Hardcover” while another for the same book just says “Book.” Author names vary between formats. Publication dates are sometimes precise, sometimes just a year. Then there’s ISBNDB, which has ISBN-level detail that Amazon doesn’t expose - binding types, publishers, page counts - but it doesn’t know anything about Amazon-specific formats or product relationships. Neither source is complete on its own, and they overlap in unpredictable ways. It’s just a mess - like a lot of data in the world.

I spent the last three days hunting a bug in AuthorMagic’s book discovery pipeline - the system that stitches all this messy data together. The pipeline crashed every time a user tried to add books - 100% reproducible, completely blocking. I fixed the crash relatively quickly. But the edition detection regression underneath it took 10+ hours across multiple sessions to find, and the fix turned out to be six lines of code.

The fix itself was almost embarrassingly simple. The real story is about how Claude and I went wrong debugging it - and what the right approach would have been.


The book discovery pipeline takes an author name, finds all their books on Amazon via the Rainforest API, matches them against ISBNDB for additional metadata, groups products by edition (1st edition, 2nd edition, etc.), and saves everything to the database. For my books - Venture Deals has four editions, Startup Boards has two, Startup Communities has two - the edition detection has to look at publication dates, format types, and year gaps to figure out which products belong to which edition.
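To make the edition grouping concrete, here is a minimal sketch of gap-based detection. The `Product` shape, the function name, and the 3-year gap threshold are all assumptions for illustration, not AuthorMagic’s actual code:

```typescript
// Hypothetical sketch: group physical-format products into editions.
// Consecutive publication years close together share an edition; a gap
// larger than `gapYears` starts a new one. Shape and threshold are assumed.
interface Product {
  format: string;
  publicationYear: number;
}

function groupEditions(products: Product[], gapYears = 3): Product[][] {
  // Only physical formats participate in gap detection.
  const physical = products
    .filter((p) => p.format === "Hardcover" || p.format === "Paperback")
    .sort((a, b) => a.publicationYear - b.publicationYear);

  const editions: Product[][] = [];
  for (const p of physical) {
    const current = editions[editions.length - 1];
    if (
      current !== undefined &&
      p.publicationYear - current[current.length - 1].publicationYear <= gapYears
    ) {
      current.push(p); // close enough in time: same edition
    } else {
      editions.push([p]); // large gap: start a new edition
    }
  }
  return editions;
}
```

Note that non-physical formats (Kindle, audiobook) are filtered out before grouping - which is exactly why products with the wrong format label become invisible, as the rest of this post shows.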

After fixing the crash (null-safety issues from a foreword detection feature), I noticed the edition counts were wrong. Venture Deals showed 3 editions instead of 4. Startup Boards showed 1 instead of 2. Startup Communities showed 1 instead of 2. I had a regression test baseline from when everything worked correctly, so I knew the exact expected values.

I asked Claude to investigate. It built a plan, hypothesized that the foreword detection filter was incorrectly skipping products from multi-author books (checking authors per-binding instead of per-book), and implemented a fix. The hypothesis was plausible - multi-author books do have inconsistent author data across Amazon formats.

The fix didn’t work. Same wrong edition counts.

I pushed Claude to keep investigating. It checked whether the title filter in the discovery endpoint was removing critical titles. It compared the edition grouping algorithm against the pre-regression version. It traced the data flow through multiple functions. Each investigation ruled out a hypothesis but didn’t find the root cause.


Here’s what I should have done from the start, and what I eventually forced us to do: look at the actual data.

I told Claude to add diagnostic logging that would print the format, date, and seniority of every single product entering the edition grouping algorithm. When I ran the pipeline and read the output, the bug was immediately visible:

fmt="Book"      date=August 27, 2019   seniority=999
fmt="Book"      date=March 1, 2021     seniority=999
fmt="Unknown"   date=2016-11-22        seniority=999
fmt="Hardcover" date=2016-12-12        seniority=1
fmt="Hardcover" date=2011-08-02        seniority=1

The edition gap detection algorithm only looks at physical formats - hardcovers and paperbacks with seniority 1-3. Products with seniority 999 are invisible to it. Out of 22 Venture Deals products, only two were recognized as physical formats. The rest had fmt="Book" or fmt="Unknown" - both getting seniority 999.
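The seniority mapping, reconstructed as a sketch from the log output above - the values beyond those actually shown (Hardcover=1, everything else=999) are assumptions:

```typescript
// Hypothetical seniority mapping reconstructed from the diagnostic log.
// Hardcover=1 and the 999 fallback come from the output; the other
// physical values are assumptions.
function formatSeniority(fmt: string): number {
  switch (fmt) {
    case "Hardcover":
      return 1;
    case "Paperback":
      return 2;
    case "Mass Market Paperback":
      return 3;
    default:
      // Kindle, Audiobook, "Book", "Unknown" - anything non-physical or
      // unrecognized gets 999 and is invisible to gap detection.
      return 999;
  }
}
```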

The Rainforest API was returning format: "Book" for a bunch of products. The canonicalizeFormat function had a passthrough default - any format string it didn’t explicitly recognize got stored as-is and treated as valid. “Book” isn’t “Hardcover” or “Paperback,” so it passed through. Then resolveFormat saw a non-“Unknown” result and short-circuited - it never fell through to ISBNDB, which had the correct binding type for every one of those products.
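In sketch form, the buggy behavior looks like this. The function names come from the post; the bodies and the ISBNDB parameter are assumptions:

```typescript
// Sketch of the bug. Names from the post; bodies are reconstructions.
function canonicalizeFormatBuggy(raw: string): string {
  const lower = raw.toLowerCase();
  if (lower.includes("hardcover")) return "Hardcover";
  if (lower.includes("paperback")) return "Paperback";
  if (lower.includes("kindle")) return "Kindle";
  // Passthrough default: anything unrecognized is stored as-is and
  // treated as a valid, already-resolved format - including "Book".
  return raw;
}

function resolveFormatBuggy(
  amazonFormat: string,
  isbndbBinding: string | null
): string {
  const canonical = canonicalizeFormatBuggy(amazonFormat);
  // Short-circuit: any non-"Unknown" result is accepted, so the ISBNDB
  // fallback below is never consulted for fmt="Book".
  if (canonical !== "Unknown") return canonical;
  return isbndbBinding ?? "Unknown";
}
```

With this chain, a product Amazon labels "Book" keeps that label even when ISBNDB knows the real binding.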

The fix was adding if (lower === "book") return "Unknown" to canonicalizeFormat. Six lines including the comment. That forced the format resolution chain to continue to ISBNDB, which returned “Hardcover” or “Paperback” as appropriate. Venture Deals went back to 4 editions. Startup Boards went back to 2. Startup Communities went back to 2.
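Sketched out, the repaired chain looks like this - everything except the "book" check is a reconstructed assumption:

```typescript
// Fixed canonicalizeFormat, sketched: "Book" now maps to "Unknown" so
// resolution continues to ISBNDB. Only the "book" check is from the post.
function canonicalizeFormat(raw: string): string {
  const lower = raw.toLowerCase();
  if (lower.includes("hardcover")) return "Hardcover";
  if (lower.includes("paperback")) return "Paperback";
  if (lower.includes("kindle")) return "Kindle";
  // "Book" carries no binding information - treat it as Unknown so the
  // next source in the chain can supply the real binding.
  if (lower === "book") return "Unknown";
  return raw;
}

function resolveFormat(
  amazonFormat: string,
  isbndbBinding: string | null
): string {
  const canonical = canonicalizeFormat(amazonFormat);
  if (canonical !== "Unknown") return canonical;
  return isbndbBinding ?? "Unknown"; // now reachable for fmt="Book"
}
```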


The lesson I took from this is about how to debug data-dependent algorithms. I gave Claude a clear ticket description, a regression test baseline, and detailed context. It built a thoughtful plan with key decisions, rejected approaches, edge cases, and codebase patterns. The plan was well-structured and the hypothesis was reasonable. It was also completely wrong.

The problem is that Claude reasoned about code paths without looking at data. It asked “what could the foreword filter do wrong?” instead of asking “what data are these functions actually receiving?” A human developer would have done the opposite - they’d have looked at the product data first, noticed the fmt="Book" values immediately, and traced backward to canonicalizeFormat in under an hour.

I think this is a broader pattern with AI-assisted debugging. The AI is good at reading code, building hypotheses, and exploring possibilities. It can hold a lot of context and reason about interactions between systems. But it defaults to reasoning about code rather than observing data. For algorithms that transform external data - search scoring, format resolution, edition detection, anything that processes API responses - the data is the truth. The code is just the mechanism.

The right debugging approach for data-dependent algorithms is straightforward:

  1. Add diagnostic logging that dumps actual field values at each pipeline stage
  2. Run the pipeline once with real data
  3. Read the output and look for unexpected values
  4. Trace the unexpected value backward to its source
  5. Fix the source
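Step 1 is almost trivially small. A minimal sketch, with field names assumed from the log output earlier in the post:

```typescript
// Minimal diagnostic-logging sketch for step 1. Field names are
// assumptions based on the log output shown above.
interface GroupingInput {
  fmt: string;
  date: string;
  seniority: number;
}

function diagnosticLine(p: GroupingInput): string {
  return `fmt="${p.fmt}" date=${p.date} seniority=${p.seniority}`;
}

function dumpGroupingInputs(products: GroupingInput[]): void {
  // One line per product entering the grouping algorithm; read this
  // output and look for unexpected values (steps 2-3).
  for (const p of products) console.log(diagnosticLine(p));
}
```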

I skipped steps 1-4 for the first several hours and let Claude hypothesize from code. When I finally pushed for “just show me what data the algorithm receives,” the bug was visible in the first output.

This is a good counterexample to the idea that AI makes reading code unnecessary. Sometimes reading the code - or more precisely, reading the data the code processes - is exactly what you need to do. The AI can write the diagnostic logging, but a human looking at fmt="Book" seniority=999 immediately asks “why is that Book and not Paperback?” Claude never asked that question because it was busy reasoning about a different part of the system entirely.

I’m keeping this as a permanent note in my debugging playbook: for data-dependent algorithms, inspect the data first, hypothesize second.