I added Bob Monsour’s blog to the Adventures in Claude community yesterday. The community runs on Discourse - a self-hosted forum on a DigitalOcean droplet - and I set up RSS syndication a few weeks ago so that members’ blog posts automatically appear as topics in a “Member Blogs” category. I registered Bob’s Atom feed, triggered an immediate poll, and watched twenty topics materialize in the category. His entire recent archive, imported in seconds.

I didn’t want twenty old posts sitting in the community. I wanted the most recent one. So I deleted nineteen of them.

That’s when I learned something about how deletion works in systems that automatically import content.


Discourse’s RSS polling plugin uses a table called TopicEmbed to track which feed URLs have already been imported. When the poller encounters a URL in a feed, it checks TopicEmbed.find_by(embed_url: url). If a record exists, it skips that entry. If it doesn’t, it imports the post as a new topic. This is the system’s memory of what it has already seen.
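The skip logic can be sketched in a few lines of plain Ruby. This is a simplified stand-in, not the plugin's actual code: a `Set` plays the role of the TopicEmbed table, and `import_new` mirrors the `find_by(embed_url:)` check.

```ruby
require "set"

SEEN = Set.new # stand-in for the TopicEmbed table

# Mirrors the poller's skip logic: import only URLs with no embed record,
# and record every URL that gets imported.
def import_new(feed_urls)
  feed_urls.reject { |url| SEEN.include?(url) } # record exists: skip
           .each   { |url| SEEN << url }        # remember the import
end

feed = (1..20).map { |n| "https://example.com/post-#{n}" }
import_new(feed).size # 20 on the first poll
import_new(feed).size # 0 on the second: every URL now has a record
```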

When I deleted the nineteen topics using Discourse’s PostDestroyer - the standard way to remove content - I expected the topics to disappear and life to continue. The topics did disappear. But PostDestroyer cascades. It cleans up associated records, including the TopicEmbed entries for those topics. This is sensible behavior in most contexts. If you delete a topic, why would you keep the embed record pointing to it?

The problem is that the RSS poller runs every thirty minutes. On its next pass, it would check Bob’s feed, find nineteen URLs with no corresponding TopicEmbed records, and import them all again. I would wake up to twenty topics in Member Blogs. Delete them again, and the cycle repeats. The deletion itself created the conditions for re-import.
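A tiny Ruby simulation makes the loop concrete. The class names are hypothetical and a `Set` again stands in for the TopicEmbed table; the point is that deletion removes the marker along with the topic, so the next poll re-imports.

```ruby
require "set"

# Simplified model of the poller plus PostDestroyer-style deletion.
class FeedPoller
  def initialize
    @seen_embed_urls = Set.new # stand-in for the TopicEmbed table
  end

  # Import every URL that has no embed record, and record it.
  def poll(feed_urls)
    feed_urls.reject { |url| @seen_embed_urls.include?(url) }
             .each   { |url| @seen_embed_urls << url }
  end

  # What the cascade effectively does: the topic goes away, and the
  # embed record that remembered its URL goes with it.
  def destroy_topic(url)
    @seen_embed_urls.delete(url)
  end
end

feed   = (1..20).map { |n| "https://example.com/post-#{n}" }
poller = FeedPoller.new
poller.poll(feed).size                              # 20 topics appear
feed.first(19).each { |u| poller.destroy_topic(u) } # delete nineteen
poller.poll(feed).size                              # 19 come right back
```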


The fix was counterintuitive. I had to create nineteen TopicEmbed records - one for each deleted post - pointing to a dummy topic and containing the original embed URLs. These records exist only to tell the RSS poller “I’ve already seen this, skip it.” The topics they reference are gone. The records are ghosts, standing guard against reimportation.

I had to use save(validate: false) to create them, because Discourse’s validations expect a TopicEmbed to reference a live topic. The records are technically invalid. They work anyway, because the poller only checks find_by(embed_url:) - it never validates the associated topic.


The broader pattern is that any system combining automatic importing with standard deletion has this problem lurking in it. The importer needs a memory of what it has processed. The deleter, doing its job, cleans up that memory. The result is a loop: import, delete, re-import, delete.

I’ve seen variations of this in other systems. Email clients that re-download deleted messages when the server sync runs. CI/CD pipelines that re-trigger builds for commits that were reverted. Calendar apps that restore declined events from a shared calendar. The underlying structure is always the same: one process creates records, another process removes the markers that prevent recreation, and a third process recreates them because the markers are gone.

The fix is always some form of tombstone - a record that says “I processed this, and I chose to discard it.” The tombstone has to survive the deletion of the thing it refers to. In my case, the tombstone was a TopicEmbed record with validate: false. In email systems, it’s often a “deleted items” folder that the sync engine treats as “seen.” In CI, it’s a skip list.
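The generic shape of the pattern, with hypothetical names: deletion discards the item but deliberately leaves behind a marker that says "processed and discarded," and the importer consults that marker.

```ruby
require "set"

# A store whose delete leaves a tombstone that outlives the item.
class Store
  def initialize
    @items      = {}      # id => content
    @tombstones = Set.new # ids processed and deliberately discarded
  end

  # Returns the content when imported, nil when blocked.
  def import(id, content)
    return if @items.key?(id) || @tombstones.include?(id)
    @items[id] = content
  end

  def delete(id)
    @items.delete(id)
    @tombstones << id # the marker survives the deletion
  end
end

store = Store.new
store.import("a", "post")
store.delete("a")
store.import("a", "post") # nil: the tombstone blocks re-import
```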

I now have a process for adding new member blogs to the community: add the feed, poll it, delete the old topics, and recreate the embed records for the deleted URLs. Next time someone shares their RSS feed, I won’t accidentally spam the community. Or if I do, I’ll know how to clean it up without creating a Groundhog Day loop.