Revision pruning with window functions and logarithms, when DQL wasn't enough
Every content update on the platform creates a revision. That’s by design: editors need a history they can roll back to, and the platform needs an audit trail. What nobody anticipated was the rate. Some articles go through forty saves in a single afternoon. A high-traffic piece accumulates hundreds of revisions over its lifetime. After a few months, the revision table had several million rows.

Deleting them naively wasn’t an option. “Keep the last 50” loses all historical context for articles that haven’t been touched in a year. “Keep one per day” loses all the detail for content that’s actively being edited.

What we needed was a distribution that matched how revisions are actually used: dense coverage for recent history, sparse coverage for old history. ...
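One way to get that dense-recent, sparse-old shape is to bucket revisions by the logarithm of their age and keep one revision per bucket. The sketch below is an illustration under our own assumptions, not necessarily the approach described in the rest of this post: age is measured in whole hours (the `unit` parameter is a hypothetical choice), each revision lands in bucket floor(log2(age + 1)), and only the newest revision in each (article, bucket) pair survives, which is what a `ROW_NUMBER()` window partitioned by article and bucket would express in SQL. `Revision`, `age_bucket` and `revisions_to_keep` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Revision:
    # Hypothetical minimal revision record; real schemas will differ.
    id: int
    article_id: int
    created_at: datetime


def age_bucket(created_at: datetime, now: datetime,
               unit: timedelta = timedelta(hours=1)) -> int:
    """Return floor(log2(age_in_units + 1)) for a revision.

    Bucket k covers ages from 2**k - 1 to 2**(k + 1) - 2 whole units, so
    each bucket spans roughly twice as much time as the one before it.
    """
    # Whole units elapsed; clamp to 0 so clock skew can't produce negative ages.
    age_units = max((now - created_at) // unit, 0)
    # For a positive int n, n.bit_length() - 1 == floor(log2(n)), computed exactly.
    return (age_units + 1).bit_length() - 1


def revisions_to_keep(revisions, now: datetime,
                      unit: timedelta = timedelta(hours=1)) -> set[int]:
    """Keep only the newest revision in each (article, age bucket) pair.

    Equivalent in spirit to filtering on
    ROW_NUMBER() OVER (PARTITION BY article_id, bucket ORDER BY created_at DESC) = 1.
    """
    newest: dict[tuple[int, int], Revision] = {}
    for rev in revisions:
        key = (rev.article_id, age_bucket(rev.created_at, now, unit))
        current = newest.get(key)
        if current is None or rev.created_at > current.created_at:
            newest[key] = rev
    return {rev.id for rev in newest.values()}


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    ages_in_hours = {1: 0, 2: 0.5, 3: 1, 4: 2, 5: 5, 6: 6, 7: 100}
    revs = [Revision(i, 1, now - timedelta(hours=h)) for i, h in ages_in_hours.items()]
    revs.append(Revision(8, 2, now))
    print(sorted(revisions_to_keep(revs, now)))  # prints [1, 3, 5, 7, 8]
```

With one-hour units, everything saved within the last hour collapses to a single revision, while a revision about a year old (roughly 8,760 hours) falls into a bucket spanning 8,191 to 16,382 hours. The number of survivors per article therefore grows with the logarithm of its age rather than with its edit count. Which revision should represent a bucket, and how small the base unit should be, are tuning choices this sketch does not settle.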