Last week I blogged about the accountability problem: how to handle tracking who made what change, and providing the ability to revert it.
Thanks to some great comments, and offline discussions, I’m seeing a few options, and have changed my mind on my desired approach. There's also some great technical reading on the subject: Slowly changing dimension

Option 1: Store history in the table

This was the approach I considered initially. Storing each revision of a record in the table, with an active bit set to indicate which version is active. More details in the original post. This approach is simple: it doesn’t require more tables, queries are still pretty simple, and with a combination of timestamps it’s easy to track down associated records that were changed.

The downsides are pretty clear, and somewhat damaging. The number of rows in the most commonly hit tables will grow much faster than the table overall, which is sub-optimal. It can also complicate issues like reporting and the like. In the case of highly normalized designs this design will likely break down, as you’ll need to match revisions across multiple tables on every query.

classes
idversionactivesessiontitlestart_dateend_datetimestamplogin_id
110SpringDancing 101May 1, 2009May 28, 20091274467708 23
121SpringDancing 101May 1, 2010May 28, 20101274467709 23
211SummerDancing 102June 1, 2010July 15, 2010 127446771027
311FallDancing 101September 1, 2010September 28, 2010127446780835

I would have used this approach in a smaller project. The system I’m replacing has been running for ten years, I have to plan for this lasting that long with only minimal maintenance. I’d rather not watch those row counts become 3-5x what they should be.

Option 2: Store histories in an external table

Rather than storing the histories in the regular table (say classes) creating a duplicate table (classes_history) and storing the revisions there. This helps trim the size of your most heavily used tables, and keeps things relatively simple.

While trimming the row count, it does double the table count, and still fails to handle complicated objects, or normalized table structures where a single datum requires multiple tables.

classes
idsessiontitlestart_dateend_date
1SpringDancing 101May 1, 2010May 28, 2010
2SummerDancing 102June 1, 2010July 15, 2010
3FallDancing 101September 1, 2010September 28, 2010
classes_history
idversionactivesessiontitlestart_dateend_datetimestamplogin_id
110SpringDancing 101May 1, 2009May 28, 20091274467708 23

I’ll be using this method for my upcoming project.

Option 3: Store serialized information

Rather than storing rows when information is modified, the entire encapsulating object is serialized and stored. This can be done either with one table per object type, or in one large history table.

The advantage here is that each revision stands alone and complete. Composite records are stored complete, and in a single location. This makes obtaining an accurate history relatively easy. On the downside, since information is serialized it’s hard to locate specific pieces of information via sql. This approach seems perfect for complex objects where a single composite record spans many tables, but a little overkill for this specific case.

classes
idversionactivesessiontitlestart_dateend_datetimestamplogin_id
121SpringDancing 101May 1, 2010May 28, 20101274467709 23
211SummerDancing 102June 1, 2010July 15, 2010 127446771027
311FallDancing 101September 1, 2010September 28, 2010127446780835
history
idversiontabledata
11classesa:8:{s:2:"id";i:1;s:7:"version";i:1;s:7:"session";s:6:"spring";s:5:"title";s:11:"Dancing 101";s:10:"start_date";s:11:"May 1, 2009";s:8:"end_date";s:12:"May 28, 2009";s:9:"timestamp";s:10:"1274467708";s:8:"login_id";i:23;}

I'll consider this for larger projects with complex objects.


Other points to ponder:

  • How long will you store histories?
    • Forever (∞)
    • Some fixed period of time (30 days)
    • Application specific (e.g. storing change to a class for three months after it ends)
  • Storage engine for history
    • MyISAM
    • Archive
  • Backup strategies
    • History tables are all appends



Thanks to everyone who commented on the last post.

I'm working on a volunteer side project for a local dance studio, the website will handle the standard sort of things: creating classes, registering students, matching leads to follows, etc.

We would like to have some level of accountability when various records (user, class, etc) are modified. That way if problems arise later, we can see who did what, and when. If needs be, even roll the change back.

This must be a common problem, how have you solved it?

My (proposed) solution is something like this: Take an ordinary table like class:

idsessiontitlestart_dateend_date
1SpringDancing 101May 1, 2010May 28, 2010
2SummerDancing 102June 1, 2010July 15, 2010
3FallDancing 101September 1, 2010September 28, 2010

And add a few columns to for change management

idversionactivesessiontitlestart_dateend_datetimestamplogin_id
110SpringDancing 101May 1, 2009May 28, 20091274467708 23
121SpringDancing 101May 1, 2010May 28, 20101274467709 23
211SummerDancing 102June 1, 2010July 15, 2010 127446771027
311FallDancing 101September 1, 2010September 28, 2010127446780835

The list of classes is obtained by selecting where the active bit is set. Administrators will be able to look at the list of changes and see who made them, and when. I’m tracking login id rather than simply user id so information like IP, and timestamp can be associated with the login, rather than simply who. In the case of reverting a change, the desired version will be duplicated into a new row, and have the active bit set. In reality active will always be the highest revision number (and latest date) but the duplication makes queries easier.

I think there’s a lot of power in this approach, but it is a bit complicated. Thoughts?


Hi, I’m Paul Reinheimer, a developer working on the web.

I co-founded WonderProxy which provides access to over 200 proxies around the world to enable testing of geoip sensitive applications. We've since expanded to offer more granular tooling through Where's it Up

My hobbies are cycling, photography, travel, and engaging Allison Moore in intelligent discourse. I frequently write about PHP and other related technologies.

Search