<?paul

The accountability problem - Concluded

Tuesday, May 25. 2010

Last week I blogged about the accountability problem: how to handle tracking who made what change, and providing the ability to revert it.
Thanks to some great comments, and offline discussions, I’m seeing a few options, and have changed my mind on my desired approach. There's also some great technical reading on the subject: Slowly changing dimension

Option 1: Store history in the table

This was the approach I considered initially. Storing each revision of a record in the table, with an active bit set to indicate which version is active. More details in the original post. This approach is simple: it doesn’t require more tables, queries are still pretty simple, and with a combination of timestamps it’s easy to track down associated records that were changed.

The downsides are pretty clear, and somewhat damaging. The number of rows in the most commonly hit tables will grow much faster than the table overall, which is sub-optimal. It can also complicate issues like reporting and the like. In the case of highly normalized designs this design will likely break down, as you’ll need to match revisions across multiple tables on every query.

classes

id	version	active	session	title	start_date	end_date	timestamp	login_id
1	1	0	Spring	Dancing 101	May 1, 2009	May 28, 2009	1274467708	23
1	2	1	Spring	Dancing 101	May 1, 2010	May 28, 2010	1274467709	23
2	1	1	Summer	Dancing 102	June 1, 2010	July 15, 2010	1274467710	27
3	1	1	Fall	Dancing 101	September 1, 2010	September 28, 2010	1274467808	35

I would have used this approach in a smaller project. The system I’m replacing has been running for ten years, I have to plan for this lasting that long with only minimal maintenance. I’d rather not watch those row counts become 3-5x what they should be.

Option 2: Store histories in an external table

Rather than storing the histories in the regular table (say classes) creating a duplicate table (classes_history) and storing the revisions there. This helps trim the size of your most heavily used tables, and keeps things relatively simple.

While trimming the row count, it does double the table count, and still fails to handle complicated objects, or normalized table structures where a single datum requires multiple tables.

classes

id	session	title	start_date	end_date
1	Spring	Dancing 101	May 1, 2010	May 28, 2010
2	Summer	Dancing 102	June 1, 2010	July 15, 2010
3	Fall	Dancing 101	September 1, 2010	September 28, 2010

classes_history

id	version	active	session	title	start_date	end_date	timestamp	login_id
1	1	0	Spring	Dancing 101	May 1, 2009	May 28, 2009	1274467708	23

I’ll be using this method for my upcoming project.

Option 3: Store serialized information

Rather than storing rows when information is modified, the entire encapsulating object is serialized and stored. This can be done either with one table per object type, or in one large history table.

The advantage here is that each revision stands alone and complete. Composite records are stored complete, and in a single location. This makes obtaining an accurate history relatively easy. On the downside, since information is serialized it’s hard to locate specific pieces of information via sql. This approach seems perfect for complex objects where a single composite record spans many tables, but a little overkill for this specific case.

classes

id	version	active	session	title	start_date	end_date	timestamp	login_id
1	2	1	Spring	Dancing 101	May 1, 2010	May 28, 2010	1274467709	23
2	1	1	Summer	Dancing 102	June 1, 2010	July 15, 2010	1274467710	27
3	1	1	Fall	Dancing 101	September 1, 2010	September 28, 2010	1274467808	35

history

id	version	table	data
1	1	classes	a:8:{s:2:"id";i:1;s:7:"version";i:1;s:7:"session";s:6:"spring";s:5:"title";s:11:"Dancing 101";s:10:"start_date";s:11:"May 1, 2009";s:8:"end_date";s:12:"May 28, 2009";s:9:"timestamp";s:10:"1274467708";s:8:"login_id";i:23;}

I'll consider this for larger projects with complex objects.

Other points to ponder:

How long will you store histories?

Forever (∞)
Some fixed period of time (30 days)
Application specific (e.g. storing change to a class for three months after it ends)

Storage engine for history

MyISAM
Archive

Backup strategies

History tables are all appends

Thanks to everyone who commented on the last post.

The accountability problem

Friday, May 21. 2010

I'm working on a volunteer side project for a local dance studio, the website will handle the standard sort of things: creating classes, registering students, matching leads to follows, etc.

We would like to have some level of accountability when various records (user, class, etc) are modified. That way if problems arise later, we can see who did what, and when. If needs be, even roll the change back.

This must be a common problem, how have you solved it?

My (proposed) solution is something like this: Take an ordinary table like class:

id	session	title	start_date	end_date
1	Spring	Dancing 101	May 1, 2010	May 28, 2010
2	Summer	Dancing 102	June 1, 2010	July 15, 2010
3	Fall	Dancing 101	September 1, 2010	September 28, 2010

And add a few columns to for change management

id	version	active	session	title	start_date	end_date	timestamp	login_id
1	1	0	Spring	Dancing 101	May 1, 2009	May 28, 2009	1274467708	23
1	2	1	Spring	Dancing 101	May 1, 2010	May 28, 2010	1274467709	23
2	1	1	Summer	Dancing 102	June 1, 2010	July 15, 2010	1274467710	27
3	1	1	Fall	Dancing 101	September 1, 2010	September 28, 2010	1274467808	35

The list of classes is obtained by selecting where the active bit is set. Administrators will be able to look at the list of changes and see who made them, and when. I’m tracking login id rather than simply user id so information like IP, and timestamp can be associated with the login, rather than simply who. In the case of reverting a change, the desired version will be duplicated into a new row, and have the active bit set. In reality active will always be the highest revision number (and latest date) but the duplication makes queries easier.

I think there’s a lot of power in this approach, but it is a bit complicated. Thoughts?

The accountability problem - Concluded

Option 1: Store history in the table

Option 2: Store histories in an external table

Option 3: Store serialized information

Other points to ponder:

The accountability problem

Hi, I’m Paul Reinheimer, a developer working on the web.

Top Posts

Search