Tuesday, June 19, 2007

Sometimes normalising data is a pain

We've all been there: inheriting some piece-of-crap Access app designed by someone who did a five-minute polytech class. All the data is stored in one table, or replicated across several. Above all else, it's a nightmare to code against.

Of course, databases like this could do with some normalisation. But sometimes we can go a little too far. I'm currently involved in a project to track data through a system and display ETAs and other information on several monitors around the physical location, giving workers a better idea of the current state of play so they can make better judgement calls (I can't go into specifics here).

At several locations along the route, the data is retrieved using some equipment. Each piece of equipment has a unique six-character ID and posts information in real time to my app over TCP/IP sockets, with that six-character ID as the reference.

A process then filters this raw data and stores it in the database, with the six-character ID as the primary key. Fine so far. At several points along the route decisions are made, and the data is manipulated and stored in another table for use by the separate display process.

Against this spec, I started coding the system. I'm the lead developer, but the overall lead (who wrote the spec) is also designing the database. At some point he decided to normalise the data a little and move the six-character ID into a table of its own with an integer primary key.
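To make the change concrete, here's a minimal sketch of the before-and-after schemas using Python's sqlite3 (table and column names are hypothetical, not from the actual system):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Before: the six-character equipment ID is itself the primary key,
# so incoming data can be matched against a single table directly.
con.execute("""
    CREATE TABLE readings_before (
        equipment_id CHAR(6) PRIMARY KEY,  -- e.g. 'ABC123'
        payload      TEXT
    )
""")

# After normalisation: the six-character ID lives in its own lookup
# table with a surrogate integer key, and the data table references
# that key instead of the ID the equipment actually sends.
con.execute("""
    CREATE TABLE equipment (
        id           INTEGER PRIMARY KEY,
        equipment_id CHAR(6) UNIQUE
    )
""")
con.execute("""
    CREATE TABLE readings_after (
        equipment_ref INTEGER REFERENCES equipment(id),
        payload       TEXT
    )
""")
```

Nothing wrong with the second shape in isolation; the friction only shows up in the code that has to query it.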

This is fine from a database point of view. It's possible the ID field (the six-character one) could change in the future, but it's not very likely. About as likely as Steve Jobs owning a Zune.

So by normalising the data, my workload as a developer has increased. A lot. First, my classes need to be rewritten to incorporate an int ID field (I do this by hand, shoot me), then my code needs to be rewritten to include joins to this new table. A simple change on the database creates a lot more work.

Some people at this point may argue that the design is at fault, or that you shouldn't code against an incomplete DB schema. That may be true. However, in this system the int ID field is meaningless, so I have to access two tables every time instead of just checking against the raw data. Sure, we may save a minuscule amount of space on the database server, but is it really worth the extra effort, especially considering how cheap cycles and space are these days?

Normalisation is good. But before you take it to the nth degree, it needs to be weighed up against the system as a whole. Just because a DBA prefers fully normalised data doesn't mean it should be done.
