Monday, February 15

GIGO

Garbage In, Garbage Out. (Definition at ZDNet)
Ask any tech guy and they'll tell you that GIGO is one of the most common problems when business line people ask for useful information. You must develop processes and responsibilities to fix bad data. It's too common for users to say "the data is no good, we can't rely on it, so we won't use the system as intended" rather than to work on fixing it. Smart companies are hiring dedicated Data Stewards for their systems - key business decisions are made using this data, and it needs to be as close to perfect as you can get.

GIGO problems can usually be traced to three causes - data entry, data migration from other systems, and system changes over time.

The first is easy enough to grasp: errors are going to happen, and your challenge is to minimize them and make them easy to fix. Things like postal code / address validation can help; most countries have postal-code / city mapping data available that lets you enter just the postal code and have the city + state/province populated. Also, turn on spell check for more fields - "Jhonson" is a possible but unlikely last name. And try to make it easy for anyone to fix the data without jumping through hoops, entering a "why did you change this" field, etc.
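As a rough illustration of the postal code idea, here is a minimal Python sketch. The `POSTAL_LOOKUP` table, the `fill_city_state` function and the `needs_review` flag are all made up for this example; a real system would load the postal data published for its own country.

```python
# Hypothetical lookup table with two sample US entries; a real system
# would load the full postal-code data set for its country.
POSTAL_LOOKUP = {
    "90210": ("Beverly Hills", "CA"),
    "10001": ("New York", "NY"),
}

def fill_city_state(record):
    """Return a copy of record with city/state filled in from the postal code."""
    code = record.get("postal_code", "").strip()
    match = POSTAL_LOOKUP.get(code)
    if match is None:
        # Unknown code: leave the data alone and flag it for a person to check.
        return {**record, "needs_review": True}
    city, state = match
    return {**record, "city": city, "state": state, "needs_review": False}

entered = {"name": "Johnson", "postal_code": " 90210 "}
print(fill_city_state(entered))
```

The point of the flag is that an unknown code is not silently "corrected"; it just gets routed to someone who can fix it easily.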

Migrated data comes in from another system, so human eyes don't see it at entry time. It can be a one-time feed at system implementation, or a feed scheduled every day. If the problem is a data entry error, fix it in the source system, but it's often more complicated than that. The key is to try to find patterns in the errors and re-do the migration process as needed. There may be a subtle business rule that allows an extra line of address for certain companies, moving every field after that to the wrong column (a real example). Whenever you start doing scheduled data migration, everyone "upstream" on the migration has to know what parts of their data are used "downstream" so changes can properly cascade throughout the business.
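One way to hunt for a pattern like the shifted-column problem is to check a field with a predictable shape and then group the failures. This is only a sketch: the field names, the five-digit postal code pattern and the sample rows are assumptions for illustration, not taken from any real feed.

```python
import re
from collections import Counter

# Assumption: five-digit numeric postal codes; adjust the pattern per country.
POSTAL_RE = re.compile(r"\d{5}")

def split_suspect_rows(rows):
    """Return (ok, suspect); suspect rows have a postal_code that doesn't look like one."""
    ok, suspect = [], []
    for row in rows:
        if POSTAL_RE.fullmatch(row.get("postal_code", "")):
            ok.append(row)
        else:
            suspect.append(row)
    return ok, suspect

def suspects_by(suspect, key):
    """Count suspect rows per value of `key` to expose a pattern in the errors."""
    return Counter(row.get(key, "") for row in suspect)

rows = [
    {"company_type": "retail", "city": "Springfield", "postal_code": "12345"},
    # An extra address line pushed every later field one column to the right.
    {"company_type": "wholesale", "city": "Suite 200", "postal_code": "Springfield"},
    {"company_type": "wholesale", "city": "Dock 4", "postal_code": "Shelbyville"},
]
ok, suspect = split_suspect_rows(rows)
print(suspects_by(suspect, "company_type"))
```

If the bad rows all cluster under one value, that is a strong hint the cause is a business rule in the source system rather than random typing errors.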

System changes are the most interesting challenge. If you migrate from one system to another, the Customer data is likely the same but some core fields like Reason for Contact and Product may have different values. Of course you will want to prepare a mapping document and, after much review by all, get everyone to sign off on it. I'd also suggest keeping a non-migrated version of the data (for the whole record) in a text field (visible or not), so you can see what the record was like in the old system. This can help save you later when someone figures out that the "approved" data mapping was not quite right. If you have the old values stored with each record, they are much easier to find and correct. (Listen to the voice of experience here! Nothing worse than trying to figure out which records are wrong by looking at a system that no one remembers, if it's even installed.)
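Here is a minimal sketch of keeping the old record alongside the migrated one, assuming the old record can be serialized as JSON into a single text field. The mapping values, the field names and the `legacy_record` field are invented for the example.

```python
import json

# Hypothetical "approved" mapping from old Reason for Contact values to new ones.
REASON_MAP = {"BILLING Q": "Billing question", "TECH": "Technical support"}

def migrate_record(old):
    """Build the new-system record and keep the whole old record as text."""
    return {
        "customer": old["customer"],
        "reason_for_contact": REASON_MAP.get(old["reason"], "Unmapped"),
        # The complete old record, so a bad mapping can be traced and fixed later.
        "legacy_record": json.dumps(old, sort_keys=True),
    }

def find_by_legacy_value(records, field, value):
    """Find migrated records whose OLD value for `field` was `value`."""
    return [r for r in records
            if json.loads(r["legacy_record"]).get(field) == value]

old_rows = [
    {"customer": "Acme", "reason": "TECH"},
    {"customer": "Globex", "reason": "RETURN"},  # not covered by the mapping
]
migrated = [migrate_record(r) for r in old_rows]
print(find_by_legacy_value(migrated, "reason", "RETURN"))
```

When someone later decides that old "RETURN" records should have mapped to something specific, the affected records can be found from the stored values without going back to the old system.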

If you keep the same system for a number of years, dropdown values will change, and what used to be valid data a few years ago won't match the current user interface. The reporting solution may need to have additional (deprecated) values in the dropdown for historical reporting. Also, Management often asks for three years of data when you have been collecting a particular field for only one year - your challenge is then to figure out how (or if) that data was collected before, and how to report it. It's much better to report the absence of data, or explain that the report has a large margin of error, than to get it wrong.
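A small sketch of both ideas, with invented names and sample data: deprecated dropdown values stay valid for reporting, and years before the field was collected come back as "no data" (`None`) instead of a misleading zero.

```python
from collections import Counter

# Hypothetical dropdown values for a "channel" field.
ACTIVE_VALUES = {"Email", "Phone"}
DEPRECATED_VALUES = {"Fax"}  # gone from the entry screen, kept for old reports

def report_by_year(records, years, collected_since):
    """Count channel values per year; None means the field was not collected yet."""
    reportable = ACTIVE_VALUES | DEPRECATED_VALUES
    report = {}
    for year in years:
        if year < collected_since:
            report[year] = None  # absence of data, not a count of zero
            continue
        counts = Counter(r["channel"] for r in records
                         if r["year"] == year and r["channel"] in reportable)
        report[year] = dict(counts)
    return report

records = [
    {"year": 2009, "channel": "Fax"},
    {"year": 2009, "channel": "Email"},
    {"year": 2009, "channel": "Email"},
]
print(report_by_year(records, [2007, 2008, 2009], collected_since=2009))
```

Whoever reads the report can then see at a glance which years have real numbers and which have nothing behind them.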