Over the past two years, my team and I have been implementing rigorous data governance and data stewardship, and capturing high-quality business definitions into our enterprise business glossary. We are now ready to take the next step: mapping those business definitions (including valid values) to their physical implementations. The driving force for this has been two-fold:
- Development of an overall business intelligence warehouse.
- Implementation of a modern Insurance system to replace a host of aging legacy systems.
Both of these initiatives benefit hugely from understanding where the physical data is and what it means. This aids not only in a robust conversion strategy, but also in ensuring that the results provide what the business has specified.
Mapping business data elements to their physical implementations is not as simple as you might think. The issue is that the database (or file) representation of the data may not mirror the business representation, leading to complications. Let me explain.
Let’s say that you have defined a set of phone numbers for your customers. You might have a home phone number, business phone number, and fax number. And since you know that the rules around customer data (such as phone numbers) may well be different from the rules around (say) employee phone numbers or supplier phone numbers, you’ve been very specific about these data elements and have associated them specifically with customer (e.g., Customer Home Phone Number).
Now comes the interesting part. If your database is very specific (usually meaning that it is pretty old), you might indeed find columns in tables that correspond to these data elements. For example, there might well be a column called HM_PHN_NUM in the CUST table. In more modern databases, however, this sort of rigid design presents roadblocks to changes required by the business. Business requirements tend to require a flexible number of instances of repeating fields (such as phone numbers), which, with the previously discussed design, would require (at the simplest level) adding a column to the table, adjusting the interface, perpetuating the new column through the information chain, and adjusting the ETL to move the data.
You’re probably way ahead of me. Why not normalize out the repeating fields into their own table, most likely with a type code to indicate the usage. Thus, you might have a CUST_PHONE table with two columns: PHONE_NUM and PHONE_TYPE, with the type field containing values (or their code equivalents) such as “Home”, “Business”, “Fax”, and so on. This makes it easy to add new types of phones (such as mobile phones), and even remove unused types (remember pagers?).
If you do design the database this way, however, how do you map your specific data elements to their physical implementations? What you have is the case where one logical element maps to two physical elements, as well as a condition: that the PHONE_TYPE has a particular value. You’ll need a tool that supports this type of mapping as well as stating the condition. Thus, the Home Phone Number field would map to PHONE_NUM and PHONE_TYPE where PHONE_TYPE = “Home” (or whatever). If you think about it, the metamodel must support a many-to-many, as each logical field maps to multiple physical fields, and each physical field maps to multiple logical fields.
Things get even more interesting when you ratchet up the level of abstraction. For example, some newer systems implement the “Party model”, in which various parties (including both people and organizations) are represented in a single structure, and various code values identify whether the party is an individual or an organization, and even down to a lower level granularity, such as what type of person the party is (customer, employee, applicant, supplier, etc.). To map specific data elements to this type of model, you now need many more links (e.g. PARTY_TYPE, PERSON_TYPE, PHONE_TYPE, and PHONE_NUM) as well as a whole set of conditions, such as:
- PARTY_TYPE = PERSON
- PERSON_TYPE = CUSTOMER
- PHONE_TYPE = HOME
You can see where all this is going. In our current design, not only did we implement the Party model, but even took an abstract approach to “contacts” (of which phone is one type). Thus, we ended up with the Customer Home Phone Number connected to:
- PARTY_TYPE (=PERSON)
- PERSON_TYPE (=CUSTOMER)
- CONTACT_TYPE ( = PHONE) and
- CONTACT_LOCATION_TYPE ( = HOME)
None of this is very hard to do, but your metamodel does need to support not only the connections noted, but the conditions as well.
Of course, you may find that you have the opposite case — an abstract model on the logical side which must map to specific instances on the physical side. The solution is not a whole lot different from what was discussed above, from a mapping standpoint. What often IS more difficult is getting the business subject matter experts to understand the abstract model, as its hard to “see the business” in the model. But that’s a topic for another time.