Master Data Management (CDI/IR/PIM)
What is Master Data Management
Master Data Management is a technology-enabled discipline in which business and Information Technology work together to ensure the uniformity, accuracy, stewardship, semantic consistency, and accountability of the enterprise’s official shared master data assets.
What is Master Data
Enterprise data can be broadly categorized as:
Transactional data are the elements that support the on-going operations of an organization and are included in the application systems that automate key business processes. This can include areas such as sales, service, order management, manufacturing, purchasing, billing, accounts receivable, and accounts payable. Commonly, transactional data refers to the data that is created and updated within the operational systems. Examples of transactional data include the time, place, price, discount, payment methods, etc. used at the point of sale.
Master data is your most important data: it plays a critical role in the core operation of a business. Master data refers to the prime entities that are used by several functional groups in your enterprise and are typically stored in different data systems across an organization.
This year we have seen emerging trends in CDI and PIM. Let’s look at what they are, why enterprises need them, why they are so sought after, and how they fit into the MDM ecosphere.
CDI/IR/MDM
Customer data integration (CDI) and master data management (MDM) are getting significant buzz in both information technology (IT) and business circles.
PIM/MDM
- A wide array of products and/or complex product data set
- Frequently changing product characteristics
- An increasing number of sales channels
- Non-uniform IT infrastructure
- Online business and electronic ordering
- Various locales and localization requirements
PIM systems manage the customer-facing product data needed to support multiple geographic locations, multi-lingual data, and the maintenance and modification of product information within a centralized catalog. Product information kept by a business can be scattered throughout departments and held by employees or systems, instead of being available centrally; data may be saved in various formats, or only be available in hard copy form. Information may be needed for detailed product descriptions with prices, or for calculating freight costs. PIM represents a solution for centralized, media-independent product data maintenance, as well as efficient data collection, enrichment, data governance, and output.
Key Concepts of MDM
Let’s revisit the basics of Master Data Management & Data Governance!
MDM Eco-System
Data Quality
There are 7 dimensions of Data Quality:
- Accuracy: The degree of conformity of a measure to a standard or a true value. Accuracy is very hard to achieve through data cleansing in the general case because it requires access to an external source of data that contains the true value: such “gold standard” data is often unavailable. Accuracy has been achieved in some cleansing contexts, notably customer contact data, by using external databases that match up zip codes to geographical locations (city and state) and also help verify that street addresses within these zip codes actually exist.
- Completeness: The degree to which all required measures are known. Incompleteness is almost impossible to fix with data cleansing methodology: one cannot infer facts that were not captured when the data in question was initially recorded. (In some contexts, e.g., interview data, it may be possible to fix incompleteness by going back to the original source of data, i.e. re-interviewing the subject, but even this does not guarantee success because of problems of recall – e.g., in an interview to gather data on food consumption, no one is likely to remember exactly what one ate six months ago. In the case of systems that insist certain columns should not be empty, one may work around the problem by designating a value that indicates “unknown” or “missing”, but the supplying of default values does not imply that the data has been made complete.)
- Consistency: The degree to which a set of measures are equivalent across systems. Inconsistency occurs when two data items in the data set contradict each other: e.g., a customer is recorded in two different systems as having two different current addresses, and only one of them can be correct. Fixing inconsistency is not always possible: it requires a variety of strategies – e.g., deciding which data were recorded more recently, which data source is likely to be most reliable (the latter knowledge may be specific to a given organization), or simply trying to find the truth by testing both data items (e.g., calling up the customer).
- Conformity: Can be ensured by enabling validation constraints which fall into the following categories:
  - Data-Type Constraints: Values in a particular column must be of a particular data type, e.g., Boolean, numeric (integer or real), date, etc.
  - Range Constraints: Typically, numbers or dates should fall within a certain range. That is, they have a minimum and/or maximum permissible value.
  - Mandatory Constraints: Certain columns cannot be empty.
  - Unique Constraints: A field, or a combination of fields, must be unique across a dataset. For example, no two persons can have the same social security number.
  - Set-Membership Constraints: The values for a column come from a set of discrete values or codes. For example, a person’s gender may be Female, Male or Unknown (not recorded).
  - Foreign-Key Constraints: This is the more general case of set membership. The set of values in a column is defined in a column of another table that contains unique values. For example, in a US taxpayer database, the “state” column is required to belong to one of the US’s defined states or territories: the set of permissible states/territories is recorded in a separate State table. The term foreign key is borrowed from relational database terminology.
  - Regular Expression Patterns: Occasionally, text fields will have to be validated this way. For example, phone numbers may be required to have the pattern (999) 999-9999.
  - Cross-Field Validation: Certain conditions that utilize multiple fields must hold. For example, in laboratory medicine, the sum of the components of the differential white blood cell count must be equal to 100 (since they are all percentages). In a hospital database, a patient’s date of discharge from the hospital cannot be earlier than the date of admission.
- Concurrency
- Duplication
- Integrity: The term integrity encompasses accuracy, consistency, and some aspects of validation.
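To make the Conformity constraints listed above more concrete, here is a minimal Python sketch of how several of them could be checked on a single record. The field names (ssn, age, gender, phone, admitted, discharged), the 0 to 130 age range, and the validate_record helper are all assumptions made for illustration; they do not come from any particular MDM product.

```python
import re
from datetime import date

# Hypothetical rule values, chosen only for illustration.
ALLOWED_GENDERS = {"Female", "Male", "Unknown"}        # set-membership constraint
PHONE_PATTERN = re.compile(r"\(\d{3}\) \d{3}-\d{4}")   # pattern (999) 999-9999


def validate_record(record, seen_ssns):
    """Return a list of constraint violations for one record."""
    errors = []

    # Mandatory constraint: these fields cannot be empty.
    for field in ("ssn", "admitted", "discharged"):
        if record.get(field) in (None, ""):
            errors.append(f"mandatory: {field} is missing")

    # Data-type constraint, then range constraint, on age.
    age = record.get("age")
    if not isinstance(age, int) or isinstance(age, bool):
        errors.append("data type: age must be an integer")
    elif not 0 <= age <= 130:
        errors.append("range: age must be between 0 and 130")

    # Unique constraint: no two records may share an SSN.
    ssn = record.get("ssn")
    if ssn:
        if ssn in seen_ssns:
            errors.append("unique: duplicate ssn")
        seen_ssns.add(ssn)

    # Set-membership constraint: gender must be one of the known codes.
    if record.get("gender") not in ALLOWED_GENDERS:
        errors.append("set membership: unknown gender code")

    # Regular expression pattern: the whole phone value must match.
    phone = record.get("phone") or ""
    if not PHONE_PATTERN.fullmatch(phone):
        errors.append("pattern: phone must look like (999) 999-9999")

    # Cross-field validation: discharge cannot precede admission.
    admitted, discharged = record.get("admitted"), record.get("discharged")
    if isinstance(admitted, date) and isinstance(discharged, date):
        if discharged < admitted:
            errors.append("cross-field: discharge precedes admission")

    return errors


seen = set()
good = {"ssn": "123-45-6789", "age": 42, "gender": "Female",
        "phone": "(555) 123-4567",
        "admitted": date(2024, 1, 5), "discharged": date(2024, 1, 9)}
bad = {"ssn": "123-45-6789", "age": 150, "gender": "F",
       "phone": "555-1234",
       "admitted": date(2024, 1, 5), "discharged": date(2024, 1, 2)}

good_errors = validate_record(good, seen)
bad_errors = validate_record(bad, seen)
print(good_errors)
print(bad_errors)
```

In practice, most of these rules would likely be enforced by the database schema or a data quality tool rather than hand-written checks; a foreign-key constraint in particular is best left to the database itself.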
Data Cleansing
Data cleansing or data cleaning is the process of detecting and correcting corrupt or inaccurate records from a recordset, table, or database and refers to identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data. (Source: Wikipedia)
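As a rough illustration of the replace, modify, and delete steps in that definition, the Python sketch below cleans a small list of customer records. The field names, the choice of email as the matching key, the "UNKNOWN" placeholder, and the "most recently updated record wins" rule are all assumptions for this example, not a prescribed cleansing method.

```python
from datetime import date


def cleanse(records):
    """Normalize, fill, and de-duplicate customer records (illustrative rules)."""
    latest = {}
    for rec in records:
        # Modify: trim whitespace and lower-case the matching key.
        email = (rec.get("email") or "").strip().lower()

        # Delete: in this sketch a record without a key cannot be matched,
        # so it is treated as irrelevant and dropped.
        if not email:
            continue

        cleaned = {
            "email": email,
            # Modify: collapse repeated spaces and normalize capitalization.
            "name": " ".join((rec.get("name") or "").split()).title(),
            # Replace: mark a missing city explicitly. This does not make
            # the data complete, it only makes the gap visible.
            "city": (rec.get("city") or "").strip() or "UNKNOWN",
            "updated": rec["updated"],
        }

        # Resolve duplicates: keep the most recently updated record per key.
        current = latest.get(email)
        if current is None or cleaned["updated"] > current["updated"]:
            latest[email] = cleaned

    return list(latest.values())


raw = [
    {"email": " Ann@Example.com ", "name": "ann  lee", "city": "Boston",
     "updated": date(2023, 3, 1)},
    {"email": "ann@example.com", "name": "Ann Lee", "city": "Chicago",
     "updated": date(2024, 6, 1)},
    {"email": "", "name": "no key", "city": "", "updated": date(2024, 1, 1)},
    {"email": "bob@example.com", "name": "bob RAY", "city": None,
     "updated": date(2024, 2, 2)},
]

clean = cleanse(raw)
for row in clean:
    print(row)
```

Preferring the newest record is only one of the strategies mentioned under Consistency; an organization may instead trust a particular source system, or confirm the value with the customer.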