
EBX Master Data Management – Sample Job Interview Questions

What are the different classifications of data?

Data can be broadly classified as:

  • Transactional Data
  • Master Data
    • Reference Data
    • Metadata
  • Analytical Data

What is Master Data?

Master data is typically persistent, non-transactional data that is used by multiple systems and defines the primary business entities. Master data may include data about customers, products, employees, inventory, suppliers, and sites.


What is Data Profiling?

Data profiling is the process of examining the data available from an existing information source and collecting statistics or informative summaries about that data.

  • Profiling activities are broadly categorized as:
    • Structure discovery
    • Content discovery
    • Relationship discovery

To carry out these discoveries, the following analysis techniques can be applied:

  • Table Analysis
    • Completeness
    • Freshness
    • etc…
  • Column Analysis
    • Range
    • Completeness
    • Uniqueness
    • Pattern
    • Max value
    • Min value
    • Conformity
    • etc…
  • Cross Column Analysis
    • Dependency
    • Key Analysis
  • Cross Table Analysis
    • Foreign Key Analysis
    • Identification of Orphaned records
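
As a rough illustration of the key analysis and cross-table analysis listed above, here is a minimal Python sketch. The table and column names (customers, orders, customer_id) are made up for the example, and the helper functions are hypothetical; they are not part of EBX or any profiling tool.

```python
def find_orphans(child_rows, parent_rows, fk, pk):
    """Foreign key analysis: return child rows whose key has no matching parent."""
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows if row[fk] not in parent_keys]


def is_candidate_key(rows, column):
    """Key analysis: a column qualifies if it has no nulls and no duplicates."""
    values = [row.get(column) for row in rows]
    return None not in values and len(set(values)) == len(values)


# Illustrative data only.
customers = [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Globex"},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},  # no customer 3 exists: orphaned record
]

orphans = find_orphans(orders, customers, fk="customer_id", pk="customer_id")
print(orphans)
print(is_candidate_key(customers, "customer_id"))
```

On real tables the same checks are usually expressed as SQL anti-joins or done by a profiling tool; the sketch only shows the logic.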

What is Data Quality?

Data Quality (or Data Assessment) is the column profiling process performed by an analyst to find the following:

  • The number of distinct values
  • Uniqueness or duplicates
  • The highest (maximum) values
  • The lowest (minimum) values
  • The mean and median value (for numeric data)
  • The standard deviation (for numeric data)
  • Number of nulls
  • Discovered patterns, if any
  • For each column’s set of values, verification that the inferred data type is consistent with the documented data type
  • For each column’s set of values, verification of the validity of the values
  • The most frequently occurring values
  • The least frequently occurring values
  • A visual inspection for consecutive values, similar values, and incorrect values
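
Several of the measures listed above can be computed with a few lines of code. The following Python sketch is one possible way to do it, assuming a column is given as a plain list in which None marks a null and all non-null values share one type. The function names profile_column and value_pattern are hypothetical and not part of any product.

```python
import re
import statistics
from collections import Counter


def value_pattern(value):
    """Map digits to 9 and ASCII letters to A so similar formats share a pattern."""
    text = re.sub(r"[0-9]", "9", str(value))
    return re.sub(r"[A-Za-z]", "A", text)


def profile_column(values):
    """Return basic profiling statistics for one column (None means null)."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    profile = {
        "count": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(counts),
        "duplicates": len(present) - len(counts),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "most_common": counts.most_common(1)[0][0] if counts else None,
        "patterns": Counter(value_pattern(v) for v in present),
    }
    # Mean, median, and standard deviation only make sense for numeric columns.
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if present and len(numeric) == len(present):
        profile["mean"] = statistics.mean(numeric)
        profile["median"] = statistics.median(numeric)
        profile["stdev"] = statistics.pstdev(numeric)  # population std deviation
    return profile


ages = [34, 41, None, 34, 29]
zip_codes = ["12345", "1234A", None]

print(profile_column(ages))
print(profile_column(zip_codes)["patterns"])
```

A pattern that appears only rarely in a column (such as a letter inside an otherwise numeric postal code) is often a good starting point for the visual inspection step.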

What do you mean by Master Data Management?

Master data management (MDM) is the set of data management disciplines and processes used to create and maintain consistent master data. MDM typically involves the following processes:

  • Data Profiling
  • Data Validation
  • Data De-duplication
    • Survivorship & Stewardship
  • Stewardship using workflow

What is multi-domain MDM?

Multi-domain MDM is an MDM implementation approach that manages technical domains (such as Master Data, Reference Data, and Metadata) as well as business domains (such as customers, vendors, supplies, products, locations, and assets) in a cross-functional way.

What is multi-vector MDM?

Multi-vector MDM is an implementation that combines multiple domains with multiple implementation styles.

What are the different implementation styles of MDM?

  • Registry Style (Data is created in the different source systems. MDM only links the different records and creates a registry. In this case, MDM is used in Read-only mode, whereas the Sources are used in Read/Write mode)
  • Consolidation Style (Data is created in the different source systems. MDM harmonizes it by finding duplicates and merging them to create a Golden record. In this case, MDM is used in Read mode and the Sources are used in Write mode)
  • Coexistence Style (Some records are created in the different source systems and some are authored in the MDM system, and both exist side by side. In this case, both MDM and the Sources are used in Read/Write mode)
  • Centralized Style (All master records are consolidated using MDM; going forward, they are authored within the MDM system and the source systems are decommissioned. In this case, MDM is used in Read/Write mode, whereas the Sources are used in Read mode only)
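
The read/write modes of the four styles can be summarized as a small lookup table. The Python sketch below only restates the list above; the dictionary layout and the authoring_systems helper are illustrative and not part of any MDM product.

```python
# Access modes per implementation style, as described in the list above.
STYLES = {
    "registry":      {"mdm": "read",       "sources": "read/write"},
    "consolidation": {"mdm": "read",       "sources": "write"},
    "coexistence":   {"mdm": "read/write", "sources": "read/write"},
    "centralized":   {"mdm": "read/write", "sources": "read"},
}


def authoring_systems(style):
    """Return which sides may create or update master records in a style."""
    modes = STYLES[style]
    return [side for side, mode in modes.items() if "write" in mode]


for style in STYLES:
    print(style, "->", authoring_systems(style))
```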

How do you implement Consolidation Style MDM using EBX?

Let’s assume there are Customer data silos within the organization because data is collected by the CRM system as well as the ERP system. To obtain a holistic view of Customer data, we can perform the following steps in the given order.

  1. Land the data in EBX. Create a Landing table structure that exactly matches the source system file/table structure.
  2. Transfer the data from Landing to Staging (the Staging table structure closely matches the Master table structure). During the transfer, you can perform various transformations (data type conversion, cross-reference lookup, concatenation, split, constants, etc.). In the staging area, you can enrich the data using third-party services for address, email, phone, etc.
  3. Execute the de-duplication process in the Mastering area. De-duplication can be either automatic or manual, depending on the strength of the similarity score.
    1. The similarity score is generated by EBX based on the match rules. One or more match rules can be configured depending on the data context. Each match rule consists of a combination of fields and a matching algorithm that generates a weighted average score.
    2. If the similarity score is above the maximum threshold, the system can automatically merge duplicate records based on survivorship logic. Survivorship logic can be configured based on the trust factor of the source system, the freshness of the record, the number of times a system provides a record, etc.
    3. If the similarity score is above the lower threshold but below the maximum threshold, EBX can generate work items for data stewards to resolve duplicate clusters.
    4. EBX can also decide, based on configuration, whether to generate a new Golden record or promote an existing sourced record to Golden.
  4. Once Golden records are generated, we can publish them to downstream systems either in real time or using scheduled jobs.
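
The matching and survivorship logic in step 3 can be sketched in plain Python. This is a generic illustration and does not use the EBX API: the field weights, thresholds, source trust ranking, and all function names are assumptions made for the example, and the string similarity comes from Python's standard difflib module rather than from any EBX matching algorithm.

```python
from difflib import SequenceMatcher

# Hypothetical configuration: weights, thresholds, and trust ranks are illustrative.
MATCH_RULE = {"name": 0.6, "email": 0.4}   # field -> weight
LOWER_THRESHOLD = 0.70
UPPER_THRESHOLD = 0.90
SOURCE_TRUST = {"CRM": 2, "ERP": 1}        # higher rank = more trusted


def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(rec_a, rec_b):
    """Weighted average of the per-field similarities in the match rule."""
    total = sum(MATCH_RULE.values())
    weighted = sum(
        weight * similarity(rec_a[field], rec_b[field])
        for field, weight in MATCH_RULE.items()
    )
    return weighted / total


def decide(score):
    """Route a pair of records based on the two thresholds."""
    if score > UPPER_THRESHOLD:
        return "auto_merge"
    if score > LOWER_THRESHOLD:
        return "steward_review"
    return "no_match"


def survive(rec_a, rec_b):
    """Per field, keep the non-empty value from the most trusted source."""
    ranked = sorted(
        [rec_a, rec_b], key=lambda r: SOURCE_TRUST[r["source"]], reverse=True
    )
    return {
        field: next((r[field] for r in ranked if r[field]), "")
        for field in MATCH_RULE
    }


crm = {"source": "CRM", "name": "John Smith", "email": "john.smith@example.com"}
erp = {"source": "ERP", "name": "John Smith", "email": "JOHN.SMITH@EXAMPLE.COM"}

score = match_score(crm, erp)
action = decide(score)
print(round(score, 2), action)
if action == "auto_merge":
    print(survive(crm, erp))
```

Here survivorship is driven only by source trust; freshness or frequency of the record, as mentioned in the steps above, could be added as further sort keys.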

How do you implement data life cycle management using EBX?
How do you implement audit tracking using EBX?
How do you implement Hierarchy using EBX?
What are the different types of data modeling techniques available in EBX and what are their advantages?
What functionalities are available as a core module and what is available as add-ons?
What application infrastructure is supported by EBX?
What MDM implementation style is supported by EBX?