Okay, so I am still confused by the concept of a Data Mesh (I’m a slow learner).
Recently, I wrote a blog that posed the question: Are Data Meshes the Enabler of the Marginal Propensity to Reuse? The ability to engineer data and analytic assets that can be shared, reused, and continuously refined is the heart of what makes data (and analytics) the most valuable resource in the world. And if Data Meshes can make this happen, then it’s a winner in my eyes.
A Data Mesh is composed of three separate components: data sources, data infrastructure, and domain-oriented data pipelines managed by business function owners. Underlying the data mesh architecture is a layer of universal interoperability, reflecting domain-agnostic standards, as well as observability and governance (Figure 1).
Figure 1: What is a Data Mesh?
A data mesh supports a distributed, domain-specific data structure that views “data-as-a-product,” with each domain handling its own data pipelines. The tissue connecting these domains and their associated data assets is a universal interoperability layer that applies the same syntax and data standards.
Wait. This sounds very familiar:
- Is a distributed, domain-specific data structure the same as business process-specific data marts?
- Is the universal interoperability layer the same as the Enterprise Data Warehouse Bus Architecture?
Sounds like a Data Mesh is a modern-day version of Business Process-centric Data Marts connected with Conformed Dimensions and the Enterprise Data Warehouse Bus Architecture. Heck, that’s something that Ralph Kimball and Margy Ross developed and have been teaching for over 30 years. They literally wrote the book(s) on this – starting with “The Data Warehouse Toolkit: The Definitive Guide to Dimensional Mod…” – in the 1990s.
What can we learn from the work that Ralph and Margy pioneered, and the countless organizations that have successfully implemented these business process-centric data marts connected with conformed dimensions and the enterprise data warehouse bus architecture?
A Data Mart is a collection of related fact and dimension tables that is typically derived from a single business process data source. A data mart is a business process-oriented data repository that is focused on a single business process such as orders, shipments, and payments (Figure 2).
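To make the fact-and-dimension structure concrete, here is a minimal Python sketch of a hypothetical orders data mart. The table names, keys, and figures are invented for illustration, not taken from Kimball’s books: a fact table holds one row per measurement event with foreign keys into dimension tables that carry the descriptive context.

```python
# Dimension tables: descriptive context, keyed by surrogate ids
# (names and attributes are illustrative)
dim_customer = {1: {"name": "Acme Corp", "region": "West"},
                2: {"name": "Globex", "region": "East"}}
dim_product = {10: {"name": "Widget", "category": "Hardware"},
               11: {"name": "Gadget", "category": "Hardware"}}

# Fact table for the "orders" business process: one row per order line
# (the measurement event), with foreign keys into the dimensions plus
# numeric measures
fact_orders = [
    {"customer_key": 1, "product_key": 10, "quantity": 5, "amount": 50.0},
    {"customer_key": 2, "product_key": 10, "quantity": 3, "amount": 30.0},
    {"customer_key": 1, "product_key": 11, "quantity": 2, "amount": 40.0},
]

def orders_by_region(facts, customers):
    """Aggregate a fact-table measure by a dimension attribute."""
    totals = {}
    for row in facts:
        region = customers[row["customer_key"]]["region"]
        totals[region] = totals.get(region, 0.0) + row["amount"]
    return totals

print(orders_by_region(fact_orders, dim_customer))
# → {'West': 90.0, 'East': 30.0}
```

The point of the sketch is the shape, not the technology: every analytic question against this business process is a roll-up of fact-table measures sliced by dimension attributes.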
Figure 2: Business Process-centric Data Mart Architecture
The Data Mart approach tackles one business process at a time, so it’s an incremental approach rather than trying to build the Data Warehouse all at once. And any organization that wants to analyze a given business process’s data should be involved in the design of that data mart (e.g., both Marketing and Sales would be involved in the design of the orders data mart).
The Data Mart approach overcomes the concerns with the big-bang approach, where the entire enterprise data warehouse must be built before organizations can start deriving value from their data.
Important Note: data marts are tied to business processes, not departments or business units. A business process is a set of activities (e.g., orders, sales, payments, returns, procurement, logistics, repairs) that accomplish a specific organizational goal and from which operational data is generated and business process performance is monitored and reported. And it is at the business process level that data quality and data governance can be managed.
The key to linking or federating the business process-oriented data marts is the conformed dimensions. Conformed Dimensions are common, standardized, master (file) dimensions (e.g., Customers, Products, Suppliers, Stores, Employees, Campaigns) that are managed once in the ETL or ELT processes (data pipelines) and then reused or federated across multiple business process fact tables. The benefit of conformed dimensions is that they enforce consistent descriptive attributes across the fact tables and provide a common enterprise glossary that avoids terminology inconsistencies and organizational confusion.
Conformed Dimensions enable cross-enterprise analysis, with the ability to drill across and integrate data from different business processes. Reusing conformed dimensions also shortens time-to-market by eliminating redundant design and development efforts.
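The “drill across” idea can be sketched in a few lines of Python. Because two hypothetical fact tables (orders and shipments) share the same conformed Customer dimension (same surrogate keys, same attributes), their measures can be merged into a single cross-process report. All tables and numbers here are made up for illustration:

```python
# Conformed Customer dimension, built once and shared by both fact tables
dim_customer = {1: {"name": "Acme Corp", "region": "West"},
                2: {"name": "Globex", "region": "East"}}

# Two separate business process fact tables referencing the same keys
fact_orders = [{"customer_key": 1, "ordered_qty": 5},
               {"customer_key": 2, "ordered_qty": 3}]
fact_shipments = [{"customer_key": 1, "shipped_qty": 4},
                  {"customer_key": 2, "shipped_qty": 3}]

def drill_across(orders, shipments, customers):
    """Merge measures from two business processes on the conformed dimension."""
    merged = {}
    for row in orders:
        name = customers[row["customer_key"]]["name"]
        merged.setdefault(name, {"ordered": 0, "shipped": 0})["ordered"] += row["ordered_qty"]
    for row in shipments:
        name = customers[row["customer_key"]]["name"]
        merged.setdefault(name, {"ordered": 0, "shipped": 0})["shipped"] += row["shipped_qty"]
    return merged

report = drill_across(fact_orders, fact_shipments, dim_customer)
# → {'Acme Corp': {'ordered': 5, 'shipped': 4}, 'Globex': {'ordered': 3, 'shipped': 3}}
```

If each fact table carried its own, separately maintained customer list, this merge would require fuzzy matching instead of a simple key lookup – which is exactly the inconsistency conformed dimensions are designed to prevent.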
Note: One key aspect of the conformed dimensions approach is that the enterprise bus architecture is based on “business processes,” not business departments or business units. Business processes are the organization’s business measurement events, like orders, shipments, payments, etc. (Figure 3).
Figure 3: Conformed Dimensions and Enterprise Data Warehouse Bus Architecture
Conformed Dimensions are underpinned with the Enterprise Data Warehouse Bus Architecture, which enables the decomposition of an organization’s data and analytics strategy into manageable pieces by focusing on the organization’s core business processes, along with the associated conformed dimensions.
The Conformed Dimensions approach enables the business process data to be stored once in an organization (with dimensions that are shared across measurement events) rather than Sales having their copy of orders data, Finance having their version of orders data, and Logistics having their copy. If you store data by business department or business unit, then you’ll likely end up with multiple departmental or business unit versions of the truth.
The Enterprise Data Warehouse Bus Matrix is a key design tool representing the organization’s core business processes and associated dimensionality. It’s the architectural blueprint that provides a top-down strategic perspective to ensure data from the different business processes can be integrated across the enterprise, while agile bottom-up delivery occurs by focusing on a single business process at a time (Figure 4).
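The bus matrix itself is simple enough to sketch as data: rows are business processes, columns are dimensions, and a marked cell means that process uses that dimension. In this illustrative Python sketch (the processes and dimensions are invented examples, not a real blueprint), any dimension used by more than one process is a candidate conformed dimension:

```python
# Bus matrix: each business process mapped to the set of dimensions it uses
# (processes and dimensions are illustrative)
bus_matrix = {
    "orders":    {"Date", "Customer", "Product", "Employee"},
    "shipments": {"Date", "Customer", "Product", "Carrier"},
    "payments":  {"Date", "Customer"},
}

def conformed_dimensions(matrix):
    """Dimensions shared by more than one business process: the ones to
    build once and reuse across fact tables."""
    counts = {}
    for dims in matrix.values():
        for d in dims:
            counts[d] = counts.get(d, 0) + 1
    return {d for d, n in counts.items() if n > 1}

print(sorted(conformed_dimensions(bus_matrix)))
# → ['Customer', 'Date', 'Product']
```

Reading the matrix row by row gives the agile, one-process-at-a-time delivery plan; reading it column by column gives the top-down integration architecture.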
Figure 4: Enterprise Data Warehouse Bus Matrix
So, upon closer inspection, it sure does seem that the Data Mesh is really the next generation of business process-specific data marts interconnected using conformed dimensions and the enterprise data warehouse bus architecture.
In a world of conformed dimensions and data meshes, master data management becomes the key connecting tissue. Master data management is a business discipline in which business and information technology work together to ensure the uniformity, accuracy, stewardship, semantic consistency, and accountability of the enterprise’s official shared master data assets.
From the Master Data Management guru and author of the seminal book “Telling Your Data Story: Data Storytelling for Data Management”, Scott Taylor, we get the following about the critical role of master data management in a world of business process-centric data marts interconnected with conformed dimensions.
Whatever the socio-technological impacts of a mesh or interconnected architecture, the data still needs to be mastered somehow, governed always, and trusted everywhere across the enterprise. The need for master data, reference data, and metadata is macro-architecture agnostic.
Master data is a tangible outcome. Mesh proponents might debate MDM as an approach, but it’s hard to make a case against having master, reference, and metadata. The data mesh approach calls for unique identifiers on entities to link shared data across systems. That sure sounds like master data to me!
It is an absolute requirement for data management and data monetization success that there be tight coordination across the different business processes with respect to master data management and data governance.
I feel a bit like Marty McFly, heading “Back to the Future” based upon the work that I was doing with Ralph, Margy, and Bob Becker 20+ years ago. But business process-centric data marts interconnected with enterprise-governed conformed dimensions and master data management work. And now the concept of Data Meshes promises to take advantage of modern-day technology advances to refresh the advantages of a business process approach to data warehousing, data management, master data management, and data governance.
Let’s hear it, Huey Lewis…
Source: “Data Mesh 101: Everything You Need To Know to Get Started”, Monte Carlo Data