🧩 Data Mesh
✅ Overview
A data architecture that treats an organization's data as a set of distributed products and gives domain teams ownership of, and responsibility for, their data.
✅ Problems Addressed
- Bottlenecks caused by a centralized data warehouse (DWH) or data lake.
- Data requests concentrating on a central data platform team that cannot keep up with them.
- Data quality problems that stem from a lack of domain knowledge.
✅ Basic Philosophy & Rules
- Domain-oriented Data Ownership
- Data Product Thinking (SLO, Quality, API)
- Self-service Data Platform
- Standardized Interface and Governance
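As a rough illustration of data product thinking, the sketch below models a data product descriptor that carries its owning domain, an interface address, and a freshness SLO. All names here (`DataProduct`, `FreshnessSLO`, the `mesh://` endpoint scheme) are hypothetical and chosen only for illustration; real platforms define their own contract formats.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class FreshnessSLO:
    """Hypothetical SLO: data must have been refreshed within max_age."""
    max_age: timedelta

    def is_met(self, last_updated: datetime, now: datetime) -> bool:
        return now - last_updated <= self.max_age


@dataclass
class DataProduct:
    """Hypothetical descriptor a domain team publishes for its data product."""
    name: str
    owner_domain: str            # domain-oriented ownership
    endpoint: str                # standardized interface (illustrative URI only)
    freshness_slo: FreshnessSLO  # part of the product's quality contract
    last_updated: datetime

    def health(self, now: datetime) -> str:
        """Report whether the product currently meets its freshness SLO."""
        if self.freshness_slo.is_met(self.last_updated, now):
            return "ok"
        return "slo_breached"


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
orders = DataProduct(
    name="orders_daily",
    owner_domain="sales",
    endpoint="mesh://sales/orders_daily",
    freshness_slo=FreshnessSLO(max_age=timedelta(hours=24)),
    last_updated=datetime(2024, 1, 2, 3, 0, tzinfo=timezone.utc),
)
print(orders.health(now))
```

Keeping the SLO inside the descriptor means a self-service platform could, under these assumptions, evaluate every product the same way without knowing anything domain-specific.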
✅ Suitable Applications
- Distributed data management in large organizations.
- Operating an independent analytics platform for each business division.
- Environments that already use a data catalog or data APIs.
❌ Unsuitable Cases
- Small organizations, where distribution offers little benefit.
- Organizations without a governance body or a culture of operating against SLOs.
✅ History
✅ Related Styles
- Data Lake / Lakehouse: The infrastructure that implements the data storage layer.
- CQRS / EDA: A means of implementing coordination between data products.
✅ Representative Frameworks
- Databricks / Lakehouse Platform: A platform suited to managing data per data product.
- Snowflake + Data Marketplace: Enables mesh-like operation built on data sharing.
- AWS Glue / Athena + Lake Formation: Unified management of metadata and access rights for distributed data products.
- Data Catalog (Amundsen / DataHub / Collibra): An essential mesh component that handles data discovery and ownership management.
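To make the catalog's two roles (data discovery and ownership management) concrete, here is a minimal in-memory sketch. It does not use the API of Amundsen, DataHub, or Collibra; `InMemoryCatalog` and `CatalogEntry` are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical catalog record for one data product."""
    product: str
    owner_domain: str
    tags: frozenset


class InMemoryCatalog:
    """Toy catalog: registers products, searches by tag, resolves owners."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Reject duplicates so each product has exactly one owning domain.
        if entry.product in self._entries:
            raise ValueError(f"duplicate product: {entry.product}")
        self._entries[entry.product] = entry

    def search(self, tag):
        """Return the names of all products carrying the given tag, sorted."""
        return sorted(e.product for e in self._entries.values() if tag in e.tags)

    def owner_of(self, product):
        """Return the domain that owns the product."""
        return self._entries[product].owner_domain


catalog = InMemoryCatalog()
catalog.register(CatalogEntry("orders_daily", "sales", frozenset({"orders", "daily"})))
catalog.register(CatalogEntry("shipments", "logistics", frozenset({"orders"})))

print(catalog.search("orders"))
print(catalog.owner_of("shipments"))
```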
✅ Design Patterns Supporting This Style
Data Mesh is closer to an organizational pattern than a technical one, but it relates to the following design patterns:
- Facade: Serves as the entry point to a Data API or data product.
- Mediator: The platform mediates between data products.
- Strategy: Lets each domain choose its own processing method and storage format.
- Iterator: Provides an abstraction for traversing distributed data across products.
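The patterns above can be sketched together in a few lines. In this illustrative example, each domain registers its own read function (Strategy), a single `MeshFacade` class acts as the entry point (Facade), and `read_all` walks every product through one uniform interface (Iterator). The class, function, and product names are assumptions made for the sketch, not part of any real platform.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

Row = Dict[str, object]
# Strategy: each domain supplies its own way of producing rows.
ReadStrategy = Callable[[], Iterable[Row]]


def read_sales() -> Iterable[Row]:
    # Stand-in for a domain that serves rows from a warehouse table.
    return [{"order_id": 1, "amount": 120}, {"order_id": 2, "amount": 80}]


def read_logistics() -> Iterable[Row]:
    # Stand-in for a domain that streams rows lazily.
    yield {"shipment_id": "s-1", "order_id": 1}


class MeshFacade:
    """Facade: one entry point that hides how each product is stored or read."""

    def __init__(self) -> None:
        self._readers: Dict[str, ReadStrategy] = {}

    def publish(self, product: str, reader: ReadStrategy) -> None:
        self._readers[product] = reader

    def read(self, product: str) -> Iterator[Row]:
        # Iterator: callers get the same interface for every product.
        return iter(self._readers[product]())

    def read_all(self) -> Iterator[Tuple[str, Row]]:
        # Traverse all products in name order, tagging each row with its source.
        for product in sorted(self._readers):
            for row in self.read(product):
                yield product, row


mesh = MeshFacade()
mesh.publish("sales.orders", read_sales)
mesh.publish("logistics.shipments", read_logistics)

for product, row in mesh.read_all():
    print(product, row)
```

Because consumers depend only on `read` and `read_all`, a domain could swap its storage format by publishing a different reader without affecting callers.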
✅ Summary
Data Mesh is an architectural philosophy that responds to the limits of centralized data infrastructure: domain teams own their data as products and cooperate in a distributed manner while aiming for organization-wide optimization.
It targets large organizations, and it requires data product management, governance, and SLO operation.