We have evolved our technologies to handle the scale of data in terms of: volume, velocity, variety, veracity (accuracy), and value.
Resources
- https://martinfowler.com/bliki/DataLake.html
- https://martinfowler.com/articles/data-mesh-principles.html#TheGreatDivideOfData
- https://martinfowler.com/articles/data-monolith-to-mesh.html
- https://www.youtube.com/watch?v=L_-fHo0ZkAo
2 Types of Data & Why an Enterprise Needs Them
Say we have an enterprise with:
- a group of databases that runs the business (i.e. operational data plane)
- a group of analysts to analyze business patterns to boost revenue (i.e. analytical data consumers)
The consumers may run their analyses directly against the operational data, as depicted below.
The problems with this approach are:
- a consumer need may leak into the operational data plane. For example, an analytical consumer wants to know how many times each user logged in, which requires adding a column to track the login count. The business itself does not need this column, and additional columns like it bloat the operational data plane.
- managing private data is difficult
The solution is to introduce a separate data plane solely used for analysis.
The 2 Types of Data
- operational data plane - sits in databases behind business capabilities, has a transactional nature, keeps the current state, and serves the needs of the applications running the business
- analytical data plane - is a temporal and aggregated view of the facts of the business over time, often modeled to provide retrospective or future-perspective insights; it trains the ML models or feeds the analytical reports
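To make the distinction concrete, here is a minimal sketch that reuses the login-count example from above. The names `UserRow`, `LoginEvent`, `record_login`, and `logins_per_user` are hypothetical and chosen only for illustration. The operational row is overwritten and keeps the current state, while the analytical side accumulates facts over time and is aggregated on demand.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class UserRow:
    """Operational data: one row per user, holding only the current state."""
    user_id: int
    email: str
    last_login: date


@dataclass(frozen=True)
class LoginEvent:
    """Analytical data: an immutable fact about something that happened."""
    user_id: int
    day: date


def record_login(users, events, user_id, day):
    """Overwrite the current state and append a fact to the history."""
    users[user_id].last_login = day          # operational: overwrite in place
    events.append(LoginEvent(user_id, day))  # analytical: accumulate over time


def logins_per_user(events):
    """Aggregate the history; no extra column is added to UserRow."""
    return Counter(e.user_id for e in events)


users = {
    1: UserRow(1, "a@example.com", date(2024, 1, 1)),
    2: UserRow(2, "b@example.com", date(2024, 1, 1)),
}
events = []
record_login(users, events, 1, date(2024, 1, 2))
record_login(users, events, 1, date(2024, 1, 3))
record_login(users, events, 2, date(2024, 1, 3))
counts = logins_per_user(events)
```

The login count lives entirely on the analytical side, so the operational schema stays as small as the business needs it to be.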
Analytical Data Plane Architecture Types
There are 3 main architectures to choose from for setting up an analytical data plane:
Data Warehouse - structured data
In a warehouse, stocked items are organized along aisles and shelves (Amazon’s “randomly” stocked warehouses being an exception). That is how a data warehouse sees and stores data.
A SINGLE schema is defined for a data warehouse. Choosing the right schema is important because it determines what kind of information the consumers are allowed to retrieve and analyze. In some cases, a single schema may not work for 2 conflicting consumer needs.
- ETL (Extract Transform Load) - transforms the operational data to fit the schema
- SQL (Structured Query Language) - used to query the data out of the data warehouse
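The ETL and SQL steps can be sketched as follows, using Python's built-in `sqlite3` module as a stand-in for a real warehouse. The two source shapes (`web_orders`, `store_orders`) and the `orders` schema are assumptions made up for this example, not part of any particular product.

```python
import sqlite3

# Hypothetical operational records from two sources with different shapes.
web_orders = [
    {"id": 1, "user": "ann", "total_cents": 1250},
    {"id": 2, "user": "bob", "total_cents": 400},
]
store_orders = [("S-7", "ann", "3.50")]  # (receipt, customer, amount in dollars)


def transform(web, store):
    """Transform: map both sources onto the single warehouse schema."""
    rows = [(f"web-{o['id']}", o["user"], o["total_cents"] / 100) for o in web]
    rows += [(f"store-{receipt}", customer, float(amount))
             for receipt, customer, amount in store]
    return rows


def load(conn, rows):
    """Load: the schema is fixed up front, and every row must fit it."""
    conn.execute(
        "CREATE TABLE orders ("
        "order_id TEXT PRIMARY KEY, customer TEXT NOT NULL, amount REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, transform(web_orders, store_orders))

# Consumers query the warehouse with plain SQL.
revenue = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
```

Anything the `orders` schema does not model (for example, which items were bought) is lost at transform time, which is the restriction discussed next.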
Data Warehouse Cons
- a single schema is restrictive, especially as the number of operational data sources and consumers increases:
  - some operational data may not fit the schema
  - some consumer needs may not be satisfied by the schema
Data Lake - unstructured data
In a lake there is NO structure; it just contains water. That is how a data lake sees and stores data.
No schema is defined for a data lake. This allows any kind of data to be stored in the data lake, so it can serve multiple consumers with different, even conflicting, needs.
- EL (Extract Load) - usually no transformation is needed since there is no schema. Hence, raw data is loaded into the data lake as-is
- ETL (Extract Transform Load) - each consumer has its own transformation step in order to make sense of the data in the data lake
- Lakeshore Data Marts/Warehouses - sometimes used when multiple consumers use similar transformations and/or analyze similar data
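The load-raw-then-transform-per-consumer flow can be sketched as below. The in-memory `lake` list, the two payload formats, and the two consumer functions are hypothetical; a real lake would sit on object or file storage, but the division of labor is the same.

```python
import json

# The "lake" here is just a list of raw payloads; nothing is validated on load.
lake = []


def load_raw(source, payload):
    """Extract + Load: store the payload untouched, tagged with its source."""
    lake.append({"source": source, "raw": payload})


load_raw("web", json.dumps({"user": "ann", "action": "login"}))
load_raw("web", json.dumps({"user": "ann", "action": "purchase", "cents": 1250}))
load_raw("pos", "bob;purchase;400")  # a different format entirely


def security_view(records):
    """Consumer A: only cares about logins from the web source."""
    users = []
    for record in records:
        if record["source"] == "web":
            event = json.loads(record["raw"])
            if event["action"] == "login":
                users.append(event["user"])
    return users


def revenue_view(records):
    """Consumer B: parses both formats to total purchases in cents."""
    total = 0
    for record in records:
        if record["source"] == "web":
            event = json.loads(record["raw"])
            if event["action"] == "purchase":
                total += event["cents"]
        elif record["source"] == "pos":
            _user, action, cents = record["raw"].split(";")
            if action == "purchase":
                total += int(cents)
    return total
```

Both consumers had to learn the raw formats themselves. If several consumers end up repeating the parsing in `revenue_view`, that shared transformation is a candidate for a lakeshore data mart.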
Data Lake Cons
- no schema means all the effort of making sense of the data lake falls on the consumers. As the number of operational data sources increases, the lake becomes more like a swamp.
Data Mesh - data as product
A data mesh is a response to the scalability issues of both data warehouses and data lakes.
The central idea behind a data mesh is data as a product.
Domain Data as Product
The idea is to break up the analytical data plane into cohesive data domains. For example, in an e-commerce enterprise we have: users data, items data, claims data, etc.
Then, for each domain, we construct a “product” out of that domain’s data.
Domain data products can be linked together. For example, an e-commerce analytical data plane may look like this:
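As one illustration in code (not a reproduction of the original diagram), here is a minimal sketch of linked domain data products. The `DataProduct` class, its `read` output port, and the sample records are all hypothetical; a real mesh would expose products over APIs, tables, or event streams owned by each domain team.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DataProduct:
    """A domain-owned dataset exposed through a read function (output port)."""
    domain: str
    read: Callable[[], list]
    upstream: list = field(default_factory=list)  # products this one consumes


users = DataProduct(
    "users",
    lambda: [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bob"}],
)
items = DataProduct("items", lambda: [{"item_id": 10, "title": "lamp"}])


def read_claims():
    """The claims domain enriches its own data via the other products' ports."""
    names = {u["user_id"]: u["name"] for u in users.read()}
    titles = {i["item_id"]: i["title"] for i in items.read()}
    raw_claims = [{"claim_id": 100, "user_id": 1, "item_id": 10}]
    return [
        {"claim_id": c["claim_id"],
         "user": names[c["user_id"]],
         "item": titles[c["item_id"]]}
        for c in raw_claims
    ]


claims = DataProduct("claims", read_claims, upstream=[users, items])
```

Each product only depends on the published output of its upstream products, never on their internal storage, which is what lets the domains evolve independently.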
Data Mesh Cons
- TODO
Usually, a single architecture is used for the entire enterprise.
Data Warehouse & Data Lake - Similarities
- both are online analytical processing (OLAP) databases that house analytical data
- both contain data that is ETLed from operational data, which is usually housed in online transactional processing (OLTP) databases
- both allow you to run analytics without the need to move your data to a separate analytics system
Data Warehouse & Data Lake - Tech Stacks
| Factors | On-Premise | Private Cloud | |
|---|---|---|---|
| Maintenance | hard | hard | easy |
| Monthly Cost | economic with large datasets | predictable | predictable |
| Vendor Lock-in | avoidable | avoidable | not avoidable |
| Suitability | for large corporations | for all businesses | ideal for startups |
| Investment | substantial in the beginning | increases as data grows | increases as data grows |
| Examples | | | |
Figures (images not recovered): operational data with analytical consumers; operational data and analytical data; data warehouse architecture; data lake architecture; data mesh architecture; data mesh component; data mesh component (stacked).