Data is not valuable - insights are
People often say:
DATA IS VALUABLE! YOU NEED MORE DATA!
But data by itself has no value. In fact, it is costly: collecting and storing it takes many resources. So what’s the point? The insights.
The insights drawn from the data are what provide the value. Going even further, it’s the actions taken on those insights that provide the real value.
So storing your data isn’t enough. You need to generate insights AND present them in a compelling way that prompts action.
Data marts contain atomic level information, not aggregation
A datamart always contains atomic-level information so that queries can drill all the way down. If a datamart is built on aggregated values, it can never provide information at the lowest level.
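A minimal sketch of why the atomic level matters. The sales rows and the `roll_up` helper are hypothetical and only for illustration: any aggregate can be derived from the atomic rows, but the atomic rows cannot be recovered from an aggregate.

```python
from collections import defaultdict

# Hypothetical atomic facts: one row per individual sale line.
atomic_sales = [
    {"date": "2024-01-01", "store": "A", "product": "widget", "amount": 10.0},
    {"date": "2024-01-01", "store": "A", "product": "gadget", "amount": 25.0},
    {"date": "2024-01-01", "store": "B", "product": "widget", "amount": 10.0},
    {"date": "2024-01-02", "store": "A", "product": "widget", "amount": 20.0},
]


def roll_up(rows, *keys):
    """Sum `amount` grouped by the given keys (hypothetical helper)."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row["amount"]
    return dict(totals)


# Both levels of detail come from the same atomic rows.
by_store = roll_up(atomic_sales, "store")
by_store_product = roll_up(atomic_sales, "store", "product")

print(by_store)          # {('A',): 55.0, ('B',): 10.0}
print(by_store_product)  # finer grain, still derivable
```

If only `by_store` had been stored, the question "how much of store A's total came from widgets?" could not be answered.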
Datamarts focus on the data source
A datamart is designed around the source of the data, not around how a particular department views it.
For instance, if HR views employees one way and finance views them another, we still have only one datamart dealing with employees. (This is an imperfect example, since employees are actually a dimension in a few datamarts.)
How is a datamart designed
A datamart is a set of dimensional tables supporting a business process.
This means that all of its data is geared towards one specific business process and the similar queries asked about that process.
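As a rough illustration, here is a tiny datamart for a hypothetical "sales" business process, built in an in-memory SQLite database. Reading "dimensional tables" as one fact table surrounded by dimension tables is an assumption, and every table and column name below is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one fact table plus the dimensions that describe it.
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_date TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    amount REAL
);
INSERT INTO dim_date VALUES (1, '2024-01-01'), (2, '2024-01-02');
INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 2, 25.0), (2, 1, 20.0);
""")

# A typical query for this process: revenue per product.
revenue_by_product = conn.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()

print(revenue_by_product)  # [('gadget', 25.0), ('widget', 30.0)]
```

Other questions about the same process (revenue per day, per product per day) reuse the same tables with a different join and grouping.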
Keep customers out of the DW Backroom
The backroom is ONLY for the ETL process: cleaning and preparing data from the raw systems and then passing it on to the presentation layer.
To this end, queries should NEVER be run in the backroom. If you start making exceptions and give end users or report developers direct access to the backroom, you have fatally compromised the data warehouse.
The mission of the Data Warehouse
The purpose of a data warehouse is to extract, clean, conform and then deliver the data to end users.
Think of it as a magazine publisher: a huge amount of work goes on in the background that the readers never see, yet it is the readers that the success of the magazine depends on.
In the same way, the success of a data warehouse depends on its end users.
References
- The Data Warehouse ETL Toolkit - pg 22, 23
The Problem with developing one off tables
Say you need a new table. You make said table. You even like said table, but you design it only for your own use. The problem is that sooner or later someone else is going to find that table, look at it, and realize that they want to use it too.
So now you have two people using said table: you, and a person who has no idea what the fuck the table actually represents. So be wary of creating tables like that. Sooner or later it will come back to haunt you.
References
- The Data Warehouse ETL Toolkit - pg 49
Think before you change that column
Because of the nature of the ETL life cycle, small changes can have significant impacts. Before you change a column’s type or the data it contains, or even add or remove columns, you must perform an impact analysis for the change.
This means finding everything that relies on the data and working out how it will be affected by the change. It also requires communication between the parties involved with the data, generally the ETL architect, the ETL team and the source system DBA.
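One way to picture the "find what relies on the data" step is a walk over a dependency map. The sketch below assumes such a lineage map already exists; the `DEPENDENTS` dictionary, the object names in it and the `impacted_by` function are all hypothetical.

```python
from collections import deque

# Hypothetical lineage: each key feeds the objects listed as its value.
DEPENDENTS = {
    "source.orders.order_total": ["staging.orders.order_total"],
    "staging.orders.order_total": ["mart.fact_sales.amount"],
    "mart.fact_sales.amount": ["report.daily_revenue", "report.store_ranking"],
}


def impacted_by(column, dependents=DEPENDENTS):
    """Return everything downstream of `column`, in breadth-first order."""
    seen = {column}
    queue = deque([column])
    order = []
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:  # guard against revisiting shared nodes
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


for item in impacted_by("source.orders.order_total"):
    print(item)
```

The resulting list is only a starting point for the conversation between the ETL architect, the ETL team and the source system DBA; it does not say how each object is affected.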
References
- The Data Warehouse ETL Toolkit - pg 49
What are the business needs ETL needs to think of
What the ETL needs to provide boils down to the information that people in decision-making roles need in order to make informed business decisions.
But business needs are not set in stone, so it is necessary to reassess them constantly and adjust the deliverables accordingly.
The only way to understand the business needs and how they change is to keep an open dialog with the end users.
Why do we need to data profile
Before we can use a data source, it is imperative that we first check the data for completeness. This means looking at each column we wish to use and checking it for things like missing values and inconsistent data. If the data is not profiled first, valuable time can be wasted further down the track. Data profiling protects the ETL team from unforeseen dirty data. In the words of Kimball:
Do the data profiling up front!
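A minimal sketch of what a first profiling pass might look like, assuming the source has already been read into a list of dictionaries. The `profile` function and the sample rows are hypothetical, and a real profile would check far more than this.

```python
def profile(rows, columns):
    """Count missing values, distinct values and value types per column."""
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat None and empty strings as missing (an assumption).
        present = [v for v in values if v not in (None, "")]
        report[col] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "types": sorted({type(v).__name__ for v in present}),
        }
    return report


# Hypothetical extract with typical problems: blanks, mixed case, mixed types.
rows = [
    {"id": 1, "country": "NZ", "age": 34},
    {"id": 2, "country": "nz", "age": None},
    {"id": 3, "country": "", "age": "41"},
]

report = profile(rows, ["id", "country", "age"])
for column, stats in report.items():
    print(column, stats)
```

Here `country` shows one missing value and two spellings of what is probably the same code, and `age` mixes integers with strings, which is exactly the kind of dirty data worth catching before the ETL is built.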
References
- The Data Warehouse ETL Toolkit - pg 5, 6