Data is not valuable - insights are

People often say that DATA IS VALUABLE! YOU NEED MORE DATA! But data on its own has no value. In fact, it is costly: collecting and storing it takes a lot of resources. So what’s the point?

The insights.

The insights from the data are what provide the value. Going even further, it’s the actions taken on those insights that provide the real value.

So storing your data isn’t enough. You need to generate insights, AND present them in a compelling way that causes action.

Data marts contain atomic level information, not aggregation

A datamart always contains atomic-level information, so that queries can drill all the way down. If a datamart is built on aggregated values, it will never be able to provide information at the lowest level.
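
To make the difference concrete, here is a minimal Python sketch using made-up sales rows. The field names and the roll_up helper are hypothetical and not taken from any particular tool. Starting from atomic rows, any level of aggregation can be produced on demand. Starting from stored totals, the detail is simply gone.

```python
from collections import defaultdict

# Hypothetical atomic-level sales rows: one row per individual transaction line.
atomic_sales = [
    {"date": "2024-01-01", "store": "North", "product": "Tea", "amount": 4.0},
    {"date": "2024-01-01", "store": "North", "product": "Coffee", "amount": 6.0},
    {"date": "2024-01-01", "store": "South", "product": "Tea", "amount": 5.0},
    {"date": "2024-01-02", "store": "South", "product": "Coffee", "amount": 7.0},
]


def roll_up(rows, *keys):
    """Aggregate atomic rows to whatever grain the query asks for (hypothetical helper)."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row["amount"]
    return dict(totals)


# From atomic data, any grain can be produced on demand.
by_store = roll_up(atomic_sales, "store")
by_store_product = roll_up(atomic_sales, "store", "product")

# If only the store-level totals had been stored, a question such as
# "how much tea did North sell?" could no longer be answered:
# {("North",): 10.0} gives no way to split the 10.0 back into products.
```

Here by_store is {("North",): 10.0, ("South",): 12.0}. The product-level breakdown is still available only because the atomic rows were kept.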

Datamarts focus on the data source

A datamart is organized around the source of the data, not around how a particular department views that data.

For instance, if HR viewed employees one way and finance viewed them another way, we would still have only one datamart dealing with employees. (This is an imperfect example, since employees are actually a dimension shared by several datamarts.)
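
One way to picture this is a single employee table shaped by the source system, with each department given its own view over it. This is a sketch only. The table, column and view names below are hypothetical, and using SQL views for the departmental perspectives is an illustrative choice rather than something these notes prescribe.

```python
import sqlite3

# One employee table, modeled on the source system (hypothetical columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " employee_id INTEGER PRIMARY KEY,"
    " name TEXT, department TEXT, hire_date TEXT, annual_salary REAL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ana", "Sales", "2019-03-01", 52000.0),
        (2, "Ben", "IT", "2021-07-15", 61000.0),
    ],
)

# Each department gets its own perspective as a view over the SAME table,
# rather than a separate HR datamart and a separate finance datamart.
conn.execute(
    "CREATE VIEW hr_employee AS "
    "SELECT employee_id, name, department, hire_date FROM employee"
)
conn.execute(
    "CREATE VIEW finance_employee AS "
    "SELECT employee_id, department, annual_salary FROM employee"
)

hr_rows = conn.execute("SELECT * FROM hr_employee ORDER BY employee_id").fetchall()
finance_total = conn.execute(
    "SELECT SUM(annual_salary) FROM finance_employee"
).fetchone()[0]
```

Both departments read the same underlying rows. A change in the source system is therefore absorbed in one place instead of in two diverging datamarts.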

How is a datamart designed

A datamart is a set of dimensional tables supporting a business process.

This means that all of its data is geared towards one specific business process and the similar queries that are asked about that process.
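
Here is a minimal star-schema sketch of this idea, assuming a hypothetical retail sales process. One fact table holds the measurements of the process, and dimension tables describe it. The table and column names are made up for illustration. The point is that the tables and the typical query are all shaped around the one process.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables describe the context of the business process.
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, day TEXT, month TEXT);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);

    -- The fact table records the measurements of the process (here: sales),
    -- one row per product sold per day, keyed to the dimensions.
    CREATE TABLE fact_sales (
        date_key INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity INTEGER,
        amount REAL
    );

    INSERT INTO dim_date VALUES (20240101, '2024-01-01', '2024-01'),
                                (20240102, '2024-01-02', '2024-01');
    INSERT INTO dim_product VALUES (1, 'Green tea', 'Tea'), (2, 'Espresso', 'Coffee');
    INSERT INTO fact_sales VALUES (20240101, 1, 2, 8.0),
                                  (20240101, 2, 1, 3.5),
                                  (20240102, 1, 3, 12.0);
    """
)

# A typical query for this process: join facts to dimensions, then group.
sales_by_category = conn.execute(
    """
    SELECT p.category, d.month, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category
    """
).fetchall()
```

The query returns one row per category and month: Coffee with 3.5 and Tea with 20.0 for 2024-01.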

Keep customers out of the DW Backroom

The backroom is ONLY for the ETL process: cleaning and preparing data from the raw source systems and then passing it on to the presentation layer.

For this reason, end-user queries should NEVER be run in the backroom. If you start making exceptions and give end users or report developers direct access to the backroom, you have fatally compromised the data warehouse.
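
The rule can be sketched as a simple access policy. This is a minimal Python illustration with hypothetical role and layer names. In practice you would enforce it with the database's own permission system, for example by granting access to the staging schema only to the ETL account, rather than with application code.

```python
# Hypothetical layer and role names; a real warehouse would enforce this with
# the database's own permission system rather than application code.
BACKROOM = "staging"
PRESENTATION = "presentation"

ALLOWED_LAYERS = {
    # ETL works in the backroom and loads the presentation layer.
    "etl_process": {BACKROOM, PRESENTATION},
    # Everyone else only ever sees the presentation layer.
    "report_developer": {PRESENTATION},
    "end_user": {PRESENTATION},
}


def can_query(role: str, layer: str) -> bool:
    """Return True only if the role is permitted to touch the given layer."""
    return layer in ALLOWED_LAYERS.get(role, set())
```

can_query("end_user", "staging") returns False, and unknown roles get no access at all. No exception list exists to grow over time.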

The mission of the Data Warehouse

The purpose of a data warehouse is to extract, clean, conform and then deliver the data to end users.

A data warehouse can be thought of as a magazine publisher: tons of work goes on behind the scenes that the readers never see, yet it is the readers that the success of the magazine depends on.

In the same way, a data warehouse relies on its end users: they are the ones who determine whether it succeeds.
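
The four steps of the mission (extract, clean, conform, deliver) can be sketched as a tiny pipeline. Everything here is hypothetical: two made-up source systems and a made-up country code list. Real ETL does far more in each step. The sketch only shows the order of the stages and what each one is responsible for.

```python
# Hypothetical raw rows from two source systems that spell things differently.
crm_rows = [{"cust": " Alice ", "country": "uk"}, {"cust": "", "country": "US"}]
billing_rows = [{"customer_name": "Bob", "country_code": "United States"}]

# Hypothetical conformed vocabulary shared by every source.
COUNTRY_CODES = {"uk": "GB", "us": "US", "united states": "US"}


def extract():
    """Pull raw rows from each source, renaming fields so later steps treat them alike."""
    for row in crm_rows:
        yield {"name": row["cust"], "country": row["country"]}
    for row in billing_rows:
        yield {"name": row["customer_name"], "country": row["country_code"]}


def clean(rows):
    """Trim whitespace and drop rows that lack a customer name."""
    for row in rows:
        name = row["name"].strip()
        if name:
            yield {**row, "name": name}


def conform(rows):
    """Map each source's country spelling onto the shared code list."""
    for row in rows:
        yield {**row, "country": COUNTRY_CODES.get(row["country"].lower(), "UNKNOWN")}


def deliver(rows):
    """Hand the finished rows to the presentation layer (here, just a list)."""
    return list(rows)


presentation_customers = deliver(conform(clean(extract())))
```

presentation_customers ends up as two customers with conformed country codes. The row with a blank name is dropped during cleaning, and "uk" and "United States" become "GB" and "US" during conforming.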

References

Kimball and Caserta, The Data Warehouse ETL Toolkit, pp. 22-23

The Problem with developing one off tables

Say you need a new table. You make said table. You even like said table, but you design it only for your own use. The problem is, sooner or later someone else is going to find that table, look at it, and realize that they want to use it too.

So now you have two people using said table: you, and the person who has no idea what the fuck the table actually represents. So be wary of creating tables like that. Sooner or later they will come back to haunt you.

Think before you change that column

Because of the nature of the ETL life cycle, even small changes can have significant impacts. Before you change a column’s type or the data it contains, or even add or remove columns, you must perform an impact analysis for the change.

This means finding everything that relies on the data and working out how it will be affected by the change. It requires communication between the parties involved with the data, generally the ETL architect, the ETL team, and the source system DBA.
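
Finding what relies on the data amounts to walking the lineage graph downstream from the changed column. Below is a minimal sketch, assuming lineage has already been captured as a hypothetical mapping from each object to the objects built directly from it. How you collect that lineage (tool metadata, query parsing, documentation) is left open.

```python
from collections import deque

# Hypothetical lineage: each key is a data object, each value lists the
# objects built directly from it.
LINEAGE = {
    "src.orders.amount": ["stg.orders.amount"],
    "stg.orders.amount": ["fact_sales.amount"],
    "fact_sales.amount": ["report.monthly_revenue", "mart.customer_value"],
    "mart.customer_value": ["report.top_customers"],
}


def downstream_impact(changed: str) -> list[str]:
    """Breadth-first walk returning everything that depends on `changed`."""
    seen, queue, impacted = {changed}, deque([changed]), []
    while queue:
        for child in LINEAGE.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


affected = downstream_impact("src.orders.amount")
```

For the example graph, a change to src.orders.amount flags the staging column, the fact column, the mart and both reports. Their owners are the people to talk to before making the change.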

What are the business needs ETL needs to think of

What the ETL system needs to provide boils down to the information that people in decision-making roles need in order to make informed business decisions.

But business needs are not set in stone, so it is necessary to reassess them continually and adjust the deliverables accordingly.

The only way to understand the business needs and how they change is to keep an open dialog with the end users.

Why do we need to data profile

Before we can use a data source, it is imperative that we first check the data for completeness. This means looking at each column we wish to use and checking it for problems such as missing values, inconsistent data, and so on. If the data is not profiled first, valuable time can be wasted further down the track. Profiling up front protects the ETL team from unforeseen dirty data. In the words of Kimball:

Do the data profiling up front!
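
A minimal profiling pass can be sketched in Python. The sample rows and column names are hypothetical. For each column, the function reports how many values are missing, how many are distinct, and which values are most common. That is enough to surface the kinds of problems described above.

```python
from collections import Counter

# Hypothetical sample from a source table, with typical dirty-data problems.
rows = [
    {"customer_id": 1, "state": "CA", "signup_date": "2023-01-05"},
    {"customer_id": 2, "state": None, "signup_date": "2023-02-11"},
    {"customer_id": 3, "state": "ca", "signup_date": "05/03/2023"},
    {"customer_id": 3, "state": "NY", "signup_date": None},
]


def profile_column(rows, column):
    """Summarize one column: missing values, distinct values, most common values."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "top_values": Counter(present).most_common(3),
    }


profile = {col: profile_column(rows, col) for col in rows[0]}
```

On this sample, customer_id has 3 distinct values across 4 rows, which reveals a duplicate ID. The state column counts "CA" and "ca" as different values. Each of state and signup_date is missing one value.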
