[repost from Stephen Forte's Blog]
Most developers are scared of “Business
Intelligence” or BI. Most think that BI consists of cubes, pivot/drill
down apps, and analytical decision support systems. While those are very
typical outcomes of a BI effort, many people forget about the first
step, the data warehouse.

Typically, this is what happens with a
BI effort. A system is built, usually a system that deals with
transactions. We call this an OLTP, or on-line transaction processing,
system. Some time passes and reports are bolted on and some business
analysts build some pivot tables from “raw dumps” of data. As the system
grows, reports start to slow since the system is optimized to deal with
one record at a time. Someone, usually a CTO type, says: “we need a BI
system.” A development effort is then spent to build a data warehouse
and cubes, and some kind of analytical system on top of those cubes.

I make the argument that developers and project planners should embrace
the data warehouse up front. When you design your OLTP system, also
design the supporting data warehouse, even if you have no intention of
building a full-fledged BI application with cubes and the like. This way
you have two distinct advantages. First is that you have a separate
system that is optimized for reporting. This system will allow the rapid
creation of many new reports as well as take the load off the OLTP system.
Second, when you do decide to build a BI system based on cubes, you
will already have the hard part done, building the data warehouse and

Since a data warehouse uses a flatter
data model (more on this in Part II), you can even design your
application to use both the OLTP and data warehouse as data sources. For
example, when you have highly normalized, 3rd normal form
transactional tables to support transactions, it is never easy to use
those tables for reporting and displaying of information. Those tables
are optimized and indexed to support retrieving and editing (or
adding/deleting) one record at a time. When you try to do things in
aggregate, you start to stress your system, since it was designed to
deal with one record at a time.
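
To make the contrast concrete, here is a minimal sketch that is not from the original post. It assumes a tiny order-entry schema with made-up table names and sample rows, and uses an in-memory SQLite database as a stand-in for a real server. The same "sales by region" report is written once against the normalized OLTP tables and once against a flat, warehouse-style table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- OLTP side (hypothetical schema): normalized, each fact stored once
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        order_date TEXT);
    CREATE TABLE order_lines (
        order_id INTEGER REFERENCES orders(order_id),
        product_id INTEGER REFERENCES products(product_id),
        quantity INTEGER,
        unit_price REAL);

    -- Warehouse side (hypothetical schema): one flat row per order line
    CREATE TABLE sales_flat (order_date TEXT, region TEXT, category TEXT, amount REAL);
""")

conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Ann", "East"), (2, "Bob", "West")])
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [(1, "Widget", "Hardware"), (2, "Manual", "Books")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1, "2024-01-05"), (2, 2, "2024-01-06")])
conn.executemany("INSERT INTO order_lines VALUES (?, ?, ?, ?)",
                 [(1, 1, 2, 10.0), (1, 2, 1, 5.0), (2, 1, 3, 10.0)])

# Flatten once, ahead of time, so reports do not pay for the joins.
conn.execute("""
    INSERT INTO sales_flat
    SELECT o.order_date, c.region, p.category, l.quantity * l.unit_price
    FROM order_lines AS l
    JOIN orders AS o ON o.order_id = l.order_id
    JOIN customers AS c ON c.customer_id = o.customer_id
    JOIN products AS p ON p.product_id = l.product_id
""")

# Report against the OLTP tables: joins are needed on every run.
oltp_report = conn.execute("""
    SELECT c.region, SUM(l.quantity * l.unit_price)
    FROM order_lines AS l
    JOIN orders AS o ON o.order_id = l.order_id
    JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()

# The same report against the flat table: a single-table aggregate.
warehouse_report = conn.execute("""
    SELECT region, SUM(amount)
    FROM sales_flat
    GROUP BY region
    ORDER BY region
""").fetchall()
```

Both queries return the same totals. The difference is where the join work happens: on every report run against the OLTP tables, or once when the flat table is loaded.
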

This design pattern is already
in use today at many places. Consider your credit card company for
example. I use American Express, and I never see my transactions show up
online for at least 24 hours. If I go buy something and then phone
American Express and ask “what was my last transaction?” they will tell
me right away. If I look online, I will not see that transaction until the next
business day. Why? When you call the customer service representative,
they are looking at the OLTP system, pulling up one record at a time.
When you are looking online, you are looking at the data warehouse, a
system optimized for viewing lots of data in a reporting environment.
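
The lag described above can be sketched in a few lines. This is only an illustration under stated assumptions: the two in-memory SQLite databases, the table names, and the `nightly_load` batch copy are all hypothetical, and nothing here reflects how any real card issuer moves its data.

```python
import sqlite3

oltp = sqlite3.connect(":memory:")       # hypothetical transactional store
warehouse = sqlite3.connect(":memory:")  # hypothetical reporting store

oltp.execute(
    "CREATE TABLE charges ("
    "charge_id INTEGER PRIMARY KEY, card TEXT, merchant TEXT, amount REAL)")
warehouse.execute(
    "CREATE TABLE statement_lines ("
    "charge_id INTEGER PRIMARY KEY, card TEXT, merchant TEXT, amount REAL)")


def record_charge(card, merchant, amount):
    """A purchase writes a single row to the OLTP system."""
    oltp.execute(
        "INSERT INTO charges (card, merchant, amount) VALUES (?, ?, ?)",
        (card, merchant, amount))


def last_charge(card):
    """What the phone representative sees: one record from the OLTP system."""
    return oltp.execute(
        "SELECT merchant, amount FROM charges "
        "WHERE card = ? ORDER BY charge_id DESC LIMIT 1",
        (card,)).fetchone()


def online_statement(card):
    """What the website shows: rows already loaded into the warehouse."""
    return warehouse.execute(
        "SELECT merchant, amount FROM statement_lines "
        "WHERE card = ? ORDER BY charge_id",
        (card,)).fetchall()


def nightly_load():
    """Assumed batch job: copy charges the warehouse has not seen yet."""
    loaded = warehouse.execute(
        "SELECT COALESCE(MAX(charge_id), 0) FROM statement_lines").fetchone()[0]
    rows = oltp.execute(
        "SELECT charge_id, card, merchant, amount FROM charges "
        "WHERE charge_id > ?", (loaded,)).fetchall()
    warehouse.executemany(
        "INSERT INTO statement_lines VALUES (?, ?, ?, ?)", rows)


record_charge("card-1", "Bookstore", 42.0)
seen_by_phone = last_charge("card-1")           # available immediately
seen_online_before = online_statement("card-1") # empty until the batch runs
nightly_load()
seen_online_after = online_statement("card-1")  # now includes the charge
```

The purchase is visible to the single-record lookup right away, while the reporting side only catches up after the batch job has run.
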

You can take this to an extreme: if you ran an e-commerce site, you could
power your product catalog view portion of the site with the data
warehouse and the purchase (inventory) system with the OLTP model.
Optimize the site for browsing (database reads) and at the same time
have super-fast e-commerce (database writes). Of course you have to keep
the purchasing/inventory (OLTP) and product display (data warehouse)
databases in sync. I’ll talk about that in Part III. Next, I will take a
look at how to build the data warehouse.