Data Warehousing





Facts and
Dimensions

Creating a data warehouse starts with understanding what your organization is trying to gain from building a data warehouse. This process should identify the facts and the dimensions that need to be managed. Tasks for engineering an effective, efficient data warehouse include:

  • Define the business rules and existing data sources.
  • Identify data elements in the data sources needed to provide the facts and dimensions.
  • Identify missing data elements, derived and summarized data elements.
  • Name the data warehouse data elements and associate with the aliases in the on-line systems.
  • Define data transformation, cleansing, and validation rules for each of the data elements to be included in the data warehouse.
  • Create the data model(s), normalize the tables, and identify referential constraint.
The goal of an efficient design is to maintain the lowest level of detail information that is possible without sacrificing usability. It is always possible to summarize detailed information to get speed. If summarized information is the only thing stored, then it will not be possible to return to the detail later or change the dimensions associated with the summarization.

Support tables are added to reduce development and maintenance costs. Aggregate entities are added on an individual case basis to improve speed and maintainability. This results in a star schema that provides a flexible structure.

The database is created from this model and the work of populating the data warehouse begins. This requires tools to capture the data from the on-line databases at regular intervals. This data is cleansed, translated, massaged and then placed into the data warehouse. How this is accomplished is dependent upon the requirements of the data being captured. It may be a simple program written specifically to handle the data or it may be an acquired commercial tool to handle the capture and cleansing.




Metadata

To be useful, the data contained in the warehouse needs to be accessible. This is where the real work occurs and value of the information comes into play. This may require custom applications, simple off-the-shelf query front-ends, OLAP tools, or multidimensional databases.

If you wish to integrate the metadata captured during the initial phase, then the selection of tools used during the design and in the data capture is very important. For the metadata to be useful, it needs to be stored in a common location and preferably in a common format. Tools such as Oracle's Designer 2000, LogicWorks ERWIN/ BPWIN can store the metadata about the models in a repository. Many tools, in the future, will provide this capability using the MS Repository. The information in a repository can be used to provide information on the source and timing of information stored within the data warehouse.


[Home]  [Profile]  [Solutions]  [Projects]  [Products]  [Inquire]

Twisted Pair Software, Inc.
St. Louis, MO
Voice:  (636)458-5740  Fax:  (314)716-7058

© 1997-2005 Twisted Pair Software, Inc.  All Rights Reserved.

Revised:  04/13/05