
|
Creating a data warehouse starts with understanding what your organization is trying to gain from building a data warehouse. This process should identify the facts and the dimensions that need to be managed. Tasks for engineering an effective, efficient data warehouse include:
|
| The goal of an efficient design is to maintain the lowest
level of detail information that is possible without sacrificing usability. It is always
possible to summarize detailed information to get speed. If summarized information is the
only thing stored, then it will not be possible to return to the detail later or change
the dimensions associated with the summarization. Support tables are added to reduce development and maintenance costs. Aggregate entities are added on an individual case basis to improve speed and maintainability. This results in a star schema that provides a flexible structure. The database is created from this model and the work of populating the data warehouse begins. This requires tools to capture the data from the on-line databases at regular intervals. This data is cleansed, translated, massaged and then placed into the data warehouse. How this is accomplished is dependent upon the requirements of the data being captured. It may be a simple program written specifically to handle the data or it may be an acquired commercial tool to handle the capture and cleansing. |
|
|
To be useful, the data contained in the warehouse needs to be
accessible. This is where the real work occurs and value of the information comes into
play. This may require custom applications, simple off-the-shelf query front-ends, OLAP
tools, or multidimensional databases. If you wish to integrate the metadata captured during the initial phase, then the selection of tools used during the design and in the data capture is very important. For the metadata to be useful, it needs to be stored in a common location and preferably in a common format. Tools such as Oracle's Designer 2000, LogicWorks ERWIN/ BPWIN can store the metadata about the models in a repository. Many tools, in the future, will provide this capability using the MS Repository. The information in a repository can be used to provide information on the source and timing of information stored within the data warehouse. |
Twisted Pair Software, Inc. © 1997-2005 Twisted Pair Software, Inc. All Rights Reserved. Revised: 04/13/05 |