We will create system catalog in the data warehouse. Each database has a list of tables. Each table has a list of attributes. Further, Each attribute has a list of constraints. For constraints, it's a good practice to define a default value for each constraint. Otherwise, once you want to merge multiple tables with null value for the attributes associated with constraints, we will need to turn off the constraint to make data merge. The common case would be people forget to turn on the constraints, which might be used to follow data integrity, again, causing a mess of data.
Numerous vendors alleged that they provide MDM (Master Data Management) tools to help companies extract valuable information from the Big data in the information system. Things that companies usually not know is that MDM is largely a human efforts, that is, what we are looking for is not about tools but services. Profiling is a large part of the MDM, extracting, transforming, and loading data.
The core idea of dimensional modeling is star schema with a few factors, primarily including fact and dimensions. From the graph below from Laynetworks, Facts is like the heart of the whole data and all dimension tables connect to fact, so the referential relationship is one to many between dimension and fact. The size of dimension tables are not always fixed, depending on different types and characteristics. Only time dimension table has stable table size since it only relates to the granularity of fact. The maximum size of fact table is product table times customer table times time dimension table, which is the best case that every customer buys every product every time unit.
![]() |
| Source: http://www.laynetworks.com/Diamension-Modeling5.htm |

No comments:
Post a Comment