Data Warehousing

  • Relational Data Warehousing & Conformed Data Marts.
  • Non-Relational Data Lake Warehousing.
  • High-Volume, High-Velocity, High-Variety Historical Data Capture.
  • Integrated Relational & Non-Relational Data Warehousing.

Successful decision making relies upon the availability of information that allows to accurately analyze past historical events in preparing future decisions. Unfortunately On-line transaction processing (OLTP) systems where the business data is initially captured are unsuitable for supporting such informational needs.

OLTP systems are designed for high frequency, low data volume, rapid transaction processing. They are optimized for minimal transaction processing times, which are achieved by only storing the data required to execute the transactions supported by the system.

In contrast historical reporting transactions are less frequent and executed over much greater data volumes. They are require accessibility to constantly stable snapshots of historical activity. Data warehousing addresses the historical business intelligence and analysis shortcomings of OLTP environments.

Data warehouses have traditionally been implemented as relational databases to store integrated, quality assured, ‘time-variant’, and ‘non-volatile’ data for supporting cross-functional enterprise-wide decision support activities. The modern non-relational data warehouse is architected to store ‘time-variant’, and ‘non-volatile’ data that may not necessarily be fully integrated or quality assured prior to being published to the consumer.

Relational data warehousing structures, conforms, and harmonizes operational and reference retrieved from disparate sources for capture in the centralized Enterprise Data Warehouse (EDW) database. ETL applications transformed source data in accordance with data quality and other data management standards, prior to loading the integrated content into the EDW.

The EDW becomes the business’s ‘single version of the truth’ for supporting cross-functional business intelligence and reporting informational services. Eliminating the delays, costs, and inaccuracies experienced in reconciling reports generated in silo.

Data warehousing has extended beyond the traditional centralized on-premises relational EDW. It now includes cloud hosted Data Lakes and ‘big data’ stores to cater for the massive volumes of high variety non-relational data required for modern analytics.

Data warehouses are designed to store ‘non-volatile’ and ‘time-variant’ snapshots of historical business activity. An essential necessity for supporting period-over-period business analysis and reporting.

Data captured within the data warehouse is timestamped upon retention. The timestamp identifies the time of the business activity for which the data is recorded. Once captured, the data remains unchanged over its duration in the data warehouse. Future instances of the business data are recorded in the data warehouse with their own unique timestamp to reflect the period of business activity it represents. Creating a ‘time lapsed’ historical reflection of business events.

Relational data warehouses are developed, implemented and maintained by information technology (IT) specialists. IT data architecture teams design the data warehouse’s integrated relational data structures, while application teams are responsible delivering the ETL solutions for sourcing, transforming, and capturing the data in the data warehouse.

​Relational EDWs are implemented for supporting enterprise-wide high quality, integral, and consistent business intelligence services. They include internal cross-functional operational, management, and executive reporting. The EDW has generally become the de-facto source for financial, legislative, regulatory, and other forms of mandated external business reporting. ​

Data Lakes and 'big data' stores provide alternative data warehousing opportunities for storing non-relational semi-structured, and unstructured data arriving at high-velocity, in high-volumes, and in varieties of forms. The data is initially captured in its native form upon arrival. It may optionally be curated prior to being published to the consumer. Unlike relational data warehouses, consumers can be allowed to access the data from either of the ‘landing’, ‘curated’, or ‘publication’ zones. The sourced data is transformed and integrated by the consumer at the time of reading (schema on read).

Technologies such Microsoft Polybase and AWS Athena allow for heterogeneous relational and non-relational data to be concurrently queried directly from the respective data management technology. Providing decision makers with a simple, cost-efficient means of gaining a 360 perspective of the business environment.

    © 2020 Data2Intelligence Inc.