Relational Data Warehousing & Conformed Data Marts.
Non-Relational Data Lake Warehousing.
High-Volume, High-Velocity, High-Variety Historical Data Capture.
Integrated Relational & Non-Relational Data Warehousing.
Successful decision making relies upon the ability to accurately review and analyze the results of past business operations. Decision makers require the ability to analyze historical trends, and review the results of their previous actions in order to accurately predict future judgement calls.
On-line Transaction Processing (OLTP) operational systems are architected to optimally process front-line business transactions. OLTP systems are designed to efficiently process large quantities of small data volume transactions. In contrast, business reporting generally involves processing large volumes of data over a fewer number of data processing requests. Making front-line OLTP systems are unsuitable for supporting business reporting needs. Data warehousing addresses the reporting challenge with OLTP systems.
Data warehouses are designed to store integrated, quality assured, ‘time-variant’, ‘non-volatile’ enterprise-wide cross-functional historical business activity snapshots.
Enterprise data warehousing operations are responsible for sourcing disparate operational and reference data, which is conformed and harmonized in accordance with the organization’s data quality and other data management standards prior to be integrated and captured in the data warehouse. This results in an enterprise level ‘single source of truth’ for supporting consensual cross-functional business intelligence reporting and analytics. Avoiding the delays, costs, and inaccuracies experienced in attempting to reconcile reports generated within business silos.
The ‘non-volatile’ and ‘time-variant’ characteristics of a data warehouse ensure the captured data remains intact over time. This is essential for supporting for period-over-period management reporting and analysis. Data for a particular business entity is tagged with a timestamp upon capture. It then remains unchanged over its life-time in the data warehouse. Further instances of the business entity are recorded as additional entries, each uniquely distinguished with its own ‘occurrence’ timestamp.
Data warehousing solutions range from traditional centralized enterprise relational databases, to more recent innovations for capturing non-relational data within ‘big data’ stores and cloud hosted data lakes.
Relational data warehouses are designed, constructed, and managed by the organization’s information technology (IT) departments. They are responsible for implementing the data structures (‘schema on write’) and developing the ETL processing to source, transform, capture, and disseminate data within warehouse. Data management controls assure the availability of integral and trusted data resources, ideal for supporting corporate reporting needs with minimal error tolerance, and susceptibility to governing body audits. The central data warehouse repository is further extended with satellite conformed data marts that are optimally tuned for end-user querying.
Modern data warehousing has extended beyond relational databases to include non-relational cloud hosted data lakes and 'big data' stores. Non-relational data warehouses are designed to accommodate data arriving at high-velocity, in high-volumes, and in varying forms of structured, semi-structured, and unstructured data types. The data is captured in its native form, curated, and published for consumer to structure the data according to their needs (‘schema on read').
Unlike relational data warehousing where consumers are constrained to accessing the data in the state published by the IT custodians, authorized consumers are able to access data from a non-relational data warehouse in either its raw or prepared state. Business users have the flexibility to access the data from the landing, curation, and publication zones of a non-relational data warehouse. This enables business consumers with greater autonomy and agility over the provisioning of data within a self-serve business intelligence environment.
Innovative technologies such Microsoft Polybase and AWS Athena allow for relational and non-relational data warehousing environments to be simultaneously accessed and queried with SQL like constructs. They provide a simplified, cost-efficient means of provisioning data from heterogeneous data warehousing technologies to provide decision makers with an informed 360 perspective of their business environment.