Cloud & On-Premises Relational 'ETL' Data Integration.
On-Demand Data Lake 'ELT' Data Integration.
Message-Based Services Data Integration.
In-Memory Analytics Data Integration.
Data integration is the process of combining exchanged data from different sources into a holistic data set to facilitate transaction processing and decision support operations.
Data integration unifies disparate data accessed from internal and external sources into holistic data sets. Business operations ranging from front-line transaction processing to modern machine learning analytics rely on integrated data to carry out their work.
There are various techniques for integrating data. Selecting one over another depends on the data processing conditions and constraints of the application that needs to be supported, including whether the data resides locally or must be sourced from an external location. The latency tolerance of the business operation being supported is always an important consideration in selecting the appropriate data integration method: the techniques for implementing high-latency, batch-based solutions are very different from those used to deliver low-latency and near-real-time data processing applications.
Integration across two or more data sources is done by identifying the common data elements that exist in each of the candidate data sets. These shared data elements must be syntactically and semantically consistent for the data sets to be integrated successfully. When the data sets cannot be readily integrated, the candidate data elements have to be transformed before integration. Data integration techniques and technologies differ for batch, low-latency, and near-real-time data processing operations.
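To make the idea of shared data elements concrete, the following is a minimal sketch, assuming two hypothetical sources: a CRM that keys customers as strings such as "C-0042" and a billing system that uses the bare integer 42. The key is syntactically inconsistent and the country names are semantically inconsistent, so both are transformed before the data sets are joined. All record layouts, field names, and the `COUNTRY_CODES` mapping are illustrative assumptions, not part of any particular product.

```python
# Hypothetical source records: the CRM keys customers as "C-0042",
# while the billing system uses the bare integer 42.
crm = [
    {"customer_id": "C-0042", "name": "Acme Corp", "country": "USA"},
    {"customer_id": "C-0007", "name": "Globex", "country": "United Kingdom"},
]
billing = [
    {"cust_no": 42, "balance": 1250.0},
    {"cust_no": 7, "balance": 80.5},
]

# Hypothetical semantic mapping: both systems mean the same countries
# but name them differently.
COUNTRY_CODES = {"USA": "US", "United Kingdom": "GB"}


def normalize_crm(rec):
    """Transform a CRM record so its join key and country match the integrated schema."""
    return {
        "customer_id": int(rec["customer_id"].removeprefix("C-")),  # syntactic fix
        "name": rec["name"],
        "country": COUNTRY_CODES.get(rec["country"], rec["country"]),  # semantic fix
    }


def integrate(crm_rows, billing_rows):
    """Inner-join the two sources on the normalized customer identifier."""
    balances = {row["cust_no"]: row["balance"] for row in billing_rows}
    merged = []
    for rec in map(normalize_crm, crm_rows):
        if rec["customer_id"] in balances:
            merged.append({**rec, "balance": balances[rec["customer_id"]]})
    return merged
```

Calling `integrate(crm, billing)` yields one combined record per customer. The sketch uses `str.removeprefix`, which requires Python 3.9 or later.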
High-latency, batch-based data integration is implemented as either an ETL (Extract-Transform-Load) or an ELT (Extract-Load-Transform) solution.
ETL integration solutions are implemented to populate relational data warehouses and data marts that support reporting and other managed business intelligence services. ETL technologies such as SQL Server Integration Services (SSIS) can integrate data sourced from on-premises and cloud-hosted data stores simultaneously. The resulting integrated data is captured in enterprise data warehouses, which can reside on-premises or in the cloud. ETL data integration activities are developed, implemented, and managed by information technology specialists.
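The ETL ordering (transform first, load only conformed data) can be sketched in a few lines. This is not how SSIS works internally; it is a conceptual illustration assuming two hypothetical sources, an on-premises CSV export and a cloud JSON payload, with Python's built-in `sqlite3` standing in for the relational warehouse. The table name `fact_orders` and all field names are illustrative assumptions.

```python
import csv
import io
import json
import sqlite3

# Hypothetical stand-ins for two sources: an on-premises CSV export
# (amounts in dollars) and a cloud JSON payload (amounts in cents).
ON_PREM_CSV = "order_id,amount_usd\n1,19.99\n2,5.00\n"
CLOUD_JSON = '[{"orderId": 3, "amountCents": 1250}]'


def extract():
    """Extract raw rows from both sources."""
    csv_rows = list(csv.DictReader(io.StringIO(ON_PREM_CSV)))
    json_rows = json.loads(CLOUD_JSON)
    return csv_rows, json_rows


def transform(csv_rows, json_rows):
    """Conform both sources to one schema (order_id, amount in dollars) before loading."""
    rows = [(int(r["order_id"]), float(r["amount_usd"])) for r in csv_rows]
    rows += [(r["orderId"], r["amountCents"] / 100) for r in json_rows]
    return rows


def load(conn, rows):
    """Load the already-integrated rows into the warehouse fact table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders (order_id INTEGER PRIMARY KEY, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run the pipeline in ETL order: extract, then transform, then load."""
    load(conn, transform(*extract()))
```

Usage: `conn = sqlite3.connect(":memory:")`, then `run_etl(conn)`. The warehouse only ever sees rows that already share one schema, which is the property that distinguishes ETL from ELT.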
ELT data integration is implemented within non-relational ‘big data’ and data lake stores. Here, the sourced data is integrated only after it has been landed in the store. The landed data is captured in its native form and integrated when the consumer queries it. Unlike ETL, where integration is the responsibility of IT specialists, ELT data integration is managed by the consumer at the time of accessing the data.
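By contrast, an ELT flow lands records untouched and defers reconciliation to read time (often called schema-on-read). The sketch below assumes a hypothetical in-memory `DataLake` class in place of a real data lake, and hypothetical sources that disagree on field names ("region" vs "sales_region", "amount" vs "amt"). The consumer's query function, not the ingestion step, performs the transformation.

```python
import json


class DataLake:
    """Hypothetical raw landing zone: records are stored exactly as received."""

    def __init__(self) -> None:
        self.raw_records: list[str] = []

    def land(self, raw_record: str) -> None:
        """Extract-Load: persist the record in its native form, with no schema applied."""
        self.raw_records.append(raw_record)


def total_by_region(lake: DataLake) -> dict[str, float]:
    """Transform at query time (schema-on-read), performed by the consumer.

    The hypothetical sources use different field names, and the consumer
    reconciles them only when reading, not when the data is landed.
    """
    totals: dict[str, float] = {}
    for raw in lake.raw_records:
        rec = json.loads(raw)
        region = rec.get("region") or rec.get("sales_region", "unknown")
        amount = float(rec.get("amount", rec.get("amt", 0)))
        totals[region] = totals.get(region, 0.0) + amount
    return totals
```

A different consumer could write a different query over the same raw records, which is the flexibility (and the responsibility) that ELT shifts to the consumer.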
Low-latency data integration solutions are deployed to support high-throughput, small-data-volume Service-Oriented Architecture (SOA), REST web service, and microservice Enterprise Application Integration (EAI) data processing solutions. The data is read, transformed, and integrated in either XML or JSON format. These solutions are deployed with technologies such as the cloud-hosted Azure Service Bus, which orchestrates the sourcing, transformation, integration, and delivery of message-based data.
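The message-level transformation step can be illustrated independently of any broker. The following minimal sketch assumes two hypothetical order message formats, one XML and one JSON, and normalizes both into a single canonical JSON document. It does not use the Azure Service Bus SDK; in a real deployment, the broker would deliver `payload` and `content_type`, and the element and field names here are illustrative assumptions.

```python
import json
import xml.etree.ElementTree as ET


def from_xml(payload: str) -> dict:
    """Parse a hypothetical XML order message into a canonical dict."""
    root = ET.fromstring(payload)
    return {
        "order_id": int(root.findtext("Id")),
        "sku": root.findtext("Sku"),
        "quantity": int(root.findtext("Qty")),
    }


def from_json(payload: str) -> dict:
    """Parse a hypothetical JSON order message into the same canonical dict."""
    msg = json.loads(payload)
    return {
        "order_id": msg["id"],
        "sku": msg["item"]["sku"],
        "quantity": msg["item"]["qty"],
    }


def integrate_message(payload: str, content_type: str) -> str:
    """Pick a parser by content type, normalize, and emit the integrated message as JSON."""
    parsers = {"application/xml": from_xml, "application/json": from_json}
    canonical = parsers[content_type](payload)
    return json.dumps(canonical, sort_keys=True)
```

Because each message is small and handled individually, this style of integration suits the high-throughput, low-latency pattern described above rather than bulk batch loads.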
Sub-second, real-time data processing is implemented by caching and integrating data in memory as it arrives from event-based and other message-generating sources. The elasticity of the cloud makes it well suited to memory-intensive data integration processing, using technologies such as Azure Stream Analytics and Apache Storm.
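The in-memory pattern can be sketched as a generator that enriches each arriving event with cached reference data and aggregates over fixed, non-overlapping (tumbling) time windows. This is a conceptual illustration in plain Python, not the Azure Stream Analytics or Apache Storm API. It assumes events arrive in timestamp order as `(timestamp_seconds, device_id, value)` tuples, and the `DEVICE_SITE` mapping is hypothetical.

```python
from collections import defaultdict

# Hypothetical reference data, cached in memory for fast enrichment.
DEVICE_SITE = {"d1": "plant-a", "d2": "plant-a", "d3": "plant-b"}


def tumbling_window_totals(events, window_seconds=10):
    """Enrich events with cached reference data and total them per tumbling window.

    events: iterable of (timestamp_seconds, device_id, value), assumed to arrive
    in timestamp order. Yields (window_start, {site: total}) as each window closes.
    """
    current_start = None
    totals = defaultdict(float)
    for ts, device, value in events:
        start = ts - (ts % window_seconds)
        if current_start is not None and start != current_start:
            yield current_start, dict(totals)  # window closed: emit a copy, then reset
            totals.clear()
        current_start = start
        site = DEVICE_SITE.get(device, "unknown")  # in-memory enrichment join
        totals[site] += value
    if current_start is not None:
        yield current_start, dict(totals)  # flush the final open window
```

All state (the reference cache and the open window) lives in memory, which is what keeps latency low. The ordering assumption is a simplification; production stream processors also handle late and out-of-order events.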