Ingest to AWS

The Galileo Elastic Enrichment Cloud ingests large data sets from instances of the HIEBus OLTP transactional data model implemented in disparate SQL Server deployments.

SQL Server Integration Services (SSIS) is used to create file-based exports of HIEBus careevolutiondata database tables on the file system of the HIEBus SQL Server. The extracts are scheduled and initiated with SQL Server Agent. These extracts are optimized for high performance and minimal load on the HIEBus OLTP database. In order to retain these “high performance” and “least possible load” characteristics, these exports perform no data transformations (including column data type conversions) or filtering. The only exception is a query-style filter on “modifieddate” which is used in differential operations to acquire the small subset of data since the last export. These export files provide the critical link between HIEBus SQL Server and AWS.

After file creation these export files are uploaded to S3 in order to make them accessible to AWS Services.

The export files are maintained in an S3 folder structure that serves as a processing queue for ingest into a Redshift schema dedicated to replicating the HIEBus OLTP data structure in a partitioned MPP architecture. During the ingest process, these files are first copied to a temporary “ingest” schema in Redshift that exactly represents the structure and contents of the file. Then, the ingest process determines statistics about the ingest data and merges this data from the “ingest” schema into a Redshift table that is designed to mirror the exact structure and content of the source HIEBus table.