
A new data processing engine, deployed directly in HDFS, processes a user's request to publish files straight into memory. This empowers business users to take advantage of raw data in ways they never could before. The real performance value of this engine comes from transferring data back to the Strategy Intelligence server in parallel, without using an ODBC or JDBC driver, which dramatically speeds up cube publication. The engine can currently be deployed on Cloudera and Hortonworks, and we are planning to add support for MapR.
As you all are aware, ODBC drivers have limitations that are only exacerbated when dealing with Big Data.
Limitation #1: ODBC/JDBC drivers require you to create tables.
The first limitation many customers hit is that they have to load the data into tables before users can touch it. This can seem counterproductive: a company has to take the time to organize its data into tables when many new Big Data technologies are designed to get data into the hands of users as quickly as possible. The problem is only magnified by the fact that many Big Data cluster admins are too busy to take on that work.
Limitation #2: ODBC Bottleneck
Loading large amounts of data into memory can create an ODBC bottleneck: too much data trying to squeeze from Hadoop into Strategy through a single, thin ODBC driver thread.
The Hadoop Gateway directly responds to these limitations.
Hadoop Gateway enables users to directly access HDFS files
Is your Big Data cluster admin too busy or unavailable to load data into Hive? The Hadoop Gateway empowers business users to browse, preview, and ultimately publish raw files from HDFS into memory. You might think, “Great, but what can I do with raw files? They aren't ready for consumption.” The Hadoop Gateway streamlines the data wrangling process by pushing much of the wrangling functionality down to Hadoop.
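To make "browse and preview raw files" concrete: under the hood, this kind of access maps onto standard HDFS filesystem operations. As an illustration only (the path `/data/raw/sales` is a made-up example, and the Gateway does not require users to run these commands themselves), the equivalent Hadoop CLI calls look like this:

```shell
# List the raw files sitting in an HDFS directory (example path).
hdfs dfs -ls /data/raw/sales

# Preview the first few rows of a raw file before deciding to publish it.
hdfs dfs -cat /data/raw/sales/part-00000.csv | head -n 5
```

The point is that no Hive table or schema definition is needed first; the files are usable as they land in HDFS.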
Hadoop Gateway transfers data in parallel without the need for ODBC/JDBC driver
The Hadoop Gateway specifically optimizes cube publication performance by reducing data fetch and transfer time. Jobs run in parallel on Hadoop, and their results are transferred back to the Intelligence server, also in parallel, via TCP.
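The performance idea can be sketched in a few lines. This is not the Gateway's actual API, just a minimal Python illustration of the concept: a single ODBC-style thread drains partitions one at a time, while the Gateway-style approach streams every partition concurrently and assembles the pieces on the server side.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in data: each Hadoop worker holds one partition
# of the result set to be published into memory.
partitions = [list(range(i * 4, i * 4 + 4)) for i in range(8)]

def transfer(partition):
    # Stand-in for streaming one partition over its own TCP connection.
    return list(partition)

# ODBC-style: one thin driver thread fetches partitions sequentially.
serial_result = []
for p in partitions:
    serial_result.extend(transfer(p))

# Gateway-style: all partitions stream concurrently, then the chunks
# are reassembled in order on the receiving side.
with ThreadPoolExecutor(max_workers=8) as pool:
    chunks = list(pool.map(transfer, partitions))
parallel_result = [row for chunk in chunks for row in chunk]

assert parallel_result == serial_result  # same data, fetched in parallel
```

With real network transfers, the serial path's total time is the sum of all partition transfers, while the parallel path's is roughly the slowest single partition, which is where the cube publication speedup comes from.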
Customer Success Stories
We have already been successful deploying the Hadoop Gateway at some of the largest companies in the world.
Watch this video for more detailed information on the architecture and deployment steps. If you have any questions as you work to deploy the Hadoop Gateway, send me an email at dharsh@Strategy.com