Dremio’s latest release aims to make data bread lines obsolete

Virginia Backaitis
Published in Digitizing Polaris
3 min read · Apr 25, 2018


Sure, data scientists and data analysts are rare, but the data engineers who serve up the information they need to glean insights are in even shorter supply.

Kelly Stirman, CMO of self-service data platform maker Dremio, estimates that for every 100 data workers who need access to big data stores, there is only one person to provide it. And the job isn’t always simple. It might include moving information from data lakes to data warehouses to data cubes before BI and predictive tools can even be applied.

That’s a problem, not only for data workers but everyone up the pipeline, including the C-Suite. They are all starving for data-driven, game-changing insights.

And finding a way to hire more engineers isn’t the solution. “Providing self-service data access is,” according to Stirman. That is what Dremio was created for.

Run by some of the most experienced big data professionals and open source innovators, Dremio came out of stealth last year positioning itself as the “missing link in data lakes.” From a high-level perspective, this means that with Dremio, data workers can access, work with and glean insights from data regardless of where it lives, at the speed of a high-performance database.

To accomplish this, Dremio builds on the open source projects Apache Arrow, Apache Calcite and Apache Parquet across four areas of innovation: an execution engine built on Apache Arrow, which boosts query speeds by up to 1,000x; patented Dremio Reflections, which dramatically speed up queries without replicating data; Native Push-Downs, which optimize query semantics for each data source (relational, NoSQL, HDFS, Amazon S3 and so on); and Universal Relational Algebra, a cost-based query planner that automatically substitutes plans to make optimal use of Dremio Reflections.
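To make the Arrow piece concrete, here is a minimal sketch using the open source pyarrow library. It is not Dremio’s execution engine, just an illustration of the columnar, vectorized format that Arrow defines and that engines like Dremio’s build on.

```python
# Illustration only: a toy look at Apache Arrow's columnar format via the
# pyarrow library. This is not Dremio's engine; it just shows why columnar,
# vectorized data helps analytical queries.
import pyarrow as pa
import pyarrow.compute as pc

# Build an in-memory Arrow table: each column's values are stored
# contiguously, so scanning one column never touches the others.
table = pa.table({
    "region": ["east", "west", "east", "south"],
    "revenue": [1200.0, 950.0, 430.0, 2210.0],
})

# Vectorized aggregation over a single column; the whole column is
# processed as one contiguous buffer rather than row by row.
total = pc.sum(table["revenue"])
print(total.as_py())  # 4790.0
```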

Dremio’s latest release, Dremio 2.0, includes:

Starflake Data Reflections: Dremio can now automatically discover star or snowflake schemas in Hadoop data lakes or Amazon S3 and return results at interactive speeds, regardless of data size. In other words, there is no need to move data to a high-performance database like Redshift, Teradata or Vertica in order to use tools like Tableau, Power BI, or Python.

What is the net effect, aside from an improved user experience? According to Stirman, Dremio is less expensive to run and simpler to administer.
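As a rough sketch of what that looks like in practice, the snippet below submits an ordinary star-schema query to Dremio over ODBC. The DSN name and the dataset names (sales.fact_orders, sales.dim_customer) are hypothetical placeholders; the point is that the query runs against the lake as-is, with a reflection doing the acceleration behind the scenes.

```python
# Sketch: querying a star schema through Dremio over ODBC. The DSN and
# dataset names are hypothetical; Dremio exposes a standard SQL interface,
# so a star-style join like this can be served from a reflection without
# first copying data into a warehouse.
import pyodbc

conn = pyodbc.connect("DSN=Dremio", autocommit=True)
cursor = conn.cursor()

# A typical star-schema query: fact table joined to a dimension,
# aggregated for a BI-style report.
cursor.execute("""
    SELECT c.segment, SUM(f.amount) AS total_sales
    FROM sales.fact_orders f
    JOIN sales.dim_customer c ON f.customer_id = c.id
    GROUP BY c.segment
""")
for segment, total_sales in cursor.fetchall():
    print(segment, total_sales)
conn.close()
```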

Dremio Learning Agent: Dremio 2.0 includes a learning engine that observes user behavior and then recommends data sets based on prior interactions. Stirman likens it to the shopping experience on Amazon, which makes recommendations based on what other consumers put in their carts. The logic goes something like this: on Amazon, “people who bought flashlights also bought batteries”; on Dremio, “people who used dataset B also used dataset C.”
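The underlying logic is simple enough to sketch. The toy co-occurrence counter below is not Dremio’s learning engine, only an illustration of the “used B, also used C” idea, with made-up dataset names.

```python
# Toy illustration of the "people who used dataset B also used dataset C"
# logic described above. Not Dremio's learning engine, just the
# co-occurrence idea behind that style of recommendation.
from collections import Counter

# Hypothetical usage history: which datasets each user has queried.
sessions = [
    {"orders", "customers"},
    {"orders", "customers", "shipments"},
    {"customers", "shipments"},
]

def recommend(dataset, sessions, top_n=3):
    """Rank datasets that most often co-occur with `dataset`."""
    co_counts = Counter()
    for used in sessions:
        if dataset in used:
            co_counts.update(used - {dataset})
    return [name for name, _ in co_counts.most_common(top_n)]

print(recommend("orders", sessions))  # ['customers', 'shipments']
```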

Dremio supports Looker

Looker customers can now use Dremio to analyze data directly from relational databases, NoSQL data stores such as MongoDB and Elasticsearch, and data lakes built on Hadoop, Amazon S3, and Azure Data Lake Store. (It’s worth noting that this capability was already available for Tableau, Qlik and Power BI.)

Fine-Grained Access Control for GDPR

With GDPR enforcement only a few weeks away, Dremio has introduced fine-grained access controls at the row and column level, making it easier for companies to stay in compliance.
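As an illustration of the pattern, the sketch below creates a restricted virtual dataset over a hypothetical customer table, masking a personal-data column and filtering rows per user. Dremio documents context functions such as query_user() for rules of this kind; the schema, column, and dataset names here are invented.

```python
# Sketch: row- and column-level restriction expressed as a Dremio virtual
# dataset, submitted over ODBC. All names below are hypothetical; the
# query_user() context function is the Dremio-documented hook for
# per-user row filtering.
import pyodbc

conn = pyodbc.connect("DSN=Dremio", autocommit=True)
cursor = conn.cursor()

# The virtual dataset hides a personal-data column and filters rows so
# each account manager sees only their own customers.
cursor.execute("""
    CREATE VDS sales.customers_restricted AS
    SELECT
        id,
        region,
        'REDACTED' AS email               -- column-level masking
    FROM sales.customers
    WHERE account_manager = query_user()  -- row-level filter
""")
conn.close()
```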
