What is Google's Dremel? How is it different from MapReduce?

Asked by EdythFerrill in Data Science on Jul 12, 2021

Google's Dremel is described here. What's the difference between Dremel and MapReduce?

Answered by Dan Peters

Dremel:

Dremel is a distributed system developed at Google for interactively querying massive datasets (such as event or log files). It is a data analysis tool designed to run queries on large structured datasets.

Dremel is the query engine used in Google's BigQuery service.

It is the inspiration for Apache Impala, Apache Drill, and Dremio, an Apache-licensed platform that includes a distributed SQL execution engine.

MapReduce:

MapReduce is not designed specifically for analyzing data. It is a software framework that allows a collection of nodes to tackle distributed computational problems over large datasets.

Hadoop, an open-source implementation of MapReduce, combined with Hive (data warehouse software), also allows data analysis of massive datasets using a SQL-style syntax. Hive essentially turns queries into MapReduce functions. Rather than using a ColumnIO format, Hive uses techniques such as table indexing to make queries fast.
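To make the idea of turning a query into MapReduce functions more concrete, here is a minimal Python sketch of how a SQL-style aggregate such as `SELECT country, COUNT(*) FROM logs GROUP BY country` could be written as a map step and a reduce step. This is a toy, single-process illustration and not Hive's compiler or the Hadoop API; the names `map_fn`, `reduce_fn`, `run_mapreduce`, and the sample `logs` data are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data standing in for a log table.
logs = [
    {"user": "a", "country": "US"},
    {"user": "b", "country": "DE"},
    {"user": "c", "country": "US"},
]

def map_fn(record):
    """Map phase: emit one (key, value) pair per input record."""
    yield record["country"], 1

def reduce_fn(key, values):
    """Reduce phase: combine all values that share a key."""
    return key, sum(values)

def run_mapreduce(records, mapper, reducer):
    """Toy driver: map, shuffle (group values by key), then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())

counts = run_mapreduce(logs, map_fn, reduce_fn)
print(counts)  # {'US': 2, 'DE': 1}
```

In a real cluster the map and reduce calls would run on many nodes and the shuffle would move data over the network; the sketch only shows the shape of the computation.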

Note: Google does not intend Dremel as a replacement for MapReduce and Hadoop; it treats Dremel as a complement to these frameworks. According to Google, Dremel is frequently used to analyze MapReduce results or to serve as a test run for large-scale computations. Dremel can execute many queries over such data that would ordinarily require a sequence of MapReduce jobs, but at a fraction of the execution time. In experiments, Dremel outperformed MapReduce by orders of magnitude.
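Part of this speed-up is usually attributed to column-oriented storage, the kind of layout the columnar format mentioned in the MapReduce section refers to. The Python sketch below is only an illustration under simplifying assumptions: it uses flat records with made-up field names (`url`, `latency_ms`, `status`) and does not model Dremel's actual on-disk format or its handling of nested data.

```python
# Hypothetical records; the field names are made up for illustration.
rows = [
    {"url": "/home", "latency_ms": 120, "status": 200},
    {"url": "/cart", "latency_ms": 340, "status": 500},
    {"url": "/home", "latency_ms": 95, "status": 200},
]

# Row-oriented scan: every full record is visited, even though the
# query only needs a single field.
row_total = sum(row["latency_ms"] for row in rows)

# Column-oriented layout: each field is stored as its own list.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Column-oriented scan: only the "latency_ms" column is read.
col_total = sum(columns["latency_ms"])

print(row_total, col_total)  # 555 555
```

Both scans give the same answer; the difference is how much data has to be read. With many wide records on disk, touching one column instead of every record is what makes an aggregate over a single field cheap.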


