Apache Drill User

Keeping track of Apache Drill. From geeks, for geeks.


Chime In on Drill

How would you use Drill?

What questions or comments do you have about the design of Drill? 

What are your thoughts or suggestions for the Drill community?

The Bay Area Apache Drill User Group is going to meet in San Jose, California, next Monday, 24 February, at 6pm, and wherever you may live, we want to hear from you.

Please tweet your ideas or comments using the hashtag #drilltalk by Monday evening Pacific time (you can follow Drill on Twitter as @ApacheDrill).  Or add a comment or question here.

We’ll select several comments and questions and feed them into the discussion on Monday night. Look for a video of the evening after the event to see if your input is included. The link to the meet-up site is:

http://bit.ly/1gB2E6p

And to get you thinking about how you’d use Drill, I recently asked Michael Hausenblas (MapR Chief Data Engineer and Drill contributor) for his thoughts on what Drill will do:

“Apache Drill allows business analysts to query heterogeneous data sources at scale, in a time-efficient and familiar way.

  • Heterogeneous data sources … no matter if the data resides in existing relational databases (such as Oracle DB, MySQL, etc.), in a NoSQL database such as MongoDB, or is available as Apache Hadoop-native, that is, in HDFS, MapR-FS or HBase, Apache Drill queries the data in situ. By querying the data where it sits, there is no ETL process required to move the data into a central location, as is usual in a data warehouse setting.

  • At scale … Drill works well for small-sized datasets (a few gigabytes) but also scales out to the terabyte and petabyte range, depending only on the number of machines available in a cluster (which dictates the degree of parallelism at which a query can be executed).
  • Time-efficient … this means two things in the context of Drill:
  1. Because there is no ETL step involved, the data can be queried directly where it is located.
  2. Because of the way the query is executed (based on Google Dremel’s multi-level execution tree, in-memory, streaming operators, etc.), response times with Drill are typically in the low seconds. This rapid response time is possible even on large datasets, which means it is well suited for low-latency application scenarios. Imagine someone sitting in front of a BI tool clicking on a button, expecting an answer immediately rather than the minutes or hours generally expected from MapReduce-based systems.
  • Familiar way … on the one hand this means that standard query interfaces, such as full SQL support, are guaranteed with Drill (no matter if the data resides in a strongly-typed data source such as an RDBMS or exists as JSON files in, say, HDFS), and on the other hand that ad-hoc queries are possible.”

With those thoughts about Drill in mind, what are your ideas about how you’d use it?

Tweet your comments/questions with hashtag #drilltalk and @ApacheDrill to join the discussion on Monday.