Apache Drill

Developer(s)	Apache Software Foundation
Stable release	1.9 / November 29, 2016
Repository	github.com/apache/drill
Operating system	Cross-platform
License	Apache License, Version 2.0
Website	drill.apache.org

Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Drill is the open-source counterpart to Google's Dremel system, which Google offers as an infrastructure service called BigQuery. One explicitly stated design goal is for Drill to scale to 10,000 servers or more and to process petabytes of data and trillions of records in seconds. Drill is an Apache top-level project.^[1]

Drill supports a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files. A single query can join data from multiple datastores. For example, you can join a user profile collection in MongoDB with a directory of event logs in Hadoop.
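As a rough illustration of such a cross-datastore join, the following Python sketch only assembles the SQL text; it does not contact a Drill cluster. The storage plugin names `mongo` and `dfs`, the collection `app.users`, the directory `/logs/events`, and the columns `name`, `event_type` and `user_id` are all assumptions made for the example.

```python
def cross_store_join_sql(mongo_collection: str, log_dir: str, key: str = "user_id") -> str:
    """Build a SQL string that joins a MongoDB collection with a directory of log files.

    Assumes storage plugins named `mongo` and `dfs` are configured in Drill;
    the selected columns are hypothetical.
    """
    return (
        "SELECT u.name, e.event_type\n"
        f"FROM mongo.{mongo_collection} AS u\n"
        f"JOIN dfs.`{log_dir}` AS e\n"
        f"  ON u.{key} = e.{key}"
    )


# Hypothetical collection and directory names.
sql = cross_store_join_sql("app.users", "/logs/events")
print(sql)
```

Each table reference is qualified by the plugin that owns the data, which is what lets one statement span both systems.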

Drill's datastore-aware optimizer automatically restructures a query plan to leverage the datastore's internal processing capabilities. In addition, Drill supports data locality, so co-locating Drill and the datastore on the same nodes is recommended.^[2]

Apache Drill 1.9 added a dynamic UDF feature, which lets users register and unregister user-defined functions (UDFs) themselves using the new CREATE FUNCTION USING JAR and DROP FUNCTION USING JAR commands.
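A minimal sketch of the two statements, generated in Python so the strings can be checked before they are sent to Drill. The quoting of the JAR name and the file name `my-udfs.jar` are assumptions; the helper `udf_statements` is hypothetical and not part of Drill.

```python
from typing import Tuple


def udf_statements(jar_name: str) -> Tuple[str, str]:
    """Return the (register, unregister) statements for a UDF JAR.

    The single-quoted JAR name is an assumed syntax detail.
    """
    if not jar_name.endswith(".jar"):
        raise ValueError("expected a .jar file name")
    if "'" in jar_name:
        raise ValueError("JAR name must not contain a single quote")
    register = f"CREATE FUNCTION USING JAR '{jar_name}'"
    unregister = f"DROP FUNCTION USING JAR '{jar_name}'"
    return register, unregister


# Hypothetical JAR name.
register_sql, unregister_sql = udf_statements("my-udfs.jar")
print(register_sql)
print(unregister_sql)
```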

Features

Schema-free JSON document model, similar to MongoDB and Elasticsearch, with no formal schema declaration required
Industry-standard APIs: ANSI SQL, ODBC/JDBC, RESTful APIs
Extremely user and developer friendly
Pluggable architecture enables connectivity to multiple datastores
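To make the RESTful API item concrete, here is a hedged Python sketch that prepares, but does not send, an HTTP request carrying a SQL query. The base URL `http://localhost:8047`, the `/query.json` path, the payload field names, and the sample table `cp.employee.json` are assumptions about a default local installation and should be checked against the Drill documentation for the version in use.

```python
import json
import urllib.request


def build_query_request(sql: str, base_url: str = "http://localhost:8047") -> urllib.request.Request:
    """Prepare a POST request that submits a SQL query over REST.

    The endpoint path and payload shape are assumptions; nothing is sent here.
    """
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical query against an assumed sample table.
req = build_query_request("SELECT * FROM cp.`employee.json` LIMIT 5")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would submit the query to a running server; that step is left out so the sketch runs without one.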

Support

Drill is primarily focused on non-relational datastores, including Hadoop, NoSQL and cloud storage. The following datastores are currently supported:

Hadoop: All Hadoop distributions (HDFS API 2.3+), including Apache Hadoop, MapR, CDH and Amazon EMR
NoSQL: MongoDB, HBase
Cloud storage: Amazon S3, Google Cloud Storage, Azure Blob Storage, Swift
Data formats: multiple formats, including Apache Avro, Apache Parquet and JSON
RDBMS: relational databases, through a storage plugin that connects over JDBC

A new datastore can be added by developing a storage plugin. Drill's schema-free JSON data model enables it to query non-relational datastores in situ (many of these systems store complex or schema-free data).^[3]
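Existing plugins are typically set up through a small JSON configuration. The Python sketch below builds one for the JDBC-based RDBMS plugin mentioned above; the field names (`type`, `driver`, `url`, `username`, `password`, `enabled`) and every value shown are assumptions for illustration and should be verified against the plugin documentation.

```python
import json


def jdbc_plugin_config(driver: str, url: str, username: str, password: str) -> dict:
    """Return an assumed storage plugin configuration for a JDBC datastore."""
    return {
        "type": "jdbc",        # assumed plugin type identifier
        "driver": driver,      # JDBC driver class name
        "url": url,            # JDBC connection URL
        "username": username,
        "password": password,
        "enabled": True,
    }


# Hypothetical driver, URL and credentials.
config = jdbc_plugin_config(
    "com.mysql.jdbc.Driver",
    "jdbc:mysql://localhost:3306",
    "drill_user",
    "change-me",
)
print(json.dumps(config, indent=2))
```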

References

[1] "The Apache Software Foundation Announces Apache™ Drill™ as a Top-Level Project". Retrieved 2014-12-02.
[2] "Apache Drill - Schema-free SQL for Hadoop, NoSQL and Cloud Storage". drill.apache.org. Retrieved 2015-12-29.
[3] "Frequently Asked Questions - Apache Drill". drill.apache.org. Retrieved 2015-12-29.

Papers

Several papers influenced Drill's inception and design. A partial list:

2005 From Databases to Dataspaces: A New Abstraction for Information Management. The authors highlight the need for storage systems to accept all data formats and to provide APIs for data access that evolve based on the storage system's understanding of the data.
2010 Dremel: Interactive Analysis of Web-Scale Datasets


