Apache Drill

Developer(s): Apache Software Foundation
Stable release: 1.18.0 / September 5, 2020
Repository: Drill Repository
Written in: Java
Operating system: Cross-platform
License: Apache License 2.0

Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Drill is the open-source counterpart of Google's Dremel system, which Google offers as an infrastructure service called BigQuery. One explicitly stated design goal is for Drill to scale to 10,000 servers or more and to process petabytes of data and trillions of records in seconds. Drill is an Apache top-level project.[1]

Drill supports a variety of NoSQL databases and file systems, including Alluxio, HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files. A single query can join data from multiple datastores. For example, one query can join a user profile collection in MongoDB with a directory of event logs in Hadoop.
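As a rough illustration of the cross-datastore join described above, the following sketch builds such a query as a string. The storage plugin names `mongo` and `dfs`, the database and collection `app.users`, the path `/logs/events`, and the columns `name` and `user_id` are all assumptions chosen for the example; actual names depend on how a given Drill installation is configured.

```python
def cross_store_join_sql(users_table: str, events_path: str) -> str:
    """Build a Drill SQL statement that joins a MongoDB collection
    with a directory of log files exposed through a file-system plugin.

    Column names (name, user_id) are hypothetical.
    """
    return (
        "SELECT u.name, COUNT(*) AS event_count\n"
        f"FROM {users_table} AS u\n"
        # Backticks quote the file-system path so Drill treats it as a table.
        f"JOIN dfs.`{events_path}` AS e ON u.user_id = e.user_id\n"
        "GROUP BY u.name"
    )


# Hypothetical plugin, database, collection and directory names.
query = cross_store_join_sql("mongo.app.users", "/logs/events")
print(query)
```

The resulting string can be submitted through any of Drill's client interfaces; the sketch only shows the shape of a query that spans two datastores.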

Drill's datastore-aware optimizer automatically restructures a query plan to leverage the datastore's internal processing capabilities. In addition, Drill supports data locality when Drill and the datastore run on the same nodes.[2]

Apache Drill 1.9 added dynamic user-defined functions.

Apache Drill 1.11 added cryptographic-related functions and PCAP file format support.


Key features include:

  • Schema-free JSON document model, similar to MongoDB and Elasticsearch, with no formal schema declaration required
  • Industry-standard APIs: ANSI SQL, ODBC/JDBC, RESTful APIs
  • User- and developer-friendly
  • Pluggable architecture that enables connectivity to multiple datastores

Back-end Support

Drill is primarily focused on non-relational datastores, including Apache Hadoop text files, NoSQL databases, and cloud storage. It can also query local JSON and Apache Parquet files in situ.

A new datastore can be added by developing a storage plugin. Drill's "schema-free" JSON data model enables it to query non-relational datastores in situ.[3]
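To make the in situ, schema-free querying of JSON concrete, here is a minimal sketch that writes a few records with differing fields to a local file and builds a query that reads the file directly, with no schema declared beforehand. The record contents, the file name `users.json`, and the use of a `dfs` plugin that can reach local paths are assumptions for illustration.

```python
import json
import os
import tempfile

# Hypothetical records; the second one carries an extra field,
# which is acceptable because no schema is declared up front.
records = [
    {"name": "Ada", "address": {"city": "London"}},
    {"name": "Lin", "address": {"city": "Taipei"}, "tags": ["admin"]},
]


def write_json_records(path, rows):
    """Write one JSON object per line to the given file."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


def in_situ_query(path):
    """Build a query that reads the JSON file in place.

    The table alias t is used to reach the nested field address.city.
    """
    return f"SELECT t.name, t.address.city AS city FROM dfs.`{path}` AS t"


data_path = os.path.join(tempfile.mkdtemp(), "users.json")
write_json_records(data_path, records)
sql = in_situ_query(data_path)
print(sql)
```

Nothing is loaded or converted before querying: the file on disk is the table, and the nested `address` object is addressed with dotted notation.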

Front-end Support

Drill can be queried via JDBC, ODBC, or REST from a variety of languages, including Python and Java. The default installation includes a web interface that lets end users run ANSI SQL directly and export result tables as CSV files without any programming.
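As one possible way to use the REST interface from Python, the sketch below prepares an HTTP request for Drill's `query.json` endpoint using only the standard library. The base URL `http://localhost:8047` assumes a local Drill instance on its default web port, and the helper names `build_query_request` and `run_query` are hypothetical. Only the request is built here; `run_query` needs a running Drill instance and is not called.

```python
import json
import urllib.request


def build_query_request(sql, base_url="http://localhost:8047"):
    """Prepare a POST request carrying a SQL query as a JSON payload."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, base_url="http://localhost:8047"):
    """Send the query and return the decoded JSON response.

    Requires a reachable Drill instance at base_url (an assumption).
    """
    request = build_query_request(sql, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


request = build_query_request("SELECT * FROM sys.version")
print(request.get_method(), request.full_url)
```

With a Drill instance running, `run_query("SELECT * FROM sys.version")` would return the parsed response as a Python dictionary.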

Apache Superset, a dashboard library, is well suited to visualizing data queried with Drill.

References

  1. "The Apache Software Foundation Announces Apache™ Drill™ as a Top-Level Project". Retrieved 2014-12-02.
  2. "Apache Drill - Schema-free SQL for Hadoop, NoSQL and Cloud Storage". drill.apache.org. Retrieved 2015-12-29.
  3. "Frequently Asked Questions - Apache Drill". drill.apache.org. Retrieved 2015-12-29.


