Presto (SQL query engine)

Presto is an open-source software project to develop a database query engine using the standard Structured Query Language (SQL).

Description

Facebook began developing Presto in 2012 and announced its release as open source for Apache Hadoop in 2013.[1][2] In 2014, Netflix disclosed that it used Presto on 10 petabytes of data stored in the Amazon Simple Storage Service (S3).[3] In March 2015, Airbnb released the source code of Airpal, a web interface for Presto.[4][5] In June 2015, data-warehousing company Teradata announced commercial support for Presto, which is distributed under the Apache License.[6] In December 2017, Teradata announced a partnership with Starburst,[7] an enterprise Presto company that said it would continue to advance the open-source project.[8]

Figure: Presto architecture
Figure: Presto's query federation capabilities

Presto's architecture resembles that of a classic database management system built on cluster computing. It can be visualized as one coordinator node working with multiple worker nodes. Clients submit SQL statements, which are parsed and planned; parallel tasks are then scheduled on the workers. The workers jointly process rows from the data sources and produce results that are returned to the client. Unlike the original Apache Hive execution model, which ran each query through the Hadoop MapReduce mechanism, Presto does not write intermediate results to disk, which results in a significant speed improvement. Presto is written in the Java programming language.[1]
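The coordinator and worker flow described above can be illustrated with a toy model. The sketch below is not Presto's implementation (Presto is written in Java); it is a minimal Python analogy that assumes a hypothetical in-memory table and uses threads in place of worker nodes. All function names and data are invented for illustration. A "coordinator" divides the rows into splits, "workers" aggregate their splits in parallel, and the partial results are merged in memory without being written to disk.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input table of (country, amount) rows, for illustration only.
ROWS = [("US", 10), ("DE", 5), ("US", 7), ("FR", 3), ("DE", 2), ("FR", 8)]


def plan_splits(rows, num_workers):
    """Coordinator step: divide the input into one split per worker."""
    return [rows[i::num_workers] for i in range(num_workers)]


def worker_task(split):
    """Worker step: compute a partial SUM(amount) GROUP BY country."""
    partial = {}
    for country, amount in split:
        partial[country] = partial.get(country, 0) + amount
    return partial


def merge_partials(partials):
    """Final step: combine the partial aggregates held in memory."""
    total = {}
    for partial in partials:
        for country, amount in partial.items():
            total[country] = total.get(country, 0) + amount
    return total


def run_query(rows, num_workers=3):
    """Plan splits, run the worker tasks in parallel, and merge the results."""
    splits = plan_splits(rows, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(worker_task, splits))
    return merge_partials(partials)


if __name__ == "__main__":
    print(run_query(ROWS))
```

The intermediate values (`partials`) stay in memory between stages, which is the property the paragraph above contrasts with a MapReduce-style model.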

A single Presto query can combine data from multiple sources. Presto offers connectors to data sources including files in the Hadoop Distributed File System, Amazon S3, MySQL, Apache Kafka, Apache Cassandra, PostgreSQL and Redis. Unlike tools tied to a specific Hadoop distribution, such as Cloudera Impala, Presto can work with any flavor of Hadoop or without it. Presto supports the separation of compute and storage and can be deployed both on premises and in the cloud.
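As a rough illustration of combining sources in one query, the sketch below builds a SQL statement that joins a table from one connector with a table from another. Presto addresses tables by catalog, schema, and table name; the catalog names (`hive`, `mysql`), the schemas, the tables, and the columns used here are all hypothetical and depend on how a given cluster is configured. The helper functions are likewise invented for this example and are not part of any Presto client library.

```python
def qualified_name(catalog, schema, table):
    """Return a fully qualified catalog.schema.table name."""
    return ".".join([catalog, schema, table])


def build_federated_query():
    """Build a SQL string joining tables from two different catalogs."""
    # Hypothetical catalogs, schemas, tables, and columns; real names
    # depend on how the cluster's connectors are configured.
    page_views = qualified_name("hive", "web", "page_views")
    customers = qualified_name("mysql", "crm", "customers")
    return (
        "SELECT c.name, count(*) AS views\n"
        f"FROM {page_views} AS v\n"
        f"JOIN {customers} AS c ON v.customer_id = c.id\n"
        "GROUP BY c.name"
    )


if __name__ == "__main__":
    print(build_federated_query())
```

The resulting string would be submitted to the coordinator like any other SQL statement; the client does not need to know which storage system backs each table.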

References

  1. ^ a b Joab Jackson (November 6, 2013). "Facebook goes open source with query engine for big data". Computerworld. Retrieved April 26, 2017.
  2. ^ Jordan Novet (June 6, 2013). "Facebook unveils Presto engine for querying 250 PB data warehouse". Gigaom. Retrieved April 26, 2017.
  3. ^ Eva Tse, Zhenxiao Luo, Nezih Yigitbasi (October 7, 2014). "Using Presto in our Big Data Platform on AWS". Netflix technical blog. Retrieved April 26, 2017.
  4. ^ Doug Henschen (March 5, 2015). "Airbnb Boosts Presto SQL Query Engine For Hadoop". InformationWeek. Retrieved April 26, 2017.
  5. ^ James Mayfield (March 4, 2015). "Airpal: a Web UI for PrestoDB". Airbnb blog post. Archived from the original on March 6, 2015. Retrieved April 26, 2017.
  6. ^ "Teradata Launches First Enterprise Support for Presto". Press release. June 8, 2015. Retrieved April 26, 2017.
  7. ^ "Teradata Partners with Starburst, a New Company Focused on Continuing the Success of the Presto Open Source Project". Press release. December 13, 2017. Retrieved December 14, 2017.
  8. ^ "Presto - Next Chapter". Starburst blog post. December 13, 2017. Retrieved December 14, 2017.

External links