Draft:Velox (execution engine)

Velox
Developer(s)	Velox OSS Community
Initial release	2022
Repository	github.com/facebookincubator/velox
Written in	C++
Operating system	Cross-platform
Type	Database
License	Apache License 2.0
Website	velox-lib.io


Velox is an open-source composable execution engine written in C++ and distributed as a library ^[1]. It provides reusable, high-performance, and extensible data processing components for building data management systems. Velox implements the execution engine layer of the composable data stack ^[2], and therefore relies on its clients (the engines using the library) to provide a language frontend, a query optimizer, and an execution runtime environment. Engines integrate with Velox by supplying an optimized query plan and relying on Velox to execute it.
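
The division of labor between a client engine and the execution library can be illustrated with a minimal C++ sketch. All names below (Operator, ValuesOperator, FilterOperator, ProjectOperator, execute, runClientQuery) are hypothetical and do not correspond to Velox's actual API; the sketch only shows a client handing an already-planned operator tree to a library that drives it to completion.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical, simplified types for illustration; NOT Velox's API.
using Batch = std::vector<int64_t>;  // a single column of values

// "Library" side: reusable physical operators with a pull-based interface.
class Operator {
 public:
  virtual ~Operator() = default;
  // Returns the next batch, or std::nullopt once the input is exhausted.
  virtual std::optional<Batch> next() = 0;
};

// Produces in-memory batches (stands in for a table scan).
class ValuesOperator : public Operator {
 public:
  explicit ValuesOperator(std::vector<Batch> batches) : batches_(std::move(batches)) {}
  std::optional<Batch> next() override {
    if (pos_ == batches_.size()) return std::nullopt;
    return batches_[pos_++];
  }
 private:
  std::vector<Batch> batches_;
  std::size_t pos_ = 0;
};

// Keeps only the values for which the predicate holds.
class FilterOperator : public Operator {
 public:
  FilterOperator(std::unique_ptr<Operator> input, std::function<bool(int64_t)> pred)
      : input_(std::move(input)), pred_(std::move(pred)) {}
  std::optional<Batch> next() override {
    auto batch = input_->next();
    if (!batch) return std::nullopt;
    Batch out;
    for (int64_t v : *batch) {
      if (pred_(v)) out.push_back(v);
    }
    return out;
  }
 private:
  std::unique_ptr<Operator> input_;
  std::function<bool(int64_t)> pred_;
};

// Applies a scalar function to every value.
class ProjectOperator : public Operator {
 public:
  ProjectOperator(std::unique_ptr<Operator> input, std::function<int64_t(int64_t)> fn)
      : input_(std::move(input)), fn_(std::move(fn)) {}
  std::optional<Batch> next() override {
    auto batch = input_->next();
    if (!batch) return std::nullopt;
    for (int64_t& v : *batch) v = fn_(v);
    return batch;
  }
 private:
  std::unique_ptr<Operator> input_;
  std::function<int64_t(int64_t)> fn_;
};

// Library entry point: pull batches from the root until the plan is exhausted.
std::vector<int64_t> execute(Operator& root) {
  std::vector<int64_t> result;
  while (auto batch = root.next()) {
    result.insert(result.end(), batch->begin(), batch->end());
  }
  return result;
}

// "Client engine" side: its parser and optimizer would produce this plan;
// here it is built by hand. Roughly: SELECT x * 10 FROM t WHERE x % 2 = 0.
std::vector<int64_t> runClientQuery() {
  auto scan = std::make_unique<ValuesOperator>(std::vector<Batch>{{1, 2, 3}, {4, 5, 6}});
  auto filter = std::make_unique<FilterOperator>(
      std::move(scan), [](int64_t x) { return x % 2 == 0; });
  ProjectOperator project(std::move(filter), [](int64_t x) { return x * 10; });
  return execute(project);
}
```

In this sketch the client decides the shape of the plan, while the operators and the execute loop are the reusable parts a library would own.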

Velox was created by Meta in 2020 and open-sourced in 2022 ^[3] ^[4]. It is used to accelerate Presto (through the Prestissimo project), Apache Spark (through the Apache Gluten project ^[5]), Voltron Data's Theseus engine, and several other systems within Meta and across the industry.

History

Velox was created at Meta in 2020 by Orri Erling and Masha Basmanova, who were soon joined by Pedro Pedreira. Its initial goal was to accelerate Presto queries by rewriting the engine in C++, as an extension of the Aria project. Because many teams at Meta were interested in high-performance building blocks for data management systems, Velox was designed as an extensible, reusable library. It was adopted early on by Meta's stream processing platform (XStream), then by Presto (the Prestissimo project), and later by several other systems related to data warehouse ingestion, real-time processing, and data for AI/ML.

Velox was open-sourced in 2022. Companies such as Ahana ^[6] (acquired by IBM in 2023 ^[7]), Intel, ByteDance, and Voltron Data joined the project early on. Other companies, including Microsoft, Uber, Nvidia, Alibaba, Pinterest, and Meituan, are also active contributors.

Features

Velox provides the following features:

Operators: implementations of relational operators such as TableScan, TableWriter, Filter, Project, Aggregation, Join, Shuffle/Exchange, and more.

Vectors: an Arrow-compatible columnar memory layout module that provides encodings such as Flat, Dictionary, Constant, and Sequence/RLE, along with a lazy materialization pattern and support for out-of-order writes.

Expression Eval: a vectorized and extensible expression evaluation engine with features such as encoding peeling and fast paths, memoization, constant folding, conjunct reordering, and more.

Storage and IO: support for file formats such as Parquet, ORC/DWRF, and Nimble; table formats such as Iceberg; network serialization protocols such as Presto Page and Spark UnsafeRow; and storage systems such as S3, HDFS, GCS, ABFS, and more.
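
The vector encodings and the encoding peeling mentioned in the list above can be sketched in simplified form. The structs FlatVector, ConstantVector, and DictionaryVector and the function peelAndApply are hypothetical illustrations, not Velox's vector classes or its expression evaluator; the sketch assumes a deterministic scalar function and only shows the core idea that a function over a dictionary-encoded input can be evaluated on the dictionary's base values and rewrapped with the same indices.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, simplified encodings for illustration; NOT Velox's classes.
template <typename T>
struct FlatVector {
  std::vector<T> values;  // one value per row
  const T& valueAt(std::size_t row) const { return values[row]; }
  std::size_t size() const { return values.size(); }
};

template <typename T>
struct ConstantVector {
  T value;             // the single value shared by every row
  std::size_t length;  // number of logical rows
  const T& valueAt(std::size_t) const { return value; }
  std::size_t size() const { return length; }
};

template <typename T>
struct DictionaryVector {
  FlatVector<T> base;            // dictionary of values
  std::vector<int32_t> indices;  // row i holds base.values[indices[i]]
  const T& valueAt(std::size_t row) const { return base.values[indices[row]]; }
  std::size_t size() const { return indices.size(); }
};

// Encoding peeling (simplified): evaluate fn over the dictionary's base
// values, then wrap the results with the same indices. fn runs once per
// dictionary entry rather than once per row, and the output stays
// dictionary encoded.
template <typename T, typename Fn>
auto peelAndApply(const DictionaryVector<T>& input, Fn fn)
    -> DictionaryVector<decltype(fn(std::declval<const T&>()))> {
  DictionaryVector<decltype(fn(std::declval<const T&>()))> out;
  out.base.values.reserve(input.base.size());
  for (const T& v : input.base.values) out.base.values.push_back(fn(v));
  out.indices = input.indices;
  return out;
}

// Constant input: evaluate fn exactly once, regardless of the row count.
template <typename T, typename Fn>
auto peelAndApply(const ConstantVector<T>& input, Fn fn)
    -> ConstantVector<decltype(fn(std::declval<const T&>()))> {
  return {fn(input.value), input.length};
}
```

For example, doubling a six-row dictionary column whose base holds two values calls the function twice instead of six times, and doubling a constant column of any length calls it once.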

Performance

Velox's execution model is columnar and vectorized. Physical operators are decomposed into small, tight loops of computation ("little loops") that modern CPUs can process more efficiently. Vectorization improves data and instruction locality and lets CPUs make better use of techniques such as out-of-order execution and SIMD instructions.
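
A minimal sketch of the "little loops" style, assuming a query that filters on one column and then adds two columns for the surviving rows. The functions selectGreaterThan and addSelected are hypothetical and not taken from Velox: the first computes a selection vector with a branch-free loop over one column, and the second processes only the selected rows. Whether a given compiler turns such loops into SIMD code depends on the compiler and target.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Little loop 1: evaluate "x > threshold" over a whole column and record the
// indices of qualifying rows in a selection vector. The loop is branch-free:
// every row index is written, but the output cursor only advances when the
// predicate holds.
std::size_t selectGreaterThan(const std::vector<int64_t>& column, int64_t threshold,
                              std::vector<int32_t>& selection) {
  selection.resize(column.size());
  std::size_t count = 0;
  for (std::size_t i = 0; i < column.size(); ++i) {
    selection[count] = static_cast<int32_t>(i);
    count += column[i] > threshold;
  }
  selection.resize(count);
  return count;
}

// Little loop 2: compute a + b only for the selected rows.
void addSelected(const std::vector<int64_t>& a, const std::vector<int64_t>& b,
                 const std::vector<int32_t>& selection, std::vector<int64_t>& out) {
  out.resize(selection.size());
  for (std::size_t j = 0; j < selection.size(); ++j) {
    const int32_t row = selection[j];
    out[j] = a[row] + b[row];
  }
}
```

Each loop does one simple thing over contiguous arrays, which is the property that gives vectorized engines their locality advantage over evaluating a whole expression tree row by row.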

Velox also implements compressed execution: it exploits cascading encodings such as dictionary, constant, and run-length (RLE) encodings during execution to implement database operations more efficiently. Physical operators usually provide multiple execution paths, including specialized paths for encoded input where leveraging the encoding is beneficial, and can also produce output that is encoded on top of their input.
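
Compressed execution can be illustrated with a filter over a dictionary-encoded string column. DictColumn and filterDictionary are hypothetical names, not Velox's API, and the sketch assumes the dictionary is held through a shared pointer so that it can be reused without copying. The predicate is evaluated once per dictionary entry rather than once per row, and the output is itself dictionary encoded on top of the input's dictionary: only the indices of passing rows are kept.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical dictionary-encoded column (illustration only).
struct DictColumn {
  std::shared_ptr<const std::vector<std::string>> base;  // dictionary of values
  std::vector<int32_t> indices;                          // row i -> (*base)[indices[i]]
};

// Keeps the rows whose value satisfies pred.
// 1) Evaluate pred once per dictionary entry instead of once per row.
// 2) Reuse the input dictionary in the output; no strings are copied,
//    only the indices of passing rows are written.
template <typename Pred>
DictColumn filterDictionary(const DictColumn& input, Pred pred) {
  std::vector<char> passes(input.base->size());
  for (std::size_t k = 0; k < input.base->size(); ++k) {
    passes[k] = pred((*input.base)[k]);
  }
  DictColumn out;
  out.base = input.base;  // shares the same dictionary
  for (int32_t idx : input.indices) {
    if (passes[idx]) out.indices.push_back(idx);
  }
  return out;
}
```

The benefit grows with the ratio of rows to distinct values: a column with millions of rows but a handful of country codes needs only a handful of string comparisons.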

Velox also uses lazy materialization, delaying the materialization of data until the point during execution where it is actually needed. Together with prefetching, preloading, and IO coalescing, these techniques improve IO efficiency and reduce the amount of data read and decoded.
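
Lazy materialization can be sketched with a column that holds a loader callback instead of data. LazyColumn and sumPriceWhere are hypothetical names, not Velox's LazyVector API. The sketch assumes the loader reads and decodes only the rows it is asked for, and that load is called with a single row set (the first result is cached). A column referenced only after a selective filter is then decoded for the surviving rows, and not at all if no row survives.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical lazily loaded column (illustration only).
class LazyColumn {
 public:
  // Reads and decodes the requested rows from storage.
  using Loader = std::function<std::vector<int64_t>(const std::vector<int32_t>& rows)>;

  explicit LazyColumn(Loader loader) : loader_(std::move(loader)) {}

  // Materializes values for rows on the first call and caches them.
  // This sketch assumes later calls ask for the same rows.
  const std::vector<int64_t>& load(const std::vector<int32_t>& rows) {
    if (!values_) values_ = loader_(rows);
    return *values_;
  }

  bool isLoaded() const { return values_.has_value(); }

 private:
  Loader loader_;
  std::optional<std::vector<int64_t>> values_;
};

// Sums price over rows where quantity > minQty. The price column is decoded
// only for passing rows, and never loaded if no row passes.
int64_t sumPriceWhere(const std::vector<int64_t>& quantity, int64_t minQty,
                      LazyColumn& price) {
  std::vector<int32_t> rows;
  for (std::size_t i = 0; i < quantity.size(); ++i) {
    if (quantity[i] > minQty) rows.push_back(static_cast<int32_t>(i));
  }
  if (rows.empty()) return 0;  // price is never materialized
  int64_t sum = 0;
  for (int64_t p : price.load(rows)) sum += p;
  return sum;
}
```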

Due to these and other performance features, Velox has been reported to be three to four times more efficient than systems such as vanilla Presto or Spark ^[8].

Integrations

Presto, through the Prestissimo (or Presto Native) effort.
Apache Spark, through Apache Gluten^[9].
Voltron Data Theseus.

References

^ Pedreira, Pedro; Erling, Orri; Basmanova, Masha; Wilfong, Kevin; Sakka, Laith; Pai, Krishna; He, Wei; Chattopadhyay, Biswapesh (2022). "Velox: Meta's Unified Execution Engine" (PDF). Proceedings of the VLDB Endowment. 48th International Conference on Very Large Databases. Sydney, Australia: VLDB Endowment. pp. 3372–3384. 10.14778/3554821.3554829.
^ Pedreira, Pedro; Erling, Orri; Karanasos, Konstantinos; Schneider, Scott; McKinney, Wes; Valluri, Satya; Zait, Mohamed; Nadeau, Jacques (2023). "The Composable Data Management System Manifesto" (PDF). Proceedings of the VLDB Endowment. 49th International Conference on Very Large Databases. Vancouver, Canada: VLDB Endowment. ISSN 2150-8097. 10.14778/3603581.3603604.
^ "Introducing Velox: An open source unified execution engine". Engineering Blog at Meta. 2022. Retrieved 2024-11-11.
^ Morgan, Timothy (2022). "Meta's Velox Means Database Performance Is Not Subject To Interpretation". The Next Platform. Retrieved 2024-11-11.
^ Shankaran, Akash; Gu, George; Chen, Weiting; Yang, Binwei; Kulkarni, Chidamber; Rambacher, Mark; Tatbul, Nesime; Cohen, David (2023). The Gluten Open-Source Software Project: Modernizing Java-based Query Engines for the Lakehouse Era (PDF). VLDB International Workshop on Composable Data Management Systems (CDMS'23). Vancouver, Canada. pp. 2150–8097.
^ Winkowski, Beth (2022). "Ahana Joins Leading Open Source Innovators in its Commitment to the Velox Open Source Project Created by Meta". InfoWorld. Retrieved 2024-11-11.
^ Murali, Vikram; Mih, Steven (2023-04-12). "IBM joins the Presto Foundation through acquisition of Ahana". PrestoDB Foundation. Retrieved 2024-11-11.
^ Woodie, Alex (2022). "New C++ Acceleration Library Velox Juices Code Execution Up To 8x". Big Data Wire. Retrieved 2024-11-11.
^ "The Apache Gluten Project". The Apache Software Foundation. 2023. Retrieved 2024-11-11.
