Draft:Velox (execution engine)
Developer(s) | Velox OSS Community |
---|---|
Initial release | 2022 |
Repository | github |
Written in | C++ |
Operating system | Cross-platform |
Type | Database |
License | Apache License 2.0 |
Website | velox-lib |
Velox is an open-source, composable execution engine written and distributed as a C++ library.[1] It provides reusable, high-performance, and extensible data processing components for building data management systems. Velox implements the execution engine layer as defined in the composable data stack,[2] and therefore relies on its clients (the engines that use the library) to provide a language frontend, an optimizer, and an execution runtime environment. An engine integrates with Velox by handing it an optimized query plan and relying on Velox to execute it.
Velox was created by Meta in 2020 and open sourced in 2022.[3][4] It is used to accelerate Presto (the Prestissimo project), Spark (through the Apache Gluten project[5]), Voltron Data's Theseus engine, and a number of other systems within Meta and across the industry.
History
Velox was created in 2020 at Meta by Orri Erling and Masha Basmanova, who were soon joined by Pedro Pedreira. Its initial goal was to accelerate Presto queries by rewriting the engine in C++, as an extension of the Aria project. Because many teams at Meta were interested in high-performance building blocks for data management systems, Velox was designed as an extensible and reusable library. It was adopted early on by Meta's stream processing platform (XStream), then by Presto (the Prestissimo project) and a series of other systems for data warehouse ingestion, real-time processing, and data for AI/ML.
Velox was open sourced in 2022. Companies such as Ahana[6] (acquired by IBM in 2023[7]), Intel, ByteDance, and Voltron Data joined the project early on. Microsoft, Uber, NVIDIA, Alibaba, Pinterest, Meituan, and others are also active contributors.
Features
Velox provides the following features:
- Operators: implementation of relational operators such as TableScan, TableWriter, Filter, Project, Aggregation, Joins, Shuffle/Exchange, and more.
- Vectors: An Arrow-compatible columnar memory layout module, providing encodings such as Flat, Dictionary, Constant, and Sequence/RLE, in addition to a lazy materialization pattern and support for out-of-order writes.
- Expression Eval: A vectorized and extensible expression evaluation engine, providing features such as encoding peeling and fast-paths, memoization, constant folding, conjunct re-ordering and more.
- Storage and IO: Support for file formats such as Parquet, ORC/DWRF, Nimble, table formats such as Iceberg, network serialization protocols such as Presto Page and Spark UnsafeRow, and cloud storage such as S3, HDFS, GCS, ABFS, and more.
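The vector encodings listed above can be illustrated with a minimal, self-contained sketch. The names `Encoding`, `Int64Column`, `valueAt`, and `flatten` are hypothetical and chosen for this illustration only; they are not Velox classes or functions, and the real library's vector hierarchy is considerably richer. The sketch only shows how Flat, Dictionary, and Constant encodings can represent the same logical column with different physical layouts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration types; these are NOT Velox classes.
enum class Encoding { Flat, Dictionary, Constant };

struct Int64Column {
    Encoding encoding;
    std::size_t size;              // logical number of rows
    std::vector<int64_t> values;   // Flat: one value per row
                                   // Dictionary: the distinct values
                                   // Constant: a single value
    std::vector<int32_t> indices;  // Dictionary only: per-row index into `values`

    // Returns the logical value at `row`, whatever the physical encoding is.
    int64_t valueAt(std::size_t row) const {
        assert(row < size);
        switch (encoding) {
            case Encoding::Flat:
                return values[row];
            case Encoding::Dictionary:
                return values[static_cast<std::size_t>(indices[row])];
            case Encoding::Constant:
                return values[0];
        }
        return 0;  // not reached; silences missing-return warnings
    }
};

// Expands a column of any encoding into a plain vector of values.
std::vector<int64_t> flatten(const Int64Column& column) {
    std::vector<int64_t> out;
    out.reserve(column.size);
    for (std::size_t row = 0; row < column.size; ++row) {
        out.push_back(column.valueAt(row));
    }
    return out;
}
```

In this sketch, a dictionary column with values `{10, 20}` and indices `{1, 0, 1, 1}` flattens to `{20, 10, 20, 20}`, while a constant column stores a single value regardless of its row count.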
Performance
Velox's execution model is columnar and based on vectorization. In this model, physical operators are decomposed into small, concise loops of computation ("little loops") that modern CPUs can process more efficiently. Vectorization provides better data and instruction locality, and enables CPUs to make more effective use of techniques such as out-of-order execution and SIMD instructions.
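As a rough illustration of the "little loops" idea, the following sketch evaluates a filter and a projection as two separate tight loops over whole columns, connected by a selection vector of passing row positions. The function names `selectGreaterThan` and `addSelected` are hypothetical and are not part of Velox; the sketch assumes plain `std::vector` columns rather than the library's own vector types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Loop 1: evaluate the predicate `value > threshold` over an entire column
// and record the positions of the rows that pass (a selection vector).
std::vector<std::size_t> selectGreaterThan(const std::vector<int64_t>& column,
                                           int64_t threshold) {
    std::vector<std::size_t> selected;
    selected.reserve(column.size());
    for (std::size_t row = 0; row < column.size(); ++row) {
        if (column[row] > threshold) {
            selected.push_back(row);
        }
    }
    return selected;
}

// Loop 2: compute a[row] + b[row] only for the selected rows.
// Each loop touches few columns and has a simple body, which is the
// property that compilers and CPUs can exploit.
std::vector<int64_t> addSelected(const std::vector<int64_t>& a,
                                 const std::vector<int64_t>& b,
                                 const std::vector<std::size_t>& selected) {
    assert(a.size() == b.size());
    std::vector<int64_t> out;
    out.reserve(selected.size());
    for (std::size_t row : selected) {
        out.push_back(a[row] + b[row]);
    }
    return out;
}
```

The contrast is with a row-at-a-time interpreter, which would evaluate the filter and the projection for one row before moving on to the next.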
Velox also implements compressed execution: it operates directly on cascading encodings such as dictionary, constant, and run-length (RLE) encodings instead of first decoding the data, which makes many database operations cheaper. Physical operators usually provide multiple execution paths, with specialized paths for inputs whose encoding can be exploited, and can also produce output that reuses the encoding of the input.
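One way to picture compressed execution is a function applied to a dictionary-encoded column: the function runs once per distinct dictionary entry rather than once per row, and the output keeps the input's row indices. The sketch below is a simplified illustration under that assumption; `DictionaryColumn`, `applyPeeled`, and `decode` are hypothetical names, not Velox APIs, and the sketch copies the indices where a real engine would more likely share the underlying buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical dictionary-encoded column; NOT a Velox class.
struct DictionaryColumn {
    std::vector<int64_t> dictionary;  // distinct values
    std::vector<int32_t> indices;     // per-row index into `dictionary`
};

// Applies `fn` once per dictionary entry instead of once per row.
// The result is again dictionary-encoded and reuses the input's indices.
template <typename Fn>
DictionaryColumn applyPeeled(const DictionaryColumn& input, Fn fn) {
    DictionaryColumn output;
    output.dictionary.reserve(input.dictionary.size());
    for (int64_t value : input.dictionary) {
        output.dictionary.push_back(fn(value));
    }
    output.indices = input.indices;  // copied here for simplicity
    return output;
}

// Expands a dictionary-encoded column into one value per row.
std::vector<int64_t> decode(const DictionaryColumn& column) {
    std::vector<int64_t> out;
    out.reserve(column.indices.size());
    for (int32_t index : column.indices) {
        out.push_back(column.dictionary[static_cast<std::size_t>(index)]);
    }
    return out;
}
```

For a column with two distinct values and millions of rows, `applyPeeled` invokes the function twice. This shortcut is only valid for deterministic functions whose result depends solely on the input value.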
Velox also uses lazy materialization to defer loading and decoding data until the point in execution at which it is actually needed. Together with prefetching, preloading, and IO coalescing, this improves IO efficiency and reduces the amount of data read and decoded.
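The effect of lazy materialization can be sketched as a scan that filters on an already-loaded key column and asks a loader for the payload column only for the rows that pass. `RowLoader`, `ScanResult`, and `filterThenLoad` are hypothetical names for this illustration and do not correspond to Velox's own lazy vector interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical loader: given row positions, decodes and returns the payload
// values for exactly those rows. NOT a Velox API.
using RowLoader =
    std::function<std::vector<int64_t>(const std::vector<std::size_t>&)>;

struct ScanResult {
    std::vector<std::size_t> rows;  // positions that passed the filter
    std::vector<int64_t> payload;   // payload values for those positions
};

// Filters on `keys` first, then materializes the payload column only for
// the surviving rows. If no row passes, the loader is never invoked.
ScanResult filterThenLoad(const std::vector<int64_t>& keys,
                          int64_t threshold,
                          const RowLoader& loadPayload) {
    ScanResult result;
    for (std::size_t row = 0; row < keys.size(); ++row) {
        if (keys[row] > threshold) {
            result.rows.push_back(row);
        }
    }
    if (!result.rows.empty()) {
        result.payload = loadPayload(result.rows);
    }
    return result;
}
```

With a selective filter, most payload values are never decoded, which is the saving the paragraph above describes.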
Owing to these and other performance features, Velox is reported to be 3-4x more efficient than systems such as unmodified Presto or Spark.[8]
Integrations
- Presto, through the Prestissimo (or Presto Native) effort.
- Apache Spark, through Apache Gluten[9].
- Voltron Data Theseus.
References
- ^ Pedreira, Pedro; Erling, Orri; Basmanova, Masha; Wilfong, Kevin; Sakka, Laith; Pai, Krishna; He, Wei; Chattopadhyay, Biswapesh (2022). "Velox: Meta's Unified Execution Engine" (PDF). Proceedings of the VLDB Endowment. 48th International Conference on Very Large Databases. Sydney, Australia: VLDB Endowment. pp. 3372–3384. doi:10.14778/3554821.3554829.
- ^ Pedreira, Pedro; Erling, Orri; Karanasos, Konstantinos; Schneider, Scott; McKinney, Wes; Valluri, Satya; Zait, Mohamed; Nadeau, Jacques (2023). "The Composable Data Management System Manifesto" (PDF). Proceedings of the VLDB Endowment. 49th International Conference on Very Large Databases. Vancouver, Canada: VLDB Endowment. ISSN 2150-8097. doi:10.14778/3603581.3603604.
- ^ "Introducing Velox: An open source unified execution engine". Engineering Blog at Meta. 2022. Retrieved 2024-11-11.
- ^ Timothy Morgan (2022). "Meta's Velox Means Database Performance Is Not Subject To Interpretation". The Next Platform. Retrieved 2024-11-11.
- ^ Shankaran, Akash; Gu, George; Chen, Weiting; Yang, Binwei; Kulkarni, Chidamber; Rambacher, Mark; Tatbul, Nesime; Cohen, David (2023). The Gluten Open-Source Software Project: Modernizing Java-based Query Engines for the Lakehouse Era (PDF). VLDB International Workshop on Composable Data Management Systems (CDMS'23). Vancouver, Canada. pp. 2150–8097.
- ^ Beth Winkowski (2022). "Ahana Joins Leading Open Source Innovators in its Commitment to the Velox Open Source Project Created by Meta". InfoWorld. Retrieved 2024-11-11.
- ^ Vikram Murali and Steven Mih (2023-04-12). "IBM joins the Presto Foundation through acquisition of Ahana". PrestoDB Foundation. Retrieved 2024-11-11.
- ^ Alex Woodie (2022). "New C++ Acceleration Library Velox Juices Code Execution Up To 8x". Big Data Wire. Retrieved 2024-11-11.
- ^ "The Apache Gluten Project". The Apache Software Foundation. 2023. Retrieved 2024-11-11.