Draft:Apache Doris


Apache Doris
Developer(s): Apache Software Foundation
Stable release: 1.2.3 / March 20, 2023
Repository: github.com/apache/doris
Written in: Java / C++
License: Apache License 2.0
Website: doris.apache.org

Apache Doris is an open-source real-time data warehouse written mostly in Java and C++. It is a column-oriented DBMS compatible with the MySQL protocol. Its design integrates the distributed storage engine of Google Mesa with the massively parallel processing SQL query engine of Apache Impala.[1]

History

Apache Doris originated in 2008 as a Baidu project built to meet the requirements of the company's advertising business. It was developed into an analytic database that supported a range of data services, including multidimensional analysis, user profile analysis, ad hoc queries, and real-time dashboards. In 2017, it was open sourced and made available on GitHub.[2] In July 2018, Apache Doris entered the Apache Incubator program, a process designed by the Apache Software Foundation to guide and nurture open-source projects. In June 2022, the Apache Software Foundation announced the graduation of Apache Doris as a Top-Level Project.[3] By then, it had accumulated 300 code contributors and over 500 enterprise users, including ByteDance, Tencent, and Xiaomi.[4]

Features

Apache Doris executes queries using column-oriented storage, indexes, a parallel execution engine, vectorized execution, and a query optimizer. It is horizontally scalable and operates independently of third-party services. It uses on-demand JSON parsing to reduce resource usage[5] and Bloom filter indexes to accelerate queries.[6]
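
The idea behind a Bloom filter index can be illustrated with a minimal, generic sketch. It is not the Doris implementation: the class `BloomFilter`, the helper `blocks_to_scan`, the filter sizes, and the sample `user_id` blocks are all hypothetical names and values chosen for illustration. The sketch builds one filter per data block and uses it to skip blocks that cannot contain a requested value; a Bloom filter may return false positives but never false negatives.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value: str):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(value)
        )


def blocks_to_scan(filters, wanted: str):
    """Return indices of blocks whose filter does not rule out `wanted`."""
    return [i for i, f in enumerate(filters) if f.might_contain(wanted)]


# Hypothetical data blocks of a user_id column (illustrative values only).
blocks = [["u1", "u2", "u3"], ["u4", "u5"], ["u6", "u7", "u8"]]

filters = []
for block in blocks:
    bloom = BloomFilter()
    for value in block:
        bloom.add(value)
    filters.append(bloom)

# Only blocks that might hold "u5" need to be read from storage.
candidates = blocks_to_scan(filters, "u5")
print(candidates)
```

Because the filter answers "definitely absent" for most blocks, an equality lookup can skip reading them; the occasional false positive costs only an unnecessary block read, never a missed row.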

Doris is compatible with big data components such as Apache Flink, Apache Hive, Apache Hudi, Apache Iceberg, Apache Spark, and Elasticsearch.[7] It has been used for data aggregation and join queries in a real-time data processing architecture[8] and for historical data queries in a log analytics platform.[9]

References

  1. ^ "Incubation proposal to the Apache Software Foundation by the Doris core developers". cwiki.apache.org. Retrieved 26 April 2024.
  2. ^ "Apache Doris just 'graduated': Why care about this SQL data warehouse". InfoWorld. 24 June 2022. Retrieved 26 April 2024.
  3. ^ "Graduated from Apache Incubator as a Top-Level Project". Apache Software Foundation. 16 June 2022. Retrieved 26 April 2024.
  4. ^ "Apache Doris Analytical Database Graduates from Apache Incubator". Datanami. 20 June 2022. Retrieved 26 April 2024.
  5. ^ Keiser, John; Lemire, Daniel (2024). "On-demand JSON: A better way to parse documents?". Software: Practice and Experience. 54 (6): 1074–1086. arXiv:2312.17149. doi:10.1002/spe.3313.
  6. ^ Ma, Qingzhi (2023). "SieveJoin: Boosting Multi-Way Joins with Reusable Bloom Filters". arXiv:2308.16370 [cs.DB].
  7. ^ "The Apache Software Foundation Announces Apache® Doris™ as a Top-Level Project". GlobeNewswire (Press release). 16 June 2022. Retrieved 26 April 2024.
  8. ^ "Krypton: Real-time Serving and Analytical SQL Engine at ByteDance" (PDF). vldb.org. Retrieved 26 April 2024.
  9. ^ Zhang, Jun; Zhang, Li (3 February 2023). "A web log real-time analysis platform based on stream computing". Third International Conference on Computer Vision and Data Mining (ICCVDM 2022). Proceedings Volume 12511. Hulun Buir, China: SPIE Digital Library. p. 125111W. doi:10.1117/12.2660112.