Apache Beam

Original author(s): Google
Developer(s): Apache Software Foundation
Initial release: June 15, 2016
Stable release: 2.30.0 (June 9, 2021)[1]
Repository: Beam Repository
Written in: Java, Python, Go
Operating system: Cross-platform
License: Apache License 2.0
Website: beam.apache.org

Apache Beam is an open-source, unified programming model for defining and executing data processing pipelines, including ETL, batch, and stream (continuous) processing.[2] Beam pipelines are defined using one of the provided SDKs and executed on one of Beam's supported runners (distributed processing back-ends), including Apache Flink, Apache Samza, Apache Spark, and Google Cloud Dataflow.[3]
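The separation between defining a pipeline and executing it on a runner can be illustrated with a minimal standard-library sketch. This is not Beam's API: the `Pipeline` class and the `run_batch` and `run_streaming` functions are hypothetical names invented for this illustration, and real runners are distributed systems rather than local loops. The sketch only shows how one pipeline definition can be handed to different execution back-ends.

```python
from typing import Callable, Iterable, Iterator, List


class Pipeline:
    """Hypothetical toy pipeline: records steps, does no work until a runner executes it."""

    def __init__(self) -> None:
        self.steps: List[Callable[[Iterable], Iterable]] = []

    def map(self, fn: Callable) -> "Pipeline":
        # Record a lazy element-wise transformation.
        self.steps.append(lambda items: (fn(x) for x in items))
        return self

    def filter(self, pred: Callable) -> "Pipeline":
        # Record a lazy filter that keeps elements for which pred is truthy.
        self.steps.append(lambda items: (x for x in items if pred(x)))
        return self


def run_batch(pipeline: Pipeline, source: Iterable) -> list:
    """Batch-style runner (illustrative): process a bounded input and return all results."""
    items: Iterable = iter(source)
    for step in pipeline.steps:
        items = step(items)
    return list(items)


def run_streaming(pipeline: Pipeline, source: Iterable) -> Iterator:
    """Streaming-style runner (illustrative): push each element through as it arrives."""
    for element in source:
        items: Iterable = iter([element])
        for step in pipeline.steps:
            items = step(items)
        yield from items


# One pipeline definition: trim whitespace, drop empty strings, measure length.
word_lengths = Pipeline().map(str.strip).filter(bool).map(len)

data = ["beam ", "", " flink", "spark"]
batch_result = run_batch(word_lengths, data)
stream_result = list(run_streaming(word_lengths, iter(data)))

print(batch_result)   # [4, 5, 5]
print(stream_result)  # [4, 5, 5]
```

Both toy runners produce the same result from the same definition, which is the property the unified model aims for. In Beam itself, the choice of runner is a configuration decision made when the pipeline is launched.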

History

Apache Beam[3] is one implementation of the model described in the Dataflow model paper.[4] The Dataflow model builds on earlier work on distributed processing abstractions at Google, in particular FlumeJava[5] and MillWheel.[6][7]

In 2014, Google released an open SDK implementation of the Dataflow model, together with an environment for executing Dataflow pipelines either locally (non-distributed) or on the Google Cloud Platform service.

Timeline

Version        Release date   Status
2.33.0         2021-10-07     Current stable version
2.32.0         2021-08-25     Old version, no longer maintained
2.31.0         2021-07-08     Old version, no longer maintained
2.30.0         2021-06-09     Old version, no longer maintained
2.29.0         2021-04-27     Old version, no longer maintained
2.28.0         2021-02-22     Old version, no longer maintained
2.27.0         2021-01-08     Old version, no longer maintained
2.26.0         2020-12-11     Old version, no longer maintained
2.25.0         2020-10-23     Old version, no longer maintained
2.24.0         2020-09-18     Old version, no longer maintained
2.23.0         2020-07-29     Old version, no longer maintained
2.22.0         2020-06-08     Old version, no longer maintained
2.21.0         2020-05-27     Old version, no longer maintained
2.20.0         2020-04-15     Old version, no longer maintained
2.19.0         2020-02-04     Old version, no longer maintained
2.18.0         2020-01-23     Old version, no longer maintained
2.17.0         2020-01-06     Old version, no longer maintained
2.16.0         2019-10-07     Old version, no longer maintained
2.15.0         2019-08-22     Old version, no longer maintained
2.14.0         2019-08-01     Old version, no longer maintained
2.13.0         2019-05-22     Old version, no longer maintained
2.12.0         2019-04-25     Old version, no longer maintained
2.11.0         2019-02-26     Old version, no longer maintained
2.10.0         2019-02-01     Old version, no longer maintained
2.9.0          2018-12-13     Old version, no longer maintained
2.8.0          2018-10-29     Old version, no longer maintained
2.7.0 (LTS)    2018-10-03     Old version, no longer maintained
2.6.0          2018-08-08     Old version, no longer maintained
2.5.0          2018-06-26     Old version, no longer maintained
2.4.0          2018-03-20     Old version, no longer maintained
2.3.0          2018-01-30     Old version, no longer maintained
2.2.0          2017-12-02     Old version, no longer maintained
2.1.0          2017-08-23     Old version, no longer maintained
2.0.0          2017-05-17     Old version, no longer maintained
0.6.0          2017-03-11     Old version, no longer maintained
0.5.0          2017-02-02     Old version, no longer maintained
0.4.0          2016-12-29     Old version, no longer maintained
0.3.0          2016-10-31     Old version, no longer maintained
0.2.0          2016-08-08     Old version, no longer maintained
0.1.0          2016-06-15     Old version, no longer maintained

References

  1. "Blogs". beam.apache.org. The Apache Software Foundation. Retrieved 2021-06-09.
  2. Woodie, Alex (22 April 2016). "Apache Beam's Ambitious Goal: Unify Big Data Development". Datanami. Retrieved 4 August 2016.
  3. "Cloud Dataflow - Batch & Stream Data Processing".
  4. Akidau, Tyler; Schmidt, Eric; Whittle, Sam; Bradshaw, Robert; Chambers, Craig; Chernyak, Slava; Fernández-Moctezuma, Rafael J.; Lax, Reuven; McVeety, Sam; Mills, Daniel; Perry, Frances (1 August 2015). "The dataflow model" (PDF). Proceedings of the VLDB Endowment. 8 (12): 1792–1803. doi:10.14778/2824032.2824076. Retrieved 4 August 2016.
  5. Chambers, Craig; Raniwala, Ashish; Perry, Frances; Adams, Stephen; Henry, Robert R.; Bradshaw, Robert; Weizenbaum, Nathan (1 January 2010). "FlumeJava: Easy, Efficient Data-parallel Pipelines" (PDF). Proceedings of the 31st ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM: 363–375. doi:10.1145/1806596.1806638. S2CID 14888571. Archived from the original (PDF) on 23 September 2016. Retrieved 4 August 2016.
  6. Akidau, Tyler; Whittle, Sam; Balikov, Alex; Bekiroğlu, Kaya; Chernyak, Slava; Haberman, Josh; Lax, Reuven; McVeety, Sam; Mills, Daniel; Nordstrom, Paul (27 August 2013). "MillWheel" (PDF). Proceedings of the VLDB Endowment. 6 (11): 1033–1044. doi:10.14778/2536222.2536229. Archived from the original (PDF) on 1 February 2016. Retrieved 4 August 2016.
  7. Pointer, Ian. "Apache Beam wants to be uber-API for big data". InfoWorld. Retrieved 4 August 2016.