User:Waury/sandbox

Stratosphere
Developer(s)	HPI, HU Berlin, TU Berlin
Stable release	0.4 / January 13, 2014
Preview release	0.5-SNAPSHOT
Written in	Java, Scala
Operating system	Cross-platform
Type	Distributed Computing
License	Apache License 2.0
Website	stratosphere.eu

Stratosphere is an open-source software framework for processing large data sets on clusters. It is developed by the Hasso Plattner Institute, HU Berlin, TU Berlin^[1] and individual contributors. It is licensed under the Apache License 2.0.

Architecture

Front ends

Stratosphere provides a Java API and a Scala API for writing programs.^[2] The internals of Stratosphere are implemented in Java.

Runtime

Operators

Similar to MapReduce and Apache Hadoop, the Stratosphere runtime provides second-order functions called operators (called PACTs, for PArallelization ContracTs, in publications) that process records with user-defined functions (UDFs).^[3]

Function name	Description
MAP	Receives one record stream and applies the UDF to each record.
REDUCE	Receives one record stream and applies the UDF to all records with the same key.
CO-GROUP	Receives two record streams and applies the UDF to all records with the same key.
CROSS	Receives two record streams and applies the UDF to each element of their Cartesian product.
JOIN	Receives two record streams and applies the UDF to each record of an equi-join on the keys.
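
The semantics in the table can be illustrated with a minimal plain-Java sketch. It models only when the UDF is called and with which records. It is not the Stratosphere API: the class `OperatorSketch`, the record type `Rec` and all method names are hypothetical, everything runs in memory on one machine, and the handling of keys that occur in only one CO-GROUP input (an empty group on the other side) is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.BiFunction;
import java.util.function.Function;

// Plain-Java model of the operator semantics; NOT the Stratosphere API.
final class OperatorSketch {
    // Hypothetical record type: a key plus one integer value.
    static final class Rec {
        final String key;
        final int value;
        Rec(String key, int value) { this.key = key; this.value = value; }
    }

    // MAP: the UDF is called once per record.
    static <R> List<R> map(List<Rec> in, Function<Rec, R> udf) {
        List<R> out = new ArrayList<>();
        for (Rec r : in) out.add(udf.apply(r));
        return out;
    }

    // Helper: group records by key, with keys in sorted order.
    static Map<String, List<Rec>> groupByKey(List<Rec> in) {
        Map<String, List<Rec>> groups = new TreeMap<>();
        for (Rec r : in) groups.computeIfAbsent(r.key, k -> new ArrayList<>()).add(r);
        return groups;
    }

    // REDUCE: the UDF is called once per key with all records of that key.
    static <R> List<R> reduce(List<Rec> in, Function<List<Rec>, R> udf) {
        List<R> out = new ArrayList<>();
        for (List<Rec> group : groupByKey(in).values()) out.add(udf.apply(group));
        return out;
    }

    // CROSS: the UDF is called once per pair of the Cartesian product.
    static <R> List<R> cross(List<Rec> left, List<Rec> right, BiFunction<Rec, Rec, R> udf) {
        List<R> out = new ArrayList<>();
        for (Rec l : left) for (Rec r : right) out.add(udf.apply(l, r));
        return out;
    }

    // JOIN: the UDF is called once per pair of records with equal keys.
    static <R> List<R> join(List<Rec> left, List<Rec> right, BiFunction<Rec, Rec, R> udf) {
        List<R> out = new ArrayList<>();
        for (Rec l : left) for (Rec r : right) if (l.key.equals(r.key)) out.add(udf.apply(l, r));
        return out;
    }

    // CO-GROUP: the UDF is called once per key with the groups of both inputs.
    // Assumption of this sketch: a key missing on one side yields an empty group there.
    static <R> List<R> coGroup(List<Rec> left, List<Rec> right,
                               BiFunction<List<Rec>, List<Rec>, R> udf) {
        Map<String, List<Rec>> l = groupByKey(left), r = groupByKey(right);
        TreeSet<String> keys = new TreeSet<>(l.keySet());
        keys.addAll(r.keySet());
        List<R> out = new ArrayList<>();
        for (String k : keys)
            out.add(udf.apply(l.getOrDefault(k, List.of()), r.getOrDefault(k, List.of())));
        return out;
    }

    public static void main(String[] args) {
        List<Rec> a = List.of(new Rec("x", 1), new Rec("x", 2), new Rec("y", 3));
        List<Rec> b = List.of(new Rec("x", 10), new Rec("z", 20));
        System.out.println(map(a, r -> r.value * 10));                          // [10, 20, 30]
        System.out.println(reduce(a, g -> g.size()));                           // [2, 1]
        System.out.println(cross(a, b, (l, r) -> l.value + r.value).size());    // 6
        System.out.println(join(a, b, (l, r) -> l.value + r.value));            // [11, 12]
        System.out.println(coGroup(a, b, (l, r) -> l.size() + ":" + r.size())); // [2:1, 1:0, 0:1]
    }
}
```

In each case the second-order function decides how records are grouped or paired, while the UDF passed as a lambda decides what is computed for each group or pair.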

The runtime contains different primitives to execute these operators. Currently, two join algorithms are implemented in Stratosphere: sort-merge join and hybrid hash join. The REDUCE operator uses an external sort.^[3]
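
To illustrate the first of the two join algorithms, the following is a minimal in-memory sketch of a sort-merge join on integer keys. It shows the general textbook technique, not Stratosphere's implementation: the class name `SortMergeJoinSketch` and the row layout `{key, value}` are hypothetical, and a real runtime would additionally have to handle inputs that do not fit in memory. The hybrid hash join is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Textbook sort-merge equi-join, in memory; NOT Stratosphere's implementation.
final class SortMergeJoinSketch {
    // Input rows are {key, value}; output rows are {key, leftValue, rightValue}.
    static List<int[]> join(List<int[]> left, List<int[]> right) {
        // Phase 1: sort copies of both inputs by key (List.sort is stable).
        List<int[]> l = new ArrayList<>(left), r = new ArrayList<>(right);
        Comparator<int[]> byKey = Comparator.comparingInt(row -> row[0]);
        l.sort(byKey);
        r.sort(byKey);

        // Phase 2: advance two cursors through the sorted inputs.
        List<int[]> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < l.size() && j < r.size()) {
            int lk = l.get(i)[0], rk = r.get(j)[0];
            if (lk < rk) {
                i++;                       // left key has no partner yet
            } else if (lk > rk) {
                j++;                       // right key has no partner yet
            } else {
                // Find the run of equal keys on both sides and pair them up.
                int iEnd = i, jEnd = j;
                while (iEnd < l.size() && l.get(iEnd)[0] == lk) iEnd++;
                while (jEnd < r.size() && r.get(jEnd)[0] == lk) jEnd++;
                for (int x = i; x < iEnd; x++)
                    for (int y = j; y < jEnd; y++)
                        out.add(new int[] {lk, l.get(x)[1], r.get(y)[1]});
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> left = List.of(new int[] {1, 10}, new int[] {2, 20}, new int[] {2, 21}, new int[] {4, 40});
        List<int[]> right = List.of(new int[] {2, 200}, new int[] {3, 300}, new int[] {2, 201}, new int[] {4, 400});
        for (int[] row : join(left, right)) System.out.println(Arrays.toString(row));
    }
}
```

Because both inputs are sorted first, each input is scanned only once during the merge phase, apart from re-reading runs of duplicate keys.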

Iterations

In addition to the five second-order functions, Stratosphere also provides two types of iterations: bulk and incremental.^[4]^[5]
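
The difference between the two iteration types can be sketched with connected components computed by propagating the smallest vertex ID along edges. This is a single-machine illustration under assumptions, not Stratosphere code: the class `IterationSketch`, its method names and the variable names `workset` and `solution` are chosen for this sketch. The bulk variant recomputes the label of every vertex in every step, while the incremental variant only processes vertices whose label changed in the previous step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative model of bulk vs. incremental iteration; NOT the Stratosphere API.
final class IterationSketch {
    // Bulk: every step recomputes the whole partial solution from the previous one.
    static int[] bulk(int n, int[][] edges, int maxSteps) {
        int[] labels = new int[n];
        for (int v = 0; v < n; v++) labels[v] = v;
        for (int step = 0; step < maxSteps; step++) {
            int[] next = labels.clone();
            for (int[] e : edges) {
                next[e[0]] = Math.min(next[e[0]], labels[e[1]]);
                next[e[1]] = Math.min(next[e[1]], labels[e[0]]);
            }
            if (Arrays.equals(next, labels)) break; // nothing changed: converged
            labels = next;
        }
        return labels;
    }

    // Incremental: only the vertices in the workset are processed in a step;
    // the solution is updated in place and the changed vertices form the next workset.
    static int[] incremental(int n, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v < n; v++) adj.add(new ArrayList<>());
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }
        int[] solution = new int[n];
        Set<Integer> workset = new TreeSet<>();
        for (int v = 0; v < n; v++) {
            solution[v] = v;
            workset.add(v);
        }
        while (!workset.isEmpty()) {
            Set<Integer> next = new TreeSet<>();
            for (int v : workset)
                for (int u : adj.get(v))
                    if (solution[v] < solution[u]) {
                        solution[u] = solution[v]; // neighbour adopts the smaller label
                        next.add(u);               // and must be revisited
                    }
            workset = next;
        }
        return solution;
    }

    public static void main(String[] args) {
        int[][] edges = {{0, 1}, {1, 2}, {3, 4}};
        System.out.println(Arrays.toString(bulk(6, edges, 10)));    // [0, 0, 0, 3, 3, 5]
        System.out.println(Arrays.toString(incremental(6, edges))); // [0, 0, 0, 3, 3, 5]
    }
}
```

Both variants reach the same result; the incremental one does less work per step once most labels have stopped changing.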

Stratosphere Compiler

In contrast to MapReduce-style frameworks, the pipeline is not fixed to a Map step followed by an optional Reduce step. A Stratosphere job can be an arbitrary directed acyclic graph (DAG) composed of multiple operators, bulk iterations, incremental iterations, data sources and data sinks. This enables optimizations such as choosing a join strategy at runtime based on metadata, available system resources and compiler hints provided by the user.

The result of optimizing a Stratosphere program is a job graph that can be executed on the cluster.
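
As a rough illustration of what "an arbitrary DAG" means for a job, the sketch below models a job as named nodes connected by edges and derives one valid execution order with Kahn's topological sort. The class `JobGraphSketch`, its methods and the node names are hypothetical; the sketch does not model iterations, parallelism or the optimizer, and it does not reflect Stratosphere's actual job graph classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a job as a DAG; NOT Stratosphere's job graph classes.
final class JobGraphSketch {
    private final Map<String, List<String>> successors = new LinkedHashMap<>();
    private final Map<String, Integer> inDegree = new LinkedHashMap<>();

    JobGraphSketch addNode(String name) {
        successors.putIfAbsent(name, new ArrayList<>());
        inDegree.putIfAbsent(name, 0);
        return this;
    }

    // Adds an edge: the output of `from` is an input of `to`.
    JobGraphSketch connect(String from, String to) {
        addNode(from);
        addNode(to);
        successors.get(from).add(to);
        inDegree.merge(to, 1, Integer::sum);
        return this;
    }

    // Kahn's algorithm: a node becomes ready once all of its inputs are scheduled.
    List<String> executionOrder() {
        Map<String, Integer> remaining = new LinkedHashMap<>(inDegree);
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : remaining.entrySet())
            if (e.getValue() == 0) ready.add(e.getKey()); // data sources
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.poll();
            order.add(node);
            for (String next : successors.get(node))
                if (remaining.merge(next, -1, Integer::sum) == 0) ready.add(next);
        }
        if (order.size() != successors.size())
            throw new IllegalStateException("job graph contains a cycle");
        return order;
    }

    public static void main(String[] args) {
        JobGraphSketch job = new JobGraphSketch()
            .connect("orders", "filterOrders")         // source -> MAP
            .connect("customers", "join")              // source -> JOIN
            .connect("filterOrders", "join")           // MAP -> JOIN
            .connect("join", "sumPerCustomer")         // JOIN -> REDUCE
            .connect("sumPerCustomer", "sink");        // REDUCE -> sink
        System.out.println(job.executionOrder());
        // [orders, customers, filterOrders, join, sumPerCustomer, sink]
    }
}
```

The example job has two data sources feeding a JOIN, followed by a REDUCE, a shape that a fixed Map-then-Reduce pipeline cannot express as a single job.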

Nephele execution engine

Nephele is the low-level parallel execution engine of Stratosphere. It handles cluster resource allocation as well as in-memory and network communication.^[6]

File systems

Stratosphere supports HDFS, HBase, Avro, Amazon S3, JDBC and local file systems as data sources and data sinks for Stratosphere jobs.

References

^ "Stratosphere.eu Website". Retrieved 27 November 2013.
^ "Stratosphere Intro (Java and Scala Interface)".
^ Battré, Dominic; Ewen, Stephan; Hueske, Fabian; Kao, Odej; Markl, Volker; Warneke, Daniel (2010). "Nephele/PACTs". Proceedings of the 1st ACM symposium on Cloud computing. pp. 119–130. doi:10.1145/1807128.1807148. ISBN 9781450300360.
^ Ewen, Stephan; Schelter, Sebastian; Tzoumas, Kostas; Warneke, Daniel; Markl, Volker (2013). "Iterative parallel data processing with stratosphere: an inside look". pp. 1053–1056. doi:10.1145/2463676.2463693.
^ Ewen, Stephan; Tzoumas, Kostas; Kaufmann, Moritz; Markl, Volker (July 2012). "Spinning fast iterative data flows". Proceedings of the VLDB Endowment. 5 (11): 1268–1279. doi:10.14778/2350229.2350245. Retrieved 3 December 2013.
^ Warneke, Daniel; Kao, Odej (2009). "Nephele". Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers. pp. 1–10. doi:10.1145/1646468.1646476. ISBN 9781605587141.

