= H2O (software) =

H_{2}O
- Author: Sri Satish Ambati, Cliff Click
- Developer: H2O.ai
- Released: 2011
- Latest Release Version: 3.46.0.2
- Latest Release Date: 13 May 2024
- Operating System: Unix, Mac OS, Microsoft Windows
- Programming Language: Java, Python, R
- Genre: Statistics software
- License: Apache License 2.0

H_{2}O is an open-source, in-memory, distributed machine learning and predictive analytics platform developed by the company H2O.ai (previously 0xdata). The software uses a distributed architecture for parallel processing on standard hardware. It supports algorithms for large-scale data analysis and model deployment.

H_{2}O is primarily used by data scientists and developers for statistical modeling and data-driven decision-making. The platform is designed to handle in-memory computations across a distributed computing environment. It offers implementations for numerous statistical and machine learning algorithms, which are accessible through various programming interfaces.

The software is released under the Apache License 2.0.

==Functionality and features==
H_{2}O provides a suite of supervised and unsupervised machine learning algorithms. Its core functions include:
- Supervised learning: algorithms in the field of statistics, data mining and machine learning such as generalized linear models, random forests, gradient boosting and deep learning are implemented for classification and regression tasks.
- Unsupervised learning: including K-Means clustering and principal component analysis.
- Automated machine learning: a feature designed to automate the processes of model selection, tuning, and ensemble creation.
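As an illustration of the automated machine learning workflow, the following minimal sketch uses the `h2o` Python package. It assumes the package and a Java runtime are installed; the helper names `predictor_columns` and `train_automl`, the CSV path, and the target column are hypothetical and chosen only for this example.

```python
def predictor_columns(columns, target):
    """Return every column name except the target (hypothetical helper)."""
    return [name for name in columns if name != target]


def train_automl(csv_path, target, max_models=10, seed=1):
    """Train an AutoML run on a CSV file and return the best model.

    Assumes the `h2o` package is installed; it is imported lazily so the
    rest of this sketch can be loaded without it.
    """
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()  # start or connect to a local H2O cluster
    frame = h2o.import_file(csv_path)  # load the data into the cluster's memory

    # For classification, the target column would typically be converted
    # to a categorical first, e.g. frame[target] = frame[target].asfactor()
    automl = H2OAutoML(max_models=max_models, seed=seed)
    automl.train(
        x=predictor_columns(frame.columns, target),
        y=target,
        training_frame=frame,
    )
    return automl.leader  # top-ranked model on the leaderboard
```

A call such as `train_automl("data.csv", "label")` would train up to ten candidate models and return the leader; both argument values are placeholders.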

The software can ingest data from various sources, including the Hadoop Distributed File System, Amazon S3, SQL databases, and local file systems. It runs on Apache Spark clusters through the Sparkling Water integration. Proponents claim that it performs better than other analysis tools. The software is distributed free of charge; the business model is based on developing individual applications and providing support.

==Architecture==
H_{2}O is primarily written in Java. It uses a distributed architecture that allows the platform to cluster nodes for parallel processing and in-memory storage of data and models.
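The general idea of splitting data across workers and combining partial results can be sketched conceptually as follows. This is an analogy using Python threads on a single machine, not H_{2}O's actual implementation; all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, n_chunks):
    """Partition a list into at most n_chunks contiguous pieces."""
    if not values:
        raise ValueError("values must not be empty")
    size = -(-len(values) // n_chunks)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def partial_stats(chunk):
    """'Map' step: each worker summarizes only its own chunk."""
    return sum(chunk), len(chunk)


def distributed_mean(values, n_workers=4):
    """'Reduce' step: combine per-chunk summaries into a global mean."""
    chunks = split_into_chunks(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_stats, chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count
```

Each worker touches only its own chunk, and only small summaries (a sum and a count) are exchanged, which is what allows this pattern to scale across nodes in a cluster.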

Users interact with the H_{2}O platform through several primary interfaces:
- Programming language interfaces: APIs are provided for the R and Python programming languages, and integrations exist for several Apache projects (Apache Hadoop and Spark, as well as Maven).
- H_{2}O Flow: a graphical web-based interactive computational environment that functions as a notebook interface for data exploration, model building, and scripting.
- REST API: allows integration with other applications and frameworks such as Microsoft Excel or RStudio. KNIME offers algorithmic workflows through the H_{2}O Machine Learning Integration Nodes.
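A client for the REST interface could be sketched as below, using only the Python standard library. The `/3/Cloud` status endpoint and the default port 54321 are assumptions about a typical local installation and should be checked against the REST API documentation for the version in use; the function names are hypothetical.

```python
import json
from urllib.parse import urlunsplit
from urllib.request import urlopen


def cloud_status_url(host="localhost", port=54321):
    """Build the URL of the cluster status endpoint (assumed to be /3/Cloud)."""
    return urlunsplit(("http", f"{host}:{port}", "/3/Cloud", "", ""))


def fetch_cloud_status(host="localhost", port=54321, timeout=5.0):
    """Request the cluster status and decode the JSON response.

    Requires a running H2O instance at the given host and port.
    """
    with urlopen(cloud_status_url(host, port), timeout=timeout) as response:
        return json.load(response)
```

Because the interface is plain HTTP with JSON payloads, any tool that can issue web requests can integrate with a running cluster in the same way.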

While an algorithm executes, approximate results are displayed so that users can track progress and intervene if needed.

==History, influences, and extensions==
The software project was initiated by the company 0xdata, which later changed its name to H2O.ai. Three Stanford professors, Stephen P. Boyd, Robert Tibshirani and Trevor Hastie, form a panel that advises H_{2}O on scientific issues. Since its inception, H_{2}O has provided open-source machine learning libraries for enterprise use. The core H_{2}O platform is often complemented by commercial offerings from H2O.ai, such as H_{2}O Driverless AI.

==Reception==
H_{2}O is referenced in peer-reviewed literature on automated machine learning (AutoML). The platform has been categorized as a "Leader" and a "Strong Performer" in industry reports by Forrester Research. H_{2}O (the open-source platform) and the associated commercial platform Driverless AI have repeatedly won InfoWorld awards, including both the Best of Open Source Software ("Bossies") and the Technology of the Year awards.
