H2O (analytics tool)
H2O is an open-source tool for big data analysis. It was originally developed through an academic-private collaboration between faculty at Stanford and 0xdata, a Silicon Valley firm specializing in machine learning. Academic contributors include Stephen Boyd, Robert Tibshirani, and Trevor Hastie (all of Stanford) and Jan Jorgensen (of Purdue).
The stated objective of the project is to provide an analytical interface for cloud computing, giving users statistical tools that were previously available only to large organizations with more resources.
The basic theory underpinning the H2O project starts from the observation that big data are data too large to be manipulated or analyzed with traditional tools. To make use of the greater statistical power of large data sets, analytical algorithms such as generalized linear models (GLM) and K-means clustering are implemented in specialized software programs. These programs let users work with the full data set by giving them access to more computing power, rather than by truncating the data. Efficiency, a critical component, is often achieved by dividing the data into subsets and analyzing each subset in parallel with the same algorithm. The results of these independent computations are then combined and adjusted iteratively until they converge on estimates of the statistical parameters of interest.
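The split, process in parallel, combine, and iterate pattern described above can be illustrated with K-means clustering. The sketch below is a minimal, hypothetical example and not H2O's own API or implementation: the function names (`partial_stats`, `split`, `distributed_kmeans`) are invented for illustration, and a thread pool stands in for the separate machines of a real cluster. Each worker computes only per-cluster coordinate sums and counts for its own chunk; these partial results are merged, the centroids are updated, and the loop repeats until the centroids stop moving.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: (point[0] - centroids[i][0]) ** 2 + (point[1] - centroids[i][1]) ** 2,
    )


def partial_stats(chunk, centroids):
    """Work done independently on one subset: per-cluster coordinate sums and counts."""
    k = len(centroids)
    sums = [[0.0, 0.0] for _ in range(k)]
    counts = [0] * k
    for p in chunk:
        i = nearest(p, centroids)
        sums[i][0] += p[0]
        sums[i][1] += p[1]
        counts[i] += 1
    return sums, counts


def split(data, n_chunks):
    """Divide the data set into roughly equal contiguous subsets."""
    size = math.ceil(len(data) / n_chunks)
    return [data[i:i + size] for i in range(0, len(data), size)]


def distributed_kmeans(data, centroids, n_chunks=4, tol=1e-9, max_iter=100):
    """Hypothetical sketch: K-means where each iteration fans out over subsets."""
    chunks = split(data, n_chunks)
    centroids = [tuple(c) for c in centroids]
    k = len(centroids)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        for _ in range(max_iter):
            # Every subset is processed with the same algorithm, in parallel.
            results = list(pool.map(partial_stats, chunks, [centroids] * len(chunks)))
            # Combine the independent partial results into global totals.
            total_sums = [[0.0, 0.0] for _ in range(k)]
            total_counts = [0] * k
            for sums, counts in results:
                for i in range(k):
                    total_sums[i][0] += sums[i][0]
                    total_sums[i][1] += sums[i][1]
                    total_counts[i] += counts[i]
            # Adjust the estimates; an empty cluster keeps its old centroid.
            new = [
                (total_sums[i][0] / total_counts[i], total_sums[i][1] / total_counts[i])
                if total_counts[i] else centroids[i]
                for i in range(k)
            ]
            shift = max(math.dist(a, b) for a, b in zip(centroids, new))
            centroids = new
            if shift <= tol:  # converged: centroids no longer move
                break
    return centroids


data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(distributed_kmeans(data, [(0.0, 0.0), (10.0, 10.0)]))  # prints [(0.5, 0.5), (10.5, 10.5)]
```

Because each worker returns only sums and counts, the merged totals are the same however the data are divided, so adding more subsets changes the speed of each iteration but not the estimates it produces.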