H2O is open-source software for big-data analysis, produced by the company H2O.ai. H2O allows users to fit thousands of potential models as part of discovering patterns in data. The software can be called from the statistical package R, from Python, and from other environments. It is used for exploring and analyzing datasets held in cloud computing systems and in the Apache Hadoop Distributed File System, as well as on the conventional operating systems Linux, macOS, and Microsoft Windows. The H2O software is written in Java, Python, and R. Its graphical user interface is compatible with four browsers: Chrome, Safari, Firefox, and Internet Explorer.
1. H2O
The H2O project aims to develop an analytical interface for cloud computing, providing users with tools for data analysis.[1] The software is open-source and freely distributed. The company receives fees for providing customer service and customized extensions.
1.1. Mining of big data
- See also: Data mining and Machine learning
Big datasets can be too large to analyze with traditional software such as R, which by default holds data in the memory of a single machine. The H2O software provides data structures and methods suitable for big data. H2O allows users to analyze and visualize whole datasets without resorting to the Procrustean strategy of studying only a small subset with a conventional statistical package.[2] H2O's statistical algorithms include k-means clustering, generalized linear models, distributed random forests, gradient boosting machines, naive Bayes, principal component analysis, and generalized low rank models.[3]
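To illustrate how an algorithm can use every row of a large dataset instead of a sample, the following minimal Python sketch runs k-means clustering (one of the algorithms listed above) over data split into chunks. It assumes in-memory lists as stand-ins for the chunks of a distributed data frame and threads as stand-ins for cluster nodes; `kmeans_chunked` and its helpers are hypothetical names, and this is not H2O's implementation or API. Each chunk computes per-cluster coordinate sums and counts (a map step), and these partial results are added together (a reduce step) before the centroids are updated.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import repeat
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _nearest(p: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])),
    )


def _chunk_stats(
    chunk: Sequence[Point], centroids: Sequence[Point]
) -> Tuple[List[List[float]], List[int]]:
    """Map step (hypothetical helper): per-cluster sums and row counts for one chunk."""
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for p in chunk:
        j = _nearest(p, centroids)
        counts[j] += 1
        for i, x in enumerate(p):
            sums[j][i] += x
    return sums, counts


def kmeans_chunked(
    chunks: Sequence[Sequence[Point]],
    centroids: Sequence[Point],
    max_iter: int = 20,
    workers: int = 4,
) -> List[Point]:
    """Lloyd's k-means in which every iteration scans all chunks in parallel."""
    cents = [tuple(map(float, c)) for c in centroids]
    k, d = len(cents), len(cents[0])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_iter):
            # Map: each chunk is summarized independently (threads stand in for nodes).
            partials = list(pool.map(_chunk_stats, chunks, repeat(cents)))
            # Reduce: add up the partial sums and counts from all chunks.
            sums = [[0.0] * d for _ in range(k)]
            counts = [0] * k
            for s, n in partials:
                for j in range(k):
                    counts[j] += n[j]
                    for i in range(d):
                        sums[j][i] += s[j][i]
            # New centroid = mean of its rows; an empty cluster keeps its old centroid.
            new = [
                tuple(v / counts[j] for v in sums[j]) if counts[j] else cents[j]
                for j in range(k)
            ]
            if new == cents:  # assignments are stable: converged
                break
            cents = new
    return cents


if __name__ == "__main__":
    chunks = [
        [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)],
        [(0.0, 1.0), (10.0, 11.0), (11.0, 10.0)],
    ]
    # Roughly (0.33, 0.33) and (10.33, 10.33).
    print(kmeans_chunked(chunks, [(0.0, 0.0), (10.0, 10.0)]))
```

Because sums and counts can be added in any grouping, the centroids come out the same however the rows are divided into chunks; this property is what lets a data-parallel design use all of the data rather than a sample.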
H2O can also run on Apache Spark, through its Sparkling Water integration.[4]
Iterative methods for real-time problems
H2O uses iterative methods that provide quick answers using all of the client's data. When a client cannot wait for an optimal solution, the client can interrupt the computations and use an approximate solution.[1] In its approach to deep learning,[2][3][5] H2O divides all the data into subsets and then analyzes each subset simultaneously using the same method. The results of these parallel processes are combined to estimate parameters using the Hogwild scheme,[6] a lock-free parallel stochastic gradient method.[7] These methods allow H2O to provide answers that use all of the client's data, rather than discarding most of it and analyzing a subset with conventional software.
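The combination described above (splitting the data into subsets, processing them simultaneously, merging updates in the Hogwild style, and allowing an early stop) can be sketched in Python. This is a minimal sketch under stated assumptions, not H2O's deep-learning code: the model is a two-parameter linear regression rather than a neural network, the function name `hogwild_sgd` and its parameters are hypothetical, and threads stand in for compute nodes. Each thread runs stochastic gradient descent on its own shard and writes to a shared weight list without locks; an optional `threading.Event` lets a caller interrupt the run and keep whatever approximate weights exist at that moment.

```python
import random
import threading
from typing import List, Optional, Sequence, Tuple

Row = Tuple[float, float]  # (feature x, target y)


def hogwild_sgd(
    shards: Sequence[Sequence[Row]],
    lr: float = 0.1,
    epochs: int = 100,
    stop: Optional[threading.Event] = None,
) -> List[float]:
    """Fit y ~ w[0] + w[1] * x by Hogwild-style parallel SGD (hypothetical helper).

    One thread per shard updates a shared weight list without any lock.
    If `stop` is set, each worker quits at its next row, and the weights
    as they stand at that moment are returned as an approximate answer.
    """
    w = [0.0, 0.0]  # shared parameters, deliberately unsynchronized
    if stop is None:
        stop = threading.Event()

    def worker(shard: Sequence[Row]) -> None:
        for _ in range(epochs):
            for x, y in shard:
                if stop.is_set():
                    return  # interrupted: keep whatever w holds now
                err = w[0] + w[1] * x - y  # residual; gradient is err * (1, x)
                # No lock: another thread may read or overwrite w between
                # these two statements, a race Hogwild deliberately tolerates.
                w[0] -= lr * err
                w[1] -= lr * err * x

    threads = [threading.Thread(target=worker, args=(s,)) for s in shards]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return list(w)


if __name__ == "__main__":
    rng = random.Random(0)
    # Noise-free data from y = 1 + 2x, split into four subsets.
    rows = [(x, 1.0 + 2.0 * x) for x in (rng.random() for _ in range(400))]
    shards = [rows[i::4] for i in range(4)]
    print(hogwild_sgd(shards))  # approximately [1.0, 2.0]
```

A caller that cannot wait could, for example, start `threading.Timer(0.5, stop.set)` before calling `hogwild_sgd` and accept whatever weights are returned. Two caveats apply. The Hogwild analysis assumes sparse updates, so that concurrent writes rarely collide; this toy model is dense, so collisions are frequent and are simply tolerated. And under CPython's global interpreter lock the threads interleave rather than run truly in parallel, so the sketch shows the lock-free update pattern, not a speedup.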
1.2. Software
Programming languages
The H2O software has an interface to the following programming languages: Java (6 or later), Python (2.7.x, 3.5.x), R (3.0.0 or later) and Scala (1.4-1.6).[2][8]
Operating systems
The H2O software can be run on conventional operating systems: Microsoft Windows (7 or later), Mac OS X (10.9 or later), and Linux (Ubuntu 12.04; RHEL/CentOS 6 or later).[8] It also runs on big-data systems, particularly the Apache Hadoop Distributed File System (HDFS) in several popular distributions: Cloudera (5.1 or later), MapR (3.0 or later), and Hortonworks (HDP 2.1 or later). It also operates in cloud computing environments, for example Amazon EC2, Google Compute Engine, and Microsoft Azure. The H2O Sparkling Water software is Databricks-certified on Apache Spark.[8]
Graphical user interface and browsers
As of June 2015, its graphical user interface was compatible with four browsers, in their latest versions unless otherwise specified: Chrome, Safari, Firefox, and Internet Explorer (IE10).[8]
Notes
- (Harris 2012)
- (Novet 2014)
- "Recommended systems for H2O". H2O.ai. May 2015. http://0xdata.com/product/recommended-systems-for-h2o/.
- (Hardy 2014)
- https://github.com/h2oai/h2o-2/blob/master/LICENSE.txt
- Aiello, Spencer; Kraljevic, Tom; Maj, Petr (2015). h2o: R Interface for H2O. Contributed Packages, with contributions from the 0xdata team. The R Project for Statistical Computing. https://cran.r-project.org/web/packages/h2o/index.html
- "FAQ — H2O 3.10.2.1 documentation". http://docs.h2o.ai/h2o/latest-stable/h2o-docs/faq.html#sparkling-water.
- Tripathi, Rashmi; Kumari, Vandana; Patel, Sunil; Singh, Yashbir; Varadwaj, Pritish (2015). "Prediction of lncRNA using Deep Learning Approach". International Conference on Advances in Biotechnology (BioTech). Proceedings: 138–142. Singapore: Global Science and Technology Forum.
- Description of the iterative method for computing maximum-likelihood estimates for a generalized linear model.
- Recht, Benjamin; Ré, Christopher; Wright, Stephen; Niu, Feng (2011). "Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent". In Shawe-Taylor, J. (ed.). Advances in Neural Information Processing Systems 24: 693–701. Curran Associates, Inc. http://papers.nips.cc/paper/4390-hogwild-a-lock-free-approach-to-parallelizing-stochastic-gradient-descent.pdf
References
- Gage, Deborah (15 April 2013). "Platfora founder goes in search of big-data answers". https://blogs.wsj.com/venturecapital/2013/04/15/platfora-founder-goes-in-search-of-big-data-answers/
- Hackett, Robert (3 August 2014). "Arno Candel, physicist and hacker, 0xdata". Meet Fortune's 2014 Big Data All-Stars. http://fortune.com/2014/08/03/meet-fortunes-2014-big-data-all-stars/
- Hardy, Quentin (3 May 2014). "Valuable humans in our digital future". New York Times. http://bits.blogs.nytimes.com/2014/05/03/valuable-humans-in-our-digital-future/?. Retrieved 1 June 2015.
- Harris, Derrick (14 August 2012). "How 0xdata wants to help everyone become data scientists". Gigaom Research. https://gigaom.com/2012/08/14/how-0xdata-wants-to-help-everyone-become-data-scientists/. Retrieved 1 June 2015.
- Novet, Jordan (7 November 2014). "0xdata takes $8.9M and becomes H2O to match its open-source machine-learning project". VentureBeat. https://venturebeat.com/2014/11/07/h2o-funding/. Retrieved 1 June 2015.
External links
- H2O software at H2O.ai (formerly 0xdata)
- Github repository of H2O software
- YouTube channel of H2O.ai