= Pentaho =

Pentaho
- Logo: Pentho logo -1.jpg
- Author: Pentaho Corporation
- Developer: Hitachi Vantara
- Latest Release Version: 10.2.0.0-xxx
- Operating System: Windows, Windows Server, Linux, Mac OS X
- Platform: x86-64
- Genre: Data Management, Data Analytics, Data Governance, Data Quality, Business Intelligence
- License: Pentaho Data Integration Enterprise Edition (EE): Hitachi Commercial License; Pentaho Business Analytics: Hitachi Commercial License; Pentaho Data Catalog: Hitachi Commercial License; Pentaho Data Optimizer: Hitachi Commercial License

Pentaho is the brand name for several data management software products that make up the Pentaho+ Data Platform. These include Pentaho Data Integration, Pentaho Business Analytics, Pentaho Data Catalog, and Pentaho Data Optimizer.

== Overview ==
Pentaho is owned by Hitachi Vantara and operates as a separate business unit. Pentaho started out as business intelligence (BI) software developed by the Pentaho Corporation in 2004. The original software comprises Pentaho Data Integration (PDI) and Pentaho Business Analytics (PBA). These provide data integration, OLAP services, reporting, information dashboards, data mining and extract, transform, load (ETL) capabilities.

Pentaho was acquired by Hitachi Data Systems in 2015 and in 2017 became part of Hitachi Vantara. In November 2023, Hitachi Vantara launched the Pentaho+ Platform, comprising the original Pentaho Data Integration and Pentaho Business Analytics software and two new products, Pentaho Data Catalog and Pentaho Data Optimizer. Hitachi Vantara intends to extend the Pentaho+ platform with tools for Data Quality and Data Mastering.

== Products ==

=== Pentaho Data Optimizer (PDO) ===
Pentaho Data Optimizer allows organizations to manage, maintain and tier their data based on its business value, the cost of managing it, and regulatory requirements. It uses the auto-discovery features of the Pentaho Data Catalog to achieve this.

=== Pentaho Data Catalog (PDC) ===
In March 2020 and June 2021 Hitachi Vantara acquired Waterline Data and Io-Tahoe respectively, and amalgamated both into its Pentaho Data Catalog (PDC). PDC automatically finds, analyzes, and tags structured and unstructured data and contextualizes it with business glossary terms and governance policies.

=== Pentaho Data Integration (PDI) & Pentaho Business Analytics (PBA) ===
Pentaho Data Integration (PDI) and Pentaho Business Analytics (PBA) use a Java framework to create business intelligence solutions. Although best known for its Business Analysis Server (formerly known as the Business Intelligence Server), the PDI/PBA software is essentially a set of Java classes with specific functionality. Any business intelligence solution can be built on top of those Java classes.

The only exception to this model is the ETL tool Pentaho Data Integration (PDI), formerly known as Kettle. PDI is a set of software used to design data flows that can be run either in a server or as standalone processes. PDI encompasses Kitchen, a job and transformation runner, and Spoon, a graphical user interface for designing such jobs and transformations.
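
As an illustration of running a job outside the graphical interface, the following Python sketch assembles a Kitchen command line. It assumes the launcher is available as `./kitchen.sh` and that Kitchen accepts the `-file`, `-level` and `-param:NAME=value` options; the job path and the `RUN_DATE` parameter are hypothetical. The sketch only builds the argument list and does not start a process.

```python
from typing import Dict, List, Optional


def build_kitchen_command(
    job_file: str,
    params: Optional[Dict[str, str]] = None,
    log_level: str = "Basic",
    kitchen_path: str = "./kitchen.sh",  # assumed location of the Kitchen launcher
) -> List[str]:
    """Return an argument list that would run a PDI job file (.kjb) with Kitchen."""
    command = [kitchen_path, f"-file={job_file}", f"-level={log_level}"]
    # Named parameters are appended as -param:NAME=value, sorted for stable output.
    for name, value in sorted((params or {}).items()):
        command.append(f"-param:{name}={value}")
    return command


# Hypothetical job file and parameter, used only for illustration.
cmd = build_kitchen_command("/opt/etl/load_sales.kjb", {"RUN_DATE": "2024-01-31"})
print(" ".join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would launch the job on a machine where PDI is installed.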

Features such as reporting and OLAP are achieved by integrating sub-projects into the PDI/PBA framework, such as the Mondrian OLAP engine and JFreeReport. These projects have been maintained under Pentaho's stewardship for some time. Some of the sub-projects also have standalone clients, such as Pentaho Report Designer, a front-end for JFreeReport, and Pentaho Schema Workbench, a GUI for writing the XML schema files that Mondrian uses to serve OLAP cubes.
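
Because the cube definitions used by Mondrian are plain XML, they can be inspected with ordinary XML tooling. The sketch below parses a deliberately tiny, hypothetical schema (the `SalesDemo` schema and its table and column names are invented for illustration) and lists the levels and measures of each cube. Real schemas carry many more elements and attributes than shown here.

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical schema used only for illustration.
SCHEMA_XML = """
<Schema name="SalesDemo">
  <Cube name="Sales">
    <Table name="sales_fact"/>
    <Dimension name="Time" foreignKey="time_id">
      <Hierarchy hasAll="true" primaryKey="time_id">
        <Table name="time_dim"/>
        <Level name="Year" column="year"/>
        <Level name="Quarter" column="quarter"/>
      </Hierarchy>
    </Dimension>
    <Measure name="Revenue" column="amount" aggregator="sum"/>
  </Cube>
</Schema>
"""


def summarize_schema(xml_text):
    """Return {cube name: {"levels": [...], "measures": [...]}} for a schema string."""
    root = ET.fromstring(xml_text.strip())
    summary = {}
    for cube in root.iter("Cube"):
        # Qualify each level with the name of the dimension that contains it.
        levels = [
            f'{dim.get("name")}.{level.get("name")}'
            for dim in cube.iter("Dimension")
            for level in dim.iter("Level")
        ]
        measures = [measure.get("name") for measure in cube.iter("Measure")]
        summary[cube.get("name")] = {"levels": levels, "measures": measures}
    return summary


summary = summarize_schema(SCHEMA_XML)
print(summary)
```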

Pentaho offers enterprise and community editions of the PDI software. The enterprise software is obtained through an annual subscription and contains extra features and support not found in the community edition. The core PDI and PBA offering is frequently enhanced by add-on products, usually in the form of plug-ins, from the company and the broader community of users.

==== Server applications ====
The following server applications are offered in Pentaho Enterprise Edition (EE) and Pentaho Community Edition (CE).
| Product | Offering | Type | Recent version (EE) | Recent version (CE) | Description |
| Pentaho BA Platform | EE, CE | Server application | 7.1 | 7.1 | Commonly referred to as the BI Platform and since renamed the Business Analytics Platform (BA Platform), this is the core software that hosts content, whether created in the server itself through plug-ins or published to the server as files from the desktop applications. It includes features for managing security, running reports, displaying dashboards, report bursting, scripted business rules, OLAP analysis and scheduling out of the box. Commercial plug-ins from Pentaho expand the out-of-the-box features. A few open-source plug-in projects also expand the capabilities of the server. The Pentaho BA Platform runs in the Apache Tomcat Java application server. It can be embedded into other Java application servers. |
| Pentaho Analysis Services (Mondrian) | EE, CE | Server application | 3.7.0 | 3.6.1 | Pentaho Analysis Services, codenamed Mondrian, is an open-source OLAP (online analytical processing) server, written in Java. |
| Pentaho Dashboard Designer (PDD) | EE | Server plug-in | 5.0.6 | - | A commercial plug-in provided to enterprise edition (EE) subscribers. It allows users to create dashboards, which are collections of other content components displayed together with the goal of providing a centralized view of key performance indicators (KPIs) and other business data movements, letting users monitor them and make decisions. Content components are usually individual information graphics, tables, OLAP views or reports. The plug-in simplifies dashboard creation through the use of layout templates, drag-and-drop interaction and a GUI for providing parameters and inputs to dashboard components. |
| Pentaho Analysis (Analyzer) (PAZ) | EE | Server plug-in | 5.0.6 | - | The Pentaho Analyzer plug-in provides a web-based, drag-and-drop OLAP viewer. It allows a user to visually create MDX queries by dragging parts of a previously defined Mondrian OLAP schema onto a canvas, where other activities can take place like filtering, sorting, creating calculated members from other measures, exporting the result table to PDF or MS Excel, and optionally graphing the data. It is also known to work on Apple iPads by using the Safari web browser. |
| Pentaho Interactive Reporting (PIR) | EE | Server plug-in | 5.0.6 | - | This plug-in enables users to create ad hoc reports in a visual drag-and-drop fashion. |
| Pentaho Data Access Wizard | EE, CE | Server plug-in | - | - | This plug-in is bundled with all servers and allows users to create new data sources for use throughout the system from other databases or CSV files uploaded to the server while using a setup wizard. During the steps of creating a data source users also are given a chance to create a data model describing how columns or fields relate to each other creating hierarchies of relationships like Time: Year, Quarters, Months, Weeks and Product Division, Category, Type etc. The resultant model is used by Mondrian and any other plug-in like Analyzer or Saiku to create new queries against the newly created data source. This component is part of what Pentaho introduces as agile BI, which simply means having a way to start from basic data and quickly iterate through steps to discover the proper way to structure, study and present the data. |
| Pentaho Mobile | EE | Server piece | 5.0.6 | - | Added in the 4.5-GA suite, this is a user interface adapted for use with the Apple iPad. It exposes the major functionality of OLAP analysis and the running of reports and dashboards in a form that allows greater interaction on a small touchscreen. Mobile also adds features for bookmarking favorite content for easy access and for opening several pieces of content in tabs. |

=== Desktop/client applications ===
| Product | Offering | Type | Recent version | Description |
| Pentaho Data Integration (PDI) | EE, CE | Desktop application | EE : 9.5.0.0-xxx (2023/05) CE : (2022/11) | Pentaho Data Integration, codenamed Kettle, consists of a core data integration (ETL) engine and GUI applications that allow the user to define data integration jobs and transformations. It supports deployment on single-node computers as well as in a cloud or on a cluster. |
| Pentaho for Big Data | EE, CE | PDI plug-in | N/A | Pentaho for Big Data is a data integration tool based on Pentaho Data Integration. It allows executing ETL jobs in and out of big data environments such as Apache Hadoop or Hadoop distributions such as Amazon, Cloudera, EMC Greenplum, MapR, and Hortonworks. It also supports NoSQL data sources such as MongoDB and HBase. |
| Pentaho Report Designer | EE, CE | Desktop application | 9.0.0.0-423 | Pentaho Report Designer is a visual, banded report writer. Features include subreports, charts and graphs. It can query and use data from many sources including SQL, MDX, Community Data Access, scripting, static table definitions and more. It is built around a core reporting engine capable of generating reports from an XML definition file stored in a ZIP archive with a .prpt extension. Many tools have been developed around the reporting engine, including GUI designers and ad hoc wizards that guide the user through a step-by-step process of creating a report, using solely graphical tools without the need to write any code. |
| Pentaho Data Mining | EE, CE | Desktop application | Weka | Pentaho Data Mining used the Waikato Environment for Knowledge Analysis (Weka) to search data for patterns. Weka consists of machine learning algorithms for a broad set of data mining tasks. It contains functions for data processing, regression analysis, classification methods, cluster analysis, and visualization. Based on the discovered patterns, users can predict future trends. |
| Pentaho Metadata Editor (PME) | EE, CE | Desktop application | 9.0.0.0-423 | The metadata editor is used to create business models that act as an abstraction layer over the underlying data sources. The resulting metadata models are used by Pentaho Interactive Reporting, Saiku Reporting, and Pentaho's legacy ad hoc reporting plug-in applications to create reports within the BA server without using any of the other external desktop applications. |
| Pentaho Aggregate Designer (PAD) | EE, CE | Desktop application | 9.0.0.0-423 | Aggregate Designer operates on Pentaho Analysis (Mondrian) XML schema files and the database with the underlying tables described by the schema to generate precalculated, aggregated answers that speed up analysis work and MDX queries executed against Mondrian. The software examines the hierarchies and measures defined in the schema and generates SQL that creates tables storing those answers for future use by Mondrian. After the software has generated these aggregate tables, the original Mondrian XML schema file describing the OLAP cube is modified to reference the precomputed results. |
| Pentaho Schema Workbench (PSW) | EE, CE | Desktop application | 9.0.0.0-423 | Pentaho Schema Workbench provides a graphical interface for designing OLAP cubes for Pentaho Analysis (Mondrian). The schema created is stored as a regular XML file on disk. It is not necessary to use the Schema Workbench to create a schema, but it is often helpful for beginners, and even for experts who need to inspect a cube visually and come up to speed with how to maintain or extend it. |
| Pentaho Design Studio (PDS) | EE, CE | Desktop application | 4. | The Pentaho BA Server supports special XML scripts called xactions to implement business logic and other forms of automation in the platform. Design Studio is a modified version of the Eclipse Development Environment with a plug-in designed to understand the components supported by xaction scripts. Xactions are very powerful, and useful, but sometimes prove difficult to troubleshoot because of the low-level way they interact with parts of the BA server. Developers are starting to use Pentaho Data Integration transformation files to carry out automation and business logic tasks. The transformations can be run directly by the BA Server and visually debugged in Pentaho Data Integration (PDI) and are quickly gaining favor in the community over xactions. It is a small leap to imagine PDI transformations will eventually replace xactions entirely. |
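
The idea behind the Aggregate Designer row above, generating SQL that precomputes measures grouped by hierarchy levels, can be sketched in a few lines of Python. This is a simplified illustration and not the tool's actual output: it assumes the level columns live directly in the fact table, all table and column names are hypothetical, and the `fact_count` column is included on the assumption that the OLAP engine expects a row count in aggregate tables.

```python
from typing import List, Tuple


def aggregate_table_sql(
    fact_table: str,
    agg_table: str,
    level_columns: List[str],
    measures: List[Tuple[str, str]],  # (column, aggregator) pairs, e.g. ("amount", "sum")
) -> str:
    """Build a CREATE TABLE ... AS SELECT statement that pre-aggregates a fact table."""
    select_parts = list(level_columns)
    for column, aggregator in measures:
        select_parts.append(f"{aggregator.upper()}({column}) AS {column}_{aggregator}")
    # fact_count records how many fact rows were rolled up into each aggregate row.
    select_parts.append("COUNT(*) AS fact_count")
    return (
        f"CREATE TABLE {agg_table} AS\n"
        f"SELECT {', '.join(select_parts)}\n"
        f"FROM {fact_table}\n"
        f"GROUP BY {', '.join(level_columns)}"
    )


# Hypothetical table and column names, used only for illustration.
sql = aggregate_table_sql(
    "sales_fact", "agg_sales_by_quarter", ["year", "quarter"], [("amount", "sum")]
)
print(sql)
```

A real aggregate usually also joins the fact table to its dimension tables, which this sketch leaves out for brevity.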

=== Community driven, open-source Pentaho server plug-ins ===
All of these plug-ins function with Pentaho Enterprise Edition (EE) and the older Pentaho Community Edition (CE).
| Product | Type | Recent version | Description |
| Ctools | Server plug-in suite | Various | Known as the Community tools, it includes a growing array of features usually contained in a package with an abbreviated name where the first C always stands for community and simultaneously represents its status as being both free of cost and open-source. The tools are produced and managed by Webdetails. Documentation on the tools is found at ctools.webdetails.org. Most often the Ctools suite is installed by using a Linux script, but there are plans in an upcoming release to include a package manager in the BA Server that helps with installation. |
| Community Charting Components (CCC) | Server plug-in | Various | A charting library on top of Protovis, a very powerful free and open-source visualization toolkit. The aim of CCC is to provide developers with a way to include into their dashboards the basic chart types without losing the main principle: Extensibility. The charts created with CCC become components that appear in dashboards. |
| Community Build Framework (CBF) | Build Script Framework | 3.7 | Focused on multi-project, multi-environment scenarios, the Community Build Framework (CBF) provides a way to set up and deploy Pentaho-based applications. It is an Apache Ant build script that allows a user to create a template of their Pentaho BA Server installation, including patches and any customizations or special content, and roll it out quickly. It can help with migrations to new versions of the BA Server and with rapidly producing customized Pentaho servers for clients. |
| Community Data Access (CDA) | Server plug-in | latest | Acts as a common layer for accessing data on the Pentaho BA server. CDA files can contain SQL, MDX, Pentaho Data Integration transformation files, scripted data sources and more. CDA also provides a REST API for directly calling the Pentaho BA server and receiving the results of a query back as JSON, XML, XLS, HTML or CSV. The default is JSON. HTML output makes it easy for MS Excel users to perform Web queries and pull results directly into an Excel workbook without additional software in the middle. CDA comes bundled in all of Pentaho's servers. |
| Community Data Browser (CDB) | Server plug-in | | Community Data Browser uses a visual OLAP browser called Saiku to create a query which can be used by R for performing analytics on the result set. |
| Community Distributed Cache (CDC) | Server plug-in | latest | |
| Community Data Generator (CDG) | PDI Jobs | N/A | CDG is a data warehouse generator that helps create sample data for creating proof of concept dashboards. Given the definition of dimensions that we want, CDG will randomize data within certain parameters and output 3 different things: |
| Community Data Validation (CDV) | Server plug-in | | CDV adds the ability of creating validation tests on the Pentaho BA server for the purpose of verifying both the integrity of the server itself and also the data being used by the server. |
| Community Graphics Generator (CGG) | Server plug-in | latest | |
| Community Dashboard Editor (CDE) | Server plug-in | 20120719 | CDE is an advanced user tool for creating dashboards in the Pentaho BA server. CDE and the technology underneath it (CDF, CDA and CCC) allow users to develop and deploy dashboards in the Pentaho platform in a fast and effective way. It is not as user friendly as the Pentaho Dashboard Designer plug-in, but enables users to create much more sophisticated designs. |
| Community Dashboard Framework (CDF) | Server plug-in | 4.8-stable | CDF comes bundled in all of Pentaho's servers. It is the framework used both by CDE and Pentaho's Dashboard Designer to create dashboards on the system. |
| Community Startup Tabs (CST) | Server plug-in | 1.0 | Out of the box, a Pentaho BA Server comes with a user interface called the Pentaho User Console (PUC), which shows all content by opening tabs within itself. Community Startup Tabs provides an easy way to define and show specialized content to users by automatically opening tabs when they sign in. |
| Saiku | Server plug-in | latest | Saiku is a modular open-source analysis suite offering lightweight OLAP which remains easily embeddable, extendable and configurable. It is similar in form and function to the Pentaho Analyzer plug-in. |
| Saiku-Reporting | Server plug-in | 1.0-GA | A rapidly developing ad hoc reporting tool, similar to Pentaho's Interactive Reporting plug-in. |
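
The Community Data Access (CDA) row above mentions a REST API that returns query results as JSON, XML, XLS, HTML or CSV, with JSON as the default. The following Python sketch builds such a request URL without sending it. The endpoint path `/plugin/cda/api/doQuery`, the `path`, `dataAccessId` and `outputType` query keys, and the `param` prefix for query parameters are assumptions about a typical installation and may differ between server versions; the server address, CDA file and data access id are hypothetical.

```python
from urllib.parse import urlencode

# Output formats listed in the CDA description; JSON is the default.
OUTPUT_TYPES = {"json", "xml", "xls", "html", "csv"}


def build_cda_query_url(base_url, cda_path, data_access_id, output_type="json", params=None):
    """Return a URL that asks CDA to run one data access from a .cda file."""
    if output_type not in OUTPUT_TYPES:
        raise ValueError(f"unsupported output type: {output_type}")
    query = {"path": cda_path, "dataAccessId": data_access_id, "outputType": output_type}
    # Assumed convention: query parameters are sent with a "param" prefix.
    for name, value in (params or {}).items():
        query[f"param{name}"] = value
    return f"{base_url.rstrip('/')}/plugin/cda/api/doQuery?{urlencode(query)}"


# Hypothetical server address, CDA file and data access id.
url = build_cda_query_url(
    "http://localhost:8080/pentaho", "/public/demo/sales.cda", "sales_by_year"
)
print(url)
```

Requesting `output_type="html"` instead would produce the kind of URL that a spreadsheet web query could consume directly.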

== Licensing ==
Pentaho followed an open core business model for several years; however, with the 10.2 release in 2024 it switched to non-open-source licensing. The new license does not allow production use without an Enterprise Edition subscription.

Pentaho provides two editions of Pentaho Business Analytics: a Developer Edition (for non-production use only) and an Enterprise Edition. The Enterprise Edition is available under a commercial license and is purchased as an annual subscription, which includes support, services, and product enhancements. An Enterprise license comes with one of three levels of Pentaho Enterprise Support: Enterprise, Premium or Standard.

== See also ==

- Nutch - an effort to build an open source search engine based on Lucene and Hadoop, also created by Doug Cutting
- Apache Accumulo - Secure Big Table
- HBase - Bigtable-model database
- Hypertable - HBase alternative
- MapReduce - Google's fundamental data filtering algorithm
- Apache Mahout - machine learning algorithms implemented on Hadoop
- Apache Cassandra - a column-oriented database that supports access from Hadoop
- HPCC - LexisNexis Risk Solutions High Performance Computing Cluster
- Sector/Sphere - open-source distributed storage and processing
- Cloud computing
- Big data
- Data-intensive computing
- Apache Hop - a project that started as a fork of Kettle
