Oracle Data Mining (ODM)
Oracle Data Mining (ODM) is a software product distributed as an option to the Enterprise Edition (EE) of Oracle Corporation's Relational Database Management System (RDBMS). It contains a collection of data mining, machine learning and data analysis algorithms for classification, prediction, regression, clustering, associations, feature selection, feature extraction and sequence alignment (BLAST).
Release: 10gR2 / October 2006
Type: data mining and analytics
Overview and Design Principles
Oracle Data Mining (ODM) implements data mining algorithms inside the Oracle relational database. Execution tasks run asynchronously and independently of any specific user interface, as part of standard database processing pipelines and applications. The model-building, scoring, and metadata management operations are accessed via a GUI (Oracle Data Miner) and via either a PL/SQL or a Java API.
The main design principle is to let data mining algorithms operate natively on relational database tables, eliminating the need to extract or transfer data to a standalone server. The design takes advantage of the database environment, which provides the means to execute large queries efficiently and analyze large volumes of data. The results of data mining operations are stored in database tables, where generic SQL queries and database-based reporting tools and applications can access them.
ODM is organized around a few generic operations that provide a unified interface for all data mining functions. These operations create, apply, test and manipulate data mining models. Models are built with a "create model" function whose parameters specify, for example, the model name, the function type (e.g., classification), the input table(s), the target field, and the algorithm settings. Other functions provide descriptive information about a data mining model, as well as testing and management capabilities.
Functionality
ODM supports the following data mining functions:
- Data transformations and model analysis:
- Naive Bayes (NB). It makes predictions using Bayes' Theorem, assuming that each attribute is conditionally independent of the others given the target class.
- Adaptive Bayes Network (ABN). It differs from Naive Bayes in that it relaxes the conditional independence assumption by allowing each conditionally independent feature to include one or more predictors. It also determines the number and structure of the features using a greedy, recursive procedure adapted to the data. It provides two flavors of rules: aggregate, for a global understanding of the model's decision process, and detailed, for insight into why the model made a specific prediction.
- Support Vector Machines (SVM). An implementation of SVM for binary and multi-class classification.
- Decision Trees (DT). It implements Classification and Regression Trees and reports confidence, support, splitting criterion and surrogate attributes for each node.
- Anomaly Detection.
- Support Vector Machines (SVM) for regression. An implementation of SVM for the prediction of a continuous target attribute.
- Enhanced k-means (EKM). It uses a distance-based similarity measure (Euclidean) and tessellates the data space. It can create either balanced or unbalanced hierarchies and handles large data volumes via summarization.
- Orthogonal Partitioning Clustering (O-Cluster). Instead of a distance metric, it uses a density-based approach to find natural data clusters. It creates unbalanced hierarchical trees using active sampling and orthogonal projections.
- Association models:
- Apriori algorithm (AM). Association rules capture the co-occurrence of items or events in large volumes of "transactional" data, as in market basket analysis.
- Feature extraction. Feature extraction creates a new set of features by decomposing the original data into a number of features far smaller than the number of original dimensions (attributes).
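To make the Naive Bayes entry in the list above concrete, the following is a minimal sketch in Python of how the conditional independence assumption turns into a prediction. It is a generic illustration, not ODM's implementation or API: the function names, the tiny `training` table, and the use of add-one (Laplace) smoothing are all assumptions made for the example.

```python
from collections import Counter, defaultdict
import math


def train_naive_bayes(rows, target):
    """Count class frequencies and per-class attribute value frequencies.

    rows is a list of dicts (one per case); target names the class column.
    All names here are hypothetical and unrelated to the ODM API.
    """
    class_counts = Counter(row[target] for row in rows)
    # value_counts[(attribute, class)][value] = number of matching rows
    value_counts = defaultdict(Counter)
    # distinct values seen per attribute, needed for smoothing
    domains = defaultdict(set)
    for row in rows:
        for attr, value in row.items():
            if attr == target:
                continue
            value_counts[(attr, row[target])][value] += 1
            domains[attr].add(value)
    return class_counts, value_counts, domains


def predict(model, case):
    """Return the most probable class and the posterior for every class."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    log_scores = {}
    for cls, n_cls in class_counts.items():
        # log P(class) plus the sum of log P(value | class) per attribute:
        # the sum is valid only under the conditional independence assumption.
        score = math.log(n_cls / total)
        for attr, value in case.items():
            count = value_counts[(attr, cls)][value]
            # add-one smoothing so unseen values do not zero the product
            score += math.log((count + 1) / (n_cls + len(domains[attr])))
        log_scores[cls] = score
    # normalize the log scores into posterior probabilities
    peak = max(log_scores.values())
    weights = {c: math.exp(s - peak) for c, s in log_scores.items()}
    norm = sum(weights.values())
    posteriors = {c: w / norm for c, w in weights.items()}
    return max(posteriors, key=posteriors.get), posteriors


# Made-up training data for illustration only.
training = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
model = train_naive_bayes(training, target="play")
label, posteriors = predict(model, {"outlook": "sunny", "windy": "no"})
print(label, posteriors)
```

Because each attribute contributes an independent factor, training reduces to counting, which is one reason this family of models suits set-oriented processing over large tables.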
Graphical User Interface: Oracle Data Miner
Oracle Data Mining can be accessed using Oracle Data Miner, a GUI "client" that provides access to the data mining functions and to structured templates called Activity Guides. The Activity Guides prescribe the order of operations, perform all algorithm-required data transformations, and provide intelligent settings and optimizations for model parameters. The user interface can also automatically generate the Java and/or SQL code associated with the data mining activities.
Text mining
Oracle Data Mining allows the use of text (unstructured data) as an input attribute. The Support Vector Machine, Association Rules, K-Means Clustering, and Non-negative Matrix Factorization algorithms can all operate on text.
PL/SQL Interface
Oracle Data Mining provides a native PL/SQL interface as a set of SQL primitives invoked in program blocks. The interface consists of two PL/SQL packages. For example, the code below illustrates a call to build a classification model:
BEGIN
  -- Build a classification model from the training table; the
  -- algorithm settings are read from the svm_settings table.
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'SVM_model',
    mining_function     => DBMS_DATA_MINING.classification,
    data_table_name     => 'multitumor_train',
    case_id_column_name => 'id',
    target_column_name  => 'class',
    settings_table_name => 'svm_settings');
END;
/
Predictive Analytics and the Explain and Predict Packages
Oracle Data Mining contains two self-contained SQL packages, PREDICT and EXPLAIN, for building classification or feature selection models. The results are the predicted scores (PREDICT) or the ranked list of features (EXPLAIN), which can be used as part of an operational pipeline or displayed on the command line or in a spreadsheet.
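To illustrate what a "ranked list of features" looks like, here is a small Python sketch that orders attributes by how much information each carries about a target column. The text above does not say how EXPLAIN computes its ranking, so this example substitutes mutual information as a stand-in measure; it is not ODM's method, and the function names and the four-row `customers` table are hypothetical.

```python
from collections import Counter
import math


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length value lists."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_attributes(rows, target):
    """Return (attribute, score) pairs, most informative attribute first."""
    ys = [row[target] for row in rows]
    attributes = [a for a in rows[0] if a != target]
    scored = [
        (a, mutual_information([row[a] for row in rows], ys))
        for a in attributes
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up data: "plan" determines the target, "region" is unrelated to it.
customers = [
    {"plan": "basic", "region": "north", "churned": "yes"},
    {"plan": "basic", "region": "south", "churned": "yes"},
    {"plan": "premium", "region": "north", "churned": "no"},
    {"plan": "premium", "region": "south", "churned": "no"},
]
ranking = rank_attributes(customers, target="churned")
for attribute, score in ranking:
    print(f"{attribute}: {score:.3f} bits")
```

In this toy table the attribute that fully determines the target lands at the top of the list and the unrelated attribute at the bottom, which is the kind of ordering a feature-ranking result is meant to convey.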
Spreadsheet Add-In for Predictive Analytics
This is an add-in for Microsoft Excel that lets users access the fully automated PL/SQL PREDICT and EXPLAIN packages. The data may reside in either Excel or the database.