Aster Data Systems

For other uses, see Aster (disambiguation).
Teradata Aster Analytics
Type: Public, part of Teradata Corp
Industry: Advanced analytics
Fate: Acquired
Founded: 2005
Headquarters: San Carlos, California, USA
Products: Teradata Aster Analytics
Parent: Teradata
Website: www.teradata.com/Teradata-Aster/overview

Aster Data Systems is a data management and analysis software company headquartered in San Carlos, California. It was founded in 2005 and acquired by Teradata in 2011.

Products

Teradata Aster® Analytics is a multi-genre, scalable, advanced analytics solution that empowers business users to uncover and operationalize non-intuitive insights.

Aster Analytics combines, within a single analytic solution, the data stores, analytic processing, and analytic algorithms that are typically spread across multiple siloed platforms, in order to solve complex business problems.

Aster Analytics applies SQL, MapReduce, Graph, and R functions to data across row, column, and file stores. The ability to combine analytics from multiple genres (such as machine learning, statistics, path, pattern, graph, or text) within a single work stream is a key differentiator for Aster Analytics.

Aster Analytics is based on a Massively Parallel Processing (MPP) architecture, in which tasks are run simultaneously across multiple nodes for more efficient processing. The Aster Database includes three analytic engines (SQL, MapReduce, and Graph) designed to process analytic tasks across massive volumes of data. In addition, the analytics are performed directly in the database, which eliminates data movement and takes advantage of the MPP architecture.
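The MPP idea described above can be illustrated with a minimal Python sketch. It is not Aster code: the function names are hypothetical, and a thread pool merely stands in for the separate nodes of a cluster. Rows are hash-partitioned by key, each partition is aggregated independently, and the partial results are merged.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_rows(rows, num_workers):
    """Hash-partition (key, value) rows so each worker owns a disjoint set of keys."""
    partitions = [[] for _ in range(num_workers)]
    for key, value in rows:
        partitions[hash(key) % num_workers].append((key, value))
    return partitions


def local_sum(partition):
    """Work done independently by one worker: aggregate only its local rows."""
    totals = defaultdict(float)
    for key, value in partition:
        totals[key] += value
    return dict(totals)


def parallel_sum_by_key(rows, num_workers=4):
    """Run the same aggregation on every partition at once, then merge the results."""
    partitions = partition_rows(rows, num_workers)
    # Threads are only a stand-in here for the nodes of an MPP cluster.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(local_sum, partitions))
    merged = {}
    for partial in partials:
        merged.update(partial)  # safe: keys are disjoint across partitions
    return merged
```

Because every key lives in exactly one partition, the merge step needs no further arithmetic, which is the same reason in-database processing on an MPP system avoids moving rows between nodes for this kind of aggregation.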

While most big data platforms require specialized data science skills to design, build, and maintain parallel MapReduce programs, Aster Analytics provides interfaces that enable a broad range of users to discover insights from their data. Highly technical users such as programmers and data scientists can code directly to the engines' APIs to develop their own analytics. Business analysts with SQL skills can use over 100 prebuilt functions to run sophisticated analytics against their data. These analytic programs can be deployed through Teradata Aster AppCenter to enable self-service big data discovery by enterprise users.

The complete Aster Analytics software solution is made up of several components: the database and client, the analytics portfolio, and AppCenter.

Aster Database and Client

The Aster Database is a parallel database that enables users to store and process multi-structured data. This section describes the components and provides a compare/contrast analysis with a traditional relational database.

  • Data Stores: Relational databases require a data model with structure such as third normal form or a star schema, whereas the Aster Database can load multi-structured data in raw form. Machine data, web logs, text, or Internet of Things data can be loaded into row, column, or file stores and transformed into structured data for further analytic processing.
  • Analytic Engines: Aster Analytics offers three different analytic engines (SQL, SQL-MapReduce, and SQL-GR) that can be used together within a single query, delivering optimal performance. Additionally, users can install open source R as an “add-on” engine for R processing. Each is described in more detail below:
    • SQL – Aster Analytics offers a true relational SQL engine for efficient set-based processing. Many of the statistical techniques, sorting, merging and aggregations are best done in SQL.
    • SQL-MapReduce – MapReduce is a programming model (not a language) and implementation for processing large multi-structured data sets. Programs written in this functional style are automatically parallelized and executed on a large cluster of servers. MapReduce is better suited than SQL for row-over-row processing use cases, such as pattern detection, affinity analysis, and collaborative filtering. The SQL implementation of such use cases translates into expensive full-table scans and multiple table joins, whereas MapReduce uses a procedural framework to process such use cases more efficiently. Iterative queries such as affinity analysis of more than two products, text analytics, and many machine learning techniques are typically processed more efficiently using MapReduce vs. SQL.
    • SQL-GR – Aster Analytics SQL-GR is a native graph processing engine for graph analysis across big data sets, including complex analyses such as social network and influencer analysis, fraud detection, supply chain management, network analysis and threat detection, and money laundering detection. SQL-GR is based on the Bulk Synchronous Parallel (BSP) model and uses massively iterative, distributed, and parallel processing to solve complex graph problems. It is focused on large-scale graph analytics, with processing architectures and APIs designed specifically for parallel execution on commodity-class clusters.
    • R – Open source R and R packages can be embedded directly in the Aster Analytics database, where R scripts can be executed concurrently on one or many v-Workers. With Aster 6.20, the Aster R client package allows users to run R scripts through the R “apply” family of functions, which eliminates the need to know SQL. In addition, Aster R provides a library of prebuilt parallel functions that leverage the SQL-MapReduce and SQL engines. Aster R also includes combiners that implement the split/apply/combine construct to parallelize R functions.
  • Seamless Network Analytics Processing (SNAP) – The SNAP framework works transparently to deliver an integrated end-user experience, with a single user interface and a single high-level language (SQL) across multiple processing engines in the top layer and data stores in the bottom layer.
    • Using the Unified SQL Interface, analysts and data scientists can invoke and combine various types of analytics (Graph, Path/Pattern, Text, SQL and Statistics) in a single SQL statement.
    • Integrated Optimizer takes a query that contains multiple analytics and parses it into sub-queries to be executed by workload-specific execution engines such as SQL-GR and SQL-MapReduce. It applies advanced optimizations, such as global collaborative planning and adaptive optimization and execution, by eliminating redundant operations and reordering operators in a query.
    • Integrated Executor orchestrates the execution of the query and its iterative phases across multiple engines. It passes parameters and results between engines and performs common workload management and monitoring across them.
    • Common Storage System and Services provide a unified data storage architecture that eliminates wasteful and redundant effort. It is a plug and play system designed for typed data stores and provides common services such as fault tolerance, replication, security, and snapshots across the data stores.
  • Aster Client (Aster R) – Aster Client includes client tools such as Aster Database Cluster Terminal, ODBC, JDBC and .NET interfaces, Loaders and Exporter, and most notably Aster R.
    • Aster R – A package that allows an R user to interact with Aster through an R interface. The package provides wrappers for R functions that run SQL or SQL-MR functions in Aster Analytics to parallelize them. The Aster R package also provides runners that allow users to run R scripts concurrently across single or multiple v-Workers. These runners also provide a combiner that allows programmers to write custom SQL parallel programs using MapReduce constructs. This capability allows R programmers to exceed the data, processing, and memory limitations of R.
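The BSP model named in the SQL-GR description can be sketched in a few lines of Python. This is an illustrative assumption, not the SQL-GR API: the function name and parameters are hypothetical, and an ordinary loop stands in for vertices that a real engine would process in parallel. Each superstep has a message-sending phase, a barrier, and an update phase that reads only the messages a vertex received. PageRank is used because it appears in the graph functions of the portfolio.

```python
def bsp_pagerank(edges, damping=0.85, supersteps=20):
    """PageRank expressed as BSP supersteps: send messages, barrier, update.

    edges: dict mapping each vertex to the list of vertices it links to.
    """
    vertices = set(edges) | {v for targets in edges.values() for v in targets}
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}

    for _ in range(supersteps):
        # Phase 1: every vertex sends an equal share of its rank along its out-edges.
        inbox = {v: 0.0 for v in vertices}
        dangling = 0.0
        for v in vertices:
            targets = edges.get(v, [])
            if targets:
                share = rank[v] / len(targets)
                for t in targets:
                    inbox[t] += share
            else:
                dangling += rank[v]  # no out-edges: rank is spread over all vertices
        # Barrier: all messages are delivered before any vertex updates.
        # Phase 2: every vertex computes its new rank from its own inbox only.
        rank = {
            v: (1 - damping) / n + damping * (inbox[v] + dangling / n)
            for v in vertices
        }
    return rank
```

Since a vertex reads only its own inbox after the barrier, the update phase can be distributed across workers without coordination, which is what makes the model suitable for iterative graph algorithms on clusters.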

The Aster Analytics Portfolio

The integrated discovery portfolio provides over 100 ready-to-use SQL-based functions that address the big data analytics and discovery process. These analytics are defined in detail in the Aster Analytics Portfolio white paper.

  • Statistical Analysis: Approximate Distinct Count; Approximate Percentile; Average; Confusion Matrix; Correlation; CoxPH, CoxPredict, CoxSurvFit; Distribution Matching; Exponential Moving Average; Enhanced Histogram; F-Measure; Generalized Linear Model (GLM); GLM Predict; Histogram; K-Nearest Neighbor Classification (KNN); LARS Functions; Linear Regression; Logistic Predict (deprecated); Logistic Regression (deprecated); LRTEST; Percentile; Principal Component Analysis; Sample; Simple Moving Average; Support Vector Machines; Vector Distance; Volume Weighted Average Price; Weighted Moving Average
  • Decision Trees: Forest Drive, Predict & Analyze; Single Tree Drive & Predict
  • Naïve Bayes: Naïve Bayes Map, Reduce & Predict
  • Data Transformation: Antiselect; Apache Log Parser; FellegiSunter Trainer & Predict; E-Mail Parser; Identity Match; IpGeo; JSON Parser; Multicase; Murmurhash; Outlier Filter; PST Parser AFS; Pack; Pivot; Scale Functions; String Similarity; Sessionization; Unpack; Unpivot; URI Pack & Unpack; XML Parser & Relation
  • Cluster Analysis: Canopy; Gaussian Mixture Model; KMeans; KMeansPlot; Minhash
  • Association Analysis: Basket Generator; Collaboration Filter; WSRecommender
  • Graph Analysis: All Pairs Shortest-Path; Betweenness; Closeness; Eigenvector Centrality; gTree; Hidden Markov Model; Local Clustering Coefficient; Loopy Belief Propagation; Modularity; nTree; PageRank; Personalized SALSA; Random Walk Sample; Shapley Value
  • Text Analysis: Chinese Text Segmentation; Latent Dirichlet Allocation (LDA); Levenshtein Distance; Naïve Bayes Text Classifier; Named Entity Recognition (NER); nGram; POS Tagger; Sentenizer; Sentiment Extraction; Term Frequency-Inverse Document Frequency (TF-IDF); Text Classifier; Text Chunker; Text Morph; Text Parser; Text Tagging; Text Tokenizer
  • Time Series, Path & Pattern Matching: Arima & Predictor; Attribution; Burst; Cumulative Moving Average; Dynamic Time Warping (DTW); Discrete Wavelet Transforms; Discrete Wavelet Transforms on 2 Dimensional Input; Frequent Paths; Inverse Discrete Wavelet Transforms; Inverse Wavelet Transforms on Multiple Sequences; Interpolator; nPath; Path Generator, Starter & Summarizer; Symbolic Aggregation Approximation (SAX); Sessionization; Shapelets
  • Location Analysis: Load Geometries; Point in Polygon; Geometry Overlay
  • Visualization: CfilterViz; NpathViz
  • Aster Database System Function: nc_skew; nc_relationstats

Figure 1: The Aster Analytics Portfolio
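Sessionization, one of the functions listed in the portfolio, is a typical example of the row-over-row processing for which the article says MapReduce is better suited than plain SQL. The following Python sketch shows the general technique only. It does not reproduce the signature or behavior of the Aster function, and the function name, the tuple layout, and the 30-minute default timeout are assumptions chosen for illustration.

```python
from collections import defaultdict


def sessionize(clicks, timeout_seconds=1800):
    """Assign a session number to each (user, timestamp) click.

    A new session starts when the gap to the same user's previous click
    exceeds timeout_seconds. Timestamps are seconds since any fixed epoch.
    """
    # "Map"/shuffle step: group rows by user so that each group can be
    # processed independently (and, on a cluster, in parallel).
    by_user = defaultdict(list)
    for user, ts in clicks:
        by_user[user].append(ts)

    # "Reduce" step: walk each user's rows in time order and compare
    # every row with the one before it.
    result = []
    for user, stamps in by_user.items():
        session_id = 0
        previous = None
        for ts in sorted(stamps):
            if previous is not None and ts - previous > timeout_seconds:
                session_id += 1
            result.append((user, ts, session_id))
            previous = ts
    return result
```

Each user's rows are read once in time order, so no self-join of the click table is needed to find the previous click.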

Teradata Aster AppCenter

Teradata Aster AppCenter provides a framework that allows data scientists and business analysts to embed their SQL or Java analytic code into apps, which can be shared or deployed within AppCenter as shown in the diagram below. The AppCenter framework provides a web-based portal to manage and view apps, services to build apps, and a RESTful API and other methods for third-party tool integration. The apps are stored, managed, and invoked through a web-based interface, giving users of all skill levels self-service access.

History

Aster Data was co-founded in 2005 by Stanford University graduate students George Candea, Mayank Bawa and Tasso Argyros.[1][2] It received funding from First Round Capital, Sequoia Capital, Institutional Venture Partners, Cambrian Ventures, and Jafco Ventures, as well as from angel investors including Rajeev Motwani, Ron Conway and David Cheriton.[3] It received a first round of funding of $5 million in 2005, a second round of $17 million in February 2009, and a third round of $30 million in September 2010.[4]

Teradata acquired an 11 percent ownership interest in Aster Data Systems in September 2010. On March 3, 2011, Teradata agreed to pay an additional $263 million for the remaining ownership interest, net of debt and other expenses.[5][6] The acquisition was completed in April 2011.[2] In September 2011, a computer appliance version of the product was announced, with pre-configured software bundled with hardware.[7]

In October 2012, Aster announced a second version of its appliance. In addition to the Aster database software, the new appliance was available with nodes running the Hortonworks distribution of Apache Hadoop.[8][9]

In October 2013, version 6 of Aster database software was announced. It supported graph database technology, and a file system that the company said was compatible with the Hadoop distributed file system.[10][11]

In June 2014, Teradata introduced Aster® R, which extends the power of open source R analytics by lifting the memory and processing limitations. Aster R offers the R analyst an enterprise-ready business analytics solution that is massively scalable, reliable, and easy-to-use. Aster R leverages a high-performance computer platform with all the benefits of security, data management, and an ensemble of analytics.[12]

In February 2015, Teradata announced new big data apps powered by the Teradata Aster AppCenter. The big data apps are focused on industries that include consumer finance, entertainment and gaming, healthcare, retail, and telecommunications. The big data apps use AppCenter, which provides a common framework to build, deploy, and consume interactive, web-based applications.[13]

In October 2015, Teradata announced Aster Analytics on Hadoop, an integrated analytics solution featuring more than 100 business-ready analytics techniques and seven industry applications that run directly on Hadoop®. Teradata Aster Analytics on Hadoop allows users to combine machine learning, text, path, pattern, graph, and statistics within a single workflow.[14]

Clients and partners

The Aster Analytics customer base has grown since the acquisition by Teradata in 2011. Organizations in industries such as telecommunications, financial services, retail, and manufacturing use Aster Analytics for use cases as diverse as omni-channel customer experiences, digital marketing, and preventive machine maintenance. Aster Analytics customers include Verizon Wireless, Wells Fargo, Discover, NCR, Dell, Siemens, and Swisscom.

Aster Analytics has strong partnerships with various vendors. Featured technology partners include Dell (for its hardware platform), MicroStrategy, Tableau, Qlik, Spotfire (to integrate MapReduce and business intelligence software), and Informatica (for its data integration software).

Recognition

Aster co-founder and CEO Mayank Bawa, an IIT Bombay alumnus (B.Tech, Computer Science, class of 1999), received the Young Alumni Achiever Award at IIT Bombay's 55th Foundation Day on Monday, 10 March 2014, for his leadership and visionary role in creating and establishing the “Big Data” market.

Aster Data hosts the "Data Analytics Summit" (also known as the "Big Data Summit"), made up of regional events in Chicago, Washington, D.C., Dallas, Burlingame, California, and New York City.[15]

Aster's former Chief Technical Officer, Tasso Argyros, was named a "2011 Technology Pioneer" by the World Economic Forum in the category of Information Technologies and New Media.[16] Aster Data was named a "Company to Watch" in the 2010 Intelligent Enterprise Editors' Choice Awards.[17][18] The company was ranked seventh on the 2011 list of "top 50 venture-funded companies" by the Wall Street Journal.[19]

References

  1. "George Candea". Dependable Systems Laboratory faculty profile. École Polytechnique Fédérale de Lausanne. Retrieved August 12, 2011.
  2. "Teradata Completes Acquisition of Aster Data: Bringing together two leaders to deliver powerful insights for organizations". News release (Teradata). April 6, 2011. Retrieved August 12, 2011.
  3. "Aster Data Systems Extends Oversubscribed Series B Funding to $17 Million: IVP Adds $5 Million in Additional Funding for Proven Leader in Frontline Data Warehousing". News release (Institutional Venture Partners). February 25, 2009. Retrieved August 12, 2011.
  4. Jason Kincaid (September 22, 2010). "Aster Data Raises Another $30 Million To Help Manage Big Data". TechCrunch. Retrieved August 12, 2011.
  5. "Teradata to Acquire Aster Data". News release (Aster Data). March 3, 2011. Archived from the original on July 19, 2011. Retrieved August 12, 2011.
  6. Timothy Prickett Morgan (March 3, 2011). "Teradata snaps up Aster Data for $263m: Big data analytics". The Register. Archived from the original on June 28, 2011. Retrieved August 12, 2011.
  7. Chris Kanaracus (September 22, 2011). "Teradata's Aster analytic database gets appliance treatment". InfoWorld. Retrieved October 25, 2011.
  8. Tony Baer (October 26, 2012). "It's happening: Hadoop and SQL worlds are converging". ZDNet. Retrieved October 14, 2013.
  9. "Teradata Big Analytics Appliance Enables New Business Insights on All Enterprise Data". Press release (Teradata Aster). October 17, 2012. Retrieved October 14, 2013.
  10. Andrew Brust (October 8, 2013). "Teradata Aster gets graph database, HDFS-compatible file store". ZDNet. Retrieved October 14, 2013.
  11. "Teradata Aster Discovery Platform Liberates Data Scientists Around the World". Press release (Teradata Aster). October 8, 2013. Retrieved October 14, 2013.
  12. "Teradata Lifts the Limitations on Open Source R Analytics". www.teradata.com. Retrieved December 17, 2015.
  13. "Teradata Launches Next-Generation Big Data Apps". www.teradata.com. Retrieved December 17, 2015.
  14. "Breakthrough Teradata Software Pushes the Analytic Edge with Internet of Things Data". www.teradata.com. Retrieved December 17, 2015.
  15. "Data Analytics Summit: The premiere event for supercharging analytics on big data". Official web site. Retrieved August 12, 2011.
  16. "Tasso Argyros, CTO, Aster Data and World Economic Forum Technology Pioneer 2011". World Economic Forum. Retrieved August 12, 2011.
  17. "Aster Data Honored as "Company to Watch" in the 2010 Intelligent Enterprise Editors' Choice Awards" (Press release). Aster Data. March 8, 2010. Retrieved August 12, 2011.
  18. "Intelligent Enterprise Editors' Choice Awards 2010". Information Week. February 11, 2010. p. 5. Retrieved August 12, 2011.
  19. "The Top 50 Venture-Backed Companies". Wall Street Journal. March 20, 2011. Archived from the original on June 29, 2011. Retrieved August 12, 2011.

External links