
SAP HANA

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 8.28.150.80 (talk) at 20:23, 20 May 2013. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

SAP HANA
Developer(s): SAP AG
Stable release: SAP HANA 1.0 SPS5 / November 14, 2012
Written in: C, C++
Available in: Multi-lingual
Type: In-memory RDBMS
License: Proprietary
Website: www.saphana.com, www.sap.com/hana, SAP Community Network

SAP HANA is SAP AG’s implementation of in-memory database technology. There are five components within the software group:[1]

  • SAP HANA DB (or HANA DB) refers to the database technology itself,
  • SAP HANA Studio refers to the suite of tools provided by SAP for modeling,
  • SAP HANA Appliance refers to HANA DB as delivered on partner certified hardware (see below) as an appliance. It also includes the modeling tools from HANA Studio as well as replication and data transformation tools to move data into HANA DB,[2]
  • SAP HANA One refers to a deployment of SAP HANA certified for production use on the Amazon Web Services (AWS) cloud.[3] (see below)
  • SAP HANA Application Cloud refers to the cloud based infrastructure for delivery of applications (typically existing SAP applications rewritten to run on HANA).

HANA DB takes advantage of the low cost of main memory (RAM), the data processing abilities of multi-core processors and the fast data access of solid-state drives relative to traditional hard drives to deliver better performance for analytical and transactional applications. It offers a multi-engine query processing environment that supports relational data (with both row- and column-oriented physical representations in a hybrid engine) as well as graph and text processing for semi-structured and unstructured data management within the same system. HANA DB is 100% ACID compliant.[2]

While HANA has variously been described as an acronym for HAsso's New Architecture[4] (a reference to SAP founder Hasso Plattner) and High Performance ANalytic Appliance, HANA is a name, not an acronym.[5]

History

SAP HANA is the synthesis of three separate products – TREX, P*Time and MaxDB.

  1. TREX (Text Retrieval and Extraction) is a search engine. It began in 1996 as a student project at SAP in collaboration with DFKI. TREX became a standard component in SAP NetWeaver in 2000. In-memory attributes were added in 2002 and a columnar data store was added in 2003, both as ways to enhance performance.
  2. In 2005 SAP acquired Menlo Park-based Transact in Memory, Inc.[6] With the acquisition came P*Time, an in-memory, light-weight online transaction processing (OLTP) RDBMS technology with a row-based data store.
  3. MaxDB (formerly SAP DB), a relational database coming from Nixdorf via Software AG (Adabas D) to SAP, was added to TREX and P*Time to provide persistence and more traditional database features like backup.

In 2008, SAP CTO Vishal Sikka wrote about HANA "...our teams working together with the Hasso Plattner Institute and Stanford University demonstrated how a new application architecture is possible, one that enables real-time complex analytics and aggregation, up to date with every transaction, in a way never thought possible in financial applications".[4] In 2009, a development initiative was launched at SAP to integrate the three technologies above to provide a more comprehensive feature set. The resulting product was known internally and externally as NewDB until the change to HANA DB was finalized in 2011.

SAP HANA is not SAP's first in-memory product. Business Warehouse Accelerator (BWA, formerly termed BIA) was designed to accelerate queries by storing BW infocubes in memory. This was followed in 2009 by Explorer Accelerated, in which SAP combined the Explorer BI tool with BWA as a tool for performing ad-hoc analyses. Other SAP products using in-memory technology were CRM Segmentation, ByDesign (for analytics) and Enterprise Search (for role-based search on structured and unstructured data). All of these were based on the TREX engine.

Taking a different approach, Advanced Planning and Optimization (APO) used LiveCache for its analytics.

Versions, service packs

SAP co-founder (and Chairman of the SAP Supervisory Board as of 2012) Hasso Plattner advocated a ‘versionless’ system for releases. The support packages to date have been:[1]

  • SP0 – released 20 November 2010; HANA first public release
  • SP1 – released 20 June 2011; HANA general availability (GA); focus is as an operational data mart
  • SP2 – released 27 June 2011; more data mart functions
  • SP3 a.k.a. HANA 1.5 – released 7 November 2011; focus is on HANA as the underlying database under Business Warehouse (BW); also named Project Orange
  • SP4 – Q2, 2012; resolved a variety of stability issues and added new features for BW, according to SAP
  • SP5 – Feb, 2013; introduces Extended Application Services (REST driver)[7][8]

Market position

Big data

Big data refers to datasets that exceed the abilities of commonly used tools. While no formal definition based on size exists, these datasets typically reach terabytes (TB), petabytes (PB), or even exabytes in size. SAP has positioned HANA as its solution to big data challenges at the low end of this scale.[9] At launch, HANA started with 1TB of RAM supporting up to 5TB of uncompressed data. In late 2011, hardware with 8TB of RAM became available, which supported up to 40TB of uncompressed data. SAP-owned Sybase IQ, with its more mature MapReduce-like functionality, has been cited as a potentially better fit for larger datasets.[9][10] By May 2012, HANA was able to run on IBM servers with 100TB of main memory. Hasso Plattner claimed that the system was big enough to run the eight largest SAP customers.[11]
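The two hardware configurations quoted above imply the same ratio of supported uncompressed data to installed RAM. The short sketch below only restates that arithmetic; the 5:1 ratio is an inference from the quoted figures rather than a number published by SAP, and the function name is hypothetical.

```python
# Ratio of supported uncompressed data to installed RAM, derived from the
# two configurations quoted in the text (an inference, not an SAP figure).
CONFIGS_TB = [
    (1, 5),    # launch: 1 TB RAM, up to 5 TB uncompressed data
    (8, 40),   # late 2011: 8 TB RAM, up to 40 TB uncompressed data
]

def data_to_ram_ratio(ram_tb, data_tb):
    """Return how many TB of uncompressed data are supported per TB of RAM."""
    return data_tb / ram_tb

ratios = [data_to_ram_ratio(ram, data) for ram, data in CONFIGS_TB]
print(ratios)
```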

Other databases marketed by SAP

SAP still offers other database products:

As a database-agnostic company,[12] SAP also resells databases from vendors such as IBM, Oracle and Microsoft to sit under its ERP Business Suite.

Competition

Offering its own database solution to support its Business Suite ERP puts SAP in direct competition with some of its largest partners: IBM, Microsoft and Oracle. Among the more prominent competing products are:

  • Appliances

Applications

Strategic workforce planning

SAP Business Objects Strategic Workforce Planning (SWP) was among the first SAP applications to be redesigned to take advantage of HANA's abilities. SWP on HANA is aimed at HR executives who want to simulate workforce models in real time, taking into account turnover, retirement, hiring and other variables.[13]

Smart Meter Analytics

In September 2011 SAP released its Smart Meter Analytics tool. The tool is intended to help utility companies with large smart meter deployments manage and use the large amount of data generated by such meters.

Ecosystem

Hardware Partners

As of 2012, seven partners have hardware solutions certified for HANA.[1][14] In alphabetical order, they are:

  1. Cisco[15]
  2. Dell[16]
  3. Fujitsu[17]
  4. Hitachi[18]
  5. HP[19]
  6. IBM[20]
  7. NEC[21]

Developers Community

The focal point of the community of developers on the SAP HANA platform is the SAP HANA Developer Center, or "the DevCenter". The DevCenter offers general information, education materials and community forums, plus access to the SAP HANA database with free licenses.

Access to some materials and features may require free registration.

SAP HANA Cloud Options

In September 2011 SAP announced its intention to partner with EMC and VMware to enable a HANA-based application infrastructure cloud.[22] This platform as a service (PaaS) offering includes HANA DB-as-a-service in conjunction with a choice of either a Java-based or an ABAP-based stack. Applications built for either stack will have access to HANA DB through a variety of APIs. The Java-based approach, codenamed Project River, is based on the NetWeaver 7.3.1 Java application server. The ABAP-based approach is designed more for SAP's existing user base, for example in the SAP Business ByDesign suite of business applications including ERP, CRM and supply chain management.[23]

On October 16, 2012 SAP announced general availability of two SAP HANA options delivered in the cloud:[3]

  • SAP NetWeaver Cloud (now called SAP HANA Cloud[24]) – an open standards-based application service and
  • SAP HANA One – a deployment of SAP HANA on the Amazon Web Services cloud, billed on an hourly basis. Only a 60GB option is available, and an instance running 24/7 costs $30,572/year,[25] though an upfront commitment with Amazon can substantially reduce the hardware portion of the cost.
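Because SAP HANA One is billed by the hour, the annual figure quoted above can be converted into an implied average hourly rate. The sketch below assumes the quoted $30,572 covers 365 days of continuous use with no upfront commitment; the hourly rate it prints is derived here and is not a price quoted by the source, and the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year of continuous (24/7) use

def implied_hourly_rate(annual_cost_usd, hours=HOURS_PER_YEAR):
    """Average cost per hour implied by an annual total."""
    return annual_cost_usd / hours

hourly = implied_hourly_rate(30_572)
print(f"${hourly:.2f} per hour")
```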

Technology

Architecture

At its most basic, the architecture of the HANA database system has the following components.[2]

  • Four Management services
      • The Connection and Session Management component manages sessions/connections for database clients. Clients can use a variety of languages to communicate with the HANA database.
      • The Transaction Manager component helps with ACID compliance by coordinating transactions, controlling transactional isolation and tracking running and closed transactions.
      • The Authorization Manager component handles all security and credentialing (see Security below).
      • The Metadata Manager component manages all metadata such as table definitions, views, indexes and the definition of SQL Script functions. All metadata, even of different types, is stored in a common catalog.
  • Three Database Engine components
      • The Calculation Engine component executes calculation models received from SQL Script (and other) compilers.
      • The Optimizer and Plan Generator component parses and optimizes client requests.
      • The Execution Engine component invokes the various In-Memory Processing Engines and routes intermediate results between consecutive execution steps based on the optimized execution plan.
  • Three In-Memory Storage Engines
      • Relational Engine (see Column and row store below)
      • Graph Engine (see Unstructured data below)
      • Text Engine (see Unstructured data below)
  • Persistency Layer (see Storage below)

Column and row store

The Relational Engine supports both row- and column-oriented physical representations of relational tables. A system administrator specifies at definition time whether a new table is to be stored in a row- or in a column-oriented format. Row- and column-oriented database tables can be seamlessly combined into one SQL statement, and subsequently, tables can be moved from one representation form to the other.

The row store is optimized for concurrent WRITE and READ operations. It keeps all index structures in-memory rather than persisting them on disk. It uses a technology that is optimized for concurrency and scalability in multi-core systems. Typically, metadata or rarely accessed data is stored in a row-oriented format.

By contrast, the column store is optimized for performance of READ operations. Column-oriented data is stored in a highly compressed format in order to improve the efficiency of memory resource usage and to speed up the data transfer from storage to memory or from memory to CPU. The column store offers significant advantages in terms of data compression, enabling access to larger amounts of data in main memory. Typically, user and application data is stored in a column-oriented format to benefit from the high compression rate and from the highly optimized access for selection and aggregation queries.
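The difference between the two layouts can be illustrated with a small sketch. It is not HANA code: the sample table and all names are hypothetical, and dictionary encoding is used only as a common example of column compression, since the text does not name the compression scheme HANA applies.

```python
# Hypothetical sample table: (order_id, country, amount)
rows = [
    (1, "DE", 100),
    (2, "US", 250),
    (3, "DE", 75),
    (4, "DE", 300),
    (5, "US", 50),
]

# Row layout: one tuple per record, convenient for reading or writing a whole record.
row_store = list(rows)

# Column layout: one list per attribute, convenient for scans and aggregates.
column_store = {
    "order_id": [r[0] for r in rows],
    "country": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

def dictionary_encode(values):
    """Replace each value with a small integer code into a sorted dictionary."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]

# Repeated values such as country codes compress well: the column stores
# one small integer per row plus a short dictionary of distinct values.
country_dict, country_codes = dictionary_encode(column_store["country"])

# An aggregate touches only the columns it needs (country and amount here).
total_de = sum(
    amount
    for code, amount in zip(country_codes, column_store["amount"])
    if country_dict[code] == "DE"
)
print(country_dict, country_codes, total_de)
```

Fetching a single complete order is a one-step lookup in `row_store`, whereas the column layout has to read one entry from each column list; the aggregate shows the opposite trade-off.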

Business Function Library

The Business Function Library is a reusable library (similar to stored procedures) for business applications embedded in the HANA calculation engine. This eliminates the need for developing such calculations from scratch. Some of the functions offered are

Predictive Analysis Library

Similar to the Business Function Library, the Predictive Analysis Library is a collection of compiled analytic functions for predictive analytics. Among the algorithms supported are

R integration

R is a programming language designed for statistical analysis. An open source initiative (under the GNU Project), R is integrated in HANA DB via TCP/IP. HANA uses SQL-SHM, a shared memory-based data exchange, to incorporate R’s vertical data structure. HANA also introduces R scripts equivalent to native database operations like join or aggregation.[26] HANA developers can write R scripts within SQL, and the data types are converted automatically by HANA. R scripts can be invoked in SQLScript with HANA tables as both input and output. An R environment must be deployed before R can be used within SQLScript.[27][28]

Storage

The Persistency Layer is responsible for the durability and atomicity of transactions. It manages data and log volumes on disk and provides interfaces for writing and reading data that are leveraged by all storage engines. This layer is based on the proven persistency layer of MaxDB, SAP’s commercialized disk-centric relational database. The persistency layer ensures that the database is restored to the most recent committed state after a restart and that transactions are either completely executed or completely undone. To achieve this efficiently, it uses a combination of write-ahead logs, shadow paging, and savepoints.
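How a redo log and savepoints combine to restore the last committed state can be sketched with a toy key-value store. This is an illustrative model only, not the MaxDB or HANA implementation: the class, its record format and its method names are all assumptions, and shadow paging is left out entirely.

```python
class TinyStore:
    """Toy key-value store with a redo log and savepoints (illustrative only)."""

    def __init__(self):
        self.memory = {}      # in-memory state, lost on a crash
        self.savepoint = {}   # stand-in for the last image written to the data volume
        self.log = []         # stand-in for the log volume
        self.pending = {}     # uncommitted writes per transaction id

    def write(self, txn_id, key, value):
        # Write-ahead: the log record is appended before memory is changed.
        self.log.append(("WRITE", txn_id, key, value))
        self.pending.setdefault(txn_id, []).append((key, value))

    def commit(self, txn_id):
        self.log.append(("COMMIT", txn_id))
        for key, value in self.pending.pop(txn_id, []):
            self.memory[key] = value

    def take_savepoint(self):
        # Persist the committed state; only log records of transactions
        # that are still open need to be kept.
        self.savepoint = dict(self.memory)
        self.log = [rec for rec in self.log if rec[1] in self.pending]

    def crash_and_recover(self):
        # Start from the last savepoint, then redo committed work from the log.
        self.memory = dict(self.savepoint)
        self.pending = {}
        buffered = {}
        for rec in self.log:
            if rec[0] == "COMMIT":
                for key, value in buffered.pop(rec[1], []):
                    self.memory[key] = value
            else:
                _, txn_id, key, value = rec
                buffered.setdefault(txn_id, []).append((key, value))
        # Writes left in `buffered` were never committed and are discarded.

store = TinyStore()
store.write(1, "a", 10)
store.commit(1)
store.take_savepoint()
store.write(2, "b", 20)
store.commit(2)
store.write(3, "c", 30)      # transaction 3 never commits
store.crash_and_recover()
print(store.memory)
```

After recovery the store contains the savepoint image plus transaction 2, which committed after the savepoint, while the uncommitted write of transaction 3 is gone.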

Buffer management

Logging and transactions

HANA's persistence layer manages logging of all transactions in order to provide standard backup and restore functions. The same persistence layer manages both row and column stores. It offers regular savepoints and logging of all database transactions since the last savepoint.[29]

Concurrency and locking

HANA DB uses the multiversion concurrency control (MVCC) principle for concurrency control. This enables long-running read transactions without blocking update transactions. MVCC, in combination with a time-travel mechanism, allows temporal queries inside the Relational Engine.[2][30]
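The principle can be sketched with a toy versioned table in which writers append versions and readers pick the version visible at their snapshot. This is a simplified model under stated assumptions (a single logical clock, one write per transaction, no garbage collection); the class and method names are hypothetical and do not reflect HANA internals.

```python
class VersionedTable:
    """Toy multiversion store: each key keeps (commit_ts, value) versions."""

    def __init__(self):
        self.versions = {}   # key -> list of (commit_ts, value), ascending by commit_ts
        self.clock = 0       # logical commit timestamp

    def commit_write(self, key, value):
        # Writers append a new version; existing versions are never overwritten.
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def snapshot(self):
        # A reader remembers the timestamp at which it started.
        return self.clock

    def read(self, key, as_of):
        # Return the newest version committed at or before the given timestamp.
        result = None
        for commit_ts, value in self.versions.get(key, []):
            if commit_ts <= as_of:
                result = value
            else:
                break
        return result

table = VersionedTable()
table.commit_write("stock", 100)       # committed at timestamp 1
reader_ts = table.snapshot()           # a long-running reader starts here
table.commit_write("stock", 90)        # committed at timestamp 2, reader is not blocked

old_view = table.read("stock", reader_ts)          # what the reader still sees
new_view = table.read("stock", table.snapshot())   # what a new reader sees
print(old_view, new_view)
```

Passing an older timestamp to `read` is the same mechanism a temporal ("time-travel") query relies on: the historical version is still stored and can be selected by timestamp.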

Data retrieval

Reporting

Unstructured data

Since ever more applications require the enrichment of normally structured data with semi-structured, unstructured, or text data, the HANA database provides a text search engine in addition to its classic relational query engine.

The Graph Engine supports the efficient representation and processing of data graphs with a flexible typing system. A new dedicated storage structure and a set of optimized base operations are introduced to enable efficient graph operations via the domain-specific WIPE query and manipulation language. The Graph Engine is positioned to optimally support resource planning applications with huge numbers of individual resources and complex mash-up interdependencies. The flexible type system additionally supports the efficient execution of transformation processes, like data cleansing steps in data-warehouse scenarios, to adjust the types of the individual data entries, and it enables the ad-hoc integration of data from different sources.

The Text Engine provides text indexing and search abilities, such as exact search for words and phrases, fuzzy search (which tolerates typing errors), and linguistic search (which finds variations of words based on linguistic rules). In addition, search results can be ranked and federated search abilities support searching across multiple tables and views. This functionality is available to applications via specific SQL extensions. For text analyses, a separate Preprocessor Server is used that leverages SAP’s Text Analysis library.[2]
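The difference between exact and fuzzy search can be illustrated with a small inverted index. The sketch uses Python's standard `difflib` module as a stand-in for error-tolerant matching; it is not the algorithm used by the Text Engine, the documents and the cutoff value are hypothetical, and linguistic search and ranking are not modelled.

```python
import difflib
from collections import defaultdict

# Hypothetical documents keyed by id.
documents = {
    1: "invoice for database appliance",
    2: "memory upgrade for analytics server",
    3: "database replication manual",
}

# Inverted index: token -> set of document ids containing it.
index = defaultdict(set)
for doc_id, text in documents.items():
    for token in text.lower().split():
        index[token].add(doc_id)

def exact_search(term):
    """Return ids of documents containing exactly this token."""
    return sorted(index.get(term.lower(), set()))

def fuzzy_search(term, cutoff=0.8):
    """Return ids of documents containing a token similar to the term."""
    matches = difflib.get_close_matches(term.lower(), list(index), n=5, cutoff=cutoff)
    hits = set()
    for token in matches:
        hits |= index[token]
    return sorted(hits)

print(exact_search("database"))   # correctly spelled term
print(exact_search("databse"))    # typing error: exact search finds nothing
print(fuzzy_search("databse"))    # fuzzy search tolerates the error
```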

Data provisioning

Replication services

The figure above gives an overview of the alternative methods for data replication from a source system to a HANA database. Each method handles the required data replication differently, and consequently each method has different strengths. Which method is most suitable depends on the specific application field and the existing system landscape.

Trigger-Based Data Replication Using SAP Landscape Transformation (LT) Replication Server is based on capturing database changes at a high level of abstraction in the source ERP system. This method of replication benefits from being database-independent, and can also parallelize database changes on multiple tables or by segmenting large table changes.

Extract, transform, load (ETL) based data replication uses SAP BusinessObjects Data Services to extract the relevant business data from a source system such as ERP and load it into a HANA database. In addition, the ETL-based method offers options for the integration of third-party data providers. Replication jobs and data flow are configured in Data Services. This permits the use of multiple data sources (including external ones) and data validation.[30]

Transaction Log-Based Data Replication Using Sybase Replication is based on capturing table changes from low-level database log files. This method is database-dependent. Database changes are propagated for each database transaction, and they are then replayed on the HANA database. This maintains transactional consistency, but at the cost of being unable to parallelize the propagation of changes.[30]
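The trade-off between the trigger-based and the log-based method can be sketched on the apply side only. The model below is an assumption-laden simplification: it does not capture changes with triggers or read real log files, the change records and function names are hypothetical, and it only contrasts sequential replay in log order with per-table loading in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical change records captured on the source: (txn_id, table, key, value)
changes = [
    (1, "orders", "o1", "new"),
    (1, "items", "i1", "widget"),
    (2, "orders", "o1", "shipped"),
    (3, "items", "i2", "gadget"),
]

def replay_in_log_order(changes):
    """Log-based style: apply every change sequentially, in transaction order."""
    target = {}
    for _txn, table, key, value in changes:
        target.setdefault(table, {})[key] = value
    return target

def replay_per_table(changes):
    """Trigger-based style: group changes per table and load tables in parallel."""
    by_table = {}
    for _txn, table, key, value in changes:
        by_table.setdefault(table, []).append((key, value))

    def load(table_changes):
        rows = {}
        for key, value in table_changes:   # order is preserved within one table
            rows[key] = value
        return rows

    with ThreadPoolExecutor() as pool:
        loaded = pool.map(load, by_table.values())
        return dict(zip(by_table.keys(), loaded))

sequential_result = replay_in_log_order(changes)
parallel_result = replay_per_table(changes)
print(sequential_result == parallel_result)
```

Both functions reach the same final state here, but only the sequential replay guarantees that the target never shows part of a transaction that spans several tables, which is the consistency property the log-based method trades parallelism for.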

Operations, administration

Backup and recovery

Immediately after launch, with Service Pack 2, backup and recovery abilities were limited to either Recovery to Last Backup or Older Data Backup, or Recovery to Last State Before Crash. Additional backup features were implemented in Service Pack 3. These included a Full Automatic or Manual Log Backup option and a Point-in-Time Recovery option. New administration features included a Backup Catalog which records all backup attempts.[31]

Modeling

Non-materialized views

One implication of HANA’s ability to work with a full database in memory is that computationally intensive KPI calculations can be completed rapidly when compared to disk-based databases. Pre-aggregation of data in cubes or storage of results in materialized views is no longer necessary.[32]
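The idea of a non-materialized view can be sketched as an aggregate that is recomputed from the base rows on every request instead of being stored. The data and function name below are hypothetical, and the sketch says nothing about how HANA evaluates its views internally.

```python
from collections import defaultdict

# Hypothetical in-memory fact rows: (region, revenue)
sales = [("EMEA", 120.0), ("APJ", 80.0), ("EMEA", 30.0)]

def revenue_by_region(rows):
    """Acts like a non-materialized view: computed from base rows on every call."""
    totals = defaultdict(float)
    for region, revenue in rows:
        totals[region] += revenue
    return dict(totals)

before = revenue_by_region(sales)
sales.append(("APJ", 20.0))        # a new transaction arrives
after = revenue_by_region(sales)   # the result reflects it without a refresh step
print(before, after)
```

A materialized view or pre-aggregated cube would instead store `before` and need an explicit refresh to reach `after`; computing on demand avoids that step at the cost of scanning the base data each time.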

Information Composer

SAP HANA Information Composer is a web-based tool which allows users to upload data to a HANA database and manipulate that data by creating Information Views. In the data acquisition portion, data can be uploaded, previewed and cleansed. In the data manipulation portion, objects can be selected, combined and placed in Information Views which can be used by SAP BusinessObjects tools.[33]

Security

Security and role-based permissions are managed by the Authorization Manager in HANA DB. Besides standard database privileges such as create, update or delete, HANA DB also supports analytical privileges, which represent filters or drill-down limitations on queries, as well as privileges that control access to values with certain attributes. HANA DB components invoke the Authorization Manager whenever they need to check user privileges. Authentication can be done either by the database itself or be delegated to an external authentication provider, such as an LDAP directory.[2]
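An analytical privilege that acts as a filter on query results can be sketched as a per-user set of allowed attribute values. The privilege model, users and data below are hypothetical and are not HANA's privilege syntax; the sketch only shows the filtering effect described above.

```python
# Hypothetical privilege model: each user maps to allowed values per attribute.
analytical_privileges = {
    "alice": {"region": {"EMEA"}},
    "bob": {"region": {"EMEA", "APJ"}, "year": {2012}},
}

rows = [
    {"region": "EMEA", "year": 2011, "revenue": 10},
    {"region": "EMEA", "year": 2012, "revenue": 20},
    {"region": "APJ", "year": 2012, "revenue": 30},
]

def authorized_rows(user, rows):
    """Return only rows whose attribute values the user's privileges allow."""
    restrictions = analytical_privileges.get(user)
    if restrictions is None:
        return []   # no privilege entry: nothing is visible
    return [
        row for row in rows
        if all(row[attr] in allowed for attr, allowed in restrictions.items())
    ]

# The same aggregate query yields different results per user.
alice_total = sum(r["revenue"] for r in authorized_rows("alice", rows))
bob_total = sum(r["revenue"] for r in authorized_rows("bob", rows))
print(alice_total, bob_total)
```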

Performance and scalability

SAP has stated that customers have realized query performance gains as high as 100,000x when compared to disk-based database systems.[34]

Benchmarks

In March 2011, Wintercorp (an independent testing firm specializing in large scale data management) was retained by SAP to audit test specifications and results from test runs. The test used concepts similar to those of the industry standard TPC-H benchmark. The test data had between 600 million and 1.8 billion rows, and the test ran five analytical query types and three operational report query types. The combined throughput of analytical and operational report queries ranged between 3,007 and 10,042 queries per hour, depending on the volume of data.[35]

Scale-out architecture

To enable scalability in terms of data volumes and the number of application requests, the HANA database supports scale-up and scale-out. For scale-up, all algorithms and data structures are designed to work on large multi-core architectures especially focusing on cache-aware data structures and code fragments. For scale-out, the HANA database is designed to run on a cluster of individual machines allowing the distribution of data and query processing across multiple nodes.[2]
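Distributing data and query processing across nodes can be sketched with hash partitioning and a scatter-gather aggregate. The cluster size, the hashing scheme and all names are assumptions made for illustration; the text does not describe how HANA actually partitions data.

```python
import zlib

NODE_COUNT = 3  # hypothetical cluster size

def node_for(key):
    """Pick a node by hashing the key; CRC32 is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % NODE_COUNT

# Each node holds one partition of the data.
nodes = [dict() for _ in range(NODE_COUNT)]

def insert(key, amount):
    nodes[node_for(key)][key] = amount

def distributed_sum():
    # Scatter: every node aggregates its own partition.
    partials = [sum(partition.values()) for partition in nodes]
    # Gather: a coordinator combines the partial results.
    return sum(partials)

for i in range(100):
    insert(f"order-{i}", i)

total = distributed_sum()
stored = sum(len(partition) for partition in nodes)
print(total, stored)
```

Only the small partial sums travel to the coordinator, which is why aggregates of this kind can scale with the number of nodes.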

Competitors

Competing in-memory databases for online transaction processing and analytics workloads include:

References

  1. ^ a b c Appleby, John. "Updated: The SAP HANA FAQ - answering key SAP In-Memory questions". Bluefin Solutions (Corporate Blog). Retrieved 23 January 2012.
  2. ^ a b c d e f g Färber, Franz (2011). "SAP HANA Database – Data Management for Modern Business Applications" (PDF). SIGMOD Record. 40 (4): 45–51. Retrieved 24 January 2012.
  3. ^ a b "SAP Introduces SAP HANA® Cloud, One of the Industry's First In-Memory Cloud Platforms". SAP (Corporate Press release). Retrieved 20 March 2013.
  4. ^ a b Sikka, Vishal. "Timeless Software". Timelessness / Blogger.com (personal blog). Retrieved 19 January 2012.
  5. ^ Desmond, Paul (11 August 2011). "SAP HANA – Updating the Naming Conventions". ERP Executive. Retrieved 23 January 2012.
  6. ^ "Transact In Memory, Inc". Bloomberg Businessweek. Retrieved 19 January 2012.
  7. ^ "SAP HANA forum: SP5 availability". Retrieved 7 March 2013.
  8. ^ Jung, Thomas. "SAP HANA Extended Application Services". Retrieved 7 March 2013.
  9. ^ a b Woods, Dan (5 January 2012). "Bringing Value of Big Data to Business: SAP's Integrated Strategy". Forbes. Retrieved 23 January 2012.
  10. ^ Rudnytskiy, Vitaliy. "Big Data and SAP HANA? Or Sybase IQ?". Vital BI / Wordpress (Personal Blog). Retrieved 23 January 2012.
  11. ^ "IBM and SAP create the world's largest SAP HANA system". IBM.
  12. ^ Greenbaum, Joshua. "A Revolution Threatens the Relational Database". IT Business Edge. Retrieved 23 January 2012.
  13. ^ "HANA and the Future of Business Intelligence". EPI-USE Systems Limited. Retrieved 19 March 2013.
  14. ^ Appleby, John. "SAP HANA: an analysis of the major hardware vendors". People, Process, Technology (personal blog). Retrieved 19 January 2012.
  15. ^ "SAP High-Performance Analytic Appliance" (PDF). Cisco Systems, Inc. Retrieved 19 January 2012.
  16. ^ "Dell Strengthens ERP Solutions Portfolio with PowerEdge R910 Server Now Certified to Run SAP® In-Memory Appliance (SAP HANA™)". Dell Inc. Retrieved 19 January 2012.
  17. ^ "SAP Solutions: SAP High Performance Analytic Appliance" (PDF). Fujitsu Limited. Retrieved 19 January 2012.
  18. ^ "New Hitachi Converged Platform for SAP HANA Helps Organizations Manage and Analyze Massive Volumes of Critical Data". Hitachi Data Systems Corporation. Retrieved 19 January 2012.
  19. ^ "HP AppSystem for SAP HANA™". Hewlett-Packard Development Company, L.P.
  20. ^ "IBM Systems and Services for SAP HANA". IBM.
  21. ^ "NECs Appliance Server for SAP HANA(R) Certified by SAP". NEC.
  22. ^ Scott, Jennifer (9 November 2011). "SAP holds hands with EMC and VMware for cloud computing push". CloudPro. Retrieved 25 January 2012.
  23. ^ Pezzini, Massimo; Sholler, Daniel. "SAP Throws Down the Next-Generation Architecture Gauntlet With HANA (Research Note G00219001)". Gartner. Retrieved 25 January 2012.
  24. ^ "SAP HANA® Cloud Portal Evaluated by Gartner". Retrieved 20 March 2013.
  25. ^ "SAP HANA One AWS Marketplace". Retrieved 24 March 2013.
  26. ^ Große, Philipp (3 September 2011). "Bridging Two Worlds with RICE: Integrating R into the SAP In-Memory Computing Engine" (PDF). Proceedings of the VLDB Endowment. 4 (12): 1307–1317. Retrieved 25 January 2012.
  27. ^ "HANA Pocketbook for Developers - DRAFT" (PDF). SAP. Retrieved 23 January 2012.
  28. ^ Aswani, Jitender. "Advanced Analytics with R and SAP HANA". Slideshare. Retrieved 14 March 2012.
  29. ^ "SAP HANA - Overview and Architecture". ERPHowTos.com. Retrieved 23 January 2012.
  30. ^ a b c "SAP HANA Technical Operations Manual" (PDF). SAP. Retrieved 23 January 2012.
  31. ^ Holder, Steve. "Why Hana: Where (and When) HANA Fits in Your Company's Analytics Strategy" (PDF). SAP Canada. Retrieved 25 January 2012.
  32. ^ Sevilla, Manuel. "OLAP databases are being killed by In-Memory solutions". CapGemini. Retrieved 25 January 2012.
  33. ^ "SAP HANA Information Composer" (PDF). SAP. Retrieved 25 January 2012.
  34. ^ Sikka, Vishal (29 December 2011). "The renewal of enterprise landscapes". Financial Times. Retrieved 26 January 2012.
  35. ^ Winter, Richard. "Audit Letter for the SAP HANA Performance Test, March 16, 2011" (PDF). Wintercorp. Retrieved 24 January 2012.