Sybase IQ

From Wikipedia, the free encyclopedia
Jump to: navigation, search

SAP IQ (also known as SAP Sybase IQ and Sybase IQ) is a column-based, petabyte scale, relational database software system used for business intelligence, data warehousing, and data marts. Produced by Sybase Inc., now an SAP company, its primary function is to analyze large amounts of data in a low-cost, highly available environment. SAP IQ is often credited[1] with pioneering the commercialization of column-store technology.

At the foundation of SAP IQ lies high-performance column store technology that allows for speed compression and ad-hoc analysis without additional tuning, as well as high scalability and cloud enablement. SAP IQ also provides application services enabling developers to build smarter apps. SAP IQ comes with in-database analytics, multilingual client APIs, federation and web enablement. It includes IQ database drives for web 2.0 programming environments and an extensibility framework for embedding analytics inside the database. The top layer comes with an ecosystem of partner solutions, tools, and apps that are embedded within SAP IQ through the application services layer. This includes Business Intelligence (BI) tools, data integration tools, database admin (DBA) tools, and packaged apps. Everything is encapsulated within a grid architecture.

SAP IQ has an open interface approach towards its ecosystem. Many popular commercial and open source business intelligence and data integration tools are certified to work with SAP IQ. Moreover, now as part of SAP, SAP IQ is also integrated with SAP’s Business Intelligence portfolio of products to form an end-to-end business analytics software stack, and is an integral component of SAP's In-Memory Data Fabric Architecture and Data Management Platform.

Awards & Recognitions[edit]

World’s Largest Data Warehouse Guinness World Record 12.1 PB – SAP IQ, SAP HANA, together with technology partners BMMsoft, HP, Intel, NetApp, and Red Hat generated a world record for the world’s largest data warehouse. “A team of engineers from SAP, BMMsoft, HP, Intel, NetApp, and Red Hat, built the data warehouse using SAP HANA and SAP IQ 16, with BMMsoft Federated EDMT running on HP DL580 servers using Intel® Xeon® E7-4870 processors under Red Hat Enterprise Linux 6 and NetApp FAS6290 and E5460 storage. The development and testing of the 12.1PB data warehouse was conducted by the team at the SAP/Intel Petascale lab in Santa Clara, Calif., and audited by InfoSizing, an independent Transaction Processing Council certified auditor.”

This new world record demonstrates SAP HANA’s and SAP IQ’s ability to efficiently handle extreme-scale enterprise data warehouse and big data analytics. Read the whitepaper on Building a 12 Petabyte Data Warehouse conducted by Infosizing

History[edit]

In the early 1990s, Waltham, Massachusetts-based Expressway Technologies, Inc. developed the Expressway 103, a column-based, engine optimized for analytics, that would eventually become Sybase IQ. Sybase acquired Expressway and re-introduced the product in 1995 as IQ Accelerator, then renamed it shortly thereafter to Sybase IQ, giving it version number 11.0.[2]

By offering the IQ product as part of a collection of related technologies often found in a data warehouse (including Sybase Adaptive Server Enterprise, Replication Server, PowerDesigner PowerDesigner, and SQL Anywhere), Sybase became one of the first mainstream companies to acknowledge the need for specialized products for the data warehouse market.[3]

With version 12.0, Sybase replaced the loosely coupled query interface from Adaptive Server Enterprise with a tight coupling with SQL Anywhere.

Version 16 brings a re-engineered column store for extreme, petabyte scale, data volumes, and more extreme data compression.[4]

Version History

With the release of SP08, the version numbers have been changed to align with and match SAP HANA’s version numbers to reflect the product’s continuous integration with SAP HANA. The actual release title SP03 is a follow-on to SP02, covering all platforms not affected by the release.

  • Version 16 SP03: Released December 2013 (Follow-on to release SP02, for all platforms not released in SP02)
  • Version 16 SP02: Released November 2013 (Linux only, provides HANA functionality support for HANA SP07 release) This functionality is available across all other platforms with the next SP release, SP03[5]
  • Version 16 SP01: Released May 2013 (Made IQ 16.0 available on IBM AIX 64-bit, HP-UX Itanium 64-bit, Linux on Power 64-bit, Solaris SPARC 64-bit, and Solaris UltraSPARC T 64-bit, which were not available with the 16.0 release in March)
  • Version 16: Released March 2013 (Re-engineered column store for petabyte scale analytics, and extreme compression, N-bit dictionary compression, data affinity for “Share Nothing” MPP query performance in a flexible “shared Everything” architecture, Hash Partitioned tables, super charged bulk loading, tiered indexing, in-memory write-optimized store, distributed query processing, scaled out query engine, role-based access control (RBAC ), LDAP authentication, added multiplex resiliency
  • Version 15.4: Released November, 2011 (Enhanced software architecture and design that supports distributed query processing for Big Data analytics with enhanced in-database analytics framework to support a native MapReduce API)[6]
  • Version 15.3: Released June, 2011 (Added software architecture and design that supports distributed query processing, Massively Parallel Processing [MPP] grid architecture with a shared-everything approach)[4]
  • Version 15.2: Released June, 2010 (Added query federation, ability to analyze structured and unstructured text data)
  • Version 15.1: Released July, 2009 (Added in-database analytics for running external C++ code inside the engine)[4]
  • Version 15.0: Released March, 2009 (Revamped the query and loading foundation to take advantage of new multi-core chip architectures and multi-tier storage technologies)[7] This release set the foundation of the Very Large Database (VLDB) platform.
  • Version 12.7: Released August 2006
  • Version 12.6: Released December 2004
  • Version 12.5: Released October 2002
  • Version 12.4.3: Released May 2001
  • Version 12.4.2: Released June 2000
  • Version 12.4: Released September 1999 (as Sybase IQ)
  • Version 12: Released February, 1999 (as Adaptive Server IQ; included integration with SQL Anywhere)
  • Version 11: Released December 1994 (as IQ Accelerator)

In-Memory Data Fabric[edit]

SAP’s revolutionary approach streamlines and simplifies Data Warehousing into an In-Memory Data Fabric[8] – providing greater speed and scale along with agility for development and efficiency that reduces data movement and data preparation. With SAP’s complete In-Memory Data Fabric architecture you gain the ability to:

  • Simplify: Reduce the work of managing and maintaining the warehouse—Eliminate staging databases, pre-computations, and artificial structures
  • Accelerate: Real-time response time regardless of data volume or data location; manage and integrate massive volumes of data
  • Predict: Predict trends, needs, and customer behavior to get the insights you need to stay ahead of your competition

Unlike most other solutions in the market, SAP’s in-memory data fabric was built from the ground up on a modern columnar database infrastructure with strategic tools for modeling, remote data access, and high performance rather than simply adding memory or columnar caching to an existing solution.

SAP IQ plays an important part in the In-Memory Data Fabric providing petabyte scale expansion.

Markets & Use Cases[edit]

SAP IQ is primarily targeted to three different use cases:

  1. High-performance DBMS engine for business reporting
  2. Big data and advanced analytics, and
  3. Extreme-scale enterprise data warehousing

SAP targets these use cases for select verticals, including financial services, telecommunications, information providers and government, retail and banking. More use cases will likely be introduced with platform integration The company has published case studies that show market penetration across the globe with large installations concentrated in North America and Asia Pacific.

SAP IQ with SAP HANA

With the advent of big data, SAP IQ has coupled with SAP HANA to deliver a distributed in-memory analytics platform. There are three main applications and use cases which try to capitalize on SAP IQ’s strengths concerning scalability and performance as an EDW and big data processor, while leveraging SAP HANA’s in-memory speed for operational reporting:

SAP IQ as a Near-line Service (NLS) to SAP HANA This combination leverages SAP HANA’s in-memory based system by using its power around agility on current or recent data to deliver real-time analytics, and SAP IQ’s cost-effective near-line storage (NLS) for massive volumes of cooler or less-critical (such as historical) data, which is accessed via the NLS connector in SAP Business Warehouse (BW).[9] Older data is thus moved to SAP IQ for analysis and reporting in the SAP BW environment, while in-memory analysis is performed on “hot” critical data using SAP HANA. SAP BusinessObjects BI Platform provides a transparent access to both current and historical data from SAP HANA and SAP IQ simultaneously.[9] According to SAP, this Powerful integrated solution delivers the performance and responsiveness business users demand while keeping IT storage and maintenance costs in check, resulting in the ideal balance of performance and cost.[9]

SAP HANA for operational reporting with SAP IQ for big data processing (NLS) In this scenario, SAP Enterprise Resource Planning (ERP) data goes into SAP HANA which acts as an operational data store for immediate analysis. Once the data is analyzed it is integrated into SAP IQ via Near-line storage mechanisms (as described above). Here SAP IQ acts as an enterprise data warehouse that receives data from a variety of traditional sources (such as OLTP Databases and files systems), and SAP HANA Operational Data Store(ODS)[10]

SAP IQ as an Enterprise Data Warehouse (EDW) with SAP HANA as Agile Data Mart When SAP IQ is used as an EDW, it can also be augmented with HANA’s in-memory technology. Common uses include planning and analysis reports where simultaneous OLTP processing is needed. In this case, data flows from SAP IQ to SAP HANA.[10] SAP BusinessObjects BI can be used to achieve visibility across both platforms.

Technology[edit]

To a user, SAP IQ looks just like any relational DBMS with a SQL-based language layer accessible via ODBC/JDBC drivers. However, inside, Sybase IQ is a column-oriented DBMS, which stores data tables as sections of columns of data rather than as rows of data like most transactional databases.

Column-Store Architecture

Column-orientation has a number of advantages.[11] If a search is being done for items matching a particular value in a column of data, only the storage objects corresponding to that data column within the table need to be accessed. A. A traditional row-based database would have to read the whole table, top to bottom. Another advantage is that when indexed correctly, a value that would have to be stored once in each row of data in a traditional database is stored only once, and in SAP IQ, an n-bitindex is used to access the data.[12] Nbit and tiered indexing is used to allow for increased compression and fast, incremental batch loads.

Additionally, the column-based storage enables SAP IQ to compress data efficiently on the fly.[9]

Indexing Technology

Prior to SAP IQ 16, each data page was structured as an array of cells of a fixed size, so all values have the same data type. While this storage approach is efficient for structured and fixed length data, this does not hold for the more unstructured and variable sized data that is seen today. To combat storage inefficiency and store variable sized data with minimal wasted space, each page is composed of cells of a variable size that are packed tightly together; the column store architecture supports a variable number of cells per page and various page formats within a column. SAP IQ also applies Lempel-Ziv-Welch ( LZW ) compression algorithms[13] to each data page when it is written to disk, to significantly reduce data volume.[14]

Bitmaps are used for secondary indexes.[13]

Massively Parallel Processing Framework SAP IQ has a massively parallel processing (MPP) framework based on a shared-everything environment that supports distributed query processing. Most other products capable of MPP tend to be based on shared-nothing environments. The benefit of shared-everything is that it's more flexible in terms of the variety of queries that can be optimized—especially for balancing the needs of many concurrent users. The downside is that in extreme cases, competition among processors to access a shared pool of storage (usually a storage-area network), can lead to I/O contention, which affects query performance.[12]

However, the aforementioned storage architecture of SAP IQ allows compute and storage layers to scale out independently of each other and also allows these resources to be provisioned on-demand for better utilization without restructuring the underlying database.

Multiplex Architecture SAP IQ uses a clustered grid architecture, which is made up of clusters of SAP IQ servers, or Multiplex. These clusters are used to scale performance for large numbers of concurrent queries or queries that are great in complexity. This is built upon a shared-everything architecture where all compute nodes interact with the same shared storage and queries have the ability to distribute across all compute nodes. The Multiplex has a coordinator node which manages the database catalog and coordinates transactional writes to the store. Other nodes can be reader only nodes, or readers and writes, like the coordinator node. The storage fabric can be implemented with numerous technologies that allow sharing amongst the multiplex nodes.

This architecture has multiple uses, including workload balancing and elastic virtual data marts. Workload balancing is achieved by the SAP IQ query engine through dynamically increasing/decreasing parallelism in response to changes in server activity. There is automatic failover if a node stops participating in a query, and other nodes will pick up work originally assigned to the failed node so the query can complete. On the client side, compatibility with external load balances ensures that queries are initiated on physical servers in a balanced fashion to eliminate bottlenecks. Physical nodes in the Multiplex can be grouped together into “logical servers” which allow workloads to be isolated from each other (for security or resource balancing purposes); machines can be added to these as demand changes. The aim of the grid architecture is to enable resiliency even during global transactions.

Loading Engine The SAP IQ Loading Engine can be used for incremental batch, low latency, concurrent loading, and bulk loading (with both client and server data files). The bulk loading process allows for multiple load process to occur simultaneously, if the loads are of different tables. Data can be loaded from other databases as well as files. Page-level snapshot versioning allows concurrent loads and queries, with locking occurring at the table level only. With SAP Replication Server, now enhanced to optimize loads into SAP IQ, transactions are compiled into the fewest set of operations, and then bulk micro-batch loads into SAP IQ are performed, which gives the appearance of real-time, continuous loads.

The bulk loader now performs all operations in parallel to make full use of all server cores, remove bottlenecks, and keep all threads productive, instead of serializing the process. The loading process remains a two-phase process, first reading raw data and creating FP indexes, and second creating secondary indexes, but everything is executed in parallel. High Group indexes, which the query optimizer relies on for information about which columns/rows contain which data values, are now structured as a set of tiers, increasing as you move down the pyramid.

Lastly, SAP IQ introduces a write-optimized, Row Level Versioned (RLV) Delta store which enables high-velocity data loads and fast availability of data to users. This store is minimally indexes and compressed, with row-level locking for concurrent write, and its own transaction log and is append only, and acts as a companion to the main store, with data being loaded at high speed to the RLV store, and migrating to the main store later, merging to it periodically. To the user it does not appear as though there are two separate entities at work and queries operate transparently across the two stores. To make use of this the users can specify particular “hot” database tables as RLV tables.

Framework and Client APIs SAP IQ offers query APIs based on pure ANSI SQL standards (with few restrictions), that include OLAP and full-text search support. Stored procedures are supported in both ANSI SQL and Transact-SQL dialects, and can be executed on a scheduled or immediate basis. As well there are database drivers for a variety of programming languages such as JAVA, C/C++m PHP, PERL, Python, Ruby, and ADO.Net.

Handling of Unstructured Data SAP IQ is an analytics engine that can query both structured and unstructured data and join the results together. SAP IQ introduced a new text index and an SQL “contains” clause to facilitate this by searching for terms within a blob of unstructured text; SAP Sybase’s partnerships with vendors allow for various binary forms of text files to be ingested into SAP IQ and text indexes created for them; these text indexes get the data ready for higher level text analysis applications to perform full-text searches within SAP IQ via SELECT statements. SELECT syntax can be used by applications performing tokenization, categorization and further text analysis.

In-Database Analytics / Extensibility Framework In-database analytics are built upon the fundamental concept of keeping analytics algorithms close to the data for higher performance. The extensibility framework, called “in-database analytics” enables embedding of analytic functions inside the database engine of SAP IQ, moving analytics into the database, instead of to a specialized environment out of the database, a process which is error prone and slower. Pre-built functions are available natively and via partners of SAP IQ providing specialized statistical and data mining libraries that plug into SAP IQ. This framework increases SAP IQ’s power to do advance processing and analysis as the data does not have to be moved out of the database into a specialized environment for analytics. All data and results obtained can be shared though the DBMS and can be easily acquired through an SQL interface. Through user defined functions (UDFS) partners can extend the DBMS with custom computations, by providing a specialized statistical and data mining libraries that plug right into SAP IQ to enhance its performance of advanced processing and analysis.

Security SAP IQ provides several features, both included in the base product and licensable separately, to help protect the security of the user’s data. A new feature introduced in IQ 16 is the Role Based Access Control (RBAC) which enables for a separation of duties and upholds the principle of least privilege, by allowing the breakdown of privileged operations into fine-grained sets that can be individually granted to users. Included as part of the base product are: users, groups and permissions, database administration authorities, user login policies, database encryption, transport-layer security, IPV6, role-based access control, and database auditing. Additional features are part of a licensable option called the advanced security option: FIPS encryption, Kerberos authentication, LDAP authentication, and Database column encryption.

Information Lifecycle Management (ILM) As part of ILM, SAP IQ allows users to create multiple user DBSpaces (logical units of storage/containers for database objects) for organizing data. This can be used to separate structured or unstructured data, group it together according to age and value, or to partition table data. DBSpaces can also be marked as read-only to enable one-time consistency checking and back-up. Another application of ILM is the ability to partition tables, and place moving portions along the storage fabric and backup capabilities; this enables a storage management process where data cycles through tiered storage, moving from faster more expensive storage to slower, cheaper storage as it ages, partitioning data according to value.

High availability and Disaster Recovery The multiplex set-up provides scalability and High Availability for compute nodes because a multiplex coordinator node can failover to an alternate coordinator node.

The SAP IQ Virtual Backup also allows users to quickly backup data, and along with storage replication technology, data is continuously copied so backups can occur quickly and “behind the scenes”. Once virtual backups are completed they can be verified through test and restore; enterprise data can be copied for development and testing. Then all that’s left is to complete the backup at a transactionally consistent point in time. SAP claims that disaster recovery is easier with a shared everything approach to MPP. The SAP Sybase PowerDesigner modelling tool enables users to build an ILM model that can be deployed with SAP IQ. Storage types, DBSpaces, and lifecycle phases can be defined in an ILM model and the tool can be used to generate reports and create partition creation and movement scripts.

Hadoop Integration SAP IQ provides federation with the Hadoop distributed file system (HDFS), a very popular framework for big data, so that enterprise users can continue to store data in Hadoop and utilize its benefits. Integration is achieved in four different ways, depending on the user’s needs, through client-side federation, ETL, data, and query federation. Client-side federation joins data from IQ and Hadoop at a client application level while ETL federation lets the user load Hadoop data into the column store schemas of IQ. HDSF data can also be joined with IQ data on the fly through SQL queries from IQ, and finally results of MapReduce jobs can be combined with IQ data, also on the fly.

SAP Control Center SAP Control Center replaces Sybase Central as a Web-Based graphical tool for administration and monitoring. SAP Control Center can be used for monitoring SAP Sybase servers and resources (node, multiplex) from any location, as well as monitoring performance, and spotting usage trends, The web application has a multi-tiered plug-in architecture which is made up of a server and product-based agents that bring SAP Sybase performance back to the Control Center Server.

Web-Enabled Analytics SAP IQ comes with web-enabled application drivers facilitating access to SAP IQ from a variety of Web 2.0 programming and execution environments (Python, Perl, PHP, .Net, Ruby). Through query federation with other databases, developers can build applications that interact with multiple data sources at the same time (as well as with database platforms from other vendors). Federation proxy tables that map to tables in external databases can be created; these are materialized as in-memory tables but that can be interacted with as though they resided within SAP IQ. This way data sources can be combined in a unified view.

Supported Platforms[edit]

SAP IQ also supports plugging in external algorithms written in C++ and Java. SQL queries can call these algorithms, allowing for the execution of in-database analytics, which provides better performance and scalability. Additionally, Sybase IQ also provides drivers for access via languages such as PHP, Perl, Python, and Ruby on Rails.

SAP IQ supports most major operating system platforms, including:

Customers[edit]

Sybase claims that Sybase IQ is currently installed in over 2000 customer sites. Notable customers include comScore Inc.,[15] CoreLogic, Investment Technology Group (ITG),[16] and the U.S. Internal Revenue Service (IRS).[17]

While Sybase IQ has been widely used for focused, data-mart-style deployments,[18] it has also been deployed as an enterprise data warehouse.

Competitors[edit]

Competitors SAP IQ competes against “RDBMS platforms used for enterprise data warehouses (Oracle, IBM, Microsoft) and specialty analytics platforms (Teradata, Oracle Exadata, IBM/Netezza).”

External links[edit]

References[edit]

  1. ^ C-Store: A column-oriented DBMS, Stonebraker et al., Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005
  2. ^ Cole, Barb (November 7, 1994), Sybase makes a data warehousing play via acquisition, Network World 
  3. ^ Moore, Trevor (2010), Sybase IQ Survival Guide, p. 16, ISBN 978-1-4466-5758-4 
  4. ^ a b c http://scn.sap.com/community/developer-center/analytic-server/blog/2013/04/01/sap-sybase-iq-16-for-xldb-analytics-now-available
  5. ^ http://scn.sap.com/thread/3430651
  6. ^ http://www.saphana.com/community/blogs/blog/2013/11/06/big-data-pb-of-in-bwonhana
  7. ^ http://scn.sap.com/community/developer-center/analytic-server/blog/2014/03/14/sap-iq-in-the-cloud-whats-your-opinion
  8. ^ http://www.sap.com/pc/tech/data-warehousing.html
  9. ^ a b c d http://www.sap.com/bin/sapcom/en_us/downloadasset.2013-09-sep-23-13.achieving-analytics-that-are-both-cost-effective-and-optimized-for-performance-pdf.html
  10. ^ a b http://scn.sap.com/community/bi-platform/businessobjects-bi-for-sap/blog/2013/11/27/how-does-all-this-work-together-bw-bw-on-hana-suite-on-hana-hana-live-part-8
  11. ^ MacNicol, Roger; French, Blaine (August 2004), Sybase IQ Multiplex – Designed For Analytics, Proceedings of the 31st VLDB Conference, Trondheim, Norway 
  12. ^ Sybase IQ#cite note-Moore-1
  13. ^ a b http://blasthemy.com/sap/TechEd13/1_Session_PDFs/RDP/RDP107/RDP107.pdf
  14. ^ http://www.doblerconsulting.com/wp-site/wp-content/uploads/SAPSybaseIQWhatsNewin16SeptemberWebinarfinal.pdf
  15. ^ Henschen, Doug (November 24, 2010), ComScore’s Big Data Deployment In Detail, Information Week 
  16. ^ Clark, Don (November 18, 2007), Start-Ups Mine Database Field — Nimble Software Helps Make Sense Of Information Tide, Wall Street Journal 
  17. ^ Lai, Eric (March 22, 2008), Been audited lately? Blame the IRS's massive, superfast data warehouse, ComputerWorld 
  18. ^ Henschen, Doug (July 12, 2011), Sybase IQ Gains Beefier Analysis Capabilities, Information Week