Clusterpoint

Clusterpoint Ltd.
Type: Private
Industry: enterprise software, cloud computing
Founded: August 21, 2006
Founders: Gints Ernestsons, Jurgis Orups, Zigmars Rasscevskis, Oskars Viksna
Headquarters: London, United Kingdom
Products: Clusterpoint Database, Clusterpoint Database Cloud, Clusterpoint Server

Clusterpoint Database
Developer(s): Clusterpoint Ltd.
Initial release: 2006
Stable release: 3.0 / February 15, 2015
Development status: Active
Written in: C, C++
Operating system: Cross-platform
Available in: English
Type: distributed database, enterprise search, operational database, document-oriented, NoSQL, XML, JSON, SQL database, cloud DBAAS
License: free[clarification needed]
Website: www.clusterpoint.com

Clusterpoint is a privately held European technology company that develops and supports the Clusterpoint database software platform.[1][2][3] The company was co-founded by software engineers with expertise in Big data computing who were dissatisfied with the complexity, scalability and performance limitations of relational database architecture. The first version of the product was released in 2006. The company is venture capital backed.[4][5][6]

Clusterpoint database is a document-oriented database server platform for storing and processing XML and JSON data in a distributed fashion on large clusters of commodity hardware. The database architecture blends ACID-compliant OLTP transactions, full-text search and analytics in the same code, delivering high availability and security.[7][8]

Clusterpoint database makes it possible to perform transactions in a distributed document database model in the same way as in a SQL database. Users can perform secure real-time updates, free text search, analytical SQL querying and reporting at high velocity in very large distributed databases containing XML and JSON documents. Transactions are implemented without the database consistency issues that affect most NoSQL databases and can safely run at speeds previously available only with relational databases.[9] Real-time Big data analytics, replication, load sharing and high availability are standard features of the Clusterpoint database software platform.[10]

Clusterpoint database supports web-style free text search with natural language keywords and programmable relevance sorting of results. Constant and predictable search response times, with latency in milliseconds, and high-quality search results are achieved using policy-based inverted indexing and its own relevance ranking method. The database also supports essential SQL queries, which can be combined with free text search in a single REST API.[11]

For most of its history Clusterpoint has served business customers as an enterprise software vendor.[12][13][14]

In January 2015 Clusterpoint changed its licensing policy to a free software license. Since February 2015 Clusterpoint database has been available as a cloud service (DBAAS). The latest production version of Clusterpoint database is 3.0, released in February 2015.[15]

Use cases

Clusterpoint database delivers real-time business information management in electronic document format. It can be used as a high-performance operational database for web and mobile database services requiring scalability, high performance and strong security. The software can safely handle financial, billing, security, medical, travel, information services, e-commerce, government and municipal open data, and other data stored in electronic document format using industry-standard XML and JSON markup.[16][17][18][19]

The database also suits generic use cases where a flexible XML or JSON document data model fits best: processing a mix of variable data, including structured data, unstructured (textual) data, semi-structured data and blobs such as images, voice and video files. The software can be used for computing tasks requiring low millisecond-range latency in distributed databases, for instance to feed data at high speed to interactive NoSQL visualizations, Big data online analytics and safe reporting in large databases.[20]

Distinctive technology

High-speed ACID-compliant Transactions in Distributed Document Database

Clusterpoint database provides distributed, ACID-compliant transactions, including basic SQL support, in a document model database that is massively scalable for Big data volumes. Distributed transactions, data storage, search and analytics can be performed at high performance and high availability, while delivering strong database consistency and security. This gives Clusterpoint a performance and scalability advantage over other NoSQL document databases that compromise on the security and integrity of customer data, typically providing only eventual consistency at high availability.[21]

Programmable database ranking for search relevance in Big data

Another distinction is the programmable ranking index, which can be flexibly customized through relevance rules assigned in the Document Policy configuration file, a small XML configuration file accompanying each Clusterpoint database. Database search behavior can be changed quickly by configuring ranking index rules rather than by modifying software code. The increasing importance of ranking derives directly from the explosion in the volume of data handled by current applications: the user would be overwhelmed by too many unranked results. Furthermore, the sheer amount of data makes it almost impossible to process queries in the traditional compute-then-sort approach.[22]

Customer application code can be simplified by delegating most indexing and search sorting details, including ranking algorithms, to the Document Policy configuration attributes in Clusterpoint database. The Document Policy, when customized for a particular web or mobile application, determines how the ranking index is organized at the physical storage level by presorting the actual index data for custom relevance algorithms. Developers can avoid most of the complex SQL programming for data sorting and grouping in their application code, and database hardware is spared excessive Big data sorting for each query. Instead, the Clusterpoint ranking index delivers fast search and relevance sorting without the performance degradation characteristic of relational SQL databases.

The ranking index method, applied to the document database model, enables Clusterpoint to outperform SQL databases at search by several orders of magnitude. It addresses the information overload and latency problems of interactive web and mobile applications processing Big data. Today, limited-size mobile device screens and network bandwidth restrictions prevent users from requesting and processing large data volumes per query. Database search and querying need to be interactive and transactional to satisfy Internet users. The Clusterpoint ranking index was designed for this computing model: it extracts the most relevant data first and returns information page by page in decreasing relevance. For instance, using only free text search, latency in large databases containing billions of documents will be in milliseconds, while relevance ranking prevents overwhelming the end user with too many low-quality search results. This is also a crucial design element of the distributed document database architecture: it makes the index scalable so that it can be safely shared across a large cluster of servers without significant performance loss at data ingestion, free text search and access.[23]
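
The idea of presorting index data so that results come back page by page in decreasing relevance can be illustrated with a minimal Python sketch. It is not Clusterpoint's implementation: the class name RankedIndex, the single static rank per document and the page size are assumptions made only for this example.

```python
from collections import defaultdict


class RankedIndex:
    """Toy inverted index whose posting lists are kept sorted by a static rank.

    Hypothetical illustration only; not the Clusterpoint ranking index.
    """

    def __init__(self):
        self.rank = {}                      # doc_id -> static relevance score
        self.postings = defaultdict(list)   # term -> doc_ids, best rank first

    def add(self, doc_id, text, rank):
        self.rank[doc_id] = rank
        for term in set(text.lower().split()):
            plist = self.postings[term]
            plist.append(doc_id)
            # Presort at write time so that a query never sorts its result set.
            plist.sort(key=lambda d: (-self.rank[d], d))

    def search(self, term, page=0, page_size=2):
        """Return one page of matches; a page is just a slice of the posting list."""
        plist = self.postings.get(term.lower(), [])
        start = page * page_size
        return plist[start:start + page_size]


index = RankedIndex()
index.add("a", "big data search", rank=1.0)
index.add("b", "data storage", rank=3.0)
index.add("c", "search in big data", rank=2.0)

print(index.search("data", page=0))  # ['b', 'c']
print(index.search("data", page=1))  # ['a']
```

Because ordering is paid for when documents are written, reading a page costs only a slice, which is the trade-off the paragraph above describes.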

Additionally, the Clusterpoint ranking index can be fine-tuned by developers to match natural language terms in queries to the most relevant textual content in a customer database. When a distributed database is queried with free text keywords or phrases in natural language, the ranking index selects the most relevant documents whose textual content matches the query, taking into account natural language density, word statistics and language-specific grammatical attributes (including stemming, spelling and collation), and performing automatic self-merge joins. Very few database products support similar self-merge joins.[24]

By adjusting ranking rules, customers can configure various grouping, ordering and positioning algorithms for their search results through the ranking index to improve the end-user search experience. A set of ranking configuration rules, once established for a particular database, is applied and maintained automatically by Clusterpoint database whenever customer data is loaded or updated through Clusterpoint database CRUD API commands.

Developers can use full text search as the fastest information access method in Clusterpoint databases, while retaining the ability to query the database structure with standard analytics using essential SQL. Both methods can be combined in a single query, enabling combined analytical and search queries over mixed structured and unstructured content.[25]
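
What it means to combine a free text condition with a structured condition in one query can be sketched in a few lines of Python. The sample documents, field names and the helper combined_query are hypothetical; the sketch shows the concept only and does not use Clusterpoint query syntax.

```python
# Hypothetical sample documents with one textual and one numeric field.
docs = [
    {"id": 1, "title": "Distributed database basics", "year": 2014},
    {"id": 2, "title": "Full text search in a database", "year": 2015},
    {"id": 3, "title": "Cooking with search engines", "year": 2015},
]


def matches_text(doc, query):
    """True when every query keyword occurs in the title (case-insensitive)."""
    words = set(doc["title"].lower().split())
    return all(term in words for term in query.lower().split())


def combined_query(docs, text, min_year):
    """Apply a free text condition and a structured condition in one pass."""
    return [d["id"] for d in docs if matches_text(d, text) and d["year"] >= min_year]


print(combined_query(docs, "database", 2015))  # [2]
```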

Clusterpoint database deployments

Clusterpoint database has been used since 2006 in production deployments by enterprise customers operating 24/7 web and mobile services. The vendor has built partnerships that provide solutions in different industry sectors, such as:

  • Governance, Risk Management and Regulatory Compliance[26]
  • Agile Web Software Development[27]
  • Online Business Intelligence in NoSQL and Big Data[28]
  • Cloud Computing Services[29]
  • Web Site Design[30]
  • Cybersecurity and Lawful Intercept[31]

A public demonstration powered by Clusterpoint database is available on the web site Wikisearch.net. It illustrates how the document-type data of the entire English Wikipedia and DBpedia corpus can be managed efficiently within a single consolidated database platform.[32]

Competitors

Industry experts position Clusterpoint database technology among other emerging NoSQL and Big data technologies with a distributed data management architecture.[33]

Platform Components

The Clusterpoint database software is developed in the C and C++ programming languages and supports multi-threading, multi-core CPUs and distributed computing. The primary method of developer access to the platform's capabilities is the REST API. Clusterpoint database software is managed across a large cluster of commodity hardware with the Clusterpoint Console application, which provides centralized administration and control of all customer databases through a single web GUI. To access Clusterpoint Console, or to download it along with the Clusterpoint database software for on-premise use, customers have to sign up for a Clusterpoint Cloud Database Account on the vendor website. Sign-up is free and requires no credit card.

Architecture

Clusterpoint database has a multi-master, shared-nothing, distributed, document-oriented database architecture that stores XML and JSON data types.[34]

It works as a transactional high-speed OLTP database for XML and JSON data objects. Content can be added, updated and deleted in real time, and all changed data is indexed in real time, including full text, date, numeric and geospatial data. Index data can be read for search and analytics immediately after each document has been inserted, updated or deleted, while ACID-compliant transactions provide security and consistency. The database API also supports storage and processing of binary data as part of the document data object model.
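
The behavior described above, where the index reflects every insert, update and delete as soon as it happens, can be sketched with an in-memory inverted index. The class LiveIndex is a hypothetical single-process illustration: it shows only the index bookkeeping and makes no attempt to model ACID transactions or distribution.

```python
from collections import defaultdict


class LiveIndex:
    """Toy full text index that is updated in the same step as the document store."""

    def __init__(self):
        self.docs = {}                  # doc_id -> current text
        self.terms = defaultdict(set)   # term -> ids of documents containing it

    def _unindex(self, doc_id):
        # Remove the postings of the previous version, if there was one.
        for term in self.docs.get(doc_id, "").lower().split():
            self.terms[term].discard(doc_id)

    def upsert(self, doc_id, text):
        self._unindex(doc_id)
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.terms[term].add(doc_id)

    def delete(self, doc_id):
        self._unindex(doc_id)
        self.docs.pop(doc_id, None)

    def search(self, term):
        return sorted(self.terms.get(term.lower(), set()))


live = LiveIndex()
live.upsert(1, "red apple")
live.upsert(2, "green apple")
print(live.search("apple"))  # [1, 2]

live.upsert(1, "red pear")   # the update is searchable immediately
print(live.search("apple"))  # [2]

live.delete(2)
print(live.search("apple"))  # []
```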

It supports a fault-tolerant infrastructure hardware setup with no single point of failure and multi-datacenter replication capability for the entire distributed database cluster.

Query syntax

To query a database, customers can use a free text query, XML-based syntax, an Essential SQL query or the Clusterpoint REST API, which supports JSON.

General features

  • Data is managed in open, cross-platform, industry standard XML or JSON format using open API, for instance, Python API[35][36] or JavaScript Node.js API[37]
  • Data structure agnostic and type-rich database, handles variable data structure XML or JSON documents in a single database. Supports unstructured textual data, dates, numbers, meta-data (all XML and JSON types)
  • Cross-platform support: binaries are available for Linux, FreeBSD, Mac OS X and Windows. Clusterpoint database software can be compiled on other operating systems.
  • Multi-master cluster software architecture: no single point of failure, any cluster node can serve as a master and run the management application
  • Horizontal database scalability: scales out from a single server to a few thousand servers networked into a cluster infrastructure
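
One common way to spread documents over the nodes of a shared-nothing cluster is to hash each document identifier to a shard number. The source does not say how Clusterpoint assigns documents to shards, so the following Python sketch is an assumption made purely for illustration; the function shard_for and the identifier format are hypothetical.

```python
import hashlib


def shard_for(doc_id: str, shard_count: int) -> int:
    """Map a document id to a shard number in the range [0, shard_count)."""
    # A cryptographic hash is used only because it is stable across processes,
    # unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Route 1000 hypothetical documents to 4 shards and count the load per shard.
counts = [0] * 4
for n in range(1000):
    counts[shard_for(f"doc-{n}", 4)] += 1
print(counts)
```

A plain modulo scheme like this moves most documents when the shard count changes; real systems often use consistent hashing or range partitioning to limit that movement.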

Access features

  • REST API is used for XML and JSON document format management, search and data manipulation.
  • Consistent UTF-8 encoding. Non-UTF-8 data can be saved, queried, and retrieved with a special binary data type.
  • XML and JSON objects for API queries and responses: enable direct integration in other programming languages supporting XML or JSON parsing, no specific client software required

Search/query features

  • Built-in rich full text search functionality, with fast and free use of keywords and phrases, result snippeting, highlighting, term proximity search and other full-text search options[38]
  • Querying with term stemming, term wildcards and character position patterns, for inflected words and plural word forms, delivering automatic self-merge joins[39]
  • Essential SQL or SQL-like XML-structured (fielded) queries like in SQL SELECT ... WHERE ... statements
  • Cluster-wide analytics aggregation with MIN(), MAX(), COUNT(), AVG() like in SQL SELECT ... GROUP BY ..., ORDER BY ... statements
  • Sorting of results in alphabetic, numeric, date order or according to result relevance
  • Autocomplete (instant search as you type) using the actual index data
  • Spell-check of query terms with alternative spelling suggestions for "Did you mean that?" functionality
  • Boosting of search query terms at query time, in order to increase, decrease or overwrite through the API relevancy weights or sorting rules built into the ranking index
  • Dynamic data classification per query by multi-level customer defined facets with exact hit counting (examples: categories, themes, product catalogs, geographic locations etc.)
  • Text-analytics driven similar content search across the entire database
  • XML or JSON data structure relevance ranking by tag weighting and document relevance ranking by document rating
  • Textual relevance ranking for matching search query terms to context, taking into account frequency and density of natural language terms
  • Predictive calculation of expected number of results based on the actual index statistics in large size databases to optimize performance
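
Cluster-wide aggregation of the kind listed above (MIN, MAX, COUNT, AVG) is typically computed by having each node produce a small partial result that a coordinator then merges. The Python sketch below shows that general technique under the assumption that every shard holds at least one value; the names Partial, aggregate_shard and merge are hypothetical and do not come from Clusterpoint.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    """Per-shard summary that is cheap to ship to a coordinator."""
    count: int
    total: float
    minimum: float
    maximum: float


def aggregate_shard(values):
    """Summarize the values held by one shard (assumed non-empty)."""
    return Partial(len(values), sum(values), min(values), max(values))


def merge(partials):
    """Combine per-shard summaries into cluster-wide aggregates."""
    count = sum(p.count for p in partials)
    total = sum(p.total for p in partials)
    return {
        "COUNT": count,
        "MIN": min(p.minimum for p in partials),
        "MAX": max(p.maximum for p in partials),
        # AVG must be derived from the summed totals and counts,
        # not by averaging the per-shard averages.
        "AVG": total / count,
    }


shards = [[10, 20], [30], [5, 15, 40]]
result = merge([aggregate_shard(values) for values in shards])
print(result)  # {'COUNT': 6, 'MIN': 5, 'MAX': 40, 'AVG': 20.0}
```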

Administration/production use features

  • Granular security partitioning: API users and their access rights are based on groups and permissions assigned per specific databases and API commands
  • Transaction journaling, redo logs, access logs, error logs and audit logs enabled by default
  • Document versioning enabled by default (preserving previous document versions for a certain time period)
  • Reindexing in background with automatic switchover provides availability during reindexing
  • Online, offline and incremental database backup
  • Automatic or manual synchronization of database replicas
  • Multiple administrator accounts for secure multi-tenancy of different customer databases on the same hardware
  • Centralized web GUI based database administration Console, including one-click configuration of clustered and replicated databases across all nodes
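
The granular security model in the first item of this list, where access rights follow from groups and permissions assigned per database and per API command, can be sketched as a simple lookup. The group names, database name and command names below are hypothetical and are not taken from the Clusterpoint API.

```python
# Hypothetical permission table: group -> set of (database, command) pairs.
permissions = {
    "analysts": {("sales_db", "search")},
    "writers": {("sales_db", "search"), ("sales_db", "insert"), ("sales_db", "update")},
}

# Hypothetical membership table: user -> groups.
user_groups = {
    "alice": ["analysts"],
    "bob": ["analysts", "writers"],
}


def is_allowed(user, database, command):
    """A user may run a command if any of the user's groups grants it on that database."""
    return any(
        (database, command) in permissions.get(group, set())
        for group in user_groups.get(user, [])
    )


print(is_allowed("alice", "sales_db", "search"))  # True
print(is_allowed("alice", "sales_db", "insert"))  # False
print(is_allowed("bob", "sales_db", "insert"))    # True
```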

Automatic full database content indexing

Clusterpoint software automatically builds and maintains an index of document-type XML and JSON data content when data is loaded, updated or deleted. A single database index (the ranking index) is maintained to support these types of querying:

  • natural language based full text search indexing, including language-specific stemming and collation rules
  • XML or JSON data structure queries (with full-text, exact match and binary match options) or Essential SQL queries for analytics
  • virtual data structure search created by aliasing the values of multiple real tags to speed up Boolean OR queries
  • ad hoc search across all database content irrespective of the database structure
  • numeric and date range search
  • geospatial search by range, distance or polygon coordinates and ordering by distance from a certain point
  • multi-level faceted search with automatic results classification by XML / JSON tags assigned as containing facets
  • combination of any of the above database search criteria into complex nested multi-part query expressions using Boolean AND, OR, NOT logic
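
Geospatial search by distance with ordering from a reference point, as listed above, can be illustrated with the haversine great-circle formula. This Python sketch scans every record, whereas a database would use an index; the helper names and the approximate city coordinates are assumptions for the example only.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def within(places, lat, lon, max_km):
    """Names of places no farther than max_km from (lat, lon), nearest first."""
    hits = []
    for name, (plat, plon) in places.items():
        distance = haversine_km(lat, lon, plat, plon)
        if distance <= max_km:
            hits.append((distance, name))
    return [name for _, name in sorted(hits)]


# Approximate coordinates, for illustration.
places = {
    "Riga": (56.9496, 24.1052),
    "Tallinn": (59.4370, 24.7536),
    "London": (51.5074, -0.1278),
}

print(within(places, 56.9496, 24.1052, max_km=500))  # ['Riga', 'Tallinn']
```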

Database administration

Clusterpoint database can be controlled centrally through the Clusterpoint Console application. It is a web GUI dashboard for controlling all database services enterprise-wide, including cluster database administration, configuration of indexing and ranking policy, secure user account management, audit and log file viewing, database backup and restore, database sharding and replication.

Each customer database is started and stopped as an isolated database server process for controlled management of CPU resources, RAM and disk storage. All databases share a single networked computing and storage infrastructure.

Clusterpoint Console is used to manage the underlying hardware (cluster nodes) so that computing resources are shared among different databases in parallel.

Process and storage architecture

Clusterpoint database processes are safely isolated: each process runs only in its own RAM address space and can access only its own local file system storage folder of the same name, which contains the particular database's XML or JSON documents, index, configuration and log files stored on that local cluster node (shard). This architecture delivers elastic horizontal scale-out and cluster-wide control over resource consumption for a particular customer database. It also prevents unauthorized access to multi-tenant databases using the same computing hardware infrastructure, with an option to fully encrypt sensitive data.

Multi-tenancy and virtualization

Clusterpoint supports secure multi-tenant database services. The software platform takes care of safely partitioning the runtime database computing environment among all cluster CPU nodes, RAM processes and storage resources within a larger cluster, while operating databases in parallel on the same hardware. This method makes good use of modern multi-core CPU hardware arranged in large distributed clusters.

Native multi-tenancy is the preferred method for high-performance database computing with Clusterpoint software, rather than operating-system-level virtualization or software containerization. OS-level virtualization may decrease available network bandwidth and computing resources and may create unexpected bottlenecks at the storage I/O level, which can result in increased application latencies. Database virtualization is best used for prototyping and development, where operational performance guarantees and low latency are not the first priority.[40]

Clusterpoint Cloud Database as a Service (DBAAS) is a secure multi-tenant database platform, with isolated data for each customer account and encrypted access security. Clusterpoint software does not need virtualization for safe and efficient multi-tenancy.

Multi-copy database replication

Automatic multi-copy replication of the entire database is built into the Clusterpoint database software. Replication is active, with workload sharing within a cluster. Clusterpoint supports high-performance, ACID-compliant OLTP transactions within a main cluster in a single data center, while providing fail-over to additional data centers running database replica clusters. Fail-over takes only a few seconds if communication latency among data centers is low.

Database replicas in Clusterpoint architecture are used for automatic load balancing of database search queries through Clusterpoint API.
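
Spreading read queries over replicas can be sketched with a round-robin selector. The source does not state which balancing policy Clusterpoint uses, so round-robin is an assumption chosen for simplicity; the class ReplicaBalancer and the replica names are hypothetical.

```python
import itertools


class ReplicaBalancer:
    """Hand out replicas in rotation so that search load is spread evenly."""

    def __init__(self, replicas):
        replicas = list(replicas)
        if not replicas:
            raise ValueError("at least one replica is required")
        self._cycle = itertools.cycle(replicas)

    def next_replica(self):
        return next(self._cycle)


balancer = ReplicaBalancer(["replica-a", "replica-b", "replica-c"])
picked = [balancer.next_replica() for _ in range(4)]
print(picked)  # ['replica-a', 'replica-b', 'replica-c', 'replica-a']
```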

In multi-datacenter use, network bandwidth among locations may become a critical issue for the Clusterpoint architecture because of increased latencies for database updates and synchronization delays among replicas, in particular if encrypted VPN networking over Internet links is used.

High-capacity bandwidth might be required for high-performance database replication among data centers in geographically different locations.

Extendable server-side scripting with Lua

Lua extends Clusterpoint Server functionality with custom server-side scripts. Lua scripts can implement customer-specific functions such as data aggregation, ETL tasks, meta-data markup, call-backs to external programming languages using web services for extra functionality, real-time alerting or asynchronous triggers. Scripts can be executed before, during or after the Clusterpoint API transactions of interest. Built-in configurable server-side hooks activate Lua scripts at different stages of each Clusterpoint transaction.

Custom Lua scripts can be stored in Clusterpoint Server to work as "stored procedures".
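
The hook mechanism described in this section, where scripts run before or after a transaction, follows a general pattern that can be sketched as a registry of callbacks keyed by stage. The example is written in Python rather than Lua to show the pattern only; the stage names, the decorator on and the insert function are hypothetical and are not Clusterpoint's hook API.

```python
from collections import defaultdict

hooks = defaultdict(list)   # stage name -> callables to run at that stage
store = {}                  # stand-in for the document store
audit_log = []              # stand-in for an alerting channel


def on(stage):
    """Decorator that registers a function for a given (hypothetical) stage."""
    def register(func):
        hooks[stage].append(func)
        return func
    return register


@on("before_insert")
def add_markup(doc):
    # Enrich the document before it is stored (meta-data markup).
    doc["word_count"] = len(doc["text"].split())


@on("after_insert")
def alert(doc):
    # React once the document has been stored (real-time alerting).
    audit_log.append(f"stored {doc['id']}")


def insert(doc):
    for hook in hooks["before_insert"]:
        hook(doc)
    store[doc["id"]] = doc
    for hook in hooks["after_insert"]:
        hook(doc)


insert({"id": "d1", "text": "hello distributed world"})
print(store["d1"]["word_count"])  # 3
print(audit_log)                  # ['stored d1']
```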

Programming language support

Clusterpoint database uses REST principles and HTTP/HTTPS messaging for client-server communication between customer software applications and the Clusterpoint database server. Any client programming language or development environment that supports HTTP POST/GET messaging can connect to Clusterpoint Server directly and read, write, update, delete and search XML and JSON documents.

In versions 1.x, 2.x and 3.0, the REST API interface for the JSON data format transforms customer data between JSON and XML, while Clusterpoint Server uses only XML for internal server-side data storage and processing.
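
A transformation from JSON to XML of the kind mentioned above can be sketched with the Python standard library. The mapping rules used here (object keys become element names, array entries become item elements, a document root element) are assumptions made for illustration and are not Clusterpoint's actual conversion rules.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(json_text, root_tag="document"):
    """Convert a JSON text to an XML string using a simple, assumed mapping."""

    def build(parent, value):
        if isinstance(value, dict):
            # Keys are assumed to be valid XML element names.
            for key, child in value.items():
                build(ET.SubElement(parent, key), child)
        elif isinstance(value, list):
            for item in value:
                build(ET.SubElement(parent, "item"), item)
        else:
            # Scalars become text; booleans appear as Python's "True"/"False".
            parent.text = "" if value is None else str(value)

    root = ET.Element(root_tag)
    build(root, json.loads(json_text))
    return ET.tostring(root, encoding="unicode")


print(json_to_xml('{"id": 7, "title": "Hello", "tags": ["a", "b"]}'))
# <document><id>7</id><title>Hello</title><tags><item>a</item><item>b</item></tags></document>
```

A mapping this simple loses type information (the number 7 and the string "7" look the same in XML), which is one reason real converters carry extra type annotations.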

Clusterpoint Server has native client API Libraries using HTTP and faster TCP/IP transport protocol for the following popular programming environments:

Please check the vendor web site for API support in other languages.

Licensing and support

Since January 2015 Clusterpoint database has had a free software license.

The vendor provides standard software maintenance and technical support on a subscription model (on premise or Clusterpoint Database Cloud), delivered over email, Skype or phone.[41]

Premium technical support for customers using the software in 24/7 production environments includes remote problem diagnostics and resolution based on a service-level agreement. The vendor also provides installation support, a help desk, training and partnership programs.[42][43][44]

3rd party tools and applications

  • GOL: Big Data SIEM Analytics tool from Clusterpark - Log, Events and Security Records Search and Analytics.[45]
  • DigiBrowser: Quick SQL denormalization into NoSQL database - imports a multi-table SQL database into one Clusterpoint database using automatic denormalization.[46]
  • NTSS: Network Traffic Security System for Lawful Intercept - High-speed capture, store, search and analysis of all Internet traffic for the corporate network.[47][48]

References

  1. ^ "Clusterpoint Group Limited". Companies House (UK). Retrieved March 5, 2015. 
  2. ^ "Clusterpoint Development Center". Lursoft (LV). Retrieved March 5, 2015. 
  3. ^ "Clusterpoint Profile on Firmas.lv". Firmas.lv (LV). Retrieved March 5, 2015. 
  4. ^ "Imprimatur Capital About Clusterpoint". Imprimatur Capital. Retrieved March 9, 2015. 
  5. ^ "Clusterpoint Raises EUR1 Million From BaltCap". Privateequitywire. Retrieved June 14, 2013. 
  6. ^ "Clusterpoint Receives €1 Million From BaltCap". Arcticstartup.com. Retrieved June 14, 2013. 
  7. ^ "List of NOSQL Databases". Nosql-database.org. Retrieved March 9, 2015. 
  8. ^ "The NoSQL movement: document databases". Dataversity. Retrieved June 14, 2013. 
  9. ^ "Big data startups / document stores". Bigdata-startups.com. Retrieved June 14, 2013. 
  10. ^ "Technology Behind Clusterpoint Database". Gints Ernestsons, Founder. Retrieved March 9, 2015. 
  11. ^ "Fulltext search engines". Mediawiki.org. Retrieved June 14, 2013. 
  12. ^ "Bloomberg Company Research Profile". Bloomberg.com. Retrieved March 9, 2015. 
  13. ^ "Crunchbase Clusterpoint Profile". Crunchbase.com. Retrieved June 14, 2013. 
  14. ^ "BusinessWeek Clusterpoint Profile". Businessweek. Retrieved June 14, 2013. 
  15. ^ "Clusterpoint Database Cloud Inside Out". Jurgis Orups, Clusterpoint CTO. Retrieved March 9, 2015. 
  16. ^ "Business Directory Use Case". Yellow Search Today. Retrieved March 4, 2015. 
  17. ^ "Clusterpoint Use Case In E-commerce". Exim.lv. Retrieved March 4, 2015. 
  18. ^ "Clusterpoint In E-Health Solutions". Aura Healthcare. Retrieved March 9, 2015. 
  19. ^ "Open Data and Public Services 2015". Garage48 Foundation. Retrieved March 9, 2015. 
  20. ^ a b "Clusterpoint and ZoomCharts". Zoomcharts.com. Retrieved March 4, 2015. 
  21. ^ "Developers Club NoSQL Meetup with Clusterpoint". Dev Club Riga. Retrieved March 4, 2015. 
  22. ^ "6th International Workshop on Ranking in Databases / VLDB 2012". Very Large Databases Conference 2012. Retrieved March 9, 2015. 
  23. ^ "Top NOSQL document databases". Big Data Analytics Today. Retrieved March 9, 2015. 
  24. ^ "How to make a Google App Engine application searchable using self merge joins". Google, Inc. Retrieved March 9, 2015. 
  25. ^ "Clusterpoint XML NoSQL Database Engine". Romans Malinovskis, CTO at Linkedfinance.com. Retrieved March 9, 2015. 
  26. ^ "Infogov Proteus iGRC (Internet Governance and Regulatory Compliance)". Infogov Ltd (United Kingdom). Retrieved March 9, 2015. 
  27. ^ "Agile Web Software Development". Agile.org. Retrieved March 9, 2015. 
  28. ^ "Turbocharge HTML5 web applications". Ambienttech. Retrieved March 9, 2015. 
  29. ^ "Elastec Enterprise Cloudworks". Elastec Technology Solutions (Pty) Ltd. Retrieved March 9, 2015. 
  30. ^ "Converting web sites to NoSQL". Rixtellab. Retrieved March 9, 2015. 
  31. ^ "Bit IT Solution for Network Traffic Control". Bit IT solutions. Retrieved March 9, 2015. 
  32. ^ "Wikisearch.net: Wikipedia and DBpedia Big Data Analytics (English)". Wikisearch.net. Retrieved March 9, 2015. 
  33. ^ "NoSQL Scaling Beyond Traditional SQL" (PDF). Intel Corp. Retrieved March 9, 2015. 
  34. ^ "HP Guide to NoSQL". Hewlett-Packard Corp. March 5, 2015. 
  35. ^ "Clusterpoint API on Github". Github.com. Retrieved March 9, 2015. 
  36. ^ "Python API for Clusterpoint Server". Python.org. Retrieved March 9, 2015. 
  37. ^ "Clusterpoint Node.js API". NPM, inc. Retrieved March 9, 2015. 
  38. ^ "Full Text Search Explained". Everything.Explained.At. Retrieved March 9, 2015. 
  39. ^ "Making you app searchable using self merge-joins". Google. Retrieved June 14, 2013. 
  40. ^ "The Do's and Don'ts of Virtualizing Database Servers". Network Computing. Retrieved March 9, 2015. 
  41. ^ "Clusterpoint DBaaS Cloud Service". Facebook. Retrieved March 9, 2015. 
  42. ^ "Clusterpoint DBMS by 1DataGroup". 1DataGroup. Retrieved March 9, 2015. 
  43. ^ "Knowledge Academy Training Course in Clusterpoint DBMS". Knowledge Academy. Retrieved March 9, 2015. 
  44. ^ "Big Data Meetup. Clusterpoint XML Database Engine". Meetup.com. Retrieved March 9, 2015. 
  45. ^ "GOL: Big Data SIEM Analytics tool". Clusterpark Ltd. Retrieved March 4, 2015. 
  46. ^ "DigiBrowser: Quick SQL denormalization into NoSQL database". Datorikas Instituts DIVI. Retrieved March 4, 2015. 
  47. ^ "Clusterpoint NTSS Product Review". SpiceWorks, Inc. Retrieved March 9, 2015. 
  48. ^ "Clusterpoint Network Traffic Surveillance System". iiGrowth LLC. Retrieved March 9, 2015.