Clustrix

From Wikipedia, the free encyclopedia
Clustrix Inc
Type: Private
Industry: Computer database
Founded: December 15, 2006, in San Francisco, California, U.S.
Founders: Paul Mikesell, Sergei Tsarev, Eric Hoffman
Headquarters: San Francisco, CA, United States
Products: Clustrix Database Server
Number of employees: 40–50
Website: clustrix.com

Clustrix, Inc. was a San Francisco-based private company, founded in 2006, that developed a database management system marketed as NewSQL.[1][2]

History

Clustrix was founded in November 2006 with the help of Y Combinator and is sometimes referred to as Sprout-Clustrix.[3] Its founders include Paul Mikesell (formerly of EMC Isilon) and Sergei Tsarev. Some of its technology was being tested at customer sites from 2008.[4]

The product was initially called Sierra. At its official announcement in 2010, it was named the Clustered Database System (CDS).[5][6] The company received $10 million in funding from Sequoia Capital, U.S. Venture Partners (USVP), and ATA Ventures in December 2010.[7] Robin Purohit became chief executive in October 2011, and a further $6.75 million was raised in July 2012.[8][9] Another $16.5 million round from the original backers was announced in May 2013,[10] and a $10 million round of new funding, led by HighBAR Ventures, followed in August 2013.[11] Mike Azevedo replaced Purohit as chief executive in 2014.[12] More than $23 million in debt financing was disclosed in February 2016.[13] On September 20, 2018, it was announced that MariaDB had acquired Clustrix.[14]

Technology

ClustrixDB combines automatic data distribution, a query planner, and a distributed execution model to provide scalability and concurrency in an ACID-compliant RDBMS. To do this, it uses many of the same techniques as other massively parallel processing (MPP) databases. It uses Paxos for distributed transaction resolution. It also uses multi-version concurrency control (MVCC), so that reads see a consistent snapshot without blocking concurrent writes. Together, these components let ClustrixDB expose distributed execution through a simple SQL interface while providing scalability, efficiency, and fault tolerance.[15]
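The MVCC idea can be illustrated with a toy, single-process sketch of generic snapshot reads. Every write appends a new timestamped version, and a reader sees only the versions committed at or before its snapshot. This is not ClustrixDB's implementation. The class and method names are hypothetical, and the sketch omits locking, write-write conflict detection, distribution across nodes, and cleanup of old versions.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    """Toy multi-version store (hypothetical): writes append versions instead of overwriting."""
    # key -> list of (commit_ts, value), kept in ascending commit order
    versions: dict = field(default_factory=dict)
    clock: int = 0

    def write(self, key, value):
        """Commit a new version of `key` and return its commit timestamp."""
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def snapshot(self):
        """A reader's snapshot is the latest commit timestamp it is allowed to see."""
        return self.clock

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before `snapshot_ts`, or None."""
        visible = None
        for commit_ts, value in self.versions.get(key, []):
            if commit_ts > snapshot_ts:
                break
            visible = value
        return visible


store = VersionedStore()
store.write("balance", 100)
reader_ts = store.snapshot()         # a reader takes its snapshot here
store.write("balance", 250)          # a writer commits afterwards without waiting for the reader
print(store.read("balance", reader_ts))         # the old snapshot still sees 100
print(store.read("balance", store.snapshot()))  # a fresh snapshot sees 250
```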

Key Features of ClustrixDB[16]

  • Scalable
  • High-Volume, High Concurrency OLTP
  • Automatic Data Distribution
  • Distributed Query Execution
  • Fault-Tolerant
  • Flexible Deployment Options
  • MySQL Compatible
  • Easy to Migrate from MySQL

ClustrixDB is a full-featured RDBMS that combines a distributed architecture with a simple SQL interface. It was built specifically for online transaction processing (OLTP) workloads and for compatibility with MySQL.

In ClustrixDB terminology, a node is an individual networked server, and a cluster is a group of three or more nodes configured to work together.

Some of the key features of ClustrixDB are as follows:

Scalable

ClustrixDB uses a shared-nothing architecture, which Clustrix describes as the only architecture known to scale linearly as nodes are added. In a shared-nothing architecture, each node owns a portion of the data, and reads and writes are spread across multiple nodes to reduce contention. ClustrixDB also distributes both data and query execution automatically.
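In a shared-nothing system, each node can be given its own portion of the data. The sketch below assumes that rows are placed by hashing a single distribution key. The function names, the use of SHA-256, and the plain modulo placement are illustrative assumptions rather than ClustrixDB internals, and the sketch ignores replicas.

```python
import hashlib


def owner_node(key: str, num_nodes: int) -> int:
    """Map a row's distribution key to the index of the node that owns it (hypothetical scheme).

    A stable hash (SHA-256) is used instead of Python's built-in hash(), which is
    randomized per process and so would not agree across machines.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(rows, num_nodes):
    """Group rows by owning node; each node stores only its own portion."""
    portions = {n: [] for n in range(num_nodes)}
    for row in rows:
        portions[owner_node(row["id"], num_nodes)].append(row)
    return portions


rows = [{"id": f"user-{i}"} for i in range(1000)]
portions = partition(rows, 3)
print({node: len(owned) for node, owned in portions.items()})  # roughly a third each
```

Because every node computes the same stable hash, any node can work out where a given row lives without asking the others.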

ClustrixDB also lets administrators add capacity (“Flex Up”) or remove it (“Flex Down”) to meet changing and seasonal application demands. Flex Up expands a cluster by adding nodes, and Flex Down scales the configuration back by removing them. Both operations redistribute data within the cluster automatically in the background while the database remains online and available.
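Online redistribution during Flex Up or Flex Down can be sketched as follows. The sketch assumes, hypothetically, that the data is divided into a fixed number of equal-sized slices and that each slice is owned by one node. It uses a simple greedy rule. Slices on removed nodes are reassigned first, and then slices move one at a time from the busiest node to the idlest. This keeps data movement small, but it is not Clustrix's actual rebalancing algorithm, and it ignores replicas and slice sizes.

```python
from collections import Counter


def rebalance(assignment: dict, nodes: list) -> dict:
    """Return a new slice->node assignment over `nodes` that moves few slices (hypothetical).

    Slices whose node is no longer in `nodes` (Flex Down) are reassigned to the
    least-loaded nodes first; then slices move one at a time from the busiest to
    the idlest node until no node holds more than one slice above any other.
    """
    new = {s: n for s, n in assignment.items() if n in nodes}
    orphaned = [s for s, n in assignment.items() if n not in nodes]
    load = Counter(new.values())  # missing nodes (e.g. a fresh Flex Up node) count as 0
    for s in orphaned:
        target = min(nodes, key=lambda n: load[n])
        new[s] = target
        load[target] += 1
    while True:
        busiest = max(nodes, key=lambda n: load[n])
        idlest = min(nodes, key=lambda n: load[n])
        if load[busiest] - load[idlest] <= 1:
            return new
        slice_id = next(s for s, owner in new.items() if owner == busiest)
        new[slice_id] = idlest
        load[busiest] -= 1
        load[idlest] += 1


# Flex Up: three nodes holding 12 slices gain a fourth node.
before = {s: f"node{s % 3}" for s in range(12)}
after = rebalance(before, ["node0", "node1", "node2", "node3"])
moved = sum(before[s] != after[s] for s in before)
print(moved)  # 3
```

With three nodes holding four slices each, adding a fourth node moves only three slices, one from each existing node, rather than reshuffling the whole data set. Calling `rebalance(after, ["node0", "node1", "node2"])` models the matching Flex Down.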

High-Volume, High Concurrency OLTP

ClustrixDB was built specifically for large-scale online transaction processing (OLTP). Transactions remain ACID-compliant even when they are distributed across nodes. As a fully relational database built for high-throughput workloads, ClustrixDB can scale both reads and writes by adding nodes. By distributing data and work across the nodes and cores of a cluster, it can achieve far more parallelism than a single-instance database.

Automatic Data Distribution

A key component of ClustrixDB is the Rebalancer, which runs continuously in the background and automatically manages how data is distributed across the cluster. It maintains multiple copies (replicas) of the data across the cluster. If a node fails unexpectedly and too few replicas remain, the Rebalancer automatically creates new ones. It also keeps both data and load evenly distributed across the cluster as data is added or removed.
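The re-replication step after a node failure can be sketched as follows. The sketch assumes a hypothetical target of two copies per slice and a "least-loaded live node that lacks a copy" placement rule. The names and the rule are illustrations, not the Rebalancer's actual logic.

```python
from collections import Counter


def heal(placement: dict, live_nodes: set, replicas: int = 2) -> dict:
    """Restore `replicas` copies of every slice using only live nodes (hypothetical rule).

    `placement` maps slice id -> set of nodes holding a copy. Copies on failed
    nodes are dropped; each under-replicated slice gets new copies on the
    least-loaded live nodes that do not already hold it.
    """
    healed = {s: set(nodes) & live_nodes for s, nodes in placement.items()}
    load = Counter(n for nodes in healed.values() for n in nodes)
    for slice_id, holders in sorted(healed.items()):
        if not holders:
            raise RuntimeError(f"slice {slice_id} lost every replica")
        while len(holders) < replicas:
            candidates = sorted(live_nodes - holders, key=lambda n: (load[n], n))
            if not candidates:
                break  # fewer live nodes than the replica target
            holders.add(candidates[0])  # mutates the set stored in `healed`
            load[candidates[0]] += 1
    return healed


placement = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c", "a"}, 3: {"a", "b"}}
healed = heal(placement, live_nodes={"a", "c", "d"})  # node "b" has failed
print(healed)
```

Because every slice had a surviving copy, no data is lost, and the slices that lived on node `b` regain a second copy on the remaining nodes.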

Distributed Query Execution

Each node of a ClustrixDB cluster is configured with the same version of the database engine, a map of all the system's data, and its own query compiler. Each node is capable of performing both reads and writes.
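Every node carries a full map of where data lives. As a result, a client can send a read or write to any node, which then consults its local copy of the map and forwards the request to the owner. This is sketched below under loudly hypothetical assumptions: a fixed count of eight slices, CRC-32 hashing, and in-process objects standing in for networked servers. Replication and network failures are ignored.

```python
import zlib
from dataclasses import dataclass, field

NUM_SLICES = 8  # hypothetical fixed slice count


def slice_for(key: str) -> int:
    """Pick a slice with a stable hash (CRC-32) so every node computes the same answer."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SLICES


@dataclass
class Node:
    name: str
    slice_map: dict                              # same contents on every node: slice id -> owner name
    storage: dict = field(default_factory=dict)  # only the rows this node owns

    def owner(self, key, cluster):
        return cluster[self.slice_map[slice_for(key)]]

    def write(self, key, value, cluster):
        """Any node accepts a write and forwards it to the owning node."""
        self.owner(key, cluster).storage[key] = value

    def read(self, key, cluster):
        """Any node can answer a read by consulting its local copy of the map."""
        return self.owner(key, cluster).storage.get(key)


slice_map = {s: f"node{s % 3}" for s in range(NUM_SLICES)}
# Every node gets an identical map (one shared dict here, for brevity).
cluster = {f"node{i}": Node(f"node{i}", slice_map) for i in range(3)}

cluster["node0"].write("order-42", {"qty": 3}, cluster)  # client happens to hit node0
print(cluster["node2"].read("order-42", cluster))        # a different node still finds it
```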

Queries are evaluated by ClustrixDB's query optimizer, Sierra, which chooses an efficient execution plan. The database then splits each query into fragments and sends them to the nodes that hold the relevant data. In effect, ClustrixDB takes the query to the data and then combines the results. All of this distributed planning and execution is exposed through a simple SQL interface.
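The idea of taking the query to the data and then combining the results can be sketched as a scatter-gather aggregate. The example approximates how a query such as `SELECT AVG(amount) FROM orders WHERE region = 'east'` might be pushed down. The table contents and function names are hypothetical, and threads stand in for nodes. It does not reproduce Sierra's actual plan representation.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cluster state: each node holds only its own rows of an `orders` table.
NODE_DATA = {
    "node0": [{"region": "east", "amount": 10}, {"region": "west", "amount": 5}],
    "node1": [{"region": "east", "amount": 7}],
    "node2": [{"region": "west", "amount": 8}, {"region": "east", "amount": 1}],
}


def local_fragment(rows, region):
    """Fragment run on each node: filter and pre-aggregate local rows only."""
    matching = [r["amount"] for r in rows if r["region"] == region]
    return sum(matching), len(matching)


def distributed_avg(region):
    """Scatter the fragment to every node in parallel, then combine the partial results.

    Nodes return (sum, count) pairs rather than raw rows, which keeps traffic small;
    the counts are needed because an average of per-node averages would be wrong.
    """
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda rows: local_fragment(rows, region), NODE_DATA.values()))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


print(distributed_avg("east"))  # (10 + 7 + 1) / 3 = 6.0
```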

Fault-Tolerant

ClustrixDB was designed for fault tolerance and, by default, can survive the failure of a node or a zone with no data loss. When a node or zone becomes unavailable, it is removed from service, and the cluster continues to use the copies of its data held on other nodes. The Rebalancer then automatically creates additional replicas of that data in the background, without user intervention, which makes ClustrixDB auto-healing. Clustrix's documentation covers consistency, fault tolerance, and availability, as well as deploying a cluster across multiple zones, in more detail.
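Surviving the loss of a whole zone requires that no slice keeps all of its copies inside one zone. A minimal sketch of zone-aware placement, together with a check for that property, is shown below. It assumes a hypothetical setup of two copies per slice, round-robin assignment over zones, and made-up zone and node names. It is not ClustrixDB's placement logic.

```python
from itertools import cycle


def place_across_zones(num_slices, zones: dict, replicas=2):
    """Place each slice's replicas on nodes in *different* zones (hypothetical scheme).

    `zones` maps zone name -> list of node names; needs at least `replicas` zones.
    """
    if len(zones) < replicas:
        raise ValueError("need at least as many zones as replicas")
    zone_names = sorted(zones)
    node_cycles = {z: cycle(zones[z]) for z in zone_names}
    placement = {}
    for s in range(num_slices):
        # Consecutive zones, offset by slice id, are always distinct when replicas <= zones.
        chosen = [zone_names[(s + i) % len(zone_names)] for i in range(replicas)]
        placement[s] = {next(node_cycles[z]) for z in chosen}
    return placement


def survives_zone_loss(placement, zones):
    """True if losing any single zone still leaves at least one copy of every slice."""
    for lost in zones:
        lost_nodes = set(zones[lost])
        if any(holders <= lost_nodes for holders in placement.values()):
            return False
    return True


zones = {"zone-a": ["n1", "n2"], "zone-b": ["n3", "n4"], "zone-c": ["n5", "n6"]}
placement = place_across_zones(6, zones)
print(survives_zone_loss(placement, zones))  # True
naive = {s: {"n1", "n2"} for s in range(6)}  # both copies in zone-a
print(survives_zone_loss(naive, zones))      # False
```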

Flexible Deployment Options

ClustrixDB can be deployed wherever CentOS 7.4 or later is available, whether in public clouds (AWS, Rackspace, Azure) or on commodity hardware. Recommended hardware is listed in Clustrix's ClustrixDB Reference Server Configurations documentation.

MySQL Compatible

ClustrixDB uses MySQL syntax and constructs for queries, DML, DDL, triggers, and stored procedures, so existing MySQL applications are likely to be compatible with ClustrixDB already. Despite this similarity, ClustrixDB was built from the ground up rather than derived from MySQL. Specific syntax differences are listed in Clustrix's documentation under General Differences from MySQL.

ClustrixDB also supports the MySQL replication protocol, including both statement-based replication (SBR) and row-based replication (RBR). Setup is described in Clustrix's documentation under Configuring Replication.
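The two replication formats differ in what the primary ships to a replica. With SBR, the primary ships the statement, and the replica re-executes it. With RBR, the primary ships the resulting row images, and the replica applies them directly. The toy in-memory model below contrasts the two. Its tables, statement representation, and function names are hypothetical, and it does not reflect the MySQL binary-log event format or ClustrixDB's implementation.

```python
import copy

# Hypothetical in-memory "table": primary key -> row dict.
primary = {1: {"name": "a", "score": 10}, 2: {"name": "b", "score": 20}}
replica = copy.deepcopy(primary)


def run_update(table, predicate, change):
    """Apply `change` to every row matching `predicate`; return copies of the after-images."""
    changed = {}
    for pk, row in table.items():
        if predicate(row):
            row.update(change(row))
            changed[pk] = dict(row)
    return changed


# The statement being replicated: UPDATE t SET score = score + 1 WHERE score > 15
stmt = (lambda r: r["score"] > 15, lambda r: {"score": r["score"] + 1})
row_images = run_update(primary, *stmt)

# Statement-based: the replica re-executes the same statement.
sbr_replica = copy.deepcopy(replica)
run_update(sbr_replica, *stmt)

# Row-based: the replica overwrites the changed rows with the shipped after-images.
rbr_replica = copy.deepcopy(replica)
for pk, image in row_images.items():
    rbr_replica[pk] = dict(image)

print(sbr_replica == primary == rbr_replica)  # True for this deterministic statement
```

Both approaches agree here because the statement is deterministic. A statement whose result depends on, for example, the current time or a random value could produce different rows when re-executed under SBR. RBR avoids this by shipping the rows the primary actually wrote.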

Users access data in ClustrixDB through a simple SQL interface. An application sees ClustrixDB as a single database instance, regardless of how many nodes are in the cluster or where the data is located. Because the database distributes data itself, applications do not need to shard data, and no application changes are required to access the database. Any node can service any read or write.

Easy to Migrate from MySQL

Because ClustrixDB is compatible with MySQL, migrations can use existing MySQL tools as well as Clustrix's own proprietary tools.

References

  1. "What we talk about when we talk about NewSQL".
  2. "The NewSQL Movement".
  3. "Form D: Notice of Sale of Securities". United States Securities and Exchange Commission. July 5, 2007. Retrieved September 5, 2016.
  4. "The Clustrix story". DBMS2 Blog. May 12, 2010. Retrieved September 5, 2016.
  5. Camille Ricketts (May 3, 2010). "Y Combinator's Clustrix rolls out databases that scale". VentureBeat. Retrieved September 5, 2016.
  6. Stacey Higginbotham (May 3, 2010). "Clustrix Builds the Webscale Holy Grail: A Database That Scales". Gigaom. Retrieved September 5, 2016.
  7. Barb Darrow (August 19, 2013). "Clustrix bags $10M more in funding to keep scaling out its SQL database". Gigaom. Retrieved September 5, 2016.
  8. Robin Wauters (October 18, 2011). "Clustrix Lands Former Hewlett-Packard VP Robin Purohit As Its New CEO". TechCrunch. Retrieved September 5, 2016.
  9. Ryan Lawler (July 5, 2012). "Big Data Startup Clustrix Raises $6.75 Million From Sequoia And Others To Build Scalable Databases". TechCrunch. Retrieved September 5, 2016.
  10. Barb Darrow (May 6, 2013). "Clustrix nets $16.5M to push its database outside the box". Gigaom. Retrieved September 5, 2016.
  11. Barb Darrow (August 19, 2013). "Clustrix bags $10M more in funding to keep scaling out its SQL database". Gigaom. Retrieved September 5, 2016.
  12. "Clustrix Names New CEO Mike Azevedo and Executive Chairman Bruce Armstrong". Wall Street Journal. September 9, 2014. Retrieved September 5, 2016.
  13. "Form D: Notice Exempt Offering of Securities". United States Securities and Exchange Commission. February 12, 2016. Retrieved September 5, 2016.
  14. "MariaDB Acquires Clustrix Adding Distributed Database Technology". September 20, 2018. Retrieved September 20, 2018.
  15. "ClustrixDB - High Level Architectural Overview - Clustrix Documentation - Confluence". docs.clustrix.com. Retrieved July 16, 2018.
  16. "Key Features of ClustrixDB - Clustrix Documentation - Confluence". docs.clustrix.com. Retrieved July 16, 2018.

External links