Greenplum

From Wikipedia, the free encyclopedia
Greenplum
Type Division of Pivotal Software
Industry Big Data technologies
Founded 2003
Headquarters San Mateo, California, United States
Products Unified Analytics Platform (UAP), Database Software, Chorus Software, Enterprise-Ready Hadoop, Data Computing Appliance (DCA), Analytics Labs

Greenplum was a big data analytics company headquartered in San Mateo, California.[1][2] Greenplum's products included its Unified Analytics Platform, Data Computing Appliance, Analytics Lab, Database, HD and Chorus. Greenplum was acquired by EMC Corporation in July 2010,[3] and then became part of Pivotal Software in 2012.[4]

Company

Greenplum was founded in September 2003 by Scott Yara and Luke Lonergan.[5] It was a merger of two smaller companies, Metapa in Los Angeles and Didera in Fairfax, Virginia.[6] Investors included SoundView Ventures, Hudson Ventures and Royal Wulff Ventures. A total of $20 million in funding was announced at the merger.[7] Greenplum, based in San Mateo, California, released its database management system software in April 2005, calling it Bizgres.[8] In July 2006 a partnership with Sun Microsystems was announced.[9] Greenplum was acquired by EMC Corporation in July 2010,[3] becoming the foundation of EMC's Big Data Division. Its computer appliance was announced in October 2010. In 2011 Greenplum announced more products and services. In May 2012 Greenplum released its Analytics Workbench, and in October 2012 Chorus.[9] In December 2012 it became part of a joint venture of VMware and parent company EMC Corporation, which took the name Pivotal Software in March 2013.[10][4]

Technology

The Greenplum Database builds on the foundations of the open-source database PostgreSQL.[11] It primarily functions as a data warehouse and utilizes a shared-nothing, massively parallel processing (MPP) architecture. In this architecture, data is partitioned across multiple segment servers, and each segment owns and manages a distinct portion of the overall data; there is neither disk-level sharing nor data contention among segments.
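The partitioning idea can be illustrated with a minimal Python sketch. It is not Greenplum's implementation: the CRC32 hash, the function names (`segment_for`, `distribute`, `parallel_count`) and the sample table are assumptions chosen only to show how a shared-nothing layout assigns each row to exactly one segment and lets every segment work on its own portion.

```python
import zlib
from collections import defaultdict

def segment_for(key: str, num_segments: int) -> int:
    """Map a distribution-key value to a segment index using a stable hash.

    CRC32 is an illustrative choice, not the hash Greenplum uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_segments

def distribute(rows, key_column, num_segments):
    """Assign every row to exactly one segment; no row is shared."""
    segments = defaultdict(list)
    for row in rows:
        index = segment_for(str(row[key_column]), num_segments)
        segments[index].append(row)
    return segments

def parallel_count(segments):
    """Each segment counts only its own rows; a coordinator adds the partial counts."""
    partial_counts = [len(rows) for rows in segments.values()]
    return sum(partial_counts)

# Hypothetical sample data: 1000 rows distributed over 4 segments.
rows = [{"customer_id": i, "amount": i * 10} for i in range(1000)]
segments = distribute(rows, "customer_id", num_segments=4)
total = parallel_count(segments)
```

Because a row's segment depends only on its key, two tables distributed on the same key place matching rows on the same segment, which is one reason the choice of distribution key matters in systems of this kind.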

Greenplum Database's parallel query optimizer converts each query into a physical execution plan.[12] Greenplum's optimizer uses a cost-based algorithm to evaluate potential execution plans, takes a global view of execution across the computer cluster, and factors in the cost of moving data between nodes.[12] The resulting query plans contain traditional relational database operations as well as parallel "motion" operations that describe when and how data should be transferred between nodes during query execution.[13] Commodity Gigabit Ethernet and 10-gigabit Ethernet technology is used for the transfer between nodes. During execution of each node in the plan, multiple relational operations are processed by pipelining: the ability to begin a task before its predecessor task has completed, to increase effective parallelism. For example, while a table scan is taking place, rows selected can be pipelined into a join process.[14]
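Pipelining of this kind can be sketched with Python generators. The operator names (`scan`, `select`, `hash_join`) and the sample tables are hypothetical, and the sketch runs in a single process rather than across nodes; it only shows that a selected row can reach the join before the scan has finished.

```python
def scan(table):
    """Yield rows one at a time instead of materializing the whole table."""
    for row in table:
        yield row

def select(rows, predicate):
    """Pass each qualifying row downstream as soon as it is seen."""
    for row in rows:
        if predicate(row):
            yield row

def hash_join(probe_rows, build_table, key):
    """Join streamed probe rows against an in-memory hash index.

    The (small) build side is indexed first; probe rows are then joined
    one by one as they arrive from the upstream operators.
    """
    index = {}
    for row in build_table:
        index.setdefault(row[key], []).append(row)
    for row in probe_rows:
        for match in index.get(row[key], []):
            yield {**match, **row}

# Hypothetical sample tables.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 5},
    {"order_id": 2, "customer_id": 11, "amount": 50},
    {"order_id": 3, "customer_id": 10, "amount": 70},
]
customers = [
    {"customer_id": 10, "name": "Ada"},
    {"customer_id": 11, "name": "Grace"},
]

# scan -> select -> join: no operator waits for its predecessor to finish.
plan = hash_join(select(scan(orders), lambda r: r["amount"] > 20),
                 customers, "customer_id")
results = list(plan)
```

Each call to `next()` on `plan` pulls a single row through the scan and the filter into the join, so intermediate results are never stored in full.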

Internally, the Greenplum system utilizes log shipping and segment-level replication and provides automated failover. At the storage level, RAID techniques can mask disk failures. At the system level, Greenplum replicates segment and master data to other nodes to ensure that the loss of a machine will not impact the overall database availability.[15] In 2009 technology was announced to use parallel streams of data for extract, transform and load operations. This technology is exposed to customers via a programmable "external table" interface and a traditional command-line loading interface.[16]
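The effect of segment-level replication on availability can be shown with a toy model. The classes `SegmentCopy` and `MirroredSegment` are hypothetical and greatly simplified: writes go to every live copy and reads fall over to the first live copy. Log shipping, failure detection, and resynchronization of a recovered copy are not modeled.

```python
class SegmentCopy:
    """One copy (primary or mirror) of a segment's data on some host."""

    def __init__(self, host):
        self.host = host
        self.rows = {}
        self.alive = True

class MirroredSegment:
    """A segment kept on two hosts so that losing one host loses no data."""

    def __init__(self, primary, mirror):
        self.copies = [primary, mirror]

    def _live_copies(self):
        live = [copy for copy in self.copies if copy.alive]
        if not live:
            raise RuntimeError("segment unavailable: all copies are down")
        return live

    def write(self, key, value):
        # Apply the write to every live copy so the copies stay in step.
        for copy in self._live_copies():
            copy.rows[key] = value

    def read(self, key):
        # Read from the first live copy; this is the failover step.
        return self._live_copies()[0].rows[key]

segment = MirroredSegment(SegmentCopy("host-a"), SegmentCopy("host-b"))
segment.write("order-1", 100)
segment.copies[0].alive = False  # simulate losing the primary's machine
value_after_failover = segment.read("order-1")
```

The read still succeeds after the simulated failure because the mirror on the second host received the same write.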

In addition to traditional Structured Query Language (SQL), in 2008 support was announced for MapReduce queries within a parallel dataflow engine, to run analytics against datasets stored in and outside of the Greenplum Database.[17][18]
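The MapReduce model itself can be sketched in a few lines of Python. This is a generic, single-process illustration and not Greenplum's MapReduce interface; the function names and the sample log lines are assumptions. The map step runs independently on each partition, the shuffle groups intermediate values by key, and the reduce step combines each group.

```python
from collections import defaultdict
from itertools import chain

def map_phase(partition, mapper):
    """Apply the mapper to every record of one partition."""
    return [pair for record in partition for pair in mapper(record)]

def shuffle(mapped_partitions):
    """Group intermediate (key, value) pairs from all partitions by key."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapped_partitions):
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Combine the values collected for each key."""
    return {key: reducer(key, values) for key, values in groups.items()}

def word_mapper(line):
    """Emit (word, 1) for each word in a line of text."""
    for word in line.split():
        yield (word, 1)

def sum_reducer(key, values):
    """Add up the counts emitted for one word."""
    return sum(values)

# Hypothetical input, already split into two partitions.
partitions = [
    ["error disk full", "warning disk slow"],
    ["error network down"],
]
mapped = [map_phase(partition, word_mapper) for partition in partitions]
counts = reduce_phase(shuffle(mapped), sum_reducer)
```

In a parallel engine the map calls would run on different nodes and the shuffle would move data between them; here everything runs sequentially to keep the data flow visible.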

For each table (or partition of a table), database administrators can select the storage, execution and compression settings that suit the way that table will be accessed. Greenplum DB transparently abstracts the details of any table or partition, allowing a variety of underlying models: traditional row-oriented tables, optimized for read-mostly scans and bulk append loads, or column-oriented.[19] Database administrators also can tune the storage types and compression settings of different partitions within the same table.[20]
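The trade-off between row-oriented and column-oriented storage can be illustrated with a small Python sketch. The helper names and the sample table are hypothetical, and run-length encoding stands in for whatever compression a real system applies; the sketch assumes every row has the same columns.

```python
def to_columns(rows):
    """Pivot row-oriented records into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

def sum_row_oriented(rows, column):
    """Aggregate over rows: every full record is visited."""
    return sum(row[column] for row in rows)

def sum_column_oriented(columns, column):
    """Aggregate over one column: the other columns are never touched."""
    return sum(columns[column])

def run_length_encode(values):
    """Compress consecutive repeats into (value, count) pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return [(value, count) for value, count in encoded]

# Hypothetical sample table.
rows = [
    {"region": "west", "sales": 10},
    {"region": "west", "sales": 20},
    {"region": "east", "sales": 5},
]
columns = to_columns(rows)
row_total = sum_row_oriented(rows, "sales")
column_total = sum_column_oriented(columns, "sales")
compressed_region = run_length_encode(columns["region"])
```

Both layouts give the same total, but the column layout reads only the `sales` values, and storing similar values next to each other is what lets a simple scheme such as run-length encoding shrink the `region` column.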

Greenplum HD is a supported version of Apache Hadoop. It includes Hadoop's Distributed File System (HDFS), Hive, Pig, HBase, and ZooKeeper.[21] Greenplum Chorus is a social network portal for data science teams.[22]

The Greenplum Data Computing Appliance (DCA) is a physical computer appliance to integrate structured data, unstructured data, and partner applications such as business intelligence.[23] A special version of DCA integrated with SAS software was released in April 2011.[24]

Greenplum Command Center software displays interactive dashboards to collect performance metrics and manage system health for Greenplum products. Monitored data is also stored for historical reporting.[25] Greenplum Analytics Lab was a data science consultation service, renamed Pivotal Data Labs in 2013.[26]

Greenplum Database was supported for production use on SUSE Linux Enterprise Server 10.2 (64-bit), Red Hat Enterprise Linux 5.x (64-bit), CentOS Linux 5.x (64-bit) and Sun Solaris 10U5+ (64-bit). Greenplum Database was supported on server hardware from a range of vendors including HP, Dell, Sun and IBM.[13] Greenplum Database was supported for non-production (development and evaluation) use on Mac OS X 10.5, Red Hat Enterprise Linux 5.2 or higher (32-bit) and CentOS Linux 5.2 or higher (32-bit).[27]

Greenplum had customers in vertical markets including the financial services, telecommunications, Internet, retail, transportation and pharmaceutical industries.[28] They included Silver Spring Networks, Zions Bancorporation, Reliance Communications, NYSE Euronext, Orbitz, Havas Digital, China Unicom, and Tagged.[29]

Greenplum provided a community edition of its database, as well as community forums.[30]

Greenplum DB has a limitation on indexing: a unique index and a primary key index cannot be used at the same time on a table.[31]

Partnerships included Impetus Technologies,[32] Cisco, Brocade Communications Systems, SAS (software), Factual, MicroStrategy, and Informatica.[33]

Competitors include Oracle Exadata, Teradata, Microsoft SQL Server Parallel Data Warehouse, Aster Data Systems, IBM Netezza, SAP, and Vertica.[34]

References

  1. ^ "Paul Maritz To Lead New Group At EMC That Merges Greenplum With VMware’s Cloud Foundry, SpringSource, And Gemstome". Tech Crunch. Retrieved 4 October 2013. 
  2. ^ "EMC to Hadoop competition: “See ya, wouldn’t wanna be ya.”". Gigaom. Retrieved 4 October 2013. 
  3. ^ a b "EMC to Acquire Greenplum". Press release. July 6, 2010. Retrieved 2012-07-05. 
  4. ^ a b Cromwell Schubarth (December 4, 2012). "Paul Maritz to run EMC's new Greenplum, Pivotal Labs mashup". Silicon Valley Business Journal. Retrieved November 18, 2013. 
  5. ^ "Management Team". Old company web site. Archived from the original on August 13, 2008. Retrieved November 18, 2013. 
  6. ^ Maureen O'Gara (September 26, 2003). "Metapa Buys Didera". Linux Business News. Retrieved November 18, 2013. 
  7. ^ "Metapa Acquires Didera and Closes Additional Funding; Industry Pioneers in High-Performance Computing Combine to Create Breakthrough Linux Database Clustering Solution for Decision Support". Press release. September 23, 2003. Retrieved November 18, 2013. 
  8. ^ "Greenplum Unveils the Bizgres Project". Press release. April 18, 2005. Archived from the original on November 3, 2005. Retrieved November 18, 2013. 
  9. ^ a b "About Greenplum: History". Company website. Archived from the original on October 25, 2012. Retrieved November 18, 2013. 
  10. ^ Barb Darrow (March 13, 2013). "The Pivotal Initiative, in case you were wondering, is now official". GigaOm VMware blog. Retrieved November 18, 2013. 
  11. ^ Gonsalves, Antone (February 22, 2008). "Greenplum Updates Open-Source Based Database". 
  12. ^ a b "Understanding Greenplum". Retrieved August 7, 2012. 
  13. ^ a b "Greenplum Database Release 4.2.1.0 Administrator's Guide". February 17, 2012. Retrieved November 18, 2013. 
  14. ^ "gNet Software Interconnect". Company web page. Archived from the original on April 15, 2012. Retrieved November 18, 2013. 
  15. ^ "Multi-Level Fault Tolerance". Retrieved August 7, 2012. 
  16. ^ Dana Gardner (March 18, 2009). "Greenplum aims to eliminate massive data load 'choke points' with Scatter/Gather technology". Briefings Direct blog. ZDNet. Retrieved November 18, 2013. 
  17. ^ "Greenplum Brings MapReduce to the Enterprise". Press release. August 25, 2008. Retrieved November 18, 2013. 
  18. ^ Dana Gardner (September 29, 2008). "Greenplum pushes envelope with MapReduce and parallelism enhancements to its extreme-scale data offering". Briefings Direct blog. Retrieved November 18, 2013. 
  19. ^ "Greenplum is going hybrid columnar as well". DBMS2. October 14, 2009. Retrieved November 18, 2013. 
  20. ^ "Greenplum Adds Column-Oriented Table Feature to Greenplum Database". Press release. October 14, 2009. Retrieved November 18, 2013. 
  21. ^ "EMC Marries Isilon with Greenplum Hadoop Distribution". Retrieved August 7, 2012. 
  22. ^ "EMC Marries Social Networking And Big Data". Retrieved August 7, 2012. 
  23. ^ Timothy Prickett Morgan (September 21, 2011). "Greenplum appliances swing both ways: Spinning up data warehouses and Hadoop". The Register. Retrieved November 18, 2013. 
  24. ^ Larry Dignan (April 5, 2011). "EMC Greenplum inks SAS partnership, launches new appliances". Between the lines blog. Retrieved November 18, 2013. 
  25. ^ Timothy Prickett Morgan (March 1, 2012). "EMC cranks Greenplum database to 4.2: Goosing Hadoop links and warehouse backups". The Register. Retrieved November 18, 2013. 
  26. ^ "Pivotal Data Labs: Becoming a Predictive Enterprise". Company web site. Retrieved November 18, 2013. 
  27. ^ "Greenplum Database: Community Edition". Retrieved August 7, 2012. 
  28. ^ "ebay's two enormous data warehouses". Retrieved August 7, 2012. 
  29. ^ "Our Customers". Retrieved August 7, 2012. 
  30. ^ "Greenplum Communities". 
  31. ^ "Greenplum db - may not be as ready as you think". 
  32. ^ "Technology Partners". 
  33. ^ "Greenplum Partners". Retrieved October 29, 2010. 
  34. ^ "Magic Quadrant for Data Warehouse Database Management Systems". February 6, 2012. Archived from the original on February 25, 2012. Retrieved November 18, 2013.