Graph database

A graph database uses graph structures with nodes, edges, and properties to represent and store data. By definition, a graph database is any storage system that provides index-free adjacency: every element contains a direct pointer to its adjacent elements, so no index lookups are necessary to move between them. General graph databases that can store any graph are distinct from specialized graph databases such as triplestores and network databases.
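
Index-free adjacency can be illustrated with a minimal in-memory sketch. The Node class, its adjacent list, and the connect method are hypothetical names chosen for illustration and are not taken from any particular graph database; a real system stores such references on disk rather than as Python objects.

```python
class Node:
    """A graph element that holds direct references to its neighbours."""

    def __init__(self, name):
        self.name = name
        self.adjacent = []  # direct references to neighbouring Node objects

    def connect(self, other):
        # Store the reference on both ends so either node can reach the other.
        self.adjacent.append(other)
        other.adjacent.append(self)


alice = Node("alice")
bob = Node("bob")
carol = Node("carol")
alice.connect(bob)
bob.connect(carol)

# Reaching the neighbours of a node follows stored references;
# no global index or lookup table is consulted.
neighbour_names = [n.name for n in bob.adjacent]
print(neighbour_names)
```

The cost of finding a node's neighbours in this sketch depends only on how many neighbours it has, not on the total size of the graph.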

Structure

Graph databases are based on graph theory and employ nodes, properties, and edges. Nodes are similar in nature to the objects familiar to object-oriented programmers.

Nodes represent entities such as people, businesses, accounts, or any other item you might want to keep track of.

Properties are pertinent information that relate to nodes. For instance, if "Wikipedia" were one of the nodes, one might have it tied to properties such as "website", "reference material", or "word that starts with the letter 'w'", depending on which aspects of "Wikipedia" are pertinent to the particular database.

Edges are the lines that connect nodes to nodes or nodes to properties, and they represent the relationship between the two. Most of the important information is stored in the edges: meaningful patterns emerge when one examines the connections and interconnections of nodes, properties, and edges.
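
The three building blocks can be sketched as plain data structures. This is one possible modelling, assuming properties are kept as key/value pairs attached to nodes and edges; the class names, field names, and the "Reader" node with its "reads" relationship are illustrative assumptions, not part of any specific product.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: Node
    target: Node
    relationship: str
    properties: dict = field(default_factory=dict)


# A node for "Wikipedia" carrying properties that are pertinent to this database.
wikipedia = Node("Wikipedia", {"kind": "website", "use": "reference material"})
reader = Node("Reader", {"kind": "person"})

# The edge records how the two nodes relate to each other.
edges = [Edge(reader, wikipedia, "reads", {"frequency": "daily"})]


def relationships_of(node, edges):
    """List (relationship, target label) pairs for edges leaving the node."""
    return [(e.relationship, e.target.label) for e in edges if e.source is node]


reader_relationships = relationships_of(reader, edges)
print(reader_relationships)
```

Scanning a flat edge list keeps the sketch short; a store with index-free adjacency would instead keep each node's edges reachable from the node itself.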

Properties

Compared with relational databases, graph databases are often faster for associative data sets and map more directly to the structure of object-oriented applications. They can scale more naturally to large data sets because they do not typically require expensive join operations. Because they depend less on a rigid schema, they are better suited to managing ad hoc and changing data with evolving schemas. Conversely, relational databases are typically faster at performing the same operation on large numbers of data elements.

Graph databases are a powerful tool for graph-like queries, such as computing the shortest path between two nodes in the graph. Other graph-like queries, such as computing a graph's diameter or detecting communities, can also be expressed over a graph database in a natural way.
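
As an illustration of a shortest-path query, the following sketch runs a breadth-first search over an unweighted graph held as an adjacency dictionary. The function name shortest_path and the sample graph are assumptions made for this example; graph database products expose such queries through their own APIs or query languages, and weighted graphs would need a different algorithm such as Dijkstra's.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Return one shortest path from start to goal as a list, or None."""
    previous = {start: None}  # maps each visited node to its predecessor
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the predecessor links back to the start.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in adjacency.get(current, []):
            if neighbour not in previous:
                previous[neighbour] = current
                queue.append(neighbour)
    return None  # goal is not reachable from start


graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
path = shortest_path(graph, "a", "e")
print(path)
```

Breadth-first search visits nodes in order of their distance from the start, so the first time the goal is dequeued the recorded path uses the fewest possible edges.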

Graph database projects

The following is a list of several well-known graph database projects:[1]

  • AllegroGraph - a scalable, high-performance RDF and graph database.
  • Bigdata - a highly scalable RDF/graph database capable of 10B+ edges on a single node or clustered deployment for very high throughput.
  • CloudGraph - a disk- and memory-based, fully transactional .NET graph database that uses graphs and key/value pairs to store data.
  • Cytoscape - an open-source platform that originated in bioinformatics
  • DEX[2] - a high-performance graph database from Sparsity Technologies, a technology transition company from DAMA-UPC
  • Filament - graph persistence framework and associated toolkits based on a navigational query style.
  • GraphBase - a customizable, distributed, small-footprint, high-performance graph store with a rich tool set from FactNexus
  • Graphd - the proprietary backend of Freebase
  • Horton - a graph database from Microsoft Research Extreme Computing Group (XCG) based on the cloud programming infrastructure Orleans
  • HyperGraphDB - an open-source (LGPL) graph database supporting generalized hypergraphs where edges can point to other edges
  • InfiniteGraph - a highly scalable, distributed and cloud-enabled commercial product with flexible licensing for startups.
  • InfoGrid - an open-source / commercial (AGPLv3, free for small entities)[3] graph database with web front end and configurable storage engines (MySQL, PostgreSQL, Files, Hadoop)
  • Neo4j - an open-source / commercial (GPLv3 community edition, AGPLv3 advanced and enterprise edition)[4] graph database
  • OrientDB - a high-performance open source document-graph database
  • OQGRAPH - Graph computation engine (GPLv2 licensed) for MySQL, MariaDB and Drizzle
  • sones GraphDB - an open-source / commercial (AGPLv3)[5] graph database and universal access layer (funded by Deutsche Telekom AG)
  • VertexDB - a high-performance graph database server that supports automatic garbage collection.
  • Virtuoso Universal Server - a clustered, high-performance, and scalable RDF graph database server
  • R2DF - a framework for ranked path queries over weighted RDF graphs

Distributed Graph Processing (mostly in-memory-only)

  • Angrapa - graph package in Hama, a bulk synchronous parallel (BSP) platform
  • FlockDB - an open-source, distributed, fault-tolerant graph database based on MySQL and the Gizzard framework for managing Twitter-like graph data (single-hop relationships) at web scale.
  • Apache Hama - a graph processing framework that runs on top of Apache Hadoop.
  • Giraph - a graph processing infrastructure that runs on Hadoop (see Pregel).
  • GoldenOrb - Pregel implementation built on top of Apache Hadoop
  • Phoebus - Pregel implementation written in Erlang
  • Pregel - Google's internal graph processing platform, details of which were released in an ACM paper.
  • Trinity - a distributed in-memory graph engine under development at Microsoft Research Labs.

APIs and Graph Query/Programming Languages

  • Blueprints - a Java API for Property Graphs from TinkerPop and supported by a few graph database vendors.
  • Blueprints.NET - a C#/.NET API for generic Property Graphs.
  • Cypher - a Property Graph Query Language developed by Neo4j.
  • Gremlin - an open-source graph programming language that works over various graph database systems.
  • Pacer - a Ruby dialect/implementation of the Gremlin graph traversal language.
  • Pipes - a lazy dataflow framework written in Java that forms the foundation for various property graph traversal languages.
  • Styx - (previously named Pipes.Net) a data flow framework for C#/.NET for processing generic graphs and Property Graphs.
  • PYBlueprints - a Python API for Property Graphs.
  • Rexster - an HTTP/REST API for accessing remote graph databases, supported by a few graph database vendors.
