Graph Query Language

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by A James Green (talk | contribs) at 20:59, 3 November 2019 (More refs on PGQL, include Oracle Big Data Spatial and Graph). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

ISO/IEC JTC 1 International Standard project

In September 2019 a proposal for a GQL standard project (39075 GQL)[1] was approved by a vote of the national standards bodies that are members of ISO/IEC Joint Technical Committee 1[REF JTC 1] (ISO/IEC JTC 1, responsible for international Information Technology standards). GQL is intended to be a declarative query language, like SQL.

The GQL project proposal states:

Using graph as a fundamental representation for data modeling is an emerging approach in data management. In this approach, the data set is modeled as a graph, representing each data entity as a vertex (also called a node) of the graph and each relationship between two entities as an edge between corresponding vertices. The graph data model has been drawing attention for its unique advantages. Firstly, the graph model can be a natural fit for data sets that have hierarchical, complex, or even arbitrary structures. Such structures can be easily encoded into the graph model as edges. This can be more convenient than the relational model, which requires the normalization of the data set into a set of tables with fixed row types. Secondly, the graph model enables efficient execution of expensive queries or data analytic functions that need to observe multi-hop relationships among data entities, such as reachability queries, shortest or cheapest path queries, or centrality analysis. There are two graph models in current use: the Resource Description Framework (RDF) model and the Property Graph model. The RDF model has been standardized by W3C in a number of specifications. The Property Graph model, on the other hand, has a multitude of implementations in graph databases, graph algorithms, and graph processing facilities. However, a common, standardized query language for property graphs (like SQL for relational database systems) is missing. GQL is proposed to fill this void.[2]
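The property graph model described in the proposal can be illustrated with a minimal Python sketch. It is not part of the GQL proposal: the class `PropertyGraph`, its methods, and the sample data are hypothetical names chosen for illustration. It shows entities stored as labelled nodes, relationships stored as labelled edges, and a multi-hop reachability query answered by breadth-first search.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Toy in-memory property graph (hypothetical, for illustration only)."""
    nodes: dict = field(default_factory=dict)  # node id -> (label, properties)
    edges: list = field(default_factory=list)  # (source id, label, target id, properties)

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = (label, props)

    def add_edge(self, src, label, dst, **props):
        self.edges.append((src, label, dst, props))

    def neighbours(self, node_id):
        """Ids reachable from node_id by following one outgoing edge."""
        return [dst for src, _, dst, _ in self.edges if src == node_id]

    def hops(self, start, goal):
        """Fewest edges on a directed path from start to goal, or None."""
        seen, queue = {start}, deque([(start, 0)])
        while queue:
            current, dist = queue.popleft()
            if current == goal:
                return dist
            for nxt in self.neighbours(current):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
        return None


# Sample data (assumed): two people and a city, linked by two relationships.
g = PropertyGraph()
g.add_node("a", "Person", name="Ann")
g.add_node("b", "Person", name="Bob")
g.add_node("c", "City", name="Arusha")
g.add_edge("a", "KNOWS", "b", since=2017)
g.add_edge("b", "LIVES_IN", "c")

print(g.hops("a", "c"))  # number of hops from Ann to the city
```

A declarative language such as the proposed GQL would let a user state the pattern to be found and leave the traversal strategy to the database engine, rather than spelling out the search as this sketch does.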

Managed alongside SQL by JTC 1/SC32 Working Group 3 (WG3)

The GQL project has a four-year timespan. Seven national standards bodies (those of the United States, China, Korea, the Netherlands, the United Kingdom, Denmark and Sweden) have nominated national subject-matter experts to work on the project, which is conducted by Working Group 3 (Database Languages) of ISO/IEC JTC 1's Subcommittee 32 (Data Management and Interchange), usually abbreviated as ISO/IEC JTC 1/SC 32 WG3, or just "WG3" for short. WG3 (and direct predecessor committees within JTC 1) has been responsible for the SQL standard since 1987.[3]

A working proposal for the scope and features of GQL [REF NEO4J TIGERGRAPH] was put forward by expert contributors from Sweden, the United Kingdom and the United States in the query languages standards teams at Neo4j Inc. and TigerGraph Inc. in September 2019, following the start of the project. This proposal, along with supplementary material detailing transaction demarcation semantics and syntax and an approach to transaction isolation that would allow implementers to include isolation levels additional to those in the SQL standard, was agreed at the WG3 meeting that took place in the same month in Arusha, Tanzania.

Extending existing graph query languages

The GQL project draws on multiple inputs, notably existing industrial languages and a new section of the SQL standard. In preparatory discussions within WG3, surveys of the history[4] and comparative content of some of these inputs[5] were presented. GQL will be a declarative language with its own distinct syntax, playing a similar role to SQL in the building of a database application. Other graph query languages have been defined which offer direct procedural features such as branching and looping[REF TINKERPOP, GREMLIN], [REF GSQL], and the ability to traverse a graph iteratively[REF TINKERPOP, GREMLIN], but GQL will not incorporate such features.[6][REF MARCELO] However, GQL is envisaged as a specific case of a more general class of graph languages, which will share a graph type system and a calling interface for procedures that process graphs.

SQL/PGQ Property Graph Query

Prior work by WG3 and SC32 mirror bodies, particularly in INCITS DM32, has helped to define a new planned Part 16 of the SQL Standard, which allows a read-only graph query to be called inside a SQL SELECT statement, matching a graph pattern using syntax which is very close to Cypher, PGQL and G-CORE, and returning a table of data values as the result. SQL/PGQ also contains DDL to allow SQL tables to be mapped to a graph view schema object, with nodes and edges associated with sets of labels and sets of data properties.[7][8][9]
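The two ideas in this paragraph, mapping tables to a graph view and returning a table from a read-only pattern match, can be sketched in Python. This is an emulation under stated assumptions, not SQL/PGQ syntax: the tables `person` and `knows`, the labels `Person` and `KNOWS`, and the helper `match_one_hop` are all hypothetical.

```python
# Two relational tables, written as lists of rows (hypothetical sample data).
person = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Cy"},
]
knows = [
    {"src": 1, "dst": 2, "since": 2017},
    {"src": 2, "dst": 3, "since": 2019},
]

# "Graph view": each person row becomes a node labelled Person and each
# knows row becomes an edge labelled KNOWS; remaining columns become properties.
nodes = {
    row["id"]: {"labels": {"Person"}, "props": {"name": row["name"]}}
    for row in person
}
edges = [
    {"src": r["src"], "dst": r["dst"], "labels": {"KNOWS"}, "props": {"since": r["since"]}}
    for r in knows
]


def match_one_hop(nodes, edges, src_label, edge_label, dst_label):
    """Read-only match of the pattern (a:src_label)-[e:edge_label]->(b:dst_label)."""
    for e in edges:
        a, b = nodes[e["src"]], nodes[e["dst"]]
        if (src_label in a["labels"]
                and edge_label in e["labels"]
                and dst_label in b["labels"]):
            yield a, e, b


# Project each match into a row of plain data values, giving a table as the result.
result = [
    (a["props"]["name"], b["props"]["name"], e["props"]["since"])
    for a, e, b in match_one_hop(nodes, edges, "Person", "KNOWS", "Person")
]
print(result)
```

The underlying tables are never modified: the graph view is derived from them, and the match only reads it, which mirrors the read-only nature of the query described above.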

Cypher 9

Cypher[10] is a language originally designed and implemented by Neo4j Inc., but since 2015 made available as an open source language description[REF CYPHER 9], with grammar tooling, a JVM front-end that parses Cypher queries, and a Technology Compatibility Kit (TCK) of over 2000 test scenarios, using Cucumber for implementation language portability. The TCK reflects the language description and an enhancement for temporal datatypes and functions documented in a Cypher Improvement Proposal[11].

Cypher allows reading, creation and updating of graph elements. The current version (including the temporal extension) is referred to as Cypher 9. Prior to the GQL project it was planned to create a new version, Cypher 10 [REF HEADING BELOW], that would incorporate features like schema and composable graph queries and views. The first designs for Cypher 10, including graph construction and projection, were implemented in the Cypher for Apache Spark project starting in 2016.[12] These features are part of the proposed scope of GQL [REF TIGERGRAPH/NEO4j DOCUMENT].

Cypher is implemented in Neo4j's database, by RedisGraph, by Cambridge Semantics' AnzoGraph, by Bitnine's AgensGraph, by Memgraph, and in the open source projects Cypher for Gremlin[13], maintained by Neueda Labs in Riga, and Cypher for Apache Spark (now renamed to Morpheus)[12][14][15], as well as in research projects such as Cypher.PL and Ingraph. Cypher as a language is governed as the openCypher project [REF WEBSITE, REF GITHUB ORG] by an informal community which has held five face-to-face openCypher Implementers' Meetings since February 2017 [REF WEBSITE EVENTS].

PGQL

PGQL[16] is a language designed and implemented by Oracle Inc., but made available as an open source specification[17], along with JVM parsing software[18]. PGQL combines familiar SQL SELECT syntax, including SQL expressions and result ordering and aggregation, with a pattern matching language very similar to that of Cypher. It allows the specification of the graph to be queried, and includes a facility for macros to capture "pattern views". It does not support insertion or updating operations, having been designed primarily for an analytics environment, such as Oracle's PGX product. PGQL has also been implemented in Oracle Big Data Spatial and Graph, and in a research project, PGX.D/Async[19].

G-CORE

G-CORE [REF ARXIV]

GSQL

GSQL [REF TG 2.5 DOCS]

Cypher 10 extensions in Cypher for Apache Spark

The openCypher Morpheus project[12] implements Cypher for Apache Spark users.

References

  1. ^ "ISO/IEC WD 39075 Information Technology — Database Languages — GQL". ISO. Retrieved September 29, 2019.
  2. ^ "ISO/IEC JTC 1/SC 32 N 3007 - ISO/IEC NP 39075 Information Technology -- Database Languages -- GQL". British Standards Institute. Retrieved September 29, 2019.
  3. ^ "JTC 1/SC 32 Data Management and Interchange". ISO/IEC JTC1. Retrieved October 6, 2019.
  4. ^ Lindaaker, Tobias (May 2018). "An overview of the recent history of Graph Query Languages" (PDF). opencypher.org. Retrieved October 6, 2019.
  5. ^ Plantikow, Stefan (May 2018). "Summary Chart of Cypher, PGQL, and G-Core" (PDF). opencypher.org. Retrieved November 3, 2019.
  6. ^ Wood, Peter T. "Query languages for graph databases. SIGMOD Rec. 41, 1 (April 2012), 50-60. DOI: 10.1145/2206869.2206879". ACM. Retrieved October 25, 2019.
  7. ^ "ISO/IEC WD 9075-16 Information technology — Database languages SQL — Part 16: SQL Property Graph Queries (SQL/PGQ)". ISO. Retrieved October 6, 2019.
  8. ^ Hare, Keith; et al. (March 2019). "SQL and GQL, W3C Workshop on Web Standardization for Graph Data. Creating Bridges: RDF, Property Graph and SQL" (PDF). W3C. Retrieved October 6, 2019.
  9. ^ Trigonakis, Vasileios (July 2019). "Property graph extensions for the SQL standard. LDBC 12th TUC" (PDF). LDBC. Retrieved October 6, 2019.
  10. ^ Francis, Nadime; et al. "Cypher: An Evolving Query Language for Property Graphs. In Proceedings of the 2018 International Conference on Management of Data (SIGMOD '18). ACM, New York, NY, USA, 1433-1445. DOI: 10.1145/3183713.3190657". ACM. Retrieved October 25, 2019.
  11. ^ "CIP2015-08-06 - Date and Time". opencypher.org. Retrieved October 25, 2019.
  12. ^ a b c Rydberg, Mats; et al. (July 2016). "Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark". openCypher. Retrieved November 3, 2019.
  13. ^ Novikov, Dmitry; et al. (January 2018). "Cypher for Gremlin adds Cypher support to any Gremlin graph database". openCypher. Retrieved November 3, 2019.
  14. ^ Green, Alastair; Junghanns, Martin (April 2019). "Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apache Spark". Databricks Inc. Retrieved November 3, 2019.
  15. ^ "Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apache Spark (continues)".
  16. ^ van Rest, Oskar; et al. (June 2016). "PGQL: a property graph query language. In Proceedings of the Fourth International Workshop on Graph Data Management Experiences and Systems (GRADES '16). ACM, New York, NY, USA, Article 7, 6 pages. DOI: 10.1145/2960414.2960421". pgql.org. Retrieved October 25, 2019.
  17. ^ "PGQL". pgql.org. Retrieved October 6, 2019.
  18. ^ van Rest, Oskar; et al. (September 2015). "PGQL is an SQL-based query language for the Property Graph data model". pgql.org. Retrieved November 3, 2019.
  19. ^ Roth, Nicholas P.; et al. "2017. PGX.D/Async: A Scalable Distributed Graph Pattern Matching Engine. In Proceedings of the Fifth International Workshop on Graph Data-management Experiences & Systems (GRADES'17). ACM, New York, NY, USA, Article 7, 6 pages. DOI: 10.1145/3078447.3078454". ACM. Retrieved October 29, 2019.