Jaql

Jaql
Paradigm	Functional
Designed by	Vuk Ercegovac (Google)
First appeared	October 9, 2008; 16 years ago
Stable release	0.5.1 / July 12, 2010; 14 years ago
Implementation language	Java
OS	Cross-platform
License	Apache License 2.0
Website	code.google.com/p/jaql/m
Major implementations
	IBM BigInsights

Jaql (pronounced "jackal") is a functional data processing and query language most commonly used for JSON query processing on big data.

It began as an open source project at Google,^[1] and its latest release dates from 2010-07-12. IBM^[2] adopted it as the primary data processing language for its Hadoop software package, BigInsights.

Although it was developed for JSON, it supports a variety of other data sources, such as CSV, TSV, and XML.

A comparison^[3] with other big data query languages, such as Pig Latin and HiveQL, illustrates the performance and usability aspects of these technologies.

Jaql supports^[4] lazy evaluation, so expressions are only materialized when needed.
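
The general idea of lazy evaluation can be illustrated outside Jaql. The following is a minimal sketch in Python, not Jaql code: the generator read_records and its fields are hypothetical, and Python generators are used only as an analogy for expressions that are materialized on demand.

```python
from itertools import islice

def read_records():
    """Hypothetical source that yields records one at a time."""
    for i in range(1_000_000):
        yield {"id": i, "income": i * 10}

# Building the pipeline computes nothing yet, because generators are lazy.
high = (r for r in read_records() if r["income"] > 50)
ids = (r["id"] for r in high)

# Records are pulled through the pipeline only now, and only as many as needed.
first_three = list(islice(ids, 3))
print(first_three)  # [6, 7, 8]
```

Although the source could yield a million records, only the first nine are ever produced, because the consumer stops after three matches.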

Syntax

The basic concept of Jaql is

source -> operator(parameter) -> sink ;

where a sink can be a source for a downstream operator. A Jaql program therefore typically has the following structure, expressing a data processing graph:

source -> operator1(parameter) -> operator2(parameter) -> operator2(parameter) -> operator3(parameter) -> operator4(parameter) -> sink ;

For readability, Jaql programs are most commonly broken across lines before each arrow, which is also a common idiom in Twitter Scalding:

source -> operator1(parameter)
-> operator2(parameter)
-> operator2(parameter)
-> operator3(parameter)
-> operator4(parameter)
-> sink ;
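
The chaining of operators can be modelled as plain function composition. The sketch below is written in Python rather than Jaql; the helper pipe and the operators keep_large and times_ten are hypothetical names introduced only for illustration.

```python
from functools import reduce

def pipe(source, *operators):
    """Feed source through each operator in turn, like source -> op1 -> op2."""
    return reduce(lambda data, op: op(data), operators, source)

def keep_large(xs):
    """Operator: keep the elements greater than 1."""
    return [x for x in xs if x > 1]

def times_ten(xs):
    """Operator: multiply every element by 10."""
    return [x * 10 for x in xs]

# The output of each operator is the input of the next one.
result = pipe([3, 1, 2], keep_large, sorted, times_ten)
print(result)  # [20, 30]
```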

Core operators

Source:^[5]

Expand

Use the EXPAND expression to flatten nested arrays. This expression takes as input an array of nested arrays [ [ T ] ] and produces an output array [ T ], by promoting the elements of each nested array to the top-level output array.
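
The flattening behaviour described above can be sketched in Python (this models the semantics only and is not Jaql syntax; the input values are made up for illustration):

```python
from itertools import chain

# An array of nested arrays, [[T]].
nested = [[1, 2], [3], [], [4, 5]]

# Promote the elements of each nested array to the top level, giving [T].
flat = list(chain.from_iterable(nested))
print(flat)  # [1, 2, 3, 4, 5]
```

Only one level of nesting is removed, and empty nested arrays contribute nothing to the output.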

Filter

Use the FILTER operator to remove elements from the specified input array. This operator takes as input an array of elements of type T and outputs an array of the same type, retaining those elements for which a predicate evaluates to true. It is the Jaql equivalent of the SQL WHERE clause. Example:

data = [
  {name: "Jon Doe", income: 20000, manager: false},
  {name: "Vince Wayne", income: 32500, manager: false},
  {name: "Jane Dean", income: 72000, manager: true},
  {name: "Alex Smith", income: 25000, manager: false}
];

data -> filter $.manager;

[
  {
    "income": 72000,
    "manager": true,
    "name": "Jane Dean"
  }
]

data -> filter $.income < 30000;

[
  {
    "income": 20000,
    "manager": false,
    "name": "Jon Doe"
  },
  {
    "income": 25000,
    "manager": false,
    "name": "Alex Smith"
  }
]

Group

Use the GROUP expression to group one or more input arrays on a grouping key and apply an aggregate function per group.
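
As an illustration of grouping with a per-group aggregate, the following Python sketch (not Jaql syntax) reuses the records from the Filter example, groups them on the manager field, and sums income per group. The output field name total is an assumption chosen for this example.

```python
from collections import defaultdict

data = [
    {"name": "Jon Doe", "income": 20000, "manager": False},
    {"name": "Vince Wayne", "income": 32500, "manager": False},
    {"name": "Jane Dean", "income": 72000, "manager": True},
    {"name": "Alex Smith", "income": 25000, "manager": False},
]

# Collect the records that share the same grouping key.
groups = defaultdict(list)
for rec in data:
    groups[rec["manager"]].append(rec)

# Apply one aggregate function (sum of income) to each group.
result = [
    {"manager": key, "total": sum(r["income"] for r in members)}
    for key, members in groups.items()
]
print(result)
# [{'manager': False, 'total': 77500}, {'manager': True, 'total': 72000}]
```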

Join

Use the JOIN operator to express a join between two or more input arrays. This operator supports multiple types of joins, including natural, left-outer, right-outer, and outer joins.
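
The difference between an inner join and a left-outer join can be sketched in Python (a model of the semantics, not Jaql syntax). The employees and departments arrays and their fields are hypothetical, and the lookup table assumes that department ids are unique.

```python
employees = [
    {"name": "Jon Doe", "dept": 1},
    {"name": "Jane Dean", "dept": 2},
    {"name": "Alex Smith", "dept": 3},  # no matching department
]
departments = [
    {"id": 1, "title": "Sales"},
    {"id": 2, "title": "Engineering"},
]

# Index the right-hand input by its join key (assumes unique ids).
by_id = {d["id"]: d for d in departments}

# Inner join: keep only employees that have a matching department.
inner = [
    {"name": e["name"], "dept": by_id[e["dept"]]["title"]}
    for e in employees
    if e["dept"] in by_id
]

# Left-outer join: keep every employee; unmatched ones get None.
left_outer = [
    {"name": e["name"], "dept": by_id.get(e["dept"], {}).get("title")}
    for e in employees
]

print(len(inner), len(left_outer))  # 2 3
```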

Sort

Use the SORT operator to sort an input by one or more fields.
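
Sorting by one field and by several fields can be sketched in Python (not Jaql syntax), again using the records from the Filter example:

```python
data = [
    {"name": "Jon Doe", "income": 20000, "manager": False},
    {"name": "Vince Wayne", "income": 32500, "manager": False},
    {"name": "Jane Dean", "income": 72000, "manager": True},
    {"name": "Alex Smith", "income": 25000, "manager": False},
]

# Sort by a single field, ascending.
by_income = sorted(data, key=lambda r: r["income"])

# Sort by two fields: non-managers first, then income descending.
by_manager_then_income = sorted(data, key=lambda r: (r["manager"], -r["income"]))

print([r["name"] for r in by_income])
# ['Jon Doe', 'Alex Smith', 'Vince Wayne', 'Jane Dean']
print([r["name"] for r in by_manager_then_income])
# ['Vince Wayne', 'Alex Smith', 'Jon Doe', 'Jane Dean']
```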

Top

The TOP expression selects the first k elements of its input. If a comparator is provided, the output is semantically equivalent to sorting the input, then selecting the first k elements.
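
The equivalence stated above can be checked with a short Python sketch (a model of the semantics, not Jaql syntax). The input values and k are arbitrary, and the default ascending order stands in for the comparator.

```python
import heapq
from itertools import islice

incomes = [20000, 32500, 72000, 25000]
k = 2

# Without a comparator: simply the first k elements of the input.
first_k = list(islice(incomes, k))
print(first_k)  # [20000, 32500]

# With a comparator (here ascending order): the k smallest elements.
top_k = heapq.nsmallest(k, incomes)
print(top_k)  # [20000, 25000]

# Same result as sorting the whole input and taking the first k.
assert top_k == sorted(incomes)[:k]
```

A heap-based selection such as heapq.nsmallest avoids sorting the whole input, which is one reason a dedicated top operation can be cheaper than a sort followed by a truncation.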

Transform

Use the TRANSFORM operator to realize a projection or to apply a function to all items of an input array.
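
Both uses, projection and applying a function to every item, can be sketched in Python (not Jaql syntax) with the records from the Filter example. The derived field income_k is a hypothetical name introduced for this illustration.

```python
data = [
    {"name": "Jon Doe", "income": 20000, "manager": False},
    {"name": "Vince Wayne", "income": 32500, "manager": False},
    {"name": "Jane Dean", "income": 72000, "manager": True},
    {"name": "Alex Smith", "income": 25000, "manager": False},
]

# Projection: keep only the name field of each record.
projected = [{"name": r["name"]} for r in data]

# Function application: derive a new field from each record
# (income in whole thousands, using integer division).
derived = [{"name": r["name"], "income_k": r["income"] // 1000} for r in data]

print(projected[0])  # {'name': 'Jon Doe'}
print(derived[1])    # {'name': 'Vince Wayne', 'income_k': 32}
```

The output has exactly one item per input item, which distinguishes this operation from filtering and expanding.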

References

[1] Original Jaql project
[2] Initial Publication
[3] Stewart, Robert J.; Trinder, Phil W.; Loidl, Hans-Wolfgang (2011). "Comparing High Level MapReduce Query Languages". Advanced Parallel Processing Technologies. Lecture Notes in Computer Science. Vol. 6965. pp. 58–72. doi:10.1007/978-3-642-24151-2_5. ISBN 978-3-642-24150-5.
[4] JAQL in Hadoop, a brief introduction
[5] IBM BigInsights Documentation
