Comparison of regular expression engines
From Wikipedia, the free encyclopedia
This is a comparison of regular expression engines.
Libraries
Languages
| Language | Official website | Software license | Remarks |
|---|---|---|---|
| .NET | MSDN | Proprietary | |
| C++11 (C++) | C++ standards website | ? | Since ISO/IEC 14882:2011 |
| D | D | Boost Software License[Note 1] | |
| Go | Golang.org | BSD-style | |
| Haskell | Haskell.org | BSD3 | Omitted in the language report, and in GHC's Hierarchical Libraries |
| Java | Java | GNU General Public License | REs are written as strings in source code: all backslashes must be doubled, harming readability |
| JavaScript (ECMAScript) | ECMA-262 | BSD3 | Limited but REs are first-class citizens of the language with a specific /.../mod syntax |
| Lua | Lua.org | MIT License | Uses simplified, limited dialect; can be bound to more powerful library, like PCRE or an alternative parser like LPeg |
| Mathematica | Wolfram | Proprietary | |
| Free Pascal (Object Pascal) | www.freepascal.org | LGPL with static linking exception | Free Pascal 2.6+ ships with TRegExpr from Sorokin and 2 other regular expression libraries; see http://wiki.lazarus.freepascal.org/Regexpr |
| Cocoa (Objective-C) | Apple | Proprietary | As of 2012, available only on iOS 4+ and OS X 10.7+ |
| OCaml | Caml | LGPL | |
| Perl | Perl.com | Artistic License, or GNU General Public License | Full, central part of the language |
| PHP | PHP.net | PHP License | Has two implementations; PCRE is the more capable in both speed and functionality |
| Python | python.org | Python Software Foundation License | |
| Ruby | ruby-doc.org | GNU Library General Public License | Ruby 1.8 and 1.9 use different engines; 1.9 integrates Oniguruma |
| SAP ABAP | SAP.com | Proprietary | |
| Tcl 8.4 | tcl.tk | Tcl/Tk License (BSD-style) | |
| ActionScript 3 | ActionScript Technology Center | Free | |
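The remark in the Java row about doubled backslashes applies to any language that writes patterns as ordinary string literals. The following is a minimal sketch in Python rather than Java, chosen only for illustration: it contrasts an ordinary literal with a raw string literal, which sidesteps the doubling. The variable names are hypothetical.

```python
import re

# Ordinary string literal: each regex backslash must itself be escaped.
doubled = "\\d+\\.\\d+"

# Raw string literal: the pattern is written exactly as the regex engine sees it.
raw = r"\d+\.\d+"

# Both literals denote the same pattern text.
same_pattern = doubled == raw

# The pattern matches a simple decimal number in full.
match = re.fullmatch(raw, "3.14")
```

Languages with a dedicated regex literal syntax, such as the /.../ form noted in the JavaScript row, avoid the problem in a similar way.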
Language features
NOTE: An application that uses a library for regular expression support does not necessarily offer the library's full feature set. For example, GNU Grep, which uses PCRE, does not offer lookahead support, though PCRE does.
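For readers unfamiliar with the lookaround features mentioned in this note, here is a small sketch using Python's standard `re` module. It is offered only as an illustration of the syntax and says nothing about how GNU Grep or PCRE behave. The sample strings are made up.

```python
import re

# Lookahead (?=...): match digits only when " USD" follows.
# The " USD" part is checked but not included in the match.
ahead = re.findall(r"\d+(?= USD)", "10 USD, 20 EUR, 30 USD")

# Lookbehind (?<=...): match digits only when a dollar sign precedes them.
# The "$" is checked but not included in the match.
behind = re.findall(r"(?<=\$)\d+", "cost $15 or 20")

print(ahead)   # ['10', '30']
print(behind)  # ['15']
```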
Part 1
| | "+" quantifier | Negated character classes | Non-greedy quantifiers[Note 1] | Shy groups[Note 2] | Recursion | Lookahead | Lookbehind | Backreferences[Note 3] | >9 indexable captures |
|---|---|---|---|---|---|---|---|---|---|
| Boost.Regex | Yes | Yes | Yes | Yes | Yes [Note 4] | Yes | Yes | Yes | Yes |
| Boost.Xpressive | Yes | Yes | Yes | Yes | Yes [Note 5] | Yes | Yes | Yes | Yes |
| CL-PPCRE | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| EmEditor | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | No |
| FREJ | No [Note 6] | No | Some [Note 6] | Yes | No | No | No | Yes | Yes |
| GLib/GRegex | Yes | ? | Yes | ? | No | ? | ? | ? | ? |
| GNU Grep | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | ? |
| Haskell | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| ICU Regex | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| Java | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| JavaScript (ECMAScript) | Yes | Yes | Yes | Yes | No | Yes | No | Yes | Yes |
| JGsoft | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| .NET | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| OCaml | Yes | Yes | No | No | No | No | No | Yes | No |
| OmniOutliner 3.6.2 | Yes | Yes | Yes | No | No | No | No | ? | ? |
| PCRE | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Perl | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| PHP | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Python | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| Qt/QRegExp | Yes | Yes | Yes | Yes | No | Yes | No | Yes | Yes |
| R [Note 7] | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| re2 | Yes | Yes | Yes | Yes | No | No | No | No | Yes |
| Ruby | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| TRE | Yes | Yes | Yes | Yes | No | No | No | Yes | No |
| Vim Patch 7.3.1004 (2013-05-22) | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | No |
| RGX | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| TRegExpr | Yes | ? | Yes | ? | ? | ? | ? | ? | ? |
| XRegExp | Yes | Yes | Yes | Yes | No | Yes | No | Yes | Yes |
- Note 1: Non-greedy quantifiers match as few characters as possible, instead of the default of as many as possible. Many older, pre-POSIX engines were non-greedy and had no greedy quantifiers at all.
- Note 2: Shy groups, also called non-capturing groups, cannot be referred to with backreferences; they are used to speed up matching where the group's content need not be accessed later.
- Note 3: Backreferences enable referring to previously matched groups in later parts of the regex and/or the replacement string (where applicable). For instance, ([ab]+)\1 matches the whole of "abab" but not the whole of "abaab".
- Note 4: http://www.boost.org/doc/libs/1_47_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html#boost_regex.syntax.perl_syntax.recursive_expressions
- Note 5: http://www.boost.org/doc/libs/1_47_0/doc/html/xpressive/user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches.embedding_a_regex_by_reference
- Note 6: FREJ has no repetitive quantifiers, but has an "optional" element which behaves similarly to a simple "?" quantifier.
- Note 7: Regular Expressions as used in R
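The notes above on non-greedy quantifiers, shy groups, and backreferences can be illustrated with a short sketch in Python's `re` module, one of the engines listed in the table. The input strings other than "abab" and "abaab" are invented for the example. The backreference test uses a whole-string match, because a substring search would find "aa" inside "abaab".

```python
import re

# Greedy vs. non-greedy: ".+" takes as much as possible, ".+?" as little.
greedy = re.search(r"<.+>", "<a><b>").group(0)
lazy = re.search(r"<.+?>", "<a><b>").group(0)

# Shy (non-capturing) group: (?:...) groups without creating a capture,
# so the only captured group here is (c).
shy = re.match(r"(?:ab)+(c)", "ababc")

# Backreference: \1 must repeat exactly what group 1 matched.
backref = re.compile(r"([ab]+)\1")

print(greedy)                                   # <a><b>
print(lazy)                                     # <a>
print(shy.groups())                             # ('c',)
print(backref.fullmatch("abab") is not None)    # True
print(backref.fullmatch("abaab") is not None)   # False
```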
Part 2
| | Directives [Note 1] | Conditionals | Atomic groups [Note 2] | Named capture [Note 3] | Comments | Embedded code | Unicode property support [Note 4] |
|---|---|---|---|---|---|---|---|
| Boost.Regex | Yes | Yes | Yes | Yes | Yes | No | Some [Note 4] |
| Boost.Xpressive | Yes | No | Yes | Yes | Yes | No | No |
| CL-PPCRE | Yes | Yes | Yes | Yes | Yes | Yes | No |
| EmEditor | Yes | Yes | ? | ? | Yes | No | ? |
| FREJ | No | No | Yes | Yes | Yes | No | ? |
| GLib/GRegex | Yes | Yes | Yes | Yes | Yes | No | Some [Note 4] |
| GNU Grep | Yes | Yes | ? | Yes | Yes | No | No |
| Haskell | ? | ? | ? | ? | ? | No | No |
| ICU Regex | Yes | No | Yes | No | Yes | No | Yes |
| Java | Yes | No | Yes | Yes [Note 5] | No | No | Some [Note 4] |
| JavaScript (ECMAScript) | No | No | No | No | No | No | No |
| JGsoft | Yes | Yes | Yes | Yes | Yes | No | Some [Note 4] |
| .NET | Yes | Yes | Yes | Yes | Yes | No | Some [Note 4] |
| OCaml | No | No | No | No | No | No | No |
| OmniOutliner 3.6.2 | ? | ? | ? | ? | No | No | ? |
| PCRE | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Perl | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| PHP | Yes | Yes | Yes | Yes | Yes | No | No |
| Python | Yes | Yes | No | Yes | Yes | No | No |
| Qt/QRegExp | No | No | No | No | No | No | No |
| re2 | Yes | No | ? | Yes | No | No | Some [Note 4] |
| Ruby | Yes | No | Yes | Yes | Yes | Yes | Some [Note 4] |
| TRE | Yes | No | No | No | Yes | No | ? |
| Vim | Yes | No | Yes | No | No | No | No |
| RGX | Yes | Yes | Yes | Yes | Yes | No | Yes |
| XRegExp | Leading only | No | No | Yes | Yes | No | Yes |
- Note 1: Also known as flags, modifiers, or option letters. Example pattern: "(?i:test)"
- Note 2: Also called independent sub-expressions.
- Note 3: Similar to backreferences, but with names instead of indices.
- Note 4: Unicode property support may be incomplete, since products are continuously updated. All implementations will be incomplete when a new Unicode revision is released, until they are updated to comply.
- Note 5: Available as of JDK7.
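Several of the features in this table can be sketched in Python's `re` module. Note that the syntax shown is Python's dialect, which is an assumption for other engines: for instance, Python spells named groups `(?P<name>...)`, and scoped directives such as `(?i:...)` require Python 3.6 or later. The sample inputs are invented.

```python
import re

# Scoped directive: (?i:...) makes only the enclosed part case-insensitive.
scoped = re.compile(r"(?i:test)ing")

# Named capture: groups are retrieved by name instead of by index.
date = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})", "2013-01")

# Inline comment: (?#...) is ignored by the engine.
commented = re.compile(r"\d+(?# one or more digits)px")

# Named backreference: (?P=word) must repeat the text of the group "word".
doubled = re.search(r"(?P<word>\w+) (?P=word)", "it is is here")

print(scoped.fullmatch("TESTing") is not None)   # True
print(scoped.fullmatch("TESTING") is not None)   # False
print(date.group("year"), date.group("month"))   # 2013 01
print(commented.fullmatch("12px") is not None)   # True
print(doubled.group("word"))                     # is
```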
API features
| | Native UTF-16 support [Note 1] | Native UTF-8 support [Note 1] | Multi-line matching |
|---|---|---|---|
| Boost.Regex | No | No | Yes |
| GLib/GRegex | Yes | Yes | Yes |
| ICU Regex | Yes | No | Yes |
| Java | No | Partial [Note 2] | Yes |
| .NET | No [Note 3] | No | Yes |
| PCRE | Yes [Note 4] | Yes | Yes |
| Qt/QRegExp | Yes | No | No |
| TRE | No | ? | Yes |
| RGX | No | No | Yes |
| XRegExp | Yes | ? | Yes |
- Note 1: Means the format can be used internally without explicit conversion.
- Note 2: Supports the Unicode 4.0 standard from 2003; the latest plans for JDK7 include Unicode 6.0 (2011) support. [1]
- Note 3: The implementation uses the original UCS-2 support and features, so it recognizes only 64K characters in total (versus UTF-16's 1,112,064 characters). A Microsoft developer representative answered a bug report on this with "will not fix" in 2010. [2]
- Note 4: Since version 8.30.
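The figures in the .NET note above (64K for UCS-2 versus 1,112,064 for UTF-16) follow from the layout of the Unicode code space. The sketch below reproduces the arithmetic in Python and shows one character that needs two UTF-16 code units. The constant names are hypothetical, and the musical G clef (U+1D11E) is an arbitrary example of a character outside the Basic Multilingual Plane.

```python
# Unicode code points run from U+0000 to U+10FFFF: 17 planes of 65,536 each.
TOTAL_CODE_POINTS = 0x10FFFF + 1

# U+D800..U+DFFF are reserved for UTF-16 surrogate pairs and encode no character.
SURROGATES = 0xDFFF - 0xD800 + 1

# Everything except the surrogate range can be encoded in UTF-16.
UTF16_ENCODABLE = TOTAL_CODE_POINTS - SURROGATES

# UCS-2 uses a single 16-bit code unit, so it is limited to 65,536 values.
UCS2_CODE_UNITS = 0x10000

# A character outside the Basic Multilingual Plane takes two UTF-16 code units,
# which a UCS-2-only engine would see as two separate units.
clef = "\U0001D11E"
utf16_units = len(clef.encode("utf-16-le")) // 2

print(UTF16_ENCODABLE)   # 1112064
print(UCS2_CODE_UNITS)   # 65536
print(utf16_units)       # 2
```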
See also
External links
- Regular Expression Flavor Comparison — Detailed comparison of the most popular regular expression flavors
- Regexp Syntax Summary
- Online Regular Expression Testing - with support for Java, JavaScript, .NET, PHP, Python and Ruby