Comparison of regular expression engines

This is a comparison of regular expression engines.

Libraries

List of regular expression libraries
| Name | Official website | Programming language | Software license | Used by |
|---|---|---|---|---|
| Boost.Regex[Note 1] | Boost C++ Libraries | C++ | Boost | |
| Boost.Xpressive | Boost C++ Libraries | C++ | Boost | |
| CL-PPCRE | Edi Weitz | Common Lisp | BSD | |
| cppre | Jeff Stuart | C++ | GPL | |
| DEELX | RegExLab | C++ | Free personal and commercial use | |
| FREJ[Note 2] | Fuzzy Regular Expressions for Java | Java | LGPL | |
| GLib/GRegex[Note 3] | Marco Barisione | C | LGPL | |
| GRETA | Microsoft Research | C++ | ? | |
| ICU | International Components for Unicode | C, C++ [Note 4] | ICU | |
| Jakarta/Regexp | The Apache Jakarta Project | Java | Apache | |
| JRegex | JRegex | Java | BSD | |
| Oniguruma | Kosako | C | BSD | |
| Pattwo | Stevesoft | Java (compatible with Java 1.0) | LGPL | |
| PCRE | pcre.org | C, C++[Note 5] | BSD | Nginx, Julia |
| Qt/QRegExp | Digia | C++ | Qt GNU GPL v. 3.0, Qt GNU LGPL v. 2.1, Qt Commercial | Kate, Kile |
| regex - Henry Spencer's regular expression libraries | ArgList | C | BSD | |
| re2 | Google Code | C++ | BSD | |
| Henry Spencer's Advanced Regular Expressions | Tcl | C | BSD | |
| TRE [Note 2] | Ville Laurikari | C | BSD | |
| TPerlRegEx | TPerlRegEx VCL Component | Object Pascal | MPLv1.1 | |
| TRegExpr | RegExp Studio | Object Pascal | Dual-license: freeware, or LGPL with static linking exception | |
| RGX | RGX | C++ based component library | P6R | |
| XRegExp | XRegExp | JavaScript | MIT | |
  1. ^ formerly called Regex++
  2. ^ a b one of fuzzy regular expression engines
  3. ^ included since version 2.13.0
  4. ^ ICU4J, the Java version, does not support regular expressions
  5. ^ C++ bindings were developed by Google and became officially part of PCRE in 2006

Languages

List of languages and frameworks including regular expression support
| Language | Official website | Software license | Remarks |
|---|---|---|---|
| .NET | MSDN | Proprietary | |
| C++11 (C++) | C++ standards website | ? | Since ISO/IEC 14882:2011(E) |
| D | D | Boost Software License[Note 1] | |
| Go | Golang.org | BSD-style | |
| Haskell | Haskell.org | BSD3 | Omitted in the language report, and in GHC's Hierarchical Libraries |
| Java | Java | GNU General Public License | REs are written as strings in source code: all backslashes must be doubled, harming readability |
| JavaScript (ECMAScript) | ECMA-262 | BSD3 | Limited, but REs are first-class citizens of the language with a specific /.../mod syntax |
| Julia | JuliaLang.org | MIT License | REs are part of the language core library; uses PCRE |
| Lua | Lua.org | MIT License | Uses a simplified, limited dialect; can be bound to a more powerful library such as PCRE, or to an alternative parser such as LPeg |
| Mathematica | Wolfram | Proprietary | |
| Free Pascal (Object Pascal) | www.freepascal.org | LGPL with static linking exception | Free Pascal 2.6+ ships with TRegExpr from Sorokin and 2 other regular expression libraries; see http://wiki.lazarus.freepascal.org/Regexpr |
| Cocoa (Objective-C) | Apple | Proprietary | As of 2012, available only on iOS 4+ and OS X 10.7+ |
| OCaml | Caml | LGPL | |
| Perl | Perl.com | Artistic License, or GNU General Public License | Full, central part of the language |
| PHP | PHP.net | PHP License | Has two implementations, with PCRE being the more efficient in speed, functions |
| Python | python.org | Python Software Foundation License | |
| Ruby | ruby-doc.org | GNU Library General Public License | Ruby 1.8 and 1.9 use different engines; 1.9 integrates Oniguruma |
| SAP ABAP | SAP.com | Proprietary | |
| Tcl | tcl.tk | Tcl/Tk License (BSD-style) | Tcl library doubles as a regular expression library |
| ActionScript 3 | ActionScript Technology Center | Free | |

Language features

NOTE: An application that uses a library for regular expression support does not necessarily offer the library's full feature set. For example, GNU Grep, which uses PCRE, does not offer lookahead support, though PCRE does.

Part 1

Language feature comparison (part 1)
| Engine | "+" quantifier | Negated character classes | Non-greedy quantifiers[Note 1] | Shy groups[Note 2] | Recursion | Lookahead | Lookbehind | Backreferences[Note 3] | >9 indexable captures |
|---|---|---|---|---|---|---|---|---|---|
| Boost.Regex | Yes | Yes | Yes | Yes | Yes [Note 4] | Yes | Yes | Yes | Yes |
| Boost.Xpressive | Yes | Yes | Yes | Yes | Yes [Note 5] | Yes | Yes | Yes | Yes |
| CL-PPCRE | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| EmEditor | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | No |
| FREJ | No [Note 6] | No | Some [Note 6] | Yes | No | No | No | Yes | Yes |
| GLib/GRegex | Yes | ? | Yes | ? | No | ? | ? | ? | ? |
| GNU Grep | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | ? |
| Haskell | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| ICU Regex | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| Java | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| JavaScript (ECMAScript) | Yes | Yes | Yes | Yes | No | Yes | No | Yes | Yes |
| JGsoft | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| Lua | Yes | Yes | Yes | No | No | No | No | Yes | No |
| .NET | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| OCaml | Yes | Yes | No | No | No | No | No | Yes | No |
| OmniOutliner 3.6.2 | Yes | Yes | Yes | No | No | No | No | ? | ? |
| PCRE | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Perl | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| PHP | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Python | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| Qt/QRegExp | Yes | Yes | Yes | Yes | No | Yes | No | Yes | Yes |
| R [Note 7] | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| re2 | Yes | Yes | Yes | Yes | No | No | No | No | Yes |
| Ruby | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| TRE | Yes | Yes | Yes | Yes | No | No | No | Yes | No |
| Vim 7.4b.000 (2013-07-28) | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | No |
| RGX | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| Tcl | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes |
| TRegExpr | Yes | ? | Yes | ? | ? | ? | ? | ? | ? |
| XRegExp | Yes | Yes | Yes | Yes | No | Yes | No | Yes | Yes |
  1. ^ Non-greedy quantifiers match as few characters as possible, instead of as many as possible (the default). Note that many older, pre-POSIX engines were non-greedy and didn't have greedy quantifiers at all.
  2. ^ Shy groups, also called non-capturing groups, cannot be referred to with backreferences; they are used to speed up matching where a group's content need not be accessed later.
  3. ^ Backreferences enable referring to previously matched groups in later parts of the regex and/or replacement string (where applicable). For instance, ([ab]+)\1 matches the whole string "abab" but not the whole string "abaab".
  4. ^ http://www.boost.org/doc/libs/1_47_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html#boost_regex.syntax.perl_syntax.recursive_expressions
  5. ^ http://www.boost.org/doc/libs/1_47_0/doc/html/xpressive/user_s_guide.html#boost_xpressive.user_s_guide.grammars_and_nested_matches.embedding_a_regex_by_reference
  6. ^ a b FREJ has no repetition quantifiers, but has an "optional" element that behaves similarly to a simple "?" quantifier
  7. ^ Regular Expressions as used in R
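
Notes 1 to 3 above can be illustrated with a short sketch. It uses Python's standard `re` module purely as an example engine; the sample strings other than "abab" and "abaab" are made up for illustration, and other engines in the table may spell or support these features differently.

```python
import re

# Note 1: "+" is greedy (as many characters as possible),
# "+?" is non-greedy (as few as possible).
greedy = re.search(r"<.+>", "<a><b>").group(0)
lazy = re.search(r"<.+?>", "<a><b>").group(0)

# Note 2: a shy (non-capturing) group (?:...) groups a sub-pattern
# without creating a numbered capture, so only (c) is captured here.
shy = re.fullmatch(r"(?:ab)+(c)", "ababc")

# Note 3: the backreference \1 must repeat exactly what group 1 matched.
backref = re.compile(r"([ab]+)\1")
whole_abab = backref.fullmatch("abab")     # group 1 is "ab", repeated once
whole_abaab = backref.fullmatch("abaab")   # odd length, cannot be X followed by X
inside_abaab = backref.search("abaab")     # a substring can still match

print(greedy, lazy)
print(shy.groups())
print(whole_abab.group(1), whole_abaab, inside_abaab.group(0))
```

The last line shows why the example in Note 3 is about matching the whole string: an unanchored search still finds the doubled "aa" inside "abaab".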

Part 2

Language feature comparison (part 2)
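| Engine | Directives [Note 1] | Conditionals | Atomic groups [Note 2] | Named capture [Note 3] | Comments | Embedded code | Unicode property support [1] |
|---|---|---|---|---|---|---|---|
| Boost.Regex | Yes | Yes | Yes | Yes | Yes | No | Some [Note 4] |
| Boost.Xpressive | Yes | No | Yes | Yes | Yes | No | No |
| CL-PPCRE | Yes | Yes | Yes | Yes | Yes | Yes | No |
| EmEditor | Yes | Yes | ? | ? | Yes | No | ? |
| FREJ | No | No | Yes | Yes | Yes | No | ? |
| GLib/GRegex | Yes | Yes | Yes | Yes | Yes | No | Some [Note 4] |
| GNU Grep | Yes | Yes | ? | Yes | Yes | No | No |
| Haskell | ? | ? | ? | ? | ? | No | No |
| ICU Regex | Yes | No | Yes | No | Yes | No | Yes |
| Java | Yes | No | Yes | Yes [Note 5] | No | No | Some [Note 4] |
| JavaScript (ECMAScript) | No | No | No | No | No | No | No |
| JGsoft | Yes | Yes | Yes | Yes | Yes | No | Some [Note 4] |
| Lua | No | No | No | No | No | No | No |
| .NET | Yes | Yes | Yes | Yes | Yes | No | Some [Note 4] |
| OCaml | No | No | No | No | No | No | No |
| OmniOutliner 3.6.2 | ? | ? | ? | ? | No | No | ? |
| PCRE | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Perl | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| PHP | Yes | Yes | Yes | Yes | Yes | No | No |
| Python | Yes | Yes | No | Yes | Yes | No | No |
| Qt/QRegExp | No | No | No | No | No | No | No |
| re2 | Yes | No | ? | Yes | No | No | Some [Note 4] |
| Ruby | Yes | No | Yes | Yes | Yes | Yes | Some [Note 4] |
| Tcl | Yes | No | Yes | No | Yes | No | Yes |
| TRE | Yes | No | No | No | Yes | No | ? |
| Vim | Yes | No | Yes | No | No | No | No |
| RGX | Yes | Yes | Yes | Yes | Yes | No | Yes |
| XRegExp | Leading only | No | No | Yes | Yes | No | Yes |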
  1. ^ Also known as Flags modifiers or Option letters. Example pattern: "(?i:test)"
  2. ^ Also called Independent sub-expressions
  3. ^ Similar to back references but with names instead of indices
  4. ^ a b c d e f g Unicode property support may be incomplete (products are continuously updated). All implementations will be incomplete when a new Unicode revision is released, until they are updated to comply.
  5. ^ Available as of JDK7.
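
As a rough illustration of the directive, named-capture and comment features described in the notes above, the following sketch again uses Python's `re` module as one example engine. The `(?P<name>...)` spelling is Python's own; other engines use different syntax for named groups, and the sample strings are invented for the example.

```python
import re

# Directive (scoped inline flag), as in the "(?i:test)" example:
# only the "test" part is matched case-insensitively.
scoped = re.compile(r"(?i:test)ing")
hit = scoped.fullmatch("TESTing")
miss = scoped.fullmatch("TESTING")   # "ING" is outside the (?i:...) scope

# Named capture: like a backreference, but by name instead of index.
# Python writes the group as (?P<word>...) and refers back with (?P=word).
doubled = re.search(r"(?P<word>\w+) (?P=word)", "it is is fine")

# Comment: (?#...) is ignored by the engine when matching.
commented = re.fullmatch(r"\d{4}(?#year)-\d{2}(?#month)", "2013-07")

print(hit is not None, miss is None)
print(doubled.group("word"))
print(commented.group(0))
```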

API features

API feature comparison
| Engine | Native UTF-16 support [Note 1] | Native UTF-8 support [Note 1] | Multi-line matching | Partial match [Note 2] |
|---|---|---|---|---|
| Boost.Regex | No | No | Yes | Yes |
| GLib/GRegex | Yes | Yes | Yes | ? |
| ICU Regex | Yes | No | Yes | ? |
| Java | No | Partial [Note 3] | Yes | Yes |
| .NET | No [Note 4] | Yes | Yes | ? |
| PCRE | Yes [Note 5] | Yes | Yes | ? |
| Qt/QRegExp | Yes | No | No | ? |
| Tcl | Yes | Yes [Note 6] | Yes | ? |
| TRE | No | ? | Yes | ? |
| RGX | No | No | Yes | ? |
| XRegExp | Yes | ? | Yes | ? |
  1. ^ a b Means the format can be used internally without explicit conversion
  2. ^ Partial match of the whole regular expression. For example the pattern ".*END$" will match any string partially, but only strings ending with END fully. [1]
  3. ^ Supports Unicode 4.0 standard from 2003; latest plans for JDK7 include Unicode 6.0 (2011) support [2]
  4. ^ Implementation uses the original UCS-2 support/features, so it recognizes only 64K characters in total (vs. UTF-16's 1,112,064 characters). A Microsoft developer representative answered a bug report on this as "will not fix" in 2010. [3]
  5. ^ Since version 8.30
  6. ^ Tcl includes facilities to convert to and from UTF-8
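
The figures in Note 4 can be checked with a little arithmetic. The sketch below assumes the standard Unicode code space (U+0000 to U+10FFFF) and surrogate range (U+D800 to U+DFFF); the helper name `utf16_units` and the sample characters are hypothetical, chosen only for illustration.

```python
# UCS-2 uses one 16-bit code unit per character: 65,536 ("64K") values.
UCS2_VALUES = 0x10000

# UTF-16 reaches the full code space by pairing surrogate code units.
# The surrogates themselves are not characters, so they are subtracted.
CODE_POINTS = 0x110000               # U+0000 .. U+10FFFF
SURROGATES = 0xDFFF - 0xD800 + 1     # 2,048 reserved code points
UTF16_CHARACTERS = CODE_POINTS - SURROGATES


def utf16_units(text: str) -> int:
    """Count the 16-bit code units needed to encode text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


bmp_units = utf16_units("\u00e9")         # inside the 64K range: one unit
astral_units = utf16_units("\U0001F600")  # outside it: a surrogate pair

print(UCS2_VALUES, UTF16_CHARACTERS)
print(bmp_units, astral_units)
```

An engine limited to UCS-2 sees the second character as two separate 16-bit values rather than as one character, which is the limitation the note describes.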
