User:PerfektesChaos/js/stringLib

JavaScript function library of utilities to analyze and manipulate strings.

Usage

Import

  • Add the following lines to your common.js:
mw.loader.load("https://en.wikipedia.org/w/index.php?title=User:PerfektesChaos/js/stringLib/r.js&action=raw&bcache=&maxage=604800&ctype=text/javascript",
               "text/javascript");
  • This also works on non-WMF sites running MediaWiki.
  • In fact, it can be used anywhere, since it has no dependencies. It even works outside browsers. Please respect the license statements: CC-BY-SA and GNU FDL.

Activation

  • In a MediaWiki context, the library establishes itself as the PerfektesChaos_stringLib component of the mw.libs collection.
  • Otherwise, in an interactive environment, the library is built on the fly as a component of window.mw.libs.
  • Under non-interactive circumstances, mw.libs is put into the global object without a var declaration.

After loading, you are expected to integrate the library into your own application object with an assignment such as

  • yourAppObj.str = mw.libs.PerfektesChaos_stringLib

Within yourAppObj, the functions can now be referred to as

  • yourAppObj.str.fff()

MediaWiki environment

In a MediaWiki environment, the following hook function may be declared:

mw.hook( "PerfektesChaos_stringLib.ready" ).add( callback );

That callback function (e.g. myTask) triggers the actual functionality of the user application. It is called as soon as loading has completed successfully.

function myTask( application ) may take one parameter: the application object of the library. The same object is expected to be available as mw.libs.PerfektesChaos_stringLib as well. It can be mapped into your environment with yourAppObj.str = application.
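
The hook registration described above can be sketched as follows. Because mw only exists inside MediaWiki, the snippet builds a small stand-in for mw.hook so that it runs anywhere; createHookRegistry is a hypothetical helper, and the empty library object is a placeholder. The stand-in assumes that a handler added after the hook has fired is called immediately, as the real mw.hook does.

```javascript
// Stand-in for mw.hook so the sketch runs outside MediaWiki (hypothetical helper).
// Assumption: handlers added after firing are called at once with the
// remembered arguments, mirroring the behaviour of the real mw.hook.
function createHookRegistry() {
    const hooks = {};
    return function hook(name) {
        if (!hooks[name]) {
            hooks[name] = { handlers: [], fired: null };
        }
        const entry = hooks[name];
        return {
            add: function (handler) {
                entry.handlers.push(handler);
                if (entry.fired) {
                    handler.apply(null, entry.fired);
                }
                return this;
            },
            fire: function () {
                entry.fired = Array.prototype.slice.call(arguments);
                entry.handlers.forEach(function (h) {
                    h.apply(null, entry.fired);
                });
                return this;
            }
        };
    };
}

// Mock environment; in MediaWiki these objects already exist.
const mw = { hook: createHookRegistry(), libs: {} };
const yourAppObj = {};

// Callback: receives the library's application object.
function myTask(application) {
    yourAppObj.str = application;
}

mw.hook("PerfektesChaos_stringLib.ready").add(myTask);

// Simulate what happens once the library has finished loading.
mw.libs.PerfektesChaos_stringLib = {};   // placeholder for the real library
mw.hook("PerfektesChaos_stringLib.ready").fire(mw.libs.PerfektesChaos_stringLib);
```

In a real wiki only the myTask definition and the add() call are needed; everything else in the sketch replaces what MediaWiki and the library provide.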

Codes

Source code
ResourceLoader
  • user:PerfektesChaos_stringLib
  • Dependencies: —
mw.libs PerfektesChaos_stringLib
mw.hook PerfektesChaos_stringLib.ready

Documentation

yourAppObj is represented by a leading dot here.

The following items may be overwritten by the user; they default to false:

.charEnt5single

.charEnt5single

HTML 5 single character entities

object

.locateEntities

.locateEntities

User option: Expect any HTML entity

true or false

.sortLang

.sortLang

User option: Language used for sorting

string or false

.sortMode

.sortMode

User option: Special sorting mode

string like 'de-DIN31638' or false

.spaces

.spaces

Various types of spaces

string with all spaces except ASCII

.sticks

.sticks

Various types of horizontal dashes and lines

string with all dashes including ASCII hyphen-minus in front

.camelCasing()

.camelCasing(alter)

Upcase first character, keep inner camelCasing

Parameters
alter – string to be camelCased
Returns
camelCased string

.capitalize()

.capitalize(alter)

Upcase first character, downcase anything else

Parameters
alter – string to be capitalized
Returns
capitalized string
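
The difference between .camelCasing() and .capitalize() can be illustrated with a minimal sketch. The functions below are not the library's code; the names camelCasingSketch and capitalizeSketch are made up for this illustration and only follow the one-line descriptions above.

```javascript
// Sketch only: upcase the first character, keep the rest untouched.
function camelCasingSketch(alter) {
    return alter.charAt(0).toUpperCase() + alter.substring(1);
}

// Sketch only: upcase the first character, downcase everything else.
function capitalizeSketch(alter) {
    return alter.charAt(0).toUpperCase() + alter.substring(1).toLowerCase();
}

console.log(camelCasingSketch("fooBar")); // "FooBar"
console.log(capitalizeSketch("fooBar"));  // "Foobar"
```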

.charEntity()

.charEntity(adjust)

Retrieve character code (UCS) for named HTML4 or numeric entity

Parameters
adjust – string to be examined
Returns
information about character
false if not resolved
number UCS code of single character
Since
JavaScript 1.3 String.charCodeAt()

.charEntityAt()

.charEntityAt(adjust, address, advance)

Retrieve character code of ML entity at position

Parameters
adjust – string to be examined
address – position in adjust
advance – true: '&' at address; false: ';' at address
Returns
Array with entity information, or false
[0] code value
[1] entity position
[2] length of entity
Since
JavaScript 1.3 String.charCodeAt()

.charEntityCode()

.charEntityCode(adjust)

Retrieve character code (UCS) for numeric ML entity

Parameters
adjust – string with character entity like "&#xHH;" or "&#NN;"
first two characters are assumed to be '&#'
third character may be 'x' or digit
last character is assumed to be ';'
Returns
information about character
false if not resolved
number UCS code of single character
Since
JavaScript 1.3 String.charCodeAt()
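
As an illustration of the input format described above, the following sketch resolves a numeric entity to its code. It is not the library's implementation: charEntityCodeSketch is a hypothetical name, the regular expression is stricter than the "assumed" positions in the description, and rejecting values beyond U+10FFFF is an assumption.

```javascript
// Sketch only: resolve "&#NN;" (decimal) or "&#xHH;" (hexadecimal).
function charEntityCodeSketch(adjust) {
    const match = /^&#(?:x([0-9A-Fa-f]+)|([0-9]+));$/.exec(adjust);
    if (!match) {
        return false;                       // not a numeric entity
    }
    const code = match[1] ? parseInt(match[1], 16)
                          : parseInt(match[2], 10);
    // Assumption: values beyond the Unicode range count as unresolved.
    return code <= 0x10FFFF ? code : false;
}

console.log(charEntityCodeSketch("&#x41;")); // 65
console.log(charEntityCodeSketch("&#65;"));  // 65
console.log(charEntityCodeSketch("&amp;"));  // false
```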

.charEntityHTML4()

.charEntityHTML4(adjust)

Retrieve character code (UCS) for named HTML4 (or similar) entity

Deprecated – replaced by .charEntityHTML5single()
Parameters
adjust – string with character named entity "&xyz;"
first character is assumed to be '&'
last character is assumed to be ';'
Returns
information about character
false if not resolved
number UCS code of single character

.charEntityHTML5single()

.charEntityHTML5single(adjust)

Retrieve single character code (UCS) for named HTML5 entity

Parameters
adjust – string with character named entity "&xyz;"
first character is assumed to be '&'
last character is assumed to be ';'
Returns
information about character
false if not resolved
number UCS code of single character

.deCapitalize()

.deCapitalize(alter)

Downcase first character; keep inner camelCasing

Parameters
alter – string to be decapitalized
Returns
decapitalized string

.decodeOctet()

.decodeOctet(assembly, address)

Retrieve hexadecimal value of octet, similar to parseInt() with base 16, but considering uppercase A-F only

Parameters
assembly – string to be analyzed
address – index in string
Returns
parsed number 0...15, or -1 if invalid
Since
JavaScript 1.3 String.charCodeAt()
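
A minimal sketch of the behaviour described above, reading one hexadecimal digit at a given index and accepting uppercase letters only. decodeOctetSketch is a hypothetical name and the code is not taken from the library.

```javascript
// Sketch only: value of the hex digit at position `address`, or -1.
function decodeOctetSketch(assembly, address) {
    const c = assembly.charCodeAt(address);   // NaN if address is out of range
    if (c >= 48 && c <= 57) {
        return c - 48;                        // '0'..'9'
    }
    if (c >= 65 && c <= 70) {
        return c - 55;                        // 'A'..'F' (uppercase only)
    }
    return -1;                                // anything else is invalid
}

console.log(decodeOctetSketch("%C3", 1)); // 12
console.log(decodeOctetSketch("%c3", 1)); // -1
```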

.decodeXML()

.decodeXML(alter)

Convert string with XML entities into unescaped string

Parameters
alter – string to be analyzed
Returns
string, may be unchanged
Since
JavaScript 1.3 String.charCodeAt() String.fromCharCode()

.escapeLight()

.escapeLight(alter)

Minimal escaping for HTML

Parameters
alter – string to be escaped
Returns
string with escaping

.fromCharCode()

.fromCharCode(apply)

Extended fromCharCode for UCS > 0xFFFF (4 bytes/char)

Parameters
apply – number, UCS
Returns
single character, which might have a string length of 2 instead of 1
Since
JavaScript 1.3 String.fromCharCode() 2 byte chars only
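
Why the result may have a string length of 2 can be seen in the following sketch, which splits a code above 0xFFFF into a UTF-16 surrogate pair. This is the standard UTF-16 arithmetic, not code copied from the library; fromCharCodeSketch is a hypothetical name. In ES2015 and later, String.fromCodePoint() gives the same result.

```javascript
// Sketch only: build a string for any UCS code up to 0x10FFFF.
function fromCharCodeSketch(apply) {
    if (apply <= 0xFFFF) {
        return String.fromCharCode(apply);       // fits into one code unit
    }
    const offset = apply - 0x10000;              // 20 bits remain
    const high = 0xD800 + (offset >> 10);        // upper 10 bits
    const low = 0xDC00 + (offset & 0x3FF);       // lower 10 bits
    return String.fromCharCode(high, low);       // surrogate pair
}

console.log(fromCharCodeSketch(0x1F600).length); // 2
```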

.fromNum()

.fromNum(adjust)

Format number as string

Parameters
adjust – number to be formatted
Returns
adjust as string

.hexcode()

.hexcode(amount, align, allow)

Retrieve hexadecimal representation

Parameters
amount – number: decimal
align – left padded number of digits, or false
allow – true: use lowercase letters
Returns
string with hex number
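
The three parameters can be illustrated with a short sketch. It assumes that uppercase letters are the default (since allow switches to lowercase) and that padding uses leading zeros; neither detail is stated above, and hexcodeSketch is a hypothetical name.

```javascript
// Sketch only: hexadecimal representation with optional zero padding.
function hexcodeSketch(amount, align, allow) {
    let r = amount.toString(16);       // lowercase by default in JavaScript
    if (!allow) {
        r = r.toUpperCase();           // assumption: uppercase unless allowed
    }
    if (align) {
        while (r.length < align) {
            r = "0" + r;               // assumption: pad with leading zeros
        }
    }
    return r;
}

console.log(hexcodeSketch(255, 4, false)); // "00FF"
console.log(hexcodeSketch(255, false, true)); // "ff"
```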

.isASCII()

.isASCII(ask)

Test for ASCII only characters

Parameters
ask – string to be examined
Returns
true iff ask consists of ASCII characters only
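
A minimal sketch of such a test, assuming that "ASCII" means every code unit is at most 127. isASCIISketch is a hypothetical name, not the library's function.

```javascript
// Sketch only: true if no code unit exceeds 127.
function isASCIISketch(ask) {
    for (let i = 0; i < ask.length; i++) {
        if (ask.charCodeAt(i) > 127) {
            return false;
        }
    }
    return true;    // also true for the empty string
}

console.log(isASCIISketch("plain text")); // true
console.log(isASCIISketch("Köln"));       // false
```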

.isBlank()

.isBlank(ask, any)

Test for invisible character

Parameters
ask – character code to be examined
any – true: include zero width and marks
Returns
true iff ask is any space or other invisible character code

.isLetter()

.isLetter(ask)

Test whether a character is a letter (latin based, greek, cyrillic)

Parameters
ask – character code to be examined, or string (first char)
Returns
true iff ask is identified as any kind of letter
Since
JavaScript 1.3 String.charCodeAt()

.isWhiteBlank()

.isWhiteBlank(ask, any, against)

Test for invisible character or newline

Parameters
ask – character code to be examined
any – true: include zero width and direction marks
against – true: behave like .isBlank()
Returns
true iff ask is any whitespace or other invisible
See
.isBlank()

.makeString()

.makeString(apply, amount)

Return string of certain length with repeated character

Parameters
apply – character code to be set
amount – number of repeated characters apply
Returns
new string
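
A sketch of the described behaviour, taking a character code rather than a character. It relies on String.prototype.repeat() from ES2015, which the library itself may not use; makeStringSketch is a hypothetical name.

```javascript
// Sketch only: string of `amount` copies of the character with code `apply`.
function makeStringSketch(apply, amount) {
    return String.fromCharCode(apply).repeat(amount);
}

console.log(makeStringSketch(45, 3)); // "---"
```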

.parseIntNumber()

.parseIntNumber(apply, assign)

Parse integer number string, but do not return NaN

Parameters
apply – string to be manipulated, or undefined
assign – number base: 10 or 16; if false detect leading 'x'
Returns
number, 0 if not caught
Since
JavaScript 1.3 String.charCodeAt()
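
The idea of never returning NaN can be sketched as below. How exactly the library detects a leading 'x' is not specified above, so the handling here (strip one leading lowercase 'x' and switch to base 16) is an assumption, and parseIntNumberSketch is a hypothetical name.

```javascript
// Sketch only: parse an integer, returning 0 instead of NaN.
function parseIntNumberSketch(apply, assign) {
    if (typeof apply !== "string") {
        return 0;                          // undefined or other non-strings
    }
    let text = apply;
    let base = assign;
    if (!base) {
        // Assumption: a leading 'x' marks a hexadecimal number.
        if (text.charAt(0) === "x") {
            base = 16;
            text = text.substring(1);
        } else {
            base = 10;
        }
    }
    const result = parseInt(text, base);
    return isNaN(result) ? 0 : result;
}

console.log(parseIntNumberSketch("xFF", false)); // 255
console.log(parseIntNumberSketch("abc", 10));    // 0
```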

.setChar()

.setChar(array, apply, address)

Set character or string at certain string position

Parameters
array – string to be manipulated
apply – character code or string to be set
address – single character position to be replaced
Returns
modified string
Since
JavaScript 1.3 String.fromCharCode()
One day, direct array[i] setting might work in a JavaScript String.

.setString()

.setString(array, address, adjust, apply)

Modify string in certain range

Parameters
array – string to be manipulated
address – character position to start replacement
adjust – range specification
number of characters to be removed at address
string (adjust.length is used as number)
apply – string to replace range
Returns
modified string
Since
JavaScript 1.3 String.fromCharCode()
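
Since JavaScript strings are immutable, both .setChar() and .setString() have to build a new string. A minimal sketch under that assumption follows; setCharSketch and setStringSketch are hypothetical names and the code is not the library's.

```javascript
// Sketch only: replace the single character at `address`.
function setCharSketch(array, apply, address) {
    const piece = (typeof apply === "number") ? String.fromCharCode(apply)
                                              : apply;
    return array.substring(0, address) + piece + array.substring(address + 1);
}

// Sketch only: replace a range starting at `address`.
// `adjust` is either a number of characters or a string whose length is used.
function setStringSketch(array, address, adjust, apply) {
    const count = (typeof adjust === "string") ? adjust.length : adjust;
    return array.substring(0, address) + apply +
           array.substring(address + count);
}

console.log(setCharSketch("cat", 117, 1));                       // "cut"
console.log(setStringSketch("Hello World", 6, "World", "You"));  // "Hello You"
```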

.sortAppropriate()

.sortAppropriate(adjust)

Retrieve sortable character(s) in particular local environment (hook)
(RegExp is not modified)

Parameters
adjust – character code of a single character
196 * Ä
197 * Å
198 * Æ *always*
228 * ä
229 * å
230 * æ *always*
208 * Ð
272 * Dstroke
240 * ð
273 * dstroke
568 * db digraph *always*
452 * D with Z caron *always*
497 * D with Z *always*
453 * D with z caron *always*
498 * D with z *always*
454 * d with z caron *always*
499 * d with z *always*
455 * L with J *always*
456 * L with j *always*
457 * l with j *always*
458 * N with J *always*
459 * N with j *always*
460 * n with j *always*
214 * Ö
246 * ö
338 * OElig *always*
339 * oelig *always*
546 * OU *always*
547 * ou *always*
569 * qp digraph *always*
223 * ß *always*
7838 * capital sharp S *always*
222 * Þ *always*
254 * þ *always*
220 * Ü
252 * ü
Returns
information about sortable character
false no particular local request
true remove character from sort key
number with ASCII code of single character
string of two ASCII characters, (first) character case will be kept, second char (if any) downcase.
See
.sortLang

.sortChar()

.sortChar(adjust)

Retrieve sortable character(s) for non-ASCII Latin based Unicode
(RegExp is not modified)

Parameters
adjust – character code of a single character
(expecting adjust from 160 up)
Returns
information about sortable character
false if nothing to do
true remove character from sort key
number with ASCII code of single character
string of two ASCII characters, (first) character case will be kept, second char (if any) downcase.
Only glyphs used in any (European) language considered.

.sortLocale()

.sortLocale(adjust, area)

Retrieve sortcode char or string for Unicode

Parameters
adjust – string to be checked
area – language code, or false
de German DIN 31638 (DIN 5007) requests umlaut "Ae" when sorting names of persons.
Returns
sortable string or character
false no particular local request
Replace by two-character string for German umlauts or Scandinavian "aa" for Aring.
See
.sortMode

.sortString()

.sortString(adjust, advanced)

Retrieve sortable string for non-ASCII Latin based Unicode
Trailing or multiple whitespace shrinks.

Parameters
adjust – string to be checked or modified
advanced – optional
true Replace by two-character string for German umlauts and Scandinavian Aring.
German DIN 31638 (DIN 5007) requests umlaut "Ae" when sorting names of persons, and Scandinavian languages use the same transcription as well as "aa" for aring.
Returns
information about sortable string
false if nothing to do, adjust is fine
string changes against adjust
Only glyphs used in any (European) language considered.
Since
JavaScript 1.3 String.charCodeAt() String.fromCharCode()

.spaced()

.spaced(adjust, any, allow)

Turn spacing charcodes of any kind into ASCII spaces, and trim

Parameters
adjust – string to be standardized
any – true: remove also zero width and direction marks
allow – true: keep entities
Returns
modified string
See
.isWhiteBlank()
.charEntityAt()

.substrEnd()

.substrEnd(apply, amount, after)

Retrieve last characters from string, like Mozilla substr(-n, n)

Parameters
apply – string
amount – position counted from end
after – optional: number of chars, if not amount
Returns
string at end
Since
JavaScript 1.0 String.substr()

This function has been included for compatibility reasons. With ECMAScript 3, String.slice() with a negative start argument works. String.slice() with a negative argument wasn’t defined in earlier JS. String.substr() with a negative argument does not work in IE.

.substrExcept()

.substrExcept(apply, amount)

Retrieve all but last characters from string

Parameters
apply – string
amount – position counted from end
Returns
string until near end
See
.substrEnd()
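
The two counterparts .substrEnd() and .substrExcept() can be sketched without relying on negative arguments at all. The functions below are illustrations only; the names substrEndSketch and substrExceptSketch are hypothetical, and clamping at the start of the string is an assumption.

```javascript
// Sketch only: last `amount` characters, optionally limited to `after` chars.
function substrEndSketch(apply, amount, after) {
    const start = Math.max(apply.length - amount, 0);   // clamp at string start
    const count = (typeof after === "number") ? after : amount;
    return apply.slice(start, start + count);
}

// Sketch only: everything except the last `amount` characters.
function substrExceptSketch(apply, amount) {
    return apply.substring(0, apply.length - amount);
}

console.log(substrEndSketch("abcdef", 3));     // "def"
console.log(substrEndSketch("abcdef", 3, 2));  // "de"
console.log(substrExceptSketch("abcdef", 2));  // "abcd"
```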

.terminated()

.terminated(adjust, at)

Return substring terminated by separator, or entirely

Parameters
adjust – string to be extracted
at – string with separator to be excluded
Returns
modified string, excluding at
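
A minimal sketch of this behaviour, assuming the first occurrence of the separator is the one that counts. terminatedSketch is a hypothetical name.

```javascript
// Sketch only: part of `adjust` before the separator `at`,
// or the whole string if the separator does not occur.
function terminatedSketch(adjust, at) {
    const i = adjust.indexOf(at);    // assumption: first occurrence counts
    return (i < 0) ? adjust : adjust.substring(0, i);
}

console.log(terminatedSketch("key=value", "=")); // "key"
console.log(terminatedSketch("novalue", "="));   // "novalue"
```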

.trim()

.trim(adjust, any, aware, allow)

Remove leading or trailing spacing charcodes of any kind

Parameters
adjust – string to be trimmed
any – true: remove also zero width and direction marks
aware – true: remove also trailing line breaks
allow – true: keep entities
Returns
modified string

.trimL()

.trimL(adjust, any, aware, allow)

Return string without leading spacing charcodes of any kind

Parameters
adjust – string to be trimmed
any – true: remove also zero width and direction marks
aware – true: remove also line breaks
allow – true: keep entities
Since
JavaScript 1.3 String.charCodeAt()
See
.locateEntities

.trimR()

.trimR(adjust, any, aware, align, allow)

Return string without trailing spacing charcodes of any kind

Parameters
adjust – string to be trimmed
any – true: remove also zero width and direction marks
aware – true: remove also line breaks
align – true: re-establish line breaks after trimming
allow – true: keep entities
Since
JavaScript 1.3 String.charCodeAt()
See
.locateEntities

.uniques()

.uniques(adjust, against)

Return string with unique sequence of items

Parameters
adjust – string to be reduced, items separated by against
against – string with character for separation
Returns
string with all items in adjust, separated by against (no leading nor trailing against)
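
One way to obtain such a result is sketched below. It assumes that the first occurrence of each item keeps its position and that empty items are dropped, which is how the absence of leading and trailing separators is achieved here; the library may differ in both respects. uniquesSketch is a hypothetical name.

```javascript
// Sketch only: remove repeated items from a separated list.
function uniquesSketch(adjust, against) {
    const seen = new Set();
    const kept = [];
    for (const item of adjust.split(against)) {
        // Assumption: empty items are dropped, first occurrence wins.
        if (item !== "" && !seen.has(item)) {
            seen.add(item);
            kept.push(item);
        }
    }
    return kept.join(against);
}

console.log(uniquesSketch("a|b|a|c|b", "|")); // "a|b|c"
```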