Talk:Numeric character reference

From Wikipedia, the free encyclopedia

Numeric character converter


For everyday needs, there is a "one-line" converter in Perl:

while (<STDIN>) {
       # Replace every character above the ASCII range with its decimal reference.
       s/(.)/(ord($1) > 127) ? ('&#' . ord($1) . ';') : $1/ge;
       print $_;
}

(usage: % perl ncr.pl < fileIn.txt > fileOut.txt, assuming the script has been saved as ncr.pl)

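It converts Unicode or ISO Latin text to XML-compatible ASCII. For UTF-8 input, STDIN must be decoded into characters first (for example with perl -CS); otherwise each byte of a multi-byte character is converted separately.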


A JavaScript version:

function unicode_to_ncr(text) {
  var ncr_text = "";
  // Iterate by code point so characters outside the BMP are not split into surrogates.
  for (var character of text) {
     var code_point = character.codePointAt(0);
     if (code_point < 128) {
        ncr_text += character;   // ASCII passes through unchanged
     } else {
        ncr_text += "&#" + code_point + ";";
     }
  }
  return ncr_text;
}

It also converts Unicode or ISO Latin text to XML-compatible ASCII.
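
To make the intended output concrete, here is a minimal sketch with a sample input. The names toNcr and fromNcr are hypothetical and do not come from the snippets above; the decoder handles decimal references only and is included just to check that the conversion round-trips.

```javascript
// Hypothetical helper: replace every non-ASCII code point with a decimal NCR.
// The "u" flag makes the regex match whole code points rather than UTF-16 units.
function toNcr(text) {
  return text.replace(/[^\x00-\x7F]/gu, function (ch) {
    return "&#" + ch.codePointAt(0) + ";";
  });
}

// Hypothetical inverse: turn decimal NCRs back into characters.
function fromNcr(text) {
  return text.replace(/&#(\d+);/g, function (match, digits) {
    return String.fromCodePoint(Number(digits));
  });
}

var sample = "caf\u00E9 \u20AC5";         // "café €5"
var encoded = toNcr(sample);
console.log(encoded);                     // caf&#233; &#8364;5
console.log(fromNcr(encoded) === sample); // true
```

The result contains only ASCII characters, so it survives any ASCII-compatible transport and still decodes to the original text.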


The nomenclature used in this article differs from that of basic SGML. SGML has two proper names: "character reference", which is the numeric character reference described here, and "entity reference", which is a macro resolving to any sequence of characters.

The entity references used in HTML all resolve to exactly one character. But that doesn't make them a special case, as the phrase "character entity reference" implies; they just all happen to be one-character strings. Pim 2 (talk) 11:26, 11 December 2011 (UTC)
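
The distinction drawn above can be sketched in code: a character reference is resolved by its number and always yields one character, while an entity reference is a lookup whose replacement text may be any string. This is only an illustration. The function resolveReferences and the entity table are hypothetical; amp and eacute are taken from HTML, and wp is invented here as an example of a multi-character entity.

```javascript
// Hypothetical entity table: name -> replacement text of any length.
var entities = {
  amp: "&",          // HTML entity, happens to be one character
  eacute: "\u00E9",  // HTML entity, happens to be one character
  wp: "Wikipedia"    // invented multi-character entity, as SGML allows
};

function resolveReferences(text) {
  return text.replace(/&(#\d+|[A-Za-z][A-Za-z0-9]*);/g, function (match, ref) {
    if (ref.charAt(0) === "#") {
      // Character reference: the number itself identifies one character.
      return String.fromCodePoint(Number(ref.slice(1)));
    }
    // Entity reference: plain lookup; unknown names are left untouched.
    return Object.prototype.hasOwnProperty.call(entities, ref) ? entities[ref] : match;
  });
}

console.log(resolveReferences("caf&#233; = caf&eacute;, see &wp; &amp; &unknown;"));
// café = café, see Wikipedia & &unknown;
```

Both branches go through the same substitution step; only the source of the replacement text differs, which is the point made in the comment above.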