Base36

Base36 is a binary-to-text encoding scheme that represents binary data in an ASCII string format by translating it into a radix-36 representation. The choice of 36 is convenient in that the digits can be represented using the Arabic numerals 0–9 and the Latin letters A–Z[1] (the ISO basic Latin alphabet).

Each base-36 digit carries log2(36) ≈ 5.17 bits of information, so it needs fewer than 6 bits to be represented.
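
The bit count above can be checked numerically. The following is a minimal sketch; the names BITS_PER_DIGIT and max_digits are illustrative and do not come from any library.

```python
import math

# Information carried by one base-36 digit, in bits (about 5.17).
BITS_PER_DIGIT = math.log2(36)


def max_digits(bits):
    """Base-36 digits needed for the largest value that fits in `bits` bits."""
    return math.ceil(bits / BITS_PER_DIGIT)


print(round(BITS_PER_DIGIT, 2))
print(max_digits(31), max_digits(63))  # value bits of signed 32- and 64-bit integers
```

Under these assumptions a signed 32-bit integer (31 value bits) needs up to 6 digits and a signed 64-bit integer (63 value bits) up to 13.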

Conversion

Signed 32- and 64-bit integers hold at most 6 and 13 base-36 digits, respectively. Not every number with that many digits fits, however: the largest 6- and 13-digit base-36 numbers overflow 32- and 64-bit integers. For example, the 64-bit signed integer maximum value of "9223372036854775807" is "1Y2P0IJ32E8E7" in base-36.
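
These limits can be verified with Python's built-in int(), which parses any radix from 2 to 36. This is a small sketch; the constant names are illustrative, and the 32-bit string "ZIK0ZJ" was worked out for this example rather than taken from the text above.

```python
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1

# The signed maxima written in base 36.
assert int("ZIK0ZJ", 36) == INT32_MAX
assert int("1Y2P0IJ32E8E7", 36) == INT64_MAX

# The largest 6- and 13-digit base-36 numbers no longer fit.
assert int("Z" * 6, 36) > INT32_MAX
assert int("Z" * 13, 36) > INT64_MAX
```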

Java implementation

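Java SE supports converting between strings and integers in any radix from 2 to 36, for example with Integer.toString(int, int) and Integer.parseInt(String, int), and with the equivalent methods of Long.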

C implementation

#include <stdlib.h> /* strtoul, free */
#include <string.h> /* strdup (POSIX) */

/* Returns a newly allocated string, or NULL if allocation fails;
   the caller must free() it. */
static char *base36enc(unsigned long value)
{
	static const char base36[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
	/* log(2**64) / log(36) = 12.38 => max 13 char + '\0' */
	char buffer[14];
	size_t offset = sizeof(buffer);

	buffer[--offset] = '\0';
	do {
		buffer[--offset] = base36[value % 36];
	} while (value /= 36);

	return strdup(&buffer[offset]);
}

/* Accepts upper- and lower-case digits; parsing stops at the first invalid character. */
static unsigned long base36dec(const char *text)
{
	return strtoul(text, NULL, 36);
}

Python implementation

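def base36encode(integer):
    if integer < 0:
        raise ValueError('integer must be non-negative')

    chars, encoded = '0123456789abcdefghijklmnopqrstuvwxyz', ''

    while integer > 0:
        integer, remainder = divmod(integer, 36)
        encoded = chars[remainder] + encoded

    # the loop never runs for 0, so fall back to the single digit '0'
    return encoded or '0'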
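
Decoding is the reverse process: each digit is folded into a running total that is multiplied by 36 at every step. The sketch below uses a hypothetical helper named base36decode and compares it with the built-in int(text, 36), which does the same job and is the simpler choice in practice.

```python
DIGITS = '0123456789abcdefghijklmnopqrstuvwxyz'


def base36decode(text):
    """Convert a base-36 string (either letter case) to a non-negative integer."""
    value = 0
    for ch in text.lower():
        # str.index raises ValueError for characters outside 0-9 and a-z.
        value = value * 36 + DIGITS.index(ch)
    return value


# The hand-written loop agrees with the built-in parser.
assert base36decode('zz') == int('zz', 36) == 1295
```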

Perl implementation

sub base36encode {
     my @map = split//,"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
     my $number = shift;
     my $output = "";
     my ($q,$r);

     do {
          ($q,$r) = (int($number/36),$number%36);
          $number = $q;   # continue with the integer quotient, not a fraction
          $output = $map[$r] . $output;
          } while ($q);

     return $output;
}

C++ implementation

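#include <string>

std::string to_base36(unsigned int val)
{
	static const std::string base36 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
	std::string result;
	result.reserve(7); // a 32-bit unsigned int needs at most 7 base-36 digits
	do {
		// prepend in place so the reserved buffer is actually used
		result.insert(result.begin(), base36[val % 36]);
	} while (val /= 36);
	return result;
}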

C# implementation

// requires: using System.Text;
private static string ToBase36(ulong value)
{
    const string base36 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    var sb = new StringBuilder(13); // a 64-bit value needs at most 13 base-36 digits
    do
    {
        sb.Insert(0, base36[(int)(value % 36)]);
        value /= 36;
    } while (value != 0);
    return sb.ToString();
}

bash implementation

value=$1
result=""
base36="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
while true; do
	# prepend the digit for the current remainder
	result=${base36:$((value % 36)):1}${result}
	value=$((value / 36))
	if [ "$value" -eq 0 ]; then
		break
	fi
done
echo "${result}"

Visual Basic implementation

Public Function ToBase36String(i As UInteger) As String
    Const rainbow = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    Dim sb = New StringBuilder()
    Do
        sb.Insert(0, rainbow(CInt(i Mod 36UI)))
        i \= 36UI ' integer division; "/" would round the quotient
    Loop While i <> 0
    Return sb.ToString()
End Function

Swift implementation

extension IntegerType {
    // can convert any integer type to any base (2–36)
    func toBase(b:Int) -> String
    {
        guard b > 1 && b < 37 else {
            fatalError("base out of range")
        }
        let digits = ["0","1","2","3","4","5","6","7","8","9","A",
                      "B","C","D","E","F","G","H","I","J","K","L",
                      "M","N","O","P","Q","R","S","T","U","V","W",
                      "X","Y","Z"]
        var result = ""
        
        if let v = self as? Int {
            var value = abs(v)
            repeat {
                result = digits[value % b] + result
                value = value / b
            } while (value > 0)
        }
        return self < 0 ? "-" + result : result // zero must not get a minus sign
    }
}

// Swift 3
String(myInt, radix: 36)

Uses in practice

  • The Remote Imaging Protocol for bulletin board systems used base 36 notation for transmitting coordinates in a compact form.
  • Many URL redirection systems like TinyURL or SnipURL/Snipr also use base 36 integers as compact alphanumeric identifiers.
  • Geohash-36, a coordinate encoding algorithm, uses radix 36 but uses a mixture of lowercase and uppercase alphabet characters in order to avoid vowels, vowel-looking numbers, and other character confusion.
  • Various systems such as RickDate use base 36 as a compact representation of Gregorian dates in file names, using one digit each for the day and the month.
  • Dell uses a 5- or 7-digit base 36 number (Service Tag) as a compact version of their Express Service Codes.
  • The software package SalesLogix uses base 36 as part of its database identifiers.[2]
  • The TreasuryDirect website, which allows individuals to buy and redeem securities directly from the U.S. Department of the Treasury in paperless electronic form, serializes security purchases in an account using a 4-digit base 36 number. However, the Latin letters A–Z are used before the Arabic numerals 0–9, so that the purchases are listed as AAAA, AAAB... AAAZ, AAA0, AAA1... AAA9, AABA...
  • The e-mail client program PMMail encodes the UNIX time of the email's arrival in base 36 and uses this for the first six characters of the message's filename.
  • MediaWiki stores uploaded files in directories with names derived from the base-36 representation of an uploaded file's checksum.[3]
  • Siteswap, a type of juggling notation, frequently employs 0–9 and a–z to signify the dwell time of a toss (which may roughly be thought of as the height of the throw). Throws higher than 'z' may be made but no notation has widespread acceptance for these throws.
  • In SEDOL securities identifiers, the check digit is computed from a weighted sum of the first six characters, each character interpreted in base-36.
  • In the International Securities Identification Number (ISIN), the check digit is computed by first taking the value of each character in base-36, concatenating the numbers together, then doing a weighted sum.
  • Reddit uses base-36 for identifying posts and comments.
  • QR codes have an alphanumeric encoding mode. Its character set is not exactly base 36, but because shortened URLs are among the most common QR code payloads, base-36 identifiers are a good fit for this mode.[4]
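
The SEDOL and ISIN check digits mentioned in the list can be sketched as follows. The list does not give the details, so the SEDOL weights (1, 3, 1, 7, 3, 9), the modulo-10 complement, and the Luhn-style weighting for ISIN are taken here as assumptions about those schemes. The function names are illustrative, and the identifiers in the usage lines are published examples used only for checking.

```python
def char_value(ch):
    """Base-36 value of one character: '0'-'9' -> 0-9, 'A'-'Z' -> 10-35."""
    return int(ch, 36)


# Assumed SEDOL weights for the first six characters.
SEDOL_WEIGHTS = (1, 3, 1, 7, 3, 9)


def sedol_check_digit(first_six):
    """Check digit from the weighted sum of the six base-36 character values."""
    total = sum(w * char_value(c) for w, c in zip(SEDOL_WEIGHTS, first_six))
    return (10 - total % 10) % 10


def isin_check_digit(first_eleven):
    """Check digit from the concatenated character values (Luhn weighting assumed)."""
    # Concatenate the decimal value of every character, e.g. 'U' -> "30".
    digits = ''.join(str(char_value(c)) for c in first_eleven)
    total = 0
    # Double every second digit, starting with the rightmost one,
    # and add the decimal digits of each product.
    for position, d in enumerate(reversed(digits)):
        n = int(d) * 2 if position % 2 == 0 else int(d)
        total += n // 10 + n % 10
    return (10 - total % 10) % 10


print(sedol_check_digit("026349"))      # SEDOL 0263494
print(isin_check_digit("US037833100"))  # ISIN US0378331005
```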

References

  1. ^ Hope, Paco; Walther, Ben (2008), Web Security Testing Cookbook, Sebastopol, CA: O'Reilly Media, Inc., ISBN 978-0-596-51483-9 
  2. ^ Sage SalesLogix base-36 identifiers: http://www.slxdeveloper.com/page.aspx?action=viewarticle&articleid=87
  3. ^ FileStore "Archived copy". Archived from the original on 2008-12-02. Retrieved 2009-05-06. 
  4. ^ "QR Code encode mode for short URLs"

External links