Tamil All Character Encoding
Tamil All Character Encoding (TACE16) is a 16-bit Unicode-based character encoding scheme for the Tamil language.[1][2] The encoding is not used on the web, where Unicode encodings such as UTF-8 are used instead.
Keyboard drivers and fonts
The keyboard driver for this encoding scheme is freely available on the Tamil Virtual University website.[3][4] It uses the Tamil99 and Tamil Typewriter keyboard layouts, which are approved by the Tamil Nadu Government, and maps input keystrokes to their corresponding characters in the TACE16 scheme.[2] The corresponding Unicode Tamil fonts for this encoding scheme are also available on the same website.[4][3] These fonts also cover the ASCII and Tamil characters of the present Unicode encoding, which keeps them backward compatible with the present Unicode encoding scheme for Tamil.
Character set
All characters of this encoding scheme are located in the private use area of the Basic Multilingual Plane of Unicode's Universal Character Set.
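As a quick illustration of what "private use" means in practice, the sketch below uses Python's standard `unicodedata` module to check that a TACE16 code such as E213 lies in the Private Use Area of the Basic Multilingual Plane (U+E000 to U+F8FF). The helper name `in_bmp_private_use_area` is made up for this example.

```python
import unicodedata

# Bounds of the Private Use Area in the Basic Multilingual Plane.
PUA_START, PUA_END = 0xE000, 0xF8FF

def in_bmp_private_use_area(code):
    """Return True if the code point lies in the BMP Private Use Area."""
    return PUA_START <= code <= PUA_END

ki = 0xE213  # TACE16 code that this article gives for கி
print(in_bmp_private_use_area(ki))    # True
print(unicodedata.category(chr(ki)))  # Co (the "private use" general category)
```

Because these code points carry no standard meaning, text in TACE16 is only readable with fonts and software that agree on this private assignment.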
Consonants→ Vowels ↓ |
E10 | E18 | E1A | E1F | E20 | E21 | E22 | E23 | E24 | E25 | E26 | E27 | E28 | E29 | E2A | E2B | E2C | E2D | E2E | E2F | E30 | E31 | E32 | E33 | E34 | E35 | E36 | E37 | E38 | E39 | E3A | E3B | E3C | E3D | E3E | E3F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ௳ | ௦ | அரைக்கால் | ் | க் | ங் | ச் | ஞ் | ட் | ண் | த் | ந் | ப் | ம் | ய் | ர் | ல் | வ் | ழ் | ள் | ற் | ன் | ||||||||||||||
1 | ௴ | ௧ | கால் | அ | க | ங | ச | ஞ | ட | ண | த | ந | ப | ம | ய | ர | ல | வ | ழ | ள | ற | ன | ||||||||||||||
2 | ௵ | ௨ | அரை | ா | ஆ | கா | ஙா | சா | ஞா | டா | ணா | தா | நா | பா | மா | யா | ரா | லா | வா | ழா | ளா | றா | னா | |||||||||||||
3 | ௶ | ௩ | முக்கால் | ி | இ | கி | ஙி | சி | ஞி | டி | ணி | தி | நி | பி | மி | யி | ரி | லி | வி | ழி | ளி | றி | னி | |||||||||||||
4 | ௷ | ௪ | அரைவீசம் | ீ | ஈ | கீ | ஙீ | சீ | ஞீ | டீ | ணீ | தீ | நீ | பீ | மீ | யீ | ரீ | லீ | வீ | ழீ | ளீ | றீ | னீ | |||||||||||||
5 | ௸ | ௫ | வீசம் | ு | உ | கு | ஙு | சு | ஞு | டு | ணு | து | நு | பு | மு | யு | ரு | லு | வு | ழு | ளு | று | னு | |||||||||||||
6 | ௹ | ௬ | மூவீசம் | ூ | ஊ | கூ | ஙூ | சூ | ஞூ | டூ | ணூ | தூ | நூ | பூ | மூ | யூ | ரூ | லூ | வூ | ழூ | ளூ | றூ | னூ | |||||||||||||
7 | ௺ | ௭ | அரைமா | ெ | எ | கெ | ஙெ | செ | ஞெ | டெ | ணெ | தெ | நெ | பெ | மெ | யெ | ரெ | லெ | வெ | ழெ | ளெ | றெ | னெ | |||||||||||||
8 | பௌர்ணமி | ௮ | ஒருமா | ே | ஏ | கே | ஙே | சே | ஞே | டே | ணே | தே | நே | பே | மே | யே | ரே | லே | வே | ழே | ளே | றே | னே | |||||||||||||
9 | அமாவாசை | ௯ | இரண்டுமா | ை | ஐ | கை | ஙை | சை | ஞை | டை | ணை | தை | நை | பை | மை | யை | ரை | லை | வை | ழை | ளை | றை | னை | |||||||||||||
A | கார்த்திகை | ௰ | மும்மா | ொ | ஒ | கொ | ஙொ | சொ | ஞொ | டொ | ணொ | தொ | நொ | பொ | மொ | யொ | ரொ | லொ | வொ | ழொ | ளொ | றொ | னொ | |||||||||||||
B | ராஜ | ௱ | நாலுமா | ோ | ஓ | கோ | ஙோ | சோ | ஞோ | டோ | ணோ | தோ | நோ | போ | மோ | யோ | ரோ | லோ | வோ | ழோ | ளோ | றோ | னோ | |||||||||||||
C | ௐ | ௲ | முந்திரி | ௌ | ஔ | கௌ | ஙௌ | சௌ | ஞௌ | டௌ | ணௌ | தௌ | நௌ | பௌ | மௌ | யௌ | ரௌ | லௌ | வௌ | ழௌ | ளௌ | றௌ | னௌ | |||||||||||||
D | அரைக்காணி | ஃ | ||||||||||||||||||||||||||||||||||
E | காணி | |||||||||||||||||||||||||||||||||||
F | முக்காணி |
Note: | |
---|---|
Newly added. Not present in Unicode 6.3. | |
Allocated for research (NLP) | |
For future use |
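The regular layout of the table suggests a simple conversion from TACE16 to standard Unicode Tamil. The following is only a sketch under stated assumptions: it assumes the vowels sit in column E20 and the 18 consonants occupy consecutive columns E21 to E32 in the order shown above, and it ignores symbols, fractions, digits, and any other columns. The function name `tace16_to_unicode` is hypothetical.

```python
# Unicode code points of the 18 consonants, in the table's column order
# (ASSUMPTION: these are columns E21 to E32).
CONSONANTS = [0x0B95, 0x0B99, 0x0B9A, 0x0B9E, 0x0B9F, 0x0BA3,
              0x0BA4, 0x0BA8, 0x0BAA, 0x0BAE, 0x0BAF, 0x0BB0,
              0x0BB2, 0x0BB5, 0x0BB4, 0x0BB3, 0x0BB1, 0x0BA9]
# Independent vowels for rows 1 to C of column E20.
VOWELS = [0x0B85, 0x0B86, 0x0B87, 0x0B88, 0x0B89, 0x0B8A,
          0x0B8E, 0x0B8F, 0x0B90, 0x0B92, 0x0B93, 0x0B94]
# Dependent vowel signs for rows 2 to C (row 1, the inherent "a", needs none).
SIGNS = [0x0BBE, 0x0BBF, 0x0BC0, 0x0BC1, 0x0BC2, 0x0BC6,
         0x0BC7, 0x0BC8, 0x0BCA, 0x0BCB, 0x0BCC]
PULLI = 0x0BCD  # virama, which marks a pure consonant in Unicode

def tace16_to_unicode(code):
    """Map one TACE16 vowel, consonant, or uyirmey to a Unicode string."""
    column, row = code >> 4, code & 0xF
    if column == 0xE20 and 1 <= row <= 0xC:
        return chr(VOWELS[row - 1])
    if 0xE21 <= column <= 0xE32 and row <= 0xC:
        base = chr(CONSONANTS[column - 0xE21])
        if row == 0:
            return base + chr(PULLI)       # pure consonant
        if row == 1:
            return base                    # consonant with inherent "a"
        return base + chr(SIGNS[row - 2])  # consonant plus vowel sign
    raise ValueError(f"code {code:04X} is outside the range handled by this sketch")

print(tace16_to_unicode(0xE213) == "\u0b95\u0bbf")  # True: E213 maps to கி
```

Note that one TACE16 code can expand to two Unicode code points, which is the difference the comparison in this article turns on.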
Comparison
Claimed shortcomings of the present Unicode standard for the Tamil language, which TACE16 is meant to address:[1]
- Unicode Tamil has code positions for only 31 of the 247 Tamil characters. These 31 characters comprise 12 vowels, 18 agara-uyirmey, and one aytham; five Grantha agara-uyirmey are also given code space in Unicode Tamil but are not counted here. The uyirmeys left out of the present Unicode Tamil are those beyond the agara series, such as kA, ki, and kI.
- It uses multiple code points to render a single character.
- It requires ZWJ or ZWNJ type hidden characters.
- A sequence of characters may correspond to a single glyph, that is, ச + ெ◌ + ◌ா = ெசா. According to Unicode, ெசா is a grapheme, which is false.
- The Unicode Tamil standard includes the vowel signs as combining characters. These signs would be displayed as is by engines that detect a blank space between them and a base character. Unicode introduces the dotted circle as a Tamil character.
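The points about multiple code points can be observed directly with Python's standard library. The sketch below only inspects how standard Unicode stores two uyirmey characters; it takes no position on whether that design is a defect.

```python
import unicodedata

ki = "\u0b95\u0bbf"  # கி: consonant KA followed by vowel sign I
so = "\u0b9a\u0bca"  # சொ: consonant CA followed by vowel sign O

# One visible character, two code points.
print(len(ki))  # 2

# Under canonical decomposition (NFD), vowel sign O splits into signs E and AA,
# so the same visible character becomes three code points.
decomposed = unicodedata.normalize("NFD", so)
print([f"U+{ord(ch):04X}" for ch in decomposed])  # ['U+0B9A', 'U+0BC6', 'U+0BBE']
```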
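There was a proposal to re-encode Tamil.[5] Unicode rejected it, saying that the re-encoding would be "damaging."
The TACE16 code positions follow the Tamil grammar rule that consonant + vowel = vowel-consonant (UyirMei), which allows operations such as the following.
- To combine a consonant and a vowel into a vowel-consonant (UyirMei).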
Method 1 (by simple arithmetic operations): க் + இ = கி
E210 (க்) + E203 (இ) - E200 (Constant) = E213 (கி)
Method 2: க் (E210) + இ (E203) = கி (E213)
E210 (க்) | (E203 (இ) & 000F (Constant)) = E213 (கி)
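The two methods above can be sketched in Python as follows. The function and constant names are invented for this illustration; only the code values E200, E203, E210, E213 and the mask 000F come from the text.

```python
VOWEL_COLUMN_BASE = 0xE200  # the constant subtracted in Method 1

def compose_arithmetic(consonant, vowel):
    """Method 1: consonant + vowel - E200."""
    return consonant + vowel - VOWEL_COLUMN_BASE

def compose_bitwise(consonant, vowel):
    """Method 2: keep the vowel's low nibble and OR it into the consonant."""
    return consonant | (vowel & 0x000F)

k, i = 0xE210, 0xE203  # க் and இ
print(hex(compose_arithmetic(k, i)))  # 0xe213
print(hex(compose_bitwise(k, i)))     # 0xe213
```

Both methods agree because a pure consonant always has a zero low nibble, so adding the vowel's row number and OR-ing it in give the same result.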
- To divide a vowel-consonant (UyirMei) character into its corresponding vowel and consonant.
/* To get the vowel */
E213 (கி) & 'F20F (Constant)' = E203 (இ)
/* To get the consonant */
E213 (கி) & 'FFF0 (Constant)' = E210 (க்)
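A minimal Python sketch of the same split, using the masks F20F and FFF0 given above. The helper names `vowel_of` and `consonant_of` are hypothetical.

```python
VOWEL_MASK = 0xF20F      # clears the consonant column, leaving E20x
CONSONANT_MASK = 0xFFF0  # clears the vowel row, leaving the pure consonant

def vowel_of(uyirmey):
    """Return the TACE16 code of the vowel inside a vowel-consonant."""
    return uyirmey & VOWEL_MASK

def consonant_of(uyirmey):
    """Return the TACE16 code of the pure consonant inside a vowel-consonant."""
    return uyirmey & CONSONANT_MASK

ki = 0xE213  # கி
print(hex(vowel_of(ki)))      # 0xe203
print(hex(consonant_of(ki)))  # 0xe210
```

The vowel mask is F20F rather than FF0F so that it also clears bit 8, which is set for consonant columns in the E30 to E38 range.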
- To find whether a character is a vowel, a consonant, a vowel-consonant (UyirMei), or a number.
/* Operators:
 * |  - Bitwise OR
 * &  - Bitwise AND
 * !  - Logical NOT
 * ^  - Bitwise XOR
 * || - Conditional OR
 * && - Conditional AND
 */
c = the TACE16 encoding for a Tamil character
/* To check whether a character is a vowel */
/* Method 1 */
((c >= E201) && (c <= E20C)) == true // => Vowel
/* Method 2 - if code positions E200, E20E, E20F are not used for any other purpose */
(((c & 'E20F (Constant)') == c) && (c != E20D)) == true // => Vowel
((!((c & 'E20F (Constant)') ^ c)) && (c != E20D)) == true // => Vowel
/* To check whether a character is a consonant or a vowel-consonant (UyirMei) */
x = (c & '000F (Constant)') // if c is a vowel or vowel-consonant, x is the unique number of its vowel, starting from 1
(((c >= E210) && (c <= E38C)) && (x == 0)) == true // => Consonant
(((c >= E210) && (c <= E38C)) && ((x >= 1) && (x <= 12))) == true // => Vowel-consonant (UyirMei)
/* To check whether a character is a Tamil number */
/* Method 1 */
((c >= E180) && (c <= E18C)) == true // => Tamil number
/* Method 2 - if code positions E18D-E18F are not used for any other purpose */
/* Note: the mask test accepts every code whose set bits all lie within E18F, including E100-E10F, so it assumes c does not come from that column */
(c & 'E18F (Constant)') == c // => Tamil number
(!((c & 'E18F (Constant)') ^ c)) == true // => Tamil number
/* If code positions E18D-E18F are used for any other purpose, use either Method 1 or the test below */
((!((c & 'E18F (Constant)') ^ c)) && ((c & '000F (Constant)') <= 12)) == true // => Tamil number
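The range tests (Method 1 in each case) can be combined into one small classifier. This is a sketch, not a reference implementation: the function name `classify` and the returned labels are invented here, and only the ranges quoted above are used.

```python
def classify(c):
    """Classify a TACE16 code using the range tests quoted above."""
    x = c & 0x000F  # row number: 0 for a pure consonant, 1 to 12 for a vowel
    if 0xE201 <= c <= 0xE20C:
        return "vowel"
    if 0xE210 <= c <= 0xE38C:
        if x == 0:
            return "consonant"
        if 1 <= x <= 12:
            return "vowel-consonant"
    if 0xE180 <= c <= 0xE18C:
        return "number"
    return "other"

for code in (0xE203, 0xE210, 0xE213, 0xE185, 0xE100):
    print(f"{code:04X}", classify(code))
```

Range tests are used instead of the mask tests because a mask test such as `(c & 0xE18F) == c` is also true for codes in E100 to E10F.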
- To convert numbers to Tamil numbers and vice versa.
/* Converting between a number and the new format of Tamil number only needs a direct digit-to-digit conversion. */
/* To convert a number to the new format of Tamil number */
n = single-digit number (0-9)
/* Method 1 */
(n + 'E180 (Constant)') // => Tamil number
/* Method 2 */
(n | 'E180 (Constant)') // => Tamil number
/* To convert the new format of Tamil number to a number */
c = single-digit Tamil number character (௦-௯)
(c & '000F (Constant)') // => Number
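The digit-to-digit conversion can be sketched in Python as below, using the OR with E180 and the AND with 000F from the text. The function names are hypothetical, and the sketch assumes non-negative integers written with the positional digits E180 to E189 only.

```python
TAMIL_DIGIT_BASE = 0xE180  # TACE16 code of the Tamil digit zero

def to_tace16_digits(number):
    """Convert a non-negative integer to a list of TACE16 digit codes."""
    return [TAMIL_DIGIT_BASE | int(digit) for digit in str(number)]

def from_tace16_digits(codes):
    """Convert a list of TACE16 digit codes back to an integer."""
    return int("".join(str(code & 0x000F) for code in codes))

codes = to_tace16_digits(2016)
print([f"{code:04X}" for code in codes])  # ['E182', 'E180', 'E181', 'E186']
print(from_tace16_digits(codes))          # 2016
```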
Alternative claims
Open-Tamil
The Open-Tamil project[6] provides many common operations on UTF-8 encoded Unicode strings, such as extracting letters, sorting, and searching. Although the project claims Level-1 compliance for Tamil text processing without using TACE16, it relies on the extra programming logic that the present Unicode standard for Tamil requires.
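#!/usr/bin/env python
import codecs
import tamil.utf8 as utf8

# Split the text into Tamil letters and write them, space-separated, to a file.
with codecs.open('singl', 'w', encoding='utf-8') as ff:
    letters = utf8.get_letters(u"கூவிளம் என்பது என்ன சீர்")
    for letter in letters:
        ff.write(letter)
        print(letter)
        ff.write(' ')
# No explicit ff.close() is needed: the with block closes the file.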
This generates the output "கூ வி ள ம் எ ன் ப து எ ன் ன சீ ர்".
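To give a rough idea of the kind of extra logic involved, the sketch below groups each base character with the combining marks that follow it, using only Python's standard `unicodedata` module. It is not how Open-Tamil is implemented, the name `split_letters` is made up for this example, and it does not handle conjuncts such as க்ஷ.

```python
import unicodedata

def split_letters(text):
    """Group each base character with the combining marks that follow it."""
    letters = []
    for ch in text:
        # Tamil vowel signs and the pulli have general category Mc or Mn.
        if letters and unicodedata.category(ch).startswith("M"):
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters

word = "\u0b95\u0bc2\u0bb5\u0bbf\u0bb3\u0bae\u0bcd"  # கூவிளம்
print(split_letters(word))  # ['கூ', 'வி', 'ள', 'ம்']
```

With TACE16 no such grouping step would be needed, since every letter is a single code.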
See also
- TSCII (Tamil Script Code for Information Interchange)
- AnyTaFont2UTF8 – An Open source project for all Tamil Encoding/Font Mapping characters.
References
- Report on the final recommendations of the task force on TACE16
- Tamil Nadu Government's Tender Document for development of Tamil fonts and Tamil keyboard driver for 16-bit encodings (Unicode and TACE16)
- Tamil Nadu Government's Order (G.O.), Keyboard Drivers and Fonts
- "தமிழ் எழுத்துருக்கள் | தமிழ் இணையக் கல்விக்கழகம் Tamil Virtual Academy".
- https://www.unicode.org/L2/L2012/12033-tamil-presentation.pdf
- Open-Tamil project. https://pypi.org/project/Open-Tamil/