Talk:Bi-directional text

This article is of interest to the following WikiProjects:
WikiProject Writing systems (Rated Start-class, High-importance)
WikiProject Typography (Rated Start-class, Mid-importance)
WikiProject Computing / Software (Rated Start-class, Mid-importance); supported by WikiProject Software (marked as Mid-importance)

vertical

What about mixing top/bottom and l/r or r/l on the same page? Somewhere, I think I remember reading something about this... does anyone know the details?

CJK are the only languages I know of that can be written vertically. CJK are traditionally written in columns from right to left. This vertical writing is not a requirement, so they are often displayed horizontally from left to right.
When you purchase CJK fonts, most come with a vertical variation (the name of the vertical variation starts with the @ symbol). They are the same glyphs but rotated 90 degrees counterclockwise. They are not meant to be read on screen, because you would have to tilt your head 90 degrees to read them. These rotated fonts are for printing. When you print, the text will come out vertically.
I know there are better implementations of this in CSS for screen display, but I haven't seen any site using it except for demonstration. --Voidvector 08:40, Nov 23, 2004 (UTC)
Yokogaki and tategaki touches on the subject of vertical and horizontal writing in CJK (but mostly J). —Tokek 06:37, 25 September 2005 (UTC)

What tools does the wiki provide for bi-directional text?

I am very interested to know what tools there are for making the edit pages bi-directional, that is, a facility in the form of a button so the user can choose which direction he/she is going to write in. Does anyone know where or how this can be done? Mehrdad 03:29, 5 May 2006 (UTC)

As far as I know, bi-directional support is handled by the browser (client-side), not the wiki server (server-side). If your browser supports bi-directional text, the text will be displayed correctly; otherwise, no luck.--Voidvector 19:29, 6 July 2007 (UTC)
The statement about the Hebrew text rendering wrong in some browsers seems like incorrect "Original Research" to me. The ordering issue isn't about characters, but text orientation and text within controls. ABC encapsulated in P tags isn't going to render CBA, but is it [ABC ] or [ ABC]. Whether it is CBA is about whether or not you typed CBA or ABC. Secondly within input boxes and others it is about whether when I hit A it goes: [A ] or [ A] and when I hit B does it go [AB ] or [ BA]. This: http://www.i18nguy.com/markup/right-to-left.html is a good explanation completed by this: http://www.i18nguy.com/MiddleEastUI.html. Reboot (talk) 19:09, 23 September 2008 (UTC)

"Bidi" as printing term

This article mentions the term 'bidi' for bidirectional (of texts that include multiple scripts written in more than one direction). It is also (perhaps only historically now) a term for e.g. dot-matrix printers that can print more quickly by placing ink onto the page whichever way the print head is moving (i.e. not just printing left to right and then "inklessly" moving back to the left-hand side). Since "bidi" redirects here, should there be some sort of disambiguation page dealing with this sense? —Preceding unsigned comment added by 86.131.102.65 (talk) 21:38, 13 November 2007 (UTC)

Unicode logical store vs presentational rendering

The article says, "In Unicode encoding, all non-punctuation characters are stored in writing order." This is not true: text in the Indic ranges is stored in pronunciation order, with the rendering system responsible for rearranging glyphs as the orthography demands.

Section 9.1 of the Unicode Standard 4.0 says:

The orthographic syllable is built up of alphabetic pieces, the actual letters of the Devanagari script. These pieces consist of three distinct character types: consonant letters, independent vowels, and dependent vowel signs. In a text sequence, these characters are stored in logical (phonetic) order.
(page 220)

Additionally, a few Devanagari characters cause a change in the order of the displayed characters. This reordering is not commonly seen in non-Indic scripts and occurs independently of any bidirectional character reordering that might be required.
(page 220)

The greatest variation among different Indic scripts is found in the way that the dependent vowels are applied to base letterforms. Devanagari has a collection of nonspacing dependent vowel signs that may appear above or below a consonant letter, as well as spacing dependent vowel signs that may occur to the right or to the left of a consonant letter or consonant cluster. Other Indic scripts generally have one or more of these forms, but what is a nonspacing mark in one script may be a spacing mark in another. Also, some of the Indic scripts have single dependent vowels that are indicated by two or more glyph components—and those glyph components may surround a consonant letter both to the left and right or may occur both above and below it.
The Devanagari script has only one character denoting a left-side dependent vowel sign: U+093F DEVANAGARI VOWEL SIGN I. Other Indic scripts either have no such vowel signs (Telugu and Kannada) or include as many as three of these signs (Bengali, Tamil, and Malayalam).
(page 220)

Because Devanagari and other Indic scripts have some dependent vowels that must be depicted to the left side of their consonant letter, the software that renders the Indic scripts must be able to reorder elements in mapping from the logical (character) store to the presentational (glyph) rendering. For example, if Cn denotes the nominal form of consonant C, and Vvs denotes a left-side dependent vowel sign form of vowel V, then a reordering of glyphs with respect to encoded characters occurs as just shown.
(page 228)

Anyone want to edit the article to allow for the above without straying from the scope of the article? Christian Campbell (talk) 01:57, 30 November 2007 (UTC)
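The storage order described in the quoted passages can be checked with a small sketch. It uses only Python's standard `unicodedata` module; the helper name `describe` and the choice of the syllable "ki" are illustrative assumptions, not something taken from the Standard.

```python
import unicodedata

# "ki" in Devanagari: the vowel sign I is drawn to the LEFT of the consonant,
# but it is stored AFTER the consonant (logical, phonetic order).
syllable = "\u0915\u093f"  # KA followed by VOWEL SIGN I


def describe(text):
    """Return (code point, character name) pairs in storage order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


stored = describe(syllable)
for code_point, name in stored:
    print(code_point, name)
```

The consonant comes first in memory even though a renderer places the vowel sign glyph before it, which is the reordering the Standard assigns to the rendering software.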

҉

The section “҉” is not about that Unicode code point. It's about a sort of parlour trick where invisible Unicode characters make text display backwards. It deserves perhaps one sentence in this article, if an actual reference can be found. Michael Z. 2008-05-29 00:54 z

And should we delete the entire article about bi-directional text because no suitable references have yet been found? Your indiscriminate attitude towards this article (and the previous article ҉ ) is detrimental to Wikipedia as a whole. We should encourage the citation of proper references, not the deletion of anything and everything that dare encroach on your "territory" that you deem unimportant. StarburstCreator (talk) 11:45, 29 May 2008 (UTC)StarburstCreator
Would you please explain why this is important? That "it slipped out into the World of Warcraft messageboards, after which it exploded, so to speak" doesn't make it notable. I have never heard of this, apart from your addition to Wikipedia. Since you are taking credit for popularizing this "meme",[1] you appear to be a one-issue editor tooting your own horn (see WP:SOAP).
We should delete trivia with no notability and unsupported by verifiable references in reliable sources. Michael Z. 2008-05-29 15:15 z
You have never heard of this? Is that now the sole criterion for what does and does not belong in Wikipedia? The majority of articles you have contributed to or created are on subjects that I have never heard of, and after a quick scan, a large number of them do not cite their sources either. And yet, I am not arrogant enough to assume that because I have no interest in the subject, it is not notable enough for Wikipedia.
And as for being a "one-issue editor", every editor starts out as a one-issue editor. Your clique-ish, discriminatory view towards editors who do not contribute the same vast volumes that you do is detrimental to Wikipedia. As I have stated multiple times, the correct attitude is to improve on articles that need citation, not delete every article that you don't find engrossing. Otherwise, Wikipedia would consist solely of Cyrillic miscellanea. StarburstCreator (talk) 17:25, 29 May 2008 (UTC)StarburstCreator
The Internet meme thing is possibly of relevance to articles on internet memes or possibly some modernization of ASCII_Art#Unicode. It just isn't of great relevance to an article about bidirectional text. It depends on how widespread it is (which I cannot say). I can say with great authority that it is not in at least 99.99999% of all bidirectional text, printed or electronic. Therefore it is irrelevant to this article on both policy and logical grounds. There really should be a master article on electronic text art, with references to Unicode art and ASCII art and EBCDIC art or whatever, and links down from there. Perhaps you can contribute it in those places? Reboot (talk) 19:57, 25 December 2008 (UTC)
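For readers unfamiliar with the trick under discussion, here is a minimal sketch. It assumes the invisible characters involved are the directional override U+202E and its terminator U+202C; the thread does not name them, and the helper `force_rtl` is hypothetical.

```python
import unicodedata

RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE (assumed to be the character used)
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING, which ends the override


def force_rtl(text):
    """Wrap text so that a bidi-aware renderer draws it right to left."""
    return f"{RLO}{text}{PDF}"


wrapped = force_rtl("hello")

# The stored characters are unchanged; only two invisible controls are added.
print([unicodedata.name(ch) for ch in (wrapped[0], wrapped[-1])])
print(wrapped[1:-1])
```

The string in memory still reads "hello"; any backwards appearance comes entirely from the renderer honouring the override.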

Strong vs Weak characters

The article confuses 'weak' (numbers, mainly) with 'neutral' (punctuation and symbols) characters. The specification is here: http://unicode.org/reports/tr9/

Weak characters are sequenced together, but don't affect neutrals and can be embedded in runs of strong characters without breaking those runs. Neutral characters are sequenced within runs according to the direction of that run, or by the overall document order when they sit on the edge of a run. Iamcal (talk) 22:36, 3 March 2009 (UTC)
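The strong/weak/neutral split can be illustrated with a rough sketch built on Python's `unicodedata.bidirectional`. The grouping of class abbreviations below follows my reading of the bidirectional character types table in the specification linked above; the function name `bidi_strength` is an assumption made for this example.

```python
import unicodedata

# Bidi class abbreviations grouped as in UAX #9 (as I read its table).
STRONG = {"L", "R", "AL"}
WEAK = {"EN", "ES", "ET", "AN", "CS", "NSM", "BN"}
NEUTRAL = {"B", "S", "WS", "ON"}


def bidi_strength(ch):
    """Classify one character as strong, weak, neutral, or explicit/other."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class in STRONG:
        return "strong"
    if bidi_class in WEAK:
        return "weak"
    if bidi_class in NEUTRAL:
        return "neutral"
    return "explicit/other"


for sample in ["a", "\u05d0", "5", "\u0662", "!", " "]:
    print(repr(sample), unicodedata.bidirectional(sample), bidi_strength(sample))
```

Digits come out as weak (EN or AN), while the exclamation mark and the space come out as neutral, which is the distinction the comment above draws.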

Unicode for "TM" seems wrong

The Unicode U+8482 for symbol TM as mentioned seems wrong.

I converted it to decimal (3392) and used "&#3392 ;" in an HTML file; it then displays an unknown symbol (ീ) in my browser.

AFAIK, both U+E150 and U+E143 are for TM symbol.

For example, H E150 == D 57680, and "&#57680 ;" display as: .--Lovelywcm (talk) 00:09, 3 June 2008 (UTC)

8482 is the correct decimal code for TM, the hexadecimal being 2122. E143 and E150 are in a 'private use' section, meaning that there is no guarantee that these would display this sign in all situations. Clpda (talk) 20:02, 12 September 2008 (UTC)
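The decimal/hexadecimal mix-up can be checked directly. This is a small sketch using Python's standard library; the variable names are illustrative.

```python
import unicodedata

TM_DECIMAL = 8482  # the number quoted in the article is decimal, not hex

tm = chr(TM_DECIMAL)
tm_hex = f"U+{TM_DECIMAL:04X}"      # the same code point in U+ notation
tm_name = unicodedata.name(tm)
html_reference = f"&#{TM_DECIMAL};"  # decimal numeric character reference

# Reading "8482" as hexadecimal lands on a different character entirely.
misread = int("8482", 16)
misread_name = unicodedata.name(chr(misread))

# U+E150 lies in the Private Use Area: general category "Co", no fixed meaning.
pua_category = unicodedata.category(chr(0xE150))

print(tm_hex, tm_name, html_reference)
print(misread, misread_name)
print(pua_category)
```

Private-use code points display whatever the installed font assigns to them, which is why they cannot be relied on for the trade mark sign.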

Top-Heavy article

This article is quite good in terms of content. Thank you to all those involved in writing it. But it is a bit top-heavy. Most of the content is in the introductory paragraph, before the table of contents. The opposite should be true. Introductory sections are important, for example for mobile platforms such as cell phones and PDAs where the rest of the article might not be loaded until explicitly requested. But they should be a few paragraphs long rather than an entire screen. May I recommend summarizing the current content into a couple of paragraphs and moving the rest below the table of contents, possibly breaking it down into a couple of subsections? -- manu3d (talk) 12:08, 17 September 2008 (UTC)

category for languages written in right-to-left scripts

see bugzilla:000745#c4. Best regards ‫·‏לערי ריינהארט‏·‏T‏·‏m‏:‏Th‏·‏T‏·‏email me‏·‏‬ 22:31, 25 October 2009 (UTC)

TM-example not convincing

The TM-example (trade mark sign) is not convincing, since the single character will not change an 'order' with vs. without the formatting character. -DePiep (talk) 08:53, 14 September 2010 (UTC)

Web standards

In analogy to the Unicode section, there should be a Web standards section. AFAIK there are the CSS unicode-bidi and direction properties, the HTML dir attribute and the HTML bdo and bdi elements.--88.73.48.106 (talk) 12:51, 12 November 2011 (UTC)

Merger proposal: RLM into LRM

Proposed: merge bidi Right-to-left mark (RLM) into LRM article. See: Talk:Left-to-right mark#Merger proposal: RLM into LRM. -DePiep (talk) 21:45, 28 February 2012 (UTC)

Eastern Arabic Numerals

Editor BIL added the line, Eastern Arabic numerals are written left-to-right, and if they are considered part of Arabic script, then Arabic script can written in both directions. The problem with this statement is the hidden Western bias in assuming that numbers written in the decimal system are "properly" read from the element with the highest order of magnitude to the one with the lowest. There's no particular mathematical reason for this convention, and in fact the linked article on Eastern Arabic numerals explicitly states, Numbers are traditionally read with the smallest element first (e.g., "four-and-twenty" instead of "twenty-four"). Written numerals are arranged with their lowest-value digit to the right, with higher value positions added to the left. This is identical to the arrangement used by Western texts using Western Arabic numerals, even though Arabic script is read from right to left. Thus this is not an example of bidirectional text, and I have accordingly removed it from the article. JudahH (talk) 14:18, 14 March 2012 (UTC)

One can read (speak) "four-and-twenty", and still write "24" (in Eastern Arabic: ‭٢٤‬, correct?). In bidi text, the writing is what it is about. So, since the numeric value "24" is written →"24" or →"‭٢٤‬" (presenting the numeric decimal value 24), it is in L-to-R direction. In surrounding regular Arabic text (R-to-L), this is an alteration, and is therefore bi-directional writing. The text could be put back in, then. -DePiep (talk) 23:39, 14 March 2012 (UTC)
What makes you say that the numeric value "24" is written →"24"? Is there something about the number that implies an order? Using "24" to denote "two tens and four ones" (or "four ones and two tens") implies a correlation between "left/right" and "larger order of magnitude/smaller order of magnitude"; neither of these inherently implies anything about "first/second". JudahH (talk) 00:59, 15 March 2012 (UTC)
Since we are talking EA numerals, well that is what the article (and its picture) says: "Written numerals are arranged with their lowest-value digit to the right" (i.e. the four in 24). So, it is the decimal-radix system we know well, only the individual digit faces (glyphs) are changed: 2=٢. From there on: of course within a decimal number the order is important. It says which is the tenners (2) and the one'rs (4). The counted value of "24 sheep" would change if it is reversed. In Unicode this is shown by the character properties "Numeric type" and "numeric value", which is "Decimal" and "2" in both examples. (If it were a fancy numerical system like Roman Numerals, they would not be "Decimal"). So far about the numbering system.
Now there could be an EA number-writing rule that says: "write the sequence R-to-L" to describe our 24 sheep: ‮٢٤‬←. If this rule exists, I would be wrong: because then within Arabic text the number does not change the directionality of the text, and the whole would be single-directional. I have not found this rule, and the article doesn't say otherwise.
Now there is one big confusing aspect that makes bidi writing so difficult to understand: it is not about the "speaking direction". That does not exist: a radio transmitting, say, in Arabic and then in German will produce one single "direction" (time sequence): second 1 to 2 to 10 for each language. Even browser technology uses, confusingly, "Arabic language" to get an R-to-L effect, instead of "Arabic script". To add to this confusion ;-) Your example of spoken language translates nicely into German: "Es gibt vier-und-zwanzig (24) Schafe". So the pronunciation says "four" first, while the actual number is not changed in translation (we are still looking at the same flock of sheep). And the "vier" and "zwanzig" are written in straight L-to-R order (no bidi in German then, all is L-to-R). -DePiep (talk) 10:05, 15 March 2012 (UTC)
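The Unicode character properties mentioned in this comment can be inspected with a short sketch. It relies on Python's `unicodedata` module; the helper `digit_profile` is a name made up for this example.

```python
import unicodedata


def digit_profile(ch):
    """Collect the name, decimal value and bidi class of one digit character."""
    return {
        "name": unicodedata.name(ch),
        "decimal": unicodedata.decimal(ch),
        "bidi_class": unicodedata.bidirectional(ch),
    }


western_two = digit_profile("2")
eastern_two = digit_profile("\u0662")  # Eastern Arabic (Arabic-Indic) two

print(western_two)
print(eastern_two)

# Both digit sets are decimal, with the most significant digit stored first.
print(int("\u0662\u0664"))
```

Both characters carry the decimal value 2; they differ in bidi class (EN for the Western digit, AN for the Eastern Arabic one), and both classes count as weak rather than strong right-to-left.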
[O]f course within a decimal number the order is important. It says which is the tenners (2) and the one'rs (4):
With all due respect, I think you're missing my point, DePiep. Within a decimal number, left-right orientation is important, saying which place is the tens, which is the ones, etc. I'm saying that left-right orientation is not the same thing as "order". In "24", the rightmost digit stands for the ones' place, the next-to-rightmost digit stands for the tens' place. Which of those digits is written first? Well, that depends on whether you're writing from right to left or from left to right, doesn't it?
Here's what I think the key question is: would an Arabic writer setting down the number "24" (or "٢٤‬") naturally switch the direction of his writing to write the number from left to right, setting down the 2 first, or would he continue going from right to left, setting down the 4 first? Not being an Arabic speaker, I don't know the answer to that, but I don't see a reason to assume that he would change directions when he got to numerals.
You and BIL have mentioned text editors and Unicode, which may be a different story. If most (or all?) text editors using Unicode treat (Eastern) Arabic numerals as having a left-to-right orientation, I can see how that might be worth adding to the article, but in that case, I think it should be made clear that we're talking about text that's treated as bidirectional in a standard text editor, as opposed to something that's inherent in Arabic script. JudahH (talk) 16:21, 16 March 2012 (UTC)
OK, so it is about entering sequence (typing sequence). I don't do R-to-L scripts (Hebrew, Arabic), but if I'm correct this is how it works in a modern browser: if one enters Arabic text and then starts typing a (decimal) number, one types →24 and the browser puts the digits in the right order. So in the process, one sees the "4" being put in place between the earlier text and the (already present) "2". Then, on entering a space and more regular Arabic text, this is added to the left of the "24". I do not think that one has to reorder the sequence oneself, nor change any direction. The browser does that for you. The same can be said about entering telephone numbers, right? One spells, reads, phone-typing-enters, and writes telephone numbers always in one same sequence. Even in handwriting. Since they are decimal numbers (could be EA?), the browser knows how to handle that. -DePiep (talk) 17:07, 16 March 2012 (UTC)
Your last point here: I am not that sure about reentering the text in the page (don't know if it is in the right context and the 2nd part is dubious), but I do think it is correct. -DePiep (talk) 17:07, 16 March 2012 (UTC)
This is mainly a technical article, not so much a linguistic one. If the Eastern Arabic numerals are entered through the keyboard and stored left-to-right, then they are considered left-to-right from a technical point of view. In my editor, an Arabic text with an Eastern Arabic number in it is stepped through right-to-left, except that the number is stepped through left-to-right. --BIL (talk) 11:06, 15 March 2012 (UTC)
Agree, bidi text is a technical topic. Although the page title does not say so, bidi writing is mainly a Unicode issue, which browsers use as a standard. Unicode describes & defines it in UAX#9 "Unicode Bidirectional Algorithm". One of the first things to know: the logical sequence of characters, aka memory representation, is always L-to-R and does not change (most likely that is also the sequence of entering characters, and the sequence that is saved in a file). The browser does this: it adds a presentation sequence of characters parallel to that memory sequence. In that sequence, all the bidi-rules are applied (e.g. an Arabic text string is reversed, R-to-L). Then this presentation sequence is fed to the screen for showing. Still, the memory sequence is unchanged.
I thought I was supporting part of your edit (although I cannot understand or support the part that says "Arabic text may be L-to-R too").
If you mean to say "keyboard entering sequence can be L-to-R", this is confusing (and wrong): the entering sequence is a time sequence (and is put directly in the memory sequence I mentioned). Just think of what you would type with your eyes closed, or when spelling it through a telephone: would you have to care about direction? No. As for what you see in your edit screen: that is the browser (or text editor) at work: it has already done its job getting the presentation sequence right, and shows it. -DePiep (talk) 11:30, 15 March 2012 (UTC)
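The split between memory sequence and presentation sequence can be mimicked with a deliberately simplified sketch. This is not the UAX#9 algorithm: it assumes a purely right-to-left line in which only runs of digits keep left-to-right order, and it uses uppercase ASCII letters as stand-ins for right-to-left letters. The function name `visual_order` is hypothetical.

```python
import re


def visual_order(logical):
    """Toy reordering for a right-to-left line (not the real bidi algorithm).

    The whole line is reversed for display, then each digit run is flipped
    back so that numbers still read most-significant digit first.
    """
    reversed_line = logical[::-1]
    return re.sub(r"\d+", lambda match: match.group(0)[::-1], reversed_line)


logical = "ABC 24 DEF"   # typing/storage order; uppercase = pretend RTL letters
visual = visual_order(logical)

print("stored :", logical)
print("display:", visual)
```

The stored string is never modified; the display string is derived from it, and the digits "2" then "4" keep their left-to-right arrangement inside the reversed text.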

Article title

Wikipedia has an article entitled "Bidirectional" with no hyphen, and that spelling seems consistent with predominant usage. Should the title of this article be modified accordingly? Matchups 14:18, 25 August 2014 (UTC)

Agree. Both Unicode [2] and W3C (following Unicode) [3] do not use the hyphen (and no capital either, btw). These two are the major standards for electronic text processing. -DePiep (talk) 15:24, 25 August 2014 (UTC)