Bush hid the facts

From Wikipedia, the free encyclopedia


"Bush hid the facts" (sometimes also "this app can break") is the common name for a bug in the function IsTextUnicode, present in Windows NT 3.5 and its successors, including Windows XP. The bug causes a text file encoded in Windows-1252 or a similar encoding to be interpreted as UTF-16LE by applications that use the function (such as Notepad), resulting in mojibake. When "bush hid the facts" is typed into a new Notepad document, and the document is saved, closed, and reopened, the characters "畢桳栠摩琠敨映捡獴" appear instead.
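The mojibake can be reproduced outside Notepad by reinterpreting the same bytes under the wrong encoding. The following Python sketch is only an illustration: the helper name misread_as_utf16le is hypothetical, and the code imitates the effect of the misdetection, not the detection logic of IsTextUnicode itself.

```python
def misread_as_utf16le(text: str) -> str:
    """Encode text as Windows-1252, then decode the same bytes as UTF-16LE."""
    raw = text.encode("cp1252")
    if len(raw) % 2:
        # UTF-16 code units are two bytes wide, so an odd length cannot be decoded.
        raise ValueError("odd number of bytes cannot be read as UTF-16LE")
    return raw.decode("utf-16-le")


garbled = misread_as_utf16le("bush hid the facts")

# Print code points rather than the characters so the output is console-safe.
print([f"U+{ord(ch):04X}" for ch in garbled])
```

Each pair of adjacent bytes becomes one 16-bit code unit with the second byte as the high byte, so "bu" (0x62, 0x75) becomes U+7562, the first character of the garbled string.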

While "Bush hid the facts" is the sentence most commonly presented on the Internet to induce the error, the bug is not exclusive to that phrase. It can be triggered by many sentences made of alphabetic characters and spaces whose word lengths follow a particular pattern, such as 4-3-3-5, 4-5-3-5, and 1-4-3-3 (with a single space between words), as well as by other combinations whose bytes can be parsed as valid (if nonsensical) Chinese characters in Unicode.

The bug occurs when the ANSI string, with no other characters, is passed to the Win32 charset detection function IsTextUnicode. Because of this bug, IsTextUnicode returns TRUE, so applications that use it incorrectly interpret the string as UTF-16LE. For example, if a text file containing the string is loaded into a text editor that uses IsTextUnicode, the text is displayed as nine (4-3-3-5), ten (4-5-3-5), or seven (1-4-3-3) Chinese characters, or as rectangles if the language pack has not been installed. To retrieve the original text using Notepad, bring up the "Open a file" dialog box, select the file, select ANSI in the "Encoding" list box, and click Open.
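The character counts follow from simple arithmetic: each letter and each space occupies one byte in the ANSI file, and every two bytes are read as one UTF-16LE code unit. The sketch below checks the three patterns and also shows that the misreading is reversible, which is in effect what reopening the file as ANSI does. The function names utf16_char_count and recover are hypothetical, chosen for illustration.

```python
def utf16_char_count(word_lengths):
    """Count the 16-bit units seen when a phrase is misread as UTF-16LE."""
    # One byte per letter plus one byte per single space between words.
    byte_count = sum(word_lengths) + (len(word_lengths) - 1)
    if byte_count % 2:
        raise ValueError("odd byte count cannot be split into 16-bit units")
    return byte_count // 2


patterns = [(4, 3, 3, 5), (4, 5, 3, 5), (1, 4, 3, 3)]
counts = {pattern: utf16_char_count(pattern) for pattern in patterns}
print(counts)


def recover(garbled: str) -> str:
    """Undo the misreading: back to the raw bytes, then decode as Windows-1252."""
    return garbled.encode("utf-16-le").decode("cp1252")


garbled = "this app can break".encode("cp1252").decode("utf-16-le")
print(recover(garbled))
```

No information is lost in the misreading, since the bytes on disk never change; only their interpretation does.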

Discovery

The bug first appeared in Windows NT 3.5 but was not discovered until early 2004,[1] and it has since become widely known on the Internet.[citation needed]

Selecting the text, cutting it, and pasting it back does not prevent the bug from reproducing, provided the pasted text is exactly the same.

Because of this bug in IsTextUnicode, Notepad misinterprets the encoding of the file when it is reopened. If the file is saved as "UTF-8" rather than "ANSI", the text displays correctly, because Notepad prepends the UTF-8 byte order mark, a different byte pattern that does not trigger the bug. The bug is also avoided by saving as "Unicode", which in Notepad means UTF-16.
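The role of the byte order mark can be illustrated with a short Python sketch. It assumes a reader that checks for a byte order mark before falling back to any heuristic; the function encoding_from_bom is hypothetical and is not a description of Notepad's actual code.

```python
import codecs
from typing import Optional

TEXT = "bush hid the facts"

ansi_bytes = TEXT.encode("cp1252")            # no byte order mark
utf8_bom_bytes = TEXT.encode("utf-8-sig")     # prefixed with EF BB BF
utf16_bom_bytes = codecs.BOM_UTF16_LE + TEXT.encode("utf-16-le")  # prefixed with FF FE


def encoding_from_bom(data: bytes) -> Optional[str]:
    """Return a codec name if data starts with a known byte order mark."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16"
    return None  # no mark: the reader has to guess the encoding


for label, data in [("ANSI", ansi_bytes), ("UTF-8", utf8_bom_bytes), ("Unicode", utf16_bom_bytes)]:
    print(label, data[:3].hex(), encoding_from_bom(data))
```

Only the ANSI file carries no mark, so it is the only one of the three whose encoding must be guessed from its content.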

Older versions of Notepad, such as those that came with Windows 95, 98, ME, and NT 3.1, do not include Unicode support, so the bug does not occur in them.

Many text editors and tools exhibit this behavior because they use IsTextUnicode as well.
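A program can query the function directly to see how a given byte string is classified. The following Python sketch calls IsTextUnicode from advapi32.dll through ctypes. It runs the call only on Windows and returns None elsewhere; the wrapper name is_text_unicode is hypothetical, and the result depends on the Windows version, so no particular output is assumed here.

```python
import ctypes
import sys
from typing import Optional


def is_text_unicode(data: bytes) -> Optional[bool]:
    """Ask Win32 IsTextUnicode about data; return None when not on Windows."""
    if sys.platform != "win32":
        return None
    advapi32 = ctypes.WinDLL("advapi32")
    func = advapi32.IsTextUnicode
    # BOOL IsTextUnicode(const VOID *lpv, int iSize, LPINT lpiResult)
    func.argtypes = [ctypes.c_char_p, ctypes.c_int, ctypes.POINTER(ctypes.c_int)]
    func.restype = ctypes.c_int
    # Passing None (NULL) as lpiResult asks the function to apply all of its tests.
    return bool(func(data, len(data), None))


result = is_text_unicode("bush hid the facts".encode("cp1252"))
print(result)
```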

This bug does not occur in Windows Vista and Windows 7, whose version of IsTextUnicode has been fixed.

References

  1. Cumps, David (February 27, 2004). "Notepad bug? Encoding issue?". #region .Net Blog. http://weblogs.asp.net/cumpsd/archive/2004/02/27/81098.aspx. Retrieved February 15, 2009.