Bush hid the facts
"Bush hid the facts" (sometimes also "this app can break") is the common name for a bug in the function IsTextUnicode of Windows NT 3.5 and its successors. The bug causes a text file encoded in Windows-1252 or a similar encoding to be interpreted as UTF-16LE by applications that use the function (such as Notepad), resulting in mojibake.
While "Bush hid the facts" is the sentence that is most commonly presented on the Internet to induce the error, it does not exclusively occur with that phrase. The bug can be triggered by many sentences with alphabetic characters and spaces in a particular order (4-space-3-space-3-space-5), (4-space-5-space-3-space-5), and (1-space-4-space-3-space-3) as well as other combinations that can be parsed into valid (if nonsensical) Chinese characters in Unicode.
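The idea that such a sentence "can be parsed into valid Chinese characters" can be illustrated with a short Python sketch. It pairs up the Windows-1252 bytes of a phrase in little-endian order and checks whether every resulting 16-bit value falls in the CJK Unified Ideographs block (U+4E00 to U+9FFF). This is only an illustration of the byte arithmetic, not a reimplementation of IsTextUnicode, whose internal heuristics are not described here; the function names are hypothetical.

```python
def utf16le_code_units(text: str) -> list[int]:
    """Pair the text's Windows-1252 bytes into little-endian 16-bit values."""
    raw = text.encode("cp1252")
    if len(raw) % 2 != 0:
        raise ValueError("odd byte count cannot be split into 16-bit units")
    # Low byte comes first in UTF-16LE, so the second byte is the high byte.
    return [raw[i] | (raw[i + 1] << 8) for i in range(0, len(raw), 2)]


def all_cjk_ideographs(text: str) -> bool:
    """True if every byte pair lands in U+4E00..U+9FFF (an illustrative check only)."""
    if len(text.encode("cp1252")) % 2 != 0:
        return False
    return all(0x4E00 <= unit <= 0x9FFF for unit in utf16le_code_units(text))


for phrase in ("Bush hid the facts", "this app can break"):
    units = [f"U+{u:04X}" for u in utf16le_code_units(phrase)]
    print(phrase, units, all_cjk_ideographs(phrase))
```

Both phrases have 18 bytes, so each yields nine 16-bit values. The first two bytes of "Bush hid the facts" (0x42, 0x75) combine to U+7542, for example.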
For example, the 1-4-3-3 phrase "i want you sir" behaves the same way as the other patterns: type it into Notepad (without the quotation marks), save the file, close it, and reopen it.
The bug occurs when the ANSI string is passed to the Win32 charset detection function IsTextUnicode with no other characters. Because of the bug, IsTextUnicode returns TRUE, so applications that use it incorrectly interpret the text as UTF-16LE. For example, if a text file containing only the string is loaded into a text editor that uses IsTextUnicode, the text is displayed as nine {4-3-3-5}, ten {4-5-3-5}, or seven {1-4-3-3} Chinese characters, or as squares if the language pack has not been installed. To retrieve the original text using Notepad, bring up the "Open a file" dialog box, select the file, select ANSI in the "Encoding" list box, and click Open.
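The misreading and its reversal can be reproduced outside Windows with Python's built-in codecs. The sketch below assumes that "ANSI" means Windows-1252 (`cp1252`); it does not call IsTextUnicode and only shows what happens once the wrong encoding has been chosen.

```python
original = "Bush hid the facts"

# Bytes as an "ANSI" save would write them (assuming Windows-1252).
ansi_bytes = original.encode("cp1252")

# What an application sees if it treats those bytes as UTF-16LE.
misread = ansi_bytes.decode("utf-16-le")
print(len(ansi_bytes), len(misread))  # 18 9

# No information is lost: re-encoding as UTF-16LE gives back the same bytes,
# which is why reopening the file as ANSI restores the text.
recovered = misread.encode("utf-16-le").decode("cp1252")
print(recovered == original)  # True
```

Each pair of single-byte characters becomes one 16-bit character, which is why the 18-byte sentence appears as nine characters.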
Discovery
The bug appeared for the first time in Windows NT 3.5 but was not discovered until early 2004[1] and has since risen in popularity on the Internet.[citation needed]
Selecting the text, cutting it, and pasting it back does not prevent the bug from reproducing, as long as the text is restored exactly.
Because of this bug in IsTextUnicode, Notepad misinterprets the encoding of the file when it is reopened. If the file is originally saved as "UTF-8" rather than "ANSI", the text displays correctly, because Notepad prepends the UTF-8 byte order mark, a different byte pattern that does not trigger the bug. The bug is also avoided by saving as "Unicode", which in Notepad actually means little-endian UTF-16.
Older versions of Notepad, such as those that came with Windows 95, 98, ME, and NT 3.1, do not include Unicode support, so the error does not occur.
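A byte order mark makes the encoding explicit, so no guessing is needed. The following Python sketch shows one way a reader could check for a mark before falling back to any heuristic. The `sniff_bom` helper is hypothetical and handles only UTF-8 and UTF-16; that Notepad checks in this way or in this order is an assumption, not something stated above.

```python
import codecs
from typing import Optional


def sniff_bom(data: bytes) -> Optional[str]:
    """Return a codec name if the data starts with a known byte order mark."""
    if data.startswith(codecs.BOM_UTF8):      # EF BB BF
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "utf-16-be"
    return None  # no mark: the reader would have to guess


text = "Bush hid the facts"
as_ansi = text.encode("cp1252")                                 # no mark
as_utf8_bom = text.encode("utf-8-sig")                          # UTF-8 with mark
as_utf16_bom = codecs.BOM_UTF16_LE + text.encode("utf-16-le")   # UTF-16LE with mark

for label, data in (("ANSI", as_ansi), ("UTF-8", as_utf8_bom), ("Unicode", as_utf16_bom)):
    print(label, data[:3].hex(), sniff_bom(data))
```

Only the ANSI bytes carry no mark, which is the one case where a detection heuristic such as IsTextUnicode has to decide.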
Many text editors and tools exhibit this behavior because they use IsTextUnicode as well.
The bug does not occur in Windows Vista or Windows 7, because the detection behavior was fixed in those versions.
References
- Cumps, David (February 27, 2004). "Notepad bug? Encoding issue?". #region .Net Blog. http://weblogs.asp.net/cumpsd/archive/2004/02/27/81098.aspx. Retrieved on February 15, 2009.

