
Bush hid the facts

From Wikipedia, the free encyclopedia
Revision as of 02:30, 2 April 2012

Bush hid the facts is a common name for a bug present in some Microsoft Windows applications that causes a text file encoded in ASCII or one of its supersets (such as a Windows code page) to be interpreted as if it were UTF-16LE, resulting in mojibake. When "Bush hid the facts" (without a newline) is typed into a new Notepad document, which is then saved, closed, and reopened, the nonsensical words "畂桳栠摩琠敨映捡獴" (Liu Benrenmotian Touyingjianmeng) appear instead.
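The misreading is easy to reproduce outside Notepad. Below is a minimal Python sketch using only the standard library: it encodes the sentence as ASCII, as an "ANSI" save does for this text, then decodes the same 18 bytes as UTF-16LE, so every pair of bytes becomes one 16-bit character. The function name `misread_as_utf16le` is illustrative only.

```python
def misread_as_utf16le(text: str) -> str:
    """Encode text as single-byte ASCII, then wrongly decode the bytes as UTF-16LE."""
    raw = text.encode("ascii")      # 18 bytes, one per character
    return raw.decode("utf-16-le")  # byte pairs become 9 code units: (second << 8) | first


garbled = misread_as_utf16le("Bush hid the facts")
print(garbled)                           # 畂桳栠摩琠敨映捡獴
print([hex(ord(ch)) for ch in garbled])  # first unit is 0x7542: 'u' (0x75) high, 'B' (0x42) low
```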

While "Bush hid the facts" is the sentence most commonly presented on the Internet to induce the error, the bug can be triggered by many sentences whose letters and spaces fall in an order such that the bytes match the UTF-16LE encoding of valid (if nonsensical) Chinese characters. Other popular strings are "this app can break", "acre vai pra globo" (Portuguese for "acre goes to Globe"), and "aaaa aaa aaa aaaaa"; like the original, each consists of words of four, three, three, and five letters.
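The shared word pattern is what matters. Each string is 18 bytes long with its spaces at offsets 4, 8, and 12. In UTF-16LE an even-offset byte is the low byte of a code unit, so each space (0x20) pairs with the following lowercase letter (0x61 to 0x7A) to form a code unit between U+6120 and U+7A20, and pairs of lowercase letters land between U+6161 and U+7A7A. All of these fall inside the CJK Unified Ideographs block (U+4E00 to U+9FFF). The sketch below checks this; the helper `code_units` is hypothetical, and the block-range test is a simplification for illustration, not a description of what IsTextUnicode itself checks.

```python
STRINGS = [
    "Bush hid the facts",
    "this app can break",
    "acre vai pra globo",
    "aaaa aaa aaa aaaaa",
]

CJK_FIRST, CJK_LAST = 0x4E00, 0x9FFF  # CJK Unified Ideographs block


def code_units(text: str) -> list[int]:
    """Pair the ASCII bytes of `text` into little-endian 16-bit code units."""
    raw = text.encode("ascii")
    return [raw[i] | (raw[i + 1] << 8) for i in range(0, len(raw) - 1, 2)]


for s in STRINGS:
    spaces = [i for i, ch in enumerate(s) if ch == " "]
    in_block = all(CJK_FIRST <= u <= CJK_LAST for u in code_units(s))
    print(f"{s!r}: {len(s)} bytes, spaces at {spaces}, all ideographs: {in_block}")

# Moving one space to an odd offset makes it a high byte: "i " becomes U+2069.
print(hex(code_units("Bush hi dthe facts")[3]))  # 0x2069
```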

The bug occurs when a file consisting of such a string and nothing else is passed to the Win32 charset detection function IsTextUnicode. IsTextUnicode, which guesses the encoding heuristically, sees what it takes to be valid UTF-16LE Chinese text and returns true, and the application then incorrectly interprets the file as UTF-16LE.
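The decision described above can be sketched as a small loader that trusts a detector before falling back to the "ANSI" code page. Everything here is hypothetical: `toy_is_text_unicode` is a deliberately crude stand-in (even length, every code unit a CJK ideograph) and not Microsoft's actual IsTextUnicode logic, and cp1252 is assumed as the "ANSI" fallback.

```python
def toy_is_text_unicode(raw: bytes) -> bool:
    """Hypothetical stand-in for IsTextUnicode; NOT the real Windows algorithm.

    Guesses "UTF-16LE" when the byte count is even and every 16-bit
    little-endian code unit decodes to a CJK Unified Ideograph.
    """
    if not raw or len(raw) % 2:
        return False
    units = raw.decode("utf-16-le", errors="replace")  # lone surrogates become U+FFFD
    return all(0x4E00 <= ord(u) <= 0x9FFF for u in units)


def load_text(raw: bytes) -> str:
    """Decode a file without a byte order mark, trusting the detector first."""
    if toy_is_text_unicode(raw):
        return raw.decode("utf-16-le")  # wrong for ASCII text that happens to match
    return raw.decode("cp1252")         # assumed "ANSI" code page


print(load_text(b"Bush hid the facts"))   # mojibake: 畂桳栠摩琠敨映捡獴
print(load_text(b"Bush hid the facts."))  # odd length, so read as ANSI
```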

Other text editors and tools that rely on IsTextUnicode exhibit the same behavior.

Discovery

The bug first appeared in Windows NT 3.5 but was not discovered until early 2004.[1] Older versions of Notepad, such as those that came with Windows 95, 98, ME, and NT 3.1, do not include Unicode support, so the bug does not occur in them.

The bug persists in later versions of Windows, including Windows XP and Windows Vista, but not in Windows 7.

Workarounds

Editing the text so that it no longer matches a triggering pattern avoids the bug; for instance, adding a line break within the first 20 characters works.
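A short illustration of why a line break helps, using the byte-pairing view and an illustrative all-ideographs test (a simplification, not the real IsTextUnicode check; the helpers `utf16le_units` and `all_cjk` are hypothetical). Notepad writes a line break as CR LF (0x0D 0x0A); appended to the sentence, those two bytes form the code unit U+0A0D, which is not a CJK ideograph, so the pattern is broken.

```python
def utf16le_units(raw: bytes) -> list[int]:
    """Split bytes into little-endian 16-bit code units (drops a trailing odd byte)."""
    return [raw[i] | (raw[i + 1] << 8) for i in range(0, len(raw) - 1, 2)]


def all_cjk(raw: bytes) -> bool:
    """Illustrative check only: even length and every code unit a CJK Unified Ideograph."""
    return len(raw) % 2 == 0 and all(0x4E00 <= u <= 0x9FFF for u in utf16le_units(raw))


original = b"Bush hid the facts"
edited = b"Bush hid the facts\r\n"  # Windows line break appended

print(all_cjk(original))               # True
print(all_cjk(edited))                 # False
print(hex(utf16le_units(edited)[-1]))  # 0xa0d: CR and LF packed into one unit
```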

If the file is saved as "UTF-8" rather than "ANSI" (which in practice means Windows-1252 on systems configured for Western European languages), the text displays correctly, because Notepad prepends the UTF-8 byte order mark, which forms a different pattern that does not trigger this bug. UTF-8 without the byte order mark would still trigger the bug, because for text containing only ASCII characters it is byte-for-byte identical to ASCII.
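The byte-level difference can be checked directly. In this sketch Python's `utf-8-sig` codec models Notepad's "UTF-8" save (an assumption about the correspondence) and cp1252 models "ANSI"; the variable names are illustrative.

```python
import codecs

text = "Bush hid the facts"

ansi = text.encode("cp1252")         # "ANSI" on a Western European system
utf8_plain = text.encode("utf-8")    # UTF-8 without a byte order mark
utf8_bom = text.encode("utf-8-sig")  # UTF-8 with the EF BB BF signature prepended

print(utf8_plain == ansi)                    # True: ASCII-only text is identical in both
print(utf8_bom[:3] == codecs.BOM_UTF8)       # True
print(utf8_bom[:3].hex(" "))                 # ef bb bf
print(len(utf8_bom))                         # 21: 3 signature bytes + 18 text bytes
print(utf8_bom.decode("utf-8-sig") == text)  # True: a BOM-aware reader strips the signature
```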

The bug is also avoided by saving as "Unicode", which in Microsoft Windows usually means UTF-16LE.
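A sketch of what such a file contains, assuming it is written as UTF-16LE preceded by the FF FE byte order mark (the usual form of a "Unicode" save; treat the exact layout as an assumption here). Every ASCII character now occupies two bytes with a zero high byte, so there is nothing to misread.

```python
import codecs

text = "Bush hid the facts"

# Model of a "Unicode" save: FF FE byte order mark, then UTF-16LE (assumed layout).
data = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

print(data[:2].hex(" "))              # ff fe
print(len(data))                      # 38: 2 BOM bytes + 2 bytes per character
print(data[2:6])                      # b'B\x00u\x00': each ASCII character gets a zero high byte
print(data.decode("utf-16") == text)  # True: the BOM tells the decoder the byte order
```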

To retrieve the original text using Notepad, bring up the file-open dialog box, select the file, choose "ANSI" or "UTF-8" in the "Encoding" list box, and click Open. Under Windows 2000, however, Notepad lacks the "Encoding" list box. Notepad2 makes the same error, because it also trusts IsTextUnicode, and it likewise lacks an option to override the encoding when opening a file. WordPad, by contrast, opens the text file correctly by default.
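Choosing another encoding works because the file itself is unchanged: only its interpretation was wrong, and decoding 18 ASCII bytes as UTF-16LE loses nothing. A minimal Python sketch of the reversal, assuming cp1252 as the "ANSI" code page:

```python
garbled = "畂桳栠摩琠敨映捡獴"  # what Notepad displays

# Undo the wrong decode to get back the bytes that are actually on disk...
raw = garbled.encode("utf-16-le")
# ...then read them with the encoding the text was really saved in.
recovered = raw.decode("cp1252")

print(raw)        # b'Bush hid the facts'
print(recovered)  # Bush hid the facts
```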


References

  1. Cumps, David (February 27, 2004). "Notepad bug? Encoding issue?". #region .Net Blog. Retrieved February 15, 2009.