Noisy text
Noise in text can be defined as any kind of difference between the surface form of a coded representation of the text and the intended, correct, or original text.
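The definition above does not say how that difference should be measured. One common choice, assumed here for illustration, is character-level Levenshtein (edit) distance: the minimum number of single-character insertions, deletions, and substitutions that turn the noisy surface form into the intended text. The following is a minimal Python sketch. `levenshtein` and `noise_rate` are hypothetical helper names, not part of any standard.

```python
def levenshtein(noisy: str, intended: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn `noisy` into `intended`."""
    # prev[j] holds the distance between the prefix of `noisy` processed so
    # far and the first j characters of `intended`.
    prev = list(range(len(intended) + 1))
    for i, a in enumerate(noisy, start=1):
        curr = [i]  # turning i characters into "" takes i deletions
        for j, b in enumerate(intended, start=1):
            curr.append(min(
                prev[j] + 1,             # delete `a` from the noisy text
                curr[j - 1] + 1,         # insert `b` from the intended text
                prev[j - 1] + (a != b),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def noise_rate(noisy: str, intended: str) -> float:
    """Edit distance normalised by the length of the intended text
    (an illustrative choice; guards against division by zero)."""
    return levenshtein(noisy, intended) / max(len(intended), 1)


print(levenshtein("tmrw", "tomorrow"))  # four letters are missing
print(noise_rate("tmrw", "tomorrow"))
```

Here `levenshtein("tmrw", "tomorrow")` is 4, so `noise_rate` gives 0.5. Normalising by the intended length lets short and long messages be compared. Word-level or phonetic measures may suit some kinds of noise better.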
Language use in computer-mediated discourse, such as chats, emails, and SMS texts, differs significantly from the standard form of the language (for example, "gr8" for "great" or "u" for "you"). The pressure to keep messages short so that they can be typed quickly, together with the need for semantic clarity, shapes the structure of the text used in such discourse.
Gartner estimates that unstructured data constitutes 80% of all enterprise data. A large proportion of this unstructured data consists of chat transcripts, emails, and other informal and semi-formal internal and external communications.
Such text is usually meant for human consumption. However, because large amounts of it now exist, both online and within enterprises, it has become important to be able to mine it automatically with computers.
Techniques for correction
Many spell checkers and grammar checkers are available today, and many word processors, such as MS Word, include them in their editing tools. Online, Google's search interface includes a correction engine that suggests corrections when users make mistakes in their queries.
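As a rough illustration of how a simple corrector for noisy text might work, the Python sketch below combines two ideas. The first is a lookup table for common SMS shorthand. The second is a dictionary-based spell corrector that considers every known word one edit away and picks the most frequent. This is a toy under stated assumptions: `WORD_COUNTS` and `SHORTHAND` are hypothetical, hand-made data, and the code makes no claim to reflect how MS Word or Google implement correction. Those systems also use context and much larger models.

```python
from collections import Counter
import string

# Hypothetical toy word frequencies; a real corrector would count words
# in a large corpus of clean text.
WORD_COUNTS = Counter({
    "the": 500, "see": 120, "you": 300, "tomorrow": 40,
    "great": 60, "text": 80, "noisy": 10, "message": 30,
})

# Hypothetical table of SMS shorthand that is too far (in edit distance)
# from its standard form to be found by the one-edit search below.
SHORTHAND = {"u": "you", "c": "see", "gr8": "great", "tmrw": "tomorrow"}

LETTERS = string.ascii_lowercase


def edits1(word: str) -> set[str]:
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct_word(word: str) -> str:
    """Expand known shorthand, keep known words, otherwise return the most
    frequent known word one edit away; leave the word unchanged if none."""
    w = word.lower()
    if w in SHORTHAND:
        return SHORTHAND[w]
    if w in WORD_COUNTS:
        return w
    candidates = [c for c in edits1(w) if c in WORD_COUNTS]
    if not candidates:
        return word
    return max(candidates, key=WORD_COUNTS.__getitem__)


def correct(text: str) -> str:
    """Correct a whitespace-tokenised message word by word."""
    return " ".join(correct_word(t) for t in text.split())


print(correct("c u tmrw"))
print(correct("teh noisy mesage"))
```

With this toy data, `correct("c u tmrw")` yields "see you tomorrow" via the lookup table. `correct("teh noisy mesage")` yields "the noisy message" via one-edit candidates (a transposition and an insertion). Separating shorthand expansion from spelling correction reflects the fact that deliberate abbreviations and accidental typos tend to differ from the intended words in different ways.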