|WikiProject Linguistics / Applied Linguistics||(Rated Stub-class)|
- I put it back to a redirect. The other thing appears to be a hoax; Google finds no sources for it other than this article. Friday (talk) 05:22, 12 August 2005 (UTC)
Need good frequency source
Many sources have slightly different lists for the most common trigrams. I looked into the reference used for the table given in the article, and I'm not sure it's very good. The link was broken, so I retrieved it through an internet archive; it says the source of the data is "15000 characters from three documents: The license agreement from Sun for JDK 1.2.1; The teaching philosophy of a computer science professor from a liberal arts college in Minnesota; A letter of recommendation for a national competition for innovative uses of technology in collegiate teaching".
In particular, the license agreement is likely to have many repeated technical words which throw this list off.
My recent edit added frequencies from a site I found. Looking into the source they use, it seems to be a good bit better, but I wonder if there is a relatively standard list of the most common trigrams. — Preceding unsigned comment added by Jbeyerl (talk • contribs) 19:03, 9 June 2017 (UTC)
- First, I added footnotes (refs) to Rank and Frequency to sort out their sources. Really the same source should be used for Rank and Frequency.
- Second, there can be no standard analysis; per the note I prepended to the table, context is important. Analysis of writing samples drawn from different stages in a single author's timeline will vary, too. This is not to be confused with modern writing analysis, wherein an author can be fingerprinted with varying levels of certainty (think chapters and even verses of the Bible, or all of Shakespeare's works, where authorship, or even alterations and amendments, may be in question). WurmWoodeT 19:02, 28 July 2017 (UTC)
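To illustrate the point raised in this thread that trigram rankings depend on the text sampled, here is a minimal Python sketch. The function name `trigram_counts` and both sample strings are made up for illustration; they are not taken from the article's sources, and the counts say nothing about real English frequencies. The sketch also assumes trigrams are counted within words only, which is one of several possible conventions.

```python
from collections import Counter

def trigram_counts(text):
    """Count overlapping letter trigrams inside each word of `text`.

    Assumption: trigrams do not span word boundaries, and
    non-letter characters are dropped before counting.
    """
    counts = Counter()
    for word in text.lower().split():
        letters = "".join(ch for ch in word if ch.isalpha())
        for i in range(len(letters) - 2):
            counts[letters[i:i + 3]] += 1
    return counts

# Two hypothetical samples: one imitating repetitive license wording,
# one imitating ordinary prose. Neither is real corpus data.
license_like = "the license grants the licensee a license to use the licensed software"
prose_like = "the teacher and the students read the other book and then rested"

license_counts = trigram_counts(license_like)
prose_counts = trigram_counts(prose_like)

# Compare how the same trigram fares in each sample.
for name, counts in (("license-like", license_counts), ("prose-like", prose_counts)):
    print(name, counts.most_common(3))
```

In the license-like sample the repeated word "license" pushes "lic" above "the", while in the prose-like sample "the" leads comfortably. A 15000-character sample dominated by a license agreement could be skewed in the same way, which is why a larger and more varied corpus would make a better source for the table.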