Wordcount

Wordcount Home Page


I found a site on the Internet called “Wordcount.” It describes itself as an interactive presentation of the 86,800 most frequently used English words. It ranks the words by usage: the more often a word is used, the higher its ranking. The usage data come from something called the British National Corpus. The BNC is a 100-million-word cross-section of British English, both written and spoken, from the later part of the twentieth century, drawn from a broad range of sources.

The written part of the BNC makes up 90% of the corpus. Among many other kinds of text, it includes extracts from newspapers, journals and periodicals for all ages, textbooks, fiction, and school and university essays. The spoken 10% comes from radio shows and phone-ins, formal meetings, informal conversations recorded by volunteers, and so on.

Wordcount presently includes the 86,800 words that are used at least twice in the BNC. The site claims that Wordcount will eventually be modified to sample any chosen text or website, and ultimately the whole Internet. I doubt it, because the site seems to be stuck in time.

But let’s try it out anyway. First, let’s check whether Green Comet’s leading man is on the list. He is. “Elgin” comes in at word number 28,411. It is preceded by “lichens” and followed by “joystick.” I don’t think we can read anything into that. How about Elgin’s beloved? “Frances” appears at position 9,860. It is preceded by “excuses” and followed by “dusk.” As for this blog, “green” is number 671 and “comet” follows at a distant 16,896. “Green” is preceded by “planning” and followed by “students,” while “comet” is between “stafford” and “pol.”

The last word in the Wordcount archive, coming in as the 86,800th most used word, is “conquistador.” It’s preceded by “recrossed,” “workless,” “Carniola” and “tangency.” Carniola is a historical region of Slovenia. The most used words are short, so we’ll do ten of them, starting with number ten, “was.” Number nine is “is.” Eight is “it.” Seven, “that.” Six, “in.” Five, “a.” Four, “to.” Three, “and.” Two, “of.” And the number one word in the archive, the most used word in British English, is “the.”

Strangely, the word “wordcount” is not in the Wordcount archive.

rjb

About arjaybe

Jim has fought forest fires and controlled traffic in the air and on the sea. Now he writes stories.

4 Responses to Wordcount

  1. emmylgant says:

    The or duh?
    How about the ranking for like, you know?
    Fun piece to read… Still I wonder about the usefulness of such research. Must be a lack of imagination on my part.

  2. arjaybe says:

    Curiosity, I think. I understand that, being a very curious person myself. And I mean that in every sense of the word. ;-)

    rjb

  3. Laird Smith says:

    This goes to show you that there is no thing that has not been researched, that is, no thing that we are aware of.
