13 April 2006

Are there really 988,968 words in the English language?

How many words are there in the English language?

No one knows. But that hasn't stopped an operation known as the Global Language Monitor from proclaiming that—as of this writing—there are exactly 988,968 words in English. GLM has done a remarkable job suckering even the respectable press into believing that we're on the verge of adding the millionth word to English—at which point we'll presumably see another flurry of articles about GLM. Even so, its claim is a bogus one.

The problem with trying to number the words in any language is that it's very hard to agree on the basics. For example, what is a word? If run is a verb, is the noun run another word? What about the inflected forms ran, runs, and running? What about words with run as a base, such as runner and runnable and runoff and runway? Are compounds, such as man-bites-dog, man-child, man-eater, manhandle, man-hour, man of God, man's man, and men in black, to be counted once or many times?
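To make the point concrete, here is a minimal sketch of how the tally for the "run" examples above shifts with the counting rule. The four policies and the part-of-speech, headword, and root labels are assumptions made up for illustration; they are not drawn from any dictionary or from GLM's method.

```python
# Hand-labelled forms built on "run", taken from the examples in the text.
# The part-of-speech, headword, and root labels are illustrative assumptions,
# not data from any dictionary.
FORMS = [
    # (spelling, part of speech, headword, root)
    ("run", "verb", "run", "run"),
    ("run", "noun", "run", "run"),
    ("ran", "verb", "run", "run"),
    ("runs", "verb", "run", "run"),
    ("running", "verb", "run", "run"),
    ("runner", "noun", "runner", "run"),
    ("runnable", "adjective", "runnable", "run"),
    ("runoff", "noun", "runoff", "run"),
    ("runway", "noun", "runway", "run"),
]

# Each hypothetical policy maps a form to the key that decides
# whether two forms are "the same word".
POLICIES = {
    "spelling": lambda form: form[0],                  # distinct spellings
    "spelling_and_pos": lambda form: (form[0], form[1]),  # noun "run" != verb "run"
    "headword": lambda form: form[2],                  # inflections fold into the headword
    "root": lambda form: form[3],                      # everything built on "run" is one word
}


def count_words(forms, policy):
    """Count distinct 'words' in forms under the named counting policy."""
    key = POLICIES[policy]
    return len({key(form) for form in forms})


if __name__ == "__main__":
    for policy in POLICIES:
        print(policy, count_words(FORMS, policy))
```

Under these assumed rules, the same nine forms count as anywhere from one word to nine, which is the disagreement in miniature.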

Another question: What is English? The word veal, borrowed from French in the 14th century, seems to be English, as does spaghetti, a 19th-century Italian borrowing. But what about pho, a Vietnamese soup found from the 1930s but only recently common? Or the yet-more-recent banh mi sandwich? What about shurpa, a Bukharian soup, which can apparently be eaten in New York? What about words used by non-native English speakers in Singlish?

Even sticking with something that we can agree is English, what about obsolete words? Variant spellings? Regional dialects? What about words that are widespread, but only in a highly limited subgroup, such as bone, "a pre-1946 Martin guitar made of Brazilian rosewood having herringbone purfling on its top," GAS, "to ardently desire to purchase guitars" (from 'Guitar Acquisition Syndrome'), or hog, "a guitar having a mahogany top, back, and sides," used among collectors of vintage guitars?

What about Frizzie, "student of Ms. Frizzle," or busigator, "the Magic School Bus transformed into an alligator," in the books I'm reading to my daughter? What about Giant, "a player on the N.Y. Giants football team"? The most comprehensive abbreviation dictionaries include about 500,000 entries, most of which wouldn't be found in standard dictionaries. The American Chemical Society has a registry of over 84 million named chemical substances, and there are about a million named species of insects alone; surely these must count as words?

What about obvious forms? Dictionaries include great-grandfather but not great-great-great-great-great-great-grandfather, which is real enough to get over 3,500 Google hits. Only the most basic numbers are typically included; Merriam-Webster, for example, includes twenty-one and twenty-two, but not twenty-three or thirty-one. In fact, if you were to count every number between 0 and 999,999 as a word, you'd have a cool million right there—and still have the rest of the English language to account for.
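The arithmetic behind that "cool million" is easy to check. The sketch below spells out the whole numbers from 0 to 999,999, so the names can be counted. The function names are hypothetical, and the naming convention (hyphenated tens, no "and" after "hundred") is an assumption; other conventions would give different spellings but the same count.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def below_thousand(n):
    """Spell out an integer from 0 to 999."""
    parts = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        parts.append(ONES[hundreds] + " hundred")
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        parts.append(TENS[tens] + ("-" + ONES[ones] if ones else ""))
    elif rest or not hundreds:
        # Covers 1-19 after a hundreds part, and 0-19 on their own.
        parts.append(ONES[rest])
    return " ".join(parts)


def number_name(n):
    """Spell out an integer from 0 to 999,999."""
    if not 0 <= n <= 999_999:
        raise ValueError("only 0 through 999,999 are supported")
    thousands, rest = divmod(n, 1000)
    if thousands == 0:
        return below_thousand(rest)
    name = below_thousand(thousands) + " thousand"
    if rest:
        name += " " + below_thousand(rest)
    return name


if __name__ == "__main__":
    print(number_name(23))            # one of the numbers the dictionary skips
    print(len(range(0, 1_000_000)))   # how many integers lie in 0..999,999
```

Counting the distinct values of `number_name(n)` over that whole range is one way to confirm that each number gets its own name, at the cost of holding a million strings in memory.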

At the other end of the scale, estimates of the number of words that an average person uses range from a few thousand (the number a person might actively use in a week) to many tens of thousands (the number an educated person might understand) or more. College-size dictionaries typically include almost 200,000 words (using a formula that counts each separately listed word or word-form); unabridged dictionaries from 300,000 to 600,000 or so. But each of these words is listed not for any intrinsic reason, but because a lexicographer decided it was useful to include. Twenty-three is just as real a number as twenty-two, but it doesn't have a common bullet caliber associated with it, so it often gets left out. Team names, as a class, generally fail to make the cut. We could always add words to the dictionary if there were no limitations on time or space.

So, where does that leave us? It's probably possible to devise criteria that would allow us to conclude that there are about a million words in English. (The dictionary publisher Merriam-Webster goes for "roughly 1 million words" in its discussion of this particular question, although elsewhere it suggests that the figure could be many millions.) But there's no possible way to count the actual number of words in the language, and the idea of having a running counter, as is found on GLM's home page, is absurd.

So, why have journalists fallen for the claim? I think it's the pseudo-scientific nature of GLM's "methodology": The company claims to use an "algorithm" called the "Predictive Quantities Indicator," so its figures must be right. According to the company's Web site, though, the PQI's count of English words is based on the entry lists of a number of major dictionaries, so from the outset we know we're just getting a summation of lexicographers' judgment calls (including scientific, obsolete, and dialect forms) rather than an accurate, independent analysis of current English. Still, it sounds impressive to some. I recently got a call about GLM from a reporter, and when I explained why the million-word claim is bogus, he practically shouted, "But they have an algorithm!"

And they'll have a good party this summer, for the credulous among us.

Slate
