Circular Cruises/You Could Look It Up

From Eccentric Flower

21 May 2009 - In order to fully appreciate this essay you must know the following:

Aussie was an online friend who kept an online journal.

HotBot and AltaVista, O Best Beloved, were search engines B.G. (Before Google). Other similarly defunct web-search spots mentioned here will be obvious in context.

The main section of this website was once called "Alewife Bayou." (And incidentally this was written B.G. so no one had coined Googlewhacking yet.)

The rest of it is a mite dated, but the gist still stands as written. Merriam-Webster have apparently backed down on their plans to produce a Fourth Unabridged. My comment about M-W selling a dictionary plug-in has come true in a then-unexpected way; I use their search tool in my web browser all the time. As for the final sentence, it took Google to make everyone else see it, and I believe I should be awarded a prize for foresight.


You Could Look It Up

5 December 1997


Not too long ago, I felt that there was but One True Dictionary. Any of the dictionaries Merriam-Webster (the heirs of Noah) put out were excellent, Webster's Third Unabridged was the archetype of those, and every other company's dictionaries were useless.

Then I started hanging out with a gentleman who makes crossword puzzles for a living. You want to find someone who knows something about dictionaries, a puzzle creator is the person to ask. He has heaven alone knows how many dictionaries, reference books, et cetera.

He showed me a book about the controversy which was caused when Webster's Third appeared, in 1961. Now, to understand this, you need a little history.

Webster's Second had been regarded as the American dictionary - the undisputed authority, best in its class. Some will say it was the best English-language dictionary in the world at that time. I don't know; I wasn't there. The point is, it was (and is) venerated and well-beloved.

Webster's Second, originally published in 1934, was an upgrade to Webster's First (1909) in that it added more entries, but didn't alter the format or methodology much. No one had any reason to think that Webster's Third would be any different - more entries, same successful formula.

To many people's horror and dismay, when Webster's Third appeared, it had switched from being a prescriptive dictionary to a descriptive one. What's the difference? A prescriptive dictionary says "this is the correct way to use this word." A descriptive dictionary says "most people use the word this way, some people use it this way, a few people have been known to use it this way - you decide for yourself."

I myself feel that a dictionary is a document of This Is The Way The Language Is Now, and is not meant to be a usage guide - but I recognize that for many people, the dictionary is the authority they use to determine whether they're making a word error or not.

At any rate, Webster's Third caused a big commotion. People wrote scathing public commentary and hate mail to Merriam-Webster. Nero Wolfe burned his copy. In the process of all this thrashing, though, people were forced to take descriptive dictionaries seriously for the first time.

Now very few people know there was ever a controversy at all ... but despite the widespread acceptance of Webster's Third, a descriptive dictionary cannot really perform the duties of a prescriptive one. People who are serious about their dictionaries often have both, if they can afford it. (Webster's Third isn't cheap; Webster's Second not only isn't cheap, it's nearly impossible to find unless you visit used and rare book stores habitually.)

The odd thing is that people who are serious about their dictionaries will also have other general-purpose dictionaries - prescriptive or descriptive. I asked my friend: If you have one general-purpose dictionary which suits your needs (prescriptive or descriptive flavor), why buy another of the same type?

That is how I learned that dictionaries have distinctive styles. Have to have, in fact; if a dictionary were to duplicate another's definition, word for word, the cry of copyright infringement would go up. No two dictionaries define the same word the same way. In some cases, there is only so much latitude you can achieve; in other cases, the definitions are so different that you may wonder if both books are talking about the same word.

Different dictionaries, then, are better at different jobs. Sometimes you need a specialty dictionary, for medical or professional terms, for slang words (my friend adores slang dictionaries, and there are many), for obscure words, and such. Sometimes you don't need a dictionary at all; you need a thesaurus, which answers the question "Find me other words that can serve the same purpose as this word." (If you don't see yourself ever asking that question, then you've never written an essay like this one.)

For each job, there is a tool.

- - -

I read a column from dear old Aussie today which commented on the uselessness of Yahoo's web categories. I didn't write all the above just to refute her - her column made me think, that's all, and this is what bubbled out.

Web search engines are tools I work with a lot. I have a number of generalized and specialized reference books, but I don't use them half as often as I use search engines. And, as with the dictionaries, there is a proper tool for each job.

When one is searching the web for specific words and phrases in web pages - when one wants to look at the contents of web pages in a literal, analytic way - the spider engines like HotBot and AltaVista are the way to go.

These are "spiders" or "robots" because they go out and automatically find new web pages. Every time they find one, they take all the words in that page and add them to an ever-growing index. This means that you can theoretically search the whole web for a very small phrase, even one word - but it also means that there is, to put it politely, an information filtering problem: you get back way too much information to be useful.
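As a rough illustration of the indexing step just described, here is a minimal Python sketch. It is not how HotBot or AltaVista actually worked; the page names, the page text, and the helper functions `tokenize` and `build_index` are all made up for the example. It only shows the idea of taking every word on every page and recording which pages it appears on.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into words (a deliberately crude rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(pages):
    """Map each word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


# Hypothetical pages standing in for whatever a spider might have fetched.
pages = {
    "page_a": "The alewife is a small fish.",
    "page_b": "A bayou is a slow stream; no fish were harmed.",
    "page_c": "Welcome to Alewife Bayou.",
}

index = build_index(pages)
print(sorted(index["fish"]))     # every page that mentions "fish"
print(sorted(index["alewife"]))  # every page that mentions "alewife"
```

Even with three toy pages, a one-word lookup returns more than one page. Scale that up to the whole web and the filtering problem is easy to picture.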

Searching using these engines is at its most useful when you have a unique key phrase - some words which you know are likely to occur in that order only in the documents you want. For example, it is not very likely that other documents contain the words alewife and bayou, right next to each other, in that exact order. It's an odd juxtaposition, which makes it a good search (were you to want to find this set of pages).

Many people's frustration with engines like this is due to their not understanding the peculiar language of the search - they'd just type in alewife bayou, not understanding that to the search engine, this will (usually) be construed as "containing both the word 'alewife' and the word 'bayou,' but not necessarily next to each other."
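The difference between the two readings of alewife bayou can be sketched in a few lines of Python. This is an illustration under assumptions, not any real engine's behavior: the pages and the functions `contains_all_words` and `contains_phrase` are invented for the example, and real engines each had their own syntax for asking for an exact phrase.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into a list of words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def contains_all_words(text, query):
    """True if every query word appears somewhere in the text, in any order."""
    words = set(tokenize(text))
    return all(word in words for word in tokenize(query))


def contains_phrase(text, query):
    """True if the query words appear next to each other, in that exact order."""
    words = tokenize(text)
    phrase = tokenize(query)
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


# Hypothetical pages, made up for this example.
pages = {
    "fish_notes": "The alewife swims far from any bayou.",
    "home": "Welcome to Alewife Bayou.",
    "swamp": "A bayou is slow-moving water.",
}

query = "alewife bayou"
all_words_hits = [name for name, text in pages.items() if contains_all_words(text, query)]
phrase_hits = [name for name, text in pages.items() if contains_phrase(text, query)]

print(all_words_hits)  # pages containing both words, anywhere
print(phrase_hits)     # pages containing the exact phrase
```

The all-words search pulls in the page about fish as well as the one you wanted; the phrase search returns only the page where the two words sit side by side.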

Meanwhile, hierarchical listings such as Yahoo's category portion or Excite's category portion are checked and categorized by humans. These are more like thesauruses (thesauri?) - they're useful for saying, "I know what kind of thing I want, but I can't give you any specific examples," or (since in both cases the search engine portion feeds into the category listings) "I've found one site of this type by looking for specific text - can you show me others of the same type?"

The problem with category lists, as Aussie pointed out, is that you have to get inside the mind of the categorizer in order to be able to use them well. Or, using big words: All taxonomies are somewhat arbitrary. If you file things into pigeonholes, sooner or later the person doing the filing is going to have to make a judgement call about where to put something.

(What is Alewife Bayou? It's all written by one person - is it a "personal page"? The Mining Company thinks so. Is it opinion/commentary? Yahoo thinks so. Is it humor? Is it art? Is it tofu? Categories are tricky.)

The point is that there is no One Best Search Engine, just as there is no One Best Dictionary. However, you are certainly permitted to have a personal favorite.

- - -

After leaving the issue in doubt for several years, the Merriam-Webster people announced this year that there will eventually be a Webster's Fourth. Date estimates vary, but the Merriam-Webster comments, plus the usual thirty-to-forty-year gap, seem to imply sometime around 2000.

It remains a matter of speculation whether a print version of the dictionary will actually appear - and whether it will sell if it does. Last year sales of encyclopedias in electronic format (CD-ROMs) outpaced sales of paper encyclopedias for the first time. Granted, it's still faster to use a paper dictionary, so dictionaries may not end up on the same road as encyclopedias.

(What if the dictionary were part of the system? What happens when Merriam-Webster sells a dictionary plug-in for your personal computer?)

Sooner or later, these arguments about differences in dictionaries and differences in search engines will merge, in the process raising a new set of issues.

Having once or twice dabbled at writing search engines, I know that it is not unheard-of for a customer to say, in effect, "I love the data, but I hate the search engine," and reject a body of information for this reason. Soon this will be a much more common occurrence.

Ideally, one would be able to divorce the engine from the content - say "I'd like to search AltaVista's database, but use someone else's search engine."

My professional opinion: Don't hold your breath waiting. If information is wealth, then the search engine is the door to the vault.


Copyright © December 1997. All rights reserved.
