Module 3: Finding and Evaluating Information on the World Wide Web
Tips for Evaluating Search Engine Hit Lists
Search engines provide a fast and easy--perhaps too easy--way
to find materials on the web. But no search engine is able to evaluate
and make judgments about the materials it finds in the same way
that human users can.
The following questions and discussions can help you to understand what you see in a search engine hit list and to make some critical judgments about it. The questions are arranged roughly from general to specific: from those you should apply to all the items listed (or to groups of them) to those you must apply to each item individually.
Why are these items listed in the hit list?
If all the words or phrases in your search query appear in the Title of a web document (and, in particular, if they appear there exactly as you typed them), that document is very likely to be listed.
Unfortunately, the Title of a web document is not necessarily
what most users expect it to be. For instance, the Title of this
document--that is, the Title that a search engine would "see"
if it looked at this document--is "CAD Center for Academic
Development | Al Akhawayn University," and nearly all of the
documents on the SSK 1203 web site (and throughout the CAD web site)
have this same Title. This Title appears across the top of the browser window (or on the browser tab) when users go to this web page, but it does not appear on the document itself. Also, the Title is set by the author/creator of the web document (in the HTML title element, which sits in the hidden "head" section of the document alongside its meta tags), and it can affect whether a document shows up in a hit list.
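The Python sketch below shows where the Title actually lives. It uses the standard library's html.parser module to read a small, made-up page and pull out the text of its title element. The sample markup is hypothetical, although the Title in it is the one quoted above. Notice that the heading a reader sees in the body of the page is not the Title.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text inside the <title> element of an HTML page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase.
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Keep only the text that appears between <title> and </title>.
        if self.in_title:
            self.title += data


# Hypothetical page: the Title (in <head>) differs from the heading in <body>.
SAMPLE_PAGE = """<html>
<head>
  <title>CAD Center for Academic Development | Al Akhawayn University</title>
  <meta name="keywords" content="search engines, hit lists">
</head>
<body>
  <h1>Tips for Evaluating Search Engine Hit Lists</h1>
</body>
</html>"""

parser = TitleExtractor()
parser.feed(SAMPLE_PAGE)
print(parser.title)  # CAD Center for Academic Development | Al Akhawayn University
```

A search engine that matched query words only against this Title would never see the words "Evaluating" or "Hit Lists", even though they are the first thing a human reader sees.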
Documents in which the words or phrases of your search query appear anywhere else will also be listed. Here are just a few of the places where search engines may find the words or phrases of a query. You need to judge how each of these locations affects the usefulness of an item:
Actual title (as it appears on the document itself), subtitle,
heading, subheading of a document
First paragraph or introduction of a document
Anywhere in the body of a document
Last paragraph or conclusion of a document
Meta tags (like the Title, these are "hidden" areas
of a document where the document's creator can store key words
for search engines to find)
Links to other web documents (which may or may not be on the
same web site)
The document's URL
Also, depending on the way in which you have phrased your query,
the search engine may list documents that have only one, or some,
of the words or phrases you have included. And, of course, these
words or phrases may be in very different places--and widely separated--within
the document.
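The basic idea of looking for query words in each part of a document can be sketched in Python. A real search engine's matching is far more elaborate; here the document's locations and their text are hypothetical, and the whole-word, case-insensitive match is an assumption chosen for simplicity.

```python
import re

# Hypothetical document, split into some of the locations listed above.
DOCUMENT = {
    "heading": "Tips for Evaluating Search Engine Hit Lists",
    "first paragraph": "Search engines provide a fast and easy way to find materials on the web.",
    "body": "Read each snippet carefully before you decide.",
    "meta keywords": "search, evaluation, web",
    "link text": "Back to Module 3 readings",
}


def locate_terms(query_terms, document):
    """Map each query term to the document locations where it appears as a whole word."""
    found = {}
    for term in query_terms:
        # \b marks a word boundary; IGNORECASE lets "search" match "Search".
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        found[term] = [place for place, text in document.items() if pattern.search(text)]
    return found


locations = locate_terms(["search", "snippet", "catalog"], DOCUMENT)
for term, places in locations.items():
    print(term, "->", places if places else "not found")
# search -> ['heading', 'first paragraph', 'meta keywords']
# snippet -> ['body']
# catalog -> not found
```

The document matches only two of the three query words, in quite different places, which is exactly the kind of partial match a hit list may include. Note also that a whole-word search for "evaluate" would find neither "Evaluating" nor "evaluation"; how a particular search engine treats such word forms varies.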
Finally, remember that search engines often cannot distinguish among words that are spelled the same but have different meanings. For example, "right" may refer to the opposite of left, a political stance, a legal entitlement, something correct, and so forth. Usually, only a human user can make this distinction from the context in which the word appears. Read carefully to decide.
NB: If you cannot immediately tell from the hit list why an item is there, that alone does not tell you whether it will be useful or useless to you. You will need to investigate further before you are sure.
Why are these items listed in this order?
The order in which items appear in a search engine hit list depends
on how that search engine ranks items or establishes their relevance.
In general, the closer an item comes to accurately matching the
search query you put in, the higher the item will be in the hit
list.
However, some search engines also use other factors in their
ranking. For instance, some search engines note how many other web
pages link to a specific document, and use this in calculating their
rankings. (The logic is that, if many pages link to a specific document, it is probably of higher value than a document that few or no pages link to.)
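The ordering logic described above can be illustrated with a toy ranking in Python. Everything here is an assumption for illustration: the link graph and the match scores are invented, and using inbound links only to break ties is one simple choice. Real search engines combine many more signals and do not publish exactly how they weight them.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "page_a": ["page_c"],
    "page_b": ["page_c", "page_d"],
    "page_d": ["page_c"],
    "page_e": ["page_d"],
}

# Hypothetical match scores: how many of the query's terms each result contains.
MATCH_SCORES = {"page_c": 2, "page_d": 2, "page_f": 3}


def inbound_link_counts(links):
    """Count how many distinct other pages link to each page."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts


def rank(match_scores, links):
    """Sort results by match score, breaking ties by inbound link count."""
    inbound = inbound_link_counts(links)
    return sorted(
        match_scores,
        key=lambda page: (match_scores[page], inbound[page]),
        reverse=True,
    )


print(rank(MATCH_SCORES, LINKS))  # ['page_f', 'page_c', 'page_d']
```

Here page_f has no pages linking to it but matches more of the query, so it still comes first; page_c and page_d match equally well, and page_c ranks higher because three pages link to it rather than two.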
Finally, some search engines return results that include "sponsored
links." These are links to materials or sites that may be
useful. However, the owners of the linked materials have usually
paid to get their materials listed--either in a higher location
or even a special window--in the hit list.
NB: If you cannot immediately tell from the hit list why an item is listed in the order in which it appears, that alone does not tell you whether it will be useful or useless to you. You will need to investigate further before you are sure.
What alternative words, phrases, or ways of searching do these items suggest?
Look carefully at hit lists to find synonyms, related terms, more
specific--or more general--terms or phrases that you could search
for.
Also, in some cases, the best way to identify words or phrases
that you want to avoid or exclude in a search is by
analyzing a hit list. These may be words or phrases that occur with
one meaning of a word--a meaning that you are not searching
for.
What does the linked text tell me?
Most search engine results pages identify each item with a line of text, often the document's Title, that is a hyperlink to the document. Remember: this linked text may or may not describe the document's contents.
What does the "snippet" tell me?
Most hit lists give a short piece of text (called a snippet) from each document. The snippet may come from anywhere within the document, and it may tell you a great deal or nothing at all. Here are just a few situations to consider when making a judgement based on a snippet.
If the search terms or phrases appear in the linked text (the document's title) but do not appear anywhere in the document itself, the snippet you see might be part or all of the first sentence (something beginning with a capital letter and ending with a period) of the document. The search engine's logic is that this sentence is likely to tell users something about the content of the document. Only a user can judge whether this is true, and whether the document is helpful.
If the search terms or phrases appear in the linked text (the document's
title) and they appear elsewhere in the document itself,
the snippet you see might be a part--or parts--of one or more sentences
where the search terms are found. The search engine's logic is that
showing users the words in their context will help them determine
if the document is worthwhile.
If the search terms or phrases appear in the linked text exactly as you typed them, and also in several other locations throughout the document, then the snippet may show you several examples of this "exact match" so that you can see how often the matches occur. (This situation may not, however, raise the ranking of this item in the hit list! And it may simply show you that something you are not interested in appears many times in the document!)
Finally, remember that a snippet can only show a very small context.
Before you can decide that a document is exactly what you
are looking for, you will need to examine it in more detail.
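The snippet situations described above can be sketched in Python. This is an illustration of the logic, not any search engine's actual method: the body text is hypothetical, the item is assumed to have been listed because of its Title, the "first sentence" is roughly approximated as everything up to the first period, and the context fragments are cut at a fixed number of characters, so they may begin or end mid-word.

```python
import re

# Hypothetical body text of a document that was listed because of its Title.
BODY = (
    "Search engines rank pages by many rules. A snippet is a short piece of text "
    "taken from a page. Each snippet may come from anywhere on the page."
)


def first_sentence(text):
    """Roughly approximate the first sentence: everything up to the first period."""
    end = text.find(".")
    return text if end == -1 else text[: end + 1]


def make_snippet(body, terms, window=25, max_pieces=3):
    """Show the search terms in context, or fall back to the first sentence."""
    pieces = []
    for term in terms:
        for match in re.finditer(re.escape(term), body, flags=re.IGNORECASE):
            # Take a fixed number of characters on each side of the match.
            start = max(0, match.start() - window)
            end = min(len(body), match.end() + window)
            pieces.append(body[start:end].strip())
    if not pieces:
        # No search term appears in the body: show the first sentence instead.
        return first_sentence(body)
    return " ... ".join(pieces[:max_pieces])


print(make_snippet(BODY, ["cooking"]))  # Search engines rank pages by many rules.
print(make_snippet(BODY, ["snippet"]))
```

The first call shows the fallback: "cooking" never appears in the body, so the snippet is just the opening sentence, which may or may not say anything useful. The second call shows each occurrence of "snippet" in its surrounding words, joined with " ... ".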
What does the URL tell me?
The URL or address of a web document can reveal some information
about the source of the document, which, in turn, can help users
make some predictions about the usefulness of the document. But,
you must know how to read a URL to get this information. Here is
an explanation, based on the URL for this web document:
Protocol (the part of the URL before the domain name)
What it is: The type of transfer protocol; that is, the way the document is "shipped" to users.
What it tells you: Not much. Most documents are retrieved using http:// or ftp://, so there is little to learn from this part.
mail.alakhawayn.ma
What it is: The domain name of the location where the document is stored. Domain names are read from right to left, with the right-most part being the top-level domain.
What it tells you: The top-level domain, .ma, is Morocco's country-code domain, which suggests that this document is stored in Morocco. (Some top-level domain names indicate the type of activity that the owner of the domain engages in, or the physical location, or both.) The alakhawayn indicates that it is at AUI. (To predict this, however, users must know that AUI "owns" the domain name alakhawayn.ma as well as the domain name aui.ma.) The mail indicates that it is on a server named "mail," and that there are likely to be other servers at AUI with other names. (Yes, there are others, but not all of them contain web documents.)
~A.Cads
What it is: The username of the document owner; that is, the userid of the person or group that controls the space on the server in which this document is stored. (Usernames on the mail.alakhawayn.ma server consist of a "first initial" followed by a period and a "last name." In this case, the user is not an individual, but the username was created to fit these rules for usernames.)
What it tells you: The document resides within a web site "owned" by a user named A.Cads, who has been granted space on this server. Only some URLs reveal this information, but the appearance of the tilde (~) is a very strong hint that what follows identifies an individual user on a server. In this case, the "user" is actually the Center for Academic Development.
/1203/READINGS/M3READ/M3_hit_lists.htm
What it is: The path to the specific document.
What it tells you: The document is stored inside a series of folders in the CAD area of the server. The file extension .htm (or .html) indicates a document saved in HyperText Markup Language (HTML), the standard for web documents. (An experienced user of the web would reasonably predict that there might be many other documents on this site, because it has so many folders and subfolders to organize them. But the number of documents does not necessarily give any hints about usefulness.)
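The mechanical part of taking a URL apart can be sketched with Python's standard urllib.parse module. The URL below is assembled from the parts discussed above, and its http:// protocol is an assumption. The sketch also assumes that the last path segment is the file name, so a URL ending in a folder would be misread. The code only splits the URL; knowing that alakhawayn.ma belongs to AUI, or that A.Cads is really the CAD, still takes human knowledge.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assembled from the parts discussed above; the "http" scheme is an assumption.
URL = "http://mail.alakhawayn.ma/~A.Cads/1203/READINGS/M3READ/M3_hit_lists.htm"


def dissect(url):
    """Split a URL into the parts a reader can try to interpret."""
    parts = urlparse(url)
    # Domain names are read from right to left, so reverse the dot-separated labels.
    domain_right_to_left = list(reversed(parts.hostname.split(".")))
    segments = [segment for segment in parts.path.split("/") if segment]
    # Assumption: the last path segment is the document's file name.
    filename = segments.pop() if segments else ""
    # A leading "~" on the first folder is a strong hint of a user's personal space.
    user = None
    if segments and segments[0].startswith("~"):
        user = segments.pop(0)[1:]
    return {
        "protocol": parts.scheme,
        "domain (right to left)": domain_right_to_left,
        "user": user,
        "folders": segments,
        "file name": filename,
        "extension": PurePosixPath(filename).suffix,
    }


for label, value in dissect(URL).items():
    print(f"{label}: {value}")
# protocol: http
# domain (right to left): ['ma', 'alakhawayn', 'mail']
# user: A.Cads
# folders: ['1203', 'READINGS', 'M3READ']
# file name: M3_hit_lists.htm
# extension: .htm
```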
Even very skilled and experienced users, those who have spent hundreds
of hours working with search engines and web sites, may not be able
to accurately predict the usefulness of any given document by taking
apart its URL. However, this information does help them to
create their own human ranking of the results delivered by
a search engine.