Searching by Subject
Chemical Information Sources Wiki
Introduction
Almost all abstracting and indexing services, not to mention many other secondary and primary works, have subject indexes. In this chapter we will look closely at the subject indexes for some of the major works already covered, as well as note the existence of specialized abstracting and indexing services devoted to a particular document type and full-text databases of primary and other literature types. Discussion of the type of subject search that uses the name of a specific chemical compound is deferred to a later topic, although words that stand for classes of compounds are discussed here.
The searches dealt with here are word searches. We must often find the right word(s) or group of words (phrases) to pull needed information from a given reference tool. Such searches cover techniques, processes, types of reactions, equipment, etc. The searcher has to be aware of variant spellings, the use of initialisms and acronyms, synonyms, and other complicating factors in such a subject or topic search. In addition, the interpretation that the search system gives to the form in which the search statement is input is critical. For example, does the search system interpret two adjacent words as a phrase that must always have the words in that order? Or does it assume that either of those words could be present in a record in order to be a valid hit?
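The two interpretations described above can be contrasted in a minimal Python sketch. The function names and the sample record titles are hypothetical, invented for illustration; they do not reproduce the behavior of any particular search system.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_phrase(query, record):
    """True if the query words occur adjacent and in the same order in the record."""
    q, r = tokenize(query), tokenize(record)
    return any(r[i:i + len(q)] == q for i in range(len(r) - len(q) + 1))

def matches_any_word(query, record):
    """True if at least one query word occurs anywhere in the record (implied OR)."""
    return bool(set(tokenize(query)) & set(tokenize(record)))

# Hypothetical record titles, invented for this sketch.
records = [
    "Mass spectrometry of peptides",
    "Spectrometry methods for mass determination",
    "Mass transfer in packed columns",
]

phrase_hits = [r for r in records if matches_phrase("mass spectrometry", r)]
any_hits = [r for r in records if matches_any_word("mass spectrometry", r)]
```

Under the phrase interpretation only the first title is retrieved, while the implied-OR interpretation retrieves all three, including the irrelevant record on mass transfer.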
A fundamental question in conducting a subject search is whether all possible words, including synonyms, acronyms, abbreviations, etc., should be used or whether the search can be conducted using a set of preferred terms selected by the indexers of the documents. As computers have become more and more powerful, the techniques of FULL-TEXT SEARCHING have become popular, with every word in a document being a potential subject term. Unfortunately, the number of false drops yielded in this type of UNCONTROLLED VOCABULARY search can be quite large. Therefore, searching with terms selected from the CONTROLLED VOCABULARY of a THESAURUS or other subject term authority list is often preferable. An example is the Medical Subject Headings (MeSH) list that is used with the National Library of Medicine's Medline database. Another NLM effort that is even broader in scope is the development of a Unified Medical Language System, which includes the UMLS Metathesaurus. Chemical Abstracts Service uses the Index Guide to control search terms in the printed product, and the CA Lexicon on the STN system (see STNotes #25) shows the underlying structure of the CAS vocabulary control system.
Most keyword searches, such as those in Science Citation Index, impose on the searcher the burden of selecting alternate names, acronyms, etc. for the concept of interest when performing the subject search. For example, Electron Spectroscopy for Chemical Analysis (ESCA) and X-Ray Photoelectron Spectroscopy (XPS) are both names for the same technique. Therefore, a search for all references to the technique in a keyword subject index would force the searcher to use both ESCA and XPS in the search strategy.
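As a rough illustration of that burden, the sketch below assembles the alternate names into a single Boolean OR statement. The helper name `build_or_query` and the quoting convention for phrases are assumptions made for this example; the actual syntax depends on the search system being used.

```python
def build_or_query(terms):
    """Join alternate names with OR, putting multi-word phrases in double quotes."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"

# Alternate names for the same technique, taken from the text above.
synonyms = [
    "ESCA",
    "XPS",
    "electron spectroscopy for chemical analysis",
    "x-ray photoelectron spectroscopy",
]

query = build_or_query(synonyms)
```

The searcher, not the system, is responsible for supplying every entry in the `synonyms` list; any name left out is simply not searched.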
The distinction between uncontrolled (keyword) searching and searching using controlled vocabulary is important and is the main point of this lesson, but that distinction is blurred in a tool like SciFinder Scholar. There, the searcher simply types into the Research Topic search window the natural language expression that defines the search, without needing to insert Boolean operators. Also, no truncation is used with SciFinder Scholar. Instead, the search algorithm has some built-in intelligence to look for relevant word forms. For instance, the search system automatically searches for both singular and plural subject words.
Let's see an example of a search on SciFinder Scholar for the analytical technique "Electron Spectroscopy for Chemical Analysis (ESCA)," including results from both the CAplus and Medline databases. At the time it was run, the search as entered found 4395 references where the two concepts "electron spectroscopy" and "chemical analysis" were closely associated with each other and only 582 where the phrase as entered was found. In this case, let's repeat the search using the acronym for the analytical technique (ESCA) and also use a synonymous acronym, XPS. (The technique is also known as X-Ray Photoelectron Spectroscopy.) We have the option of entering synonymous words in parentheses, following a term or phrase. Thus, entering the research topic search on SciFinder Scholar as:
XPS (ESCA)
would imply to the system that you are looking for synonymous terms (an OR search). This search found considerably more documents: 114,511 at the time of the search on October 3, 2004. Unfortunately, many of the 35,609 records pulled by the ESCA part of the search are false drops that match the word "escape"! Entering ESCA by itself pulls 7516 records with the term "as entered," and it appears that all but the oldest (a 1918 record) are relevant. Thus, the technique of entering synonyms in parentheses must be used with caution on SciFinder Scholar.
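The "escape" false drops can be imitated with a small Python sketch. It assumes, purely for illustration, that the looser match behaves like matching the start of a word; the source does not say how SciFinder Scholar actually arrives at "escape," and the titles below are invented.

```python
import re

# Hypothetical titles, invented for this sketch.
titles = [
    "ESCA study of oxidized copper surfaces",
    "Escape of hydrogen from planetary atmospheres",
    "XPS and ESCA characterization of polymer films",
]

def stem_match(term, text):
    """Match the term at the start of any word, so ESCA also hits 'escape'."""
    return re.search(r"\b" + re.escape(term), text, flags=re.IGNORECASE) is not None

def exact_word_match(term, text):
    """Match the term only when it stands alone as a complete word."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

loose_hits = [t for t in titles if stem_match("ESCA", t)]
exact_hits = [t for t in titles if exact_word_match("ESCA", t)]
```

Here the loose match retrieves all three titles, while the whole-word match drops the record about hydrogen escape, which mirrors the difference between the synonym search and the "as entered" results.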
Keyword Searches
Let us restrict the phrase KEYWORD SEARCH to the type of uncontrolled vocabulary searching that is done when the terms are not selected from an authoritative subject list. Keyword indexes are often computer-produced, with every significant word in the document (or in certain fields of the document) becoming a KEYWORD. Such indexes exist in the weekly printed issues of Chemical Abstracts and in the "Permuterm Subject Index" of the Science Citation Index. The same is true of the Web of Science subject searches and searches on ingenta. However, Science Citation Index has for a number of years included the capability to enhance keyword searches using its KeyWords Plus feature.
SCI generates KeyWords Plus terms for many articles. KeyWords Plus are words or phrases that frequently appear in the titles of an article's references, but do not necessarily appear in the title or abstract of the article being indexed. SCI also utilizes keywords that authors sometimes provide in their articles that they feel best represent the content of the paper. Thus, KeyWords Plus may be present for articles that have no author keywords and may include important terms not listed among the title, abstract, or author keywords. All of these keywords are contained in the SCI record and are searchable.
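The general idea behind KeyWords Plus can be sketched as follows. This is not the algorithm SCI uses; the stop-word list, the `min_count` threshold, the function names, and the sample titles are all assumptions chosen to keep the example small.

```python
import re
from collections import Counter

# A tiny stop-word list, assumed for this sketch only.
STOP_WORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "to", "with"}

def words(text):
    """Return the lowercase words of a title, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def candidate_keywords(article_title, reference_titles, min_count=2):
    """Words found in at least min_count reference titles but not in the article title."""
    own = set(words(article_title))
    counts = Counter()
    for title in reference_titles:
        counts.update(set(words(title)))  # count each word once per reference
    return sorted(w for w, n in counts.items() if n >= min_count and w not in own)

# Hypothetical article and cited-reference titles.
article_title = "Surface oxidation of copper catalysts"
reference_titles = [
    "Photoelectron spectroscopy of copper oxides",
    "X-ray photoelectron spectroscopy of metal surfaces",
    "Adsorption of carbon monoxide on copper",
]

extra_terms = candidate_keywords(article_title, reference_titles)
```

In this toy case the terms "photoelectron" and "spectroscopy" are added even though neither appears in the article's own title, which is the kind of enrichment the text describes.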
Controlled Vocabulary Indexes: Chemical Abstracts "Index Guide"
One of the virtues of a keyword subject index is that the index terms reflect the current, ever-changing vocabulary of science. As soon as a new name for a concept, technique, etc., is used in a document, it could become an indexing term. Controlled vocabulary lists, on the other hand, are slower to adapt to changes in scientific terminology, but their greatest benefit is that they guide you to the preferred term for the concept. Hence, the searcher need only identify the preferred indexing term to find documents of interest.
The printed tool that controls the vocabulary in the Chemical Abstracts six-month volume and five-year collective General Subject and Chemical Substance Indexes is the INDEX GUIDE. For example, looking in the "E" section of the "Index Guide" for ESCA reveals the following:
ESCA (electron spectroscopy for chemical analysis)
See Photoelectric emission
x-ray
See Photoelectron spectroscopy
x-ray
Likewise, looking in the "X" section of the Index Guide for XPS leads to the same preferred phrases:
XPS (x-ray photoelectron spectroscopy)
See Photoelectric emission
x-ray
See Photoelectron spectroscopy
x-ray
Thus, the searcher would know that documents on this topic can be found in the "P" section of the "General Subject Index" to Chemical Abstracts. It is important to use the CA "Index Guide" before using the "General Subject Index" because there are no "see" references in the "General Subject Index" itself. Furthermore, each five-year collective index period has its own "Index Guide". There is a guide to Hierarchies of General Subject Headings to assist in selecting terms.
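The role of the "Index Guide" as a lookup from entry terms to preferred headings can be modeled with a simple dictionary. The sketch below is only an illustration: the two entries are transcribed from the examples above, the "heading, qualifier" string form and the function name are assumptions, and returning the term itself when no entry exists is a simplification.

```python
# Cross-references transcribed from the Index Guide entries quoted above.
# Writing each heading as "heading, qualifier" is a convention of this sketch.
INDEX_GUIDE = {
    "ESCA": ["Photoelectric emission, x-ray", "Photoelectron spectroscopy, x-ray"],
    "XPS": ["Photoelectric emission, x-ray", "Photoelectron spectroscopy, x-ray"],
}

def preferred_headings(term, guide=INDEX_GUIDE):
    """Return the 'See' headings for a term, or the term itself if it has no entry."""
    return guide.get(term, [term])

esca_headings = preferred_headings("ESCA")
xps_headings = preferred_headings("XPS")
```

Because each collective index period has its own guide, a fuller model would keep one such dictionary per period and select it before the lookup.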
Chemical Abstracts Printed Subject Indexes and CA File Subject Searches vs. SciFinder Subject Searches
Prior to 1972, there were five- and ten-year Subject Indexes to Chemical Abstracts. Beginning with the 9th Collective Index period for 1972-76, the chemical name index entries for single chemical substances were put into a new work, the CHEMICAL SUBSTANCE INDEX. Everything else, including names for classes of substances (e.g., ethers), went into the GENERAL SUBJECT INDEX. Thus, terms referring to classes of compounds, reactions, processes, equipment, or plant and animal species should be searched in the "General Subject Index" after the proper term or phrase has been found in the "Index Guide". Another way of finding the proper General Subject Index terms for recent CA entries is to utilize the CA Lexicon on STN. The 15th Collective Index period covers the years 2002-2006. You must keep in mind that the terminology rules may change from one collective index period to another. For example, the 14th CI Period moved significantly toward the current terminology in various fields, preferring "DNA" to the previous "Deoxyribonucleic acids" and "Drugs" to "Pharmaceuticals". Since 2007, CAS has no longer categorized information by collective index periods, so the new CA index names no longer have a "CI" label. It is important to check the "Index Guide" that corresponds to the period you are searching in order to be sure of finding the correct term for use in the "General Subject Index".
Not every preferred term or phrase is found in the "Index Guide," and if you do not find a listing there, assume that you have chosen the correct preferred term and look in the appropriate section of the "General Subject Index". Always be aware that preferred terms may change when the boundaries of the Collective Index periods are crossed.
Look at a sample record from the CA Student Edition on OCLC, paying particular attention to the index terms and the use of abbreviations.
For most online commercial bibliographic databases, the database vendors will define a default subject index (BASIC INDEX) in which subject words are searched. In the CA File on the STN system, the Basic Index contains subject words from the titles, keywords, abstracts, and controlled vocabulary of the documents (and so-called TEXT MODIFICATIONS of the controlled vocabulary entries), plus CAS Registry Numbers used to index the documents. The vendors will define in the database summary sheets exactly what terms are included in a Basic Index search.
As seen in the sample record from the CA Student Edition, the text modifications to the controlled vocabulary terms were sometimes difficult to interpret, e.g., "(intramol., of silyloxytetradecatrienoate and silyloxytetradecatrienal, stereochem. of)". Beginning in October 1994, CAS therefore introduced a format that is easier to read.
Old style:
- Adsorbed substances
- (carbon monoxide and water and nitric oxide, on copper-silica catalysts, reactions of)
New style:
- Adsorbed substances
- (adsorption and reactions of carbon monoxide and water and nitric oxide on copper-silica catalysts)
As noted above, the SciFinder Scholar topic search will do some behind-the-scenes work to find appropriate terms to include in a search, so people who use that search tool do not have to worry as much about controlled or uncontrolled vocabulary when they perform a research topic search. However, with some caution, as noted above, you may use synonyms in parentheses next to a related concept, for example, ESCA (XPS).
Section Codes for Online Searches
Since the information in Chemical Abstracts is classified into 80 major subject sections, the section numbers and codes can be used on STN and Dialog to limit a subject search. For example, works dealing primarily with enzymes are found in sections 3 and 7 of the weekly Chemical Abstracts. The 80 sections are grouped into the following broad categories:
| Section Name | Section Code | Section Numbers |
|---|---|---|
| Biochemistry | BIO/CC | 1-20 |
| Organic Chemistry | ORG/CC | 21-34 |
| Macromolecular Chemistry | MAC/CC | 35-46 |
| Applied Chemistry & Chemical Engineering | APP/CC | 47-64 |
| Physical, Inorganic, & Analytical Chemistry | PIA/CC | 65-80 |
Thus, including in an online search on STN a statement such as:
=> S L4 AND (3 OR 7)/CC
or
=> S L4 AND BIO/CC
would have the effect of limiting the retrieved documents in answer set L4 to those dealing with enzymes (found in sections 3 or 7 of the printed CA) in the first case, and those of a biochemical nature found anywhere in sections 1-20 of the printed product in the second case.
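The effect of a section limit can be imitated outside STN with a short Python sketch. The groupings come from the table above; the record structure (a dictionary with a `section` field), the function names, and the sample answer set are hypothetical.

```python
# Section groupings from the table above: code -> range of section numbers.
SECTION_GROUPS = {
    "BIO": range(1, 21),
    "ORG": range(21, 35),
    "MAC": range(35, 47),
    "APP": range(47, 65),
    "PIA": range(65, 81),
}

def group_for_section(section):
    """Return the section code whose range contains the given section number."""
    for code, numbers in SECTION_GROUPS.items():
        if section in numbers:
            return code
    raise ValueError(f"CA has no section {section}")

def limit_by_sections(records, sections):
    """Keep only the records assigned to one of the given section numbers."""
    return [r for r in records if r["section"] in sections]

# Hypothetical answer set standing in for L4.
answer_set = [
    {"id": 1, "section": 7},
    {"id": 2, "section": 22},
    {"id": 3, "section": 3},
    {"id": 4, "section": 15},
]

enzyme_hits = limit_by_sections(answer_set, {3, 7})
bio_hits = limit_by_sections(answer_set, SECTION_GROUPS["BIO"])
```

The first limit corresponds to (3 OR 7)/CC and the second to BIO/CC, which is broader because it accepts any section from 1 to 20.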
Refining Searches on SciFinder
SciFinder Scholar searches can be refined by many other options, as seen below.
(Reproduced with permission of CAS, a division of the American Chemical Society.)
Similar refinements are possible with Web of Science and other database searches.
Another way to modify a subject search is to analyze by index term on SciFinder Scholar. The example below shows the results of analyzing the 11,126 records that remained from the XPS (ESCA) search after first limiting it to the CAplus database and then to the period 2003- (performed on October 3, 2004). Once the analysis has been done, it is possible to select terms of interest simply by checking the boxes and getting the results.
(Reproduced with permission of CAS, a division of the American Chemical Society.)
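Conceptually, analyzing by index term amounts to counting how many records in the answer set carry each term and then keeping the records that carry the selected ones. The sketch below shows that idea only; the record structure, the function names, and the sample data are hypothetical and are not SciFinder Scholar's implementation.

```python
from collections import Counter

# Hypothetical answer set; each record lists the index terms assigned to it.
answer_set = [
    {"id": 1, "index_terms": ["Photoelectron spectroscopy", "Surface structure"]},
    {"id": 2, "index_terms": ["Photoelectron spectroscopy", "Oxidation"]},
    {"id": 3, "index_terms": ["Escape behavior"]},
]

def analyze_index_terms(records):
    """Count the records carrying each index term, most frequent term first."""
    counts = Counter(term for r in records for term in set(r["index_terms"]))
    return counts.most_common()

def refine(records, selected_terms):
    """Keep the records that carry at least one of the selected index terms."""
    wanted = set(selected_terms)
    return [r for r in records if wanted & set(r["index_terms"])]

analysis = analyze_index_terms(answer_set)
refined = refine(answer_set, ["Photoelectron spectroscopy"])
```

Selecting only the relevant index term removes the third record, which is how such an analysis can help weed out false drops like those caused by "escape."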
Specialized Abstracting and Indexing Services for Subjects or Document Types
There are many specialized abstracting or indexing services that cover either a subset of chemistry, e.g., Analytical Abstracts, or a particular format, e.g., ProQuest's Dissertation Abstracts International and their online dissertation services. Many of the techniques for subject searching discussed in this chapter are applicable to those works, but acquainting yourself with the guides, database summary sheets, and other user aids for any tools you choose to search is a very good idea.
Full Text Databases
Special techniques, particularly the use of proximity operators, are critical to success in searching full-text databases. Electronic primary journal databases are now widely available on the Web. American Chemical Society journals can be searched by subject on the Web only by words in the article titles or in the full text of the articles. More sophisticated searching is reserved for the Chemical Abstracts database, with a link through CAS's ChemPort service to the articles themselves. The ACS Electronic Supporting Information (formerly called Supplementary Material), containing more detailed data and other supplements not found in the printed journals, is also available to subscribers of ACS journals on the ACS Publications Web site. Links to the Supporting Information can be found in the table of contents for those issues that include such data or in the HTML version of the articles themselves.
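To show why proximity operators matter in full text, the following sketch tests whether two words occur within a given number of words of each other. The function name, the convention that "within n" means word positions differing by at most n, and the sample text are assumptions of this example; each search system defines its own proximity operators.

```python
import re

def within_n_words(text, term_a, term_b, n):
    """True if term_a and term_b occur within n words of each other, in either order."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    pos_a = [i for i, t in enumerate(tokens) if t == term_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b.lower()]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

# Hypothetical full-text passage, invented for this sketch.
text = (
    "The catalyst was characterized by photoelectron spectroscopy. "
    "In a separate experiment the copper surface was oxidized."
)

adjacent = within_n_words(text, "photoelectron", "spectroscopy", 1)
related = within_n_words(text, "copper", "spectroscopy", 3)
```

A plain AND search would retrieve this passage for "copper" and "spectroscopy" because both words are present, whereas the proximity test rejects it, since the two words belong to unrelated sentences.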
Elsevier Science makes available on the Web a search engine named Scirus that covers both Elsevier journals and Web resources.
Link to Internet Sources for Searching by Subject
This wiki page was originally created by Gary Wiggins.
