Issues in Science and Technology Librarianship
The primary goal of this article is to assist researchers and librarians to accurately and completely search Chemical Abstracts Service's (CAS) SciFinder web-based system for inorganic substances, excepting coordination compounds. CAS indexing policies and conventions in handling the wide variety of inorganic materials are described in detail. Detailed search strategies for retrieving salts, oxides, sulfides, minerals, alloys, metal clusters, tabular inorganics, indeterminate derivatives, and unknown/variable compositions are reviewed. Although this article is of primary value for SciFinder researchers, the analysis of how CAS handles inorganics will be of use to searchers regardless of the platform used to access Chemical Abstracts information.
Although organic compounds covered by the Chemical Abstracts Service (CAS) Registry System (Dittmar et al. 1976) are numerous and diverse, inorganic compounds are often more challenging to search and retrieve successfully. Many inorganic compounds have imprecise compositions, ranges of weight percentages, or varying ratios for each component. Often it is impossible to draw out a structure. Such compounds (e.g., alloys) are usually represented by a composition table of individual components. CAS treats ionic bonds and salts in ways that are often different from common conventions.
This article focuses on searching Chemical Abstracts on SciFinder, their new web-based platform, as of August 2010. All inorganic classes excepting coordination compounds are covered. Only CAPLUS (Chemical Abstracts literature database) indexing/records are considered. MEDLINE records are automatically searched and included in SciFinder results. However, MEDLINE records are indexed by the National Library of Medicine independent of CAS's rules, though many MEDLINE records contain CAS Registry Numbers.
Most of this discussion will also be applicable to searchers still using the older client software system, but the search examples use the web-based SciFinder terminology. A basic familiarity with the SciFinder system is assumed. Good overviews of this system are available, including a number of excellent online tutorials produced by CAS (Chemical Abstracts Service 2010a; Haldeman et al. 2005; Schwall and Zielenbach 2000). A paper focused on topical searching also briefly reviews the development of the SciFinder system including a timeline through 2005 (Wagner 2006). The appendix is a quick reference table that summarizes all the search techniques described in this article.
SciFinder has several options for retrieving substance information. The following query screens are cited throughout this article. For brevity's sake, the screens are usually referred to only by their second-level designations, e.g., Substance Identifier.
Figure 1: Abridged screen shot of Explore References default query screen.
Figure 2: Abridged screen shot of Explore Substances default query screen.
For STN International users, detailed discussion of dictionary (i.e., non-structural) and structure searching can be found in various REGISTRY File online and print documentation (Chemical Abstracts Service 2010b). However, since SciFinder and REGISTRY are based on the same underlying data, the explanations and strategies described in this article will be of direct value to the STN searcher.
Little has been written about searching SciFinder specifically for inorganics. Registry System conventions for inorganics are briefly described in a few articles, but these either predate SciFinder or say little about how to actually search for these compounds (Cooke and Ridley 2004; Moulton 1993; Ryan and Stobaugh 1982). Only the recent book by Ridley (2009) both provides an excellent overview of the entire system and discusses how to search for specific inorganic compounds, in the book's Appendix 4. Unfortunately, the most complete documentation of the Registry System occurs in old print STN International REGISTRY File documentation that is now hard to find (STN International 1993; STN International 1991; STN International 1990).
Before launching into a detailed discussion of searching for individual categories of inorganic compounds, a few definitions and general comments about CAS conventions are in order, especially regarding molecular formulas. CAS assigns every compound to one or more categories, called Class Identifiers. These Class Identifiers underlie many of the refine (limit) features in SciFinder, inform correct search strategies, and are used as the underlying structure for this article.
Table 1 lists the official classes used by CAS to categorize inorganic substances, excepting coordination compounds, and provides the number of substances assigned to each class.
Table 1: REGISTRY File record counts for classes containing inorganics as of August 16, 2010
CAS Class Identifier Field (/CI)
Generic registration/CI (GRS/CI)
Incompletely defined substance/CI (IDS/CI) 2, 3
Manual registration/CI (MAN/CI)
Mixture/CI (MXS/CI) 2
Tabular inorganic/CI (TIS/CI)
Unknown or variable composition or biological substance/CI (UVCB/CI)
1 Substances may appear in more than one category.
2 SciFinder has an option to pre-limit searches (see Fig. 2) and post-refine substance sets for this class.
3 Fewer than 400 inorganic compounds are classed as IDS, many of them boron cage compounds.
Many inorganic compounds are treated by CAS as multicomponent substances with no attempt to describe the overall structure of the material as a whole. Instead, as far as possible, a composition table lists each component, its structure, its component ratio or weight percent, and, possibly, the CAS Registry Number for the component. The exact format of this table varies depending on the class of the compound, as assigned by CAS.
Multicomponent Substances include many salts, hydrates, addition compounds, mixtures, alloys, many minerals, and intermetallic compounds. They are any substance containing dot-disconnect molecular formulas, where each component with a known structure has its own connection table, i.e., structure. The component structures give no indication as to how the components are bonded together. Molecular formula conventions for multicomponent substances are discussed in the next subsection.
A major CAS class of multicomponent materials is Tabular Inorganic Substances. By CAS definition (STN International 1993, p. 2.27), Tabular Inorganic Substances are inorganic compounds that do not receive a structure-based atom level connection table representing the entire material because one of the following is true:
There is a fine distinction between Tabular Inorganic Substances and multicomponent inorganic substances. Not all multicomponent inorganic substances are tabular inorganics. However, all tabular inorganics are multicomponent. Alloys are an example of multicomponent substances that are not formally classed by CAS as Tabular Inorganics. This distinction will become apparent as each class of compounds is discussed.
Six general tips for searching inorganic compounds are worth noting initially:
To retrieve the base element, simply search the element name on the Substance Identifier query screen.
To retrieve the element with all its ions and isotopes, enter the element symbol on the Molecular Formula query screen. This result set can then be limited to all isotopes of the element using the 'Refine By: Isotope-Containing' feature. The only way to search for a specific isotope is to enter the precise CAS index name in the Substance Identifier query screen. All ions of the isotope will also be retrieved.
Example: 'Germanium isotope of mass 76'
Searching for elemental ions requires special limiters. First, one draws the element and attaches the desired charges in the Structure Editor window. To avoid false drops, the searcher must specify Exact Search and check both the Show Precision Analysis and the Single Component boxes. In the pop-up Precision Candidates window, check the Conventional Exact box. These results can also be limited to isotope-containing species.
For all the precision of the CAS Registry System, CAS does not assign separate registry numbers to allotropes. The only exceptions known to this author are the allotropes of carbon, such as graphite, diamond, and chaoite. The allotropic form is generally specified as a text term (text modifier) immediately following the registry number, as shown in this index entry from a CAPLUS (Chemical Abstracts) record where 7723-14-0 is the registry number for elemental phosphorus:
7723-14-0 Red phosphorus, uses
red; PVC/ABS alloy material with improved fluidity and heat and fire resistance
To assure complete retrieval, one should perform two independent keyword-based searches, save the results, and combine them using the 'Combine Answer Sets' feature.
At the time this search was run, the first approach yielded 1,169 hits and the second yielded 3,065 hits. Combining the sets (Boolean Or) created a master set of 3,066 hits. This particular example only added one hit to the text search of 'red phosphorus' so it does not demonstrate the usual value of trying multiple approaches. However, what if that one extra hit was the key item that answered the original question? In addition, if a searcher had tried only the first approach based on an inspection of index terms, 1,897 likely relevant articles would have been missed.
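The arithmetic behind these counts follows from simple set algebra. The sketch below uses only the three hit counts reported above (the variable names are illustrative) to derive the overlap between the two answer sets and the number of references each approach would miss if used alone.

```python
# Hit counts reported in the text for the red phosphorus example.
first_approach = 1169   # hits from the first search
second_approach = 3065  # hits from the second search
combined = 3066         # hits after combining the sets with Boolean OR

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
overlap = first_approach + second_approach - combined

# References lost when only one of the two searches is run.
missed_by_first_only = combined - first_approach
missed_by_second_only = combined - second_approach

print(overlap)                # 1168
print(missed_by_first_only)   # 1897
print(missed_by_second_only)  # 1
```

All but one of the hits from the first search were also found by the second, yet the first search on its own recovers well under half of the combined set.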
Searching for simple ionic species (e.g., a sulfate ion) is analogous to searching for an elemental ion. If one wants only the base sulfate ion, one can search for 'sulfate ion' on the Substance Identifier query screen. Assuming one has exactly matched the CA Index Name or synonym, a single hit will be retrieved.
To retrieve all forms of the sulfate ion, first one draws the ion with the correct bonding and attaches the two negative charges to the appropriate oxygen atoms in the Structure Editor window. To avoid false drops, the searcher must specify Exact Search, and check both the Show Precision Analysis and the Single Component boxes. In the pop-up Precision Candidates window, check the Conventional Exact box. The results include hydrates. These results can also be limited (Refine By) to isotope-containing species.
It is easy to forget that these simple ion registry numbers usually have a large number of references associated with them. The base sulfate ion substance record [14808-79-8] has over 34,000 literature references. A searcher should consider including these ionic species registry numbers in their search query. CAS often indexes literature references only to the ion if that is the central focus of a given article, even if the original source of that ion was a common salt like sodium sulfate. Hence, many searches for sodium sulfate should likely use the sulfate ion CASRN, if the researcher's focus is on the sulfate portion of the salt.
Retrieval of elementary particles is straightforward. Simply enter the name of the particle in the 'Explore Substances: Substance Identifier' query screen. Below is the Registry record for top quark:
CAS Registry Number: 183748-11-0
CA Index Name: Quark, qt
Synonyms: Quark, t; Quark, top; Top quark; t quark
CAS Class Identifier: Manual
No Structure Diagram Available
Number of Literature References: ~5,482 References
Note that the sample Registry records provided in this article do not have the bolded field tags except for the CAS Registry Number. These tags are provided for clarity and to assist those searching File REGISTRY on other platforms using field codes. Although the goal of keeping SciFinder simple is commendable, this is a case of oversimplification. Adding these field tags to the SciFinder substance display should be a priority for SciFinder programmers.
Obviously with subatomic particles, no structure searching is possible.
Salts are a generic class of compounds that can be considered to be formed by the replacement of hydrogen(s) in the acidic function of acids by a metal or its equivalent (e.g., an NH4+ ion) (Ryan and Stobaugh 1982). Stretching this definition to include water, simple hydroxides are also considered in this section since the search techniques are essentially the same. By CAS convention, many salts are considered multicomponent substances, but are not tabular inorganics. Note that some materials that a searcher might consider a salt will actually be classed as a tabular inorganic (Section VII), should they fit that definition.
For such a simple class, retrieval is surprisingly complex. Different strategies are needed for:
In addition to a molecular formula which follows the strict Hill convention, salts may have a line formula following the CA Index Name in parenthesis that follows more common conventions (e.g., 'NaBr' rather than the Hill convention 'BrNa').
Line formulas are often used for resolving ambiguity where two or more inorganic substances have the same base CA Index Name (STN International 1993, p. 4.36). Line formulas generally follow a metal/nonmetal or cation/anion pattern, with parentheses around ionic groups having two or more elements.
Index Name Examples: Iron chloride (FeCl) Iron cyanide (Fe(CN)3)
These formulas are not posted to (i.e., indexed in) the molecular formula field. They are nomenclature terms that are searchable in SciFinder via the Substance Identifier query screen, provided the term retrieves fewer than 100 records. The line formulas are posted both as a complete unit and, if the line formula contains punctuation (dot-disconnect), as individual segments created by the removal of punctuation.
Example: Indium chloride cyanide (InCl2(CN)) [603957-48-8] can be retrieved either by searching InCl2 or InCl2(CN) on the Substance Identifier query screen. Searching for NaCl as a substance identifier fails because there are more than 100 records with this fragment. Unfortunately, SciFinder simply returns zero hits with no further explanation at this point in time.
Simple ionic salts can readily be searched by common name, line formula, or the molecular formula in Hill convention order (strict alphabetical). For MF queries, proper capitalization and spaces between elements will remove ambiguities (e.g., 'OS' could be oxygen-sulfur or osmium). However, the system automatically detects any ambiguities and asks the user to revise the query.
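To see why capitalization matters, the following sketch enumerates every way a carelessly capitalized string can be split into element symbols. It only illustrates the ambiguity and does not describe how SciFinder actually parses formulas; the function name `possible_readings` and the deliberately small symbol set are assumptions made for this example.

```python
# A small, illustrative subset of element symbols (not the full periodic table).
SYMBOLS = {"O", "S", "Os", "C", "Co", "N", "Na", "Cl", "Br", "Fe", "H"}

def possible_readings(text):
    """Return every way to split `text` into element symbols, ignoring case."""
    if not text:
        return [[]]
    readings = []
    for length in (1, 2):
        chunk = text[:length]
        if len(chunk) < length:
            continue  # not enough characters left for a two-letter symbol
        symbol = chunk.capitalize()
        if symbol in SYMBOLS:
            for rest in possible_readings(text[length:]):
                readings.append([symbol] + rest)
    return readings

print(possible_readings("OS"))    # [['O', 'S'], ['Os']]
print(possible_readings("CO"))    # [['C', 'O'], ['Co']]
print(possible_readings("NaCl"))  # [['Na', 'Cl']]
```

A query with more than one reading is the kind of case in which the searcher would be asked to revise the formula.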
A name search will miss any slight variation of the substance including isotopes and mineral forms.
SciFinder will usually reorder a query MF that does not conform to the Hill convention and retrieve the desired compound(s). However, it is best to know and follow the CAS conventions to assure proper retrieval, especially as one deals with more complicated cases such as multicomponent substances. Dot-disconnect formulas are never used for these simple ionic salts.
Compounds with multiple anions that are single elements, hydroxyl ions, or cyanide ions are still treated as single component substances. For example, indium chloride cyanide (InCl2(CN)) [603957-48-8] under CAS conventions has an MF of 'C Cl2 In N'. CAS actually classifies this substance as a coordination compound, but it can be searched as if it were a simple ionic salt. This is a great example of why searchers are advised to find a simple, common analog of desired target compounds to double check any assumptions about how CAS registers the substance.
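The Hill ordering used in these molecular formulas can be sketched in a few lines. The function below is an illustrative helper (the name `hill_formula` is an assumption, not a CAS or SciFinder tool). When carbon is present it is listed first, hydrogen second, and all remaining elements alphabetically; without carbon, every element, hydrogen included, is simply alphabetical. The output uses the space-separated style of the CAS formulas quoted in this article.

```python
def hill_formula(counts):
    """Format an element -> count mapping in Hill order, space separated."""
    symbols = sorted(counts)
    if "C" in counts:
        # Carbon first, then hydrogen (if any), then the rest alphabetically.
        rest = [s for s in symbols if s not in ("C", "H")]
        symbols = ["C"] + (["H"] if "H" in counts else []) + rest
    parts = []
    for s in symbols:
        n = counts[s]
        parts.append(s if n == 1 else f"{s}{n}")
    return " ".join(parts)

print(hill_formula({"Na": 1, "Br": 1}))                  # Br Na
print(hill_formula({"Fe": 1, "O": 3, "H": 3}))           # Fe H3 O3
print(hill_formula({"In": 1, "Cl": 2, "C": 1, "N": 1}))  # C Cl2 In N
```

The three calls reproduce the formulas cited in this section for sodium bromide, ferric hydroxide, and indium chloride cyanide.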
Compounds with multiple metal cations typically have a dot-disconnect formula. These are discussed under Section VII (Tabular Inorganics).
|Table 2: Simple Ion Salt Searching|
Substance Identifier Query
Fe H3 O3
Fe H3 O3
Searching the line formula 'Fe(OH)3' as a Substance Identifier will also retrieve all multicomponent substances containing this species.
More complex hydroxides and similar materials, such as iron hydroxide (Fe2(OH)5) [187544-98-5], are treated as tabular inorganics and are assigned only the molecular formula 'Fe . H O' (see Section VII). The presence of an alternate MF 'Fe2(OH)5' cannot be assumed, even if that line formula appears in the CA Index Name, searchable via the Substance Identifier query screen.
Salts derived from oxygen-containing acids are treated as multicomponent substances represented by a dot-disconnect MF with the acidic hydrogens retained in the formula. This convention has often caused confusion in molecular formula searching since even a simple compound like sodium sulfate [7757-82-6] is assigned the CAS MF 'H2 O4 S . 2 Na'. Note the period after the S. This is the all-important 'dot' in the dot-disconnect formula.
The reason for this goes back to the earliest days of the Registry System in the mid-1960s. Print indexes were still the only means of access to Chemical Abstracts. This convention permitted all the salts of sulfuric acid to appear in one place in the Chemical Substance Index.
Molecular formula searching in SciFinder is an exact search. Searching the MF of copper(II) sulfate 'H2 O4 S . Cu' [7758-98-7] will not pick up any of the hydrates.
|Table 3: Oxygen-containing acid salts|
Substance Identifier Query
H2 O4 S . Na
H3 O2 P . K
If one wants a specific hydrate with a fairly simple name, say, sodium chromate tetrahydrate [10034-82-9], one can simply search for that name on the Substance Identifier query screen. If this fails or one is uncertain which hydrates, if any, exist, one must resort to either a MF or a structure search.
All hydrates receive a dot-disconnect formula. For example, copper(II) sulfate pentahydrate [7758-99-8] is assigned the MF 'Cu . H2 O4 S . 5 H2 O'. Provided that one adheres to the Hill convention and ordering of components, this molecular formula can be searched directly.
However, if nomenclature and MF searching fail or the searcher wants to retrieve all known hydrates of a base salt, one must resort to an exact structure search combined with several limiters to eliminate most of the false drops. It is most unfortunate that SciFinder cannot limit to a specific number of components, as can be done with File REGISTRY on STN International.
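Because these dot-disconnect formulas are matched as exact strings, it can help to see how they decompose. The following is a minimal sketch, assuming the formulas follow the pattern quoted in this article: components separated by dots, an optional leading coefficient such as `5` or `x`, then element symbols with counts. The helper names `parse_dot_disconnect` and `is_hydrate` are hypothetical and do not correspond to any SciFinder or STN function.

```python
import re

WATER = {"H": 2, "O": 1}

def parse_dot_disconnect(mf):
    """Split a dot-disconnect MF into (coefficient, element counts) pairs."""
    components = []
    for part in re.split(r"\s*\.\s*", mf.strip()):
        tokens = part.split()
        coeff = "1"
        # A leading number or 'x' is the component's coefficient.
        if tokens and (tokens[0].isdigit() or tokens[0] == "x"):
            coeff = tokens[0]
            tokens = tokens[1:]
        counts = {}
        for symbol, n in re.findall(r"([A-Z][a-z]?)(\d*)", "".join(tokens)):
            counts[symbol] = counts.get(symbol, 0) + int(n or 1)
        components.append((coeff, counts))
    return components

def is_hydrate(mf):
    """True if a multicomponent MF has a water component."""
    components = parse_dot_disconnect(mf)
    return len(components) > 1 and any(c == WATER for _, c in components)

print(parse_dot_disconnect("H2 O4 S . 2 Na"))
print(is_hydrate("Cu . H2 O4 S . 5 H2 O"))    # True
print(is_hydrate("H2 O4 S . Cu"))             # False
print(is_hydrate("Cu . x H2 O4 S . x H2 O"))  # True
```

The anhydrous and hydrated formulas share the copper and sulfate components but differ as strings, which is why an exact MF search for one does not retrieve the other.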
To retrieve all hydrates in SciFinder, one draws in each component structure as an individual fragment, even if that fragment is a single atom. Remember that dot-disconnect compounds only have structures for each isolated component. For example, to retrieve all hydrates of copper (I or II) sulfates one must:
At the time this search was run by the author, it retrieved 40 hits. The set retrieves all copper sulfate hydrates, including a number of mineral forms and the generic copper sulfate hydrate CA MF 'Cu . x H2 O4 S . x H2 O' [10237-72-6], more conventionally written as Cu(SO4)x.xH2O. The false drops were fifteen copper hydroxide sulfate hydrates, since even an exact search did not distinguish between the hydroxyl group and the water molecule.
Though this technique is successful, it is admittedly convoluted and counterintuitive. In particular, checking the Single Component box is counterintuitive since hydrates have a dot-disconnect molecular formula and display as individual component structures. It was only by trial-and-error that the author discovered that this particular strategy eliminates most false drops whereas other strategies do not.
This is an area that SciFinder programmers should reexamine in order to simplify the search process.
Salts of cations with multiple valences have a substance record for each valence state (e.g., Fe(OH)2 and Fe(OH)3) as well as a generic record for iron hydroxide where the reference did not specify which valence state existed or where there was an indeterminate mixture of the two. These "generic" records should not be ignored. Iron Hydroxide (Fex(OH)x) [11113-66-9] had over 2,000 literature references as of August 2010.
These unspecified ratio compounds are treated as tabular inorganics, which means the MF is a dot-disconnect 'Fe . H O' with structures only for the isolated components. Contrast this to the record for ferric hydroxide which, as shown in Section III.A, has an MF of 'Fe H3 O3' and a fully drawn out structure of three hydroxyl groups attached to the iron atom.
This example illustrates that non-metallic anion species are often treated as a group rather than individual atoms, even for tabular inorganics. To retrieve the unspecified Fex(OH)x, one must search the dot-disconnect formula 'Fe . H O'. Note there is no dot between the H and O atoms. This search retrieves eight compounds: the target compound plus seven iron hydroxides ranging from 1-5 iron atoms and 1-12 hydroxyl groups.
It is extremely important to remember that "generic" records may exist when searching for almost any compound in the Registry System. The Registry System was created to index the chemical literature. CAS indexers can only be as specific as the original reference is in describing the compound. To use an example from the organic field, there are four records for dichlorobenzene: one each for the 1,2-, 1,3-, and 1,4- isomers, and one for unspecified mixtures of the three isomers, CA Index Name 'Dichlorobenzene' [25321-22-6]. Nearly 2,000 literature references are linked to this generic record where the original document simply referred to "dichlorobenzene" or used an indeterminate mixed isomer material.
Searching simple oxides and sulfides is identical to searching for binary salts. One enters a common name or a Hill convention MF. Hence, zinc disulfide [12402-34-5] can readily be retrieved by that name or the MF 'S2 Zn'. An MF search retrieves all related isotopes, minerals, and charged species.
Searchers may not realize that CAS assigns a registry number to each naturally occurring mineral form in addition to the synthetic/manufactured material of the same composition. For example, a molecular formula search for titanium dioxide using 'O2 Ti' retrieved 21 substances including the base titanium dioxide [13463-67-7] with over 226,000 literature references, eight isotopic compounds, nine charged species, and three mineral forms: Brookite [12188-41-9], Rutile [1317-80-2], and Anatase [1317-70-0]. The three minerals had a combined total of over 15,000 references. Unless the literature reference clearly reports a specific mineral form, the base titanium dioxide registry number [13463-67-7] is assigned to the record.
CAS defines a mineral as "a naturally formed chemical element or compound having a definite chemical composition, and usually, a characteristic crystalline form" (STN International 1993, p. 2.33). In STN, mineral substance records are assigned the tag 'MNS' in the searchable Class Identifier Field (/CI).
Minerals can be a) single component or b) multicomponent, tabular inorganic substances. If they are multicomponent, the substance record is assigned to both the Mineral and Tabular Inorganic classes and contains a composition table giving ratios and registry numbers for each component. As with alloys (Section VI), these composition tables are displayable, but not searchable, in SciFinder. The CA Index Name typically contains the line formula representation for the mineral.
Name searching is the easiest and most precise way to retrieve a given mineral. However it is also possible to draw each component as a disconnected fragment and execute an exact structure search.
For example, in SciFinder, Kaolinite [1318-74-7] can be retrieved by:
Substance sets in SciFinder cannot be analyzed by, or refined (i.e., limited) to, the mineral class.
CAS defines an alloy as "a mixture of metals with other metals, nonmetals, gases, or nonmetallic compounds that is miscible when molten and does not separate on cooling" (STN International 1993, p. 2.27). Unlike minerals, any retrieved substance set can be limited to alloys using the 'Refine by: Only retrieve substances that: Are in specific substance classes: Alloys' check box. This Alloys check box is also available as a pre-search limit on the Chemical Structure query screen. Alloys by definition are multicomponent substances and receive a dot-disconnect molecular formula. However, they are not classed as tabular inorganics.
Although the definition is straightforward, the way CAS has indexed alloys over time is complex, requiring a fair amount of explanation before discussing search techniques. Much of the information in this section is derived from an STN Technical Note published in 1990 (STN International 1990).
Unless the composition is completely unknown, all alloys include a material composition table giving the weight percentages (or ranges) and the registry number for each element/component. Although this table is very similar to the composition information contained in tabular inorganic records, alloys and tabular inorganics are treated as two distinct classes of compounds by CAS. In File REGISTRY on STN International, this table is searchable via the Material Composition field (/MAC). Unfortunately in SciFinder, the composition table is not searchable, but it does display in the detailed substance record. This severely limits search specificity when searching for alloys in SciFinder. In STN, alloy substance records are assigned the tag 'AYS' in the searchable Class Identifier Field (/CI).
Most critically, SciFinder has no current option to specify the exact number of components when doing a structure search, other than limiting the search to single component compounds.
CAS made a major change in how it indexed alloys in 1972. Prior to 1972, alloys were handled qualitatively; i.e., concentrations of the components were not listed. Hence, alloys were not originally assigned registry numbers. Until CAS completed its retrospective assignment of registry numbers back to 1907, literature references to alloys prior to 1972 could only be retrieved using CA Index Terms. Now that the retrospective project has been completed, many pre-1972 references have been enriched with additional CASRN entries. Hence, registry number or substance searching is generally the preferred approach across the entire file, though it can be backed up by keyword searching to assure that more generally described and indexed alloys are not missed.
Since 1972, the element in greatest concentration in an alloy has been designated as the "base" and all other components as "non-base". For more general discussions of alloys in references, CAS often assigned chemical substance index terms such as 'Copper alloy base' without assigning a registry number. Because these terms describe substances, the SciFinder detailed record displays the term under the right-side Substances Index Column, not the left-side Indexing Column used only for what CAS calls "General Subject Terms". The practical impact of this distinction is that one can search for 'copper alloy base' in Explore References, choosing the 'as entered' option, to retrieve a set mainly containing this substance indexing term from 1972 forward. However, an 'Analyze by: Index Terms' analysis lists no 'copper alloy base' entry because this analysis works only on the Indexing Column, i.e., general subject headings. Furthermore, these substance indexing terms do display when one does a Categorize analysis.
A detailed description of REGISTRY File fields searchable only in STN (e.g., Material Composition (/MAC), Relative Composition (/RC), and Class Identifier (/CI)) is beyond the scope of this paper (STN International 1993).
Prior to 1990, each element had to be present in an amount equal to or greater than 0.1% in order to be listed. Since 1990, weight percentages are listed in the Material Composition Table down to trace levels, up to six decimal places, as reported by the literature reference.
About 5% of all alloys in the REGISTRY file are manual registrations. Manually registered alloys have no specified composition, and hence, have no molecular formula or structure. Many are trade-named alloys where the composition has not been published. At other times, all one has is a name though the composition may in fact be published. In these cases, the only option is to search the name or designation via the Substance Identifier query screen.
Standard industry codes, code names, and trade names may appear as the CA Index Name or as synonyms (e.g., AISI H11). Standard codes are sometimes used to describe alloys of both known and unknown composition. Some of these codes may appear as part of the CA Index Name for the alloy (e.g., Steel, (AISI C1080) [12725-38-1]). In this case, the code also appears alone as a synonym. Other codes may appear only as synonyms for the alloy.
Common alloys such as brass, bronze, and cast-iron are registered as a "generic", that is, without any composition data. One example is Brass [12597-71-6].
In some instances, alloys contain one or more components that are not unique chemical substances. Such alloys are not assigned CASRN, but can be searched via keyword terms on the Research Topic query screen.
In SciFinder, this type of search can only be done via a molecular formula search. Simply input the elements and nonmetallic components, if present, in alphabetic order using the dot-disconnect convention. To retrieve alloys containing only iron, nickel, and manganese, search the MF 'Fe.Mn.Ni'. Refine the results by the 'Chemical Structure: Are in specified classes: Alloys' option. To activate this Refine option, one must draw the three elements as isolated atoms into the Structure Drawing Editor pane also displayed in the Refine column. Since SciFinder does not access the material composition table, ratios cannot be specified. Hence, a large number of alloys may be retrieved. One must search the REGISTRY file on STN International to be able to specify ratios or ranges of ratios.
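Building such a query string is mechanical, as the sketch below shows. The helper name `alloy_query` is hypothetical, and the function handles only single-element components; a nonmetallic component would need its own formula as one of the dot-separated pieces.

```python
def alloy_query(elements):
    """Join element symbols alphabetically with dots for an alloy MF query."""
    unique = sorted(set(elements))  # drop duplicates, then alphabetize
    return ".".join(unique)

print(alloy_query(["Ni", "Fe", "Mn"]))       # Fe.Mn.Ni
print(alloy_query(["W", "Cr", "Co", "Al"]))  # Al.Co.Cr.W
```

The resulting string carries no ratio information, which mirrors the limitation described above: the query fixes which components are present, not how much of each.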
In SciFinder, go to the structure drawing screen and draw in each individual element or non-metallic component as a separate entity. If all iron-manganese-nickel-tantalum containing alloys are desired, simply draw in these four isolated atoms. Do an exact search and check the Alloys class box. Although elements cannot be eliminated, the fairly new Analyze by Elements feature can be used to browse and select (limit) the set to additional elements not specified in the structure drawing. As with all multicomponent substances, specific ratios and ranges can only be searched on STN International. Alloy search techniques are summarized in Table 4.
Table 4: Search strategies for alloys
| Search goal | Search strategy |
|---|---|
| Trade, common, or code name is known | Explore by Substances: Substance Identifier name search (exact search) |
| Alloys limited to a fixed number of elements or components | MF search of a dot-disconnect formula (e.g., 'Al.Co.Cr.W'). Refine by Alloys class. Ratios and ranges of ratios can only be searched in STN International. |
| Alloys of specific components; others may be present | Chemical Structure. Draw disconnected atoms/components for each desired entity. Check 'Alloys' box and do Exact search. Analyze by Element is useful in browsing and limiting to additional elements present. Ratios and ranges of ratios can only be searched in STN International. |
Cermets are alloys containing nonmetallic compounds (STN International 1990, p.20). The materials composition table contains line formulas for most multi-atom alloy components (e.g., metal oxides, nitrides, salts, etc.).
An example of a simple cermet is an 80% tungsten carbide-20% cobalt alloy [37193-29-6] having the dot-disconnect MF 'C W . Co' and a CA Index Name 'Tungsten carbide (WC), alloy, WC 80,Co 20'. Clearly a molecular formula search is the quickest and most precise way to retrieve cermets, once one understands how the MF is constructed.
Interestingly, this substance can be retrieved by inputting the line formula at the end of the index name, 'WC 80,Co 20', using the Substance Identifier query screen. However, this name fragment must be entered exactly as written, with no spaces around the comma. This strategy is not recommended because cermets are often expressed with ranges of composition for the various components and because it depends on absolute uniformity in the assignment of CA Index Names.
Intermetallic compounds have two different definitions (Hawley and Lewis 2002).
In CAS, intermetallic compounds are multicomponent, tabular inorganic substances with a composition table. They are assigned dot-disconnect MF. However, they also typically have an alternate non-dot-disconnect MF providing the ratio between the metals. In this sense, they share some conventions with alloys and with salts. Note that CAS does not class intermetallics as alloys. The example below makes this clear.
CA Index Name: Tin, compd. with copper (1:3) (8CI)
CAS Registry Number: 12019-61-3
Standard MF: Cu . Sn
Alternate MF: Cu3 Sn
Synonyms: Copper, compd. with tin (3:1)
CAS Class Identifier: Tabular Inorganic Substance
Composition [tabular inorganic composition table]
This record can be retrieved by either the dot-disconnect formula or the alternate formula which, it is important to note, does not have any dots. Searching the MF Cu3Sn retrieves only the compound shown above. Searching MF 'Cu . Sn' retrieves all copper-tin binary intermetallic compounds regardless of ratio. However, it also retrieves all copper-tin binary alloys. The alloy records can easily be eliminated by using 'Refine by: Only retrieve substances that: Are in specific substance classes: Organics, and others not listed.' Note that use of this Refine option requires drawing the two elements as isolated atoms in the Structure Drawing Editor pane also displayed in the Refine column.
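The difference between the two formula types can be checked mechanically. The following Python sketch is illustrative only and is not part of SciFinder: the function names are hypothetical, and the rule it encodes (a dot that is not a decimal point separates components) is an assumption drawn from the formulas quoted in this article.

```python
import re

# A separator dot is any dot that is NOT sitting between two digits,
# so decimal amounts such as "C0.45" are left intact (assumed rule).
SEPARATOR = re.compile(r"(?<!\d)\.|\.(?!\d)")


def split_components(mf: str) -> list[str]:
    """Split a molecular formula into components, ignoring spaces."""
    parts = SEPARATOR.split(mf)
    return ["".join(p.split()) for p in parts if p.strip()]


def is_dot_disconnect(mf: str) -> bool:
    """True when the formula has more than one component."""
    return len(split_components(mf)) > 1


def same_components(mf_a: str, mf_b: str) -> bool:
    """Compare two dot-disconnect formulas regardless of spacing or order."""
    return sorted(split_components(mf_a)) == sorted(split_components(mf_b))


print(split_components("Cu . Sn"))    # ['Cu', 'Sn']
print(is_dot_disconnect("Cu . Sn"))   # True  (standard MF)
print(is_dot_disconnect("Cu3 Sn"))    # False (alternate MF)
```

A check of this kind only tells the searcher which type of formula is being entered; which records a given formula actually retrieves is determined by SciFinder.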
Note that the tabular inorganic composition table gives component ratios whereas the alloy composition table gives component (weight) percent. This is one important clue that the record is a tabular inorganic although the detailed substance record display contains the 'Tabular Inorganic Substance' tag directly below chemical names.
Homogeneous (single component) metal clusters are registered provided that they are well characterized and can be described by a discrete molecular formula (Shively and Roth 2008). A specific cluster can readily be retrieved by searching the molecular formula in either the MF or the Substance Identifier query screen. Metal clusters appear to be consistently assigned a synonym in the form of '[element name] cluster ([element symbol]x)'. Hence, a Substance Identifier search for 'gold cluster' retrieves 27 gold clusters ranging from Au2 up through Au55. Examples of metal clusters are:
Metal clusters are manually registered by CAS, which is the process used for compounds whose structural description cannot be processed by the CAS Registry System and which therefore receive no connection table.
Metal clusters are assigned a registry number not because they are nanoscale materials, but rather because they are homogeneous and have a specific MF. CAS does not separately register nanomaterials or nanoscale objects as defined by external dimensions of the material. Nanomaterials are noted in the subject and keyword indexing.
Minerals and intermetallic compounds have already been discussed. However, there are many other types of tabular inorganic substances including ceramics and other complex oxides, hydroxides, cyanides, carbides, and nitrides, especially those with mixed metals. It is difficult to fully categorize all the compounds that are registered under this class. The key is that the material fits the definition, and thereby, receives a composition table with component registry numbers and component ratios (or ranges), when known. Each component can be an element or a species such as hydroxyl or carbonate.
The standard molecular formulas are always dot-disconnect with no ratios included. An alternate MF is also often assigned and appears in the CA Index Name as a line formula at the end of the name. However, there is no guarantee that a particular tabular inorganic will be assigned an alternate MF. Alternate MF will have integer, integer ranges, or decimal compositions/ranges given for each component, if known. Many of these substances receive an index name that includes the phrase "compd. with".
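The three kinds of amounts that can appear in an alternate MF (integers, decimals, and ranges) can be read with a short parser. This is a minimal Python sketch, not a CAS tool: the function name is hypothetical, the sample formulas are taken from records quoted in this article, and treating a missing amount as 1 is an assumption.

```python
import re

# Element symbol, then an optional amount: integer, decimal, or "low-high".
TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?(?:-\d+(?:\.\d+)?)?)?")


def parse_alternate_mf(mf: str) -> dict[str, tuple[float, float]]:
    """Return {element: (low, high)} for an alternate (no-dot) MF."""
    amounts = {}
    for symbol, amount in TOKEN.findall(mf):
        if not amount:
            low = high = 1.0              # assumed: no number means 1
        elif "-" in amount:
            low, high = (float(x) for x in amount.split("-"))
        else:
            low = high = float(amount)
        amounts[symbol] = (low, high)
    return amounts


print(parse_alternate_mf("Cu3 Sn"))
print(parse_alternate_mf("Al3 B4 Nd0-1 O12 Y0-1")["Nd"])   # (0.0, 1.0)
```

Representing every amount as a (low, high) pair lets fixed ratios and ranges be compared in the same way.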
The following record is a typical tabular inorganic where exact ratios are known:
CAS Registry Number: 154948-23-9
Standard MF: C N . C O3 . Al . Fe . H O . Mg
Alternate MF: C0.45 H2 Al0.33 Fe0.07 Mg0.67 N0.43 O2.06
CA Index Name: Aluminum iron magnesium carbonate cyanide hydroxide (Al0.33Fe0.07Mg0.67(CO3)0.02(CN)0.43(OH)2)
CAS Class Identifier: Tabular Inorganic Substance
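The alternate MF in this record is simply the element-by-element sum of the line formula in the index name. The Python sketch below checks that arithmetic; the component list is transcribed by hand from the index name, and the function name is hypothetical.

```python
from collections import defaultdict

# (atoms in one unit of the component, amount of that component), from
# Al0.33Fe0.07Mg0.67(CO3)0.02(CN)0.43(OH)2
components = [
    ({"Al": 1}, 0.33),
    ({"Fe": 1}, 0.07),
    ({"Mg": 1}, 0.67),
    ({"C": 1, "O": 3}, 0.02),   # carbonate
    ({"C": 1, "N": 1}, 0.43),   # cyanide
    ({"H": 1, "O": 1}, 2),      # hydroxide
]


def element_totals(parts):
    """Sum each element over all components, rounded to two decimals."""
    totals = defaultdict(float)
    for atoms, amount in parts:
        for element, count in atoms.items():
            totals[element] += count * amount
    return {el: round(total, 2) for el, total in totals.items()}


totals = element_totals(components)
print(totals["C"], totals["O"])   # 0.45 2.06
```

Carbon comes from both the carbonate and the cyanide (0.02 + 0.43 = 0.45), and oxygen from both the carbonate and the hydroxide (3 x 0.02 + 2 = 2.06), which matches the alternate MF shown in the record.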
To review, tabular inorganics do not have an overall structure drawn out in the Registry System (i.e., do not receive a connection table) because at least one of the following is true:
Although this CAS class covers a vast variety of materials, they are all searched in the same way, regardless of whether a researcher would consider a particular substance as a ceramic, mixed metal oxide, complex hydroxide, etc.
Molybdenum silver oxides will be used as a model compound in the following examples.
If only a specific ratio such as Ag2Mo2O7 is desired, then one simply searches the MF 'Ag2 Mo2 O7'. This strategy is dependent on there being an alternate MF assigned to the record in addition to the standard dot-disconnect formula. If the searcher gets no hits, the standard MF 'Ag . Mo . O' must be searched. This forces one to sort through all silver molybdenum oxides to see if the desired ratio is actually registered, but was not assigned an alternate MF.
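The two-stage strategy described here can be sketched as a small fallback routine. The Python below is a toy model, not a SciFinder interface: the records, the placeholder identifiers 'RN-1' to 'RN-3', and the second ratio are invented for illustration, and the function names are hypothetical.

```python
# Invented toy records; only the Ag2 Mo2 O7 ratio comes from the text.
RECORDS = [
    {"rn": "RN-1", "standard_mf": "Ag . Mo . O", "alternate_mf": "Ag2 Mo2 O7"},
    {"rn": "RN-2", "standard_mf": "Ag . Mo . O", "alternate_mf": "Ag6 Mo10 O33"},
    {"rn": "RN-3", "standard_mf": "Ag . Mo . O", "alternate_mf": None},
]


def squash(mf):
    """Ignore spacing when comparing formulas."""
    return "".join(mf.split())


def search_with_fallback(specific_mf, standard_mf, records=RECORDS):
    """Try the alternate MF first; fall back to the standard MF."""
    hits = [r for r in records
            if r["alternate_mf"] and squash(r["alternate_mf"]) == squash(specific_mf)]
    if hits:
        return "alternate", hits
    # No alternate MF matched: return every record sharing the standard MF
    # so the searcher can review the ratios by hand.
    return "standard", [r for r in records
                        if squash(r["standard_mf"]) == squash(standard_mf)]


mode, hits = search_with_fallback("Ag2 Mo2 O7", "Ag . Mo . O")
print(mode, [r["rn"] for r in hits])   # alternate ['RN-1']
```

The record without an alternate MF ('RN-3' in this toy set) is the case that makes the second stage necessary: it can only be found through the standard dot-disconnect formula.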
If it is especially important to not miss any references, one can follow up this or any reasonably distinctive MF with a Research Topic search using the query 'ag2mo2o7 not 22914-54-1'. At the time this query was tested, it retrieved a single reference where Ag2Mo2O7 was in the abstract, but no registry numbers had been assigned to the reference. The reference was published in 2010, so it is possible that CAS had not yet completed the indexing for this item.
Where all molybdenum silver oxides are desired in any ratio, search the dot-disconnect MF 'Ag . Mo . O'. Any silver-molybdenum-oxygen compound that is a tabular inorganic will be retrieved.
Draw in the individual, isolated elements (Ag, Mo, O) in the structure drawing window. Do an Exact Search checking the Show Precision Analysis and the 'Organics, and others not listed' boxes. Choose the Conventional Exact option in the Precision Candidate pop up window.
The silver molybdenum oxide example above has only elemental components, but the process is exactly the same when searching for tabular inorganics with multi-atom species such as carbonates and cyanides. Do not put a dot (.) between atoms in the multi-atom component groups. Nomenclature can be misleading since it may contain terms that would lead a searcher to believe it is a simple salt.
For example, silver vanadium oxide phosphates are treated as tabular inorganics with a MF of 'Ag . O4 P . O . V'. Note that the phosphate is treated as a distinct component. This particular molecular formula currently retrieves five substances. In this case, each substance has also been assigned an alternate MF showing the exact ratios.
Chemical Abstracts Service originally designed the Registry System as an internal tool to index the chemical literature. This literature of course contains many variable, indeterminate, and imprecisely modified compositions. CAS handles this problem in one of two ways:
When indexing the chemical literature, CAS always attempts to index substances as specifically as possible.
None of the materials described in this section are categorized by CAS as "Incompletely Defined Substances", an official CAS Class Identifier using the tag 'IDS'. The IDS class is almost all organics with the exception of about 370 registrations, mostly boron cage compounds. Incompletely Defined Substances are substances where the molecular formula is known but for which the complete structure is not known, often because the precise attachment points of all substituents are not known.
Hence, it is important to not use the check box labeled 'Incompletely Defined' on the Chemical Structure query screen or in the post-search Refine By class option when searching for inorganics with incomplete or non-existent structures. SciFinder programmers should either expand the list of classes shown on the Chemical Structure and 'Refine by' screens or else provide a clearer indication of the impact of checking the IDS box. Another point of confusion is, "What does the check box 'Organics, and others not listed' cover?" It requires trial and error and a careful review of class identifiers to discover that, contrary to expectations, tabular inorganics are included in this "Organics…" check box.
This approach is often used in indexing imprecise organic derivatives, such as polychlorinated biphenyls (PCBs). Unless a fairly small number of specific PCB isomers are specified, the literature reference may well be indexed to the biphenyl registry number [92-52-4] following formats like '92-52-4D, chloro derivs.' or '92-52-4D, polychloro derivs'.
The same technique is used for inorganic substances. CAS may have a Generic Registration registry number assigned to the actual, indeterminate substance, but that CASRN is seldom, if ever, used in actually indexing the literature (see the next subsection).
In SciFinder, these "derivative" or D-suffixed registry numbers can only be searched in the Research Topic query screen. This is most easily explained by way of an example kindly provided by Dana Roth of CalTech in an e-mail to the CHMINF list (Roth 2002). A question arose about finding neodymium derivatives of aluminum yttrium borates (e.g., Al3Y(BO3)4 [13813-76-8]). These substances are sometimes treated as multicomponent salts, rather than tabular inorganics. Searching '13813-76-8D' in Research Topic currently produces 16 literature hits. Due to a flaw in the way hit registry numbers are displayed in the full record, index entries drop the 'D' from the hit RN; it is reinserted in brackets in the examples below for clarity. Sample index entries for these literature records are:
13813-76-8[D] Aluminum yttrium borate (Al3Y(BO3)4), Eu doped
13813-76-8[D], solid solns. with bismuth aluminoborate and cerium aluminoborate
13813-76-8[D], solid solns. with Group IIIB aluminum borates
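Because a mistyped registry number simply returns no hits, it can be worth validating the number before appending the 'D'. The Python sketch below is illustrative: the check-digit rule is the general CAS Registry Number convention rather than something described in this article, the function names are hypothetical, and the query format follows the '13813-76-8D' example above.

```python
import re


def is_valid_casrn(rn: str) -> bool:
    """Check the format and the final check digit of a registry number."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", rn)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(d)
                for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def derivative_query(rn: str) -> str:
    """Build the D-suffixed string for a Research Topic search."""
    if not is_valid_casrn(rn):
        raise ValueError(f"not a valid CAS Registry Number: {rn}")
    return rn + "D"


print(derivative_query("13813-76-8"))   # 13813-76-8D
print(is_valid_casrn("13813-76-9"))     # False (wrong check digit)
```

A valid check digit only shows that the number is well formed; it does not guarantee that the number has been used with a derivative suffix in the literature indexing.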
The text phrases (CAS calls them text modifiers) are not controlled vocabulary like the index terms are. Hence, within this set of 16 literature references, there are at least three variations describing aluminum neodymium borate derivatives of aluminum yttrium borate:
13813-76-8[D], solid solns. with aluminum neodymium borate
13813-76-8[D], solid solns. with neodymium aluminum borate
13813-76-8[D], solid solns. with neodymium aluminoborate
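Since the text modifiers are uncontrolled, one practical approach is to scan them for an element name rather than for an exact phrase. The Python sketch below is hypothetical post-processing of index entries copied out of the answer set, not a SciFinder feature; the entry strings are the samples quoted in this section, with the bracketed 'D' as written here.

```python
import re

ENTRIES = [
    "13813-76-8[D] Aluminum yttrium borate (Al3Y(BO3)4), Eu doped",
    "13813-76-8[D], solid solns. with bismuth aluminoborate and cerium aluminoborate",
    "13813-76-8[D], solid solns. with Group IIIB aluminum borates",
    "13813-76-8[D], solid solns. with aluminum neodymium borate",
    "13813-76-8[D], solid solns. with neodymium aluminum borate",
    "13813-76-8[D], solid solns. with neodymium aluminoborate",
]

# Registry number, the bracketed D, an optional comma, then the modifier.
ENTRY = re.compile(r"^(\d{2,7}-\d{2}-\d)\[D\],?\s*(.*)$")


def modifiers_mentioning(term, entries):
    """Return the text modifiers that contain `term`, ignoring case."""
    found = []
    for entry in entries:
        match = ENTRY.match(entry)
        if match and term.lower() in match.group(2).lower():
            found.append(match.group(2))
    return found


for modifier in modifiers_mentioning("neodymium", ENTRIES):
    print(modifier)
```

This picks up all three neodymium phrasings regardless of word order. Entries that use broader wording and do not name the element still have to be read by eye.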
However, there are also perfectly valid registry numbers for neodymium yttrium aluminum borates in the form of tabular inorganics, for example:
CAS Registry Number: 109165-91-5
Standard MF: Al . B O3 . Nd . Y
Alternate MF: Al3 B4 Nd0-1 O12 Y0-1
CA Index Name: Aluminum neodymium yttrium borate (Al3(Nd,Y)(BO3)4)
Synonyms: Aluminum neodymium yttrium borate (Al3Nd0-1Y0-1(BO3)4); Neodymium yttrium aluminum borate (Nd0-1Y0-1Al3(BO3)4)
CAS Class Identifier: Tabular Inorganic Substance
Composition [tabular inorganic composition table; two components listed with a ratio range of 0 - 1]
Hence, to retrieve all neodymium derivatives of aluminum yttrium borates requires a four-step process:
Whether inorganic substances in a reference are indexed generically as derivatives ('D') or more specifically as tabular inorganics depends primarily on the level of specificity in the reference and on the decisions made by the human indexer. The key point is that the searcher must always take into account the possibility of derivative-based indexing and design a set of strategies to retrieve references of interest regardless of the type of indexing used.
CAS assigns registry numbers to two special classes of substances collectively known as Unknown or Variable Composition or Biological Substances (UVCB). Although a discussion of biological substances is beyond the scope of this paper, the description given below is applicable to any material classed as a UVCB. The two classes that fall under the UVCB category are Registered Concepts and Generic Registrations, described fully in the two subsections below.
Although registry numbers are assigned to these materials, most are seldom, if ever, used in indexing the literature. Hence, they cannot be used reliably to retrieve literature records, regardless of whether few or many references are associated with the substance record. To help alert searchers that these are not "normal" registrations, CAS places an asterisk beside the registry number and displays this note in the detailed record: "* Should be combined with text terms for complete reference search results." This footnote could mislead searchers. One must combine registry numbers with text terms (Research Topic) in the sense of running both strategies independently and merging the results (i.e., a Boolean 'Or').
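The difference between the two readings of "combined" is easy to see with sets. In this Python sketch the reference identifiers are invented placeholders standing in for two independent answer sets; nothing here queries SciFinder.

```python
# Invented reference identifiers for two independently run searches.
by_registry_number = {"ref01", "ref02", "ref03"}
by_research_topic = {"ref02", "ref03", "ref04", "ref05"}

# Intended reading: run both searches and merge them (Boolean OR).
combined = by_registry_number | by_research_topic

# Misleading reading: require both in the same record (Boolean AND).
both = by_registry_number & by_research_topic

print(sorted(combined))   # ['ref01', 'ref02', 'ref03', 'ref04', 'ref05']
print(sorted(both))       # ['ref02', 'ref03']
```

With an 'And', every reference found by only one of the two strategies is lost, which defeats the purpose of running the text search in the first place.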
Many of these substances were part of special registration projects such as the U.S. TSCA inventory. Others are substances that at one time were indexed by CAS as a specific substance. However, they are now indexed as General Subject Headings, and hence, must be retrieved via a Research Topic search. Although organics and biological substances are most commonly found in this category, inorganics also populate the two classes. One of the challenges that confronts searchers is to be aware that these registry numbers might exist and index some of the literature of interest. Often these registry numbers are found serendipitously on an MSDS or other regulatory document.
In STN International, these materials are assigned the Class Identifier '/CTS'. In SciFinder, the detailed record display indicates 'Manual Registration: Concept' below the substance name(s). Old STN documentation states:
"Concepts are those substances with Registry Numbers that normally are not registered by CAS as specific substances. If they appear in CA [the literature database], they are indexed as General Subject Headings [index terms]"(STN International 1993, p. 2.28) .
For example, CAS has assigned a Concept RN to mica-group minerals:
CAS Registry Number: 12001-26-2 *
Standard Molecular Formula: Unspecified
CA Index Name: Mica-group minerals
Synonyms: Mica; 200D; ... Alsibronz 55; Ascoat 30; ...
[Long list of various trademarks, etc. where references did not provide structural information].
CAS Class Identifiers: Manual Registration, Concept
* Should be combined with text terms for complete reference search results.
Literature References: ~760 References
Unlike many UVCB registrations, one can see this CASRN has been used to index about 760 records as of August 2010. However, as suggested by the asterisked footnote, searching this RN will retrieve only a small fraction of the literature references on mica minerals. To do a reasonably comprehensive search, one should search at least 'mica group minerals' or more generally 'mica or micas'. 'Mica group minerals' (General Subject Heading from 1981 forward) retrieves over 27,000 hits while mica(s) ('mica' being the General Subject Heading prior to 1981) retrieves over 64,000 hits.
This example vividly illustrates not only the need for keyword searching, but also the importance of identifying and using CAS controlled vocabulary terms via an examination of SciFinder's 'Analyze by Index Terms' or the print CAS Index Guides to the various Collective Index periods (Chemical Abstracts Service [various dates]).
In STN International, these materials are assigned the Class Identifier '/GRS'. Nearly all these substances are organic. In SciFinder, the full record display indicates 'Manual Registration: Generic Registration' below the substance name(s). Again the most helpful information is found in old STN documentation:
"Generic Registrations are generic derivatives of a specific substance (e.g., chlorinated biphenyl) which have been assigned a Registry Number that is not used in routine CAS indexing. When these substances are encountered in normal CAS indexing, the part of the substance that can be completely defined is indexed (e.g., biphenyl [registry number] for chlorinated biphenyl), along with a phrase describing the type of derivative that is being indexed (e.g., chloro deriv. for chlorinated biphenyl)' The majority of these generic substances [as of 1993] have been registered for the TSCA inventory" (STN International 1993, p. 2.29) .
The indexing process quoted above is exactly what is described in Section VII:A. For inorganics, there is little practical difference between a Generic and a Concept registration, as can be seen by comparing the sample Generic Registration record below with the Concept record in the previous subsection. Neither type of record has a molecular formula or structure diagram.
CAS Registry Number: 1319-43-3 *
Molecular Formula: Unspecified
CA Index Name: Carbonic acid, beryllium salt, basic
CAS Class Identifiers: Manual Registration, Generic Registration
* Should be combined with text terms for complete reference search results.
Literature References: ~0 References
With zero references, searching either 1319-43-3 or 1319-43-3D is useless. One must first identify specific beryllium carbonates which may be salts or tabular inorganics. This should be followed up by a Research Topic search for 'Basic beryllium carbonate' and generally the "as entered" option is best.
SciFinder is a wonderful exploration tool for all things chemical. However, comprehensive retrieval of inorganics requires a fairly extensive knowledge of CAS substance indexing policies and conventions. Many searches also require more than one approach, as demonstrated in this article. To assist searchers, a master table summarizing the strategies described herein is provided in the appendix, "Inorganic Searching Quick Reference Table" (PDF).
Chemical Abstracts Service. 2010b. STN user documentation [Internet]. [cited 2010 August 13]. Available from: http://www.cas.org/support/stngen/stndoc/index.html
Chemical Abstracts Service. [various dates]. Chemical Abstracts index guide. Columbus, OH: Chemical Abstracts Service.
Cooke, H. and Ridley, D.D. 2004. The challenges with substance databases and structure search engines. Australian Journal of Chemistry 57(5):387-392.
Dittmar, P.G., Stobaugh, R.E., and Watson, C.E. 1976. The Chemical Abstracts Service chemical registry system 1: general design. Journal of Chemical Information and Computer Sciences 16(2):111-121.
Haldeman, M., Vieira, B., Winer, F., and Knutsen, L.J.S. 2005. Exploration tools for drug discovery and beyond: applying SciFinder to interdisciplinary research. Current Drug Discovery Technologies 2(2):69-74.
Hawley, G.G. and Lewis, R.J. 2002. Hawley's condensed chemical dictionary. New York: Wiley.
Moulton, C.W. 1993. Composition: a critical property for chemical and material databases. Journal of Chemical Information and Computer Sciences 33(1):27-30.
Ridley, D.D. 2009. Information retrieval: SciFinder. Hoboken, N.J.: Wiley.
Roth, D. 2002. Help with synthesis information [CHMINF-L archived list posting]. [Internet]. [cited 2011 February 17]. Available from: https://listserv.indiana.edu/cgi-bin/wa-iub.exe?A2=ind0202&L=CHMINF-L&P=R2765
Ryan, A.W. and Stobaugh, R.E. 1982. The Chemical Abstracts Service chemical registry system 9: input structure conventions. Journal of Chemical Information and Computer Sciences 22(1):22-28.
Schwall, K. and Zielenbach, K. 2000. SciFinder: a new generation of research tool. Chemical Innovation 30(10):45-50.
Shively, E. and Roth, D. 2008. Substances in CA / nanotoxicology [CHMINF-L archived list posting]. [Internet]. [cited 2011 February 17]. Available from: https://listserv.indiana.edu/cgi-bin/wa-iub.exe?A2=ind0809&L=CHMINF-L&P=R15076
STN International. 1990. Enhancements to substance searching on STN International: enhancements to alloy searching in the CAS Registry File. Columbus, OH: Chemical Abstracts Service. Technical Note No. 90/02.
STN International. 1991. The Registry File database description. Columbus, OH: STN International.
STN International. 1993. REGISTRY File: dictionary searching. Columbus, OH: Chemical Abstracts Service.
Wagner, A.B. 2006. SciFinder Scholar 2006: an empirical analysis of research topic query processing. Journal of Chemical Information and Modeling 46(2):767-774.