Issues in Science and Technology Librarianship
Winter 2001

URLs in this document have been updated. Links enclosed in {curly brackets} have been changed. If a replacement link was located, the new URL was added and the link is active; if a new site could not be identified, the broken link was removed.

[Invited article]

CrossRef: A Collaborative Linking Network

Ed Pentz
Executive Director
CrossRef
epentz@crossref.org

Abstract

References are at the heart of scholarly journal publishing and therefore reference links are seen as an essential feature of online scholarly journals. Scholarly publishers created CrossRef, run by the non-profit Publishers International Linking Association, Inc., in order to make broad-based linking efficient and scalable across a wide range of primary publishers, secondary publishers, abstracting and indexing services, and libraries. CrossRef runs a system that enables publishers to assign unique identifiers -- Digital Object Identifiers (DOI) -- to articles and collects standardized metadata so that the identifiers can be retrieved using bibliographic data. Once the DOI for an article is known, a persistent link to the full-text article can be created. CrossRef is a milestone for the scholarly information industry.

Imagine a scholarly journal publisher looking at a set of references at the end of an article and contemplating linking as many of those references as possible to the online full text, wherever it may be. This is no easy task. The publisher must figure out the publisher of the journal from the journal title abbreviation usually present in a reference. Knowing the publisher of the cited article, the publisher of the citing article would need to know if, and how, links can be made to the cited article. This means tracking the linking schemes of potentially hundreds of publishers. However, just knowing the publisher and how to create links to the publisher isn't good enough since the linker also needs to know if the article is available online. Otherwise, the linker would be creating broken links that didn't resolve to an article. For a few references and a few publishers this isn't a problem, but imagine up to 50 references in each article citing articles in journals published by hundreds of different organizations all with different linking schemes. After a few years of grappling with these linking issues, a group of scholarly publishers began to work together and set up the CrossRef organization and system.

Citing articles in references is one of the foundations of the scholarly communication system. With references, authors make explicit links between their research and other articles that may, on the surface, appear unrelated. Eugene Garfield, the founder of ISI, tells us that through references, "authors should formally assert and verify their ideas are original and do not replicate discoveries already reported in the archive" (Garfield 1994). Links enable users to see the body of primary literature as an interconnected collection of articles. The goal is to allow readers to move from a reference to the full text of a cited article in one or two clicks.

With the advent of electronic journals, much attention has been given to multimedia and features like online peer review, online commentary and online communities, but a full system of reference links is an essential feature that has been missing to date. Peter Boyce of the American Astronomical Society (AAS), reporting on the electronic Astrophysical Journal Letters, a pioneering online journal, wrote in 1997, "Reader feedback continues to emphasize the importance of links by which it is possible to retrieve referenced articles" (Boyce 1997).

Because reference linking is so important, publishers of scholarly journals have an economic imperative to provide reference links -- journals without links will be seen as less valuable or useful than those with links. Many online journals have some links and have had them for a number of years. However, most of this linking has been within a very narrow, focused subject area, between large secondary database publishers and large primary publishers, or within proprietary journal systems.

For example, the astronomy literature is very well linked through the Astrophysics Data System, and {HighWire Press} has extensive reference links between HighWire journals. Many publishers link references to PubMed, the online version of Medline. However, direct links between primary publishers and links between secondary services and smaller publishers were missing. Reference linking has been held back by the problems described above as well as by the need for bilateral linking agreements between individual publishers; drafting such agreements is a laborious and time-consuming legal process. To have abundant links, a publisher would have to sign agreements with hundreds of organizations, an unworkable proposition. It is especially difficult for smaller publishers without extensive staffs to participate in reference linking.

Aware of the importance of linking and of the inefficiency of the existing linking system, publishers took the unusual step of cooperating to set up CrossRef, a collaborative reference linking service. Building on the important work of the DOI-X project (Atkins 2000), at the end of 1999 a group of leading scientific, technical, and medical (STM) publishers joined to form the non-profit, independent organization, Publishers International Linking Association, Inc. (PILA), which operates CrossRef.

The PILA Board of Directors includes representatives from AAAS (Science), Academic Press (Harcourt), American Institute of Physics, Association for Computing Machinery, Blackwell Science, Elsevier Science, Institute of Electrical and Electronics Engineers, Kluwer, Nature, Oxford University Press, Springer Verlag, and John Wiley & Sons. Even though CrossRef was incorporated only in January 2000, the CrossRef system went live in June 2000 and there are over 2.5 million articles from 3,400 journals indexed in the CrossRef database. There are well over 60 publishers participating in CrossRef. CrossRef was initially seen as a commercial scientific, technical, and medical (STM) initiative. This perception is incorrect, since over 60% of CrossRef members are non-profit publishers and, while STM was an initial focus, CrossRef now covers all areas of scholarly publishing.

CrossRef functions as a sort of digital switchboard. It holds no full-text content, but enables publishers and other organizations to create links using Digital Object Identifiers (DOI), which are tagged to article metadata supplied by the participating publishers. A researcher clicking on a link will be connected to a page on the publisher's web site showing a full bibliographical citation of the article, and, in most cases, the abstract as well. The format of the link is determined by publisher preference; for example, a CrossRef button, or "Article" in HTML. The reader can then access the full-text article through the appropriate mechanism; subscribers will generally go straight to the text, while others will receive information on access via subscription, document delivery, or pay-per-view.

It is important to note that CrossRef acts "behind-the-scenes" and collects only a minimal amount of bibliographic metadata. Abstracts and full-text articles remain at publishers' sites and access to the material is controlled by publishers' access control systems. This has been referred to as "distributed aggregation." Users who are subscribers to the cited journal will in most cases have their Internet Protocol (IP) address checked and access the full-text content seamlessly. CrossRef is not a search system. End users do not access CrossRef directly; organizations access CrossRef to look up DOIs to create full-text links to scholarly journal articles.

For participating publishers, CrossRef offers three main services: the depositing of article metadata in the CrossRef database, the submission of the references in those articles for the purpose of obtaining their DOIs, and the creation of links using those DOIs.

The first step is for the publisher to obtain a DOI prefix from the International DOI Foundation (IDF). The cost of this service is covered by CrossRef membership. The publisher then submits minimal article metadata (journal title, article title, volume, issue, page, and first author) to the metadata database (MDDB), along with the DOI and URL. Metadata must be submitted as XML conforming to a Document Type Definition (DTD), the specification for which is provided on the CrossRef web site. As part of the submission process, CrossRef registers the article DOI and URL in the central DOI Directory, run by the International DOI Foundation.
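The deposit step can be illustrated with a minimal sketch in Python. It assembles the fields listed above into an XML record. The element names, the function name, and every sample value (including the DOI prefix 10.9999) are hypothetical placeholders; the actual tags are defined by the DTD on the CrossRef web site and are not reproduced here.

```python
import xml.etree.ElementTree as ET


def build_deposit_record(journal_title, article_title, volume, issue,
                         first_page, first_author, doi, url):
    """Return an XML string holding the minimal metadata for one article.

    The element names below are illustrative assumptions, not the tags
    of the real CrossRef deposit DTD.
    """
    article = ET.Element("article")
    fields = {
        "journal_title": journal_title,
        "article_title": article_title,
        "volume": volume,
        "issue": issue,
        "first_page": first_page,
        "first_author": first_author,
        "doi": doi,
        "url": url,
    }
    # One child element per metadata field, in the order listed above.
    for name, value in fields.items():
        ET.SubElement(article, name).text = str(value)
    return ET.tostring(article, encoding="unicode")


# All values here are placeholders chosen for illustration only.
record = build_deposit_record(
    journal_title="Journal of Examples",
    article_title="A Placeholder Article",
    volume="1",
    issue="1",
    first_page="1",
    first_author="Author",
    doi="10.9999/example.2001.0001",
    url="http://publisher.example.org/articles/example.2001.0001",
)
print(record)
```

The point of the sketch is how little is deposited: six bibliographic fields plus the DOI and its URL, with no abstract and no full text.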

The publisher then submits the reference citations contained in each journal article to the Reference Resolver (RR), a front-end component of the MDDB. The RR allows the retrieval of DOIs, enabling the publisher to create links. The format and protocol for these submissions are also covered on the CrossRef web site. The publisher uses the DOI to create a normal DOI link. The DOI is sent to the DOI Directory and automatically resolved to the URL deposited by the publisher.
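The lookup that the Reference Resolver performs can be pictured as a match from bibliographic data to a DOI. The sketch below is an in-memory illustration only: the class name, the choice of matching fields, and the normalization are assumptions, and the real submission format and protocol are those described on the CrossRef web site.

```python
def citation_key(journal_title, volume, first_page, first_author):
    """Normalize the bibliographic fields used for matching (an assumed scheme)."""
    return (
        journal_title.strip().lower(),
        str(volume).strip(),
        str(first_page).strip(),
        first_author.strip().lower(),
    )


class ReferenceResolver:
    """Toy stand-in for a resolver: maps citation data to deposited DOIs."""

    def __init__(self):
        self._index = {}

    def deposit(self, doi, journal_title, volume, first_page, first_author):
        key = citation_key(journal_title, volume, first_page, first_author)
        self._index[key] = doi

    def resolve(self, journal_title, volume, first_page, first_author):
        """Return the DOI for a citation, or None if nothing was deposited."""
        key = citation_key(journal_title, volume, first_page, first_author)
        return self._index.get(key)


resolver = ReferenceResolver()
# Placeholder deposit; the DOI prefix 10.9999 is hypothetical.
resolver.deposit("10.9999/example.2001.0001", "Journal of Examples", 1, 1, "Author")

found = resolver.resolve(" journal of examples ", "1", "1", "AUTHOR")
missing = resolver.resolve("Journal of Examples", 2, 1, "Author")
print(found, missing)
```

A citation with no deposited match returns nothing, so the citing publisher creates no link for it and avoids producing a broken one.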

An example of a DOI is 10.1006/jmbi.1995.2434 -- this is a DOI for an article from Academic Press' Journal of Molecular Biology available on the IDEAL system. "10.1006" is Academic Press' prefix (each publisher has a unique prefix). After the "/" the publisher determines how to identify the article. In this case Academic Press uses a four-letter code for the journal, the year of acceptance and a sequential article number. This DOI as a link would appear as: http://dx.doi.org/10.1006/jmbi.1995.2434. Clicking on the link will take the user to the abstract page on the IDEAL system. Some publishers are using SICIs (ANSI/NISO) or Publisher Item Identifiers (Garson) for their DOIs.
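The structure of the example DOI can be shown in a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the only rule assumed is the one stated above, that the prefix ends at the first "/" and everything after it is chosen by the publisher.

```python
def split_doi(doi):
    """Split a DOI into (prefix, suffix) at the first slash."""
    prefix, separator, suffix = doi.partition("/")
    if not separator or not prefix or not suffix:
        raise ValueError("not a prefix/suffix DOI: %r" % doi)
    return prefix, suffix


def doi_link(doi, resolver="http://dx.doi.org/"):
    """Build the link form of a DOI by prepending the resolver address."""
    split_doi(doi)  # raises ValueError if the DOI is malformed
    return resolver + doi


prefix, suffix = split_doi("10.1006/jmbi.1995.2434")
print(prefix)   # 10.1006
print(suffix)   # jmbi.1995.2434
print(doi_link("10.1006/jmbi.1995.2434"))
# http://dx.doi.org/10.1006/jmbi.1995.2434
```

Splitting only at the first slash leaves the suffix untouched, which matters because the publisher is free to structure it in any way.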

The DOI is a very powerful tool. Reference linking until now has depended largely on algorithmic links, which employ URLs. Since a URL is not a true identifier, but rather a pointer to a location on a particular machine, one can reach a "file not found" error message if the file has been moved. A more serious problem is that this approach, like bilateral agreements, is not scalable; every publisher has to know and track changes in the linking format of every other publisher, which becomes an overwhelming task as linking proliferates.

By taking the standards-based DOI approach, in which a given DOI is always associated with a specific article, CrossRef has removed the need for participants to archive linking schemes. If a publisher changes its URLs, only the central DOI Directory needs to be updated and each DOI will automatically resolve to its new URL. The International DOI Foundation ensures interoperability among different user communities. Through close cooperation with IDF, CrossRef has launched the first large-scale, practical DOI application to address the sophisticated demands of readers of scientific and scholarly journals. CrossRef has also become the first official DOI Registration Agency, granting it the means to assign DOI prefixes to CrossRef members and to register DOIs in the system.
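The benefit of this indirection can be sketched with a toy directory. The class and the example URLs are hypothetical and do not describe how the DOI Directory is implemented; the sketch only shows that a link built from a DOI stays the same when the publisher moves the article.

```python
class DoiDirectory:
    """Toy model of a central directory mapping each DOI to its current URL."""

    def __init__(self):
        self._urls = {}

    def register(self, doi, url):
        self._urls[doi] = url

    def update(self, doi, new_url):
        """Point an already registered DOI at a new location."""
        if doi not in self._urls:
            raise KeyError(doi)
        self._urls[doi] = new_url

    def resolve(self, doi):
        return self._urls[doi]


directory = DoiDirectory()
doi = "10.1006/jmbi.1995.2434"

# Hypothetical publisher URLs, used only to show a relocation.
directory.register(doi, "http://old-host.example.org/jmbi/2434")
before = directory.resolve(doi)

directory.update(doi, "http://new-host.example.org/articles/jmbi.1995.2434")
after = directory.resolve(doi)

print(before)
print(after)
```

The citing publisher stores only the DOI. The single update in the directory is enough for every existing link to reach the new location.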

As a collaborative venture, the success of CrossRef depends on the cooperation of its members. Publishers must be prepared to receive incoming links at the time of metadata submission. They are also expected to maintain the accuracy of their metadata, DOIs, and URLs, and to provide information on article access.

CrossRef membership is open to primary scholarly publishers. However, many other organizations can benefit from using CrossRef to look up DOIs to create links to full-text articles. To fill this need, CrossRef has created the category of Affiliates. Affiliates are non-members, such as secondary database producers, subscription agents, and abstracting and indexing services, that can sign up to use the CrossRef system. More importantly, CrossRef has created a flat-fee model for academic libraries. Library Affiliates pay only $500 per year for unlimited use of CrossRef to look up DOIs for full-text articles. In most cases, libraries will find that DOIs are included in databases and services that they license. The DOIs will be normal DOI links and users can click the links at no charge. A very important principle for CrossRef is that there is no charge for clicking links -- publishers pay to deposit content in the CrossRef system and organizations pay a small fee to retrieve DOIs to create links. The Retrieval Fee is a one-time lookup fee and any number of links can be created and clicked using the DOIs.

CrossRef costs the researcher nothing; its expenses are covered by charges to member publishers for depositing their metadata and retrieving DOIs, and by annual membership fees. There are no charges for clicking on links. Affiliates pay an annual administrative fee and retrieval fees for looking up DOIs. Library Affiliates can pay a flat fee of $500 for unlimited access to DOI lookup. Current fee schedules are posted on the CrossRef web site. CrossRef fees are designed to cover costs and are based on use of the system (so small publishers pay lower fees than larger publishers do). CrossRef itself has no stake in publishers' decisions regarding their charges for content access.

Inevitably, problems unique to the digital realm have arisen. Of most concern to libraries is what is known as the "appropriate copy" issue. An institution may provide access to a given article through more than one source; users must be able to discover which is the "appropriate" copy to use. For example, a library user should not pay for an article at the publisher's web site if it is also available through a library subscription to Ovid or EBSCO Online or in the library's print holdings. The question of how to provide "localized links" so that users can get to appropriate copies has been under discussion for several years. An excellent summary of the issues surrounding reference linking appears in an article written by Priscilla Caplan and William Arms (Caplan & Arms 1999).

In order to move this process along, CrossRef co-sponsored the "Workshop on Localization in Reference Linking" with NISO, DLF, CNRI and IDF (NISO). At this meeting a general architecture for localized linking was outlined and a practical prototype of this type of linking is now being planned. The prototype will involve CrossRef, IDF, DLF, publishers, libraries and others working together. DOIs, metadata and OpenURL are all important parts of the localized linking solution and they all work together. The OpenURL is a protocol for passing metadata between servers and one of the key identifiers recognized by the OpenURL is the DOI. But an OpenURL is still a URL and DOIs and standardized metadata are still essential for linking. CrossRef is committed to working with libraries and others on solutions to these problems.
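How a DOI might travel inside an OpenURL can be sketched as follows. This is an illustration under stated assumptions: the resolver address is hypothetical, and the key names (`id` with a `doi:` value, `genre`) are modelled on early OpenURL drafts rather than taken from this article, so they should be checked against the OpenURL specification before use.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_openurl(resolver_base, doi, **metadata):
    """Build a URL that carries a DOI and optional metadata to a local resolver.

    The key names are assumptions based on early OpenURL drafts.
    """
    params = {"id": "doi:" + doi}
    params.update(metadata)
    # Keep ':' and '/' readable in the query string instead of percent-encoding them.
    return resolver_base + "?" + urlencode(params, safe=":/")


# Hypothetical address of a library's local link resolver.
link = build_openurl(
    "http://resolver.example.edu/menu",
    "10.1006/jmbi.1995.2434",
    genre="article",
)
print(link)

# The receiving server can recover the DOI from the query string.
received = parse_qs(urlparse(link).query)
print(received["id"][0])
```

A local resolver that receives such a link could use the DOI and metadata to decide which copy is appropriate for its own users, which is the role localized linking is meant to play.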

Another major issue is the crucial question of how digital content is to be archived. Here too, CrossRef is seeking the answers we will all need in the years ahead. For example, CrossRef hopes to link to such archiving systems as JSTOR, which scans journal issues, in some cases going back to the 1800s. Assigning DOIs to these older articles means that they can be included in the linking network. When a user can click on a citation to an article from the 19th century and get to the full text online, scholarly communication will truly be transformed.

CrossRef provides the "missing link" in linking, making broad-based linking efficient and manageable for large and small publishers. CrossRef is open to publishers and other organizations to use and can benefit the entire scholarly communications process. Because CrossRef has taken the approach of using open standards it will need to be interoperable with other linking systems. The DOIs and metadata that CrossRef uses lay the groundwork for more sophisticated linking in the future.

We invite you to visit the CrossRef web site at http://www.crossref.org.

Those with further questions are welcome to call Ed Pentz, Executive Director, at 1-781-359-2435, or e-mail him at epentz@crossref.org.

References

Atkins, H. et al. 2000. Reference Linking with DOIs: a Case Study. D-Lib Magazine February 2000. [Online]. Available: http://www.dlib.org/dlib/february00/02risher.html [January 16, 2001].

Boyce, P. 1997. Electronic Publishing: Experience is Telling us Something. Serials Review 23(1): 1-10.

Caplan, P. & Arms, W. 1999. Reference Linking for Journal Articles. D-Lib Magazine July/August. [Online]. Available: http://www.dlib.org/dlib/july99/caplan/07caplan.html [January 16, 2001].

Garfield, E. 1994. The Concept of Citation Indexing. Current Contents. January 3. [Online]. Available: {http://thomsonreuters.com/products_services/science/free/essays/concept_of_citation_indexing/}. [January 16, 2001].

Garson, L. Publisher Item Identifier as a Means of Document Identification. [Online]. Available: {http://pubs.acs.org/journals/pubiden.html}. [January 16, 2001].

NISO/DLF/CrossRef Workshop on Localization in Reference Linking. Meeting Report. [Online]. Available: {http://web.archive.org/web/20071213104035/http://www.niso.org/news/events_workshops/CNRI-mtg.html} [July 24, 2000].
