Tips from the Experts

Using Scopus to Evaluate the Citation Habits of a "Real World" Academic Department (An Update to Prior Research)

Jeremy Cusker
Engineering Library
Cornell University
Ithaca, New York
jpc27@cornell.edu

Abstract

In 2012, this author published a paper describing a method for using the raw data from Web of Science to examine the journals cited by any given group of researchers and then to compare that list to lists of 'top journals' in similar disciplines. The method was not straightforward and required a great deal of effort and spreadsheet work from the user. Since that time, the Scopus database has made available a workflow that vastly simplifies -- and, indeed, improves -- this process. This paper describes a method of using Scopus to generate expanded lists of highly cited journals within specific research communities and to compare them with published lists of top journals for similar disciplines. Finding these top-cited journals in research communities may be useful to librarians making collection development decisions.

Introduction

In 2012, I published a paper in this journal (Cusker 2012) that sought to answer the question "How representative are the citing habits of the specific group of [researchers] that make up an academic department when compared to all such scholars worldwide conducting similar research?" In that paper, I described a method of using Web of Science to extract such information from the raw data of published research papers' citations. I then described how to compare such data to existing top journal lists from ISI (at that time the vendor and publisher of Web of Science and its associated Journal Citation Reports module, both since acquired by Clarivate Analytics) and to other lists of top journals generated by Eigenfactor. The sample case examined -- that of Cornell University's Department of Earth and Atmospheric Science -- yielded an interesting result: The list of journals most frequently cited by the faculty of this department did not correlate with the high-impact journal lists provided by the other sources consulted. This result did not conform to some other comparisons of measurements of journal quality, which showed that such measurements generally tracked together (Blecic 1999; Elkins et al. 2010).

Comparison of the quality of one database to another is of course common in the library science literature, and comparisons of Web of Science to Scopus especially so. There are manifold quantitative comparisons of citation searching and recall, for instance Sarkozy et al. (2015). Likewise, there have been a number of 'global' comparisons of the functionality and usability of these two databases, such as Matsuura and Ogawa (2006, 2008). However, to the best of my ability to determine, relatively few other papers have analyzed the specific use case and methodology I described in my 2012 paper.

Although my 2012 paper has not been heavily cited, presentations of its methodology at a variety of conferences and other venues have been well received. However, as I admitted at the time of its publication, the methodology was vulnerable to a number of possible statistical artifacts. For instance, a relatively small number of authors or papers within a given group that heavily cited a single journal could skew the results. The methodology was also cumbersome, using Web of Science in a way never quite intended, running to no fewer than 14 steps and requiring three illustrated appendices for additional assistance. During the intervening years, I wondered if a more elegant, automated method might be devised to gather similar information. Several other papers have been published using various methodologies to perform this analysis, but they have tended to rely either on very time- and labor-intensive 'manual' compilation of citation records (e.g., Tucker 2013) or else on programmed, scripted harvesting of records from bibliographic databases.

In 2016, Cornell University Library (CUL) secured a license to the Scopus database, a product of Elsevier, and I soon discovered that it had features that easily enabled exactly this type of analysis. Scopus is a large, relational database of citations with a number of features particularly focused on evaluation of the research output of both individuals and institutions (Scopus 2018). It became apparent to me that Scopus could be used to improve upon much of the core functionality of my 2012 paper and that it could do so using very straightforward functions of the system, dispensing with much of the data downloading and spreadsheet work my earlier methodology required.

Methods

To begin with, Scopus has the advantage over Web of Science in searching for works-cited because it enables one to dispense with the cumbersome task of compiling a list of possible current research authors within an academic department (Cusker 2012). Likewise, one can skip all of the steps I first described for downloading citation data and then performing complex spreadsheet work to render it suitable for analysis.

Instead, use this method:

1.) Launch Scopus.
2.) From the Search page (using the default "Documents" tab), enter one's institution name and change the search field to "Affiliation" in the pull-down menu. Most institutions are assigned a unique identifier within Scopus, which can be looked up using other system functions, but in general a plain-text name (e.g., "Cornell University") will be adequate as a starting point.
3.) Add a second Affiliation search field with the Boolean operator "AND" and enter one or more identifying terms for the department one is examining (e.g., "civil and environmental engineering").
4.) Open the "Limits" menu and apply any desired limitations to date range.
5.) Perform the search.
6.) Within the result set, open the Affiliation filter/facet, select one's own institution specifically and click "Limit To". This serves to backstop the initial selection in step 2, ensuring that papers written by authors with Affiliations of a similar name to one's own are excluded from the set.
7.) Some additional refinement or spot-checking using the filters at left may be necessary to ensure that one is looking only, or at least primarily, at papers written by authors in the department of interest. For instance, if one enters "Cornell" and "systems engineering" as Affiliation fields, the result set may include a few papers with multiple authors who each fulfill only one of those criteria (e.g., one author from Cornell, another author from a department of systems engineering at another institution). Librarians will here have to use their own best judgement, possibly using the "Subject Area" and/or "Keyword" filters, or else using "Exclude" to remove specific Authors or Affiliations (and thus, their papers) that one can identify as not being located at one's own institution.
8.) Select "All".
9.) In the additional options menu at the top (marked by three dots, "..."), choose "View References".
10.) A list of top journals cited is thereby generated in the "Source Title" list at left (click "View All" to see all journal titles beyond the top 10).
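For readers who prefer to type a query rather than fill in the search form, the initial search (institution, department and date range) can be approximated in Scopus's Advanced search. The sketch below is only an illustration: the helper name `build_affiliation_query` is hypothetical, and it assumes that the `AFFIL` and `PUBYEAR` field codes behave as described in Scopus's own search documentation. The later steps (limiting by Affiliation, spot-checking and "View References") still have to be carried out in the interface.

```python
def build_affiliation_query(institution, department, start_year=None, end_year=None):
    """Build an advanced-search string for one institution and one department.

    Hypothetical helper: it only assembles text and does not contact Scopus.
    """
    # Two separate AFFIL clauses mirror the two "Affiliation" fields joined by AND.
    parts = [f'AFFIL("{institution}")', f'AFFIL("{department}")']
    # PUBYEAR comparisons are exclusive, so widen the bounds by one year.
    if start_year is not None:
        parts.append(f"PUBYEAR > {start_year - 1}")
    if end_year is not None:
        parts.append(f"PUBYEAR < {end_year + 1}")
    return " AND ".join(parts)


query = build_affiliation_query(
    "Cornell University", "civil and environmental engineering", 2007, 2016
)
print(query)
```

As with the form-based search, the two affiliation clauses are matched independently, so a paper can satisfy them through two different authors. The spot-checking described in the procedure above therefore remains necessary.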

Now, a sub-selection of this list (a top 10, top 15, top 20, etc.) may be easily compared to a variety of other top journal selections.

Results

For the purposes of this paper, I will compare the results generated from following the above-described Scopus procedure with top journal lists from JCR and Eigenfactor for a selection of journals pertaining to civil and environmental engineering.

One complicating factor arose in that JCR and Eigenfactor rankings are generated only for individual years, whereas this Scopus methodology (as well as the earlier, Web of Science-based one from my previous paper) can survey multiple years at the same time. It would be theoretically possible -- but highly labor-intensive -- to collect multiple annual JCR and Eigenfactor rankings and, by averaging the numeric ranks of the journals given, develop a multi-year average. However, in the absence of any straightforward, automated means of doing this, I decided not to attempt it.
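For librarians who do wish to attempt such an average, the arithmetic itself is simple once the annual lists have been collected by hand. The following is a minimal sketch, not something done for this paper: the function name `average_ranks`, the `min_years` threshold and the placeholder journal names are all assumptions for illustration, and the example data are not real JCR or Eigenfactor rankings.

```python
from collections import defaultdict


def average_ranks(rankings_by_year, min_years=1):
    """Average each journal's numeric rank over the years in which it appears.

    rankings_by_year maps a year to a list of titles ordered from rank 1 down.
    Journals ranked in fewer than min_years years are dropped (an assumption;
    other ways of handling missing years are equally defensible).
    """
    ranks = defaultdict(list)
    for ranked_titles in rankings_by_year.values():
        for position, title in enumerate(ranked_titles, start=1):
            ranks[title].append(position)
    averaged = {
        title: sum(positions) / len(positions)
        for title, positions in ranks.items()
        if len(positions) >= min_years
    }
    # Sort by mean rank, breaking ties alphabetically for a stable order.
    return sorted(averaged.items(), key=lambda item: (item[1], item[0]))


# Hypothetical placeholder data only.
example = {
    2014: ["Journal A", "Journal B", "Journal C"],
    2015: ["Journal B", "Journal A", "Journal C"],
    2016: ["Journal B", "Journal C", "Journal A"],
}
multi_year = average_ranks(example, min_years=3)
for title, mean_rank in multi_year:
    print(f"{title}: {mean_rank:.2f}")
```

The labor-intensive part remains gathering and transcribing each year's list; the sketch only shows that the averaging step need not be done in a spreadsheet.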

I did, however, run two versions of the Scopus search for this example, one drawing on paper citations from just 2016 while the other looked at the 10-year span of 2007 through 2016. I was interested to see if an examination of results over a range of years would yield a substantially different result set than one that looked only at a single year.

Journal Rank	Scopus citations method, 2007-2016 ("department of civil and environmental engineering, cornell university"): top-cited journals	Scopus citations method, 2016 ("department of civil and environmental engineering, cornell university"): top-cited journals	JCR, 2016 ("engineering, civil" and "engineering, environmental"): top-cited journals	Eigenfactor, 2016 ("engineering, civil" and "engineering, environmental"): top-cited journals
1	Water Resources Research	Water Resources Research	Applied Catalysis B - Environmental	Environmental Science and Technology
2	Journal of Fluid Mechanics	Environmental Science and Technology	Water Research	Chemical Engineering Journal
3	Environmental Science and Technology	Journal of Hydrology	Chemical Engineering Journal	Journal of Hazardous Materials
4	Applied and Environmental Microbiology	Applied and Environmental Microbiology	Environmental Science and Technology	Water Research
5	Journal of Hydrology	Journal of Fluid Mechanics	Journal of Hazardous Materials	Applied Catalysis B - Environmental
6	Science	Journal of Computational Physics	Computer-Aided Civil and Infrastructure Engineering	Journal of Hydrology
7	Transportation Research Record	Journal of Geophysical Research Atmospheres	Journal of Cleaner Production	Construction and Building Materials
8	Journal of Computational Physics	Transportation Research Part A - Policy and Practice	Environmental Science and Technology Letters	Journal of Cleaner Production
9	Remote Sensing of the Environment	Earthquake Engineering and Structural Dynamics	Environmental Modelling and Software	Energy and Buildings
10	Journal of Water Resources Planning and Management	Physical Review B - Condensed Matter and Materials Physics	Indoor Air	Engineering Structures
11	Coastal Engineering	Physical Review E - Statistical Nonlinear and Soft Matter Physics	Journal of Industrial Ecology	Building and Environment
12	Geophysical Research Letters	Atmospheric Environment	Energy and Buildings	Waste Management
13	Physics of Fluids	Environmental Modelling and Software	Building and Environment	Ecological Engineering
14	Water Research	Journal of Hydraulic Engineering	Waste Management	Environmental Modelling and Software
15	Physical Review Letters	Ecological Economics	Transportation Research Part B - Methodological	International Journal of Greenhouse Gas Control

This examination yields four lists of journals, no two of them alike. Even the use of my methodology for the 10-year period of 2007-2016 versus just 2016 yields two lists with only six journals in common out of the top 15, with only two of those in the same position in both lists: Water Resources Research (the top journal in both lists) and Applied and Environmental Microbiology (the #4 journal in both lists).

The Journal Citation Reports and Eigenfactor lists overlapped more heavily with each other, with ten journals in common in the table above, but none in common positions. And most crucially, the results of my method for either 2007-2016 or 2016 alone had little in common with either list -- two or three titles at most, none in common positions.
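Counting overlaps by eye is error-prone, so a short script can be used to check a comparison of this kind. The sketch below is illustrative only: the function name `compare_ranked_lists` is hypothetical, the two lists are the Scopus columns transcribed from the table above, and titles are matched by exact string, so variant spellings of a journal name would need to be reconciled first.

```python
def compare_ranked_lists(list_a, list_b):
    """Return (titles found in both lists, titles at the same rank in both)."""
    titles_b = set(list_b)
    common = [title for title in list_a if title in titles_b]
    same_position = [a for a, b in zip(list_a, list_b) if a == b]
    return common, same_position


# Transcribed from the two Scopus columns of the table above.
scopus_2007_2016 = [
    "Water Resources Research",
    "Journal of Fluid Mechanics",
    "Environmental Science and Technology",
    "Applied and Environmental Microbiology",
    "Journal of Hydrology",
    "Science",
    "Transportation Research Record",
    "Journal of Computational Physics",
    "Remote Sensing of the Environment",
    "Journal of Water Resources Planning and Management",
    "Coastal Engineering",
    "Geophysical Research Letters",
    "Physics of Fluids",
    "Water Research",
    "Physical Review Letters",
]
scopus_2016 = [
    "Water Resources Research",
    "Environmental Science and Technology",
    "Journal of Hydrology",
    "Applied and Environmental Microbiology",
    "Journal of Fluid Mechanics",
    "Journal of Computational Physics",
    "Journal of Geophysical Research Atmospheres",
    "Transportation Research Part A - Policy and Practice",
    "Earthquake Engineering and Structural Dynamics",
    "Physical Review B - Condensed Matter and Materials Physics",
    "Physical Review E - Statistical Nonlinear and Soft Matter Physics",
    "Atmospheric Environment",
    "Environmental Modelling and Software",
    "Journal of Hydraulic Engineering",
    "Ecological Economics",
]

common, same_position = compare_ranked_lists(scopus_2007_2016, scopus_2016)
print(len(common), "titles in common;", len(same_position), "at the same rank")
```

The same function can be applied to any other pair of columns in the table.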

The upshot of this comparison is that the list of most-cited publications of Cornell University's Department of Civil and Environmental Engineering differs significantly from the top journal lists for the disciplines of civil engineering and environmental engineering provided by JCR and/or Eigenfactor. As before, with my Web of Science methodology, the list of top journals 'locally' differs almost completely from the one for these disciplines as a whole, potentially revealing specific research interests of this department and offering a guide to collection development for the library.

Discussion

The method described here has many general and specific advantages over the prior methodology utilizing Web of Science, as well as a few caveats.

Advantage Over Prior Method: Simpler
As stated above, this method -- provided one has access to the Scopus tool -- is vastly preferable to the Web of Science methodology outlined in my previous paper. It involves working with fully supported, front-end functions of the system rather than a download of raw data and it enables a librarian to skip over a great deal of tedious spreadsheet work and manual counting to render the result 'human-readable.'

Advantage Over Prior Method: Automatically Includes Authors Who Are Not Faculty
Another advantage of this new method is that it includes, by default, authors who are not members of the faculty of a given department. The prior, Web of Science-based methodology took as its first step the construction of an author list that was taken from the departmental directory (Cusker 2012). Technically there is nothing stopping a user of that methodology from including additional names -- for instance graduate students, post-docs, non-faculty researchers and so forth -- but the lists of such personnel are rarely as accessible and complete and the addition of more names simply means more work for the librarian given the old process. Scopus automates and expands the creation of the author name list to reflect, by default, all research authors in a given departmental affiliation, not just faculty.

Advantage Over Prior Method: Not Tied to Specific List of Authors, Especially If Taken Over Many Years
The prior methodology suffered from a potential problem related to the relationship of the author list and the names on said list to the time period examined. If one was looking at more than a few years of coverage, it was almost inevitable that at least one or two faculty would have left the department during that time (and hence their names would likely not appear in the author list, unless one made an effort to research such departures) while other faculty would have joined and yet had fewer total years within which to produce publications, potentially skewing the title list results.

This Scopus process obviates those problems to a large degree, insofar as it identifies institutional affiliation in a single step and can account for the affiliation of all authors in all selected years. That is, it can resolve a research author's affiliation at the time he or she published a given paper and thus include that paper in the final result set, even if the author has since moved on to another institution.

Remaining Difficulties
Despite these improvements, there remain some caveats in this new method. As noted in step 7 above, chief among them is that, even with the postcoordinated search filter for "Affiliation" set properly, it is still not possible to perfectly 'resolve' a single academic department: Some research authors may collaborate with others from the same institution but not in the same department. Some papers may include the same terms for a given department (e.g., "systems engineering") in contexts separate from a reference to the actual department of the same name and so forth. These problems can be in part addressed, at least if the result set is of reasonable size, by a simple examination of the title and/or abstract of the papers and an exclusion of the obviously non-relevant ones. Still, this process is not entirely scalable and one is likely to get at least a few false positive results, with papers authored by individuals at the same institution but not the correct department, program or sub-unit included in the result set.
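If the result set is exported for closer inspection, part of this spot-checking can be roughed out in code. The sketch below rests on a loud assumption: that each exported record carries a single affiliations string in which the individual author affiliations are separated by semicolons. The function name `same_affiliation_match` and the example strings are hypothetical. It flags only whether the institution and the department terms occur within one and the same affiliation entry, so it cannot replace a reading of titles and abstracts.

```python
def same_affiliation_match(affiliations, institution, department, separator=";"):
    """Return True if a single affiliation entry names both institution and department."""
    institution = institution.lower()
    department = department.lower()
    for entry in affiliations.split(separator):
        text = entry.lower()
        if institution in text and department in text:
            return True
    return False


# Hypothetical affiliation strings for illustration only.
single_department = (
    "Department of Systems Engineering, Cornell University, Ithaca, NY, United States"
)
split_across_authors = (
    "Department of Physics, Cornell University, Ithaca, NY, United States; "
    "Department of Systems Engineering, Example University, Example City"
)

print(same_affiliation_match(single_department, "Cornell University", "systems engineering"))
print(same_affiliation_match(split_across_authors, "Cornell University", "systems engineering"))
```

Records for which the check returns False are candidates for exclusion, but they should be reviewed rather than dropped automatically, since departments are not always named consistently in affiliation strings.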

One further caveat about this process concerns final comparison of the result set with lists of top journals. For most academic departments, it is possible to find a top journal list corresponding to the academic discipline in which they specialize. However, for some of the newest and most innovative academic programs, as well as for research 'programs' (meaning groupings of faculty or researchers with primary appointments in multiple different regular departments) and for research centers, specifically analogous title lists may not exist.

This can occur in two ways. First, there may simply be no analogous discipline for which a top journal list exists (e.g., "nanotechnology"). Alternatively, a given discipline -- and its instantiation as an actual academic department -- may have many sub-specialties. For instance, many universities have a department of "materials science" but a given department may include specialists -- or even exclusively concentrate -- in metals, polymers, "forest products" (wood, paper and cellulose), concrete or more-exotic applications such as biomedical materials. This may make the journals most cited by research authors in a given department different from the top journal list for the discipline as a whole.

This may in fact be a relevant and useful finding: If one does not already know the areas of focus for an academic department or program, then finding that the journals they cite skew heavily toward one area of research relative to the field as a whole may well be considered valuable information.

Conclusion

I would argue that the methods of gathering information about what journals are truly important at a given institution may be generally ranked as follows, from least- to most-informative:

1.) Journal metrics (e.g., Impact Factor) and corresponding lists of top journals (e.g., JCR, Eigenfactor). These metrics are simply too generalized and are generated by an aggregate of too broad an array of institutions and individuals.

2.) Surveys of faculty at one's institution on their reading habits and/or general opinion on the importance of various journals. User surveys can be informative, but in general are difficult to design well, often have very low response rates, and the respondents often give 'motivated' replies, claiming they use huge numbers of journals in their own research when in fact they actually refer to very few.

3.) Standardized usage statistics (COUNTER reports or a few other types; see below). These statistics 'do not lie' in the sense that they are generally accurate, but on the other hand, clicks are cheap: Individuals browsing online collections may click on results that seem superficially interesting but that they then find to be irrelevant to their interests and which they do not incorporate into their own research.

Some studies have claimed a relatively strong link between usage statistics and citation (Duy and Vaughan 2006), but others note that even when congruent within specific disciplines, these journal lists may nevertheless be at variance with a library's own usage statistics (Schloegl and Gorraiz 2010), with journals that have a high JR1 score in a COUNTER report nevertheless being cited by faculty relatively few times. This difference may be due to a variety of factors. A given journal may be of great general interest (e.g., Nature, Science, etc.) but be infrequently cited, or a given journal may be of interest to a population that does not author much research (e.g., undergraduates).

Another potential source of usage data for journals, which could be fruitfully compared to the findings of top journal citations, is that provided by some citation management software. For instance, Thelwall (2017) investigated this at the article level using data from Elsevier's Mendeley software. He noted that some articles were highly read but infrequently cited (and vice versa), mostly attributing this discrepancy to different 'communities' of research readers versus research authors (and hence, 'citers'), but he said that both data sources should be considered.

4.) Analysis of what faculty and other research authors at a given institution are actually citing.¹

Whether these analyses and metrics should form the basis of collection development decisions -- what journals to cut, keep or obtain -- for a given library is a separate discussion. The primary focus of this research -- and my earlier study -- is to give a librarian insight into the specific research interests of a given department. Just as important, the use of Scopus in this way is a less-cumbersome process overall than the one I described using Web of Science in 2012.

Notes

¹ Note that all of this is premised upon the assumption that faculty actually read the papers that they cite. In discussions of this method of analysis with researchers, some half-joked that the author of a paper may cite a variety of other papers that he or she is familiar with but has not truly read. Authors may do this to acknowledge a colleague, mentor or former advisor. Or, less benignly, they may cite a journal editor's papers in an attempt to curry favor. In any case, such behavior does represent at least a minor caveat to this methodology.

References

Blecic, D.D. 1999. Measurements of journal use: An analysis of the correlations between 3 methods. Bulletin of the Medical Library Association 87(1):20-25.

Cusker, J. 2012. Using ISI Web of Science to compare top-ranked journals to the citation habits of a "real world" academic department. Issues in Science and Technology Librarianship, Summer. DOI: 10.5062/F40V89RB

Duy, J. & Vaughan, L. 2006. Can electronic journal usage data replace citation data as a measure of journal use? An empirical examination. The Journal of Academic Librarianship 32(5):512-517. DOI: 10.1016/J.ACALIB.2006.05.005

Elkins, M.R., Maher, C.G., Herbert, R.D., Mosley, A. & Sherrington, C. 2010. Correlations between the Journal Impact Factor and three other journal citation indices. Scientometrics 85(1):81-93. DOI: 10.1007/s11192-010-0262-0

Elsevier Scopus. 2018. Scopus. [Internet] [Cited 2018 April 7]. Available from http://www.scopus.com.

Matsuura, C. & Ogawa, K. 2006. Comparison and effectiveness of citation databases in life science field (Part 1): Web of Science vs. Scopus. Journal Of Information Processing & Management / Joho Kanri 51(6):408-417. (Note: English-language abstract only)

Matsuura, C. & Ogawa, K. 2008. Comparison and effectiveness of citation databases in life science field (Part 2): Web of Science vs. Scopus. Journal Of Information Processing & Management / Joho Kanri 51(7):499-510. (Note: English-language abstract only)

Sarkozy, A., Slyman, A. & Wu, W. 2015. Capturing citation activity in three health sciences departments: A comparison study of Scopus and Web of Science. Medical Reference Services Quarterly 34(2):190-201. DOI: 10.1080/02763869.2015.1019747

Schloegl, C. & Gorraiz, J. 2010. Comparison of citation and usage indicators: The case of oncology journals. Scientometrics 82(3):567-580. DOI: 10.1007/s11192-010-0172-1

Thelwall, M. 2017. Why do papers have many Mendeley readers but few Scopus-indexed citations and vice versa? Journal of Librarianship & Information Science 49(2):144-151. DOI: 10.1177/0961000615594867

Tucker, C. 2013. Analyzing faculty citations for effective collection management decisions. Library Collections, Acquisitions, and Technical Services 37(1-2):19-33. DOI: 10.1016/J.LCATS.2013.06.001

This work is licensed under a Creative Commons Attribution 4.0 International License.

Issues in Science and Technology Librarianship		Spring 2018
DOI:10.5062/F4XP735H