Issues in Science and Technology Librarianship
Winter 2018


Using Existing Bibliographic Resources to Compile Faculty Publication Lists: a Case Study from San José State University

Ngoc-Yen Tran
Sciences and Research Impact Librarian

Emily K. Chan
Interim Associate Dean for Research and Scholarship

Dr. Martin Luther King, Jr. Library
San José State University
San José, California

Abstract


With limited campus resources for faculty scholarship, the College of Science (CoS) at San José State University (SJSU) developed scholarly output metrics as a way to add a quantitative component to the distribution of funds, to ensure objectivity, and to reward proven researchers. To support CoS's efforts to identify and quantify science faculty research publications, we compiled a bibliography of science faculty research and scholarship which would be used to develop and formalize baseline research metrics. Using existing and available resources including librarian time, subscribed science databases, and the institution-subscribed reference citation manager, we developed a method by which any library can provide this specialized service.

Introduction


Despite a limited and flat budget, San José State University (SJSU), an M1 Carnegie classification institution, has made a commitment to invest in faculty research, scholarship, and creative activities (RSCA) by developing small grant opportunities both at the University and college levels. At the University level, the Division of Academic Affairs and the CSU Chancellor's Office have made grants available to individual faculty. At the college level, deans have had the purview to distribute these funds in the manner they see fit, usually after faculty, administrator, or committee review of completed proposals. The College of Science (CoS) developed scholarly output metrics as a way to add a quantitative component to the distribution of these funds, to ensure objectivity, and to reward proven researchers. The College sought assistance from the University Library to support its efforts to identify and quantify the research publications of science faculty.

Literature Review

For decades, academic librarians have been tracking faculty scholarship for themselves and for stakeholders such as faculty, administrators, and the general public (McKee & Feng 1979). Depending on the audience and available resources, the output of faculty citations took on a variety of forms. Initially, bibliographies were compiled as printed reports, white papers, or assembled into books (McKee & Feng 1979; Tabaei et al. 2013). Later, with the onset of new technologies, the citations were displayed in a variety of ways, including institutional repositories to promote scholarship; author lists for recognition events; web pages of recent faculty publications; and searchable online databases using Oracle, Access, or MySQL (Armstrong & Stringfellow 2012; Schwartz & Stoffel 2007). For some institutions, these resources facilitated both internal and external discovery of faculty output on university websites (Ram & Paliwal 2016).

The availability of resources and the creation of new technologies also allowed for different ways of generating citation lists, making it easier and less laborious for institutions to track research output. Using computerized searching, McKee and Feng (1979) assembled a printed list of faculty publications by searching the MEDLINE and SCISEARCH databases. Using Web 2.0 technologies, Connor (2008) created a wiki that allowed faculty to contribute to a research and scholarship database. Currently, commercial products are available to help identify, create, update, and display faculty citations in a variety of file formats (Dresbeck 2015).

Depending on the audience, the purposes and uses of faculty scholarship lists have varied. Within the library, the lists have been used as a tool for collection development and general liaison responsibilities (Marsalis & Kelly 2004). Outside of the library, librarians have been providing both print bibliographies and searchable online databases of faculty publications in order to promote the professional services of the library. Beyond library uses, faculty citation lists have been used in internal and external reports (McKee & Feng 1979), for annual recognition events, as a way for researchers to identify potential collaborators or to find student assistants (MacCorkle 1991), and to assess faculty research output. Greater visibility of faculty scholarship can improve the research standing and reputation of the institution.

Librarians are uniquely positioned to facilitate the generation of faculty bibliographies because of their expertise in bibliographic searching, including identifying sources and search terms, and filtering search results in an efficient and accurate manner. Further, librarians' deep understanding of diverse scholarship, indexing, and database architecture can greatly contribute to this process, especially in the collocation and organization of faculty citations.

Background


Some colleges or departments have found it less arduous to disburse the internal grant monies or course buy-outs for RSCA through an application process. For CoS, metrics were developed as a way to dispense these resources in a fair manner and to reward proven researchers. Their quantification process starts with the CoS Associate Dean requesting that full-time faculty who qualify for RSCA grants provide information about their publications, presentations, and grants received. Using a weighted point system, faculty are allocated points to determine an annual score; the most recent five years of values are averaged to determine a final score. When awarding points, the Associate Dean checks each individual faculty-reported citation. It is a process that can take several days to complete because the self-reported data lack uniformity in formatting and completeness; any efforts to enforce compliance with citation style guidelines would likely result in a decreased response rate. However, with an organized and authoritative list of faculty publications culled from online databases, the Associate Dean can more easily ascertain the veracity of the information that faculty provide.
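
The scoring logic described above (weighted points per year, averaged over the most recent five years) can be sketched roughly as follows. This is only an illustration: the article does not publish the actual point values, so the weights, the activity categories as dictionary keys, the function names, and the sample record are all hypothetical.

```python
from statistics import mean

# Hypothetical weights; the actual CoS point values are not given in the article.
WEIGHTS = {"publication": 3.0, "presentation": 1.0, "grant": 2.0}


def annual_score(counts: dict[str, int]) -> float:
    """Weighted sum of one year's self-reported activities."""
    return sum(WEIGHTS[kind] * n for kind, n in counts.items())


def final_score(yearly_counts: dict[int, dict[str, int]], window: int = 5) -> float:
    """Average the annual scores of the most recent `window` reported years."""
    recent_years = sorted(yearly_counts)[-window:]
    return mean(annual_score(yearly_counts[year]) for year in recent_years)


# Hypothetical faculty record: activity counts per year.
faculty_record = {
    2013: {"publication": 1, "presentation": 2, "grant": 0},
    2014: {"publication": 0, "presentation": 1, "grant": 1},
    2015: {"publication": 2, "presentation": 0, "grant": 0},
    2016: {"publication": 1, "presentation": 1, "grant": 1},
    2017: {"publication": 3, "presentation": 2, "grant": 0},
}

print(final_score(faculty_record))  # 6.2 with the hypothetical weights above
```

Note that this sketch averages over the years present in the record; a year with no reported activity would need an explicit entry of zeros to count toward the average.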

Other administrators, including the Provost, saw the potential in establishing and formalizing research management practices. Not only could the compiled list support the disbursement of internal funds and contextualize the research output of an individual faculty member, it could also show trends in research output within a particular department, college, or across the university. Additionally, the information could be useful in comparing one institution's research output against peer or aspirational institutions and improving the research standing and reputation of the University. The information gathered could also be useful in identifying and tracking funding and funding sources for individuals, departments, colleges, and university. Moreover, an updated database of faculty publications could allow for the automatic display of current faculty output on college or university web sites, facilitating greater visibility which could help a faculty member find collaborators, student researchers, or additional funds for their research.

In Spring 2017, the Provost asked all colleges to develop and formalize metrics for quantifying faculty output as a means for RSCA funds allocation. With this request, it was necessary to consider how to support this type of faculty research management across campus departments. Subscriptions to tools such as Scopus and SciVal were considered to aid in compiling and displaying faculty research output. However, with a short deadline and no funds, we, the sciences librarian and the scholarly communications librarian at SJSU, investigated no-cost options.

Methods


In compiling the list of CoS faculty publications from the last five years (2012 to 2017), we began by identifying the departments in the college and determining the corresponding subject or multidisciplinary databases that would yield the highest number of citations or the most relevant results on science topics. In Fall 2017, CoS had 138 grant-eligible faculty in the departments of biological sciences, chemistry, computer science, geology, mathematics and statistics, meteorology and climate science, the Moss Landing and Marine Biology Lab, physics and astronomy, and the science education program. The databases identified included ACM Digital Library, Biological Abstracts, ERIC, Education Research Complete, Geological Society of America, GeoRef, MathSciNet, PubMed, SciFinder, and Web of Science. It is important to note that we consulted only freely available and currently subscribed databases. Due to its lack of an export function and the inability to organize results, we did not use Google Scholar. To assist in the project's accuracy, the CoS Associate Dean provided a list of RSCA-eligible tenured and tenure-track faculty.

Our goal was to search the fewest databases possible and still retrieve 100% of faculty citations. To this end, we developed search strategies for each database using its native search facets and limiters. In considering facets, author institution and publication dates were critical to generating citations for any given time period. Secondarily, we sought to apply a limiter for department or often-used keywords for accuracy. For subject-specific databases that offered the department option, we applied the departmental secondary limiter for more accurate results. For non-subject-specific, multidisciplinary databases, however, including the CoS department was necessary in order to exclude citations by SJSU authors from other colleges (Table 1). For example, in GeoRef, the department limiter was not available, but a simple affiliation search refined to the specified years was sufficient for good results. In Web of Science, the search facets were more complex and included an "address" facet to limit results to CoS departments only.

Table 1: Database facets and rationale

| Primary or secondary facet | Facet | Rationale to use or not to use |
| --- | --- | --- |
| Primary | Author affiliation | Used to limit to San José State University faculty only |
| Primary | Publication date | Used because Associate Dean of College of Science requested citations from the last 5 years (2012-2017) only |
| Secondary | Department | Used in non-subject-specific or multidisciplinary databases (if available) to obtain College of Science results. Can also be used if option is available in subject-specific databases. |
| Secondary | Keywords | Used when department limiter is not available and database coverage is broad |

In instances where the database lacked a limiter for a department and the content was broad, keywords were used to get more accurate results. For example, in ERIC and EBSCO's Education Research Complete, the term "science education" was included to differentiate CoS authors from SJSU authors in the College of Education.

Additionally, we devised search strategies that incorporated the native databases' search parameters and options. For example, in SciFinder, the company name search was applied, refined to the last five years, and further refined to the specific CoS department. Detailed search strategies are available in Table 2.

Table 2: Search strategies

| Database name | Search limiters and terms or search strings |
| --- | --- |
| ACM Digital Library | Author affiliation = "san jose state university" |
| | Publication year = 2012-2017 |
| Biological Abstracts | Address = San Jose State Univ* |
| | Address = "Dept Biol" OR "Dept Chem" OR "Dept Comp Sci" OR "Dept Geol" OR "Dept Math & Stat" OR "Dept Meteorol & Climate Sci" OR "Moss Landing Marine Lab" OR "Phys & Astrono" |
| | Year Published = 2012-2017 |
| Education Research Complete (EBSCO) | Select a field (optional) = "san jose state university" AND "science education" |
| | Publication Date = 2012-2017 |
| ERIC (EBSCO) | Select a field (optional) = "san jose state university" AND "science education" |
| | Publication Date = 2012-2017 |
| Geological Society of America | Author Search = name of faculty from roster |
| | Refine: Date = 01/01/2012-09/06/2017 |
| | (repeat search) |
| GeoRef (ProQuest) | Author affiliation - AF = San Jose State University |
| | Publication date = 2012-2017 |
| MathSciNet | "(Institution=(san jose state university)) AND pubyear in [2012 2017]" |
| PubMed | ((san jose state university[Affiliation]) AND (department of biological sciences[Affiliation] OR department of chemistry[Affiliation] OR department of geology[Affiliation] OR department of physics[Affiliation] OR department of meteorology[Affiliation] OR department of mathematics[Affiliation] OR department of computer science[Affiliation] OR science education[Affiliation] OR moss landing[Affiliation])) AND ("2012"[Date - Publication] : "2017"[Date - Publication]) |
| SciFinder | Company name search: "San Jose State University" |
| | Refine: Date = 2012-2017 |
| | Refine: Company name = department name |
| | (repeat search with every department name) |
| Web of Science | Organization-Enhanced = ("San Jose State University") AND |
| | Address = "Dept Biol" OR "Dept Chem" OR "Dept Comp Sci" OR "Dept Geol" OR "Dept Math & Stat" OR "Dept Meteorol & Climate Sci" OR "Moss Landing Marine Lab" OR "Phys & Astrono" AND |
| | Year Published = 2012-2017 |
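
Long affiliation strings such as the PubMed entry in Table 2 are easy to mistype, so one option is to assemble them from a list of department names. The sketch below reproduces the PubMed string from Table 2; the function name and the list variable are hypothetical, and the output is meant to be pasted into the PubMed search box rather than sent to any API.

```python
def pubmed_affiliation_query(institution, units, start_year, end_year):
    """Build a PubMed affiliation search string in the style of Table 2."""
    unit_clause = " OR ".join(f"{unit}[Affiliation]" for unit in units)
    return (
        f"(({institution}[Affiliation]) AND ({unit_clause})) "
        f'AND ("{start_year}"[Date - Publication] : "{end_year}"[Date - Publication])'
    )


# Department and program terms taken from the PubMed row of Table 2.
COS_UNITS = [
    "department of biological sciences",
    "department of chemistry",
    "department of geology",
    "department of physics",
    "department of meteorology",
    "department of mathematics",
    "department of computer science",
    "science education",
    "moss landing",
]

query = pubmed_affiliation_query("san jose state university", COS_UNITS, 2012, 2017)
print(query)
```

Keeping the department list in one place also makes it simpler to rerun the same strategy for a later date range.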

It is worth noting that with 138 current science faculty authors and the complexity of disambiguating author names across multiple databases, we did not include a facet to limit by author name (Scoville et al. 2003). This strategy ensured that we captured all science faculty citations during the given time period, regardless of their current institution affiliation.

All of the databases offered an option to export results to a .txt or .csv file. To determine the extent of available data, we exported citations in both .txt and .csv file formats and viewed them within Microsoft Excel. We found that metadata richness and formatting for the bibliographic information of greatest interest to us, particularly author names, affiliation, and reprint author data, varied greatly among the native database exports (Table 3). It should also be noted that most databases index only the primary author's affiliation. In these instances, some publications may be missed, but could be accounted for via a Web of Science or PubMed database search.

Table 3: Author and affiliation information by database

| Database | Author name convention | Identifies all author affiliations | Identifies reprint author |
| --- | --- | --- | --- |
| ACM Digital Library | Last Name, First Name MI | No | No |
| Biological Abstracts | Last Name, First Name MI | No | Yes |
| Education Research Complete (EBSCO) | Last Name, First Name Middle Name | No | No |
| ERIC (EBSCO) | Last Name, First Name Middle Name | No | No |
| Geological Society of America | Last Name, First Name MI | No | No |
| GeoRef (ProQuest) | Last Name, First Name MI | No | No |
| MathSciNet | Last Name, First Name MI | No | No |
| PubMed | Last Name, FI.MI | Yes | No |
| SciFinder | Last Name, First Name MI | No | No |
| Web of Science | Last Name, First Name MI | Yes | Yes |

Rather than use data exported directly from the native interfaces, we decided to import the citations into RefWorks, where we could generate more uniform output and could exploit its de-duplication function. RefWorks was chosen because of SJSU's institutional license; other citation manager products could be used, as many offer a de-duplication feature. Before de-duplication, citations from each database were imported into RefWorks and tagged with the name of the source database. Knowing which database generated each citation was important: the tags revealed the overlap among the databases and helped to identify the essential databases that would need to be searched to ensure that all CoS faculty publications were included in the search results.

In RefWorks, we examined each exact and close duplicate to remove identical citations. In determining which citation record to keep for duplicates, we examined the data quality from each database and created rules for selection (Table 4). For example, if there were two records for the same citation -- one from Web of Science and another from Biological Abstracts -- the Web of Science record was selected because full author names were available and the reprint/corresponding/first author was identified.

We recorded the number of citations associated with each searched database before and after the de-duplication process to identify the core databases to search for subsequent updates. Among our searched databases, we found that Web of Science covered 100% of the citations retrieved from PubMed and Biological Abstracts. Web of Science offered the highest number of results with the richest metadata; in the future, we could search Web of Science in lieu of performing individual searches in PubMed and Biological Abstracts. Additionally, we discovered significant redundancies between GeoRef and Geological Society of America, and between ERIC and Education Research Complete. The overlap was not 100%, however, so all four of these databases would be retained for subsequent searches.
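
The overlap check described here can be approximated outside of RefWorks once each citation carries a source tag. The following is a minimal sketch, assuming each record has been reduced to a (database, citation key) pair, where the key is something stable such as a DOI; the helper names and the placeholder keys are hypothetical and are not real citations.

```python
from collections import defaultdict


def keys_by_database(records):
    """Group citation keys by the database they were exported from."""
    by_db = defaultdict(set)
    for database, citation_key in records:
        by_db[database].add(citation_key)
    return by_db


def coverage(by_db, covered, covering):
    """Fraction of `covered`'s citations that also appear in `covering`."""
    if not by_db[covered]:
        return 0.0
    return len(by_db[covered] & by_db[covering]) / len(by_db[covered])


# Placeholder records: (source database tag, citation key).
records = [
    ("Web of Science", "doi:10.1000/a"),
    ("Web of Science", "doi:10.1000/b"),
    ("Web of Science", "doi:10.1000/c"),
    ("PubMed", "doi:10.1000/a"),
    ("PubMed", "doi:10.1000/b"),
    ("GeoRef", "doi:10.1000/c"),
    ("GeoRef", "doi:10.1000/d"),
]

by_db = keys_by_database(records)
print(coverage(by_db, "PubMed", "Web of Science"))  # 1.0: fully covered
print(coverage(by_db, "GeoRef", "Web of Science"))  # 0.5: must still be searched
```

A coverage of 1.0 suggests the covered database could be dropped from later update searches, while anything lower argues for keeping it.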

Table 4: Prioritization list for selecting records when there is a duplicate

| If overlap exists between... | Prioritize... | Rationale |
| --- | --- | --- |
| Web of Science and SciFinder | Web of Science | Web of Science includes author affiliation for all authors, reprint or first author, and full first name of authors; SciFinder does not include reprint or first author |
| Web of Science and PubMed | Web of Science | Web of Science includes author affiliation for all authors, reprint or first author, and full first name of authors; PubMed does not include reprint or first author and provides minimal author name information (first initial of author first name) |
| SciFinder and PubMed | SciFinder | SciFinder includes full first name of authors; PubMed only includes initial of author first name |
| Web of Science and Biological Abstracts | Web of Science | Web of Science includes author affiliation for all authors, reprint or first author, and full first name of authors |
| GSA and GeoRef | GeoRef | GeoRef possessed the greatest number of results (ensuring citation uniformity as much as possible) |
| Biological Abstracts and PubMed | Biological Abstracts | Biological Abstracts includes full first name of author and has reprint or first author; PubMed only includes first initial of author first name |
| MathSciNet and ACM and Web of Science | Web of Science | Web of Science includes author affiliation for all authors, reprint or first author, and full first name of authors |
| SciFinder and MathSciNet | SciFinder | SciFinder includes first author affiliation |
| SciFinder and Biological Abstracts | SciFinder | Higher number of SciFinder results compared to Biological Abstracts (ensuring citation uniformity, as much as possible) |
| ACM Digital Library or MathSciNet | MathSciNet | Periodical title is given in full in MathSciNet |
| Web of Science and GeoRef | Web of Science | Higher number of Web of Science results; no author affiliations in GeoRef |
| GeoRef or SciFinder | SciFinder | SciFinder includes full first name of authors; GeoRef only includes initial of author first name |
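
The pairwise rules in Table 4 can be applied mechanically once duplicates have been grouped. The sketch below flattens the rules into a single ranked list, which is an assumption: the table only states pairwise preferences, and the ordering chosen here is one linear order consistent with all of them. Duplicate detection itself is left to the citation manager; the `key` field, the function names, and the sample records are hypothetical.

```python
# One linear ordering consistent with the pairwise rules in Table 4 (an assumption;
# pairs the table does not mention, such as PubMed vs. MathSciNet, are ordered arbitrarily).
PRIORITY = [
    "Web of Science",
    "SciFinder",
    "Biological Abstracts",
    "PubMed",
    "MathSciNet",
    "ACM Digital Library",
    "GeoRef",
    "Geological Society of America",
]
RANK = {name: position for position, name in enumerate(PRIORITY)}


def pick_record(duplicates):
    """Return the duplicate whose source database ranks highest."""
    # Databases absent from the list (e.g. ERIC) rank after all listed ones.
    return min(duplicates, key=lambda rec: RANK.get(rec["source"], len(PRIORITY)))


def deduplicate(records):
    """Keep one record per citation key, chosen by source priority."""
    groups = {}
    for rec in records:
        groups.setdefault(rec["key"], []).append(rec)
    return [pick_record(group) for group in groups.values()]


# Hypothetical records already matched on a shared citation key.
records = [
    {"key": "k1", "source": "PubMed"},
    {"key": "k1", "source": "Web of Science"},
    {"key": "k2", "source": "Geological Society of America"},
    {"key": "k2", "source": "GeoRef"},
    {"key": "k3", "source": "ERIC"},
]

kept = deduplicate(records)
print([rec["source"] for rec in kept])  # ['Web of Science', 'GeoRef', 'ERIC']
```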

After applying RefWorks' de-duplication tool, all citations were exported to Excel. Unnecessary metadata (e.g., accession numbers) were removed and the spreadsheet was provided to the CoS Associate Dean, along with the list of essential databases searched.
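
If the cleanup is done on a .csv export rather than by hand in Excel, removing unneeded columns takes only a few lines. This is a minimal sketch using Python's standard csv module; the column names and the sample row are hypothetical, since the actual RefWorks export fields are not listed in this article.

```python
import csv
import io

# Hypothetical column name for metadata the recipient does not need.
DROP_COLUMNS = {"Accession Number"}


def strip_columns(csv_text, drop=DROP_COLUMNS):
    """Return the CSV text with the unwanted columns removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept_fields = [field for field in reader.fieldnames if field not in drop]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=kept_fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Made-up sample export with one citation row.
sample = (
    "Authors,Title,Year,Accession Number\n"
    '"Doe, Jane",An example title,2016,WOS:000000\n'
)
cleaned = strip_columns(sample)
print(cleaned)
```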

Results and Discussion

The Associate Dean was pleased with obtaining a list of faculty citations across all discipline areas. At a follow-up meeting to discuss the utility of the provided information, as well as how this process could be streamlined in the future, certain issues were raised: the inclusion of false positives in the data, author disambiguation, and service credit for authors who were affiliated with other institutions prior to their tenure at SJSU.

False positives were difficult to eliminate from the data set. We had intentionally opted to design database search strings that would broadly encompass work produced by SJSU faculty. The secondary limiter to designate a department was only available in select databases (Biological Abstracts, PubMed, SciFinder, and Web of Science); without the department limiter, SJSU faculty citations from outside CoS appeared in the search results. While we could have used the supplied CoS faculty roster to isolate RSCA-eligible citations, doing so raised concerns about unduly removing works authored by students and adjunct faculty. We were fully aware that the citation list included work by students who had likely graduated, faculty who had left the university and were affiliated with other institutions, faculty who had retired, and faculty whose names may have changed. We determined that it was preferable to include these potential outliers in the data set; at the outset of the project, the Associate Dean had assured us that he was willing to invest his time in further refining the list of citations. In the absence of this offer, we could have readily employed more stringent inclusion criteria or cross-referenced our results with the supplied faculty roster. Ultimately, he used our list to ensure accuracy of faculty self-reported data, from groups that did and did not contribute information, and to ascertain the general research productivity across the entire college.

Due to the number of false positives, the Associate Dean inquired why we did not perform a combined institution and author search. We explained how naming conventions varied across the databases: punctuation marks in last names were problematic, middle initials were not consistently applied, and a major database did not index full first names. We could account for these variances through search strategies; however, we could not ensure that retrieval of CoS faculty citations would be 100%. Additional difficulties could arise from name changes and from tracking that information at the individual level. The conversation turned to author disambiguation, and we suggested that, moving forward, it would be in the best interest of the college to have all faculty register for an author identifier through established programs such as ORCID or ResearcherID, and to encourage faculty to actively maintain their publication registries. This would allow the retrieval of faculty citations in a more targeted and precise manner, ensuring greater accuracy in the resulting data sets. Further, ORCID identifiers have been increasingly adopted as a standard metadata field in bibliographic records; however, the likelihood of retrospective application of ORCID iDs to older publications and those published prior to faculty registration would be dependent upon publisher practices and the individual efforts of the faculty member.
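
To illustrate why author-name matching is fragile, the sketch below reduces differently formatted names to a loose comparison key (last name plus first initial). It is not part of the method used in this project; the function name and all example names are hypothetical.

```python
import re
import unicodedata


def name_key(name):
    """Reduce 'Last, First M.' style names to 'last f' for loose matching."""

    def clean(text):
        # Strip accents, lowercase, then drop punctuation such as apostrophes.
        ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()
        return re.sub(r"[^a-z ]", "", ascii_text.lower()).strip()

    last, _, given = name.partition(",")
    last_clean = clean(last).replace(" ", "")
    given_clean = clean(given)
    return f"{last_clean} {given_clean[:1]}".strip()


# Hypothetical variants of one author's name as different databases might export it.
variants = ["O'Brien, Mary K.", "OBrien, M.K.", "Obrien, MK"]
print({name_key(v) for v in variants})  # {'obrien m'}

# The same key also merges different people, which is the core difficulty.
print(name_key("Rose, Alice") == name_key("Rose, Aaron"))  # True
```

A key loose enough to catch every variant also conflates distinct authors, which is why a persistent identifier is the more reliable route.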

Another topic of discussion was retrieval of citations from faculty who had received service credit upon SJSU hire. These faculty members would have been affiliated with other institutions, but any publications produced within the service credit timeline would be admissible in retention, tenure, and promotion dossiers. To date, few faculty have been in this situation, but given the general push toward raising the university's research profile and increasing RSCA activity, this may become a more common occurrence as program growth and development center around key faculty.

Another noteworthy subject was the inclusion of citations that appeared as both conference proceedings and journal articles. For the Associate Dean, format type was important in the RSCA weighting system. We did not eliminate any close duplicate citations for which the major difference was format type. We were aware that some publishers had agreements with established conferences, whereby conference proceedings were repackaged in journal format. It was the purview of the Associate Dean to determine how these citations would be factored into the faculty's productivity measure.

Maintenance of the faculty citation list is a challenge using this method because the results are current only as of the date the search was last performed. However, this challenge is easily remedied: the librarian conducts a search for the timespan not covered by the last search, exports the results into RefWorks, removes duplicates, and provides an updated Excel spreadsheet. Additionally, creating an account within each database and setting up email alerts for new items that fit the search criteria is an effective way to receive updated faculty citations.

Conclusion


The College of Science tasked the Library with compiling a list of its faculty publications to assist with internal distribution of research, scholarship, and creative activity (RSCA) monies and resources. With a short timeline and no funds, we accomplished this using a combination of existing subscriptions to multidisciplinary and subject-specific databases. Instead of using the native database export option, we exported all results to RefWorks to take advantage of its de-duplication tools and to ensure output uniformity. Other reference manager products can deliver very similar results.

Using the methods set forth in this paper and the lessons we have learned, other librarians may offer this type of service to university administration at relatively low cost. The bulk of our invested time (approximately one work week between two librarians) was spent on determining the core databases to search, their degree of overlap, and metadata quality across the various resources. Our project focused on science departments and disciplines, which revere the journal article as the primary metric of scholarly output. Other fields of study with their corresponding indexes and databases would accommodate additional format types.

For readers who are considering offering this type of service, it should be noted that SJSU is a large comprehensive master's level university that generates 300-400 annual citations in Web of Science. Scaling a project of this nature may be difficult, depending on one's university size, areas of expertise, research output, and the availability of indexes that would provide the most comprehensive subject coverage. For institutions that may not be able to afford the tools that provide more robust functionality, this method offers a low-cost option for generating faculty citation lists with nominal ongoing efforts and maintenance. By automating notification of newly published faculty citations with saved alerts, one can continually add manageable numbers of citations to the original list.

Librarians can play a role in further discussions of the evolving nature of scholarly output, the manner in which to best capture that information, and the assessment of new tools or measures for quantifying the impact of faculty research. Librarians can seek out these types of strategic partnerships with campus stakeholders. This project highlighted the value that librarians can bring to retrieving and organizing faculty research output. We were able to leverage our deep knowledge of the databases and demonstrate a high level of understanding of the nuances of the project. In the process, we gained a greater awareness of the research being conducted in the College of Science and a better understanding of the campus' research priorities, and we were exposed to various ways to quantify, promote, and fund RSCA projects. All of these strengthen the library's ability to meet evolving campus research, scholarship, and creative activity needs.

References


Armstrong, M. & Stringfellow, J. 2012. Promoting faculty scholarship through the University Author Recognition Bibliography at Boise State University. The New Review of Academic Librarianship 18(2):165-175. doi: 10.1080/13614533.2012.717901

Connor, E. 2008. Using wiki technology to build a faculty publications database. Journal of Electronic Resources in Medical Libraries 4(4):11-25. doi: 10.1300/J383v04n04_02

Dresbeck, R. 2015. SciVal. Journal of the Medical Library Association 103(3):164-166. doi: 10.3163/1536-5050.103.3.018

MacCorkle, L. 1991. Publishing an annual faculty bibliography at the University of Miami. Information Technology and Libraries 10(2):121-127.

Marsalis, S. & Kelly, J. 2004. Building a RefWorks database of faculty publications as a liaison and collection development tool. Issues in Science and Technology Librarianship 40. doi: 10.5062/F4QZ27WK

McKee, A.M. & Feng, C.C.H. 1979. Using computerized literature searches to produce faculty publications lists. Bulletin of the Medical Library Association 63 (7):333-335. Available from:

Ram, S. & Paliwal, N. 2016. Management of university research publication: a case study of JUIT Publication Database (JPubDB). DESIDOC: Journal of Library & Information Technology 36(4):212-219.

Schwartz, V. & Stoffel, B. 2007. Building an online faculty publications database: an alternative to the institutional repository. College & Undergraduate Libraries 14(3):1-25. doi: 10.1300/J106v14n03_01

Scoville, C.L., Johnson, E.D., & McConnell, A.L. 2003. When A. Rose is not A. Rose: the vagaries of author searching. Medical Reference Services Quarterly 22(4):1-11.

Tabaei, S., Schaffer, Y., McMurray, G. & Simon, B. 2013. Building a faculty publications database: a case study. Public Services Quarterly 9(3):196-209. doi: 10.1080/15228959.2013.816127


This work is licensed under a Creative Commons Attribution 4.0 International License.