Issues in Science and Technology Librarianship
Physics & Astronomy Librarian
Ithaca, New York
eScience Fellow Graduate Student
Syracuse University School of Information Studies
Syracuse, New York
eScience Fellow Graduate Student
Syracuse University School of Information Studies
Syracuse, New York
Research Data & Environmental Sciences Librarian
Ithaca, New York
Research libraries have sought to apply their information management expertise to the management of digital research data. This focus has been spurred in part by the policies of two major funding agencies in the United States, which require grant recipients to make research outputs, including publications and research data, openly available. As many academic libraries are beginning to offer or are already offering assistance in writing and implementing data management plans, it is important to consider how best to support researchers. Our research examined the current data management requirements of major US funding agencies to better understand the requirements facing researchers and the implications for libraries offering data management services.
In the past decade, announcements from two major funding agencies in the United States have brought attention to the value of making publicly funded research openly available. Recipients of National Institutes of Health (NIH) funding have been required to submit the final, peer-reviewed versions of their papers to PubMed Central, where they will be freely accessible on the web (National Institutes of Health 2005; National Institutes of Health 2008). In 2010, the National Science Foundation (NSF) announced that proposals submitted after January 17, 2011 would require a formal data management plan that would address the agency's data sharing policy (National Science Foundation 2010a). Other federal funding agencies besides NSF have expressed their interest in well-managed, publicly available data from scientific research. Representatives from over twenty federal agencies comprised the Interagency Working Group on Digital Data, which produced a report in 2009 calling for the development of a robust framework to accommodate the effective management of scientific data in order to maximize the return on investment of federally funded research (Interagency Working Group on Digital Data 2009). The report also recommended that funding agencies promote data management planning, outlining several key components of a robust data management plan.
Alongside these developments, research libraries have sought to apply their information management expertise to the management of digital research data. A 2007 report by the Association of Research Libraries (ARL) Joint Task Force on Library Support for E-Science recommended that the ARL develop programmatic efforts to inform librarians about issues in data-intensive scientific research (Association of Research Libraries 2007). Since then, there have been a number of reports and papers outlining how the principles of librarianship can be applied to address the management of digital data (Gold 2007a; Gold 2007b; Salo 2010; Steinhart et al. 2008; Brandt 2007) and describing the experiences of information professionals actively involved in e-Science projects (Garritano & Carlson 2009; Steinhart & Lowe 2007; Soehner et al. 2010).
The NSF's announcement about mandatory data management plans for grant applications (National Science Foundation 2010b) provided a catalyst for university libraries to expand their data outreach efforts. Many responded by offering workshops and web sites on how to write an NSF data management plan, as NSF's own documentation was quite general. More funding agencies are expected to require researchers to articulate their plans for managing, preserving, and providing access to their data in the grant application stage. It is important for information professionals to think systematically about how best to support researchers, especially when requirements are vague or unclear.
With this context in mind, we examined the current data management requirements of selected major US funding agencies. We intended to gain a sense of the data management requirements facing researchers and what information and direction funders provided to applicants on how to fulfill those requirements. This paper details our findings and presents implications for libraries offering or seeking to offer data management guidance to researchers.
Our goal was to survey common funding agencies for academics at research institutions. We analyzed data from Cornell University's Office of Sponsored Programs to determine the top funding agencies and ranked them by number of grants awarded. We then removed any funder specific to New York State or to research programs at Cornell that might not be representative of research universities in general. We believe the resulting list of funders is reasonably representative of major research universities (Table 1).
Table 1. Agencies investigated for data management policies
|Funder||Abbreviation||Web site|
|National Science Foundation||NSF||http://www.nsf.gov/|
|National Institutes of Health||NIH||http://www.nih.gov/|
|United States Department of Energy||DoE||http://www.energy.gov/|
|Office of Naval Research||ONR||http://www.onr.navy.mil/|
|United States Department of Education||DoEd||http://www.ed.gov/|
|United States Environmental Protection Agency||EPA||http://www.epa.gov/|
|United States Agency for International Development||USAID||http://www.usaid.gov/|
|National Oceanic and Atmospheric Administration||NOAA||http://www.noaa.gov/|
|American Heart Association||AHA||http://www.heart.org/|
|Alfred P. Sloan Foundation||Sloan Foundation||http://www.sloan.org/|
We then attempted to locate official data management and sharing policies at the agency level and for any major program or unit under each agency. We employed several strategies for locating this information. We searched each funder's web site using phrases such as "data sharing", "data management", or "data policy"; we searched for instructions for grant applicants; and we contacted two agencies directly when we were unable to find any information on the organization's web site. In total, we found 22 policies for the 10 funders.
Next, we established criteria to analyze the policies we found. We used the Digital Curation Centre's analysis (Digital Curation Centre n.d.) of funders' data policies in the UK as a starting point for our rubric. We extended or refined original DCC concepts to arrive at 18 criteria: for instance, we divided the DCC concept of "Published Outputs" ("a policy on published outputs e.g., journal articles and conference papers") into three distinct categories, "Open Access to Publications", "Publication Repository Specified", and "Publication Repository Supported". In this case, we believed it was important to distinguish funders that suggested potential repositories for publications from those that actually supported or maintained repositories. We also added concepts not listed by the DCC to address data embargoes, data and metadata standards, funding for data management, and methods for ensuring compliance with data management requirements. We clustered related elements into categories, to facilitate analysis of trends across policies (Table 2).
Table 2. Data policy elements
|Element||Element Description||Element Category|
|Organizational Data Policy||Funder recommends or requires that a project have a policy for management of research data.||General policy|
|Data Plan for Proposal||Funder recommends or requires a data management plan as part of all research proposals.||General policy|
|Data Timeframe||Funder recommends or requires a particular timeframe for the data management plan to be implemented.||General policy|
|Compliance||Funder monitors or enforces compliance.||General policy|
|Funding||Funder specifies if funding for any aspect of data management can be written into a proposal.||General policy|
|Scope||Whether a funder's data management policy applies to all research data, or only to certain types of data.||General policy|
|Guidance||Whether funder provides guidance for meeting data management requirements.||General policy|
|Data Standards||Funder recommends or requires specific file formats or other standardization of data.||Standards|
|Metadata Standards||Funder recommends or requires use of particular metadata standards.||Standards|
|Data Access||Funder recommends or requires access to data.||Access and preservation|
|Data Embargo||Funder addresses embargo periods for data.||Access and preservation|
|Data Preservation||Funder recommends or requires preservation of data.||Access and preservation|
|Data Center Specified||Funder specifies that data be deposited in a particular data center.||Access and preservation|
|Data Center Supported||Funder supports or maintains a data center for use by funding recipients.||Access and preservation|
|Open Access to Publications||Funder recommends or requires open access to resulting publications.||Publications|
|Publication Repository Specified||Funder specifies that publications are to be deposited in a particular repository.||Publications|
|Publication Repository Supported||Funder supports or maintains a publication repository.||Publications|
|Date of policy||Any date of issue or posting date on the policy.||-|
Once we had conducted a thorough search of all of the web sites of the funders on our list, we analyzed the content of the policy documents we found. We reviewed each document for sections with language relevant to each category. We recorded this information on separate worksheets, one for each policy, noting when no information in a category was found. All of the worksheets contained direct quotes and paraphrased information from each policy as it read at the time of retrieval. For each criterion examined in each policy document, we devised a coding scheme to indicate its level of specificity and whether it was a suggestion or a requirement. The tables and figures included in the subsequent Results and Discussion sections of the paper represent the relevant subsets of our analysis derived from the complete chart; due to size constraints, the complete chart with the coding for all of the policies and all of the data elements is included as an appendix to this paper.
Next, we tallied how often each criterion or a cluster of related criteria were indicated in policy documents as requirements and how often there were clear explanations of expectations for PIs contained within these policy documents. We also looked at subsets of the policies to unearth additional trends, such as comparing agency-wide policy documents with policies for individual units within an agency. From these tallies, we were able to tease out trends and consider these trends in light of greater involvement in research data curation activities by libraries.
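The coding and tallying steps described above can be sketched in a few lines of Python. This is a minimal illustration only: the `Coding` structure, the status labels ("required", "suggested", "absent"), the two sample policies, and their codes are all hypothetical and do not reproduce the worksheets or coding scheme used in the study.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Coding:
    """Hypothetical code for one element in one policy."""
    status: str           # "required", "suggested", or "absent"
    well_described: bool  # True if expectations for PIs are clearly explained


# Hypothetical worksheets: policy name -> {element name -> Coding}.
# Element names follow Table 2; the codes themselves are invented.
worksheets = {
    "Policy A (agency-wide)": {
        "Data Plan for Proposal": Coding("required", True),
        "Data Access": Coding("required", False),
        "Data Standards": Coding("absent", False),
    },
    "Policy B (unit-specific)": {
        "Data Plan for Proposal": Coding("required", True),
        "Data Access": Coding("suggested", True),
        "Data Standards": Coding("required", False),
    },
}


def tally(sheets):
    """Count, per element, how many policies require it and how many describe it well."""
    required = Counter()
    well_described = Counter()
    for codings in sheets.values():
        for element, coding in codings.items():
            if coding.status == "required":
                required[element] += 1
            if coding.status != "absent" and coding.well_described:
                well_described[element] += 1
    return required, well_described


required, well_described = tally(worksheets)
for element, count in required.most_common():
    print(f"{element}: required in {count}, well described in {well_described[element]}")
```

Restricting `worksheets` to only the agency-wide or only the unit-specific entries before calling `tally` gives the kind of subset comparison mentioned above.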
Overall, data policies were missing a significant number of the elements identified in our rubric (Table 3). We found that no single policy addressed all 17 of the elements. Eleven policies addressed fewer than half of the elements, including four of the funders that appeared to have no policy at all -- the United States Environmental Protection Agency (EPA), Office of Naval Research (ONR), American Heart Association, and the Sloan Foundation. The following sections detail several specific trends identified in our analysis.
Table 3. Percent of total data elements addressed by policy. Agency-wide policies are in bold.
|Research Funders||Percent of total elements addressed|
|National Science Foundation (NSF)||53%|
|NSF Basic Research to Enable Agricultural Development (BREAD)||59%|
|NSF Division of Earth Sciences (EAR)||65%|
|NSF Division of Ocean Sciences||59%|
|NSF Integrated Ocean Drilling Program||47%|
|NSF Ocean Acidification Research||59%|
|NSF Office of Polar Programs||59%|
|NSF Engineering Directorate||59%|
|NSF Social Behavioral and Economic Sciences||41%|
|National Institutes of Health (NIH)||82%|
|NIH - Genome-Wide Association Studies (GWAS)||76%|
|NIH - National Human Genome Research Institute||88%|
|United States Department of Agriculture (USDA)||53%|
|National Aeronautics and Space Administration (NASA) - Heliophysics||59%|
|National Aeronautics and Space Administration (NASA) - Earth Sciences||65%|
|United States Department of Energy (DOE)||12%|
|DOE Atmospheric Radiation Measurement Program (ARM)||76%|
|Office of Naval Research (ONR)||0%|
|Office of Naval Research Policy for In Situ Ocean Data (ONR)||35%|
|United States Department of Education (DoEd)||12%|
|United States Environmental Protection Agency (EPA)||0%|
|United States Agency for International Development (USAID)||24%|
|National Oceanic and Atmospheric Administration (NOAA) Climate Observations and Monitoring (COM)||41%|
|National Oceanic and Atmospheric Administration (NOAA) Coastal Ocean Program (COP)||53%|
|American Heart Association||0%|
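The percentages in Table 3 can be read back as element counts. The sketch below assumes the denominator is the 17 elements mentioned in the text and that values were rounded to the nearest whole percent; the element counts are inferred for illustration and are not reported in the source. The function name `percent_addressed` is our own.

```python
TOTAL_ELEMENTS = 17  # assumption: the "17 of the elements" cited in the text


def percent_addressed(n_addressed, total=TOTAL_ELEMENTS):
    """Percent of rubric elements addressed, rounded to a whole percent."""
    return round(100 * n_addressed / total)


# Inferred counts that reproduce several values shown in Table 3.
for n in (0, 2, 9, 15):
    print(f"{n}/{TOTAL_ELEMENTS} elements -> {percent_addressed(n)}%")
```

Under these assumptions, each percentage listed in Table 3 corresponds to a whole number of elements out of 17 (for example, 53% to 9 elements and 88% to 15).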
Data policy parameters are typically general in scope
The data policies we surveyed had more general than specific data requirements. The most commonly found requirements across all policies pertained to general data management activities (Table 4), though these were often well described (Figures 1 and 2). We found that while many policies stated that grant recipients were required to make their research data available and include a data management plan in grant proposals, more focused language, such as how to budget for data management or requirements to conform to standard data and metadata formats, was less frequently included (Table 4).
Table 4. Data management requirements by element
|Data Element||Number of data policies with a requirement for this element|
|Organizational Data Policy||20|
|Data Plan in Proposals||17|
|Access and preservation|
|Data Center Specified||15|
|Data Center Supported||6|
|Open Access to Publications||6|
|Publication Repository Specified||6|
|Publication Repository Supported||3|
Figure 1. Analysis of data policies for general policy parameters
Figure 2. Analysis of data policies for general policy parameters (continued)
Several of the agencies investigated had multiple data policies: one policy was in effect for the entire agency, while units within that agency sometimes had policies that supplemented and/or superseded the agency-wide guidelines. We anticipated that unit-specific policy documents would be more specific and descriptive than agency-wide documents, because attention to data management is already part of the culture of several scientific disciplines and it is probably easier to provide specific data management guidance for smaller, more clearly defined arenas. In general, we did find that unit-specific policies were more detailed and descriptive overall (Tables 5 and 6): they were more likely to specify a particular data center for data publication and to require a timeframe for implementing a data management plan. For instance, the unit-specific data policy for the NSF Basic Research to Enable Agricultural Development (BREAD) states that data "must be released according to the currently accepted community standard ... to public databases (GenBank, if applicable) as soon as they are assembled and the quality checked" (National Science Foundation 2010c).
While more specific guidance (for example, on the topics of data and metadata standards) was present in only a fraction of the unit-specific policies, it was still more common there than in agency-wide policies. The most common requirements of unit-specific policies were similar to the agency-wide requirements (Table 7). These were quite general in scope, such as requiring data sharing, implementing the data management plan within a certain timeframe, and mandating the inclusion of a data management plan in grant proposals.
Table 5. Percent of data element categories well described for unit-specific policies. Refer to Table 2 for categorization of elements.
Table 6. Percent of data element categories well described for agency-wide policies. Refer to Table 2 for categorization of elements.
Table 7. Top requirements mentioned in policies.
|Data Element||Percent of unit-specific policies with a requirement for this element||Rank of element in unit-specific policies||Percent of agency-wide policies with a requirement for this element||Rank of element in agency-wide policies|
|Organizational Data Policy||94%||1||50%||1|
|Data Plan in Proposals||88%||2||30%||3|
|Data Center Specified||81%||3||20%||4|
* Cells with a dash "-" indicate that the element was not ranked in the top five for the corresponding category.
** Data policy requirements had to appear in at least two policies to be included in this table.
Data policies often don't address standards thoroughly
Overall, data policies were not very prescriptive regarding standards for data and metadata. Half of the unit-specific policies required that data be in standard formats, while none of the agency-wide policies did (Figure 3). In the group of policies that did require standards, many did not provide a thorough description of the requirement while some provided a great amount of detail. Some policies mentioned that a PI is required to share data in recognized standard formats, but did not identify which standards are acceptable to use or how to assess existing standards. As an example, the NSF-wide policy states that a data management plan "may include", in addition to other information, "the standards to be used for data and metadata format and content" or explanation in the event that existing standards are "absent or deemed incomplete" (National Science Foundation 2011). There is no further guidance on how to assess standards or a list of acceptable standards. In contrast, the DOE Atmospheric Radiation Measurement data policy lists various standards PIs can use, in some cases indicating the rationale: for example, "NetCDF is the preferred data format because it supports efficient data storage and reliable/robust documentation of the data structure" (DOE n.d.). The policy continues to enumerate acceptable standards for documents, graphics, and movies and provides a list of instrument details to include in metadata (DOE n.d.). Overall, the majority of policies that addressed standards -- either requiring their use or suggesting it -- did refer to both data and metadata standards but there were several policies that addressed one and not the other (Figure 3).
Figure 3. Analysis of data policies for data and metadata standards
Data policies concentrate on access more than preservation
The data suggested there was a greater emphasis on access to research data than on preservation (Figures 4 and 5). Both were mentioned as requirements for PIs quite frequently, but policy documents often had better explanations for how to execute access requirements than preservation requirements. Access requirements were well described a little over half of the time in policies, while preservation requirements were well described a little over a quarter of the time. This apparent emphasis on access is interesting in light of our findings regarding data centers. Quite a few unit-specific policies (and two agency-wide policies) suggested a potential data center where PIs could deposit data (Figures 2 and 3). Fewer policy documents, however, referred to a funder-supported repository for depositing data. For instance, the NASA Heliophysics policy references the National Space Science Data Center (NSSDC), which provides "archiving services available to the NASA Planetary Science, Astrophysics, and Heliophysics programs" (National Aeronautics and Space Administration 2009). In contrast, none of the agency-wide policies referred to such a funder-supported data center, even though half of the policies stated that PIs were required to make research data available to others.
Figure 4. Analysis of data policies for access and preservation
Figure 5. Analysis of data policies for access and preservation (continued)
Publications are infrequently mentioned in data policies
Funders were largely silent about open access to the publications resulting from funded research (Figure 6). Ensuring that publications are freely available was a firm requirement in only a handful of policies. In a little over half of policies where open access to publications was suggested or required, the policy also, at minimum, specified a potential repository for these publications. The agency-wide policy of the National Institutes of Health (National Institutes of Health 2008) was the most explicit on the issue, with clear language about the specific requirement that PIs make publications freely available on the web through PubMed Central. Both of the NIH unit-specific policies we examined referenced this requirement as well.
Figure 6. Analysis of data policies for publications
Our analysis of these policies reveals gaps between data management goals and implementation realities, as policy requirements were vague. Many funders had policies stating that data be made accessible yet did not supply implementation details. Funding for data preservation efforts continues to be an area of concern for both PIs and information professionals, yet we found that detailed language about funding for data management activities was included in only eight of the policies we studied. The following sections outline our conclusions based on the themes present across the data policies that were surveyed and identify opportunities for libraries and librarians to support data management and the implementation of data policies.
Guidance is needed in specific data management areas
In order to realize the IWGDD's vision of leveraging the results of current scientific research to fuel more robust research, it is crucial to ensure that the appropriate data and sufficient documentation are preserved in formats that enable their use by researchers in any domain. We found that data policies infrequently addressed data preservation, data standards, and metadata standards, three key components of the IWGDD's vision. When these elements were addressed by a policy, the language was often vague. This suggests opportunities for information professionals to help researchers identify appropriate data and metadata standards or map out a sustainable data preservation plan. Information professionals may also provide input to funders to craft policies that are more specific.
Data access is another important issue for data management. Building on the work of past scientific research requires the opportunity for other researchers to find the data and make use of it. More than half of the policies that addressed data access also indicated a suitable data center or repository, though fewer funders actually made mention of a supported data repository. Researchers may find it extremely difficult to find a way to share data, especially when they are unsure of where they can deposit their data, or they are not permitted to budget for the infrastructure necessary to provide access. Information professionals might be able to assist researchers in finding appropriate repositories to deposit data for future access (Steinhart & Lowe 2007; Garritano & Carlson 2009; Gabridge 2009). Depending on the area of study, research need, infrastructure available, and specificity of the policy requirement, this can be an institutional or discipline-specific repository. In addition to opportunities for individualized consultation regarding potential data repositories, several academic libraries have produced data management guides with lists of selected data repositories that may fulfill funder mandates to provide access to research data (North Carolina State University Libraries n.d.; University of Oregon Libraries n.d.). Additionally, other initiatives, such as Databib (n.d.) and DataCite (n.d.) provide information professionals with a continually growing database of repositories for research data.
It is critical to recognize the social complexities of data sharing
Researchers may not oppose sharing their data with others if they are first allowed sufficient time to analyze it themselves. More than half of the policies that address data access did address embargoes in some way, though very few described them robustly. Researchers may not be aware of all of their options for sharing data, and so providing information about data embargoes may address some of the hurdles researchers face when considering how to share their data.
Data management requires a great deal of time and planning to do well, and without incentives or mandates, it can be difficult to justify the time spent on it. If funders do not specify how they intend to monitor compliance, it may be difficult to advocate for managing data throughout its lifecycle, even with comprehensive guidelines and clear expectations for researchers. How funders planned to monitor compliance with requirements was addressed in fewer than half of the documents we surveyed. This can be an opportunity for libraries to be proactive and build data registry infrastructure so that researchers can demonstrate compliance with data access requirements; two examples include the University of Melbourne Data Registry (University of Melbourne n.d.) and the DataStaR project from Cornell (Cornell University n.d.).
It is likely that funders will increasingly require that researchers make explicit their plans for managing research data, including provisioning for the storage, preservation, and access to high-quality, well-documented digital data. Some funders may be prescriptive about how grant applicants ought to manage research data, while others will be more ambiguous about how grant applicants can satisfy data management requirements.
As we noted, many of the data policies are more likely to be general rather than specific, and so information professionals have the opportunity to become involved, especially in these specific data management areas. Providing guidance in the selection of appropriate standards is one avenue for involvement; other avenues may include providing documentation on evaluation criteria for metadata and data standards or participating in the development of discipline-specific metadata and data standards. Information professionals have opportunities to become more involved in raising awareness of and participating in data preservation efforts, either at their institution or more broadly, as policy requirements tended to stress access over preservation. Resources are emerging from individual libraries and collaborations to help information professionals and researchers find appropriate data repositories. Further, as some academic libraries develop an institutional response to e-science and data curation, there are opportunities to obtain directed and focused assistance with this planning process (Association of Research Libraries 2011).
Understanding the data management requirements researchers face is key: knowing which requirements are vague or under-supported reveals opportunities for outreach, education, and collaboration with researchers. We hope this investigation of data policies gives both researchers and information professionals a better understanding of the landscape of data management requirements, one they can use, together with available resources, to craft an appropriate response to the growing importance of curating research data.
Association of Research Libraries. 2011. ARL/DLF E-Science Institute. [Internet]. [Cited May 2012]. Available from: http://www.arl.org/rtl/eresearch/escien/escieninstitute/index.shtml
Brandt, D.S. 2007. Librarians as partners in e-research. College & Research Libraries News 68(6). [Internet]. [Cited December 2011]. Available from: http://crln.acrl.org/content/68/6/365.full.pdf+html
Digital Curation Centre. n.d. Overview of funders' data policies. [Internet]. [Cited May 2012]. Available from: http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies
Garritano, J.R. & Carlson, J.R. 2009. A subject librarian's guide to collaborating on e-science projects. Issues in Science and Technology Librarianship. [Internet]. [Cited December 2011]. Available from: http://www.istl.org/09-spring/refereed2.html
Gold, A. 2007a. Cyberinfrastructure, data, and libraries, Part 1. D-Lib Magazine 13(9/10). [Internet]. [Cited December 2011]. Available from: http://www.dlib.org/dlib/september07/gold/09gold-pt1.html
Gold, A. 2007b. Cyberinfrastructure, data, and libraries, Part 2. D-Lib Magazine 13(9/10). [Internet]. [Cited December 2011]. Available from: http://www.dlib.org/dlib/september07/gold/09gold-pt2.html
Interagency Working Group on Digital Data. 2009. Harnessing the Power of Digital Data for Science and Society: Report of the Interagency Working Group on Digital Data to the Committee on Science of the National Science and Technology Council. [Internet]. [Cited December 2011]. Available from: http://www.nitrd.gov/About/Harnessing_Power_Web.pdf
National Aeronautics and Space Administration. 2009. NASA Heliophysics Science Data Management Policy, Version 1.1. [Internet]. [Cited May 2012]. Available from: http://lwsde.gsfc.nasa.gov/Heliophysics_Data_Policy_2009Apr12.pdf
National Institutes of Health. 2005. Policy on Enhancing Public Access to Archived Publications Resulting from NIH-Funded Research. [Internet]. [Cited December 2011]. Available from: http://grants.nih.gov/grants/guide/notice-files/NOT-OD-05-022.html
National Institutes of Health. 2008. Revised Policy on Enhancing Public Access to Archived Publications Resulting from NIH-Funded Research. [Internet]. [Cited December 2011]. Available from: http://grants.nih.gov/grants/guide/notice-files/NOT-OD-08-033.html
National Science Foundation. 2010b. Scientists Seeking NSF Funding Will Soon Be Required to Submit Data Management Plans. [Internet]. [Cited December 2011]. Available from: http://www.nsf.gov/news/news_summ.jsp?cntn_id=116928&org=NSF&from=news
National Science Foundation. 2010c. Basic Research to Enable Agricultural Development (BREAD) Program Solicitation. [Internet]. [Cited May 2012]. Available from: http://www.nsf.gov/pubs/2010/nsf10589/nsf10589.htm
National Science Foundation. 2011. Grant proposal guide: Chapter II - Proposal Preparation Instructions. [Internet]. [Cited May 2012]. Available from: http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/gpg_2.jsp#IIC2j
North Carolina State University Libraries. n.d. Data repositories. [Internet]. [Cited May 2012]. Available from: http://www.lib.ncsu.edu/guides/datamanagement/repos.html
Steinhart, G. & Lowe, B. 2007. Data Curation and Distribution in Support of Cornell University's Upper Susquehanna Agricultural Ecology Program. Proceedings of DigCCurr2007. [Internet]. [Cited December 2011]. Available from: http://hdl.handle.net/1813/7517
Steinhart, G., Saylor, J., Albert, P., Alpi, K., Baxter, P., Brown, E., Chiang, K., Corson-Rikert, J., Hirtle, P., Jenkins, K., Lowe, B., McCue, J., Ruddy, D., Silterra, R., Solla, L., Stewart-Marshall, Z., and Westbrooks, E. L. 2008. Digital Research Curation: Overview of Issues, Current Activities, and Opportunities for the Cornell University Library. [Internet]. [Cited December 2011]. Available from: http://hdl.handle.net/1813/10903
University of Oregon Libraries. n.d. Data repositories. [Internet]. [Cited May 2012]. Available from: http://libweb.uoregon.edu/datamanagement/repositories.html