Issues in Science and Technology Librarianship
Christie A. Wiley
Engineering Research Data Services Librarian
Erin E. Kerby
Veterinary Medicine Librarian
University of Illinois Urbana-Champaign
The authors conducted six semi-structured focus groups with graduate students and postdoctoral researchers within the College of Engineering at the University of Illinois Urbana-Champaign (UIUC) in order to understand their roles within research groups and their ability to manage research data. More specifically, participants were asked how they manage, organize, and describe data, as well as the challenges they face in these activities. This study revealed that graduate students primarily discuss managing research data in terms of the software they use and that their focus is task specific. Additionally, the language and concepts librarians use in conversations about data management create a barrier to understanding for graduate students. This study confirms that there is a significant disconnect between the faculty members who design and direct research projects and the graduate students and postdoctoral researchers who do the front-line work. It also shows that more data management engagement, interaction, and instruction within research groups is needed. Acknowledging this will allow librarians to develop more meaningful data management instruction and enhance the research data support services provided to faculty.
An increased interest by funding agencies, publishers, and academic institutions has led researchers, and the librarians who support them, to assess their current data management practices. Good data management practices result in organized, documented, and preserved research data, allowing discoverability, accessibility and better understanding of the data by other interested researchers. Data are a currency for connecting people and ideas, and sharing the results benefits the scientific community. This is one more reason numerous funding agencies require data sharing and want to know how researchers are managing their data.
Although faculty may be the visionaries, grant writers, and authors of data management plans, graduate students and postdoctoral researchers are the ones collecting much of the data and managing daily operations in the laboratory. The manner in which they handle the data impacts its quality and usability (Carlson and Bracke 2013). Data management is also important because today's graduate students will become tomorrow's scientists, and it is vital they develop good data management habits early on in their research careers.
During the fall of 2015, the lead author conducted interviews with engineering and science faculty members about their research data management practices and their perspectives on managing research data (Wiley and Mischo 2016). These researchers indicated they oversee between one and twenty graduate students and the workflow within their groups. The present study builds on the previous work, using focus groups to examine the perspectives of the graduate students and postdoctoral researchers within these research groups. Berg and Lune (2012) define a focus group as an extension of the interview, designed to unpack motivations, decisions, and priorities. This study seeks to understand the role of graduate students and postdoctoral researchers within research groups, how they manage, organize, and describe data, and the challenges they face in these activities. This article presents the results of the focus groups and provides insights and comments regarding the role of data within the research process, as well as identifying data training needs and what librarians can do to address these needs.
A review of the current literature on data management demonstrates that researchers' data practices are in great need of attention. Focus groups have revealed a research process that is more imperfect than research lifecycle models purport, and participants describe deficits in their own skills and knowledge (Marcus et al. 2007; McLure et al. 2014; Mattern et al. 2015). In particular, the literature shows that graduate students working in laboratories or on research teams, who are often responsible for managing research data on a day-to-day basis, frequently have no formal training in data management.
Research data management is a complex process that can vary considerably according to a number of factors, including discipline and research method (Akers and Doty 2013; Weller and Monroe-Gulick 2014; Mohr et al. 2015), as well as funding agency and institutional policies (Peters and Dryden 2011; Diekema et al. 2014; Briney et al. 2015). Some studies have indicated that an individual researcher's age and experience directly correlate with their data management practices, often stating that younger researchers are more knowledgeable and more open to sharing data (Piwowar 2011; Buys and Shaw 2015), while one study found the opposite (Tenopir et al. 2011). Another study, however, indicates that the issue is not so straightforward, and librarians should not assume the age or faculty rank of researchers always predicts behavior (Akers and Doty 2012).
Some broad multi-discipline studies at individual institutions demonstrate a general lack of awareness or understanding on the part of graduate students when it comes to data management. Researchers at the University of Massachusetts-Amherst, working under the premise that graduate students need data management training and support just as much as faculty members, conducted a study to assess the needs of graduate students at their institution (Adamick et al. 2013). They used that information to develop a series of training workshops on data storage, sharing and reuse, preservation, metadata, and ethical and legal considerations. Feedback from workshop participants was positive but indicated that there also needs to be discipline-specific instruction. Sharma and Qin (2014) surveyed graduate students at Syracuse University to better understand their awareness of data literacy and found that a large number of participants appeared unaware of some of the basic skills and knowledge needed for effective data management. At Oregon State University, Valentino and Boock (2015) found that graduate students at their institution were willing but generally lacked an understanding of how to go about good data management.
Librarians also have employed case studies to learn more about the data management practices and needs of graduate students in specific fields or disciplines. Carlson et al. (2013) created the Data Information Literacy (DIL) project to explore the data management skills needed by graduate students to fulfill their professional obligations as future scientists in ways that align with disciplinary cultures and practices. The results of these interviews indicated an overall lack of formal training in data management, the absence of formal policies governing data in the lab, self-directed learning through trial and error and a focus on mechanics over concepts. Johnston and Jeffryes (2014) interviewed members of a structural engineering research group at the University of Minnesota, Twin Cities to better understand graduate student behavior. They found the graduate students in the group had no formal training in data management, let alone any training specific to their academic discipline. In a separate case study, Carlson and Bracke (2013) interviewed Purdue graduate students working at a water quality field station about their data practices. They found that the students were hampered by the culture of the discipline they worked in, where there are few accepted norms for managing data effectively.
Comparisons of faculty and graduate students' data management practices and data literacy indicate both similarities and differences between the groups. Weller and Monroe-Gulick (2015) surveyed faculty and graduate students at the University of Kansas. Comparing the two groups revealed that both have an interest in getting assistance with long-term storage, preservation and archiving, and dissemination and publication of their research. Carlson et al. (2011) interviewed faculty members at Purdue University about the skills, knowledge, and training needed by graduate students to effectively manage and curate data. While the interviews focused on the graduate students, the authors note, "Many faculty admitted or otherwise revealed that they themselves lack the expertise or experience with data management, even as they critiqued their students' abilities" (Carlson et al. 2011). Other studies support the idea that faculty members expect that their graduate students have already learned data management skills or will learn them on the job (Carlson et al. 2013; Wiley and Mischo 2016). These studies demonstrate that graduate students, and often the faculty members who oversee them, are not the best judges of their own data management skill level. Furthermore, despite their differences in experience and priorities, both faculty and graduate students likely are in need of similar types of training and support, albeit at different levels.
Postdoctoral researchers often fill a role bridging the gap between faculty and graduate students, yet little research has been conducted that focuses on their experiences with data management. Jahnke et al. (2012) discovered that postdoctoral researchers had little to no formal training in data management practices, a lack of satisfaction with their data expertise, and gave little consideration to the long-term preservation of data. Weller and Monroe-Gulick (2015) attempted to learn more about the data management experiences of postdoctoral researchers in one of their studies but ultimately excluded those participants due to the very limited number of responses.
Ultimately, graduate students seem to compensate for their lack of skill with data management through application of skills learned in methodology courses and through trial and error (Jahnke et al. 2012; Carlson et al. 2013). This supports the idea that both basic data management instruction and discipline-specific or even lab-specific instruction would be beneficial for graduate students. At a more fundamental level, however, there appears to be a significant disconnect between faculty members, postdoctoral researchers, and graduate students regarding their roles and skill levels with data management. This study explores graduate students' and postdoctoral researchers' perspectives in the context of their research workflow and seeks to identify data management services that can better meet local and disciplinary needs of these future researchers.
In the summer of 2016, the authors conducted focus groups with graduate students and postdoctoral researchers from atmospheric science, civil engineering, materials science, and aerospace engineering research groups to explore their experiences with and approaches to managing research data. Using a grounded theory perspective, the authors took a composite realist-constructivist approach to analyzing the data, as described by Barbour (2013), which allowed them to explore both theoretical and disciplinary questions.
The focus group questionnaire was based on one used by Wiley and Mischo (2016) that explored data management practices of atmospheric science and engineering faculty. The authors of the present study revised it to account for the different roles of graduate students and postdoctoral researchers. Additionally, the focus groups were semi-structured to allow for follow-up questions and for participants to direct the flow of conversation depending upon the experiences they related. The topics covered included current research projects, data types, format and description, backup practices, tools or software used, file organization, workflow, and issues and challenges with managing research data (see Appendix). Through follow-up questions, participants were also asked about their perspectives on how long the data were useful, what parts of the data would be important to preserve over time, and who had access to the data during the research project.
The focus groups were recorded and later transcribed by a professional transcriptionist into Microsoft Word documents. The lead author manually applied descriptive coding to the transcripts, using the following terms based on the interview questions: research, primary ways to manage, length of time, description, tools, size, format, metadata, backup practices, preservation, storage, access, challenges, and organization. The second author then coded a subset of the transcripts to check for interrater agreement; where there were differences in coding, the authors discussed them and came to consensus. Finally, the authors used these same terms to identify main ideas or themes to gain additional understanding regarding commonalities and uniqueness among the focus group participants.
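The article does not report which agreement statistic, if any, was computed for the double-coded subset. As one hedged illustration of how agreement between two coders can be quantified, the sketch below computes simple percent agreement and Cohen's kappa. The function names and the six example code assignments are hypothetical and are not taken from the study's transcripts.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of segments to which both coders assigned the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of segments")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    p_observed = percent_agreement(coder_a, coder_b)
    counts_a = Counter(coder_a)
    counts_b = Counter(coder_b)
    # Chance agreement: probability that both coders independently pick the same code.
    p_expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both coders used one identical code throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codes assigned to six transcript segments (illustration only).
first_coder = ["storage", "backup", "metadata", "storage", "access", "backup"]
second_coder = ["storage", "backup", "storage", "storage", "access", "backup"]

print(f"percent agreement: {percent_agreement(first_coder, second_coder):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(first_coder, second_coder):.2f}")
```

Percent agreement is easier to explain, while kappa discounts matches that would be expected by chance when a few codes dominate. Either way, segments on which the coders differ would still be resolved by discussion, as described above.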
Twelve graduate students and three postdoctoral researchers (12 males, 3 females) participated in these focus groups. Participants represented the following departments within the College of Engineering: Materials Sciences (9), Atmospheric Sciences (1), Aerospace Engineering (1), Chemical and Bio-molecular Engineering (1) and Mechanical Engineering (3). From the analysis, the authors identified the following themes: current research roles, managing research data, data format, size, organization, backup practices, description, access, preservation, and challenges, as detailed below.
The authors first asked participants what research project(s) they were currently working on. Their responses included studying single molecule dynamics of branched polymers, lithium batteries, self-heating nuclear fuel, and analyzing metallic glasses. They talked about these projects in terms of the specific parts or tasks that they were each responsible for, rather than how they fit into the research process as a whole. The resulting discussions provided a better understanding of the role of graduate students in the research process and research groups.
The three postdoctoral researchers mentioned collaborations when discussing their current research projects. More specifically, they discussed collaboration with National Aeronautics and Space Administration (NASA) and Research Park. Research Park is a technology hub for startup companies and corporate research and development operations located on the southwest part of the University of Illinois Urbana-Champaign (UIUC) campus. These collaborations appeared to affect several aspects of data management, such as how the data were stored, organized, and accessed, with little uniformity from group to group.
NASA (2011) promotes the full and open sharing of all data with the research and applications communities, private industry, academia, and the general public. In contrast, the Research Park collaboration with a pharmaceutical corporation is characterized by a lack of open sharing of data and results. This is in line with the concerns about corporate data sharing reported by the Future of Privacy Forum (2017), which include privacy, risk of re-identification, diminishing or destroying the intellectual property of data, data provenance, and contractual requirements. It was clear that collaboration with other researchers, both from within and outside of the institution, was an important aspect of the participants' research and data management.
Graduate students and postdoctoral researchers were asked about the primary ways they manage research data. These comments illustrate their experiences:
They discussed data management in terms of how data are stored and transferred, as well as the levels of access to the data. These responses indicate the participants think about managing data as it relates to the research workflow, design, and available software. Their respective faculty members, however, listed data storage, transfer, and access as challenges in managing data (Wiley and Mischo 2016). This indicates a disconnect between the two groups, where a better understanding of each other's challenges and motivations might contribute to a more cohesive approach to data management.
The participants had no experience writing or following a data management plan (DMP), nor were they involved in the funding aspect of these projects. In contrast, engineering faculty are aware of funding agency mandates, have experience writing DMPs, and have experience reviewing DMPs for various funding agencies (Wiley and Mischo 2016). According to the Federal Reporter web site search results for 2016, the National Science Foundation (NSF) awarded grants to five principal investigators affiliated with participants in this focus group. Although DMP mandates have raised awareness of research data management, that does not mean that DMPs have actually made a measurable impact on working practices. While it is not expected that graduate students would participate in writing DMPs, an understanding of how such documents fit into the research process could certainly benefit them.
Participants were asked about the format of their data and to estimate average file size. Text files were the most common output created by their current research projects. The other file formats mentioned were ASCII, JPEG, Comma Separated Value (CSV), and image files. Those from atmospheric sciences use network Common Data Form (netCDF) and Hierarchical Data Format (HDF). Estimates of total file size ranged from 5 gigabytes to 3 terabytes. The formats used by respondents for research data are all preservable, so discussing open file formats appears to not be a high priority in these disciplines.
When asked how they describe and organize their research data, some of the participants indicated they use lab notebooks and others discussed using a personal file structure. Several participants expressed frustration that former group members had left without documenting the work they had completed. In an effort to alleviate this, they created short descriptions of the data, so that research group members would be able to understand the research workflow and what had been done. They also discussed creating plain text files and column headers for organizing data. Overall, graduate students and postdoctoral researchers expressed that it is challenging to consistently organize and describe research data, particularly when working in a team, where individuals have different workflows, practices, and value concepts.
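The participants' practice of pairing short descriptions with plain text files and column headers can be sketched in a few lines. The example below is only an illustration under stated assumptions: the function name `write_description`, the `_README.txt` naming convention, and the layout of the description file are hypothetical and do not represent a tool or format that any participant reported using.

```python
import csv
from pathlib import Path


def write_description(data_file, summary, column_notes):
    """Write a plain-text description file next to a CSV data file.

    column_notes maps each column header to a short explanation
    (for example, the quantity measured and its units).
    """
    data_path = Path(data_file)
    # Read only the header row so the description matches the actual columns.
    with data_path.open(newline="") as handle:
        headers = next(csv.reader(handle))

    lines = [f"File: {data_path.name}", f"Summary: {summary}", "Columns:"]
    for header in headers:
        # Flag any column that nobody has described yet.
        note = column_notes.get(header, "UNDOCUMENTED")
        lines.append(f"  {header}: {note}")

    # Hypothetical naming convention: <data file stem>_README.txt
    readme_path = data_path.with_name(data_path.stem + "_README.txt")
    readme_path.write_text("\n".join(lines) + "\n")
    return readme_path
```

A group member could call `write_description("run01.csv", "Tensile test, sample A", {"strain": "engineering strain (unitless)"})` after each experiment. Any column left out of the notes is marked `UNDOCUMENTED`, which makes gaps visible to whoever inherits the data.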
Participants were also asked about accessing data within their labs and research groups. The responses ranged from no shared access whatsoever, to sharing between research groups by employing user names and passwords, to anyone being able to access the data. Approximately half of the participants indicated that they back up their data in some fashion. Their backup frequency ranged from every week to every couple of months. Participants stated that they use Amazon Cloud and Amazon Prime to back up data, noting that these services automatically do this for them. Some use flash drives to back up data. One participant had no backup practice, trusting instead that they manage their computer very well. Overall, these responses demonstrated the participants use a wide range of methods for backing up data.
As a follow-up question, when time permitted, participants were asked what they believed was the most important part of data preservation. The majority of respondents indicated that the raw data or code was the most important to preserve over time. One respondent indicated that the processed data was the most important and another indicated the simulations were the most important to preserve. These responses illustrate the variability of opinions on preservation and are similar to engineering and atmospheric faculty responses (Wiley and Mischo 2016).
Participants indicated they were struggling with issues concerning data storage, backing up and disseminating data, and organization of research workflow. A materials science graduate student shared this experience working within a research group: "If you somehow transfer scripts or, like, code to other people and they have to understand it, somehow it's quite a problem. I mean long-term over time, it is difficult to access files because of hardware problems; they just break. It is a struggle to think of a long term data solution." Another graduate student expressed concerns about losing data: "Early in the laboratory, a member within the research group's hard drive failed. We lost half of his data. Now our advisor is like, 'We have to have two backups for everything.'"
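The advisor's "two backups for everything" rule can be illustrated with a short script. This is a minimal sketch under stated assumptions, not a description of any research group's actual setup: the function names are hypothetical, and the destination directories stand in for whatever second drive or network share a group has available.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destinations):
    """Copy one file into each destination directory and verify every copy."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for directory in destinations:
        target_dir = Path(directory)
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / source.name
        shutil.copy2(source, target)  # copies contents and file metadata
        # A backup that cannot be read back correctly is not a backup.
        if sha256_of(target) != expected:
            raise IOError(f"checksum mismatch for {target}")
        copies.append(target)
    return copies
```

Passing two destination directories on physically separate devices gives the two copies the advisor asked for, and the checksum comparison guards against the silent hardware failures the participants described.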
Other related challenges had to do with collaboration and description. For example, postdoctoral researchers mentioned collaborations with startup companies, as well as corporate research and development operations, citing occasional difficulty with sharing data outside of the institution. The challenge with data description appears to be a lack of standardization; while respondents in this study indicate they use lab notebooks and short descriptions for files, none of them use established metadata standards to describe data, instead using their own structure if they use one at all.
While the challenges described by the participants were wide-ranging, and some were not even directly related to data management, at the heart of these issues is the need for graduate students and postdoctoral researchers to better understand their role in the research process, how their roles evolve, and how good data management affects them and the research process. General data management instruction can help these future researchers understand the different pieces of the data life cycle, even if they are not directly responsible for some of the pieces, so that they understand how their piece affects the others. More discipline-specific instruction can address issues such as better ways to organize, document and improve research workflow (Frank and Pharo 2016).
The primary goal of this study was to understand the role of graduate students and postdoctoral researchers within a variety of science and engineering research groups. This particular study highlights the disconnect between faculty members and the graduate students and postdocs and shows that while each group has a good grasp of the challenges they face, they need to be more collaborative when trying to address these challenges. Faculty members need to be more deliberate with building data management learning into their group's work and not assume that this knowledge is inherent or will be learned organically.
This study revealed graduate students tend to discuss managing research data in terms of the software they use, and that their focus is singular and task specific. Their focus on singular and specific tasks is expected because they have a limited role in the research process, and are not as experienced as faculty, yet it also presents a challenge because many faculty members believe that graduate students manage research data well. Three of the participants were affiliated with Research Park and their research had ties to corporate affiliation, yet they shared the same challenges as those participants in academic research environments. Results from this study and others indicate that graduate students, in fact, do not necessarily manage research data well (Carlson et al. 2013; Adamick et al. 2013).
While librarians are well-positioned to provide this learning, the language and concepts they use in conversations about data management create a barrier to understanding among graduate students. During the focus groups, participants rarely used words or phrases like "open format" or "metadata," and the term "repository" was not clearly understood. Repositories were referenced in three different ways: as a specific online storage platform used to store code, as a piece of infrastructure provided by Engineering Information Technology support staff for code storage, such as a dedicated server, and as a device such as a personal laptop or flash drive used to store data.
None of the graduate students and postdoctoral researchers were aware the campus provided preservation and data management services. This was not surprising considering they typically do not apply for funding agency grants that require data management plans. Since this study, the lead author has presented instructional seminars on preservation, data management services, national data policies, and publisher requirements to various research groups on campus in order to spread awareness. Future work will include continuing this instruction to research groups, assessing the responses of seminar participants and making adjustments to future seminars as needed.
The focus group responses revealed engineering and science graduate students and postdocs have varying levels of expertise in data management and that the majority use no specific process to organize and document their data. Based on this information, the lead author has created an engineering-focused instructional seminar on the elements of data management, organization, and documentation. At least four instances of this instructional seminar have been delivered, with attendance ranging from six to 43 individuals. A preliminary survey of attendees reveals graduate students and faculty find the seminars useful and are sharing information with other research groups. Consequently, the lead author is considering continuing this study with the mechanical science, industrial engineering and chemistry research groups. The faculty rely heavily upon graduate students and postdocs to organize and store data, create metadata, and carry out the experiments that actually generate the data. Yet this study confirms that graduate students and postdocs struggle with all of these tasks. Many faculty members experience these same challenges (Wiley and Mischo 2016), indicating a need for help in these areas. This study confirms that there is a significant disconnect between faculty and the graduate students and postdoctoral researchers who do the work regarding how data management fits into this workflow.
Finally, the focus groups helped create connections with graduate students and postdoctoral researchers, share knowledge, and learn more about their research workflow. While the goal at the University of Illinois Urbana-Champaign continues to be to talk with as many researchers and groups as possible, the long-term questions remain "Are we doing enough?" and "How do we measure long term success?" Future work will examine how the different aspects of data management are implemented within engineering and science research groups, as well as collaborative efforts between librarians, research data services, and engineering information technology professionals to provide solutions to data management challenges. Since completing the focus groups, the lead author has expanded these conversations to other research groups and is creating supporting services that meet their ever-growing and changing needs.
Adamick, J., Reznik-Zellen, R.C. & Sheridan, M. 2013. Data management training for graduate students at a large research university. Journal of eScience Librarianship 1(3):e1022. doi: 10.7191/jeslib.2012.1022
Akers, K.G. & Doty, J. 2012. Differences among faculty ranks in views on research data management. IASSIST Quarterly 36(2):16-20. Available from: http://www.iassistdata.org/sites/default/files/iqvol36_2_doty.pdf
Berg, B.L. & Lune, H. 2012. Qualitative research methods for the social sciences. 8th ed. Upper Saddle River (NJ): Pearson Education, Inc.
Briney, K., Goben, A. & Zilinski, L. 2015. Do you have an institutional data policy? A review of the current landscape of library data services and institutional data policies. Journal of Librarianship & Scholarly Communication 3(2):1-25. doi: 10.7710/2162-3309.1232
Carlson, J., Fosmire, M., Miller, C.C. & Nelson, M.S. 2011. Determining data information literacy needs: A study of students and research faculty. portal: Libraries and the Academy 11(2):629-657. doi: 10.1353/pla.2011.0022
Carlson, J. & Bracke, M.S. 2013. Data management and sharing from the perspective of graduate students: An examination of the culture and practice at the water quality field station. portal: Libraries & the Academy 13(4):343-361. doi: 10.1353/pla.2013.0034
Carlson, J., Johnston, L., Westra, B. & Nichols, M. 2013. Developing an approach for data management education: A report from the Data Information Literacy Project. The International Journal of Digital Curation 8(1):204-217. doi: 10.2218/ijdc.v8i1.254
Diekema, A.R., Wesolek, A. & Walters, C.D. 2014. The NSF/NIH effect: Surveying the effect of data management requirements on faculty, sponsored programs, and institutional repositories. Journal of Academic Librarianship 40(3-4):322-331. doi: 10.1016/j.acalib.2014.04.010
Future of Privacy Forum. 2017. Understanding corporate data sharing decisions: Practices, Challenges, and Opportunities for Sharing corporate data with researchers. [Internet] https://fpf.org/2017/11/14/understanding-corporate-data-sharing-decisions-practices-challenges-and-opportunities-for-sharing-corporate-data-with-researchers/
Johnston, L. & Jeffryes, J. 2014. Data management skills needed by structural engineering students: Case study at the University of Minnesota. Journal of Professional Issues in Engineering Education and Practice 140(2). doi: 10.1061/(ASCE)EI.1943-5541.0000154
Marcus, C., Ball, S., Deserone, L., Hribar, A. & Loftus, W. 2007. Understanding research behaviors, information resources, and service needs of scientists and graduate students: A study by the University of Minnesota. [Internet] Available from: http://hdl.handle.net/11299/5546
Mattern, E., Wei, J., Daquing, H., Lyon, L. & Brenner, A. 2015. Using Participatory Design and Visual Narrative Inquiry to Investigate Researchers' Data Challenges and Recommendations for Library Research Data Services. Program: Electronic Library and Information Systems 49(4):408-423. doi: 10.1108/PROG-01-2015-0012
McLure, M., Level, A.V., Cranston, C.L., Oehlerts, B. & Culbertson, M. 2014. Data curation: A study of researcher practices and needs. portal: Libraries & the Academy 14(2):139-164. doi: 10.1353/pla.2014.0009
Mohr, A.H., Bishoff, J., Bishoff, C., Braun, S., Storino, C. & Johnston, L.R. 2015. When data is a dirty word: A survey to understand data management needs across diverse research disciplines. Bulletin of the Association for Information Science & Technology 42(1):51-53. doi: 10.1002/bul2.2015.1720420114
Peters, C. & Dryden, A.R. 2011. Assessing the academic library's role in campus-wide research data management: A first step at the University of Houston. Science & Technology Libraries 30(4):387-403. doi: 10.1080/0194262X.2011.626340
Sharma, S. & Qin, J. 2014. Data management: Graduate student's awareness of practices and policies. Proceedings of the Association for Information Science & Technology 51(1):1-3. doi: 10.1002/meet.2014.14505101130
Tenopir, C., Allard, S., Douglass, K., Aydinoglu, A.U., Wu, L., Read, E., Manoff, M. & Frame, M. 2011. Data sharing by scientists: Practices and perceptions. Plos One 6(6). doi: 10.1371/journal.pone.0021101
Valentino, M. & Boock, M. 2015. Data management services in academic libraries: A case study at Oregon State University. Practical Academic Librarianship: The International Journal of the SLA Academic Division 5(2). https://journals.tdl.org/pal/index.php/pal/article/view/7001/6098
Weller, T. & Monroe-Gulick, A. 2014. Understanding methodological and disciplinary differences in the data practices of academic researchers. Library Hi Tech 32(3):467-482. doi: 10.1108/LHT-02-2014-0021
Weller, T. & Monroe-Gulick, A. 2015. Differences in the data practices, challenges, and future needs of graduate students and faculty members. Journal of eScience Librarianship 4(1):e1070. doi: 10.7191/jeslib.2015.1070
This work is licensed under a Creative Commons Attribution 4.0 International License.