Issues in Science and Technology Librarianship
This case study describes an information literacy lab for an undergraduate biology course that leads students through a range of resources to discover aspects of genetic information. The lab provides over 560 students per semester with the opportunity for hands-on exploration of resources in steps that simulate the pathways of higher-level researchers; the structured assignment done in the lab highlights useful aspects of each resource; and a major poster presentation assignment students do after the lab consolidates and reinforces their understanding of the nature of genetic information. Close collaboration with the biology instructor has resulted in a detailed laboratory exercise that supports exploration of advanced information tools through focused questions. The exercise, as befits its subject, is constantly evolving, both in response to changes to the available resources, and to student feedback. The case study presents in detail the step-by-step exercise students work through. The steps are provided as a series of tool-specific modules that other librarians could use individually or as a set. The advantages of using the resources together come from the integration of the resources themselves, and the way they demonstrate the synergies and relationships between data, articles and patents, which are all key sources of information in the sciences.
The explosion in available information about genes, human and otherwise, and the expanding palette of resources available to researchers, have created a rich information environment for researchers, graduate students and, increasingly, undergraduate students. In order to familiarize biology students with the ways of learning and knowing practiced in genetics and related disciplines, it is necessary to provide opportunities for them to explore and use the current, cutting-edge tools. Only then can they go beyond learning about genetics to learning how to approach questions and problems as geneticists -- in short, to join the community of practice. While the tools themselves are sophisticated, and provide access to an enormous and expanding amount of information, they are sufficiently well-designed to allow novice users to gain an understanding of them under the right conditions. This case study will describe how some of those conditions are set in place. The overriding key to introducing these resources effectively is a strong collaboration between the librarian and the instructor so that information in the library lab session is linked directly to the assignment and to the material from lectures. Procedurally, three factors are key to successful implementation: a structured demonstration focused on key aspects of each resource and progressing from familiar bibliographic databases to the unfamiliar terrain of gene and protein data; a laboratory exercise that permits some exploration while requiring specific data from each source; and a follow-up assignment requiring deeper critical thinking, further independent exploration of the resources, and synthesis and analysis of information to consolidate learning.
The case study is offered as a guide to others who want to integrate genetic databases into IL instruction at the undergraduate level -- each resource could be shown on its own, but one of the strengths of the session is the integration of all the resources so that students build familiarity with searching and terminology that transfers across resources.
There is widespread acknowledgement that students should become familiar with genetic and other bioinformatics information resources as part of the undergraduate curriculum (Adams 2009; Bednarski et al. 2005; Boyle 2004; Dinkelman 2007; Pham et al. 2008; Tennant & Miyamoto 2002). Indeed, the American Society for Biochemistry and Molecular Biology (ASBMB) has recommended that the use of such resources be a core competency for students in undergraduate biochemistry and molecular biology programs (Boyle 2004; Voet et al. 2003). Dymond et al. (2009) concluded that having undergraduate biology students participate in cutting-edge interdisciplinary research such as bioinformatics fosters a greater understanding of genes and chromosomes and better prepares them for a career in the life sciences. Miskowski et al. (2007) contend that the emergence of bioinformatics has not only had a significant impact on how biological research is conducted but has also influenced the types of questions that can be asked. MacMullen and Denn (2005) summarized some of the ways information researchers and librarians can support molecular biologists, such as assisting with integrating data and literature and locating myriad sources of bioinformatics information. Feig and Jabri (2002) encourage faculty to incorporate these resources into the curriculum, using data-mining exercises such as the one described below to introduce students to the common databases and tools that take advantage of this vast repository of biochemical information.
Similarly, there is ample information to support the integration of genetic data and similar resources into librarians' skill sets. Much of the literature on this subject speaks to supporting faculty and researchers in their work (Brown 2005; Geer 2006; Tennant 2005). There have been a number of introductions to bioinformatics resources for librarians, notably the March 2005 issue of the Journal of the American Society for Information Science and Technology, which includes a primer on the discipline (Rapp & Wheeler 2005); a brief review of bioinformatics resources on the web in the July/August 2008 issue of College & Research Libraries News (O'Grady 2008); and a special issue of the Journal of the Medical Library Association in July 2006, which provided an overview of libraries that provide bioinformatics support at academic institutions. Tennant and Lyon (2007) provide an extensive overview of NCBI's Entrez Gene database, a core, integrated online resource for gene and protein information that was used extensively in this study.
Biology 311 - Principles of Genetics is a required course for biology majors at the University of Calgary. It enrolls approximately 560 students, who usually take it in their second year. By this time, students will have had at least one information literacy workshop in their first-year biology course that covers an introduction to scientific literature, BIOSIS Previews, the library catalogue, and Internet resources. The course covers topics such as Mendelian inheritance, sex determination, changes in chromosome structure, molecular genetics, genetics of bacteria and viruses, and the structure and function of genetic material. The information literacy session is offered during week 11 of the Fall semester, when students have had some exposure to course content and key vocabulary, and are familiar with the role, structure and function of proteins and genes. The session is delivered during a scheduled three-hour lab session, in a classroom in the library where students can work using a desktop or their own laptop computers. The students meet in their lab before coming to the library for the 90-minute session. There is extra time available at the conclusion of the session for students wishing to stay to work on their assignment or obtain help with any questions they might have. Typically each library session is given to three lab sections at a time, so there are usually 72 students in the 50-station classroom, with three lab instructors, the librarian, and often one or two other librarians or library staff to support the students. Students usually work in pairs with their partners. For each resource, the lab follows a pattern of demonstration, practice, and discussion. For the lab exercise each pair of students selects a genetic disease, identifies a gene principally responsible for the disease, and locates both the specific pieces of information needed to complete the exercise and resources, such as patents and articles, that they can use for the poster presentation assignment.
Most students have little difficulty completing the exercises within the allotted time, and they also obtain the majority of the information they need for the posters within the scheduled 90 minutes. The lab is supported by a workbook with detailed information and screenshots, and an online worksheet with live links that students can use to follow the demonstrations and locate the resources for subsequent work. Students have approximately two weeks to complete their assignments and prepare for their poster presentations. Over time this worksheet has moved across the various platforms and content management systems used by the library and is currently in a LibGuide format, available at http://libguides.ucalgary.ca/content.php?pid=55723&sid=603862
During the lab session in the library, students extracted information from a series of progressively more complex information resources. This information was used both to complete the lab assignment and to develop a poster presentation on a specific genetic disease. This section will describe each resource, and how it is used for the assignment. All the resources are freely available, most through the National Center for Biotechnology Information (NCBI) site. Some of the information the students needed to complete assignments was available in more than one source, enabling them to cross-check and verify data, another useful information skill. The library session is structured to start students in the relatively familiar settings of online encyclopedias and journal databases, progress to patents where they use a familiar interface to access unfamiliar material, and then proceed to the gene and protein resources which are totally unfamiliar to the students.
In OMIM, students see an interface similar to what they have just experienced with PubMed, but which leads to radically different information. OMIM provides information from its own database, and serves as a gateway to many other NCBI genetics resources, including Entrez Gene and Entrez Nucleotide, which were used extensively for this class. These resources present students with many kinds of information -- numeric, graphic and textual. Some of the graphic information, like ideograms and gene maps, is familiar to students, who will have seen similar materials in the course lectures and textbook.
Using terminology from their previous searches, students retrieve information on their topics. While the OMIM gateway provides a bewildering array of choices for various types of information and links to resources, the structured questions in the laboratory exercise and the steps provided in the worksheet help keep students on track while making them aware of the volume and range of genetic information available. The lab worksheet provides guidance in interpreting the data -- for example, choosing entries from the results list that have an asterisk as a prefix, because those are entries with a known sequence, which allows students to avoid entries with an unknown or ambiguous molecular basis. Questions in the lab exercise pertaining to each student's disease topic can be answered using OMIM, such as the inheritance pattern of the disease, the name of the gene that causes the disease, and the types of mutations that have been discovered in this gene (Appendix A).
As mentioned earlier, many well-known genetic diseases, such as most types of cancers, Alzheimer's, and obesity, occur because of mutations in several genes and possible environmental factors, not just single-gene mutations. Students who selected these diseases developed an understanding of the difficulties faced by researchers trying to develop treatments. In most instances, each student was able to select a gene identified in OMIM in conjunction with results from their PubMed and patent searches, which provided current, corroborating information on a possible genetic mutation.
After mining the OMIM record for the required information, students used links from the OMIM page to enter the Entrez Gene database and gather more specific data about the genes they focused on, by selecting "Gene" from the drop-down menu under the "Links" icon in the upper-right corner of the selected OMIM records. The Entrez Gene records also link to the "Map Viewer", where students can see images of the locations of the genes they are studying. Also linked from the Entrez Gene record is NCBI's "RefSeq Protein Product" sequence database, which attempts to create a single comprehensive, annotated and non-redundant sequence for each gene (NCBI 2009; Tennant & Lyon 2007). Students used the records they retrieved from this source to answer a series of questions on the proteins encoded by the gene they had selected. They then copied the protein sequences found at the bottom of the records to answer two questions about their proteins using another tool, ExPASy's ProtParam (Appendix B).
Students use the protein sequences copied from the RefSeq records found earlier as input to ExPASy's ProtParam tool. As well as providing the data they need for their assignments, this step also shows that as long as data are in a recognizable format (i.e., FASTA sequence), they can be used in more than one resource, even those not hosted by the National Center for Biotechnology Information (NCBI). As with the previous set of questions, the structured exercise, asking for particular pieces of information, leads students to the appropriate parts of the retrieved records.
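The portability of FASTA-formatted sequence data can be illustrated with a minimal Python sketch of the kind of calculation students request from ProtParam: counting amino acids and estimating molecular weight. This is not ProtParam's implementation. The function names (`parse_fasta`, `composition`, `molecular_weight`), the example record, and the rounded average residue masses are illustrative assumptions, not material from the lab worksheet.

```python
from collections import Counter

# Approximate average residue masses in daltons (amino acid minus one water).
# Assumed, rounded textbook values; a tool such as ProtParam may differ slightly.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.0153  # one water is added back for the free chain termini


def parse_fasta(text):
    """Return (header, sequence) from a single-record FASTA string."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    header = lines[0][1:]
    # Sequence lines are simply concatenated; line breaks carry no meaning.
    sequence = "".join(lines[1:]).upper()
    return header, sequence


def composition(sequence):
    """Count how many times each one-letter amino acid code occurs."""
    return Counter(sequence)


def molecular_weight(sequence):
    """Estimate the molecular weight of an unmodified protein chain."""
    unknown = set(sequence) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


# Hypothetical record for illustration only; not a real RefSeq entry.
EXAMPLE_FASTA = """>example_protein hypothetical record for illustration
MKTAYIAKQR
QISFVKSHFS
"""

if __name__ == "__main__":
    header, seq = parse_fasta(EXAMPLE_FASTA)
    print("Record:", header)
    print("Number of amino acids:", len(seq))
    print("Most common residues:", composition(seq).most_common(3))
    print("Approximate molecular weight (Da):", round(molecular_weight(seq), 1))
```

Because the sketch needs only the header line and the sequence letters, the same text a student copies from a RefSeq record could be passed to it unchanged, which is the point the exercise makes about a common format across genetics resources.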
The library workshop, exercise and poster assignment fulfill the instructor's objectives of introducing students to genetics resources. The deliverables are worth 3% of the final mark in this class. Students are routinely surveyed about the workshop using getFAST (http://www.getfast.ca), a free web-based assessment tool that allows students to provide anonymous feedback on information literacy sessions. One of the questions asked students what the most useful things they learned in the library session were. Most students indicated that they found the resources challenging but useful. Sample student comments from the getFAST survey included:
The posters, in particular, illustrate that students have not only understood the data but are able to integrate data with patents and scholarly articles to communicate information about a genetic disease. The quality of the poster presentations is very high, and this is also evident from the questions fellow students asked each presenter and the responses provided. In the longer term, surveys of students have shown greater awareness of, and facility with, the PubMed database in senior classes since the introduction of this second-year workshop.
The workshop has evolved over time, in response to feedback from the librarian, instructor and students and to changes in the resources. Students had difficulty answering questions about the proteins involved in a disease using NCBI resources, so another resource, ExPASy's ProtParam, was identified and included in the workshop in 2008 and 2009 to help students answer questions about amino acids and protein molecular weight. This had the added benefit of showing students that data from the NCBI suite of resources could be used elsewhere, because there was a common language among genetics databases. Similarly, having each student select one consolidated and non-redundant RefSeq Protein Product to answer the protein questions eliminated the need to pick out, from a list of proteins, the one encoded by the chosen gene. In previous years students selected from a list of proteins from their gene records because spliced genes can have multiple "transcripts" or variant proteins. This and other changes have led to a noticeable drop in the number of questions this librarian has had to answer in each of the past three years as the session and assignment were fine-tuned.
The librarian has benefited from the collaboration and the opportunity to work with a range of data sets. Familiarity with the resources has also contributed to liaison work with faculty and graduate students. The librarians and staff who assist in the workshops also report greater confidence with genetic and other non-bibliographic data as a result of working through the exercise with the students.
As noted above there are a number of reasons why this introduction to genetic information works so well. Collaboration with the instructor has ensured that the assignment meets the needs of the course and the students, and because the instructor is an expert in the field, that the workflow in the exercise follows the pattern used by senior researchers, thus offering an authentic research experience. The instructor's knowledge of new resources in the field has led to improvements in the workshop. The librarian's experience in teaching information literacy has ensured that the workshop helps students contextualize the various information sources, and develop a deeper understanding of the discipline's information environment. In developing the workshop the librarian also provided a student's perspective, identifying threshold concepts and areas where students might need extra scaffolding to understand what they were doing beyond simply following rote instructions.
The authenticity of the assignment, with the opportunity to explore advanced research tools, fosters student engagement with the material; they immediately appreciate the benefit of using the resources. The lab worksheet supports this by providing explicit links between the activities students are completing and cutting-edge genetic research. The exercises and the poster presentations give the students ample opportunity to practice using the tools, while fitting information from different resources - patents, articles and databases - together to create a cohesive presentation. The use of diseases as an access point to genetic information provides a gateway that students understand, and that uses language they are familiar with. Students often choose to research diseases that affect people they know, also increasing the engagement level.
The structure of the workshop, lab exercise, and poster assignment is also important. At each step students are given specific tasks, highlighting key aspects of the data in each resource. While they need to explore the information to extract required answers, the steps required are provided in some detail. This allows students to become familiar with the resources without getting lost. The choice of resources to use is also deliberate. NCBI hosts a vast array of genetics databases, and only some of them are included in the workshop to prevent overwhelming the students. The modules progress from the relatively familiar (encyclopedias, PubMed) to the completely unknown (Entrez Nucleotide and BLAST), and from the simple act of searching an article database to the more complex work required to obtain protein sequence information, supporting the students in developing their knowledge through a series of manageable steps. This works on the affective level by reducing student anxiety -- imagine beginning a second-year class with a protein database search -- as well as on the cognitive level by helping students make connections between the known and unknown. Students gain an understanding of the contributions diverse resources -- articles, patents, data -- make to increasing understanding and building knowledge in the discipline. However, it is by no means necessary to use all of the modules, or to use them all at once.
Librarians could start with the Genes and Disease resource and follow a link to one of the NCBI data sets. What is critical, as in all IL instruction, is a close fit with the aims of the course, a clear purpose in introducing a particular resource that is linked to a marked assignment, ample opportunities for hands-on practice and active learning, and adequate preparation. It is important that the librarian become familiar with the resources in advance, working through the assignment as a student would to see where students are likely to experience confusion. It is useful as well to check in advance the terms students will be searching. Some diseases have more information than others; some have one genetic factor, while others have many. It may be useful to provide a list of the simpler disorders about which more is known, from which students could pick, to make for a less frustrating experience.
The amount of genetic data available is growing exponentially as techniques and tools for sequencing improve. Fortunately, access to the data is also steadily improving in usability, integration, depth and breadth, and remains, for the most part, free to use. With this case study I hope to encourage other librarians to incorporate these genetic data resources in IL instruction. Students need to become familiar with the key tools in the field they are entering, and the best way to foster that familiarity is with authentic assignments. Discipline faculty appreciate library partners who understand the structures of information in the discipline that complement more traditional bibliographic tools, and the possibilities these sources represent for developing more interesting assignments and more engaged students. Librarians who are looking for ways to collaborate more effectively with teaching faculty, to expand their skill sets and to develop IL beyond first-year assignments may find in these resources an effective way to fulfill all these needs.
I would like to thank Dr. Isabelle Barrette-Ng, Instructor for Biology 311, for her expertise and assistance with this class and the students in this class for their participation.
Adams, D. J. 2009. Current trends in laboratory class teaching in university bioscience programmes. Bioscience Education 13: 13-3.
Amberger, J., Bocchini, C.A., Scott, A. F., & Hamosh, A. 2009. McKusick's Online Mendelian Inheritance in Man (OMIM). Nucleic Acids Research 37 (Database Issue): D793-D796.
Bednarski, A. E., Elgin, S. C. R., & Pakrasi, H. B. 2005. An inquiry into protein structure and genetic disease: Introducing undergraduates to bioinformatics in a large introductory course. Cell Biology Education 4: 207-220.
Boyle, J. A. 2004. Bioinformatics in undergraduate education: Practical examples. Biochemistry and Molecular Biology Education 32(4): 236-238.
Brown, C. 2005. Where do molecular biology graduate students find information? Science & Technology Libraries 25(3): 89-104.
Dinkelman, A.L. 2007. "See a need, fill a need" -- reaching out to the bioinformatics research community at Iowa State University. Issues in Science and Technology Librarianship 52 [Internet]. [Cited May 5, 2010]. Available from: http://www.istl.org/07-fall/refereed1.html
Dymond, J.S., Scheifele, L.Z., Richardson, S., Lee, P., Chandrasegaran, S., Bader, J.S. & Boeke, J.D. 2009. Teaching synthetic biology, bioinformatics and engineering to undergraduates: the interdisciplinary build-a-genome course. Genetics 181: 13-21.
Feig, A.L. & Jabri, E. 2002. Incorporation of bioinformatics exercises into the undergraduate biochemistry curriculum. Biochemistry and Molecular Biology Education, 30(4): 224-231.
Geer, R.C. 2006. Broad issues to consider for library involvement in bioinformatics. Journal of the Medical Library Association 94(3): 286-298.
MacMillan, D. 2007. Ask an interesting question: insights from a reflective survey of senior biology students. In: Seitz, B., editor. Uncharted Waters: Tapping the Depths of Our Community to Enhance Learning: Proceedings of the 35th National LOEX Library Instruction Conference; San Diego, CA.: LOEX Press. p. 149-153.
MacMullen, J. W. & Denn, S. O. 2005. Information problems in molecular biology and bioinformatics. Journal of the American Society for Information Science and Technology 56(5):447-456.
McKusick, V.A. 2007. Mendelian inheritance in Man and its Online version, OMIM. The American Journal of Human Genetics 80:588-604.
Miskowski, J. A., Howard, D. R., Abler, M. L. & Grunwald, S. K. 2007. Design and implementation of an interdepartmental bioinformatics program across life science curricula. Biochemistry and Molecular Biology Education 35(1): 9-15.
O'Grady, T. 2008. Internet resources: bioinformatics, a brief overview of resources on the web. College & Research Libraries News 69(7): 404-407.
Pham, D.Q.D., Higgs, D.C., Statham, A. & Schleiter, M.K. 2008. Implementation and assessment of a molecular biology and bioinformatics undergraduate degree program. Biochemistry and Molecular Biology Education 36(2): 106-115.
Rapp, B.A. & Wheeler, D.L. 2005. Bioinformatics resources from the National Center for Biotechnology Information: an integrated foundation for discovery. Journal of the American Society for Information Science and Technology 56(5): 538-550.
Tennant, M.R. & Miyamoto, M.M. 2002. The role of medical libraries in undergraduate education: a case study in genetics. Journal of the Medical Library Association 90(2): 181-193.
Tennant, M.R. 2005. Meeting the information needs of genetics and bioinformatics researchers. Reference Services Review 31(1): 12-19.
Tennant, M. R. & Lyon, J.A. 2007. Entrez Gene: A gene-centered "information hub". Journal of Electronic Resources in Medical Libraries 4(3): 53-78.
Voet, J.G., Bell, E., Boyer, R., Boyle, J., O'Leary, M., & Zimmerman, J.K. 2003. Mini-Series: the ASBMB recommended biochemistry and molecular biology undergraduate curriculum and its implementation: recommended curriculum for a program in biochemistry and molecular biology. Biochemistry and Molecular Biology Education 31(3): 161-16
Genes & Disease
Genomics Energy.com. Bioinformatics Tools: Tips, Tutorials, and Terminology for Using Selected Resources in Genome Database Guide.
getFAST -- Free online assessment tool
National Center for Biotechnology Information (NCBI)
National Center for Biotechnology Information. A Science Primer.
Online Mendelian Inheritance in Man (OMIM)
Using ProtParam, please provide answers to the following questions:
For this exercise, you will be asked to do a protein blast (blastp) to search the RCSB Protein Data Bank for structural information. Using the protein record you found in the last exercise, highlight and copy the sequence of the protein into the query window of the blastp site.