Issues in Science and Technology Librarianship
Spring 2010
DOI: 10.5062/F44F1NNK


Sequencing Genetics Information: Integrating Data into Information Literacy for Undergraduate Biology Students

Don MacMillan
Liaison Librarian for Biological Sciences, Physics & Astronomy
University of Calgary
Calgary, Alberta, Canada

Copyright 2010, Don MacMillan. Used with permission.


This case study describes an information literacy lab for an undergraduate biology course that leads students through a range of resources to discover aspects of genetic information. The lab provides over 560 students per semester with the opportunity for hands-on exploration of resources in steps that simulate the pathways of higher-level researchers; the structured assignment done in the lab highlights useful aspects of each resource; and a major poster presentation assignment students do after the lab consolidates and reinforces their understanding of the nature of genetic information. Close collaboration with the biology instructor has resulted in a detailed laboratory exercise that supports exploration of advanced information tools through focused questions. The exercise, as befits its subject, is constantly evolving, both in response to changes to the available resources, and to student feedback. The case study presents in detail the step-by-step exercise students work through. The steps are provided as a series of tool-specific modules that other librarians could use individually or as a set. The advantages of using the resources together come from the integration of the resources themselves, and the way they demonstrate the synergies and relationships between data, articles and patents, which are all key sources of information in the sciences.


The explosion in available information about genes, human and otherwise, and the expanding palette of resources available to researchers have created a rich information environment for researchers, graduate students and, increasingly, undergraduate students. In order to familiarize biology students with the ways of learning and knowing practiced in genetics and related disciplines, it is necessary to provide opportunities for them to explore and use the current, cutting-edge tools. Only then can they go beyond learning about genetics to learning how to approach questions and problems as geneticists -- in short, to join the community of practice. While the tools themselves are sophisticated, and provide access to an enormous and expanding amount of information, they are sufficiently well-designed to allow novice users to gain an understanding of them under the right conditions. This case study will describe how some of those conditions are set in place. The overriding key to introducing these resources effectively is a strong collaboration between the librarian and the instructor so that information in the library lab session is linked directly to the assignment and to the material from lectures. Procedurally, three factors are key to successful implementation: a structured demonstration focused on key aspects of each resource and progressing from familiar bibliographic databases to the unfamiliar terrain of gene and protein data; a laboratory exercise that permits some exploration while requiring specific data from each source; and a follow-up assignment requiring deeper critical thinking, further independent exploration of the resources, and synthesis and analysis of information to consolidate learning.
The case study is offered as a guide to others who want to integrate genetic databases into IL instruction at the undergraduate level -- each resource could be shown on its own, but one of the strengths of the session is the integration of all the resources so that students build familiarity with searching and terminology that transfers across resources.

There is a widespread acknowledgement that students should become familiar with genetic and other bioinformatics information resources as part of the undergraduate curriculum (Adams 2009; Bednarski et al. 2005; Boyle 2004; Dinkelman 2007; Pham et al. 2008; Tennant & Miyamoto 2002). Indeed, the American Society for Biochemistry and Molecular Biology (ASBMB) has recommended that the use of such resources be a core competency for students in undergraduate biochemistry and molecular biology programs (Boyle 2003 and Voet et al. 2003). Dymond et al. (2009) concluded that having undergraduate biology students participate in cutting edge interdisciplinary research such as bioinformatics fosters a greater understanding of genes and chromosomes and better prepares them for a career in the life sciences. In a report by Miskowski et al. (2007) the authors contend that the emergence of bioinformatics has not only had a significant impact on how biological research is conducted but has influenced the types of questions that can be asked. MacMullen and Denn (2005) summarized some of the ways information researchers and librarians can support molecular biologists such as assisting with integrating data and literature and locating myriad sources of bioinformatics information. Feig and Jabri (2002) encourage faculty to incorporate these resources into the curriculum, using data-mining exercises such as the one described below to introduce students to the common databases and tools that take advantage of this vast repository of biochemical information.

Similarly, there is ample information to support the integration of genetic data and similar resources into librarians' skill sets. Much of the literature on this subject speaks to supporting faculty and researchers in their work (Brown 2005; Geer 2006; Tennant 2005). There have been a number of introductions to bioinformatics resources for librarians, notably the March 2005 issue of the Journal of the American Society for Information Science and Technology, which included a primer on the discipline (Rapp & Wheeler 2005); a brief review of bioinformatics resources on the web in the July/August 2008 issue of College & Research Libraries News (O'Grady 2008); and the July 2006 special issue of the Journal of the Medical Library Association, which provided an overview of libraries that provide bioinformatics support at academic institutions. Tennant and Lyon (2007) provide an extensive overview of NCBI's Entrez Gene database, a core, integrated online resource for gene and protein information that was used extensively in this study.

Case Study


Biology 311 - Principles of Genetics is a required course for biology majors at the University of Calgary. It enrolls approximately 560 students and is usually taken in the second year. By this time, students will have had at least one information literacy workshop in their first-year biology course that covers an introduction to scientific literature, BIOSIS Previews, the library catalogue, and Internet resources. The course covers topics such as Mendelian inheritance, sex determination, changes in chromosome structure, molecular genetics, genetics of bacteria and viruses, and the structure and function of genetic material. The information literacy session is offered during week 11 of the Fall semester, when students have had some exposure to course content and key vocabulary, and are familiar with the role, structure and function of proteins and genes. The session is delivered during a scheduled three-hour lab session, in a classroom in the library where students can work using a desktop or their own laptop computers. The students meet in their lab before coming to the library for the 90-minute session. There is extra time available at the conclusion of the session for students wishing to stay to work on their assignment or obtain help with any questions they might have. Typically each library session is given to three lab sections at a time, so there are usually 72 students in the 50-station classroom, with three lab instructors, the librarian, and often one or two other librarians or library staff to support the students. Students usually work in pairs with their partners. For each resource, the lab follows a pattern of demonstration, practice, and discussion. For the lab exercise each pair of students selects a genetic disease, identifies a gene principally responsible for the disease, and locates both the specific pieces of information needed to complete the exercise and resources such as patents and articles they can use for the poster presentation assignment.

Most students have little difficulty completing most of the exercises within the allotted time and also obtain the majority of the information they need for the posters within the scheduled 90 minutes. The lab is supported by a workbook with detailed information and screen shots, and an online resource with live links that students can use to follow the demonstrations and locate the resources for subsequent work. Students have approximately two weeks to complete their assignments and prepare for their poster presentations. Over time this worksheet has changed with various platforms and content management systems used by the library and is currently in a LibGuide format, available at


During the lab session in the library, students extracted information from a series of progressively more complex information resources. This information was used both to complete the lab assignment and to develop a poster presentation on a specific genetic disease. This section will describe each resource, and how it is used for the assignment. All the resources are freely available, most through the National Center for Biotechnology Information (NCBI) site. Some of the information the students needed to complete assignments was available in more than one source, enabling them to cross-check and verify data, another useful information skill. The library session is structured to start students in the relatively familiar settings of online encyclopedias and journal databases, progress to patents where they use a familiar interface to access unfamiliar material, and then proceed to the gene and protein resources which are totally unfamiliar to the students.

Genes and Disease
This database, hosted by the National Center for Biotechnology Information (NCBI), was launched in 1998 and consists of brief encyclopedia-type entries on over 80 genetic diseases. These diseases can be searched by name or under the bodily system(s) they affect. Students in Biology 311 - Principles of Genetics begin their assignment earlier in the term, in consultation with the course instructor, by selecting a disease found in the database. Genes and Disease is arranged like an encyclopedia, easy to use, and written in clear language. It is an ideal starting point for the research, providing a low-threshold entry into the assignment. Through this, students can choose a disease and gain some of the specific information they need later on -- including the names of the genes that cause the disease, some history and the state of research at the time of writing. The database also has links to other tools students will be using later on -- a feature that is highlighted by the librarian as an entry into the integrated sets of data students will use. Most genetic disorders are caused by a mutation in a single gene, but other well-known diseases such as cancers, diabetes and Alzheimer's have a more complex inheritance pattern resulting from multiple mutations. This makes it more difficult for researchers to develop effective treatments and therapies for these diseases, and for students in this class to pinpoint the exact gene(s) that cause the diseases. While some students decided to choose different diseases in these cases, most continued with these more difficult diseases, and focused on one of the genes involved.

PubMed
This extensive database of biomedical publications provides articles that give students insight into current research in the field. Students are prompted to use PubMed to locate one review article each on the genetic aspects of the diseases they have chosen. Using PubMed serves as a review of article database searching, which students should be familiar with through a first-year assignment using BIOSIS Previews. In selecting an article, each student extends his/her familiarity with the terminology of his/her disease and gains an appreciation for the human impact and the state of current research. By using review articles, students benefit not only from the content, but also from the structure, which, like their posters, must incorporate information from many sources. Because many of the biology students in this class plan to study medicine or pursue a related graduate degree, early familiarity with PubMed helps bridge the two disciplines. In fact, before the introduction of the IL session for BIOL 311, students surveyed in a third-year class stated that they wished they had known about PubMed earlier in the program (MacMillan 2007). Subsequent feedback from senior students in other biology information literacy sessions indicated that they benefit from the introduction of PubMed and count it as one of their most valuable information sources. The laboratory exercise also requires students to cite the articles they find properly, a further review of skills gained in first year.
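For librarians who want to prepare or check searches in advance, the same kind of review-article search can be expressed as a URL for NCBI's public E-utilities ESearch service. The following is a minimal sketch only: students in the lab used the PubMed web interface, and the function name, the default of five results and the exact query wording are assumptions made here for illustration.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_review_search_url(disease: str, max_results: int = 5) -> str:
    """Build an ESearch URL for review articles on the genetics of a disease.

    The function name and the query wording are hypothetical choices for
    this sketch, not part of the lab exercise.
    """
    # Restrict the disease name to titles/abstracts and limit to reviews.
    term = f'"{disease}"[Title/Abstract] AND genetics AND review[Publication Type]'
    params = {
        "db": "pubmed",         # search the PubMed database
        "term": term,           # the query string built above
        "retmax": max_results,  # number of record IDs to return
        "retmode": "json",      # ask for a JSON response
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Prints a URL only; no network request is made.
    print(build_review_search_url("cystic fibrosis"))
```

The sketch only constructs the URL, so it can be pasted into a browser to see which record identifiers a given disease term retrieves before the term is suggested to students.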

Google Patents
Patents provide access to an increasing proportion of genetics and biotechnology information, often before or in place of scholarly journals. In this step, students use Google Patents to access patents involving the disease and/or gene they have chosen to study. By using Google to access patents, the session uses a familiar interface to access unfamiliar resources. Students are introduced to the structure and purpose of patents, and must extract key information for the laboratory exercise. Because this is generally their first encounter with patent literature, the searching required for the assignment is very basic, and Google provides an excellent entry point, allowing the students to focus on the complexity of the material rather than on navigating an unfamiliar interface. The students find the background description very useful, and any related diagrams also help them understand the mechanics of genetic research. Because Google Patents selectively indexes patents from the United States Patent and Trademark Office (USPTO), many students used this opportunity to further investigate their topics using the USPTO's interface. Exposure to the cutting edge of science through patents links students with the discoveries and applications of the often drier textbook material on genes. At this stage, a deeper question is also asked on the ethics of patents in genetics. On the posters, students must consider this debate in answering whether their chosen patents should have been granted. By searching PubMed and patents in a short space of time, students can also see the different information available through each type of source, and understand how proprietary and academic research results are disseminated somewhat differently. This early introduction of patent searching also provides a tool they can use in subsequent research projects across the sciences.

OMIM (Online Mendelian Inheritance in Man)
OMIM is a comprehensive and authoritative catalogue of human genes and genetic disorders created at Johns Hopkins University by Dr. Victor McKusick. OMIM is updated daily and is compiled from published biomedical literature related to Mendelian disorders and related genes. As of January 2010 there were over 19,900 entries in OMIM, which is used by researchers and medical practitioners to advance the knowledge and treatment of genetic disorders. For readers wanting to know more about OMIM, excellent reviews have been published by Amberger et al. (2009) and McKusick (2007).

In OMIM, students see an interface similar to what they have just experienced with PubMed, but which leads to radically different information. OMIM provides information from its own database, and serves as a gateway to many other NCBI genetics resources, including Entrez Gene and Entrez Nucleotide, which were used extensively for this class. These resources present students with many kinds of information -- numeric, graphic and textual. Some of the graphic information, like ideograms and gene maps, is familiar to students who will have seen similar materials in the course lectures and textbook.

Using terminology from their previous searches, students retrieve information on their topics. While the OMIM gateway provides a bewildering array of choices for various types of information and links to resources, the structured questions in the laboratory exercise and the steps provided in the worksheet help keep students on track while making them aware of the amazing volume and range of genetic information available. The lab worksheet provides guidance in interpreting the data -- for example, choosing an entry from the results list that has an asterisk as a prefix, because those are entries with a known sequence; this allows students to avoid entries with an unknown or ambiguous molecular basis. In the lab exercise, students could use OMIM to answer questions pertaining to their disease topics, such as the inheritance pattern of the disease, the name of the gene that causes the disease and the types of mutations that have been discovered in this gene (Appendix A).

As mentioned earlier, many well known genetic diseases, such as most types of cancers, Alzheimer's, and obesity, occur because of mutations on several genes and possible environmental factors not just single gene mutations. Students who selected these diseases developed an understanding of the difficulties faced by researchers trying to develop treatments. In most instances, each student was able to select a gene identified in OMIM in conjunction with results from their PubMed and patent searches which provided very current and corroborating supporting information on a possible genetic mutation.

After mining the OMIM record for the required information, students used links from the OMIM page to enter the Entrez Gene database to gather more specific data about the genes they focused on, by selecting "Gene" from the drop-down menu under the "Links" icon in the upper-right corner of the selected OMIM records. These records also link to the "Map Viewer", where students could see images of the locations of the genes they were studying. Also linked from the Entrez Gene record is the NCBI's "RefSeq Protein Product" sequence database, which attempts to create a single comprehensive, annotated and non-redundant sequence for each gene (NCBI 2009; Tennant & Lyon 2007). Students used the records they retrieved from this source to answer a series of questions on the proteins encoded by the gene they had selected. They then copied the protein sequences found at the bottom of the records to answer two questions about their proteins using another tool, ExPASy's ProtParam (Appendix B).

ExPASy ProtParam (Protein Parameter)
This tool, hosted by the Swiss Institute of Bioinformatics, enables users to calculate various physical and chemical parameters for a particular protein, including molecular weight and amino acid composition. While the answers to the laboratory assignment questions could generally be found in the RefSeq Protein Product records, some students in the 2007 class were not able to successfully complete the protein section of the assignment because information was either not available or not up-to-date. In 2008 the class instructor changed the exercise to take advantage of the ExPASy ProtParam tool, which provides more current and authoritative information.

Students use information copied from the RefSeq records found earlier to search the ExPASy ProtParam database. As well as providing them with the data they needed for their assignments, this step also showed that as long as data are in a recognizable format (i.e., FASTA sequence), they can be used in more than one database, even those not hosted by the National Center for Biotechnology Information (NCBI). As with the previous set of questions, the structured exercise, asking for particular pieces of information, led students to the appropriate parts of the retrieved records.
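To make concrete what this step computes, the two quantities students record (amino acid composition and molecular weight) can be approximated in a few lines. The sketch below is an illustration under stated assumptions and is not ProtParam's own implementation: it uses commonly tabulated average residue masses, handles only the 20 standard amino acids, and the function names and the four-residue example peptide are hypothetical.

```python
from collections import Counter

# Commonly tabulated average residue masses in daltons (an assumption of
# this sketch; ProtParam's own tables and rounding may differ slightly).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added for the free chain termini


def read_fasta_sequence(fasta_text: str) -> str:
    """Return the sequence from a single-record FASTA string, skipping the '>' header."""
    lines = (line.strip() for line in fasta_text.splitlines())
    return "".join(line for line in lines if line and not line.startswith(">")).upper()


def amino_acid_composition(sequence: str) -> dict:
    """Map each residue letter to (count, percentage of the sequence)."""
    counts = Counter(sequence)
    total = len(sequence)
    return {aa: (n, 100.0 * n / total) for aa, n in sorted(counts.items())}


def molecular_weight(sequence: str) -> float:
    """Sum the average residue masses and add one water molecule."""
    unknown = set(sequence) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"Unsupported residue codes: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


if __name__ == "__main__":
    # Hypothetical toy record, not a real protein.
    example = ">example hypothetical peptide\nGAG\nA\n"
    seq = read_fasta_sequence(example)
    print(seq, amino_acid_composition(seq), molecular_weight(seq))
```

Because the input is plain FASTA text, the same copied sequence can be fed to this sketch, to ProtParam or to BLAST, which is the portability point the exercise is designed to show.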

BLAST
The Basic Local Alignment Search Tool (BLAST) enables scientists to perform similarity searches on all available sequence data in order to uncover functional information about a particular protein sequence. Students used BLAST to determine if there were areas or regions of the proteins that they had selected earlier that were thought to have functions in other proteins. Any loss or degradation of a protein's structure would result in the complete loss of the function of the protein. Students used the same FASTA or protein sequences that they had copied in the previous exercise and copied the sequence of the proteins into the BLASTp query box. They then selected the Protein Data Bank proteins (pdb) for structural information which they used to answer questions about any putative conserved domains or functional units that might be used in proteins with different functions (Appendix C). As with all the resources, students were required to incorporate information from BLAST in their posters.
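The idea of a local similarity search, finding the best-matching region shared by two sequences rather than comparing them end to end, can be illustrated with a short sketch. Note that this is not BLAST: BLAST is a fast heuristic that uses substitution matrices and statistical scoring, whereas the code below is a plain Smith-Waterman local alignment score with a toy scoring scheme (match +2, mismatch -1, gap -1) chosen here purely for illustration. The function name and the example strings are hypothetical.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -1) -> int:
    """Return the best Smith-Waterman local alignment score between a and b.

    Toy scoring values are assumptions of this sketch; real protein searches
    use substitution matrices and more elaborate gap penalties.
    """
    best = 0
    prev = [0] * (len(b) + 1)  # scores for the previous row of the table
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the diagonal with a match or mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, which is what makes the
            # alignment local: a poor region simply restarts the score.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Two hypothetical sequences that share only the region "ACDE".
    print(local_alignment_score("XXACDEYY", "ZZACDEWW"))
```

A shared region embedded in otherwise unrelated sequences still scores highly, which is the intuition behind looking for conserved domains that recur in proteins with different functions.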


The library workshop, exercise and poster assignment fulfill the instructor's objectives of introducing students to genetics resources. The deliverables are worth 3% of the final mark in this class. Students are routinely surveyed about the workshop using getFAST, a free web-based assessment tool that allows students to provide anonymous feedback on information literacy sessions. One of the questions students were asked was "What were the most useful things they learned in the library session?" Most students indicated that they found the resources challenging but useful. Sample student comments from the getFAST survey included:

  1. "I learned how to use PubMed. How to find a specific protein, how to find the coding sequence for the specific protein".
  2. "OMIM is quite complicated but the instruction provided really helped to make this web site useful".
  3. "I didn't know most of these research tools existed. This session was very helpful in highlighting what's out there and explaining how to use them".
  4. "If I use OMIM again in the future it will be for any extra genomic information that I need about a disease".
  5. "Not only will these tools be helpful in class and lab assignments, but also in researching for private reasons on family histories of diseases".

The posters, in particular, illustrate that students have not only understood the data but are able to incorporate data with patents and scholarly articles to communicate information about a genetic disease. The quality of the poster presentations is very high, and this is also evident from the questions fellow students asked each presenter and the responses provided. In the longer term, surveys of students have shown greater awareness of, and facility with, the PubMed database in senior classes since the introduction of this second-year workshop.

The workshop has evolved over time, in response to feedback from the librarian, instructor and students and to changes in the resources. Students had difficulty answering questions about the proteins involved in a disease using NCBI resources, so another resource, ExPASy-ProtParam, was identified and included in the workshop in 2008 and 2009 to help students answer amino acid and molecular weight protein questions. This had the added benefit of showing students that data from the NCBI suite of resources could be used elsewhere, because there was a common language among genetics databases. Similarly, having students each select one consolidated and non-redundant RefSeq Protein Product to answer their protein questions eliminated the need for students to choose from a list of proteins in order to identify the protein encoded by their chosen genes. In previous years students selected from a list of proteins from their gene records because spliced genes can have multiple "transcripts" or variant proteins. These and other changes have led to a noticeable drop in the number of questions this librarian has had to answer in each of the past three years as the session and assignment were fine-tuned.
The librarian has benefited from the collaboration and the opportunity to work with a range of data sets. Familiarity with the resources has also contributed to liaison work with faculty and graduate students. The librarians and staff who assist in the workshops also report greater confidence with genetic and other non-bibliographic data as a result of working through the exercise with the students.


As noted above there are a number of reasons why this introduction to genetic information works so well. Collaboration with the instructor has ensured that the assignment meets the needs of the course and the students, and because the instructor is an expert in the field, that the workflow in the exercise follows the pattern used by senior researchers, thus offering an authentic research experience. The instructor's knowledge of new resources in the field has led to improvements in the workshop. The librarian's experience in teaching information literacy has ensured that the workshop helps students contextualize the various information sources, and develop a deeper understanding of the discipline's information environment. In developing the workshop the librarian also provided a student's perspective, identifying threshold concepts and areas where students might need extra scaffolding to understand what they were doing beyond simply following rote instructions.

The authenticity of the assignment, with the opportunity to explore advanced research tools, fosters student engagement with the material; they immediately appreciate the benefit of using the resources. The lab worksheet supports this by providing explicit links between the activities students are completing and cutting-edge genetic research. The exercises and the poster presentations give the students ample opportunity to practice using the tools, while fitting information from different resources - patents, articles and databases - together to create a cohesive presentation. The use of diseases as an access point to genetic information provides a gateway that students understand, and that uses language they are familiar with. Students often choose to research diseases that affect people they know, also increasing the engagement level.

The structure of the workshop, lab exercise, and poster assignment is also important. At each step students are given specific tasks, highlighting key aspects of the data in each resource. While they need to explore the information to extract required answers, the steps required are provided in some detail. This allows students to become familiar with the resources without getting lost. The choice of resources to use is also deliberate. NCBI hosts a vast array of genetics databases, and only some of them are included in the workshop to prevent overwhelming the students. The modules progress from the relatively familiar (encyclopedias, PubMed) to the completely unknown (Entrez Nucleotide and BLAST), and from the simple act of searching an article database to the more complex work required to obtain protein sequence information, supporting the students in developing their knowledge through a series of manageable steps. This works on the affective level by reducing student anxiety -- imagine beginning a second-year class with a protein database search -- as well as on the cognitive level by helping students make connections between the known and unknown. Students gain an understanding of the contributions that diverse resources (articles, patents, data) make to increasing understanding and building knowledge in the discipline.

However, it is by no means necessary to use all of the modules, or to use them all at once.

Librarians could start with the Genes and Disease resource and follow a link to one of the NCBI data sets. What is critical, as in all IL instruction, is a close fit with the aims of the course, a clear purpose in introducing a particular resource that is linked to a marked assignment, ample opportunities for hands-on practice and active learning, and adequate preparation. It is important that the librarian become familiar with the resources in advance, working through the assignment as a student would to see where students are likely to experience confusion. It is useful as well to check in advance the terms students will be searching. Some diseases have more information than others; some have one genetic factor, others have many. It may be useful to provide a list of the simpler disorders about which more is known, from which students could pick, to make for a less frustrating experience.


The amount of genetic data available is growing exponentially as techniques and tools for sequencing improve. Fortunately, access to the data is also steadily improving in usability, integration, depth and breadth, and remains, for the most part, free to use. With this case study I hope to encourage other librarians to incorporate these genetic data resources in IL instruction. Students need to become familiar with the key tools in the field they are entering, and the best way to foster that familiarity is with authentic assignments. Discipline faculty appreciate library partners who understand the structures of information in the discipline that complement more traditional bibliographic tools, and the possibilities these sources represent for developing more interesting assignments and more engaged students. Librarians who are looking for ways to collaborate more effectively with teaching faculty, to expand their skill sets and to develop IL beyond first-year assignments may find in these resources an effective way to fulfill all these needs.


I would like to thank Dr. Isabelle Barrette-Ng, Instructor for Biology 311, for her expertise and assistance with this class and the students in this class for their participation.


Adams, D. J. 2009. Current trends in laboratory class teaching in university bioscience programmes. Bioscience Education 13: 13-3.

Amberger, J., Bocchini, C.A., Scott, A.F., & Hamosh, A. 2009. McKusick's Online Mendelian Inheritance in Man (OMIM). Nucleic Acids Research 37 (Database Issue): D793-D796.

Bednarski, A. E., Elgin, S. C. R., & Pakrasi, H. B. 2005. An inquiry into protein structure and genetic disease: Introducing undergraduates to bioinformatics in a large introductory course. Cell Biology Education 4: 207-220.

Boyle, J. A. 2004. Bioinformatics in undergraduate education: Practical examples. Biochemistry and Molecular Biology Education 32(4): 236-238.

Brown, C. 2005. Where do molecular biology graduate students find information? Science & Technology Libraries 25(3): 89-104.

Dinkelman, A.L. 2007. "See a need, fill a need" -- reaching out to the bioinformatics research community at Iowa State University, Issues in Science and Technology Librarianship 52 [Internet]. [Cited May 5, 2010]. Available from:

Dymond, J.S., Scheifele, L.Z., Richardson, S., Lee, P., Chandrasegaran, S., Bader, J.S. & Boeke, J.D. 2009. Teaching synthetic biology, bioinformatics and engineering to undergraduates: the interdisciplinary Build-a-Genome course. Genetics 181: 13-21.

Feig, A.L. & Jabri, E. 2002. Incorporation of bioinformatics exercises into the undergraduate biochemistry curriculum. Biochemistry and Molecular Biology Education, 30(4): 224-231.

Geer, R.C. 2006. Broad issues to consider for library involvement in bioinformatics. Journal of the Medical Library Association 94(3): 286-298.

MacMillan, D. 2007. Ask an interesting question: insights from a reflective survey of senior biology students. In: Seitz, B., editor. Uncharted Waters: Tapping the Depths of Our Community to Enhance Learning: Proceedings of the 35th National LOEX Library Instruction Conference; San Diego, CA.: LOEX Press. p. 149-153.

MacMullen, J. W. & Denn, S. O. 2005. Information problems in molecular biology and bioinformatics. Journal of the American Society for Information Science and Technology 56(5): 447-456.

McKusick, V.A. 2007. Mendelian Inheritance in Man and its online version, OMIM. The American Journal of Human Genetics 80: 588-604.

Miskowski, J. A., Howard, D. R., Abler, M. L. & Grunwald, S. K. 2007. Design and implementation of an interdepartmental bioinformatics program across life science curricula. Biochemistry and Molecular Biology Education 35(1): 9-15.

National Center for Biotechnology Information. RefSeq. [Internet]. [Cited December 20, 2009]. Available from:

O'Grady, T. 2008. Internet resources: bioinformatics, a brief overview of resources on the web. College & Research Libraries News 69(7): 404-407.

Pham, D.Q.D., Higgs, D.C., Statham, A. & Schleiter, M.K. 2008. Implementation and assessment of a molecular biology and bioinformatics undergraduate degree program. Biochemistry and Molecular Biology Education 36(2): 106-115.

Rapp, B.A. & Wheeler, D.L. 2005. Bioinformatics resources from the National Center for Biotechnology Information: an integrated foundation for discovery. Journal of the American Society for Information Science and Technology 56(5): 538-550.

Tennant, M.R. & Miyamoto, M.M. 2002. The role of medical libraries in undergraduate education: a case study in genetics. Journal of the Medical Library Association 90(2): 181-193.

Tennant, M.R. 2005. Meeting the information needs of genetics and bioinformatics researchers. Reference Services Review 31(1): 12-19.

Tennant, M. R. & Lyon, J.A. 2007. Entrez Gene: A gene-centered "information hub". Journal of Electronic Resources in Medical Libraries 4(3): 53-78.

Voet, J.G., Bell, E., Boyer, R., Boyle, J., O'Leary, M., & Zimmerman, J.K. 2003. Mini-Series: the ASBMB recommended biochemistry and molecular biology undergraduate curriculum and its implementation: recommended curriculum for a program in biochemistry and molecular biology. Biochemistry and Molecular Biology Education 31(3): 161-16

Key Links

Genes & Disease

Genomics Bioinformatics Tools: Tips, Tutorials, and Terminology for Using Selected Resources in Genome Database Guide.

getfast -- Free online assessment tool

Google Patents

National Center for Biotechnology Information (NCBI)

National Center for Biotechnology Information. A Science Primer.

Online Mendelian Inheritance in Man (OMIM)


Appendix A -- OMIM and Gene Task Questions

  1. What is the inheritance pattern of the disease? (e.g., dominant, recessive, autosomal, sex-linked)
  2. What is the name of the gene(s) thought to be involved in causing the disease?
  3. Which of these genes, if there are several, have you chosen to examine? Why did you make this selection?
  4. What types of mutations have been discovered in this gene (e.g., insertion, deletion, point mutation, nonsense mutation)? Select one mutation to describe in your poster. Did this mutation arise in replication or transcription?
  5. What is the normal function of the protein encoded by this gene? How does the mutation you selected in your answer to question 4 alter the function of the protein and lead to the disease? Can you now explain the Mendelian inheritance pattern of the disease on the basis of the available molecular data (i.e., why is the disease dominant or recessive given the mutation)?
  6. On what chromosome is the gene located?
  7. In what year was the gene first identified?
  8. How many exons and introns are present in this gene?
  9. What is the length of the mature mRNA for this gene?
  10. On which chromosome is the gene located? (Copy and paste the ideogram found in the Gene record for your poster.)
  11. On which type of chromosome is the gene located (e.g., metacentric, telocentric or acrocentric)? On which arm of the chromosome is the gene located?
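
Questions 8 and 9 rest on a simple relationship: a transcript with n exons has n - 1 introns, and the spliced transcript length is the sum of its exon lengths. The following is a minimal sketch of that arithmetic in Python, not part of the original exercise. The function name and the exon coordinates are hypothetical and chosen only for illustration; real coordinates would come from the Gene record, and the length reported in a database record may differ depending on which transcript variant is chosen.

```python
def summarize_exons(exons):
    """Summarize one transcript from its exon coordinates.

    exons: list of (start, end) pairs, 1-based and inclusive,
    all taken from the same transcript (an assumption of this sketch).
    Returns (exon_count, intron_count, spliced_length).
    """
    ordered = sorted(exons)
    exon_count = len(ordered)
    # Introns lie between consecutive exons, so there is one fewer.
    intron_count = max(exon_count - 1, 0)
    # Inclusive coordinates: an exon from 101 to 250 is 150 bases long.
    spliced_length = sum(end - start + 1 for start, end in ordered)
    return exon_count, intron_count, spliced_length


# Hypothetical coordinates, for illustration only.
example_exons = [(101, 250), (401, 520), (901, 1200)]
print(summarize_exons(example_exons))
```

The sum covers exon sequence only; it does not account for a poly-A tail, which is added after transcription and is not encoded in the genomic coordinates.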

Appendix B -- Protein Task Questions

Using ProtParam, please provide answers to the following questions:

  1. How many amino acids are found in the wild-type form of the human protein encoded by this gene?
  2. What is the predicted molecular weight of the protein? The units are Daltons (Da).
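
As a rough cross-check on the two values above, the amino acid count and an approximate molecular weight can be computed directly from a one-letter protein sequence. The sketch below is an illustration, not ProtParam's own code: it sums standard average residue masses and adds one water molecule for the free termini. The function name is hypothetical, and results may differ slightly from the tool's output because of rounding and the mass table used.

```python
# Average residue masses in Daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.0153  # one water for the N- and C-terminal groups


def protein_length_and_weight(sequence):
    """Return (number of residues, approximate molecular weight in Da)."""
    residues = "".join(sequence.split()).upper()
    if not residues:
        raise ValueError("empty sequence")
    unknown = set(residues) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unrecognized residue codes: {sorted(unknown)}")
    weight = sum(AVERAGE_RESIDUE_MASS[r] for r in residues) + WATER_MASS
    return len(residues), weight


# Hypothetical short peptide, for illustration only.
length, weight = protein_length_and_weight("MKTAYIAK")
print(length, round(weight, 2))
```

The calculation ignores post-translational modifications and signal peptide cleavage, so it describes the unmodified chain only.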

Appendix C -- BLAST Task Questions

For this exercise, you will run a protein BLAST (blastp) search against the RCSB Protein Data Bank to find structural information. Using the protein record you found in the previous exercise, highlight and copy the protein sequence into the query window of the blastp site.

  1. Make note of any putative conserved domains reported for your query and include a copy of the figure in your poster.
  2. Select the entry with the highest score (this should be the top record listed). Click on the hyperlinked PDB access code located to the left of the record. Does the record contain structural information for the entire protein of interest? If not, which portion of the protein is contained within the record?
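
A sequence copied from a protein record often carries position numbers, spaces and line breaks along with the residues. If the sequence is to be reused in a script or another tool, it can help to reduce it to bare residue letters first. The sketch below is one way to do that, assuming the pasted text is either a flat-file sequence block or FASTA; the function name and the sample text are hypothetical and not part of the original exercise.

```python
import re


def clean_protein_query(pasted_text):
    """Reduce a pasted sequence block to uppercase residue letters."""
    kept_lines = []
    for line in pasted_text.splitlines():
        stripped = line.strip()
        # Skip FASTA header lines and the flat-file "ORIGIN" marker.
        if stripped.startswith(">") or stripped.upper() == "ORIGIN":
            continue
        kept_lines.append(stripped)
    # Drop digits, spaces and any other non-letter characters.
    return re.sub(r"[^A-Za-z]", "", "".join(kept_lines)).upper()


# Hypothetical pasted block, for illustration only.
pasted = """ORIGIN
        1 mvlspadktn vkaawgkvga
       21 hageygaeal
//"""
print(clean_protein_query(pasted))
```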
