COMP 6821 Bioinformatics Databases and Systems

Winter 2008 Semester: January 9 to April 9, 2008

Lectures: Wednesdays 17:45 to 20:15 in FG-355


Course Objectives

The principal objectives of the course are to survey the needs of bioinformatics for data management, knowledge management, and computational support; to provide an in-depth description of an example of each kind of database and system; and to introduce advanced database technology and software technology relevant to the needs of bioinformatics.

Bioinformatics is a relatively new discipline dealing with the computational needs of genomics. Biology has become a data-intensive activity, producing data of great variety and on a large scale. Genomics databases must deal with this variety and scale. They must also integrate the disparate databases that serve as their information sources, provide flexible, user-friendly interfaces for querying and data mining, and cope with incomplete and uncertain data.

Information technology must also address the workflow within the laboratory, including the automated analysis of data. Some analysis techniques require significant computational resources, so the management of large-scale distributed computation also becomes an issue.


Information Sources

Bioinformatics: Managing Scientific Data, edited by Zoe Lacroix and Terence Critchlow. Morgan Kaufmann Publishers, San Francisco, CA, 2003. Call number QH 324.2 B55 2003. This book is available in Webster Library Reserve.

Bioinformatics: Databases and Systems, edited by Stan Letovsky. Kluwer Academic Press, Boston, 1999. Call number QH 441.2 B55 1999. This book is available in Webster Library Reserve.

IBM Systems Journal, Vol. 40, No. 2, 2001.

Selected journal and conference articles; selected web sites; and selected software systems and database systems, together with their documentation and implementations.


Evaluation

Students are required to complete five individual reports, each worth 20% of the final mark.

The assignments will be written reviews that (a) critique a particular database or system in the context of other available systems for the task at hand; or (b) discuss an issue by reviewing examples of approaches to the issue in existing databases or systems. They will require detailed knowledge of the systems.

Each report should be 5-8 pages long (definitely no more than 10 pages), in IEEE conference format: two-column, 8pt. (See the LaTeX2e style files provided by IEEE.)

You will have about 2-3 weeks to do each report.


Lecture Overview

Weeks 1-2: Introduction to Bioinformatics.

Weeks 3-5: Bioinformatics Databases and Issues: Quality, Provenance, Integration.

Weeks 6-9: Bioinformatics Computation, Web Services, Libraries, Workflow.

Weeks 10-13: Data and Text Mining, Graph Databases, Review of Issues.


Announcements


Course Details

2008-05-19: Marks for all assignments.

2008-02-27: Marks for assignments 1 and 2.

2008-01-16: Assignment Schedule

Details will follow for each assignment.

Assignment 1: Deadline 2008-02-03 at 11:59pm submitted electronically.
Select a database site for one of the model organisms. Discuss the data contents, schema, browsing, and querying facilities. Is there anything unique or special about the database when compared to other model organism databases?

Assignment 2: Deadline 2008-02-15 at 11:59pm submitted electronically.
Describe BioKleisli as a database system. Discuss the data integration features of BioKleisli in particular. Compare BioKleisli with Biozon.

Assignment 3: Deadline 2008-03-09 at 11:59pm submitted electronically.
Describe Taverna and its implementation. Discuss its use of web standards and its use of non-standard technology. How do bioinformatics workflow systems (and their needs) differ from business workflow systems?

Assignment 4: Deadline 2008-03-30 at 11:59pm submitted electronically.
Discuss the issues of data quality, accuracy, precision, and provenance in bioinformatics databases and workflow systems. Give examples from different databases and systems to illustrate both good and bad practice with regard to these issues.

Assignment 5: Deadline 2008-04-13 at 11:59pm submitted electronically.
Discuss the role of ontologies in bioinformatics databases and systems. What issues or problems have ontologies already solved for bioinformatics? What problems have they not solved?

2008-01-09: Course lectures (internal only)

2008-01-09: Course Outline


Last modified on May 19, 2008 by gregb@cs.concordia.ca