CS 5950 - 6030 Bioinformatics

 
 

Bioinformatics

 

[ Courses ] [ assignments ] [ project ] [ presentations ]

   

Instructor
Dr. Elise de Doncker
Department of Computer Science
College of Engineering and Applied Sciences
B-240 Parkview Campus
Kalamazoo, MI 49008

Phone: (269) 276-3102 (office), 276-3101 (Dept. office), 276-3122 (fax)
(but preferably contact me by e-mail: elise [dot] dedoncker [at] wmich [dot] edu)

Office hours
MW 13:15 - 14:00 (please let me know if you plan on coming); other times by appointment

Grader:
Jiadong Gui, email: jiadong.gui@wmich.edu

Texts

  • Required:
    • An Introduction to Bioinformatics Algorithms. Neil C. Jones and Pavel A. Pevzner, The MIT Press (ISBN: 0-262-10106-8)
  • Recommended:
  • Other texts:
    • Bioinformatics: High Performance Parallel Computer Architectures. Bertil Schmidt (Ed.), Series: Embedded Multi-Core Systems, CRC Press (ISBN: 9781439814888).
    • Parallel Computing for Bioinformatics & Computational Biology. Albert Y. Zomaya (Ed.), John Wiley & Sons, Inc. (ISBN: 9780471718482)
    • Data Mining for Bioinformatics. Sumeet Dua, Pradeep Chowriappa, 2013 Taylor & Francis Group LLC, CRC Press (ISBN: 978-0-8493-2801-5)
    • Data Mining in Bioinformatics. Jason T.L. Wang, Mohammed J. Zaki, Hannu T.T. Toivonen, Dennis Shasha (Eds), Springer-Verlag London Ltd. (ISBN: 1852336714)
    • Introduction to Bioinformatics. Arthur M. Lesk (ISBN: 978-0-19-920804-3)
    • Advanced Data Mining Technologies in Bioinformatics. Hui-Huang Hsu, Idea Group Publishing (ISBN: 1-59140-863-6 hardcover; 1-59140-865-2 ebook)
    • Mastering Perl for Bioinformatics. James D. Tisdall, O'Reilly (ISBN: 0-596-00307-2)
    • Python for Bioinformatics. Sebastian Bassi, Chapman & Hall/CRC, Taylor & Francis Group (ISBN 978-1-58488-929-8)
    • Ruby Programming for Medicine and Biology. Jules J. Berman, Jones and Bartlett Series in Biomedical Informatics (ISBN-10: 0763750905 | ISBN-13: 978-0763750909)
    • Algorithms in Structural Molecular Biology. Bruce R. Donald, The MIT Press (ISBN 978-0-262-01559-2)
    • Bioinformatics: Sequence, Structure and Databanks. Eds. Des Higgins and Willie Taylor, Oxford University Press (ISBN 0-19-963791-1 Hbk., 0-19-963790-3 Pbk.)

Course description

The main algorithm design paradigms will be covered. Branch-and-bound algorithms can be seen as a clever way to improve exhaustive search for hard problems. Furthermore, when no greedy solution or divide-and-conquer algorithm is available, we may be able to resort to a dynamic programming method, combinatorial search, an approximation algorithm, or a stochastic approach (cf. the textbook of Jones and Pevzner). The different techniques will be demonstrated on various applications in bioinformatics, such as sequence alignment, motif finding and pattern matching, and protein structure/folding prediction and modeling, with applications, e.g., to drug discovery. Graph algorithms have important applications, e.g., in DNA sequencing. Hidden Markov Models are also used for determining patterns in genome sequences.
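As an illustration of the dynamic programming paradigm applied to sequence alignment, here is a minimal sketch in Python that computes the score of the best global alignment of two sequences (a Needleman-Wunsch style recurrence). The function name and the scoring values (match = 1, mismatch = -1, gap = -1) are assumptions chosen for this example; they are not taken from the course material or the textbook.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the score of the best global alignment of strings a and b.

    The scoring parameters are illustrative assumptions, not course values.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each cell is the best of three choices: align the two symbols,
    # or put a gap in one sequence or the other.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # a[i-1] aligned with b[j-1]
                score[i - 1][j] + gap,       # a[i-1] aligned with a gap
                score[i][j - 1] + gap,       # b[j-1] aligned with a gap
            )
    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

The table has (len(a) + 1) x (len(b) + 1) cells and each is filled in constant time, so the running time is proportional to the product of the sequence lengths, in contrast to an exhaustive search over all possible alignments.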

Tree structures are of fundamental importance, e.g., for hierarchical clustering, evolutionary trees and keyword trees. Data mining techniques are needed to make sense of the huge amounts of data that are generated and can be accessed via large databases on the web. Tools for the manipulation of data are often implemented by bioinformaticians using languages such as Perl, Python, Ruby and XML.

Grading
There will be two tests and a final examination. The tests (tentatively) carry about 45-48% of the grade; assignments, the semester project and presentations count for the remaining percentage. Of the two (midterm) tests, the lower grade will be dropped. Problems with class attendance may lead to a failing grade.

Academic Integrity Policies
You are responsible for making yourself aware of and for understanding the policies and procedures in the Undergraduate Catalog (pp. 271-272) that pertain to Academic Integrity. These policies include cheating, fabrication, falsification and forgery, multiple submission, plagiarism, complicity and computer misuse. If there is reason to believe you have been involved in academic dishonesty, you will be referred to the Office of Student Judicial Affairs. You will be given the opportunity to review the charge(s). If you believe you are not responsible, you will have the opportunity for a hearing. You should consult with me if you are uncertain about an issue of academic honesty prior to the submission of an assignment or test.

Additional instructor's notes: The above policy includes cheating by submitting programming assignments or projects where a program (even in part) has been downloaded from the internet; this also applies to text in assignments. Cooperation among students on the homework is not allowed, unless it is work for a semester project conducted by a team of students. If you are caught, there will be consequences.

Links
DNA replication video.
DNA transcription and translation videos.
DNA transcription and translation video.
DNA transcription and translation video.
DNA Sequencing - 3D video.
Protein folding.
Protein synthesis animation video.
From RNA to protein synthesis video.
Protein folding video.
Time-lapse of a nascent protein.
Creating the Suffix Tree - Conceptually.
Big Data Analytics in Bioinformatics ..., H. Kashyap et al.
Machine Learning in Bioinformatics, P. Larranaga et al.
Machine Learning in Bioinformatics, Y. Zhang and J. C. Rajapakse (Wiley text).
Machine Learning in Bioinformatics (Wikipedia page).
Compeau text web page.
Bertil Schmidt book contents.
Rosalind web page.
Genscript gene news.
Wang et al. text web page.
Lesk text book web page.
The Bioperl Toolkit: Perl Modules for the Life Sciences, Jason E. Stajich et al., in Genome Research 2002, 12, 1611-1618, with many references and links.
A geometric approach for classification and comparison of structural variants, Suzanne Sindi et al., Bioinformatics (2009), 25 (12), i222-i230.
Software packages for discovering structural variation with next-generation sequencing, with many references to software packages (e.g., the one in the following link).
High-resolution mapping of copy-number alterations with massively parallel sequencing, Derek Y. Chiang et al., Nature Methods 6, 99-103 (2009).
Nucleic Acids Research, Oxford Journals.
Differential Equations and Mathematical Biology, Second Edition, D.S. Jones, M. Plank, B.D. Sleeman, Chapman & Hall/CRC Mathematical & Computational Biology (2009).
Elaine Rich Automata book.
Data sets used in Discovery of Regulatory Elements by a Computational Method for Phylogenetic Footprinting, M. Blanchette and M. Tompa.
