Programming for Chemical and Life Science Informatics - I573
Spring 2008
Introduction
This class (formerly I590) is a 3 credit hour graduate course that gives students a broad knowledge of the programming, algorithmic and software techniques used in the chemical and life science informatics disciplines, with the main focus on cheminformatics. In the area of programming, we'll look at tools that help us write programs and at toolkits that let us focus on domain-specific problems. We'll also look at some theoretical topics, such as graph theory and machine learning, which are widely used in cheminformatics and bioinformatics problems. Finally, we'll cover a variety of technologies that play an important role in today's cheminformatics and bioinformatics projects, such as workflows, wikis and blogs, and ontologies.
You should be comfortable in at least one programming
language. The class will mainly focus on Python and Java, though the
examples should be easily translatable to other languages. There is
no language restriction for assignments, so you can use whatever
you're comfortable with.
The class is located in Bloomington, but
is also offered as a Distance Education course to any graduate in the US
through teleconference and web conferencing services. A few of the
lectures will be given by guest lecturers from industry and
academia.
The address of the mailing list for the class is i573-sp2008-l@indiana.edu. Mail sent to this address will be received by all members of the class.
Use the links below to jump to the details:
- Place & Time
- Office Hours
- Distance Students
- Books & References
- Course Evaluation
- Course Outline
- Class Schedule
- Academic Policy
Place & Time
The class will be held on Tuesdays and Thursdays from 9:30am to 10:45am in I 105.
Office Hours
By appointment
Distance Students
- From your phone, dial 800-940-6112, and enter the passcode
- Go to http://breeze.iu.edu/i573 and log in as a guest
- If you have any problems with access during the class, try sending a chat message in Breeze. If that doesn't work, interrupt the teleconference or call Rajarshi's cellphone (the number will be given out by email)
Books & References
Scientific programming
Python
Java
R
- Introductory tutorial
- Brief overview of R and some basic statistics
- An excellent tutorial that covers some advanced topics
- R Graph Gallery is the best place to go if you want to see how to draw a certain type of graph
- ESS - R support for Emacs
- Extending R
SQL
- SQL Tutorials (focusing on the language itself)
- Postgres specific information
- Chemistry cartridges
  - gNova is specific to PostgreSQL and is based on OEChem (commercial)
  - Torus also allows Markush searching within an Oracle DB and is based on the BCI toolkit (commercial)
  - DayCart is a cartridge for Oracle based on the Daylight toolkit (commercial)
  - Tigress is an open source cartridge for PostgreSQL (based on OpenBabel)
  - MyChem is a cartridge for MySQL and is based on OpenBabel
- Programmatic Database Access
  - Java uses JDBC, and each database requires its own JDBC driver (a jar file). A driver is available for PostgreSQL
  - Python uses the Database API (DB-API) to support database-agnostic access. The preferred Python package for PostgreSQL access is psycopg2. A short tutorial is available
  - You can access a PostgreSQL DB from C using libpq
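As a rough illustration of what DB-agnostic access through the DB-API looks like, here is a minimal sketch. It uses the sqlite3 module from the Python standard library as a stand-in so that it runs without a database server. The `molecules` table, its columns, and the function names are hypothetical and are not part of the course materials. With psycopg2 the same connection, cursor, execute and fetchall calls apply, but the connection would come from `psycopg2.connect(...)` and the placeholder would be `%s` rather than `?`.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, made-up molecules table."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE molecules (name TEXT, smiles TEXT, mol_weight REAL)")
    cur.executemany(
        "INSERT INTO molecules VALUES (?, ?, ?)",
        [
            ("benzene", "c1ccccc1", 78.11),
            ("ethanol", "CCO", 46.07),
            ("aspirin", "CC(=O)Oc1ccccc1C(=O)O", 180.16),
        ],
    )
    conn.commit()
    cur.close()
    return conn


def find_heavy_molecules(conn, min_weight):
    """Return (name, smiles) rows heavier than min_weight, sorted by name.

    Only DB-API calls are used (cursor, execute, fetchall), so the function
    body would stay the same for another DB-API driver apart from the
    placeholder style in the SQL string.
    """
    cur = conn.cursor()
    cur.execute(
        "SELECT name, smiles FROM molecules WHERE mol_weight > ? ORDER BY name",
        (min_weight,),  # parameters are passed separately, never pasted into the SQL
    )
    rows = cur.fetchall()
    cur.close()
    return rows


if __name__ == "__main__":
    conn = build_demo_db()
    for name, smiles in find_heavy_molecules(conn, 50.0):
        print(name, smiles)
    conn.close()
```

Passing the weight as a query parameter instead of formatting it into the SQL string lets the driver handle quoting, which is the usual way to avoid SQL injection.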
Toolkits
Academic Policy
The principles of academic honesty and professional ethics will be vigorously enforced in this course, following the IU Code of Student Rights, Responsibilities, and Conduct, and the School of Informatics Academic Regulations. This includes the usual standards on acknowledgment of help, contributions and joint work, even when you are encouraged to build on libraries and other software written by other people. Cases of academic misconduct (including cheating, fabrication, plagiarism, interference, or facilitating academic dishonesty) will be reported to the IUB Office of Student Ethics, a branch of the Office of the Dean of Students. Your submission of work to be graded in this class implies acknowledgment of this policy.
If you need clarification or have any questions, please see the
instructor during office hours.
Evaluation
Course Outline
- Scientific computing
- Software engineering practice
- Methodologies
- Debugging
- Version control
- Domain specific toolkits
- Chemistry
- Biology
- Statistics
- Databases for chemistry
- Issues and challenges
- Cartridges for chemistry
- Mathematical and computational tools
- Machine learning
- Optimization methods
- Graph theory
- Programming for the web
- Web services
- Semantic web languages
- Distributed computing
- Scientific visualization
Course Schedule
| Week | Dates | Lecture | Recording | Materials & Notes |
| 1 | 01/8 | Introduction | Week 1, Part 1 | |
| 1 | 01/10 | The Programming Environment | Week 1, Part 2; Alternate Recording | |
| 2 | 01/15 | Debugging and Profiling | Week 2, Part 1; Alternate Recording | bug.c; Project list available |
| 2 | 01/17 | Language Overview | Week 2, Part 2; Alternate Recording | |
| 3 | 01/22 | Guest Lecture (David Wild): Design Principles for Scientific Software | Week 3, Part 1; Alternate Recording | |
| 3 | 01/24 | Graph Theoretic Techniques I | Week 3, Part 2; Alternate Recording | Homework (due 12th February) |
| 4 | 01/28 | Graph Theoretic Techniques II | Week 4, Part 1; Alternate Recording | |
| 4 | 01/31 | Programming for Chemistry with the CDK | Week 4, Part 2; Alternate Recording | Pharmacophores in the CDK |
| 5 | 02/5 | Programming for Chemistry with OEChem | Week 5, Part 1; Alternate Recording | |
| 5 | 02/7 | Programming for Biology with BioPython | Week 5, Part 2 | Protein sequences for M. tuberculosis; KEGG enzyme data |
| 6 | 02/12 | Optimization & Machine Learning I | Week 6, Part 1; Alternate Recording | |
| 6 | 02/14 | Optimization & Machine Learning II | Week 6, Part 2; Alternate Recording | |
| 7 | 02/19 | Statistical Programming with R | Week 7, Part 1 | data1.csv |
| 7 | 02/21 | Statistical Programming with R | Week 7, Part 2; Alternate Recording | |
| 8 | 02/26 | Statistical Programming with R - Data Analysis | Week 8, Part 1; Alternate Recording | Boiling Point data; Solubility data; Example exercises; Homework (due 17th March) |
| 8 | 02/28 | (continued) | Week 8, Part 2 | |
| 9 | 03/4 | Databases for Cheminformatics I | Week 9, Part 1; Alternate Recording | |
| 9 | 03/6 | Databases for Cheminformatics I (continued) | Week 9, Part 2; Alternate Recording | Project outline due |
| | 03/11 | No class | | |
| | 03/13 | No class | | |
| 10 | 03/18 | Databases for Cheminformatics II | Week 10, Part 1 | |
| 10 | 03/20 | Databases for Cheminformatics II (continued) | Week 10, Part 2 | |
| 11 | 03/25 | Guest Lecture (Jean-Claude Bradley) | Week 11, Part 1; Alternate Recording | Screencast; Slides; Transcript |
| 11 | 03/27 | Semantic Web Languages I | Week 11, Part 2; Alternate Recording | |
| 12 | 04/1 | Semantic Web Languages II | Week 12, Part 1 | |
| 12 | 04/3 | Web Applications | Week 12, Part 2 | |
| | 04/8 | No class | | |
| | 04/10 | No class | | |
| 14 | 04/15 | Web Applications (continued) | | |
| 14 | 04/17 | Guest Lecture (Randy Heiland): Scientific Visualization | Week 14, Part 2 | Slides |
| 15 | 04/22 | Class Presentations: Bin Chen, Jun Ma, Sriram Raghuraman, Steve Wathen | Week 15, Part 1; Alternative Recording | Project due |
| 15 | 04/24 | Class Presentations: Dah Me Ko, Vidyashankar Venkataraman, George Krudy, Chirayu Goswami | Week 15, Part 2; Alternative Recording | |