
Main Page

From I573Wiki2007


Welcome

Welcome to the Wiki for the 2007 I573 Programming for Chemical and Life Science Informatics class. I573 (formerly I590) is a 3-credit-hour graduate course that gives students knowledge of the programming, algorithmic, and software techniques used in the chemical and life science informatics disciplines (see the Syllabus for more information). It is taught by David Wild and Rajarshi Guha. As well as being delivered to classroom students in Bloomington and Indianapolis, it will be offered as a Distance Education course to any graduate student in the US through teleconference and web conferencing services.

Information on editing wiki pages can be found here.

All classes meet at 9.30am in Informatics I105.

Contact information:

Scratch Page

Scratch

Connection information for Distance Students

  1. From your phone, dial 1-800-940-6112, and enter the passcode
  2. Go to http://breeze.iu.edu/cheminfo1 and log in as a guest
  3. If you have any problems accessing the class, try sending a chat message in Breeze. If that doesn't work, interrupt the teleconference or call David's cellphone (the number will be given out by email)

Class Schedule

Date Lecturer Subject
Jan 9 David Wild Introduction to the course and scientific computing for the life sciences ** Recording **
Jan 11 David Wild Tools and methodologies for the scientific computing group ** Recording **
Jan 16 David Wild Live creation of a server, wiki, versioning and collaboration system to use throughout the course ** Recording **
Jan 18 David Wild Continuation of live creation of server environment, Introduction to Visual Studio ** Recording **
Jan 23 Rajarshi Guha Client-side programming with Eclipse ** Recording **
Jan 25 Rajarshi Guha Debuggers and Profilers ** Recording **
Jan 30 Rajarshi Guha Statistical Programming with R ** Recording **
Feb 1 Rajarshi Guha Programming for Chemistry with the CDK ** Recording **
Feb 6 Rajarshi Guha Programming for Chemistry with OEChem ** Recording **
Feb 8 David Wild Introduction to developing web-based applications ** Recording **
Feb 13 Rajarshi Guha Optimization and Machine Learning Methods I ** Recording **
Feb 15 Rajarshi Guha Optimization and Machine Learning Methods II ** Recording **
Feb 20 David Wild Design Principles for Scientific Software ** Recording **
Feb 22 David Wild CLASS CANCELLED
Feb 27 Randy Heiland Scientific Visualization Software ** Recording **
Mar 1 David Wild Usability studies and contextual design ** Recording **
Mar 6 Rajarshi Guha Semantic Web Languages I ** Recording **
Mar 8 David Wild Fundamentals of Parallel Programming ** Recording **
Mar 13 - NO CLASS - spring break
Mar 15 - NO CLASS - spring break
Mar 20 David Wild The world of mash-ups and Web 2.0 ** Recording ** See also ACS Talk
Mar 22 Rajarshi Guha Semantic Web Languages II ** Recording **
Mar 27 Geoffrey Fox Physics Experiment Data Analysis ** Recording **
Mar 29 David Wild A glance through some more languages
Apr 3 Johann Gasteiger Chemistry colloquium talk ** Recording (RM File) **
Apr 5 Rajarshi Guha Web development with Java ** Recording **
Apr 10 Rajarshi Guha Graph Theoretic Techniques ** Recording **
Apr 12 Bob Clark (Tripos) What's a chem(o)informatician to do? ** Recording **
Apr 17 David Wild Pipelining Tools ** Recording **
Apr 19 Santiago Schnell Complex Systems Applications ** Recording **
Apr 24 Albert Fahrenbach O2HU - Online collaboration for Chemistry
Apr 26 - NO CLASS

Coursework

Grading for this class will be based on short assignments (50%) and a final project (50%). For the short assignments, please email your submissions directly to the tutor (David or Rajarshi), with the subject line "I573 assignment (assignment number) for (your name)". Please put only one assignment in each email. Assignments are listed below, with their due dates and how much each counts toward your grade. See the Main Project page for main project suggestions.

Assignment 1: Source code submission (5% of grade, due Feb 8)

Write a small piece of code in any language to do anything you want. Create a Subversion directory for yourself (as per the instructions in Rajarshi's email of Jan 24), and commit your code to the repository inside your directory. Go through at least two or three modify-commit cycles on the code. Then create an entry for yourself (a level-2 heading with your name) on the Assignment1 page. Under your entry, give a very brief description of the code and paste the Subversion log (the output of svn log).

Assignment 2: Simple HTML page with PHP (5% of grade, due Feb 26)

Write a small homepage in HTML and PHP in your cheminfo web directory (from your home directory, cd into public_html; files created there will be available at the URL http://cheminfo.informatics.indiana.edu/~yourusername/). At a minimum, your page should give your name and display the current date using PHP. Once you are done, paste your name and the URL of your page into the Assignment2 page.

Assignment 3: Predictive Models with R (40% of grade, due Mar 31)

Build a linear and a non-linear model to predict whether or not a person will suffer from diabetes; the specific model types are your choice. A training set is provided here; it is a subset of the Pima Indian data set from the UCI Machine Learning Repository. The data file has 9 columns: the first indicates whether that person has diabetes, and the remaining 8 are the predictor variables described in the README. Since the variable to predict is categorical, this is a classification problem.

This assignment has the following components:

1. Provide a summary of the data

    • How many people had diabetes and how many did not?
    • What are the mean and median of each predictor?
    • Which pair of predictor variables has the highest correlation?
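The step 1 summary can be sketched in a few lines. This is a minimal Python illustration only (the assignment itself is to be done in R). It assumes the data file has already been parsed into a list of numeric rows with the class label first, coded 1 for diabetic and 0 for not; that coding, the function names, and the predictor names passed in are assumptions, not part of the provided files. "Maximum correlation" is read here as the largest absolute Pearson correlation.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def summarize(rows, names):
    """Summarize rows whose first column is the class label.

    Assumes (hypothetically) label 1 = diabetic, 0 = not diabetic, and that
    `names` lists the predictor columns in file order.
    """
    labels = [r[0] for r in rows]
    columns = list(zip(*(r[1:] for r in rows)))  # one tuple per predictor
    counts = {"diabetic": labels.count(1), "non-diabetic": labels.count(0)}
    stats = {n: (mean(c), median(c)) for n, c in zip(names, columns)}
    # Compare |r| over every pair of predictors and keep the strongest pair.
    i, j = max(combinations(range(len(names)), 2),
               key=lambda ij: abs(pearson(columns[ij[0]], columns[ij[1]])))
    return counts, stats, (names[i], names[j])
```

In R, the same ground is covered by table() on the label column, summary() for means and medians, and cor() on the predictor columns.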

2. Choose a linear classifier and a non-linear classifier (both of your choice)

    • Build a predictive model using the training set and all the variables
    • Generate a confusion matrix, which summarizes how many diabetics were correctly identified as diabetics, how many were incorrectly identified as nondiabetics, and so on. See here for more details on the confusion matrix.
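For a two-class problem the confusion matrix is just four counts: true positives (diabetics predicted as diabetic), false negatives (diabetics predicted as nondiabetic), false positives and true negatives. A minimal Python sketch, assuming labels are coded 1 (diabetic) and 0 (nondiabetic); the function name and the `positive` parameter are illustrative, not taken from the course materials:

```python
from collections import Counter


def confusion_matrix(actual, predicted, positive=1):
    """Count TP, FN, FP and TN for a binary classifier.

    `positive` is the label treated as "diabetic" (assumed to be 1 here).
    """
    counts = Counter()
    for a, p in zip(actual, predicted):
        if a == positive:
            # Actual diabetic: correctly found (TP) or missed (FN).
            counts["TP" if p == positive else "FN"] += 1
        else:
            # Actual nondiabetic: wrongly flagged (FP) or correctly cleared (TN).
            counts["FP" if p == positive else "TN"] += 1
    return {k: counts[k] for k in ("TP", "FN", "FP", "TN")}
```

In R, table(actual, predicted) produces the same counts as a cross-tabulation.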

3. Save the model as an R binary file (see the save() command).

4. Create a wiki page (off the Assignment3 page) that summarizes your results from steps 1 and 2. Also provide a link to your saved R model (generated in step 3); the model file should be placed in your web directory on cheminfo.

We will download the models and use them to make predictions on a test set. Whoever gets the best accuracy on the test set gets a Snickers :)
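Test-set accuracy is the fraction of cases predicted correctly; in confusion-matrix terms it is (TP + TN) divided by the total number of cases. A short Python sketch, with a hypothetical function name, assuming actual and predicted labels are given as equal-length lists:

```python
def accuracy(actual, predicted):
    """Fraction of cases whose predicted label equals the actual label."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)
```

For example, accuracy([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]) gives 0.6, since three of the five predictions match.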