Main Page
From I573Wiki2007
Welcome
Welcome to the Wiki for the 2007 I573 Programming for Chemical and Life Science Informatics class. I573 (formerly I590) is a 3-credit-hour graduate course aimed at giving students knowledge of the programming, algorithmic and software techniques used in the chemical and life science informatics disciplines (see Syllabus for more information). It is taught by David Wild and Rajarshi Guha. As well as being delivered to classroom students in Bloomington and Indianapolis, it will be offered as a Distance Education course to any graduate student in the US through teleconference and web conferencing services.
Information on editing wiki pages can be found here.
All classes meet at 9.30am in Informatics I105.
Contact information:
- David Wild, djwild@indiana.edu
- Rajarshi Guha, rguha@indiana.edu
Scratch Page
Connection information for Distance Students
- From your phone, dial 1-800-940-6112, and enter the passcode
- Go to http://breeze.iu.edu/cheminfo1 and log in as a guest
- If you have any problems accessing the class, try sending a chat message in Breeze. If that doesn't work, interrupt the teleconference or call David's cellphone (the number will be given out by email)
Class Schedule
Coursework
Grading for this class will be based on short assignments (50%) and a final project (50%). For the short assignments, please email your submissions directly to the tutor (David or Rajarshi), with the subject line "I573 assignment (assignment number) for (your name)". Please put only one assignment in each email. Assignments will be listed below, with their due dates and how much they count toward your grade. See the Main Project page for final project suggestions.
Assignment 1: Source code submission (5% of grade, due Feb 8)
Write a small piece of code in any language to do anything you want. Create a Subversion directory for yourself (as per the instructions in Rajarshi's email of Jan 24), and commit your code to the repository inside your directory. Go through at least two or three modify-commit cycles on the code. Then create an entry for yourself (a level-2 heading with your name) on the Assignment1 page. Under your entry, give a very brief description of the code and paste the Subversion log (the output of svn log).
Assignment 2: Simple HTML page with PHP (5% of grade, due Feb 26)
Write a small homepage in HTML and PHP in your cheminfo web directory. To get there, cd into public_html in your home directory; files created there will be available at the URL http://cheminfo.informatics.indiana.edu/~yourusername/. At a minimum, your page should give your name and display the current date using PHP. Once you are done, paste your name and the URL of your page into the Assignment2 page.
Assignment 3: Predictive Models with R (40% of grade, due Mar 31)
Build linear and non-linear models to predict whether or not a person will suffer from diabetes. The specific model types are your choice. A training set is provided here; it is a subset of the Pima Indian data set from the UCI Machine Learning Repository. The data file has 9 columns: the first column is the variable indicating whether or not that person has diabetes, and the remaining 8 columns (the predictors) are described in the README. Since the variable to predict is categorical, this is a classification problem.
This assignment has the following components:
1. Provide a summary of the data
- How many people had diabetes and how many did not?
- What are the mean and median of each predictor?
- Which pair of predictor variables has the highest correlation?
2. Choose a linear classifier and a non-linear classifier
- For each classifier, build a predictive model using the training set and all of the predictor variables
- Generate a confusion matrix, which summarizes how many diabetics were correctly identified as diabetics, how many were incorrectly identified as non-diabetics, and so on. See here for more details on the confusion matrix
3. Save the model as an R binary file (see the save() function)
4. Create a wiki page (linked off the Assignment3 page) that summarizes your results from steps 1 and 2. Also provide a link to your saved R model (generated in step 3). The saved model file should be placed in your web directory on cheminfo.
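The summary statistics asked for in step 1 come down to a count, two averages per column and a pairwise correlation search. The assignment expects R (where table(), summary() and cor() cover the same ground); the following is only a language-neutral sketch in Python using the standard library. It assumes the training data has already been read into a list of rows, with the class label in the first column coded as 1 (diabetic) or 0 (not diabetic) and the 8 predictors after it. The function names summarize and pearson, and the 0/1 coding, are illustrative assumptions rather than part of the assignment.

```python
import itertools
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # Undefined (division by zero) if either column is constant.
    return sxy / (sxx * syy) ** 0.5


def summarize(rows):
    """Summarize rows whose first column is a class label.

    Assumes rows[i][0] is 1 for diabetic and 0 for non-diabetic, and that
    rows[i][1:] are the numeric predictor values.
    """
    labels = [row[0] for row in rows]
    counts = {"diabetic": labels.count(1), "non_diabetic": labels.count(0)}

    # Transpose so that each predictor becomes one column tuple.
    columns = list(zip(*(row[1:] for row in rows)))
    means = [statistics.mean(col) for col in columns]
    medians = [statistics.median(col) for col in columns]

    # Compare every pair of predictors (0-based indices) and keep the pair
    # with the largest absolute correlation.
    best_pair, best_r = None, 0.0
    for i, j in itertools.combinations(range(len(columns)), 2):
        r = pearson(columns[i], columns[j])
        if best_pair is None or abs(r) > abs(best_r):
            best_pair, best_r = (i, j), r
    return counts, means, medians, best_pair, best_r


if __name__ == "__main__":
    # Made-up toy rows (label + 3 predictors), not the Pima training data.
    toy_rows = [
        [1, 1.0, 2.0, 5.0],
        [0, 2.0, 4.0, 3.0],
        [1, 3.0, 6.0, 4.0],
        [0, 4.0, 8.0, 1.0],
    ]
    counts, means, medians, pair, r = summarize(toy_rows)
    print(counts)   # {'diabetic': 2, 'non_diabetic': 2}
    print(means)    # [2.5, 5.0, 3.25]
    print(medians)  # [2.5, 5.0, 3.5]
    print(pair, r)  # (0, 1) 1.0
```

Whether "maximum correlation" means the largest positive value or the largest magnitude is a judgment call; this sketch uses the magnitude, so a strong negative correlation can also be reported as the top pair.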
We will download the models and then use them to make predictions on a test set. Whoever gets the best accuracy on the test set gets a Snickers :)
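The confusion matrix in step 2, and the test-set accuracy used to pick the winner, can both be computed from paired actual and predicted labels. In R, table(actual, predicted) produces the same cross-tabulation; the Python sketch below simply spells out the counting. It assumes labels coded 1 = diabetic and 0 = non-diabetic, and it takes accuracy to mean the fraction of correct predictions, which is an assumption about how the test set will be scored. The function names are hypothetical.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Count true/false positives and negatives.

    actual, predicted: equal-length sequences of class labels.
    positive: the label treated as "diabetic" (assumed to be 1).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == positive:
            # A diabetic is either correctly identified (TP) or missed (FN).
            counts["TP" if p == positive else "FN"] += 1
        else:
            # A non-diabetic is either wrongly flagged (FP) or correctly cleared (TN).
            counts["FP" if p == positive else "TN"] += 1
    return counts


def accuracy(counts):
    """Fraction of all cases that were classified correctly."""
    total = sum(counts.values())
    return (counts["TP"] + counts["TN"]) / total


if __name__ == "__main__":
    # Made-up labels for illustration only; not the course test set.
    actual = [1, 1, 0, 0, 1, 0]
    predicted = [1, 0, 0, 1, 1, 0]
    cm = confusion_matrix(actual, predicted)
    print(cm)            # {'TP': 2, 'FN': 1, 'FP': 1, 'TN': 2}
    print(accuracy(cm))  # 0.6666666666666666
```

Keeping the four cells separate, rather than reporting accuracy alone, shows whether a model's errors are mostly missed diabetics (FN) or false alarms (FP), which accuracy by itself hides.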
