Week 4 Homework

Simple Molecular Properties

Given the molecules in compounds.sdf, write code, using your toolkit of choice, that evaluates the following properties of each of the molecules. After you have calculated these values, generate a histogram of each property. Submit the code and the final histograms (as PDFs).
  1. Molecular weight
  2. Number of heavy atoms
  3. Number of halogens
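
Whichever toolkit you pick, the three properties come down to counting over the element symbols of each molecule. The following is only a minimal sketch of that counting step, not a toolkit solution: the function name simple_properties and the small weight table are illustrative assumptions, the weights are rounded standard values for a handful of elements, and the sketch assumes hydrogens are listed explicitly (many SD files omit them, in which case your toolkit has to supply them before the molecular weight is meaningful).

```python
from collections import Counter

# Rounded standard atomic weights (g/mol) for a few common elements.
# This partial table is an assumption for illustration; a real solution
# needs a complete table or the toolkit's own element data.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
                 "F": 18.998, "Cl": 35.45, "Br": 79.904, "I": 126.904}
HALOGENS = {"F", "Cl", "Br", "I", "At"}


def simple_properties(symbols):
    """Return (molecular weight, heavy-atom count, halogen count).

    `symbols` is a list of element symbols, one per atom, with
    hydrogens written out explicitly.
    """
    counts = Counter(symbols)
    mol_weight = sum(ATOMIC_WEIGHT[s] * n for s, n in counts.items())
    heavy = sum(n for s, n in counts.items() if s != "H")
    halogens = sum(n for s, n in counts.items() if s in HALOGENS)
    return mol_weight, heavy, halogens


# Chloroform, CHCl3, with its single hydrogen listed explicitly.
mw, heavy, hal = simple_properties(["C", "H", "Cl", "Cl", "Cl"])
```

Collecting these tuples for every molecule gives three lists of numbers, which is the input any plotting library needs to draw the histograms.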

SD File Parser

Implement a parser for the SD file format. See here for the specification. Given the parser, you will then parse this SD file and determine
  1. the average molecular weight of the molecules and the distribution of C, N and O over the molecules. You can use the atomic weight data from here.
  2. the number of molecules that were parsed
  3. the average time your code takes to parse a single molecule (also report machine details such as CPU, RAM and disk speed)
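
As a rough sketch of how the molecule count and the per-molecule timing might be obtained: records in an SD file are terminated by a line containing only `$$$$`, so the file can be split on that line and the parse loop timed as a whole. The names split_records, time_per_molecule and parse_one are hypothetical, and parse_one stands in for whatever per-record parser you write.

```python
import time


def split_records(text):
    """Split SD file text into per-molecule lists of lines.

    Each record ends at a line containing only '$$$$'.
    """
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    # Keep a trailing record that is missing its delimiter, if non-empty.
    if any(l.strip() for l in current):
        records.append(current)
    return records


def time_per_molecule(records, parse_one):
    """Parse every record and return (results, average seconds per record)."""
    start = time.perf_counter()
    parsed = [parse_one(r) for r in records]
    elapsed = time.perf_counter() - start
    average = elapsed / len(records) if records else 0.0
    return parsed, average


# Tiny stand-in input; `len` is used as a placeholder parser.
demo_records = split_records("a\nb\n$$$$\nc\n$$$$\n")
demo_parsed, demo_avg = time_per_molecule(demo_records, len)
```

The number of molecules is then simply the length of the record list. Timing the whole loop and dividing, rather than timing each call, keeps the timer overhead out of the measurement.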

You can implement the parser in any language you want, but you cannot use a toolkit such as OEChem or CDK; it must be implemented from scratch. Make sure the code is robust, so that your program does not crash when faced with a malformed SD file. Note that you don't need to implement the full specification, only as much of it as is needed to parse the specified SD file. Specifically, don't bother with Rxnfiles, RDfiles or XDfiles; focus instead on Molfiles and SDfiles. Pages 37 and 44 of the spec are a good summary of what you need to do.
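
One way to keep the parser from crashing on malformed input is to treat every record as untrusted and return a sentinel instead of raising. The sketch below assumes the fixed-column V2000 Molfile layout: a three-line header, a counts line whose first three columns hold the atom count, and atom lines carrying the element symbol in columns 32-34. The function name parse_atom_symbols and the sample record are illustrative only, and the sketch reads just the atom block, not bonds or data fields.

```python
def parse_atom_symbols(record_lines):
    """Return the element symbols of one V2000 Molfile record.

    Returns None instead of raising if the record is malformed.
    """
    try:
        counts_line = record_lines[3]       # follows the 3-line header block
        n_atoms = int(counts_line[0:3])     # columns 1-3: number of atoms
        atom_lines = record_lines[4:4 + n_atoms]
        if len(atom_lines) != n_atoms:      # truncated atom block
            return None
        symbols = []
        for line in atom_lines:
            symbol = line[31:34].strip()    # columns 32-34: element symbol
            if not symbol:
                return None
            symbols.append(symbol)
        return symbols
    except (IndexError, ValueError):
        # Too few lines, or a counts field that is not an integer.
        return None


# Hypothetical two-atom record (C-O heavy-atom skeleton of methanol).
record = [
    "methanol",
    "  hypothetical-program",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.4000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "M  END",
]
symbols = parse_atom_symbols(record)
truncated = parse_atom_symbols(record[:5])
```

Returning None lets the caller skip and count bad records rather than abort the whole run, which also makes it easy to report how many molecules were actually parsed.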

You should submit the source code (and any dependencies or compilation instructions), along with a text file reporting the average molecular weight and running times, and a PNG or PDF of the histogram.

Evaluation

Each of the two questions is worth 20 points, broken down as follows

The due date for this homework is 19th February.


Last modified: Fri Jan 2 15:47:15 EST 2009