Week 4 Homework
Simple Molecular Properties
Given the molecules in compounds.sdf, write
code, using your toolkit of choice, that evaluates the following properties of each of the
molecules. After you have calculated these values, generate a histogram
of each property. Submit the code and the final histograms (as
PDFs).
- Molecular weight
- Number of heavy atoms
- Number of halogens
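Whatever toolkit you choose, the three properties reduce to simple sums over the atoms of a molecule. The following is a minimal sketch in plain Python, not a toolkit-based solution. It assumes you already have the element symbols of every atom (hydrogens included) as a list. The name simple_properties, the abbreviated weight table and the choice of F, Cl, Br and I as the halogens are all assumptions made for illustration.

```python
# Rounded standard atomic weights in g/mol. Only a few elements are listed;
# a real solution would need a complete table.
ATOMIC_WEIGHT = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998,
    "P": 30.974, "S": 32.06, "Cl": 35.45, "Br": 79.904, "I": 126.904,
}

# Assumption: only the common halogens are counted.
HALOGENS = {"F", "Cl", "Br", "I"}


def simple_properties(symbols):
    """Return (molecular_weight, heavy_atom_count, halogen_count).

    `symbols` is a list of element symbols, one per atom, hydrogens included.
    An element missing from ATOMIC_WEIGHT raises KeyError.
    """
    weight = sum(ATOMIC_WEIGHT[s] for s in symbols)
    heavy = sum(1 for s in symbols if s != "H")  # heavy atom = not hydrogen
    halogens = sum(1 for s in symbols if s in HALOGENS)
    return weight, heavy, halogens


# Chloroform, CHCl3
weight, heavy, halogens = simple_properties(["C", "H", "Cl", "Cl", "Cl"])
print(round(weight, 3), heavy, halogens)
```

Note that SD files often leave hydrogens implicit, so a weight computed from the explicit atoms alone will be too low. A toolkit normally accounts for the implicit hydrogens for you.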
SD File Parser
Implement a parser for the SD file format. See here
for the specification. Given the parser, you will then parse this SD
file and determine
- the average molecular weight of the molecules and the distribution
of C, N and O over the molecules. You can use the atomic weight data
from here
- the number of molecules that were parsed
- the time it takes for your code to parse a single molecule on
average. (Also report the machine details such as CPU, RAM and disk
speed.)
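One possible way to collect the three quantities listed above is sketched here. Records in an SD file are terminated by a line containing $$$$, so the text can be split there and each record handed to a parser. The names split_records, summarize and toy_parse are hypothetical. toy_parse is only a stand-in that reads whitespace-separated element symbols, so that the sketch runs on its own; your own Molfile parser would take its place.

```python
import time
from collections import Counter


def split_records(text):
    """Yield the text of each SD record; a record ends at a '$$$$' line."""
    current = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            yield "\n".join(current)
            current = []
        else:
            current.append(line)
    # Tolerate a final record that lacks its terminator.
    if any(line.strip() for line in current):
        yield "\n".join(current)


def summarize(text, parse_record, weights, elements=("C", "N", "O")):
    """Parse every record, skipping malformed ones instead of crashing.

    `parse_record(record)` must return a list of element symbols and raise
    ValueError on malformed input. `weights` maps symbol -> atomic weight.
    """
    per_molecule = {e: [] for e in elements}  # one count per parsed molecule
    total_weight = 0.0
    n_parsed = n_skipped = 0
    start = time.perf_counter()
    for record in split_records(text):
        try:
            symbols = parse_record(record)
            weight = sum(weights[s] for s in symbols)
        except (ValueError, KeyError):
            n_skipped += 1
            continue
        n_parsed += 1
        total_weight += weight
        counts = Counter(symbols)
        for e in elements:
            per_molecule[e].append(counts[e])
    elapsed = time.perf_counter() - start
    n_records = n_parsed + n_skipped
    return {
        "parsed": n_parsed,
        "skipped": n_skipped,
        "mean_weight": total_weight / n_parsed if n_parsed else None,
        "seconds_per_record": elapsed / n_records if n_records else None,
        "per_molecule": per_molecule,
    }


def toy_parse(record):
    """Stand-in parser (an assumption): symbols separated by whitespace."""
    symbols = record.split()
    if not symbols:
        raise ValueError("empty record")
    return symbols


WEIGHTS = {"C": 12.011, "N": 14.007, "O": 15.999}
report = summarize("C C O\n$$$$\nbogus\n$$$$\nC N\n$$$$\n", toy_parse, WEIGHTS)
print(report["parsed"], report["skipped"], report["mean_weight"])
```

The timing in this sketch covers in-memory parsing only. If you want the figure to reflect disk speed as well, start the timer before reading the file. The per-molecule lists in per_molecule are what you would feed into the histogram of C, N and O.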
You can implement it in any language you want but you cannot use a
toolkit like OEChem or CDK. This must be implemented from
scratch. Make sure that the code is robust so your program does not
crash when faced with a malformed SD file. Note that you don't need
to implement the full specification. You only need to implement the
spec to the extent that you can parse the specified SD
file. (Specifically, don't bother with Rxnfiles, RDfiles,
XDfiles. Instead focus on Molfiles and SDfiles. Pages 37 & 44 of the spec are
a good summary of what you need to do.)
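To make the scope concrete, here is a rough sketch of reading the atom block of a single V2000 Molfile record. It relies on the fixed-width layout of that format: three header lines, then a counts line whose first three characters hold the atom count, then one line per atom with the element symbol in columns 32 to 34. The function name parse_molfile_atoms is hypothetical, and the sketch ignores bonds, properties, data items and V3000 records. Check the column positions against the specification before relying on them.

```python
def parse_molfile_atoms(record):
    """Return the element symbols from the atom block of a V2000 Molfile.

    Raises ValueError for records that are too short, have an unreadable
    counts line, or have fewer atom lines than the counts line promises.
    """
    lines = record.splitlines()
    if len(lines) < 4:
        raise ValueError("record too short for a header block and counts line")
    counts_line = lines[3]  # lines 0-2 are the header block
    try:
        n_atoms = int(counts_line[0:3])
    except ValueError:
        raise ValueError(f"unreadable counts line: {counts_line!r}") from None
    atom_lines = lines[4:4 + n_atoms]
    if n_atoms < 0 or len(atom_lines) != n_atoms:
        raise ValueError("atom block is shorter than the counts line says")
    symbols = []
    for line in atom_lines:
        # x, y, z take 10 characters each, then a space, then the symbol.
        symbol = line[31:34].strip()
        if not symbol:
            raise ValueError(f"no element symbol in atom line: {line!r}")
        symbols.append(symbol)
    return symbols


# A hand-written two-atom record (the heavy atoms of methanol).
sample = "\n".join([
    "methanol",
    "  sketch",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.4000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "M  END",
])
print(parse_molfile_atoms(sample))
```

Raising ValueError on every malformed case lets the caller skip a bad record and keep going, which is one way to meet the robustness requirement.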
You should submit the source code (and any dependencies or compilation
instructions) along with a text file indicating the average molecular
weight and running times and a PNG or PDF of the histogram.
Evaluation
Each of the two questions is worth 20 points, broken down as follows:
- 15 - implementing the requirements
- 5 - correctness
The due date for this homework is 19th February.
Last modified: Fri Jan 2 15:47:15 EST 2009