Design Principles for Scientific Software
From I573Wiki2007
Hurdles to writing good scientific software
- Writing software that actually does consistently what you claim it does
- Particular problem when using methods which are not 100% accurate
- What happens if the user tries to do something that will give a wrong result?
- Efficiency
- How quickly should the code be written, and how fast should it run?
- Traditionally fast languages - Fortran, C, C++, assembly (including inline assembly)
- Slower languages getting faster - Java, Perl, Python, Ruby
- Writing software that is relevant to the scientific problems
- What is the program for? Is it a decision making tool, an exploratory tool or what? How does it actually impact the science?
- Flexibility vs complexity tradeoff
- Handling situations where the science has to change to accommodate the computation
- How are tools/algorithms mapped to the science?
- Scripting tools vs single-application tools
- Working with different cultures and world-views
- Writing software which has high usability
- Project management challenges
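The question above about a user doing something that will give a wrong result can be illustrated with a minimal sketch. It assumes a method with a known range of validity; the function `small_angle_sin` and its `max_abs_x` parameter are hypothetical names chosen for illustration, not part of any library. The function refuses inputs outside its stated range and returns an error bound alongside the value, so the limits of an approximate method are communicated rather than hidden.

```python
import math


def small_angle_sin(x, max_abs_x=0.1):
    """Approximate sin(x) by x, refusing inputs outside the stated range.

    Returns (value, error_bound). The bound |x|**3 / 6 follows from the
    alternating Taylor series of sin. max_abs_x is an illustrative limit.
    """
    if not math.isfinite(x):
        raise ValueError("x must be a finite number")
    if abs(x) > max_abs_x:
        raise ValueError(
            f"small-angle approximation is only supported for |x| <= {max_abs_x}"
        )
    return x, abs(x) ** 3 / 6
```

Raising an exception is only one possible design; a warning, or a fallback to an exact method, may suit an exploratory tool better.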
Some classic failures
- MSNBC - Software disasters are often people problems
- Large list of computer failures in RISKS digest
- Wikipedia - software engineering disasters
- ThisIsBroken
- The Daily WTF
Writing working code
- Does it do what you think it is doing? - see Two disasters caused by computer arithmetic problems
- Does it do what the real user thinks it is doing?
- Don't re-invent working wheels but do re-invent broken wheels
- See e.g. SciPy
- Testing at the center of development - use unit tests
- A version control system together with realistic unit tests is essential for any complex development project
- Documentation and maintainability
- For nondeterministic or non-provable methods, what is the applicability and confidence of the method? Can this be communicated to the user?
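The points above about arithmetic problems and unit tests can be sketched together. This example assumes only the Python standard library; `naive_sum` and `SumTests` are hypothetical names used for illustration. It shows a plain loop drifting from the expected total because of accumulated floating-point rounding, and unit tests that pin down the behaviour actually wanted.

```python
import math
import unittest


def naive_sum(values):
    """Add values left to right; rounding error accumulates at every step."""
    total = 0.0
    for v in values:
        total += v
    return total


class SumTests(unittest.TestCase):
    def test_naive_sum_drifts(self):
        # 0.1 has no exact binary representation, so ten additions
        # do not give exactly 1.0.
        self.assertNotEqual(naive_sum([0.1] * 10), 1.0)

    def test_fsum_is_exact_here(self):
        # math.fsum tracks the lost low-order bits while summing.
        self.assertEqual(math.fsum([0.1] * 10), 1.0)

    def test_compare_with_tolerance(self):
        # Comparing with a tolerance is usually the right test for floats.
        self.assertTrue(math.isclose(naive_sum([0.1] * 10), 1.0, rel_tol=1e-9))
```

The tests can be run with `python -m unittest` from the directory containing the file.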
Writing efficient code
Timescales
- Interactive time (< 5 minutes)
- What is the time tolerance?
- Asynchronous (5 minutes - 48 hours)
- How is the user informed that the job is complete?
- What is the time tolerance?
- What variation is possible due to scheduling systems etc?
- How long should results persist?
- Long term (weeks or months)
- How will the science change during the period of computation?
- Need to accommodate equipment failure, restarts, and so on
- How can the user be informed of progress?
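For long-running jobs, the points above about restarts and progress reporting can be sketched as a loop that saves its state periodically. This is a minimal illustration under stated assumptions: the function `run_with_checkpoints`, the JSON checkpoint layout, and the toy workload (summing squares) are all hypothetical, and a real job would save whatever state its own computation needs.

```python
import json
import os
import tempfile


def run_with_checkpoints(n_steps, checkpoint_path, report_every=100):
    """Sum the squares 0..n_steps-1, resuming from checkpoint_path if it exists."""
    state = {"step": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    while state["step"] < n_steps:
        state["total"] += state["step"] ** 2
        state["step"] += 1
        if state["step"] % report_every == 0 or state["step"] == n_steps:
            # Write to a temporary file first, then rename, so a crash
            # mid-write cannot leave a corrupt checkpoint behind.
            tmp_path = checkpoint_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(state, f)
            os.replace(tmp_path, checkpoint_path)
            print(f"progress: {state['step']}/{n_steps}")
    return state["total"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "checkpoint.json")
        print(run_with_checkpoints(1000, path, report_every=250))
```

If the process is killed and started again with the same checkpoint path, it continues from the last saved step instead of from zero.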
Ways of improving efficiency
- "Premature optimization is the root of all evil (or at least most of it) in programming" (Knuth)
- Processor level optimizations
- Parallelization
- Volatile memory vs disk storage, and caching
- Algorithmic improvements - complexity reduction
- Always use the best algorithm for the data and task - e.g. don't default to bubble sort. See the Numerical Recipes books and Knuth's The Art of Computer Programming
- "Virtual" improvements - e.g. returning partial results, coarse results then refined, etc.
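The "virtual" improvement of returning coarse results first and refining them can be sketched with a generator. This is an illustration only, assuming numerical integration as the task; `refine_integral` is a hypothetical name, and the technique used is the standard trapezoid rule with interval halving, where each level reuses the previous estimate.

```python
import math


def refine_integral(f, a, b, levels=10):
    """Yield (intervals, estimate) pairs of increasing accuracy for the integral of f on [a, b]."""
    n = 1
    estimate = 0.5 * (b - a) * (f(a) + f(b))
    yield n, estimate
    for _ in range(levels):
        h = (b - a) / n
        # Reuse the previous estimate: only the new midpoints are evaluated.
        midpoints = sum(f(a + (i + 0.5) * h) for i in range(n))
        estimate = 0.5 * (estimate + h * midpoints)
        n *= 2
        yield n, estimate


if __name__ == "__main__":
    for intervals, value in refine_integral(math.sin, 0.0, math.pi):
        print(intervals, value)
```

The caller can display each estimate as it arrives and stop as soon as the result is good enough, so the user sees something useful long before the finest level finishes.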
Software relevance
- What are scientists doing now?
- Why do they need your software?
- How does it help?
- Makes it easier or quicker to do things they already do
- Enables them to do new things
- Does it require a shift in thinking or workflows? How do you plan to make that happen?
- Contextual Design
- Interaction Design
Software usability
- How does the software integrate with the world they work in?
- How easy is it for real people to perform tasks with the software?
- Where is it on the scale of
- Straightforward - complicated?
- Pleasing - irritating?
- Useful - useless?
- Usability Engineering book
- Don't make me think! book
- IU UITS User Experience Group
- Usability.gov
- UIE
- UserExperience.org
- NASA Usability Engineering Team
- Edward Tufte's website
- Web Style Guide
- Good/Bad Usability Examples
Project management challenges
- Budgeting, 'man month', complex rapidly changing science
- Difficult questions:
- How long will it take?
- What are the exact specifications?
- How much will it cost?
- Easier questions:
- What expertise is needed?
- Who is it aimed at?
- What is the anticipated impact on the science and/or business?
- What constitutes success and failure? (cf. unit testing)
- Agile Methodologies may be the best approach
Best practices
- Combine Contextual Design and/or Interaction Design, possibly within an Extreme Programming development environment
- Follow the "code is cheap" approach
- Use established algorithms and methods where possible
- Consider efficiency of development AND execution
- ALWAYS do some form of usability study - formal or informal
- Consider unit testing - essential for a large complex system
