I came across a great article by James Bach titled “How to Investigate
Intermittent Problems”. It offers solid advice on dealing with elusive
problems, along with some useful guiding principles. Highly recommended. Read here
Here’s some goodness from the article:
Some Principles of Intermittent Problems:
- …
- Be comforted: the cause is probably not evil spirits.
- If it happened once, it will probably happen again.
- If a bug goes away without being fixed, it probably didn’t go away for good.
- Complex and baffling behavior often has a simple underlying cause.
- Complex and baffling behavior sometimes has a complex set of causes.
- Intermittent problems often teach you something profound about your product.
- It’s easy to fall in love with a theory of a problem that is sensible, clever, wise, and just happens to be wrong.
- The key to your mystery might be resting in someone else’s common knowledge.
- The Pentium Principle of 1994: an intermittent technical problem may pose a *sustained and expensive* public relations problem.
- The problem may be intermittent, but the risk of that problem is ever present.
- …
Some General Suggestions for Investigating Intermittent Problems:
- …
- Recheck your most basic assumptions: are you using the computer you think you are using? are you testing what you think you are testing? are you observing what you think you are observing?
- Invite more observers and minds into the investigation.
- If someone tells you what the problem can’t possibly be, consider putting extra attention into those possibilities.
- Seek tools that could help you observe and control the system.
- Establish a central clearinghouse for mystery bugs, so that patterns among them might be easier to spot.
- Systematically cover the input and state spaces.
- Consider controlling things that you think probably don’t matter.
- Simplify. Try changing only one variable at a time; try subdividing the system. (helps you understand and isolate problem when it occurs)
- Complexify. Try changing more variables at once; let the state get “dirty”. (helps you make a lottery-type problem happen)
- Beware of burning huge time on a small problem. Keep asking, is this problem worth it?
- When all else fails, let the problem sit a while, do something else, and see if it spontaneously recurs.
- …
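To make a couple of these suggestions concrete (covering the input space systematically, and having a tool that lets you observe and control the system), here is a minimal Python sketch of my own, not something from the article. It assumes the randomness in the system under test can be driven by a seed. `flaky_operation`, `hunt`, and `replay` are hypothetical names invented for illustration.

```python
import random


def flaky_operation(rng):
    """Hypothetical stand-in for the system under test.

    It fails only on one rare combination of inputs, which is what
    makes the failure look intermittent from the outside.
    """
    a = rng.randint(0, 9)
    b = rng.randint(0, 9)
    if a == 7 and b == 3:
        raise RuntimeError(f"intermittent failure with a={a}, b={b}")
    return a + b


def hunt(operation, attempts, base_seed=0):
    """Run `operation` once per seed and record which seeds failed.

    Returns a dict mapping each failing seed to a description of the
    error, so every failure can be reproduced later.
    """
    failures = {}
    for seed in range(base_seed, base_seed + attempts):
        rng = random.Random(seed)  # one controlled source of randomness per run
        try:
            operation(rng)
        except Exception as exc:
            failures[seed] = repr(exc)
    return failures


def replay(operation, seed):
    """Re-run a single attempt with the same seed to reproduce its outcome."""
    return operation(random.Random(seed))


if __name__ == "__main__":
    failures = hunt(flaky_operation, attempts=2000)
    print(f"{len(failures)} failing seeds out of 2000 attempts")
    for seed in sorted(failures)[:3]:
        print(seed, failures[seed])
```

Because every attempt is tied to a seed, a failure that shows up once can be replayed on demand with `replay(flaky_operation, seed)`, which turns a lottery-type problem into a repeatable one. Real systems have sources of nondeterminism (timing, hardware, external services) that a seed alone will not capture, so treat this as a starting point only.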
Much more where those came from. Read here