My hierarchy of testing

I originally wrote this when I worked at Apple, and it was the most popular thing I ever wrote.

This document comes from a long career in software testing where I went through more than a dozen full release cycles.

I’ve executed and observed five major types of testing, seen the bugs they catch, and seen the overall quality of projects that rely on different types of testing. I don’t have objective data to back any of this up, so these are purely my subjective observations. After you read this, you’ll see that I value the subjective very highly in testing, even though pretty charts and graphs can be compelling.

I think this is important because everyone has scarce testing resources and needs to get the best bang for the buck. Many companies invert this emphasis, and I think that goes a long way toward explaining why much of the software out there is poor quality.

So, here are what I see as the five types, in order from most to least important.

Ad hoc testing

This type of testing goes by many different names, but my favorite is “exploratory testing”:

Cem Kaner, who coined the term in 1984, defines exploratory testing as “a style of software testing that emphasizes the personal freedom and responsibility of the individual tester to continually optimize the quality of his/her work by treating test-related learning, test design, test execution, and test result interpretation as mutually supportive activities that run in parallel throughout the project.”

Exploratory Testing—Wikipedia

A great exploratory tester would have these qualities:

  • Incredibly knowledgeable about the products
  • Able to learn from past defects in the products and anticipate bugs
  • Able to get into the heads of users
  • Deeply integrated with the engineering team (ideally, a full-fledged member), knowing how things work at the low level
  • Willing to work with other teams on cross-functional features about to be submitted

Among the most valuable types of exploratory testing is pre-submission testing. In addition to making sure base functionality still works (possibly handled by automated testing) before submission, a good exploratory tester will:

  • Watch all commits from the engineering team and formulate things that seem important to focus on
  • Work with engineers on how new functionality is supposed to work so it can be verified
  • Verify the build is good enough to submit so that it will be testable by other testers the next day. Otherwise you lose a valuable day of testing.
  • Test with real user data, ideally their own. This allows spotting anomalies much more quickly than with canned data, except in very simplistic cases.

This type of testing is mostly gone from the world. QA typically does black box testing or the other types of testing shown below.

Black box testing

If you have enough testing resources, black box testing is a great complement to ad hoc testing. Here’s where it differs:

  • Typically there are test plans that try to cover all the functionality of the project that will be tested before release
  • Testing is generally all done after a submission has happened or possibly after a complete feature has been submitted
  • There is a focus on finding regressions more so than on testing new features before they are fully formed

Because ad hoc testing tends to be more free form, black box testing can help make sure all bases are covered.

Dogfood testing

I would put dogfood testing very close in importance to black box testing. Between these first three types of testing, I think it’s possible to ship a quality product. Dogfood testing is unique in a few ways:

  • A dogfood user is potentially using a build all the time (depending on the product) rather than just in their office when they are executing tests, so many unique scenarios are now available
  • Every user is different and even the best ad hoc or black box testers can’t conceive of the edge cases a dogfood tester can find
  • Someone who depends on their device will communicate their frustration more accurately than a tester at a desk, and this can be valuable feedback for weighing the importance of an issue

Automated low-level testing

While I did say that the top three types of testing are enough to ship a quality product, automated testing does allow repetitive testing that is either impossible for a QA engineer to accomplish manually or just downright tedious. Here are some things that are great for automated low-level testing:

  • Performance – who wants to sit around with a stopwatch and run something numerous times?
  • Multiple-device scenarios – if you want to do a low-level test on every shipping iOS device, for example, that is probably a great candidate for automation
  • Correctness – unit tests especially can be great to verify the correctness of various blocks of code
  • Setup – device setup can be automated and free a tester up to do the actual testing
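
As a rough illustration of the performance and correctness items above, here is a minimal Python sketch. Everything in it is hypothetical: parse_record stands in for whatever code you are testing, median_runtime is an illustrative helper, and the 1 ms budget is an arbitrary placeholder rather than a measured target.

```python
import statistics
import time
import unittest


def parse_record(line):
    """Hypothetical code under test: split 'key=value' into a (key, value) tuple."""
    key, _, value = line.partition("=")
    return key.strip(), value.strip()


def median_runtime(func, *args, runs=50):
    """Call func(*args) `runs` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func(*args)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


class ParseRecordTests(unittest.TestCase):
    # Correctness: small unit tests that pin down expected behavior.
    def test_splits_key_and_value(self):
        self.assertEqual(parse_record("name = Ada"), ("name", "Ada"))

    def test_missing_separator_gives_empty_value(self):
        self.assertEqual(parse_record("name"), ("name", ""))

    # Performance: the machine holds the stopwatch, not a person.
    def test_stays_within_time_budget(self):
        # The 1 ms budget is an assumed placeholder, not a real requirement.
        self.assertLess(median_runtime(parse_record, "name = Ada"), 0.001)
```

Saved to a file, this can be run with python -m unittest followed by the file name. Taking the median over many runs, rather than a single timing, keeps one slow run from failing the test.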

One VERY important point is that you should automate ONLY when it’s apparent that a manual approach is a problem or inefficient. Never automate for the sake of it as it can be very time consuming. Better to have your smart testers do manual testing and then automate only if it becomes clear that automation will be a time saver.

Automated UI testing

I’ve seen little value and a lot of wasted time over the years with automated UI testing. Very smart testers toiling away when they could be finding real bugs. There are a number of problems:

  • You can’t reliably test new features – there is too much churn and you end up wasting time adapting to the changing feature and not finding bugs.
  • UI testing is inherently prone to breaking even if it’s not testing new features. Perhaps the automation tools break often or the drawing infrastructure changes.
  • Depending on how you do it, UI testing can often not be representative of how a user uses a device, so you find irrelevant bugs (testing with robotics being a notable exception)
  • Automated UI tests run with no human watching them, so you can miss other bugs that are present
  • An automated test only knows that the success condition was not achieved, but not why. A human doing it knows the why.

I would recommend this type of testing only in very carefully chosen cases. There must be a potential for high-value return and the time saved versus a manual tester has to be high. 

Conclusion

I believe many companies focus far, far too much on automated testing. And it’s not hard to see why. We’re engineers. Hard data. Hard numbers. Charts. Graphs. Management wants concrete proof that a release is ready to go out the door. Asking a bunch of testers what they think subjectively doesn’t have the same level of comfort as a chart with a line descending to zero.

When I was at Apple, I was once debating a high-level manager (Scott Forstall, who later headed the iPhone team) about automated tests. He beamed at me and said: “You know, Microsoft runs more than 1 MILLION automated tests every night on Internet Explorer!”

My snarky comeback: “But it’s a piece of shit!”

And I was right, it was a piece of shit back in the early 2000s on macOS. It doesn’t mean their automated tests were a total waste of time. There are many hands in every software project and lots of blame can be passed around, but my point is that the automated tests didn’t ensure a quality product.

I can’t prove it, but I suspect there is a correlation between the increased reliance on automated testing and the decline in quality of software. Perhaps we need all five types of testing, but we need to invert the pyramid and re-prioritize. Can’t hurt to try.