[Update: Added links for Hexawise and Allpairs, which I’d meant to do earlier.]

Data driving, also often called parameterization, is a wonderful way to increase your test coverage through parts of your system; however, it’s also a seductive, alluring tool that can needlessly explode your tests’ complexity and execution cost.

Let’s tackle the cons of data driving first, then walk through how it can, when mindfully used, lend some great value to your automation suites.

Problems with Data Driving

Automated test scripts need to be treated like production code—because they are production code! With that in mind, as test automation folks we need to focus on keeping tests granular, specific, and simple. I generally see two issues with data driven testing:

  1. Conflating test cases/increasing overall complexity
  2. Too much data

Exploding Complexity

It’s easy to fall into the trap of creating a huge dataset to cover all your different scenarios in one fell swoop. Unfortunately, this creates test cases that aren’t focused, have mixed responsibilities, and are harder to decipher and maintain. It’s also a bad approach to conflate, or mix up, different test cases.

The example I often use revolves around a social media platform. Such platforms generally give you many different avenues for creating content (forum threads, blog posts, wiki articles), and that content can usually be created under different roles (owner, administrator, moderated user, unmoderated user, etc.).

You could use a data driven test to create one huge “Check users can create content” test method which checks whether users can create different kinds of content for blogs, forums, wikis, etc. You can gin up an Excel spreadsheet that handles all the permutations you want.


The problem with this approach is it badly mixes different concerns. You have mixed types of behaviors going on, you have mixed types of content being generated, and you’re working through different permission scenarios with different roles.  Your actual test script will have to use multiple conditional statements to branch off appropriately based on the specific scenario you’re working through. (Conditionals can explode the complexity of software and test cases. Read up on cyclomatic complexity for more background information.)

Another problem with this is simple maintainability. It’s very hard to come back to a test with this many mixed concerns weeks or months later and try to decipher what exactly is being done.


Fix: Split Out Test Cases

Avoid these mixed concerns by splitting out the multiple behaviors/scenarios into separate, well-named tests with much smaller, more specific sets of data. Following the example above, at the bare minimum you’d want to split out your content creation by content type. Forums separately, for example.
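
Here’s a minimal sketch of what that split might look like, using Python’s standard unittest module. The permission rules and the helper names (can_create_forum_thread, can_create_blog_post) are invented stand-ins for the real platform, so treat them as assumptions rather than anything a real system exposes.

```python
import unittest

# Hypothetical permission rules standing in for the real platform.
FORUM_CREATORS = {"owner", "administrator", "moderated user", "unmoderated user"}
BLOG_CREATORS = {"owner", "administrator"}


def can_create_forum_thread(role):
    """Hypothetical check: may this role start a forum thread?"""
    return role in FORUM_CREATORS


def can_create_blog_post(role):
    """Hypothetical check: may this role publish a blog post?"""
    return role in BLOG_CREATORS


class ForumThreadCreationTests(unittest.TestCase):
    # Small dataset for one behavior only: creating forum threads.
    ROWS = [
        ("owner", True),
        ("administrator", True),
        ("moderated user", True),
        ("unmoderated user", True),
    ]

    def test_roles_that_can_create_forum_threads(self):
        for role, expected in self.ROWS:
            with self.subTest(role=role):
                self.assertEqual(can_create_forum_thread(role), expected)


class BlogPostCreationTests(unittest.TestCase):
    # A separate, equally small dataset for blog posts.
    ROWS = [
        ("owner", True),
        ("administrator", True),
        ("moderated user", False),
        ("unmoderated user", False),
    ]

    def test_roles_that_can_create_blog_posts(self):
        for role, expected in self.ROWS:
            with self.subTest(role=role):
                self.assertEqual(can_create_blog_post(role), expected)
```

Each test class has one job and one short table of rows, and neither needs a conditional to decide which kind of content it is dealing with. Saved to a file, these run with python -m unittest.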


The team could have a very good discussion about whether it makes sense to split these into completely separate methods as well, testing each type of role and its applicable success or failure, but I think you get the idea: this is a starting point.

Exploding Data Size

Just because you can run several thousand rows of data through a method doesn’t mean you should! Huge sets of data beget extraordinarily long test execution times, especially when we’re talking automated UI tests. (Even unit tests bog down with 100,000 rows of data!)

At some point, additional rows of data are pointless. Simply iterating through repetitive values makes no sense if the extra rows aren’t exercising different behavior in the system.

Fix: Cut the Number of Test Data Rows with Boundary Values

Two concepts can help us trim, often dramatically, the number of rows we need to iterate through. First, choose values which focus on borders/boundaries in your system. (Read up on boundary value analysis for even more information.)

For example, say you’re testing a payroll system and you’re looking to check proper computations for overtime and regular time. Overtime kicks in if an employee works over eight hours in a day. Your initial thought might be to run something like this through:


The problem is the vast majority of those combinations don’t test anything different in the system. Instead, you want to check the boundaries of the overtime algorithm. This means using one value just above and one just below 8, plus perhaps something at zero and 24, the boundaries of the day. You should also have values outside the boundaries of a regular day, such as a negative number of hours or more than 24.
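
As a rough sketch, a boundary-focused dataset for that overtime rule might look like this in Python. The split_hours function is a hypothetical stand-in for the payroll system, and the quarter-hour step and the rejection of out-of-range input are my assumptions. I’ve also included a row at exactly 8, which boundary value analysis commonly adds.

```python
REGULAR_HOURS_LIMIT = 8  # overtime kicks in above eight hours in a day


def split_hours(hours_worked):
    """Hypothetical system under test: return (regular, overtime) hours."""
    if not 0 <= hours_worked <= 24:
        raise ValueError("hours_worked must be between 0 and 24")
    regular = min(hours_worked, REGULAR_HOURS_LIMIT)
    overtime = max(hours_worked - REGULAR_HOURS_LIMIT, 0)
    return regular, overtime


# A handful of rows aimed at the edges instead of every possible value.
BOUNDARY_ROWS = [
    (0, (0, 0)),         # start of the day
    (7.75, (7.75, 0)),   # just below the overtime threshold
    (8, (8, 0)),         # exactly on the threshold
    (8.25, (8, 0.25)),   # just above the threshold
    (24, (8, 16)),       # end of the day
]

# Values outside a regular day, which this sketch assumes are rejected.
OUT_OF_RANGE = [-0.25, 24.25]


def run_boundary_checks():
    """Run every row and return how many rows were checked."""
    for hours, expected in BOUNDARY_ROWS:
        assert split_hours(hours) == expected, hours
    for hours in OUT_OF_RANGE:
        try:
            split_hours(hours)
        except ValueError:
            continue
        raise AssertionError(f"{hours} should have been rejected")
    return len(BOUNDARY_ROWS) + len(OUT_OF_RANGE)
```

Seven rows cover both sides of every edge the overtime rule cares about, and adding more values from the middle of each range would not exercise any new behavior.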


Fix: Cut the Number of Test Data Rows via Combinatorial Testing

Do you have data/condition matrices with more than two columns/parameters? If so, you need to read up on pairwise or combinatorial testing. Quickly.

Matrices explode our test data complexity because we need to test interaction between the various parameters. Think of Google’s Maps feature that lets you compute driving directions between points. Justin Hunter, the founder of Hexawise, has a great example showing the number of permutations needed to get 100% coverage of every value option combination: it’s a horrifying 16 trillion or so.

Combinatorial tools such as Hunter’s Hexawise, James Bach’s allpairs, or other similar tools, can help you dramatically cut the number of test cases in these situations while still maintaining outstanding and reliable test coverage. For example, Hexawise can cut the Google Maps example from 16 trillion down to under 30.

Each combinatorial tool gives you options for “dialing” coverage up or down as you need, but the basic point is this: combinatorial tools can save you incredible amounts of time and effort. (The site pairwise.org is a great place to get introduced to the concepts of pairwise/combinatorial testing. You can also read some great information and work through manual exercises in the book Lessons Learned in Software Testing by Dr. Cem Kaner, James Bach, and Bret Pettichord.)
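
To get a feel for the idea behind pairwise reduction, here’s a naive greedy sketch in Python. It is not how Hexawise or allpairs work internally; it simply keeps picking the row that covers the most not-yet-covered pairs of values. It builds the full set of combinations first, so it only suits small examples. The parameter names and values are made up for illustration.

```python
from itertools import combinations, product

# Hypothetical route-planning options, invented for this example.
PARAMETERS = [
    ["driving", "walking", "cycling", "transit"],  # travel mode
    ["nothing", "tolls", "highways"],              # what to avoid
    ["miles", "kilometers"],                       # distance units
    ["now", "later"],                              # departure time
]


def pairs_in(row):
    """Every pair of (column index, value) that a single row covers."""
    return {
        ((i, row[i]), (j, row[j]))
        for i, j in combinations(range(len(row)), 2)
    }


def pairwise_rows(parameters):
    """Greedily choose rows until every pair of values appears in some row."""
    all_rows = list(product(*parameters))
    uncovered = set()
    for row in all_rows:
        uncovered |= pairs_in(row)

    chosen = []
    while uncovered:
        # Pick the row that knocks out the most remaining pairs.
        best = max(all_rows, key=lambda row: len(pairs_in(row) & uncovered))
        chosen.append(best)
        uncovered -= pairs_in(best)
    return chosen


rows = pairwise_rows(PARAMETERS)
print(f"{len(list(product(*PARAMETERS)))} full combinations, {len(rows)} pairwise rows")
```

Even in this toy case the full matrix has 48 rows, while the pairwise set needs far fewer, and every pair of values still shows up together at least once. Real tools do this much more cleverly and let you go beyond pairs.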

Save Your Sanity: Cut Your Data Driven Complexity

We do need to ensure we’re getting good test coverage; however, make sure you’re carefully thinking through what it is you’re testing. Data Driven Testing is a wonderful tool, but avoid overly complex, large data sets. Focus on single behaviors/scenarios, and keep your datasets small and tight.

You’ll be much happier you did!

About the author

Jim Holmes

Jim Holmes has around 25 years of IT experience. He is co-author of "Windows Developer Power Tools" and Chief Cat Herder of the CodeMash Conference. He's a blogger and evangelist for Telerik's Test Studio, an awesome set of tools to help teams deliver better software. Find him as @aJimHolmes on Twitter.
