riedquat - valuable resource for those who seek.

Automatic Test Generation - Heaven or Hell?


At the end of last week I attended a half-day workshop and demonstration of AgitarOne, a tool that automatically generates unit tests for existing code. Though it was very interesting, the workshop left me with mixed feelings.

What is automatic test generation?

JUnit is the industry-standard test framework for Java. Though it has "unit" in its name, it is not limited to plain unit testing. It is equally well suited for many other kinds of testing, such as acceptance testing, system testing, integration testing, performance testing and more.

Automatic test generation generates JUnit test cases for existing Java code. Can this work? Oh yes, it can. Agitar demonstrated a tool that generates not just stubs for JUnit, but complete JUnit tests.

Heaven

Wouldn't it be nice to have a tool that automatically creates the tests for you? It would. But Agitar doesn't make this dream come true. The tool naturally does not know the programmer's intentions. All it can do is analyze existing code and generate tests that serve two purposes. The tests can be used for regression testing, to find out whether changes or new features alter the code's existing behaviour. And the extracted data can be reviewed to identify inconsistencies detected by the test generation tool.

Hell

Theory - and the real world

All this sounds good, but in my opinion it is not suited for the real world. The example demonstrated by Agitar was a very simple one; honestly, it was primitive Java-starter-class code. Luckily we were allowed to bring some code of our own. I brought code from my open source projects and let AgitarOne generate test cases for it. I ran three projects through AgitarOne, and it was not able to create an error-free (compilable!) test suite for any of them. That's not much of an issue, as these were only minor problems and I'm confident the company will fix them quickly.

Performance - or the lack of it

The speed of test code generation was extremely poor. The application has a client/server architecture: the clients are the developers' PCs, which submit test generation tasks, and the servers perform these tasks. For the workshop, Agitar used two fast laptops with dual-core CPUs as servers, running 6 jobs altogether. In three hours they created roughly 180 test classes. That is 1 class per minute on 4 CPU cores. Hell, I don't know anybody who would want to wait more than a few seconds for something like that. Performance this poor simply doesn't fit into the world of emerging agile processes.
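The arithmetic behind that figure, as a small sketch. The inputs are the rough workshop estimates quoted above; the class and method names are made up for illustration.

```java
// Back-of-the-envelope throughput from the workshop figures.
// Class and method names are hypothetical; the inputs are rough estimates.
class GenerationThroughput {
    // Generated test classes per minute of wall-clock time.
    static double classesPerMinute(int testClasses, double hours) {
        return testClasses / (hours * 60.0);
    }

    // CPU-core minutes spent per generated test class.
    static double coreMinutesPerClass(int testClasses, double hours, int cores) {
        return (hours * 60.0 * cores) / testClasses;
    }

    public static void main(String[] args) {
        int testClasses = 180; // rough estimate from the workshop
        double hours = 3.0;
        int cores = 2 * 2;     // two dual-core laptops
        System.out.println(classesPerMinute(testClasses, hours) + " classes per minute");
        System.out.println(coreMinutesPerClass(testClasses, hours, cores) + " core-minutes per class");
    }
}
```

Seen the other way round, one class per minute on four cores means roughly four core-minutes of computation for every generated test class.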

Generated tests - and what they say

The number of individual test cases was, of course, large. Quite often the test cases reached a statement coverage of more than 80%. This is not bad. But keep in mind that the tests are created from existing code and have only that code as their specification. Unless the code behaves inconsistently with itself, the tool won't detect errors. And if a test fails after a refactoring, that does not necessarily mean that the new code is wrong. A close, in-depth manual inspection of all the tests is required.
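To illustrate what "only the code as specification" means, here is a minimal, made-up sketch. It is plain Java rather than JUnit so that it runs without dependencies, and it is not actual AgitarOne output. The method `size` and both checks are hypothetical.

```java
// Hypothetical example: a generated check can only pin what the code does today.
class PinnedBehaviour {
    // Intended: number of integers in [from, to], both ends inclusive.
    // Actual: off by one, because the upper bound is not counted.
    static int size(int from, int to) {
        return to - from;
    }

    // What a generator could derive by observing the code: size(1, 10) returned 9,
    // so 9 becomes the expected value. The check passes and the bug stays hidden.
    static boolean generatedCheck() {
        return size(1, 10) == 9;
    }

    // What a developer who knows the intention would write. This check fails.
    static boolean intendedCheck() {
        return size(1, 10) == 10;
    }

    public static void main(String[] args) {
        System.out.println("generated check passes: " + generatedCheck());
        System.out.println("intended check passes:  " + intendedCheck());
    }
}
```

The generated check is green although the code is wrong, and it would turn red the moment somebody fixed the bug. Only a reader who knows the intention can tell which of the two is right.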

Automatic - but not everything

The tool can't work wonders. Quite often you'll encounter situations where you need to create fixtures, dummies, stubs or mocks yourself.

More Manual Work

The automated tests quite often tested the existing code too tightly. For instance, they checked whether the message texts of exceptions were as expected. This is something I'd willingly omit in unit tests. The purpose of unit testing is to reduce the effort and cost of software projects while increasing their quality. Testing volatile, non-critical features doubles the effort whenever they change and thus contradicts the original purpose of unit testing. Besides, once developers get used to the test being wrong, they will too easily assume a wrong test even when it is actually the code that has the bug.
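The difference between a tight and a loose check on an exception could look roughly like this. It is again a hypothetical sketch in plain Java instead of JUnit; `withdraw`, both checks and the message texts are invented for illustration.

```java
// Hypothetical example, plain Java instead of JUnit so it runs without dependencies.
class ExceptionChecks {
    // Code under test: rejects negative amounts.
    static void withdraw(int amount) {
        if (amount < 0) {
            throw new IllegalArgumentException("amount must not be negative: " + amount);
        }
    }

    // Tight check in the style of the generated tests: pins the exact message text.
    static boolean tightCheck(String expectedMessage) {
        try {
            withdraw(-1);
            return false; // no exception at all
        } catch (IllegalArgumentException e) {
            return expectedMessage.equals(e.getMessage());
        }
    }

    // Looser check: only the exception type is treated as part of the contract.
    static boolean looseCheck() {
        try {
            withdraw(-1);
            return false; // no exception at all
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Message assumed to have been recorded from an earlier wording of the same code.
        String recorded = "negative amount: -1";
        System.out.println("tight check passes: " + tightCheck(recorded));
        System.out.println("loose check passes: " + looseCheck());
    }
}
```

Rewording the message breaks the tight check although the behaviour that matters, rejecting the negative amount, is unchanged. The loose check survives the rewording.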

And how you'd use it

The workflow for AgitarOne shown in the workshop was the following. You have a piece of Java software in source code; let's call it Source(A). You generate tests, let's call them Tests(A), run them and see that everything is green. You change the software; let's call the result Source(B). You run Tests(A). If they are green, you didn't break anything. If they are red, you have to check whether the tests are correct and the software is wrong or the other way round, and adapt tests and/or software until Tests(A) are green again. Now you generate the tests again from Source(B), calling them Tests(B).
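The cycle can be sketched in a few lines of plain Java. This is only a toy model under strong assumptions, not how AgitarOne works internally: the "software" is a single function on integers, "generating tests" means recording its current outputs, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// Toy model of the generate / change / re-run / regenerate cycle.
class RegenerationCycle {
    // "Generate tests": record the current output for each input.
    static Map<Integer, Integer> generateTests(IntUnaryOperator source, int[] inputs) {
        Map<Integer, Integer> tests = new LinkedHashMap<>();
        for (int input : inputs) {
            tests.put(input, source.applyAsInt(input));
        }
        return tests;
    }

    // "Run tests": return the inputs whose output no longer matches the recording.
    static List<Integer> runTests(Map<Integer, Integer> tests, IntUnaryOperator source) {
        List<Integer> red = new ArrayList<>();
        for (Map.Entry<Integer, Integer> test : tests.entrySet()) {
            if (source.applyAsInt(test.getKey()) != test.getValue()) {
                red.add(test.getKey());
            }
        }
        return red;
    }

    public static void main(String[] args) {
        int[] inputs = {0, 1, 2, 3};
        IntUnaryOperator sourceA = x -> x * 2;     // Source(A)
        IntUnaryOperator sourceB = x -> x * 2 + 1; // Source(B), an intended change

        Map<Integer, Integer> testsA = generateTests(sourceA, inputs);
        System.out.println("Tests(A) on Source(A), red: " + runTests(testsA, sourceA));
        System.out.println("Tests(A) on Source(B), red: " + runTests(testsA, sourceB));

        Map<Integer, Integer> testsB = generateTests(sourceB, inputs);
        System.out.println("Tests(B) on Source(B), red: " + runTests(testsB, sourceB));
    }
}
```

In this toy model the intended change turns every recorded test red, and the developer has to look at each of them before regenerating.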

At first sight this doesn't differ much from the classical unit test cycle. But in the classical unit test cycle, the tests are more rational. The tests created by AgitarOne will produce an unusual number of false positives after changes.

Conclusion

The tool is interesting but not worth its money. Its slow performance, client/server architecture and high price make it nearly impractical for real-world use. Add to that the facts that the tool still requires developers to perform many manual steps, and that while it reduces work on the one hand it causes additional work on the other, and I doubt that it makes sense to invest money in such a tool. If I had to decide whether or not to buy such a tool, I'd wait until one exists as open source software.

About

About Cher

Cher is a computer hacker who actively contributes to open source software and also runs some projects of his own as open source.

Cher's homepage is http://www.riedquat.de/.
