At the end of last week, I attended a half-day workshop and demonstration of AgitarOne, a tool that automatically generates unit tests for existing code. Though it was very interesting, the workshop left me with mixed feelings.
JUnit is the industry-standard test framework for Java. Though it has "unit" in its name, it is not limited to plain unit testing. It is also well suited to many other kinds of testing, such as acceptance testing, system testing, integration testing, performance testing and more.
Automatic test generation produces JUnit test cases for existing Java code. Can this work? Oh yes, it can. Agitar demonstrated a tool that generates not only stubs for JUnit, but complete JUnit tests.
Wouldn't it be nice to have a tool that automatically creates the tests for you? It would. But Agitar doesn't make this dream a reality. The tool naturally does not know the programmer's intentions. All it can do is analyze existing code and generate tests for two purposes. The tests can be used for regression testing, to find out whether changes or new features alter the code's existing behaviour. And the extracted data can be reviewed to identify inconsistencies detected by the test generation tool.
All this sounds good, but in my opinion it is not suited for the real world. The example demonstrated by Agitar was a very simple one; honestly, it was primitive Java-starter-class code. Luckily we were allowed to bring some code of our own. I brought code from my open source projects and let AgitarOne generate test cases for it. I ran three projects through AgitarOne, and it was not able to create an error-free (compilable!) test suite for any of them. That's not much of an issue, as the errors were minor and I'm confident the company will fix them quickly.
The speed of code generation was extremely poor. The architecture of the application is client/server: the clients are the developers' PCs, which submit tasks for automatic unit test generation, and the servers perform these tasks. Agitar used two fast laptops with dual-core CPUs as servers for the workshop, running 6 jobs altogether. In three hours they created roughly 180 test classes. That is 1 class per minute on 4 CPU cores. Hell, I don't know anybody who would want to wait more than just a few seconds for something like that. Performance that poor simply doesn't fit into the world of emerging agile processes.
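As a rough check of that figure, the arithmetic can be written down as a small sketch. The numbers are the estimates quoted above (about 180 classes, three hours, two dual-core laptops); the class and method names are made up for illustration.

```java
import java.util.Locale;

public class GenerationThroughput {

    // Test classes produced per minute of wall-clock time.
    static double classesPerMinute(int classes, double hours) {
        return classes / (hours * 60.0);
    }

    // CPU core time spent per generated test class, in core-minutes.
    static double coreMinutesPerClass(int classes, double hours, int cores) {
        return (hours * 60.0 * cores) / classes;
    }

    public static void main(String[] args) {
        int classes = 180;   // rough estimate from the workshop
        double hours = 3.0;  // duration of the generation run
        int cores = 2 * 2;   // two laptops, each with a dual-core CPU

        System.out.println(String.format(Locale.ROOT,
                "%.1f classes per minute", classesPerMinute(classes, hours)));
        System.out.println(String.format(Locale.ROOT,
                "%.1f core-minutes per class", coreMinutesPerClass(classes, hours, cores)));
    }
}
```

Seen per core, every generated test class cost about four minutes of CPU time.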
The number of individual test cases was of course large. Quite often, the test cases reached a statement coverage of more than 80%. This is not so bad. But keep in mind that the tests are created from existing code and have only that code as their specification. Unless there is some inconsistent behaviour within the code, the tool won't detect errors. And if the tests report errors after a refactoring, that does not necessarily mean that the new code is wrong. A very close, in-depth manual inspection of all the tests is required.
The tool can't perform wonders. Quite often you'll encounter situations where you need to create fixtures, dummies, stubs or mocks yourself.
The generated tests were quite often coupled too tightly to the existing code. For instance, they tested whether the message texts of exceptions were as expected. This is something I'd willingly omit in unit tests. The purpose of unit testing is to reduce the effort and cost of software projects while increasing the quality. Testing volatile, non-critical features doubles the effort whenever something changes and thus contradicts the original purpose of unit testing. Besides, when developers get used to "the test is wrong", they will too easily assume a wrong test even when it is actually the code that has a bug, not the test.
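To illustrate the difference, here is a minimal sketch in plain Java, without JUnit, so that it runs on its own. It is not output of AgitarOne; `parseAmount`, its message text and both check methods are hypothetical names invented for this example.

```java
public class ExceptionMessageCheck {

    // Hypothetical code under test: rejects negative amounts.
    static int parseAmount(String text) {
        int value = Integer.parseInt(text.trim());
        if (value < 0) {
            throw new IllegalArgumentException("Amount must not be negative: " + value);
        }
        return value;
    }

    // Tight check: passes only if the exception message matches a pinned text exactly.
    static boolean tightCheck(String input, String pinnedMessage) {
        try {
            parseAmount(input);
            return false; // no exception at all
        } catch (IllegalArgumentException e) {
            return pinnedMessage.equals(e.getMessage());
        }
    }

    // Looser check: passes if the input is rejected with the expected exception type.
    static boolean looseCheck(String input) {
        try {
            parseAmount(input);
            return false; // no exception at all
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // The pinned text matches the current wording: green.
        System.out.println(tightCheck("-5", "Amount must not be negative: -5"));      // true
        // Pinned text and code wording have drifted apart: red, although nothing is broken.
        System.out.println(tightCheck("-5", "Negative amounts are not allowed: -5")); // false
        // The looser check does not care about the wording.
        System.out.println(looseCheck("-5"));                                         // true
    }
}
```

The tight check goes red on any rewording of the message, which is exactly the kind of change that says nothing about whether the code still works.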
The workflow for AgitarOne presented in the workshop was the following. You have a piece of Java software in source code; let's call it Source(A). You generate tests, let's call them Tests(A), run them and see that everything is green. You change the software; let's call the result Source(B). You run Tests(A). If they are green, you didn't break anything. If they are red, you have to check whether the tests are correct and the software is wrong or the other way round, and adapt tests and/or software until Tests(A) are green again. Now you generate the tests again from Source(B), calling them Tests(B).
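The cycle can be sketched in a few lines of Java. This is a toy model under loud assumptions, not how AgitarOne works internally: "generating tests" here just records the current output for a few sample inputs, and all names (`generateTests`, `runTests`, `SOURCE_A`, `SOURCE_B`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntUnaryOperator;

public class RegenerationCycle {

    // Source(A): the original behaviour.
    static final IntUnaryOperator SOURCE_A = x -> x * 2;
    // Source(B): the changed software, which now clamps negative input to 0.
    static final IntUnaryOperator SOURCE_B = x -> x < 0 ? 0 : x * 2;

    // "Generate tests": record what the given source currently returns for each input.
    static Map<Integer, Integer> generateTests(IntUnaryOperator source, int[] inputs) {
        Map<Integer, Integer> recorded = new LinkedHashMap<>();
        for (int input : inputs) {
            recorded.put(input, source.applyAsInt(input));
        }
        return recorded;
    }

    // "Run tests": return the inputs whose current output differs from the recorded one.
    static List<Integer> runTests(Map<Integer, Integer> tests, IntUnaryOperator source) {
        List<Integer> red = new ArrayList<>();
        for (Map.Entry<Integer, Integer> test : tests.entrySet()) {
            if (source.applyAsInt(test.getKey()) != test.getValue()) {
                red.add(test.getKey());
            }
        }
        return red;
    }

    public static void main(String[] args) {
        int[] inputs = {-1, 0, 3};

        Map<Integer, Integer> testsA = generateTests(SOURCE_A, inputs);
        System.out.println("Tests(A) on Source(A), red: " + runTests(testsA, SOURCE_A));
        System.out.println("Tests(A) on Source(B), red: " + runTests(testsA, SOURCE_B));

        // After a human has decided that the change is intended, regenerate.
        Map<Integer, Integer> testsB = generateTests(SOURCE_B, inputs);
        System.out.println("Tests(B) on Source(B), red: " + runTests(testsB, SOURCE_B));
    }
}
```

In this sketch, Tests(A) go red for the input -1 after the change. The recorded tests cannot say whether that is a bug or the intended new behaviour; a developer has to decide before Tests(B) are generated.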
At first sight, this doesn't differ much from the classical unit test cycle. But in the classical unit test cycle, the tests are more rational. The tests created by AgitarOne will produce an unusual number of false positives after changes.
The tool is interesting but not worth its money. Its slow performance, client/server architecture and high price make it nearly impractical for real-world use. Given also that the tool still requires developers to perform many manual steps, and that it causes additional work on the one hand while reducing work on the other, I doubt that it makes sense to invest money in such a tool. If I were in the position of deciding whether or not to buy such a tool, I'd wait until one exists as open source software.
Cher is a computer hacker who actively contributes to open source software and also runs some of his own projects as open source.
Cher's homepage is ↗http://www.riedquat.de/.