DB Fixtures vs. Factories: generating data for your tests
October 31, 2010 at 10:54 pm. Filed in: Test Fixtures
The Beginning
When we started with TDD and testing in PHP two years ago, we learned about data fixtures and, of course, started using them in our tests. Time has passed, and today I see that fixtures, as we used to create them, are not the best way to generate test data. In fact, they can cause serious performance problems in the test suite. In this article I'll explain how we used to do it, why I think that approach is not good, and propose a better way to create test data.
Definition of Fixtures
Let's first look at two quotes from the Wikipedia article on test fixtures to understand what they are:
A test fixture is something used to consistently test some item, device, or piece of software.
Test fixture refers to the fixed state used as a baseline for running tests in software testing. The purpose of a test fixture is to ensure that there is a well known and fixed environment in which tests are run so that results are repeatable. Some people call this the test context.
When we started using fixtures, the ORM tool we use (Doctrine) had a feature that allowed us to easily insert data into the database. Doctrine lets you define the data in a YAML file and provides a simple way to load it into the DB.
That is the kind of fixture we have been using for some time, and we are now noticing the consequences. So, to be more specific, this post examines the use of database data fixtures.
A good way to use DB fixtures
We normally start a test by setting a “fixed condition”, and we do that by loading the fixture data into the database. One well-known good practice for isolating tests is to:
- clean the database (setup / beforeEach)
- open a transaction (setup / beforeEach)
- run the test
- rollback the transaction (teardown / afterEach)
That makes sure our database always has the same data on every run of the test. This is a very good thing and makes our tests more robust. This way we do not even alter the database, because we never commit. Since it is a good thing, of course we started doing it whenever we needed to. And we actually ended up doing it a lot.
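The four steps above can be sketched roughly as follows. This is a minimal illustration in Python with the standard `sqlite3` and `unittest` modules, not our Doctrine setup; the `users` table, the fixture row, and the in-memory connection are assumptions made only to keep the example self-contained.

```python
import sqlite3
import unittest

# Hypothetical schema, used only for this illustration.
SCHEMA = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"


class UserTableTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # isolation_level=None disables implicit transactions, so the test
        # controls them explicitly with BEGIN / ROLLBACK.
        cls.conn = sqlite3.connect(":memory:", isolation_level=None)
        cls.conn.execute(SCHEMA)

    @classmethod
    def tearDownClass(cls):
        cls.conn.close()

    def setUp(self):
        self.conn.execute("DELETE FROM users")  # clean the database
        self.conn.execute("BEGIN")              # open a transaction
        # Load the fixture data inside the transaction.
        self.conn.execute("INSERT INTO users (name) VALUES ('alice')")

    def tearDown(self):
        self.conn.execute("ROLLBACK")           # roll back, nothing is committed

    def _count(self):
        return self.conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

    def test_a_can_add_rows_on_top_of_the_fixture(self):
        self.conn.execute("INSERT INTO users (name) VALUES ('bob')")
        self.assertEqual(self._count(), 2)

    def test_b_sees_only_the_fixture(self):
        # The row inserted by the previous test was rolled back.
        self.assertEqual(self._count(), 1)
```

Every test starts from the same single-row state, whatever the previous test did.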
Improving database performance (little improvement)
Well, time passed, and we ended up with a lot of tests that accessed the database to create fixtures before running. Of course this made our tests pretty slow, and we decided to find a way to improve things. We found that we could use a lightweight in-memory database to run the tests, and we started using Doctrine's SQLite driver. It gave us a significant improvement in performance. But at this point most of the tests still relied on the DB to run.
Not enough
Today we see that this solution is still not satisfactory. All those connections make the tests very slow anyway. And the ease of generating fixtures that way had a bad side effect: we started to use database fixtures in situations where the data did not really need to be in the database to run the test. Since we had that easy way to define the data, we used it all over the tests without really understanding the implications for the whole test suite.
The fact is that if you need to test an operation on some data and you can run your test with that data in memory only, you should not put that data in the database. It might seem obvious now, but since our fixture generation tool made things so easy, we kept working that way without noticing the damage it was causing to our test suite. A good test suite has to be very fast. In fact, a “true” unit test should not access the database.
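As a small illustration of testing an operation on data that lives only in memory, here is a sketch in Python. The `Order` and `OrderLine` classes are hypothetical and stand in for whatever your domain model contains; the point is only that the test needs no connection, schema, or fixture file.

```python
from dataclasses import dataclass, field


# Hypothetical domain classes, invented for this example.
@dataclass
class OrderLine:
    description: str
    unit_price_cents: int
    quantity: int = 1


@dataclass
class Order:
    lines: list = field(default_factory=list)

    def add_line(self, description, unit_price_cents, quantity=1):
        self.lines.append(OrderLine(description, unit_price_cents, quantity))

    def total_cents(self):
        # The operation under test: pure computation over in-memory data.
        return sum(line.unit_price_cents * line.quantity for line in self.lines)


def test_order_total():
    # The "fixture" is built directly in memory, no database involved.
    order = Order()
    order.add_line("book", 1500, quantity=2)
    order.add_line("pen", 250)
    assert order.total_cents() == 3250
```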
There are two other points that I also take into consideration here. One of them is that when I put data into the database to create a condition for testing something in my domain, I'm coupling my domain to the database. Of course a good ORM will make this a loose coupling, but it is still a coupling. When I set up my data in memory and run my test on my domain layer, without needing to go down to the persistence layer, the result is certainly less coupled, or not coupled at all. And much faster, of course. So one point is to have a cohesive domain layer that stands on its own without needing persistence.
The other point to consider is that when you write fixtures straight to the database, they might have to pass some lower-level validation, but that is all. Depending on the tool you use, validation defined in the model or ORM layers might or might not run. But when you generate data in a higher layer to run the tests, that data is also validated by the higher-level validation and rules, if they exist, and this validation improves the robustness of both your tests and your system.
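To make the validation point concrete, here is a sketch in Python. The `User` class, its two rules, and the `make_user` helper are all hypothetical. Because the test data is created through the domain constructor, invalid test data fails immediately, whereas a row written straight into a table would never pass through these rules.

```python
class User:
    """Hypothetical domain class whose constructor enforces domain rules."""

    def __init__(self, name, email):
        if not name.strip():
            raise ValueError("name must not be empty")
        if "@" not in email:
            raise ValueError("email must contain '@'")
        self.name = name
        self.email = email


def make_user(name="Alice", email="alice@example.com"):
    # Test data goes through the same constructor the application uses,
    # so it is checked by the same rules.
    return User(name, email)


def test_invalid_test_data_is_rejected():
    try:
        make_user(email="not-an-email")
    except ValueError:
        return
    raise AssertionError("expected the domain validation to reject this user")
```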
Realizing this, we had to move towards a more appropriate solution.
Improving performance with better design using factories
So we need to generate fixture data in memory, using the domain layer's own methods to do so. The way we found to do that is to create factory objects that generate this data. Simple as that. Of course, if we need a complex object graph to run our tests, we will also have to code complex factories. We can't expect simple factories that generate a single object without any relations to be enough. A good start is to have one central factory for each of your model “aggregates”.
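A factory for one aggregate might look roughly like the sketch below, written in Python. The `Post` aggregate with its `Author` and `Comment` objects, and the `PostFactory` API, are hypothetical. The factory builds the whole graph through the aggregate's own methods (`add_comment`), so the domain code is exercised while the data is created.

```python
from dataclasses import dataclass, field
from itertools import count


# Hypothetical aggregate: a Post owns its Comments and refers to an Author.
@dataclass
class Author:
    name: str


@dataclass
class Comment:
    author: Author
    body: str


@dataclass
class Post:
    title: str
    author: Author
    comments: list = field(default_factory=list)

    def add_comment(self, author, body):
        comment = Comment(author, body)
        self.comments.append(comment)
        return comment


class PostFactory:
    """Central factory for the Post aggregate (illustrative only)."""

    _sequence = count(1)  # gives each generated post a distinct default title

    @classmethod
    def create(cls, title=None, author=None, comment_count=0):
        n = next(cls._sequence)
        post = Post(title or f"Post {n}", author or Author(f"Author {n}"))
        for i in range(1, comment_count + 1):
            # Relations are built through the aggregate's own method.
            post.add_comment(Author(f"Commenter {i}"), f"Comment {i}")
        return post
```

A test that needs a post with two comments then asks for `PostFactory.create(comment_count=2)` and overrides only the attributes it cares about.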
Creating such factories might seem too expensive (in coding time), but in fact it is not. If you are having a hard time creating factories that generate a complex data graph, there is a good chance this reflects a poor domain model. Creating these factories is also an exercise that shows whether your domain model is well designed, and it tests your constructors, builders, etc.
This approach does not prevent you from persisting data to the database when it is really needed. In fact, there are times when you will have no other option. But one thing this approach does, and I like it a lot, is push us to design operations and queries that do not rely on the database or the persistence layer. We are encouraged to think in terms of a model that lives in memory, so that we can isolate our queries and operations from the DB.
You will probably have many more factories for your tests than for the normal usage of the system. So you can think about separating factories used only by the tests from factories that are also used by your system. Something like having a UserFactory and a UserFactoryTests, for example.
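One possible shape for that split, sketched in Python: only the two class names come from the text above, while the `User` fields, the method names, and the default values are assumptions. The test-only factory delegates to the production one, so test data is still created the same way the system creates it.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    email: str
    role: str = "member"


class UserFactory:
    """Factory used by the system itself (hypothetical API)."""

    @staticmethod
    def create(name, email, role="member"):
        # Production creation logic lives here, e.g. normalizing the email.
        return User(name, email.lower(), role)


class UserFactoryTests:
    """Test-only factory: adds defaults and canned scenarios on top."""

    @staticmethod
    def any_user(**overrides):
        attrs = {"name": "Test User", "email": "test.user@example.com"}
        attrs.update(overrides)
        return UserFactory.create(**attrs)

    @staticmethod
    def admin(**overrides):
        # The role is forced to "admin" regardless of the overrides given.
        return UserFactoryTests.any_user(**{**overrides, "role": "admin"})
```

A test can then write `UserFactoryTests.admin(name="Root")` instead of repeating every constructor argument.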
There are a lot of practical coding issues that could be raised here, but most of them are too specific to be covered in this post. If you have questions about any practical issue, I'd appreciate it if you asked in the comments. This post shares some real experience, and my intention with it is also to hear about other people's experiences and learn more.

