The Test Data Nightmare
A talk without solutions
How test data often starts...
def test_logged_in_user_has_first_name():
    # Arrange
    test_name = "Bob"
    user = User.objects.create(
        first_name=test_name,
        email="foo@excella.com"
    )
    # Act
    result = user.first_name
    # Assert
    assert result == test_name
...but then the model changes
def test_logged_in_user_has_first_name():
    # Arrange
    test_name = "Bob"
    user = User.objects.create(
        first_name=test_name,
+       new_required_field="derp",
        email="foo@excella.com"
    )
    # Act
    result = user.first_name
    # Assert
    assert result == test_name
The fixture pattern:
Define a bunch of sample data in one place (e.g. JSON, raw SQL, etc.) that is intended to be reused by tests
[
    {
        "model": "my.app.user",
        "pk": 1,
        "fields": {
            "first_name": "Bob",
+           "new_required_field": "foo",
            "email": "foo@excella.com"
        }
    }
]
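A Django test could then pull from that fixture instead of building the user inline. A minimal sketch, assuming the JSON above is saved as users.json in a fixtures directory and that the User import path matches your project layout:

from django.test import TestCase

from my.app.models import User  # assumed import path for the slides' User model


class LoggedInUserTests(TestCase):
    # Django loads these fixture files into the test database before each test
    fixtures = ["users.json"]

    def test_logged_in_user_has_first_name(self):
        # Arrange: the user comes from the fixture (pk=1 in the JSON above)
        user = User.objects.get(pk=1)
        # Act
        result = user.first_name
        # Assert
        assert result == "Bob"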
The fixture pattern (continued):
Pros:
- When the data model changes, you only have to change it in one place
Cons:
- You still have to maintain the fixtures
- Changing one field could fix one test but break another
The dump-a-database pattern:
Periodically dump data from a deployed environment to keep the test fixtures up to date
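In Django this is typically the dumpdata management command. A minimal sketch of refreshing the fixture (the model label is copied from the fixture above; the file name is illustrative):

from django.core.management import call_command

def refresh_user_fixture():
    # Equivalent to: python manage.py dumpdata my.app.user --indent 2 --output users.json
    call_command("dumpdata", "my.app.user", indent=2, output="users.json")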
Pros:
- Migrations handle the modifications to the data model, so we don't have to change the fixtures by hand
- The data is realistic!
Cons:
- Real data in version control?! What about personal information?
- You still have to maintain the fixtures
- Data can get stale
- Fields added to the model after the dump come back empty
The factory pattern:
Use a tool like FactoryBoy to generate models dynamically using "realish" data
import factory

class UserFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = 'my.app.User'

    first_name = 'john'
    email = factory.Faker("email", domain="excella.com")
def test_logged_in_user_has_first_name():
    # Arrange
    test_name = "Bob"
-   user = User.objects.create(
+   user = UserFactory(
        first_name=test_name,
-       new_required_field="derp",
-       email="foo@excella.com"
    )
    # Act
    result = user.first_name
    # Assert
    assert result == test_name
The factory pattern (continued):
Pros:
- No changes needed to support new_required_field
- Tests aren't co-dependent on one set of data
Cons:
- Tests can fail randomly depending on the random data generated (one mitigation is sketched below)
- The data may not be realistic
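One way to blunt the random-failure con is to pin the random seed so the "random" data is reproducible from run to run. A minimal sketch using factory_boy's reseed helper (the seed value and the conftest.py placement are just suggestions):

import factory.random

# Run once at test-session startup (e.g. in conftest.py) so Faker/FactoryBoy
# generate the same values every run, making failures reproducible.
factory.random.reseed_random("my-test-seed")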
I wish there were a talk covering strategies for defeating many types of test data nightmares:
- Recognizing the difference between product data and test case data
- Deciding when to prepare data statically beforehand or dynamically during testing
- Using data to control how tests run or reflect product state
- Hard-coding values versus discovering data in the system
- Avoiding collisions on shared data
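On that last point, one common tactic (not specific to the talk) is to give each test its own unique values so repeated or parallel runs never fight over the same shared record. A small sketch, with the helper name and domain chosen for illustration:

import uuid

def unique_email(prefix="testuser"):
    # A short random suffix keeps concurrent or repeated runs from colliding
    return f"{prefix}-{uuid.uuid4().hex[:8]}@excella.com"

user = UserFactory(email=unique_email())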
PyCon Talk:
Managing the Test Data Nightmare
Presented by Pandy Knight
Sunday 3:45 p.m.–4:15 p.m. EST
Managing the Test Data Nightmare
By m3brown