The Test Data Nightmare
A talk without solutions
How test data often starts...
def test_logged_in_user_has_first_name():
    # Arrange
    test_name = "Bob"
    user = User.objects.create(
        first_name=test_name,
        email="foo@excella.com"
    )
    # Act
    result = user.first_name
    # Assert
    assert result == test_name
...but then the model changes
def test_logged_in_user_has_first_name():
    # Arrange
    test_name = "Bob"
    user = User.objects.create(
        first_name=test_name,
+       new_required_field="derp",
        email="foo@excella.com"
    )
    # Act
    result = user.first_name
    # Assert
    assert result == test_name
The fixture pattern:
Define a bunch of sample data in one place (e.g. JSON, raw SQL, etc.) that is intended to be reused by tests
[
    {
        "model": "my.app.user",
        "pk": 1,
        "fields": {
            "first_name": "Bob",
+           "new_required_field": "foo",
            "email": "foo@excella.com"
        }
    }
]
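A Django test could then pull from that fixture instead of building the user inline. A minimal sketch, assuming the JSON above is saved as users.json in a fixtures directory and that the User import path matches your project layout:

from django.test import TestCase

from my.app.models import User  # assumed import path for the slides' User model


class LoggedInUserTests(TestCase):
    # Django loads these fixture files into the test database before each test
    fixtures = ["users.json"]

    def test_logged_in_user_has_first_name(self):
        # Arrange: the user comes from the fixture (pk=1 in the JSON above)
        user = User.objects.get(pk=1)
        # Act
        result = user.first_name
        # Assert
        assert result == "Bob"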
The fixture pattern (continued):
Pros:
- When the data model changes, you only have to change it in one place
Cons:
- You still have to maintain the fixtures
- Changing one field could fix one test but break another
The dump-a-database pattern:
Periodically dump data from a deployed environment to keep the test fixtures up to date
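In Django this is typically the dumpdata management command. A minimal sketch of refreshing the fixture (the model label is copied from the fixture above; the file name is illustrative):

from django.core.management import call_command

def refresh_user_fixture():
    # Equivalent to: python manage.py dumpdata my.app.user --indent 2 --output users.json
    call_command("dumpdata", "my.app.user", indent=2, output="users.json")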
Pros:
- Migrations handle the modifications to the data model, so we don't have to change the fixtures by hand
- The data is realistic!
Cons:
- Real data in version control?! What about personal information?
- You still have to maintain the fixtures
- Data can get stale
- Fields added to the model after the dump come back empty
The factory pattern:
Use a tool like FactoryBoy to generate models dynamically using "realish" data
import factory

class UserFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = 'my.app.User'

    first_name = 'john'
    email = factory.Faker("email", domain="excella.com")
def test_logged_in_user_has_first_name():
    # Arrange
    test_name = "Bob"
-   user = User.objects.create(
+   user = UserFactory(
        first_name=test_name,
-       new_required_field="derp",
-       email="foo@excella.com"
    )
    # Act
    result = user.first_name
    # Assert
    assert result == test_name
The factory pattern (continued):
Pros:
- No changes needed to support new_required_field
- Tests aren't co-dependent on one set of data
Cons:
- Tests can fail randomly depending on the random data generated (one mitigation is sketched below)
- The data may not be realistic
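One way to blunt the random-failure con is to pin the random seed so the "random" data is reproducible from run to run. A minimal sketch using factory_boy's reseed helper (the seed value and the conftest.py placement are just suggestions):

import factory.random

# Run once at test-session startup (e.g. in conftest.py) so Faker/FactoryBoy
# generate the same values every run, making failures reproducible.
factory.random.reseed_random("my-test-seed")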
I wish there were a talk covering strategies for defeating many types of test data nightmares:
- Recognizing the difference between product data and test case data
- Deciding when to prepare data statically beforehand or dynamically during testing
- Using data to control how tests run or reflect product state
- Hard-coding values versus discovering data in the system
- Avoiding collisions on shared data
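On that last point, one common tactic (not specific to the talk) is to give each test its own unique values so repeated or parallel runs never fight over the same shared record. A small sketch, with the helper name and domain chosen for illustration:

import uuid

def unique_email(prefix="testuser"):
    # A short random suffix keeps concurrent or repeated runs from colliding
    return f"{prefix}-{uuid.uuid4().hex[:8]}@excella.com"

user = UserFactory(email=unique_email())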
PyCon Talk:
Managing the Test Data Nightmare
Presented by Pandy Knight
Sunday 3:45 p.m.–4:15 p.m. EST
Managing the Test Data Nightmare
By m3brown