The Test Data Nightmare

A talk without solutions

How test data often starts...

def test_logged_in_user_has_first_name():
  # Arrange
  test_name = "Bob"
  user = User.objects.create(
    first_name=test_name,
  )
  # Act
  result = user.first_name
  # Assert
  assert result == test_name

...but then the model changes

  def test_logged_in_user_has_first_name():
    # Arrange
    test_name = "Bob"
    user = User.objects.create(
      first_name=test_name,
+     new_required_field="derp",
    )
    # Act
    result = user.first_name
    # Assert
    assert result == test_name

The fixture pattern:


Define a bunch of sample data in one place (e.g. JSON, raw SQL, etc.) that is intended to be reused by tests

  [
    {
      "model": "",
      "pk": 1,
      "fields": {
        "first_name": "Bob",
+       "new_required_field": "foo",
        "email": ""
      }
    }
  ]

The fixture pattern (continued):

Pros:

  • When the data model changes, you only have to change it in one place

Cons:

  • You still have to maintain the fixtures
  • Changing one field could fix one test but break another
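
The last drawback can be sketched without Django. This is a minimal illustration only: `FIXTURE_USERS`, the `is_active` field, and the helper functions are hypothetical names that do not appear in the original slides.

```python
import copy

# Hypothetical shared fixture data, standing in for a JSON fixture file.
FIXTURE_USERS = [
    {"pk": 1, "first_name": "Bob", "is_active": True},
]


def load_fixture():
    """Return a fresh copy so callers cannot mutate the shared data."""
    return copy.deepcopy(FIXTURE_USERS)


def check_first_name(users):
    # This check depends on the fixture user being named "Bob".
    return users[0]["first_name"] == "Bob"


def check_active_count(users):
    # This check depends on exactly one active user in the fixture.
    return sum(1 for u in users if u["is_active"]) == 1


def edit_fixture_for_new_test(users):
    """Simulate editing the fixture so a new 'inactive user' test can pass."""
    users[0]["is_active"] = False
    return users


original = load_fixture()
edited = edit_fixture_for_new_test(load_fixture())

# Before the edit both checks hold; after it, the active-count check breaks.
print(check_first_name(original), check_active_count(original))
print(check_first_name(edited), check_active_count(edited))
```

Both checks read the same record, so an edit made for one of them silently changes the input of the other.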

The dump-a-database pattern:


Periodically dump data from a deployed environment to keep the test fixtures up to date

Pros:

  • Migrations handle the modifications to the data model, so we don't have to change the fixtures by hand
  • The data is realistic!

Cons:

  • Real data in version control?! What about personal information?
  • You still have to maintain the fixtures
  • Data can get stale
  • New fields start out empty
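
The "empty new fields" problem might look like the following sketch. It assumes a hypothetical dump taken before `new_required_field` existed and a hypothetical `migrate_row` helper that mimics a migration adding the column with an empty default; real migrations and dumps will differ.

```python
import json

# Hypothetical dump taken before `new_required_field` was added to the model.
OLD_DUMP = '[{"pk": 1, "first_name": "Bob", "email": "bob@example.com"}]'


def migrate_row(row):
    """Mimic a migration that adds the new column with an empty default."""
    migrated = dict(row)
    migrated.setdefault("new_required_field", "")
    return migrated


rows = [migrate_row(r) for r in json.loads(OLD_DUMP)]

# The field now exists on every row, but it carries no realistic value.
print(rows[0]["new_required_field"] == "")
```

The migration keeps the dump loadable, yet any test that relies on a meaningful value in the new field has nothing to work with.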

The factory pattern:


Use a tool like FactoryBoy to generate models dynamically using "realish" data

import factory

class UserFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = ''

    first_name = 'john'
    email = factory.Faker("email", domain="")

  def test_logged_in_user_has_first_name():
    # Arrange
    test_name = "Bob"
-   user = User.objects.create(
+   user = UserFactory(
      first_name=test_name,
-     new_required_field="derp",
-     username="foo"
    )
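
The idea behind a factory can be sketched in plain Python, without FactoryBoy or Django. This is a rough illustration, not how FactoryBoy is implemented: the `User` dataclass and the `make_user` helper are hypothetical stand-ins.

```python
import itertools
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical stand-in for the Django model in the slides.
    first_name: str
    username: str
    new_required_field: str


_counter = itertools.count(1)


def make_user(**overrides):
    """Build a User from defaults; a test overrides only what it cares about."""
    n = next(_counter)
    defaults = {
        "first_name": "john",
        "username": f"user{n}",  # unique per call
        "new_required_field": "derp",
    }
    defaults.update(overrides)
    return User(**defaults)


def test_logged_in_user_has_first_name():
    # Arrange
    test_name = "Bob"
    user = make_user(first_name=test_name)
    # Act
    result = user.first_name
    # Assert
    assert result == test_name


test_logged_in_user_has_first_name()
```

When another required field is added, only the `defaults` dictionary changes; the test body stays the same.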

The factory pattern (continued):

Pros:

  • No changes needed to support new_required_field
  • Tests aren't co-dependent on one set of data

Cons:

  • Tests can fail randomly depending on the random data generated
  • The data may not be realistic
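
The random-failure drawback can be reproduced with the standard library alone. In this sketch, `random_first_name` is a hypothetical stand-in for a Faker provider and `is_valid_first_name` is a made-up product rule. Seeding is shown only to make a failing draw repeatable, not as a fix.

```python
import random
import string


def random_first_name(rng):
    """Generate a random lowercase name of 1 to 8 letters."""
    length = rng.randint(1, 8)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def is_valid_first_name(name):
    # Made-up product rule: a name needs at least 2 characters.
    return len(name) >= 2


def names_for_seed(seed, count=5):
    """Generate `count` names from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [random_first_name(rng) for _ in range(count)]


# The same seed always yields the same names, so a failure can be replayed.
print(names_for_seed(42) == names_for_seed(42))

# Across many seeds, some draws violate the rule: that is the flaky test.
seeds_with_invalid_name = [
    seed
    for seed in range(200)
    if not all(is_valid_first_name(n) for n in names_for_seed(seed))
]
print(len(seeds_with_invalid_name) > 0)
```

A test using unseeded data passes on most runs and fails on the rest, which makes the failure hard to reproduce unless the seed is recorded.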

I wish there was a talk covering strategies for defeating many types of test data nightmares:


  • Recognizing the difference between product data and test case data
  • Deciding when to prepare data statically beforehand or dynamically during testing
  • Using data to control how tests run or reflect product state
  • Hard-coding values versus discovering data in the system
  • Avoiding collisions on shared data

PyCon Talk:

Managing the Test Data Nightmare


Presented by Pandy Knight
Sunday 3:45 p.m.–4:15 p.m. EST

Managing the Test Data Nightmare

By m3brown
