The Test Data Nightmare

A talk without solutions

How test data often starts...


def test_logged_in_user_has_first_name():
  # Arrange
  test_name = "Bob"
  user = User.objects.create(
    first_name=test_name,
    email="foo@excella.com"
  )
  
  # Act
  result = user.first_name
  
  # Assert
  assert result == test_name

...but then the model changes


  def test_logged_in_user_has_first_name():
    # Arrange
    test_name = "Bob"
    user = User.objects.create(
      first_name=test_name,
+     new_required_field="derp",
      email="foo@excella.com"
    )
  
    # Act
    result = user.first_name
  
    # Assert
    assert result == test_name

The fixture pattern:

 

Define a bunch of sample data in one place (e.g. JSON, raw SQL, etc.) that is intended to be reused by tests

  [
    {
      "model": "my.app.user",
      "pk": 1,
      "fields": {
        "first_name": "Bob",
+       "new_required_field": "foo",
        "email": "foo@excella.com"
      }
    }
  ]

The fixture pattern (continued):

Pros:

  • When the data model changes, you only have to update the data in one place

 

Cons:

  • You still have to maintain the fixtures
  • Changing one field could fix one test but break another
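
A minimal sketch of that second con, using only the standard library rather than a real fixture loader. The helper names (load_fixture, check_first_name, check_email_domain) are hypothetical and exist only for illustration; the record mirrors the JSON fixture shown earlier.

```python
import json

# Hypothetical shared fixture, in the same shape as the JSON shown earlier.
FIXTURE_JSON = """
[
  {
    "model": "my.app.user",
    "pk": 1,
    "fields": {
      "first_name": "Bob",
      "new_required_field": "foo",
      "email": "foo@excella.com"
    }
  }
]
"""


def load_fixture(raw):
    """Index fixture records by (model, pk) so tests can look them up."""
    return {(rec["model"], rec["pk"]): rec["fields"] for rec in json.loads(raw)}


def check_first_name(users):
    """Stand-in for a test that depends on the shared record's first name."""
    return users[("my.app.user", 1)]["first_name"] == "Bob"


def check_email_domain(users):
    """Stand-in for a second test that depends on the same record's email."""
    return users[("my.app.user", 1)]["email"].endswith("@excella.com")


users = load_fixture(FIXTURE_JSON)

# Simulate someone editing the shared record to suit a different test.
edited = load_fixture(FIXTURE_JSON.replace("Bob", "Alice"))
```

Both checks pass against the original fixture. After the edit, the email check still passes but the first-name check does not, even though nobody touched that test.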

The dump-a-database pattern:

 

Periodically dump data from a deployed environment to keep the test fixtures up to date

 

Pros:

  • Migrations handle the modifications to the data model, so we don't have to change the fixtures by hand
  • The data is realistic!

Cons:

  • Real data in version control?! What about personal information?
  • You still have to maintain the fixtures
  • Data can get stale
  • Empty new fields
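
One way to soften the personal-information concern is to scrub a dump before it goes into version control. This is a sketch under assumptions, not a complete anonymizer: the scrub helper is hypothetical, and treating first_name and email as the only personal fields is an assumption made for this example.

```python
import copy

# Hypothetical dumped records, in the same shape as the JSON fixture shown earlier.
dumped = [
    {
        "model": "my.app.user",
        "pk": 7,
        "fields": {
            "first_name": "Bob",
            "new_required_field": "foo",
            "email": "bob@realcompany.com",
        },
    }
]


def scrub(records):
    """Return a copy of the records with personal fields replaced by placeholders.

    Assumption: first_name and email are the only personal fields.
    Placeholders are derived from the primary key so they stay unique.
    """
    scrubbed = copy.deepcopy(records)
    for rec in scrubbed:
        fields = rec["fields"]
        if "first_name" in fields:
            fields["first_name"] = f"User{rec['pk']}"
        if "email" in fields:
            fields["email"] = f"user{rec['pk']}@example.com"
    return scrubbed


clean = scrub(dumped)
```

The scrubbed copy keeps the record structure and non-personal fields, so it can still be loaded as a fixture. The staleness and maintenance cons remain.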

The factory pattern:

 

Use a tool like FactoryBoy to generate models dynamically using "realish" data

import factory

class UserFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = 'my.app.User'

    first_name = 'john'
    new_required_field = 'derp'
    email = factory.Faker("email", domain="excella.com")

  def test_logged_in_user_has_first_name():
    # Arrange
    test_name = "Bob"
-   user = User.objects.create(
+   user = UserFactory(
      first_name=test_name,
-     new_required_field="derp",
-     email="foo@excella.com"
    )

The factory pattern (continued):

Pros:

  • Tests need no changes to support new_required_field
  • Tests aren't co-dependent on one set of data

 

Cons:

  • Tests can fail randomly depending on the random data generated
  • The data may not be realistic
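
A common way to tame the random-failure con is to seed the random generator, so a failing run can be reproduced. The sketch below uses a hand-rolled, hypothetical make_user helper and the standard library's random module instead of FactoryBoy; FactoryBoy and Faker each document their own seeding hooks, which are not shown here.

```python
import random
import string


def make_user(rng, **overrides):
    """Build a user dict with random defaults.

    Overrides pin the fields a test actually cares about; everything else
    comes from the supplied random generator.
    """
    local_part = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
    user = {
        "first_name": "john",
        "new_required_field": "derp",
        "email": f"{local_part}@excella.com",
    }
    user.update(overrides)
    return user


SEED = 1234  # arbitrary value; log it so a failing run can be replayed

# Two generators built from the same seed produce the same "random" data.
first = make_user(random.Random(SEED), first_name="Bob")
second = make_user(random.Random(SEED), first_name="Bob")
```

With a fixed seed the generated email is identical on every run, so a failure caused by one particular value can be replayed instead of chased.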

I wish there was a talk covering strategies for defeating many types of test data nightmares:

 

  • Recognizing the difference between product data and test case data
  • Deciding when to prepare data statically beforehand or dynamically during testing
  • Using data to control how tests run or reflect product state
  • Hard-coding values versus discovering data in the system
  • Avoiding collisions on shared data

PyCon Talk:

Managing the Test Data Nightmare

 

Presented by Pandy Knight
Sunday 3:45 p.m.–4:15 p.m. EST

Managing the Test Data Nightmare

By m3brown
