Database Management & Seeding Strategy

The Problem

  • The SQL database is blown away and recreated during every CI & QA deployment
     
  • QA must manually add back their test data in order to perform their tests
     
  • There isn't a migration story for MongoDB
     
  • Test creators need to know too much about how the data is stored in order to write integration / e2e tests

The SQL Server Problem

  • During deployment, database is blown away
     
  • A backup of the production database with a subset of the data is restored to environment
     
  • Current release number is queried from database

The Current Flow

The SQL Server Problem

  • The new release number is retrieved from a variable set in Bamboo
     
  • All Master Scripts with release numbers after the database value and up to the Bamboo value are executed

The Current Flow

The SQL Server Problem

  • Ensure that the database is in a consistent state for testing
     
  • Ensure it's consistent with Production

The Rationale

The SQL Server Problem

  • SQL Server is blown away and recreated during every CI & QA deployment
     
  • This means QA must re-enter their test data after every deployment
     
  • We are assuming that the subset of data from production will never change

The Problem

The SQL Server Problem

  • We are dependent on an external binary file (the backup) in order to get the database up and running
     
  • We are dependent on the Bamboo variable being set to the correct value - if it isn't set correctly, we run the risk of executing updated code against an out-of-date schema

The Problem

The SQL Server Problem

  • The database is no longer blown away and restored as part of the deployment process
     
  • Instead, this action is now performed on-demand when QA needs to reset the database

The New Strategy - The Easy Part

The SQL Server Problem

  • We no longer restore the database from production
     
  • Instead, we will update the database schema via SQL scripts with a naming convention that indicates the order in which they should be executed
     
  • A new table in the database will save the last executed script

The New Strategy - The Redesign

The SQL Server Problem

  • No more MasterScript.sql
     
  • File structure:
    • 3.66
      • 01-pp12345.sql
      • 02-pp12346.sql
    • 3.67
      • 01-pp12347.sql
      • 02-pp12348.sql

The New Strategy - The File Structure

The SQL Server Problem

  • Move the version number from SystemConfiguration to its own $version table:
     
  • Schema:
    • CurrentVersion VARCHAR(16)
       
  • This decouples the database state from business-related configuration

The New Strategy - The $version Table
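
As a rough sketch of how the deployment tooling might read and bump that value (TypeScript using the mssql package; the $version table comes from the slide above, but the helper names and everything else here are assumptions, not the actual implementation):

import * as sql from 'mssql';

// Read the last release script recorded in the hypothetical $version table.
async function getCurrentVersion(pool: sql.ConnectionPool): Promise<string | null> {
  const result = await pool
    .request()
    .query('SELECT TOP 1 CurrentVersion FROM [$version]');
  return result.recordset.length > 0 ? result.recordset[0].CurrentVersion : null;
}

// Record the script that just ran so a failed deployment can be resumed.
async function setCurrentVersion(pool: sql.ConnectionPool, version: string): Promise<void> {
  await pool
    .request()
    .input('version', sql.VarChar(16), version)
    .query('UPDATE [$version] SET CurrentVersion = @version');
}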

The SQL Server Problem

The New Strategy - The New Flow

  • Query the $version table to get the name of the last release script executed against it
     
  • Find the SQL scripts in folders whose names come after the last release number
     
  • Execute each script in order, updating the $version table after each script to ensure we know the state of the database if there is an error during one of the scripts
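
Continuing the sketch above (and reusing the hypothetical getCurrentVersion / setCurrentVersion helpers), a minimal runner for this flow might look like the following; folder names are assumed to sort in release order, which a real runner would need to handle more carefully:

import * as fs from 'fs';
import * as path from 'path';
import * as sql from 'mssql';

// Find release folders after the recorded version and run their scripts in
// order, bumping $version after every script so a failure leaves a known state.
async function applyPendingScripts(pool: sql.ConnectionPool, scriptsRoot: string): Promise<void> {
  const lastApplied = (await getCurrentVersion(pool)) ?? '';   // e.g. "3.66/02-pp12346.sql"
  const [lastRelease, lastScript] = lastApplied.split('/');

  const releases = fs.readdirSync(scriptsRoot)
    .filter((release) => !lastRelease || release >= lastRelease)
    .sort();

  for (const release of releases) {
    const scripts = fs.readdirSync(path.join(scriptsRoot, release))
      .filter((file) => file.endsWith('.sql'))
      .sort();                                                 // 01-..., 02-... ordering

    for (const script of scripts) {
      // Skip scripts already applied in a partially completed release.
      if (release === lastRelease && lastScript && script <= lastScript) continue;

      const body = fs.readFileSync(path.join(scriptsRoot, release, script), 'utf8');
      await pool.request().batch(body);
      await setCurrentVersion(pool, `${release}/${script}`);
    }
  }
}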

The SQL Server Problem

  • The SQL will contain the schema changes that need to be applied
     
  • Scripts will be idempotent so they can be re-executed if needed
     
  • It will also contain the data for enum tables (e.g. MasterStatus)
     
  • It will not contain any seed data

The New Strategy - The SQL
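
For illustration, the body of one such patch script (shown here as a TypeScript string constant to match the other sketches) might guard its schema change and MERGE its enum data so it is safe to re-run; the table and column names are invented:

// Hypothetical contents of 3.67/01-pp12347.sql
const examplePatch = `
IF NOT EXISTS (SELECT 1 FROM sys.columns
               WHERE object_id = OBJECT_ID('dbo.Offering')
                 AND name = 'CategoryId')
    ALTER TABLE dbo.Offering ADD CategoryId INT NULL;

MERGE dbo.MasterStatus AS target
USING (VALUES (1, 'Active'), (2, 'Inactive')) AS source (Id, Name)
    ON target.Id = source.Id
WHEN MATCHED THEN UPDATE SET Name = source.Name
WHEN NOT MATCHED THEN INSERT (Id, Name) VALUES (source.Id, source.Name);
`;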

The SQL Server Problem

  • QA's test data is persisted between deployments
     
  • It's easy to determine the version of the database in any environment
     
  • Developers can now create the database on their machines and work locally
     
  • We ensure the DB and code are in sync

The Result

The MongoDB Problem

  • Changes to the MongoDB schema are made in Krypton
     
  • MongoDB is cleared out and re-seeded

The Current Flow

The MongoDB Problem

  • If MongoDB is not re-seeded after a schema change, exceptions may be thrown when accessing records with the old schema - this depends on the type of schema change that was made
     
  • This means we can't make incremental schema changes once we go to Production

The Problem

The MongoDB Problem

  • It's all about defensive coding
     
  • Change Krypton to allow extra properties when deserializing - the default behavior is to throw an exception
     
  • Provide logical defaults for properties in the scenarios where the record uses an old schema that doesn't have the given property

The New Strategy
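
Krypton's actual code isn't shown in the deck; as a language-neutral sketch of the defensive-coding idea (TypeScript, with invented property names), a mapping layer can ignore properties it doesn't recognize and supply logical defaults for properties an old record lacks:

interface Offering {
  id: string;
  name: string;
  categoryId: number;
  isArchived: boolean;   // added in a later schema version
}

// Map a raw MongoDB document to the current shape: unknown extra properties are
// simply ignored, and missing ones fall back to logical defaults.
function toOffering(doc: Record<string, unknown>): Offering {
  return {
    id: String(doc._id ?? ''),
    name: typeof doc.name === 'string' ? doc.name : '',
    categoryId: typeof doc.categoryId === 'number' ? doc.categoryId : 0,
    isArchived: typeof doc.isArchived === 'boolean' ? doc.isArchived : false,
  };
}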

The MongoDB Problem

  • Use attributes on properties if the MongoDB deserializer requires more information about how to deserialize a property (e.g. a property used to be an int but now it's a string)
     
  • Records with old schemas will automatically be upgraded to the new schema the next time they are saved

The New Strategy
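
The same idea for a type change, again as an illustrative TypeScript sketch with invented names rather than Krypton's implementation: the legacy int is coerced to the new string type on read, so the next save writes the record back in the new schema.

import { Collection, ObjectId } from 'mongodb';

interface Client {
  _id: ObjectId;
  vanityUrl: string;
  accountNumber: string;   // stored as an int in older records
}

// On read, coerce the legacy value into the new type.
function toClient(doc: any): Client {
  return {
    _id: doc._id,
    vanityUrl: doc.vanityUrl ?? '',
    accountNumber: String(doc.accountNumber ?? ''),
  };
}

// The next save replaces the document in its new shape, upgrading it in place.
async function save(clients: Collection, client: Client): Promise<void> {
  await clients.replaceOne({ _id: client._id }, client, { upsert: true });
}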

The MongoDB Problem

  • No more deserialization exceptions
     
  • This means it is no longer a requirement to re-seed MongoDB when the schema changes

The Result

The Seeding Problem

  • Krypton contains classes that allow us to seed each type of MongoDB collection
     
  • There is also a console app that consumes these classes to allow Neon to seed data from a JSON file
     
  • Only data for certain clients is modified - the whole collection is not dropped

The Current Architecture

The Seeding Problem

  • On app start, bootstrapper determines whether environment is supposed to have its MongoDB data seeded via environment config 
     
  • If seeding is supposed to happen, app looks for all seeding objects and seeds MongoDB based on what is defined in the seeding objects

The Current Flow - Krypton

The Seeding Problem

  • Before Integration or E2E tests are executed, the seeding console app is executed with a link to the JSON files that contain the seed data

The Current Flow - Neon

The Seeding Problem

  • Krypton and Neon have different ways of providing the seed data to the seeder
     
  • Test writers need to know too much about how the data is stored in order to be able to write integration & e2e tests
     
  • This can lead to test failures that are simply due to bad data

The Problem

The Seeding Problem

  • There is currently no way to automatically seed SQL - we have to seed MongoDB with client IDs that we know are set up in SQL

The Problem

The Seeding Problem

  • Requirements-based seeding:
    • Abstract the seeding away from "this data goes in that table" to "I need an entity that has this set-up"
       
  • The seeding engine will transform the given requirements into MongoDB / SQL upserts

The New Strategy

The Seeding Problem

The New Strategy

/* Example seeding call */
seeder
  .seed
  .client({
    vanityUrl: 'foocompany'
  })
  .withOfferings({
    rowCount: 9,
    uniqueCategoryCount: 3
  })

The Seeding Problem

  • The preceding example says "I need a client with vanity URL 'foocompany' that has 9 offerings grouped into 3 unique categories"
     
  • It doesn't care about all the other plumbing that needs to take place to make that happen

The New Strategy
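
One way such an engine could be shaped (sketch only: client and withOfferings come from the example above, while SeedEngine and upsertClient are assumed names): the fluent calls just collect a requirement object, and the engine owns all the plumbing of turning it into SQL and MongoDB upserts.

interface ClientRequirement {
  vanityUrl: string;
  offerings?: { rowCount: number; uniqueCategoryCount: number };
}

interface SeedEngine {
  upsertClient(requirement: ClientRequirement): Promise<void>;
}

class ClientSeedBuilder {
  constructor(private readonly requirement: ClientRequirement) {}

  withOfferings(spec: { rowCount: number; uniqueCategoryCount: number }): this {
    this.requirement.offerings = spec;
    return this;
  }

  async apply(engine: SeedEngine): Promise<void> {
    await engine.upsertClient(this.requirement);
  }
}

// Mirrors the call from the earlier slide.
const seeder = {
  seed: {
    client: (spec: { vanityUrl: string }) => new ClientSeedBuilder({ vanityUrl: spec.vanityUrl }),
  },
};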

The Seeding Problem

  • In order to ensure the tests know what the entities' properties will be, consistent values will be provided. For example, for the 9 offerings in the previous example:
    • Offering 1's name is "offering 1"
    • ...
    • Offering n's name is "offering n"

The New Strategy
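
A sketch of how those consistent values could be generated (the "offering n" names come from the slide; the category naming is an assumption):

interface OfferingSeed {
  name: string;
  category: string;
}

// Deterministic values: offering n is always named "offering n", and categories
// repeat in a fixed cycle, so tests can assert against known data.
function buildOfferings(rowCount: number, uniqueCategoryCount: number): OfferingSeed[] {
  return Array.from({ length: rowCount }, (_, i) => ({
    name: `offering ${i + 1}`,
    category: `category ${(i % uniqueCategoryCount) + 1}`,
  }));
}

// buildOfferings(9, 3) -> "offering 1" .. "offering 9" spread across "category 1" .. "category 3"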

The Seeding Problem

  • When the database needs to be reset, a seed batch file can be executed that ensures the required data is in the database for QA testing

The New Strategy

The Seeding Problem

  • Seeding becomes a lot easier to maintain
     
  • When the schema changes, only the seeding engine needs to change instead of all of the seed data
     
  • Requirements-based seeding aligns with the Neon AJAX mocking strategy

The Result

PaySimple - Database Management & Seeding Strategy

By Ryan Haugh
