Database Management & Seeding Strategy

The Problem

  • The SQL database is blown away and recreated during every CI & QA deployment
     
  • QA must manually add back their test data in order to perform their tests
     
  • There isn't a migration story for MongoDB
     
  • Test creators need to know too much about how the data is stored in order to write integration / e2e tests

The SQL Server Problem

  • During deployment, database is blown away
     
  • A backup of the production database with a subset of the data is restored to environment
     
  • Current release number is queried from database

The Current Flow

The SQL Server Problem

  • The new release number is retrieved from a variable set in Bamboo
     
  • All Master Scripts with release numbers after the database value and up to the Bamboo value are executed

The Current Flow

The SQL Server Problem

  • Ensure that the database is in a consistent state for testing
     
  • Ensure it's consistent with Production

The Rationale

The SQL Server Problem

  • SQL Server is blown away and recreated during every CI & QA deployment
     
  • This means QA must re-enter their test data after every deployment
     
  • We are assuming that the subset of data from production will never change

The Problem

The SQL Server Problem

  • We are dependent on an external binary file (the backup) in order to get the database up and running
     
  • We are dependent on the Bamboo variable being set to the correct value - if it isn't set correctly, we run the risk of executing updated code against an out-of-date schema

The Problem

The SQL Server Problem

  • The database is no longer blown away and restored as part of the deployment process
     
  • Instead, this action is now performed on-demand when QA needs to reset the database

The New Strategy - The Easy Part

The SQL Server Problem

  • We no longer restore the database from production
     
  • Instead, we will update the database schema via SQL scripts with a naming convention that indicates the order in which they should be executed
     
  • A new table in the database will save the last executed script

The New Strategy - The Redesign

The SQL Server Problem

  • No more MasterScript.sql
     
  • File structure:
    • 3.66
      • 01-pp12345.sql
      • 02-pp12346.sql
    • 3.67
      • 01-pp12347.sql
      • 02-pp12348.sql

The New Strategy - The File Structure

The SQL Server Problem

  • Move the version number from SystemConfiguration to its own $version table:
     
  • Schema:
    • CurrentVersion VARCHAR(16)
       
  • This decouples the database state from business-related configuration

The New Strategy - The $version Table
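
As a rough sketch of how the deployment tooling might read and bump that value (TypeScript using the mssql package; the $version table comes from the slide above, but the helper names and everything else here are assumptions, not the actual implementation):

import * as sql from 'mssql';

// Read the last release script recorded in the hypothetical $version table.
async function getCurrentVersion(pool: sql.ConnectionPool): Promise<string | null> {
  const result = await pool
    .request()
    .query('SELECT TOP 1 CurrentVersion FROM [$version]');
  return result.recordset.length > 0 ? result.recordset[0].CurrentVersion : null;
}

// Record the script that just ran so a failed deployment can be resumed.
async function setCurrentVersion(pool: sql.ConnectionPool, version: string): Promise<void> {
  await pool
    .request()
    .input('version', sql.VarChar(16), version)
    .query('UPDATE [$version] SET CurrentVersion = @version');
}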

The SQL Server Problem

The New Strategy - The New Flow

  • Query the $version table to get the name of the last release script executed against it
     
  • Find the SQL scripts in folders whose names come after the last release number
     
  • Execute each script in order, updating the $version table after each script to ensure we know the state of the database if there is an error during one of the scripts
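
Continuing the sketch above (and reusing the hypothetical getCurrentVersion / setCurrentVersion helpers), a minimal runner for this flow might look like the following; folder names are assumed to sort in release order, which a real runner would need to handle more carefully:

import * as fs from 'fs';
import * as path from 'path';
import * as sql from 'mssql';

// Find release folders after the recorded version and run their scripts in
// order, bumping $version after every script so a failure leaves a known state.
async function applyPendingScripts(pool: sql.ConnectionPool, scriptsRoot: string): Promise<void> {
  const lastApplied = (await getCurrentVersion(pool)) ?? '';   // e.g. "3.66/02-pp12346.sql"
  const [lastRelease, lastScript] = lastApplied.split('/');

  const releases = fs.readdirSync(scriptsRoot)
    .filter((release) => !lastRelease || release >= lastRelease)
    .sort();

  for (const release of releases) {
    const scripts = fs.readdirSync(path.join(scriptsRoot, release))
      .filter((file) => file.endsWith('.sql'))
      .sort();                                                 // 01-..., 02-... ordering

    for (const script of scripts) {
      // Skip scripts already applied in a partially completed release.
      if (release === lastRelease && lastScript && script <= lastScript) continue;

      const body = fs.readFileSync(path.join(scriptsRoot, release, script), 'utf8');
      await pool.request().batch(body);
      await setCurrentVersion(pool, `${release}/${script}`);
    }
  }
}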

The SQL Server Problem

  • The SQL will contain the schema changes that need to be applied
     
  • Scripts will be idempotent so they can be re-executed if needed
     
  • It will also contain the data for enum tables (e.g. MasterStatus)
     
  • It will not contain any seed data

The New Strategy - The SQL
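
For illustration, the body of one such patch script (shown here as a TypeScript string constant to match the other sketches) might guard its schema change and MERGE its enum data so it is safe to re-run; the table and column names are invented:

// Hypothetical contents of 3.67/01-pp12347.sql
const examplePatch = `
IF NOT EXISTS (SELECT 1 FROM sys.columns
               WHERE object_id = OBJECT_ID('dbo.Offering')
                 AND name = 'CategoryId')
    ALTER TABLE dbo.Offering ADD CategoryId INT NULL;

MERGE dbo.MasterStatus AS target
USING (VALUES (1, 'Active'), (2, 'Inactive')) AS source (Id, Name)
    ON target.Id = source.Id
WHEN MATCHED THEN UPDATE SET Name = source.Name
WHEN NOT MATCHED THEN INSERT (Id, Name) VALUES (source.Id, source.Name);
`;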

The SQL Server Problem

  • QA's test data is persisted between deployments
     
  • It's easy to determine the version of the database in any environment
     
  • Developers can now create the database on their machines and work locally
     
  • We ensure the DB and code are in sync

The Result

The MongoDB Problem

  • Changes to the MongoDB schema are made in Krypton
     
  • MongoDB is cleared out and re-seeded

The Current Flow

The MongoDB Problem

  • If MongoDB is not re-seeded after a schema change, exceptions may be thrown when accessing records with the old schema - this depends on the type of schema change that was made
     
  • This means we can't make incremental schema changes once we go to Production

The Problem

The MongoDB Problem

  • It's all about defensive coding
     
  • Change Krypton to allow extra properties when deserializing - the default behavior is to throw an exception
     
  • Provide logical defaults for properties in the scenarios where the record uses an old schema that doesn't have the given property

The New Strategy
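
Krypton's actual code isn't shown in the deck; as a language-neutral sketch of the defensive-coding idea (TypeScript, with invented property names), a mapping layer can ignore properties it doesn't recognize and supply logical defaults for properties an old record lacks:

interface Offering {
  id: string;
  name: string;
  categoryId: number;
  isArchived: boolean;   // added in a later schema version
}

// Map a raw MongoDB document to the current shape: unknown extra properties are
// simply ignored, and missing ones fall back to logical defaults.
function toOffering(doc: Record<string, unknown>): Offering {
  return {
    id: String(doc._id ?? ''),
    name: typeof doc.name === 'string' ? doc.name : '',
    categoryId: typeof doc.categoryId === 'number' ? doc.categoryId : 0,
    isArchived: typeof doc.isArchived === 'boolean' ? doc.isArchived : false,
  };
}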

The MongoDB Problem

  • Use attributes on properties if the MongoDB deserializer requires more information about how to deserialize a property (e.g. a property used to be an int but now it's a string)
     
  • Records with old schemas will automatically be upgraded to the new schema the next time they are saved

The New Strategy
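
The same idea for a type change, again as an illustrative TypeScript sketch with invented names rather than Krypton's implementation: the legacy int is coerced to the new string type on read, so the next save writes the record back in the new schema.

import { Collection, ObjectId } from 'mongodb';

interface Client {
  _id: ObjectId;
  vanityUrl: string;
  accountNumber: string;   // stored as an int in older records
}

// On read, coerce the legacy value into the new type.
function toClient(doc: any): Client {
  return {
    _id: doc._id,
    vanityUrl: doc.vanityUrl ?? '',
    accountNumber: String(doc.accountNumber ?? ''),
  };
}

// The next save replaces the document in its new shape, upgrading it in place.
async function save(clients: Collection, client: Client): Promise<void> {
  await clients.replaceOne({ _id: client._id }, client, { upsert: true });
}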

The MongoDB Problem

  • No more deserialization exceptions
     
  • This means it is no longer a requirement to re-seed MongoDB when the schema changes

The Result

The Seeding Problem

  • Krypton contains classes that allow us to seed each type of MongoDB collection
     
  • There is also a console app that consumes these classes to allow Neon to seed data from a JSON file
     
  • Only data for certain clients is modified - the whole collection is not dropped

The Current Architecture

The Seeding Problem

  • On app start, bootstrapper determines whether environment is supposed to have its MongoDB data seeded via environment config 
     
  • If seeding is supposed to happen, app looks for all seeding objects and seeds MongoDB based on what is defined in the seeding objects

The Current Flow - Krypton

The Seeding Problem

  • Before Integration or E2E tests are executed, the seeding console app is executed with a link to the JSON files that contain the seed data

The Current Flow - Neon

The Seeding Problem

  • Krypton and Neon have different ways of providing the seed data to the seeder
     
  • Test writers need to know too much about how the data is stored in order to be able to write integration & e2e tests
     
  • This can lead to test failures that are simply due to bad data

The Problem

The Seeding Problem

  • There is currently no way to automatically seed SQL - we have to seed MongoDB with client IDs that we know are set up in SQL

The Problem

The Seeding Problem

  • Requirements-based seeding:
    • Abstract the seeding away from "this data goes in that table" to "I need an entity that has this set-up"
       
  • The seeding engine will transform the given requirements into MongoDB / SQL upserts

The New Strategy

The Seeding Problem

The New Strategy

/* Example seeding call */
seeder
  .seed
  .client({
    vanityUrl: 'foocompany'
  })
  .withOfferings({
    rowCount: 9,
    uniqueCategoryCount: 3
  })

The Seeding Problem

  • The preceding example says "I need a client with vanity URL 'foocompany' that has 9 offerings grouped into 3 unique categories"
     
  • It doesn't care about all the other plumbing that needs to take place to make that happen

The New Strategy
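
One way such an engine could be shaped (sketch only: client and withOfferings come from the example above, while SeedEngine and upsertClient are assumed names): the fluent calls just collect a requirement object, and the engine owns all the plumbing of turning it into SQL and MongoDB upserts.

interface ClientRequirement {
  vanityUrl: string;
  offerings?: { rowCount: number; uniqueCategoryCount: number };
}

interface SeedEngine {
  upsertClient(requirement: ClientRequirement): Promise<void>;
}

class ClientSeedBuilder {
  constructor(private readonly requirement: ClientRequirement) {}

  withOfferings(spec: { rowCount: number; uniqueCategoryCount: number }): this {
    this.requirement.offerings = spec;
    return this;
  }

  async apply(engine: SeedEngine): Promise<void> {
    await engine.upsertClient(this.requirement);
  }
}

// Mirrors the call from the earlier slide.
const seeder = {
  seed: {
    client: (spec: { vanityUrl: string }) => new ClientSeedBuilder({ vanityUrl: spec.vanityUrl }),
  },
};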

The Seeding Problem

  • In order to ensure the tests know what the entities' properties will be, consistent values will be provided. For example, for the 9 offerings in the previous example:
    • Offering 1's name is "offering 1"
    • ...
    • Offering n's name is "offering n"

The New Strategy
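
A sketch of how those consistent values could be generated (the "offering n" names come from the slide; the category naming is an assumption):

interface OfferingSeed {
  name: string;
  category: string;
}

// Deterministic values: offering n is always named "offering n", and categories
// repeat in a fixed cycle, so tests can assert against known data.
function buildOfferings(rowCount: number, uniqueCategoryCount: number): OfferingSeed[] {
  return Array.from({ length: rowCount }, (_, i) => ({
    name: `offering ${i + 1}`,
    category: `category ${(i % uniqueCategoryCount) + 1}`,
  }));
}

// buildOfferings(9, 3) -> "offering 1" .. "offering 9" spread across "category 1" .. "category 3"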

The Seeding Problem

  • When the database needs to be reset, a seed batch file can be executed that ensures the required data is in the database for QA testing

The New Strategy

The Seeding Problem

  • Seeding becomes a lot easier to maintain
     
  • When the schema changes, only the seeding engine needs to change instead of all of the seed data
     
  • Requirements-based seeding aligns with the Neon AJAX mocking strategy

The Result

PaySimple - Database Management & Seeding Strategy

By Ryan Haugh
