Database Management & Seeding Strategy
The Problem
- The SQL database is blown away and recreated during every CI & QA deployment
-
QA must manually add back their test data in order to perform their tests
- There isn't a migration story for MongoDB
- Test creator needs to know too much about how the data is stored in order to be able to write integration / e2e tests
The SQL Server Problem
- During deployment, database is blown away
- A backup of the production database with a subset of the data is restored to environment
- Current release number is queried from database
The Current Flow
The SQL Server Problem
- The new release number retrieved from variable set in Bamboo
- All Master Scripts with release numbers after the database value and up to the Bamboo value are executed
The Current Flow
The SQL Server Problem
- Ensure that the database is in a consistent state for testing
- Ensure it's consistent with Production
The Rationale
The SQL Server Problem
- SQL Server is blown away and recreated during every CI & QA deployment
- This means QA must re-enter their test data after every deployment
- We are assuming that the subset of data from production will never change
The Problem
The SQL Server Problem
- We are dependent on an external binary file (the backup) in order to get the database up and running
- We are dependent on the Bamboo variable being set to the correct value - if it's not set correctly then we'll run the risk of executing updated code against an out of date schema
The Problem
The SQL Server Problem
- The database is no longer blown away and restored as part of the deployment process
- Instead, this action is now performed on-demand when QA needs to reset the database
The New Strategy - The Easy Part
The SQL Server Problem
- We no longer restore the database from production
- Instead, we will update the database schema via SQL scripts with a naming convention that indicates the order in which they should be executed
- A new table in the database will save the last executed script
The New Strategy - The Redesign
The SQL Server Problem
- No more MasterScript.sql
- File structure:
- 3.66
- 01-pp12345.sql
- 02-pp12346
- 3.67
- 01-pp12347.sql
- 02-pp12348.sql
- 3.66
The New Strategy - The File Structure
The SQL Server Problem
- Move version number from SystemConfiguration to its own table $version:
- Schema:
- CurrentVersion VARCHAR(16)
- CurrentVersion VARCHAR(16)
- This decouples the database state from business-related configuration
The New Strategy - The $version Table
The SQL Server Problem
The New Strategy - The New Flow
- Query $version table to get file name of last release executed against it
- Find the SQL scripts in folders whose names come after the last release number
- Execute each script in order, updating the $version table after each script to ensure we know the state of the database if there is an error during one of the scripts
The SQL Server Problem
- The SQL will contain the schema changes that need to be applied
- Scripts will be idempotent so they can be re-executed if needed
- It will also contain the data for enum tables (e.g. MasterStatus)
- It will not contain any seed data
The New Strategy - The SQL
The SQL Server Problem
- QA's test data is persisted between deployments
- It's easy to determine the version of the database in any environment
- Developers can now create the database on their machines and work locally
- We ensure the DB and code are in sync
The Result
The MongoDB Problem
- Changes to the MongoDB schema are made in Krypton
- MongoDB is cleared out and re-seeded
The Current Flow
The MongoDB Problem
- If MongoDB is not re-seeded after there are schema changes, exceptions may be thrown when accessing records with old schema - this depends on the type of schema change that was made
- This means we can't make incremental schema changes once we go to Production
The Problem
The MongoDB Problem
- It's all about defensive coding
- Change Krypton to allow extra properties when deserializing - the default behavior is to throw an exception
- Provide logical defaults for properties in the scenarios where the record uses an old schema that doesn't have the given property
The New Strategy
The MongoDB Problem
- Use attributes on properties if the MongoDB deserializer requires more information about how to deserialize a property (e.g. a property used to be an int but now it's a string)
- Records with old schemas will automatically be upgraded to the new schema the next time they are saved
The New Strategy
The MongoDB Problem
- No more deserialization exceptions
- This means it is no longer a requirement to re-seed MongoDB when the schema changes
The Result
The Seeding Problem
- Krypton contains classes that allow us to seed each type of MongoDB collection
- There is also a console app that consumes these classes to allow Neon to seed data from a JSON file
- Only data for certain clients is modified - the whole collection is not dropped
The Current Architecture
The Seeding Problem
- On app start, bootstrapper determines whether environment is supposed to have its MongoDB data seeded via environment config
- If seeding is supposed to happen, app looks for all seeding objects and seeds MongoDB based on what is defined in the seeding objects
The Current Flow - Krypton
The Seeding Problem
- Before Integration or E2E tests are executed, the seeding console app is executed with a link to the JSON files that contain the seed data
The Current Flow - Neon
The Seeding Problem
- Krypton and Neon have different ways of providing the seed data to the seeder
- Test writers need to know too much about how the data is stored in order to be able to write integration & e2e tests
- This can lead to test failures that are simply due to bad data
The Problem
The Seeding Problem
- There is currently no way to automatically seed SQL - we have to seed MongoDB with client IDs that we know are set up in SQL
The Problem
The Seeding Problem
- Requirements-based seeding:
- Abstract away the seeding from this data goes in that table to I need an entity that has this set-up
- Abstract away the seeding from this data goes in that table to I need an entity that has this set-up
- The seeding engine will transform the given requirements into MongoDB / SQL upserts
The New Strategy
The Seeding Problem
The New Strategy
/* Example seeding call */
seeder
.seed
.client({
vanityUrl: 'foocompany'
})
.withOfferings({
rowCount: 9,
uniqueCategoryCount: 3
})
The Seeding Problem
- The preceding example says I need a client with vanity URL 'foocompany' that has 9 offerings that are grouped into 3 unique categories
- It doesn't care about all the other plumbing that needs to take place to make that happen
The New Strategy
The Seeding Problem
- In order to ensure the tests know what the entities' properties will be consistent values will be provided. For example, for the 9 offerings in the previous example:
- Offering 1's name is offering 1
- ...
- Offering n's name is offering n
The New Strategy
The Seeding Problem
- When the database needs to be reset, a seed batch file can be executed that ensure the required data is in the database for QA testing
The New Strategy
The Seeding Problem
- Seeding becomes a lot easier to maintain
- When the schema changes, only the seeding engine needs to change instead of all of the seed data
- Requirements-based seeding aligns with Neon AJAX mocking strategy
The Result
PaySimple - Database Management & Seeding Strategy
By Ryan Haugh
PaySimple - Database Management & Seeding Strategy
- 521