Feature Flags And End-to-End Testing

Gleb Bahmutov

Climate Crisis Is Bad

Adjust your life

Join an organization

Vote

Greenpeace  350  Sunrise  Citizen Climate Lobby  3rd Act  Mothers Out Front

Speaker: Gleb Bahmutov PhD

C / C++ / C# / Java / CoffeeScript / JavaScript / Node / Angular / Vue / Cycle.js / functional programming / testing

🌎 🔥 350.org 🌎 🔥 citizensclimatelobby.org 🌎 🔥

December 2025 Advent Calendar:

Cy vs Pw code examples

Gleb Bahmutov

Sr Director of Engineering

Mercari Does A Lot Of Testing

AGENDA

  • Feature flags

  • Testing challenges

  • Staying sane and safe

  • Words of wisdom

  • Lunch 🥙🥪

I am working on an awesome cutting edge TodoMVC app!

I will add robust social media features aka liking other users' tasks and posting comments

Does anyone need it?

Can we show some users a preview?

Can we hide it if we find problems?

Feature Flags 🏁 🚩 🏴‍☠️

Control parts of your app using

a real-time 3rd-party service

without rebuild / redeploy

import flagsmith from 'flagsmith';

flagsmith.init({
 environmentID: 'QjgYur4LQTwe5HpvbvhpzK',
});

const App = () => (
 <Layout
   darkMode={flagsmith.hasFeature("dark_mode")}
   designV2={flagsmith.hasFeature("design_v2")}>
   {flagsmith.hasFeature("chat") && <ChatWidget />}
 </Layout>
)

on/off feature toggle

feature toggles demo from https://www.flagsmith.com/

feature toggle (on/off)

feature value

hasFeature('greeting')
getValue('greeting', {
  fallback: 'Aloha'
})

feature variations

multiple feature values, including "off" and "on"
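
To make the difference between a toggle and a value concrete, here is a minimal in-memory stand-in. It is not Flagsmith's implementation: `createFlagClient` and the flag object shape are hypothetical, and only the `hasFeature` / `getValue` call shapes mirror the slide above.

```javascript
// Hypothetical in-memory stand-in for a feature flag client.
// Assumed flag shape: { enabled: boolean, value?: any }
function createFlagClient(flags) {
  return {
    // on/off toggle: true only when the flag exists and is enabled
    hasFeature(name) {
      return Boolean(flags[name] && flags[name].enabled)
    },
    // feature value: the flag's value, or the fallback when it has none
    getValue(name, { fallback } = {}) {
      const flag = flags[name]
      return flag && flag.value != null ? flag.value : fallback
    },
  }
}

const client = createFlagClient({
  dark_mode: { enabled: true },
  greeting: { enabled: true, value: 'Bonjour' },
})

console.log(client.hasFeature('dark_mode')) // true
console.log(client.hasFeature('chat')) // false
console.log(client.getValue('greeting', { fallback: 'Aloha' })) // Bonjour
console.log(client.getValue('farewell', { fallback: 'Aloha' })) // Aloha
```

A flag with several possible values (a variation) is the same idea with more than two choices for `value`.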

Feature Flags

  • Temp experiments
  • Incremental development
  • Product feedback
  • Safe releases
  • Kill switches
  • A/B experiments

Feature Flags

  • web apps
  • native mobile apps
  • desktop apps
  • backend

Feature Flag Tips

Use consistent flag names

<type>-<area>-<name>
enable-ui-chat

hide-social-logins
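
A naming convention is easier to keep when a script can check it. The sketch below parses names of the form `<type>-<area>-<name>`; the `FLAG_TYPES` list and the `parseFlagName` helper are hypothetical, with the allowed prefixes taken from the example names used in this deck.

```javascript
// Hypothetical list of allowed prefixes, taken from the example names in this deck
const FLAG_TYPES = ['enable', 'hide', 'show', 'experiment', 'rollout']

// Split "<type>-<area>-<name>" into its parts.
// Returns null if the name does not follow the convention.
// The <name> part may itself contain hyphens.
function parseFlagName(flagName) {
  const pattern = /^([a-z]+)-([a-z0-9]+)-([a-z0-9]+(?:-[a-z0-9]+)*)$/
  const match = pattern.exec(flagName)
  if (!match) return null
  const [, type, area, name] = match
  return FLAG_TYPES.includes(type) ? { type, area, name } : null
}

console.log(parseFlagName('enable-ui-chat'))
// { type: 'enable', area: 'ui', name: 'chat' }
console.log(parseFlagName('hide-social-logins'))
// { type: 'hide', area: 'social', name: 'logins' }
console.log(parseFlagName('dark_mode')) // null
```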

Feature Flag Tips

Clearly mark temp flags

experiment-social-features

rollout-ui-chat

Use tags instead of suffixes

Feature Flag Tips

Include issue / JIRA URL, plus names and links of any related feature flags

Feature Flag Tips

Migrate flags between environments

DEV
show-ui-chat: enabled
PROD
show-ui-chat: disabled

Feature Flags

Change how the app behaves

creating a nightmare for

end-to-end testing

❤️

👎

Why is test A failing every couple of days?!?!

feature A (on/off)

feature B (on/off)

feature C (on/off)

2 states x 2 states x 2 states = 8 states

"Did we test the Chat widget in Dark mode?!"

2^n

Exponential number of states to test / reason about
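
The growth is easy to see by enumerating the combinations. This is only an illustration of the arithmetic; `flagCombinations` is a hypothetical helper and the flag names are borrowed from the earlier slide.

```javascript
// Enumerate every on/off combination for a list of flag names.
// n flags produce 2 ** n combinations.
function flagCombinations(names) {
  const total = 2 ** names.length
  const combos = []
  for (let mask = 0; mask < total; mask += 1) {
    const combo = {}
    names.forEach((name, i) => {
      // bit i of the mask decides whether flag i is on
      combo[name] = Boolean(mask & (1 << i))
    })
    combos.push(combo)
  }
  return combos
}

const combos = flagCombinations(['dark_mode', 'design_v2', 'chat'])
console.log(combos.length) // 8

// how many of those states have the Chat widget in Dark mode?
console.log(combos.filter((c) => c.chat && c.dark_mode).length) // 2

const tenFlags = Array.from({ length: 10 }, (_, i) => `flag_${i}`)
console.log(flagCombinations(tenFlags).length) // 1024
```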

A typical Mercari US Cypress E2E test

Feature flags

🚩🏴‍☠️🏁

tests

✅ ✅

Testing Approach 1

"Total Control"

function render() {
  const shouldShow = flagsmith.hasFeature('feature_a')
  const label = (shouldShow ? 'Showing' : 'Not showing') + ' feature A'
  div.replaceChild(document.createTextNode(label), div.firstChild)
}

// https://docs.flagsmith.com/clients/javascript/
flagsmith.init({
  // comes from the Flagsmith project settings
  environmentID: 'gxzgHaQ84gijocUvctHJFb',
  onChange: render,
})

feature A is off

feature A is on

feature flags are just JSON objects returned by the API

it('shows the loading message', () => {
  // slow down the network call by 1 second
  cy.intercept('/api/v1/flags/', (req) =>
    Cypress.Promise.delay(1000)
  ).as('flags')
  cy.visit('/')
  cy.contains('#feature-area', 'Initializing...')
    .should('be.visible')
  // wait for the feature flags Ajax call
  cy.wait('@flags')
  cy.contains('Initializing...').should('not.exist')
})

Test the loading message

Copy feature flags response using Network Tab in DevTools

Save as JSON fixture file

[
  {
    "id": 56756,
    "feature": {
      "id": 10804,
      "name": "feature_a",
      "created_date": "2021-07-15T23:38:27.661659Z",
      "description": "The first feature",
      "initial_value": null,
      "default_enabled": false,
      "type": "STANDARD"
    },
    "feature_state_value": null,
    "enabled": false,
    "environment": 9128,
    "identity": null,
    "feature_segment": null
  }
]

it('does not show feature A', () => {
  cy.intercept('/api/v1/flags/', { fixture: 'no-feature-a.json' })
    .as('flags')
  cy.visit('/')
  cy.wait('@flags')
  cy.contains('#feature-area', 'Not showing feature A')
})

Stub the feature flags
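
Instead of saving one JSON file per scenario, the stubbed body could be generated in the test. The sketch below builds an array in the shape of the fixture above from a simple name-to-state map. `makeFlagsFixture` is a hypothetical helper, and it fills in only a few of the fields; whether the Flagsmith client accepts such a trimmed response is an assumption that has not been verified here.

```javascript
// Build a flags response in the shape of the fixture above.
// Only a subset of the fields is filled in (an assumption, see the lead-in).
function makeFlagsFixture(flags) {
  return Object.entries(flags).map(([name, enabled], index) => ({
    id: index + 1, // placeholder id, not a real Flagsmith id
    feature: { name, type: 'STANDARD' },
    feature_state_value: null,
    enabled,
  }))
}

const body = makeFlagsFixture({ feature_a: false, feature_b: true })
console.log(body.map((f) => `${f.feature.name}=${f.enabled}`).join(', '))
// feature_a=false, feature_b=true

// Inside a Cypress test the generated body could replace the fixture file:
// cy.intercept('/api/v1/flags/', { body: makeFlagsFixture({ feature_a: false }) })
//   .as('flags')
```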

feature A is off

Testing Approach 2

"Selective Control"

it('does not show feature A (modify response)', () => {
  cy.intercept('/api/v1/flags/', (req) => {
    req.continue((res) => {
      const featureA = Cypress._.find(
        res.body,
        (f) => f.feature.name === 'feature_a',
      )
      featureA.enabled = false
    })
  }).as('flags')
  cy.visit('/')
  cy.wait('@flags')
  cy.contains('#feature-area', 'Not showing feature A')
})

Only modify the feature under test

it('shows feature A', () => {
  cy.intercept('/api/v1/flags/', (req) => {
    req.continue((res) => {
      const featureA = Cypress._.find(
        res.body,
        (f) => f.feature.name === 'feature_a',
      )
      featureA.enabled = true
    })
  }).as('flags')
  cy.visit('/')
  cy.wait('@flags')
  cy.contains('#feature-area', 'Showing feature A')
})

Only modify the feature under test
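
The two tests above repeat the same find-and-set logic, so it could be pulled into a small helper. This is a sketch, not part of the original code: `setFlag` is a hypothetical name, and it uses the built-in `Array.prototype.find` in place of `Cypress._.find` so it also runs outside Cypress.

```javascript
// Find a flag by name in the flags response body and set its state.
// Throws if the flag is missing, so a renamed flag fails the test loudly
// instead of silently testing the wrong state.
function setFlag(body, name, enabled) {
  const flag = body.find((f) => f.feature.name === name)
  if (!flag) {
    throw new Error(`Feature flag "${name}" not found in the response`)
  }
  flag.enabled = enabled
  return body
}

// Inside a Cypress test this could be used as:
// cy.intercept('/api/v1/flags/', (req) => {
//   req.continue((res) => {
//     setFlag(res.body, 'feature_a', true)
//   })
// }).as('flags')

const response = [
  { feature: { name: 'feature_a' }, enabled: false },
  { feature: { name: 'feature_b' }, enabled: false },
]
setFlag(response, 'feature_a', true)
console.log(response.map((f) => `${f.feature.name}=${f.enabled}`).join(', '))
// feature_a=true, feature_b=false
```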

Unpredictable

Total control, selective control using network stubbing

  • what if the page reloads?
  • what if the user navigates to the next page?
  • what if there is backend behavior?
  • what about all other flags?

Testing Approach 3

"Per-user control"

  • 🔐 all feature flags are "locked" to match the production as much as possible
  • 👤 each test enables feature flags when needed via user session ID

import { withLDProvider } from 'launchdarkly-react-client-sdk'
const App = () => (
  ...
)

const user = {
  key: 'USER_1234'
}
export default withLDProvider({ 
  clientSideID: 'YOUR_CLIENT_SIDE_ID', 
  user 
})(App)

could be user id or browser session id

The current user USER_1234 receives the default Casual variation

Target USER_1234 via LaunchDarkly UI to receive the Formal variation

The user USER_1234 sees the Formal variation
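
The idea behind per-user targeting can be modeled in a few lines. This is a deliberately simplified model, not LaunchDarkly's actual evaluation logic; `evaluateFlag` and the flag object shape are hypothetical.

```javascript
// Simplified model: a flag has a list of variations, a default variation
// index, and a map of user key -> targeted variation index.
function evaluateFlag(flag, userKey) {
  const isTargeted = Object.prototype.hasOwnProperty.call(flag.targets, userKey)
  const index = isTargeted ? flag.targets[userKey] : flag.defaultVariation
  return flag.variations[index]
}

const greeting = {
  variations: ['Casual', 'Formal'],
  defaultVariation: 0,
  targets: {},
}

// without targeting, every user gets the default variation
const before = evaluateFlag(greeting, 'USER_1234')
console.log(before) // Casual

// target a single user; everyone else is unaffected
greeting.targets.USER_1234 = 1
console.log(evaluateFlag(greeting, 'USER_1234')) // Formal
console.log(evaluateFlag(greeting, 'USER_9999')) // Casual
```

Because only the targeted user's result changes, a test can flip a flag for its own user without disturbing other tests or other users.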

Control Per-user Feature Flags Via LaunchDarkly API

# install using NPM
$ npm i -D cypress-ld-control
# install using Yarn
$ yarn add -D cypress-ld-control

const { initCypress } = require('cypress-ld-control')
...
e2e: {
  setupNodeEvents(on, config) {
    initCypress(on, config)
    // IMPORTANT: return the updated config object
    return config
  },
}

Set the LAUNCH_DARKLY_PROJECT_KEY and LAUNCH_DARKLY_AUTH_TOKEN environment variables

const featureFlagKey = 'testing-launch-darkly-control-from-cypress'
const userId = 'USER_1234'

it('shows the casual greeting', () => {
  cy.setFeatureFlagForUser(
    featureFlagKey,
    userId,
    0, // variation
  )
  cy.visit('/')
  cy.contains('h1', 'Hello, World').should('be.visible')
})

Test feature flag variation 0

Sets the feature flag variation on LaunchDarkly server

it('shows formal greeting', () => {
  cy.setFeatureFlagForUser(
    featureFlagKey,
    userId,
    1,
  )
  cy.visit('/')
  cy.contains('h1', 'How do you do, World').should('be.visible')
})

Test feature flag variation 1

cy.setFeatureFlagForUser(
  featureFlagKey,
  userId,
  2, // variation
)

Test feature flag variation 2

after(() => {
  cy.removeUserTarget(featureFlagKey, userId)
})

Don't forget to clean up

User ids appear and disappear

during testing

Humans

Catch Feature Flag Changes

npx list-ld-flags --project demo-project --environment test > ld-flags.json

- name: LD flag differences
  run: npx list-ld-flags --environment test --diff ld-flags.json
  env:
    # our CI has the project key as an environment variable
    LAUNCH_DARKLY_PROJECT_KEY: ${{ secrets.LAUNCH_DARKLY_PROJECT_KEY }}
    LAUNCH_DARKLY_AUTH_TOKEN: ${{ secrets.LAUNCH_DARKLY_AUTH_TOKEN }}

Catch any unexpected

feature flag changes
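
The underlying idea is a snapshot comparison: save the known flag state, then fail the build when the current state differs. The sketch below shows that comparison on plain objects. It is not how `list-ld-flags` works internally, and since the format of `ld-flags.json` is not shown here, the flag shape and the `diffFlags` helper are hypothetical.

```javascript
// Compare a saved snapshot of flags with the current flags.
// Hypothetical shape: { [flagKey]: any JSON-serializable state }
// Note: JSON.stringify comparison is sensitive to object key order.
function diffFlags(saved, current) {
  const keys = new Set([...Object.keys(saved), ...Object.keys(current)])
  const changes = []
  for (const key of [...keys].sort()) {
    if (!(key in current)) {
      changes.push({ key, change: 'removed' })
    } else if (!(key in saved)) {
      changes.push({ key, change: 'added' })
    } else if (JSON.stringify(saved[key]) !== JSON.stringify(current[key])) {
      changes.push({ key, change: 'changed' })
    }
  }
  return changes
}

const savedFlags = {
  'show-ui-chat': { on: false },
  'hide-social-logins': { on: true },
}
const currentFlags = {
  'show-ui-chat': { on: true },
  'experiment-social-features': { on: false },
}

const changes = diffFlags(savedFlags, currentFlags)
console.log(changes.map((c) => `${c.key}: ${c.change}`).join('\n'))
// experiment-social-features: added
// hide-social-logins: removed
// show-ui-chat: changed
```

A CI step would exit with a non-zero code whenever the list of changes is not empty.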

Feature + Test Lifecycle

Stage             Feature   Tests
experiment        opt-in    none
prototype         opt-in    a few, set feature variation for the user
an alternative    opt-in    expanded, old tests set previous default variation
switch            default   new tests remove opt-in, old tests use opt-in for the old variation
the new default   default   feature cleanup, old tests are removed

Archive The Feature Flag

Personal Advice

  • Do not write your own feature flags system
  • Set feature flag deadlines and retire flags aggressively
  • All new features should be explicit opt-in

Further Info ↗️

Feature Flags And End-to-End Testing

Gleb Bahmutov

Thank You 👏

Using feature flags is a popular way of developing new software features, but it presents problems for integration and end-to-end tests. In my talk I will show principles for dealing with feature flags from E2E web tests. Presented at DevMTL.