Codeless AI Test Automation Makes Me Code A Lot

Gleb Bahmutov

Speaker: Gleb Bahmutov PhD

C / C++ / C# / Java / CoffeeScript / JavaScript / Node / Angular / Vue / Cycle.js / functional programming / testing

🌎 🔥 350.org 🌎 🔥 citizensclimatelobby.org 🌎 🔥

Sr Director of Engineering

Mercari Does A Lot Of Testing

A typical Mercari US Cypress E2E test

image source: https://www.popularmechanics.com/culture/g2759/starship-uss-enterprise-ranked/

Computer, one cup of hot tea

Computer, ????, and three glasses

Computer, one web app!

Now we wait...

Wait some more...

Pretty good result after 90 seconds!

Greenfield development is the best

Q: How does AI know how to answer this?

A: It was trained. A lot.

Bad Training Leads To:

  • Hallucinations
  • Weird code
  • Verbosity

Easy Code Generation Leads To:

  • Maintenance nightmare

Computer, one end-to-end test!

Not a greenfield project

Hmm, how do I ...

  • Pick elements to interact with?
  • Assert the results on the page?
  • Check other data?

AI needs to "know" your app

Prompt:
- create a "Completes a todo" end-to-end Cypress test
- visit the base url
- confirm the application has finished loading its data
- enter a todo with random text
- confirm the same text is visible in the list of todos
Context:
- the application is hosted at "staging.acme.co"
- the application finishes loading when the "body" element has the class "loaded"
- user can enter new todos using an input element with class "new-todo"
- list of todo items has class "todo-list". Each item has class "todo"
- completed todo items have class "completed"

institutional knowledge (in your head)

Computer, read the source code / design docs and find out!

Copy the entire repo source code and include it with your prompt...

$ npx repomix path/to/directory
$ npx repomix --remote bahmutov/todo-ai-example

The larger the context ...

the longer we wait 🕰️

and pay more 💰

Prompt 1 + Lots of context → "Thinking"

Prompt 2 + Lots of context → "Thinking"

Prompt 3 + Lots of context → "Thinking"

Prompt 4 + Lots of context → "Thinking"

Review

Reviewing AI code

Not Ideal

Human code

AI code

[Chart: context vs. time] Slow, Complex, Likely to 🚨

[Chart: context vs. time] Simple, Fast, Likely ✅

  • Inline code completions
  • Retrieval Augmented Generation
  • Simple tasks
  • Picking test tags

Copilot inline suggestions speed up coding 👍👍👍

Comments give Copilot all the context

Inline code completions

Write good comments

They help:

  • you
  • your coworkers
  • AI

Retrieval Augmented Generation

Code Generation

  1. Take existing code and comments
  2. Generate more code

Augmented Code Generation

  1. Take existing code and comments
  2. Find relevant high quality documents
  3. Add found results to the prompt
  4. Generate more code
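
The four retrieval-augmented steps listed above can be sketched in plain JavaScript. This is only an illustration under stated assumptions, not the approach from the talk: the `examples` array, the `scoreExample` word-overlap scoring, and the `buildAugmentedPrompt` function are hypothetical names, and a real setup would more likely use embeddings and a vector database than count shared words.

```javascript
// Hypothetical store of known-good examples (step 2 searches this).
const examples = [
  {
    title: 'confirm the text changes after a click',
    code: "cy.get('#output').invoke('text').then((text) => { /* ... */ })",
  },
  {
    title: 'reset the backend data before the test',
    code: "cy.request('POST', '/reset', { todos: [] })",
  },
]

// Split text into a set of lowercase words.
function words(text) {
  return new Set(text.toLowerCase().match(/[a-z]+/g) || [])
}

// Naive relevance score: number of title words shared with the query.
function scoreExample(query, example) {
  const queryWords = words(query)
  let shared = 0
  for (const word of words(example.title)) {
    if (queryWords.has(word)) shared += 1
  }
  return shared
}

// Steps 1-3: take existing code, find the best examples, add them to the prompt.
// Step 4 (generation) would send the returned string to the model.
function buildAugmentedPrompt(existingCode, limit = 1) {
  const relevant = examples
    .map((example) => ({ example, score: scoreExample(existingCode, example) }))
    .filter((item) => item.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((item) => item.example.code)

  return [
    'Complete the following test.',
    existingCode,
    'Relevant examples:',
    ...relevant,
  ].join('\n')
}
```

Calling `buildAugmentedPrompt('// verify the label text has changed after the click')` picks the click example over the reset example, because its title shares more words with the comment.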

📝 "Build RAG Using Chroma DB" https://glebbahmutov.com/blog/build-rag-using-chroma-db/

Code Generation

Augmented Code Generation

it('changes the label after the click', () => {
  cy.visit('/')
  // get the initial label text and store it
  cy.get('#foo')
    .invoke('text')
    .as('initialText')

  // click the button
  cy.get('#bar').click()

  // verify the label text has changed
  cy.get('#foo')
    .invoke('text')
    .then((newText) => {
      cy.get('@initialText').then((initialText) => {
        expect(newText).to.not.equal(initialText)
      })
    })
})

Subtle timing issue when we get the text
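
One way to read this issue: the generated test reads the label a single time after the click, so it can fail if the app updates the label a moment later. The plain JavaScript simulation below (no Cypress involved) contrasts a single read with a retried check. The `createApp` fake, its 50 ms update delay, and the `retryUntil` helper are assumptions made up for this illustration; they only mimic how a retried assertion such as `should('not.have.text', ...)` behaves.

```javascript
// A fake app: clicking schedules a label change 50ms later (assumed delay).
function createApp() {
  let now = 0
  let label = 'Hello'
  let pendingUpdate = null
  return {
    text: () => label,
    click() {
      pendingUpdate = { at: now + 50, text: 'Clicked' }
    },
    // Advance the simulated clock and apply the update once it is due.
    tick(ms) {
      now += ms
      if (pendingUpdate && now >= pendingUpdate.at) {
        label = pendingUpdate.text
        pendingUpdate = null
      }
    },
  }
}

// Re-check the predicate every stepMs until it passes or the timeout expires.
function retryUntil(app, predicate, timeoutMs = 4000, stepMs = 10) {
  for (let waited = 0; waited <= timeoutMs; waited += stepMs) {
    if (predicate(app.text())) return true
    app.tick(stepMs)
  }
  return false
}

// Single read right after the click: the label has not changed yet.
const app1 = createApp()
const before1 = app1.text()
app1.click()
const changedOnFirstRead = app1.text() !== before1

// Retried check: keeps looking until the label differs from the old text.
const app2 = createApp()
const before2 = app2.text()
app2.click()
const changedWithRetry = retryUntil(app2, (text) => text !== before2)

console.log({ changedOnFirstRead, changedWithRetry })
// prints { changedOnFirstRead: false, changedWithRetry: true }
```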

it('changes the label after the click', () => {
  cy.visit('/')
  // get the initial label text and store it
  cy.get('#foo')
    .invoke('text')
    .as('initialText')

  // click the button
  cy.get('#bar').click()

  // verify the label text has changed
})

retrieved code example

cy.get('#output')
  .invoke('text')
  .then((text) => {
    cy.get('#change').click()
    cy.get('#output').should('not.have.text', text)
  })

Augmented Code Generation

it('changes the label after the click', () => {
  cy.visit('/')
  // verify the label text has changed
  cy.get('#foo')
    .invoke('text')
    .then((oldText) => {
      // click the button
      cy.get('#bar').click()
      cy.get('#foo').should('not.have.text', oldText)
    })
})

Training: quality beats quantity

(you still need good examples!)

Simple tasks

Triaging a failed Cypress test at Mercari US

Ask AI agent to fix it via Slack interface

👍

  • Likely to succeed
  • Easy to review
  • Tested

[Chart: context vs. time] Simple, Async, ✅ is optional

  1. AI code reviews
  2. Meaningful abstractions

When performing a code review:

- confirm that there are no hard-coded magic numbers.
  Prefer using named constants.
- do not allow unreachable code
- check that each HTML element that shows unique application data,
  like prices, values, names, addresses, etc., has a `data-testid`
  attribute to be used in end-to-end tests. If the attribute is missing,
  add a `data-testid` attribute with a meaningful value.
  Also add `data-testid` attributes to the top-level forms, pages,
  and large components.

copilot-instructions.md

AI code reviews

Copilot review can detect page elements without “data-testid” attributes and even suggest good attribute names

Custom "linter" rules

// ANTI-PATTERN: hardcoded wait
cy.wait(45_000)

import { defineConfig } from 'eslint/config'
import pluginCypress from 'eslint-plugin-cypress'
export default defineConfig([
  {
    plugins: {
      cypress: pluginCypress,
    },
    rules: {
      'cypress/no-unnecessary-waiting': 'warn',
    },
  },
])

cypress-io/eslint-plugin-cypress

eslint.config.js

What if I want to warn on waits longer than 30 seconds?!

When performing a code review, if the modified spec file has a `cy.wait(n)` call, suggest replacing it with `cy.wait(seconds(n/1000))`. Also suggest changing it if the duration is longer than 30 seconds.

copilot-instructions.md
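
The instruction above refers to a `seconds` helper without showing it. Here is a minimal sketch of what such a helper might look like, assuming it simply converts seconds to milliseconds. The `MAX_WAIT_SECONDS` constant and the `isTooLong` check are hypothetical additions that mirror the 30 second limit from the instruction.

```javascript
const MS_PER_SECOND = 1000
// Assumed limit, taken from the review instruction above.
const MAX_WAIT_SECONDS = 30

// Converts seconds to the milliseconds that cy.wait expects.
function seconds(n) {
  return n * MS_PER_SECOND
}

// True if a wait given in milliseconds exceeds the limit.
function isTooLong(ms) {
  return ms > seconds(MAX_WAIT_SECONDS)
}

// cy.wait(45_000) would become cy.wait(seconds(45)), and still be flagged.
console.log(seconds(45), isTooLong(seconds(45)), isTooLong(seconds(5)))
// prints 45000 true false
```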

Meaningful Abstractions

## Use the TodoMVC page object

Preferred way is to use the TodoMVC page object from `cypress/e2e/todomvc.po.js`

```js
import { TodoMVC } from './todomvc.po'
// inside the test or beforeEach hook
TodoMVC.visit()
```

## Reset the backend

Test can reset the backend data to zero todos state using the following commands

```js
cy.request('POST', '/reset', { todos: [] })
```

## Application loaded

Test can confirm the application has finished loading

```js
cy.get('body.loaded')
```

## Set the backend data

You can set the backend to have specific todos before visiting the app. Let's set 2 todos. Each todo must have an `id`, `title`, and `completed` status.

```js
cy.request('POST', '/reset', {
  todos: [
    { id: '1', title: 'learn testing', completed: false },
    { id: '2', title: 'learn cypress', completed: false },
  ],
})
```

Preferably, use the page object method

```js
import { TodoMVC } from './todomvc.po'
// inside the test or beforeEach hook
TodoMVC.reset([
  { id: '1', title: 'learn testing', completed: false },
  { id: '2', title: 'learn cypress', completed: false },
])
```

copilot-instructions.md
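
The instructions reference `TodoMVC.visit()` and `TodoMVC.reset()`, but the page object itself is not shown. Below is a sketch of what `cypress/e2e/todomvc.po.js` might contain, built only from the commands listed in the instructions. The method bodies are assumptions, and the real file would `export` the object so that spec files can import it.

```javascript
// Hypothetical contents of cypress/e2e/todomvc.po.js.
// `cy` is the global that Cypress provides inside spec files.
const TodoMVC = {
  // Set the backend data; with no argument, reset to zero todos.
  reset(todos = []) {
    cy.request('POST', '/reset', { todos })
  },

  // Open the app and wait until it has finished loading.
  visit() {
    cy.visit('/')
    cy.get('body.loaded')
  },
}
```

Keeping the selectors and endpoints in one object means the AI instructions only need to name two methods, and generated tests do not repeat the low-level commands.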

Without AI instructions 👎

With AI instructions ✅

Voice prompt

πŸ“ blog post "Good examples" https://glebbahmutov.com/blog/good-examples/

Sept 14, 2014

πŸ“ blog post "Copilot Instructions Example" https://glebbahmutov.com/blog/copilot-instructions-example/

Oct 9, 2025

examples

Replicator

Small simple steps following a plan

"prompt: assemble Millennium Falcon"

Final Thoughts

πŸ™ Thank you πŸ™