Codeless AI Test Automation Makes Me Code A Lot

Gleb Bahmutov

gleb.dev

Speaker: Gleb Bahmutov PhD

🦋 bahmutov.bsky.social

gleb.dev

github.com/bahmutov

glebbahmutov.com/blog

C / C++ / C# / Java / CoffeeScript / JavaScript / Node / Angular / Vue / Cycle.js / functional programming / testing

www.youtube.com/glebbahmutov

🌎 🔥 350.org 🌎 🔥 citizensclimatelobby.org 🌎 🔥

https://cypress.tips/courses

Gleb Bahmutov

Sr Director of Engineering

Mercari Does A Lot Of Testing

https://slides.com/bahmutov/decks/mercari

A typical Mercari US Cypress E2E test

image source: https://www.popularmechanics.com/culture/g2759/starship-uss-enterprise-ranked/

Computer, one cup of hot tea

Computer, ????, and three glasses

Computer, one web app!

Now we wait...

Wait some more...

Pretty good result after 90 seconds!

Greenfield development is the best

Q: How does AI know how to answer this?

A: It was trained. A lot.

Bad Training Leads To:
- Hallucinations
- Weird code
- Verbosity

Easy Code Generation Leads To:
- Maintenance nightmare

Computer, one end-to-end test!

Not a greenfield project

Hmm, how do I ...
- pick elements to interact with?
- assert the results on the page?
- check other data?

AI needs to "know" your app

Prompt:
- create a "Completes a todo" end-to-end Cypress test
- visit the base url
- confirm the application has finished loading its data
- enter a todo with random text
- confirm the same text is visible in the list of todos

Context:
- the application is hosted at "staging.acme.co"
- the application finishes loading when the "body" element has the class "loaded"
- user can enter new todos using an input element with class "new-todo"
- list of todo items has class "todo-list". Each item has class "todo"
- completed todo items have class "completed"

institutional knowledge (in your head)
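
Once that knowledge is written down, the prompt steps map onto Cypress commands fairly directly. A minimal sketch, assuming `baseUrl` in the Cypress config points at staging.acme.co and that pressing Enter submits the new todo. The function name and the random text format are illustrative, not from the talk.

```javascript
// A sketch only: the selectors come from the Context list above,
// everything else (function name, random text format) is illustrative.
function completesATodo() {
  // random text so repeated runs do not collide with older todos
  const title = `todo ${Math.random().toString(36).slice(2, 10)}`

  // assumes baseUrl is set to the staging host in the Cypress config
  cy.visit('/')
  // the app signals "data loaded" with a class on the body element
  cy.get('body').should('have.class', 'loaded')
  // enter the new todo; submitting with the Enter key is an assumption
  cy.get('.new-todo').type(`${title}{enter}`)
  // the same text should be visible among the todo items
  cy.contains('.todo-list .todo', title).should('be.visible')
}

// In a spec file this function would be registered as:
// it('Completes a todo', completesATodo)
```

Every selector in the sketch comes from the context list; none of them can be guessed from the prompt alone.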

Computer, read the source code / design docs and find out!

https://gitingest.com/

Copy the entire repo source code and include it with your prompt...

$ npx repomix path/to/directory
$ npx repomix --remote bahmutov/todo-ai-example

The larger the context, the longer we wait 🕰️ and the more we pay 💰

Prompt 1 → Lots of context → "Thinking"
Prompt 2 → Lots of context → "Thinking"
Prompt 3 → Lots of context → "Thinking"
Prompt 4 → Lots of context → "Thinking"
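
To get a feel for how this adds up, here is a back-of-the-envelope sketch. It assumes the full context is sent again with every prompt and uses a rough rule of thumb of about 4 characters per token. Both the ratio and the sample strings are assumptions, not measurements.

```javascript
// Assumption: about 4 characters per token, a rough rule of thumb only.
const CHARS_PER_TOKEN = 4

function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN)
}

// The full context travels with every prompt, so it is counted once per prompt.
function totalInputTokens(context, prompts) {
  const contextTokens = estimateTokens(context)
  return prompts.reduce(
    (sum, prompt) => sum + contextTokens + estimateTokens(prompt),
    0,
  )
}

const repoDump = 'x'.repeat(400_000) // stand-in for a packed repository
const prompts = ['add a test', 'fix the selector', 'rename it', 'add a tag']

console.log(estimateTokens(repoDump)) // 100000
console.log(totalInputTokens(repoDump, prompts)) // 400013
```

In this toy case the four prompts themselves account for 13 tokens; the rest is the same context counted four times.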

Prompt 1 → Lots of context → "Thinking" → Review

Reviewing AI code

Not Ideal

Human code

AI code

Slow, Complex, Likely to 🚨

Chart axes: context, time

Simple, Fast, Likely ✅

Inline code completions
Retrieval Augmented Generation

Simple tasks
Picking test tags

Copilot inline suggestions speed up coding 👍👍👍

Comments give Copilot all the context

Inline code completions

Write good comments

They help:

you
your coworkers
AI

Retrieval Augmented Generation

"Build RAG Using Chroma DB" https://glebbahmutov.com/blog/build-rag-using-chroma-db/

Code Generation
- Take existing code and comments
- Generate more code

Augmented Code Generation
- Take existing code and comments
- Find relevant high quality documents
- Add found results to the prompt
- Generate more code
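
The augmented steps can be sketched in plain JavaScript. This is a toy: it ranks examples by shared words, while a real setup (like the Chroma DB one in the blog post) would compare embeddings. The `examples` array and the function names `retrieve` and `buildAugmentedPrompt` are hypothetical.

```javascript
// Hypothetical in-memory "document store"; a real setup would query a vector database.
const examples = [
  { title: 'Text changes after a click', code: "cy.get('#output').should('not.have.text', text)" },
  { title: 'Reset the backend data', code: "cy.request('POST', '/reset', { todos: [] })" },
  { title: 'Wait for the application to load', code: "cy.get('body.loaded')" },
]

// Lowercase words of a string, as a set.
const words = (s) => new Set(s.toLowerCase().match(/[a-z]+/g) ?? [])

// Find relevant documents: score = number of words shared with the example title.
function retrieve(query, docs, topK = 1) {
  const queryWords = words(query)
  return docs
    .map((doc) => ({
      doc,
      score: [...words(doc.title)].filter((w) => queryWords.has(w)).length,
    }))
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((r) => r.doc)
}

// Add found results to the prompt, ahead of the existing code and comments.
function buildAugmentedPrompt(existingCode, docs) {
  const found = retrieve(existingCode, docs)
  const context = found.map((d) => `// Example: ${d.title}\n${d.code}`).join('\n\n')
  return `${context}\n\n// Continue this code:\n${existingCode}`
}

const existingCode = `// click the button
cy.get('#bar').click()
// verify the label text has changed`

console.log(buildAugmentedPrompt(existingCode, examples))
```

The model then generates more code from the augmented prompt instead of from the bare comments.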

it('changes the label after the click', () => {
  cy.visit('/')
  // get the initial label text and store it
  cy.get('#foo')
    .invoke('text')
    .as('initialText')

  // click the button
  cy.get('#bar').click()

  // verify the label text has changed
  cy.get('#foo')
    .invoke('text')
    .then((newText) => {
      cy.get('@initialText').then((initialText) => {
        expect(newText).to.not.equal(initialText)
      })
    })
})

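Subtle timing issue when we get the text: it is read once right after the click, and the `expect` inside `.then()` does not retry, so a label that updates with a delay can fail the test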

it('changes the label after the click', () => {
  cy.visit('/')
  // get the initial label text and store it
  cy.get('#foo')
    .invoke('text')
    .as('initialText')

  // click the button
  cy.get('#bar').click()

  // verify the label text has changed
})

Retrieved code example from https://glebbahmutov.com/cypress-examples

cy.get('#output')
  .invoke('text')
  .then((text) => {
    cy.get('#change').click()
    cy.get('#output').should('not.have.text', text)
  })

Augmented Code Generation

it('changes the label after the click', () => {
  cy.visit('/')
  // verify the label text has changed
  cy.get('#foo')
    .invoke('text')
    .then((oldText) => {
      // click the button
      cy.get('#bar').click()
      cy.get('#foo').should('not.have.text', oldText)
    })
})

Training: quality beats quantity

(you still need good examples!)

Simple tasks

Triaging a failed Cypress test at Mercari US

Ask the AI agent to fix it via the Slack interface

👍

Likely to succeed
Easy to review
Tested

Simple

Async

✅ is optional

AI code reviews
Meaningful abstractions

When performing a code review:

- confirm that there are no hard-coded magic numbers.
  Prefer using named constants.
- do not allow unreachable code
- check that each HTML element showing unique application data,
  like prices, values, names, addresses, etc. has a `data-testid`
  attribute to be used in end-to-end tests. If the attribute is missing,
  add a `data-testid` attribute with a meaningful value.
  Also add `data-testid` attributes to the top-level forms, pages,
  and large components.

copilot-instructions.md

AI code reviews

Copilot review can detect page elements without `data-testid` attributes and even suggest good attribute names

Custom "linter" rules

// ANTI-PATTERN: hardcoded wait
cy.wait(45_000)

import { defineConfig } from 'eslint/config'
import pluginCypress from 'eslint-plugin-cypress'
export default defineConfig([
  {
    plugins: {
      cypress: pluginCypress,
    },
    rules: {
      'cypress/no-unnecessary-waiting': 'warn',
    },
  },
])

cypress-io/eslint-plugin-cypress

eslint.config.js

What if I want to warn on waits longer than 30 seconds?!

When performing a code review, if the modified spec file has a `cy.wait(n)` call, suggest replacing it with `cy.wait(seconds(n/1000))`. Also suggest changing it if the duration is longer than 30 seconds.

copilot-instructions.md
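
The instruction relies on a `seconds` helper that the slides do not show. A minimal sketch of what it might look like, assuming it simply converts seconds to the milliseconds `cy.wait` expects. `MAX_WAIT_SECONDS` and `isWaitTooLong` are hypothetical names added to illustrate the 30 second limit.

```javascript
// Hypothetical helper: converts seconds into the milliseconds cy.wait expects.
const seconds = (n) => n * 1000

// Hypothetical guard mirroring the "longer than 30 seconds" review rule.
const MAX_WAIT_SECONDS = 30

function isWaitTooLong(ms) {
  return ms > seconds(MAX_WAIT_SECONDS)
}

console.log(seconds(45)) // 45000
console.log(isWaitTooLong(45_000)) // true
console.log(isWaitTooLong(seconds(10))) // false

// In a spec, cy.wait(45_000) would then read cy.wait(seconds(45))
```

The helper makes the unit explicit in the spec, which also gives the reviewer (human or AI) an obvious number to compare against the limit.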

Meaningful Abstractions

## Use the TodoMVC page object

The preferred way is to use the TodoMVC page object from `cypress/e2e/todomvc.po.js`

```js
import { TodoMVC } from './todomvc.po'
// inside the test or beforeEach hook
TodoMVC.visit()
```

## Reset the backend

A test can reset the backend data to a state with zero todos using the following command

```js
cy.request('POST', '/reset', { todos: [] })
```

## Application loaded

A test can confirm the application has finished loading

```js
cy.get('body.loaded')
```

## Set the backend data

You can set the backend to have specific todos before visiting the app. Let's set 2 todos. Each todo must have an `id`, `title`, and `completed` status.

```js
cy.request('POST', '/reset', {
  todos: [
    { id: '1', title: 'learn testing', completed: false },
    { id: '2', title: 'learn cypress', completed: false },
  ],
})
```

Preferably, use the page object method

```js
import { TodoMVC } from './todomvc.po'
// inside the test or beforeEach hook
TodoMVC.reset([
  { id: '1', title: 'learn testing', completed: false },
  { id: '2', title: 'learn cypress', completed: false },
])
```

copilot-instructions.md
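
The page object file itself is not shown on the slides. A guess at its shape, built only from what the instructions mention: the `visit()` and `reset()` method names, the `/reset` endpoint, and the `body.loaded` check. That `visit()` opens `/` and waits for the loaded class is an assumption.

```javascript
// A guess at the shape of cypress/e2e/todomvc.po.js; only the method names
// visit() and reset() and the /reset endpoint come from the instructions above.
const TodoMVC = {
  // opens the app and waits until it has finished loading its data
  visit() {
    cy.visit('/')
    cy.get('body.loaded')
  },

  // replaces the backend data with the given todos (none by default)
  reset(todos = []) {
    cy.request('POST', '/reset', { todos })
  },
}

// The real file would export the object: export { TodoMVC }
```

With the details hidden behind two method names, both the instructions and the generated tests can stay short.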

Without AI instructions 👎

With AI instructions ✅

Voice prompt

📝 blog post "Good examples" https://glebbahmutov.com/blog/good-examples/

Sept 14, 2014

📝 blog post "Copilot Instructions Example" https://glebbahmutov.com/blog/copilot-instructions-example/

Oct 9, 2025

examples

Replicator

Small simple steps following a plan

"prompt: assemble Millennium Falcon"

Final Thoughts

Codeless AI Test Automation Makes Me Code A Lot

🙏 Thank you 🙏