Testing Your Documentation Search

Gleb Bahmutov

VP of Engineering

Cypress.io 

@bahmutov

our planet is in imminent danger

https://lizkeogh.com/2019/07/02/off-the-charts/

+3 degrees Celsius will be the end.

survival is possible* but we need to act now

  • change your life
  • join an organization

rebellion.global          350.org

Talk Contents

  • The money funnel 

  • Good documentation

  • Documentation search with Algolia

  • Testing the documentation search

    • "blind", using fixture, spying on network

  • Tips & Tricks

The Funnel

aka "You are always losing customers"

funnel of visitors to paying customers

💵

💰

👀

🧑‍💻

🦸

10-20%

What if you converted all first time visitors to paying customers?

You would increase your revenue 5 or 10 times!
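
The arithmetic behind that claim can be sketched as follows. If only 10-20% of first-time visitors end up paying, converting all of them multiplies the number of paying customers by 100 divided by that percentage. The helper name `revenueMultiplier` is made up for illustration, and the sketch assumes revenue grows in proportion to the number of paying customers.

```javascript
// Hypothetical helper: how many times revenue would grow if every
// first-time visitor converted, given the current conversion percentage.
// Assumes revenue is proportional to the number of paying customers.
function revenueMultiplier(conversionPercent) {
  if (conversionPercent <= 0 || conversionPercent > 100) {
    throw new RangeError('conversionPercent must be in (0, 100]')
  }
  return 100 / conversionPercent
}

console.log(revenueMultiplier(20)) // 5
console.log(revenueMultiplier(10)) // 10
```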

Why the dropouts?

Valid reasons:

  • This tool does not do what I need
  • It does not work on my platform

Wrong reasons:

  • This tool does not do what I need (but it does)
  • It does not work on my platform (but it does)

Valid reason: This tool does not run on my CI

Wrong reason: I don't know how to make this tool work on my CI

Valid reason: I cannot afford this tool

Wrong reason: I don't know what value this tool gives me

Valid reason: I am happy with this tool, that's enough for me

Wrong reason: I did not know this tool could do that

Market fit

Features

Bug fixes

Bad UI

Bad docs

Lack of examples

Goal: drive the "bad" area to zero

If you stopped losing users in this area, you could increase your revenue 2 to 5 times!

Features vs Docs

Bad UI

Bad docs

Lack of examples

⚖️

Market fit

Features

Bug fixes

  • So
  • Much
  • Easier

Documentation makes or breaks projects

Improve the Docs: Personas

First time visitor

First time user

Repeat user

Paying customer

Promoter

🔑 Visitor / User should always find the right answer

First time visitor

  • What does this tool / service do?

  • Will it work for me?

First time user

  • How do I do X?

  • How do I do Y?

  • What do I do for error A?

Repeat user

  • How do I do X and Y and Z?

  • What do I do for error A, B, C?

  • How do I run using ...?

  • How do I learn more?

Paying customer

  • same as repeat user +

  • Is there training?

  • I got stuck, is there help?

Promoter

  • When are the new releases?

  • What else can this tool do?

  • Can I suggest a feature?

  • Is there an email list or webinars?

Docs keep growing

Information for every user persona

How to measure how good your documentation is ... and start crying

Is there a way to disable command snapshots?

How to preserve session cookie?

How to check the length of text?

Every question here is a failure of the documentation

Every question here is probably:

  1. a support issue 💵
  2. maybe a lost customer 💸

Documentation miss percentage

(questions that have an answer in your documentation) / (all well-defined questions across chat, support emails, GitHub issues) × 100%

  • < 50%: many things are not documented, so write more docs!
  • ≥ 50%: users cannot find the right documentation, so the docs have a problem
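
As a rough sketch, the percentage can be computed from two counts. This assumes the "miss" percentage is the share of asked questions whose answer already exists in the documentation. The function names and the sample counts are made up for illustration.

```javascript
// Hypothetical helper, not from the talk's slides.
// answeredInDocs: questions whose answer already exists in the documentation
// totalQuestions: all well-defined questions from chat, support emails, GitHub issues
function docsMissPercentage(answeredInDocs, totalQuestions) {
  if (totalQuestions <= 0) {
    throw new RangeError('need at least one question')
  }
  return (answeredInDocs / totalQuestions) * 100
}

// Rough reading of the result, following the two cases on the slide
function diagnose(percentage) {
  return percentage < 50
    ? 'many things are not documented: write more docs'
    : 'users cannot find the right documentation'
}

const pct = docsMissPercentage(30, 40) // made-up counts
console.log(pct) // 75
console.log(diagnose(pct)) // users cannot find the right documentation
```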

You Want Users to Find Answers to Their Questions by Themselves

The documentation demands for different user personas are contradictory

Why Are Good Docs Hard?

First time visitor

First time user

Repeat user

Paying customer

Promoter

💥

"show me Hello, World!"

"show me the changelog diff from version X to Y"

"show me a tutorial"

"show me how to do X"

"how do I solve my issue or bug?"

👀 Optimize documentation structure for beginners

  • "Hello World" example front and center
  • There are 10 beginners for every 1 advanced user. Every user starts as a beginner

Then add the rest of your documentation to the site, reachable from the index

💡 Make a powerful docs search for every persona

Sample App

Why not use the built-in client-side indexing?

Hint: look at the search results for the "about" query

Algolia

  • I am unaffiliated, just a user

  • Generous free plan

  • Good documentation, scraping tools, nice UI

Algolia in action

Create Algolia App

Create Search Index

Workflow

  1. After every deploy 🔼
    1. Scrape the site 🕷
  2. Set search widget to point at the index using Algolia SDKs

API Keys

to add records to the index

Keep Admin API key private!

API Keys

to search index from site

(public)

Scraping the Site

Document Structure

{
  "index_name": "scrape-test",
  "start_urls": ["https://glebbahmutov.com/triple-tested/"],
  "selectors": {
    "lvl0": {
      "selector": ".site-name",
      "global": true
    },
    "lvl1": ".content__default h1",
    "lvl2": ".content__default h2",
    "lvl3": ".content__default h3",
    "lvl4": ".content__default h4",
    "lvl5": ".content__default h5",
    "text": ".content__default p, .content__default li"
  }
}

Algolia config (JSON)

Scraping

# when scraping the site, inject secrets as environment variables
# then pass their values into the Docker container using "-e" syntax
# and inject config.json contents as another variable
- name: scrape the site 🧽
  env:
    APPLICATION_ID: ${{ secrets.APPLICATION_ID }}
    API_KEY: ${{ secrets.API_KEY }}
  run: |
    docker run \
    -e APPLICATION_ID -e API_KEY \
    -e CONFIG="$(cat config.json)" \
    algolia/docsearch-scraper:v1.6.0

use Algolia Docker image

Larger sites: Multiple roots

blog

examples

{
  "index_name": "cypress",
  "start_urls": [
    {
      "url": "https://docs.cypress.io/",
      "page_rank": 10
    },
    {
      "url": "https://example.cypress.io/",
      "selectors_key": "kitchensink",
      "tags": ["example"],
      "page_rank": 2
    },
    {
      "url": "https://www.cypress.io/blog/",
      "selectors_key": "blog",
      "tags": ["blog post"],
      "page_rank": 1
    }
  ],
  "stop_urls": [
    "^https://docs.cypress.io/ja/",
    "^https://docs.cypress.io/zh-cn/",
    "^https://docs.cypress.io/pt-br/",
    "^https://docs.cypress.io/ru/"
  ],
  "selectors_exclude": [],
  "selectors": {
    "default": {
      "lvl0": "article h1.article-title",
      "lvl1": "article h1.article-heading",
      "lvl2": "article h2.article-heading",
      "text": "article .article-content p, article .article-content tr, article .article-content li, article .article-content pre"
    },
    "blog": {
      "lvl0": "article h1",
      "lvl1": "article h2",
      "text": "article p, article pre"
    },
    "kitchensink": {
      "lvl0": ".container h1",
      "lvl1": ".container h4",
      "text": ".container p, .container pre"
    }
  },
  "nb_hits": 0,
  "min_indexed_level": 2
}

Algolia config (JSON)
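
To see which pages a `stop_urls` list like the one above would skip, here is a small sketch. It assumes each entry is treated as a regular expression tested against the page URL. The helper name `isStopped` and the sample URLs are made up for illustration, and this is not the scraper's own code.

```javascript
// Patterns copied from the config above
const stopUrls = [
  '^https://docs.cypress.io/ja/',
  '^https://docs.cypress.io/zh-cn/',
  '^https://docs.cypress.io/pt-br/',
  '^https://docs.cypress.io/ru/'
]

// Hypothetical helper: true if any pattern matches the URL
function isStopped(url, patterns = stopUrls) {
  return patterns.some((pattern) => new RegExp(pattern).test(url))
}

console.log(isStopped('https://docs.cypress.io/ja/guides/overview')) // true
console.log(isStopped('https://docs.cypress.io/guides/overview')) // false
```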

Scraping takes about 2 minutes and creates ~21k records

<!-- all parameters are optional -->
<ais-search-box
  placeholder="string"
  submitTitle="string"
  resetTitle="string"
  [searchAsYouType]="boolean"
  [autofocus]="boolean"
></ais-search-box>

ais-search-box

UI

  1. Deploy 🏗

  2. Scrape 🕷

  3. Test 🔎

Let's test the search

describe('Angular Doc Search', () => {
  it('shows native results', () => {
    cy.visit('/', {
      onBeforeLoad(win) {
        // the ServiceWorker interferes with the page load, so remove it
        delete win.navigator.__proto__.serviceWorker
      }
    })
    // delay each keystroke for the demo
    cy.get('input[aria-label=search]').type('testing', {delay: 70})

    // expect six columns of search results
    cy.get('.search-section-header').should('have.length', 6)
    // the "cli" column should list at least three pages
    cy.contains('.search-section-header', 'cli')
      .parent('.search-area')
      .find('.search-page').should('have.length.gte', 3)
  })
})

a "blind" test guesses the search term and results

Let's Dig Deeper 

it('shows single search result', () => {
  // https://on.cypress.io/intercept
  cy.intercept('/search-data.json', { fixture: 'single-result.json' })
  cy.visit('/', {
    onBeforeLoad(win) {
      // ServiceWorker messes up with the page load
      delete win.navigator.__proto__.serviceWorker
    }
  })
  // delay each keystroke for the demo
  cy.get('input[aria-label=search]').type('testing', {delay: 70})
})

[{
  "headingWords": "testing",
  "keywords": "testing unit component e2e",
  "path": "cli/test",
  "title": "Testing is fun",
  "titleWords": "testing is fun",
  "type": "content"
}]

cypress/fixtures/single-result.json

cy.get('input[aria-label=search]')
  .type('testing', { delay: 70 })

cy.contains('.search-section-header', 'cli')
  .parent('.search-area')
  .contains('.search-page', 'Testing is fun')
  .click()

cy.location('pathname').should('equal', '/cli/test')

name: ci
on: [push]
jobs:
  cypress-run:
    runs-on: ubuntu-20.04
    steps:
      - name: Checkout
        uses: actions/checkout@v1
      # Install NPM dependencies, cache them correctly
      # and run all Cypress tests
      - name: Cypress run
        uses: cypress-io/github-action@v2

Continuous Integration Break

🎉 Break is Over 🎊

cy.get('input[aria-label=search]')
  .type('testing', { delay: 70 })

cy.contains('.search-section-header', 'cli')
  .parent('.search-area')
  .contains('.search-page', 'Testing is fun')
  .click()
  
cy.location('pathname')
  .should('equal', '/cli/test')

Hardcoded test data

The same data is in the fixture JSON file ...

import singleResult from '../fixtures/single-result.json'

cy.intercept('/search-data.json', singleResult)
cy.visit('/')

const {headingWords, title, path} = singleResult[0]
cy.get('input[aria-label=search]')
  .type(headingWords, { delay: 70 })

cy.get('.search-section-header')
  .parent('.search-area')
  .contains('.search-page', title)
  .click()

cy.location('pathname')
  .should('equal', '/' + path)

Load test data from a fixture file & intercept too

If test data comes from a fixture ...

... then we can use the real data from the network call

// spy on the network call without stubbing it
cy.intercept('/search-data.json').as('search')
cy.visit('/')
cy.wait('@search').its('response.body')

spy on network call

cy.wait('@search').its('response.body')
  .then(list => {
    // pick a known record from the real search data
    return Cypress._.find(list, { title: 'Accessibility in Angular' })
  })
  .then(result => {
    expect(result).to.be.an('object')
    const { headingWords, title, path } = result
    // search for the first heading word of that record
    const search = headingWords.split(' ')[0]
    // delay each keystroke for the demo
    cy.get('input[aria-label=search]')
      .type(search, { delay: 70 })

    cy.contains('.search-page a', title).click()
    cy.location('pathname')
      .should('equal', '/' + path)
  })
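
The data handling in that test can be tried outside of Cypress. The sketch below uses the record shape from cypress/fixtures/single-result.json and plain `Array.prototype.find` in place of `Cypress._.find`. The helper name `searchExpectations` is made up for illustration.

```javascript
// Record shape taken from cypress/fixtures/single-result.json
const sample = [{
  headingWords: 'testing',
  keywords: 'testing unit component e2e',
  path: 'cli/test',
  title: 'Testing is fun',
  titleWords: 'testing is fun',
  type: 'content'
}]

// Hypothetical helper: find a record by title, then derive the search term
// to type, the link text to click, and the pathname to expect afterwards
function searchExpectations(list, title) {
  const record = list.find((item) => item.title === title)
  if (!record) {
    throw new Error(`no search record titled "${title}"`)
  }
  return {
    search: record.headingWords.split(' ')[0],
    title: record.title,
    pathname: '/' + record.path
  }
}

const expected = searchExpectations(sample, 'Testing is fun')
console.log(expected.search) // testing
console.log(expected.pathname) // /cli/test
```

Failing early when the record is missing keeps the later assertions from passing or failing for the wrong reason.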

code comments are not indexed 😟

💡 Include code comments in the index

{
  "index_name": "cypress-examples",
  "start_urls": ["https://glebbahmutov.com/cypress-examples/"],
  "selectors": {
    "lvl0": {
      "selector": ".site-name",
      "global": true
    },
    "lvl1": ".content__default h1",
    "lvl2": ".content__default h2",
    "lvl3": ".content__default h3",
    "lvl4": ".content__default h4",
    "lvl5": ".content__default h5",
    "text": ".content__default p, .content__default li, .content__default pre .comment"
  }
}

algolia-config.json

scrape paragraphs AND list items AND code comments

Tip: use $$(selector) in DevTools

💡 Look at searches with no results

⚠️ free Algolia plans only get weekly emails with such queries

write docs for these searches!
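
One way to prioritize which docs to write, sketched under assumptions: the `{ query, nbHits }` entry shape and the sample queries below are hypothetical, standing in for whatever your own search logging records, and are not an Algolia export format.

```javascript
// Hypothetical log entries: the { query, nbHits } shape is an assumption
// about your own search logging, and the queries are made up
const searchLog = [
  { query: 'snapshots', nbHits: 12 },
  { query: 'disable video', nbHits: 0 },
  { query: 'session cookie', nbHits: 0 },
  { query: 'disable video', nbHits: 0 }
]

// Count how often each query returned nothing, most frequent first
function queriesWithoutResults(log) {
  const counts = new Map()
  for (const { query, nbHits } of log) {
    if (nbHits === 0) {
      counts.set(query, (counts.get(query) || 0) + 1)
    }
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([query, count]) => ({ query, count }))
}

console.log(queriesWithoutResults(searchLog))
```

The queries at the top of the list are the ones users look for most often without finding anything.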

💡 Search from CLI

Final Thoughts

Good documentation is essential to your project's success

Creating an ideal documentation structure is hard and might be impossible

Creating a good enough text search using Algolia is pretty simple 🎉

Do not skip testing the search 🔎

👏 Thank you 👏

Good documentation with powerful search is key to a project's success with users. I will show how to configure Algolia search to scrape your site, and how to test the search using the Cypress.io test runner. Presented at AngularUp in Nov 2020. Video at https://www.youtube.com/watch?v=cqhV8UbT5LQ
