CONTINUOUS INTEGRATION

Software Engineering Lab.
Spring 2017

definition

  • Continuous Integration (CI) is a development practice that requires developers to integrate code into a shared repository several times a day. Each check-in is then verified by an automated build, allowing teams to detect problems early.
  • It is NOT 'yet another testing method' (not to be mistaken for Integration Testing)
    • It defines when and how to run our test cases

 

what we'll do

  1. TL;DR: talk through the basics and answer some fundamental questions: 
    • What exactly is it? 
    • Why should it be considered? 
    • For what cost? Is it worth it? 
    • How should it be applied in practice? 
  2. Example tools:
    1. TravisCI 
    2. Jenkins
  3. Recap by looking at the advantages and disadvantages

The basics

the holy integration process

  • Integration Testing:
    Ensuring that the different components/services that make up an artifact cooperate with each other without errors
  • Continuous Integration:
    Ensuring that a new piece of code will not break anything after it is merged with what is already stable
    • Continuous Integration will usually involve both unit and integration testing

  • An inevitable part of software development
    • Even if you are developing by yourself
  • Long, Frustrating, Redundant
  • Reason: a large number of changes accumulate while waiting to be merged into the main codebase
  • Solution: what we intuitively do against accumulation
    • Instead of postponing a ponderous task, on the false pretext that it is 'for the greater good', do it so frequently that it becomes a routine event

The basics

an example

  • Let's assume I have to change a piece of software, and that the change is small and can be done in a few hours
  1. Clone the stable version from source control
  2. Do whatever needs to be done. Should include:
    • Changes in source code
    • Changes in Test Cases
  3. Build and Test locally
  4. Pull the latest version from the mainline stream (why?)
  5. Rebuild and Re-test
  6. Fix the conflicts if step 5 fails

  7. Push changes to the Integration Machine to be built and tested again
  8. Wait for test results
  9. Fix the conflicts if step 8 fails
  10. It's Friday. Go home.

This example might seem both naive and overcomplicated.

Let's look at some practices used in adopting this testing habit.

practice #1: Maintain a Single Source Repository

  • EVERYTHING must be included in the core source code repository
    • "The basic rule of thumb is that you should be able to walk up to the project with a virgin machine, do a checkout, and be able to fully build the system" -- Martin Fowler
    • Even IDE configurations are recommended
    • Build scripts are required
      • e.g. Makefiles
    • Committing the build output is not recommended
      • Note that third-party libraries also need their own build steps
  • An absolute must-have for maintaining stable/staging versions of the source code

practice #2: Automate the Build

  • Extending the previous quote: 
    • "The basic rule of thumb is that you should be able to walk up to the project with a virgin machine, do a checkout, and be able to fully build the system with a single command" -- Martin Fowler
  • One of the initial goals was speed, and automation helps with that.
  • IDE build tools should not be relied on for the main build and tests.
  • Long builds should be avoided
    • Ideally, a good build tool works out what has changed and recompiles only that.

practice #3: Make Your Build Self-Testing

  • Traditionally a build means compiling, linking, and all the additional stuff required to get a program to execute. A program may run, but that doesn't mean it does the right thing.
  • A good way to catch bugs more quickly and efficiently is to include automated tests in the build process.

  • The rise of TDD and XP has had a great impact on what's called Self-Testing code/build

    • They both emphasize writing tests before code

    • We have weaker requirements for Self-Testing code:

      • Good coverage - Simple to run - Embeddable in build process

  • Of course you can't count on tests to find everything
    • "Imperfect tests, run frequently, are much better than perfect tests that are never written at all"
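
The ideas behind practices #2 and #3 can be sketched as a single entry point that runs the build steps and then the tests, and fails as soon as any step fails. This is only a minimal sketch in Python: `run_build` and the step names are hypothetical, and the commands would be whatever the project actually uses.

```python
import subprocess
import sys


def run_build(steps):
    """Run each (name, command) step in order and stop at the first failure.

    Returns (True, None) when every command exits with status 0,
    otherwise (False, name_of_the_failed_step).
    """
    for name, command in steps:
        result = subprocess.run(command)
        if result.returncode != 0:
            return False, name
    return True, None


# Hypothetical steps: a "compile" step followed by a "test" step.
# Each command here is a trivial Python one-liner standing in for a real tool.
EXAMPLE_STEPS = [
    ("compile", [sys.executable, "-c", "pass"]),
    ("test", [sys.executable, "-c", "assert 1 + 1 == 2"]),
]
```

Calling `run_build(EXAMPLE_STEPS)` plays the role of the "single command": the build only counts as successful if the tests pass as well.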

practice #4: Everyone Commits To the Mainline Every Day

  • Integration is primarily about communication and responsibility

  • Integration allows developers to tell other developers about the changes they have made. Frequent communication allows people to know quickly as changes develop.

  • Keeping the time between integrations short makes bugs easier to find, because each change is small and localized

    • "The key to fixing problems quickly is finding them quickly"

  • The more frequently you commit, the fewer places you have to look for conflict errors, and the more rapidly you fix conflicts.

  • Frequent commits encourage developers to break down their work into small chunks of a few hours each

practice #5: Every Commit Should Build the Mainline on an Integration Machine

  • Using daily commits, a team gets frequent tested builds. This ought to mean that the mainline stays in a healthy state. In practice, however, things still do go wrong.

    • An untested commit

    • Development machine variation

  • As a result, each commit should be tested on a separate machine (a.k.a. the Continuous Integration server)

  • A CI server monitors the mainline source code

    • After every commit: check out the new version, build, test, and notify the developer

    • Scheduled builds and tests
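
The monitoring behaviour described above can be sketched as one pass of a loop. This is an illustrative sketch, not how any particular CI server is implemented: `ci_cycle`, `build_and_test` and `notify` are hypothetical names, and revisions are plain strings.

```python
def ci_cycle(last_built, head, build_and_test, notify):
    """One monitoring pass over the mainline.

    If the mainline head has not been built yet, build and test it and
    notify the developer of the result. Returns the revision that is now
    considered built.
    """
    if head == last_built:
        return last_built  # nothing new on the mainline
    passed = build_and_test(head)  # check out, build and test the new revision
    notify(head, passed)  # report the result, pass or fail
    return head
```

A real server would call something like this after every push, or on a schedule for the scheduled builds.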

practice #6: Fix Broken Builds Immediately

  • The mainline is the Holy Grail of the development focus

  • If a commit breaks it, it should be fixed immediately

    • Most of the time, due to the short gap between current commit and the last one, the reason is obvious

    • The fix may be to revert the commit from the mainline and give it a second thought.

  • "nobody has a higher priority task than fixing the build" -- Kent Beck

    • This does not mean that everyone must stop what they are doing to fix the build

    • It further encourages strong communication and collaboration for a unified goal

practice #7: Keep the Build Fast

  • The whole point of Continuous Integration is to provide rapid feedback

    • A build time of at most 10 minutes is within reason

  • Most small projects will build within minutes, but not enterprise applications

    • End to End testing

    • Service Discovery Scenarios

    • DB testing: inserting and modifying millions of records

    • Load Testing: benchmarking server's response time under a huge load for a long time

  • An accepted solution is to have multiple build stages 

  • Stage 1:

    • A commit is tested against fast unit tests

    • The mainline is updated for other developers, although the end product might not be yet. The developer can go home afterwards

  • Stage 2:

    • Long tests will run later, perhaps in parallel

    • Bugs caught in stage n should be turned into small, fast tests and moved into an earlier stage m, where m < n

    • Tests must progress/improve over time
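
The staged build can be sketched as a list of stages that run in order, where a failure stops the pipeline at that stage. This is a minimal sketch with hypothetical names (`run_stages`, the stage names) and with tests modelled as zero-argument functions returning True or False.

```python
def run_stages(stages):
    """Run stages in order; each stage is (name, list of test callables).

    Stops at the first stage that contains a failing test and returns its
    name, or returns None when every stage passes.
    """
    for name, tests in stages:
        if not all(test() for test in tests):
            return name
    return None


# Stage 1 holds fast unit tests, stage 2 holds the long-running tests.
fast_tests = [lambda: True]
slow_tests = [lambda: True]
pipeline = [("stage 1", fast_tests), ("stage 2", slow_tests)]
```

Moving a bug from stage n to stage m then amounts to appending a new fast test to the earlier list, so the same bug is caught sooner on the next commit.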

practice #8: Test in a Clone of the Production Environment

  • The CI server was introduced for two main reasons: 

    • Avoid dependency on development environment

    • 24/7 monitoring over the mainline

  • Taking this further means trying to duplicate everything from the production machine.

  • This might not always be possible. Mimicking every single parameter of the production environment is time-consuming (see practice #7)

    • Nowadays, most CI servers do this to some degree

practice #9: Everyone can see what's happening

  • One of the most important things to communicate is the state of the mainline build

  • Most open source projects show several status badges on GitHub

  • Some companies that use internal source control and CI servers change the ambience of their room based on build status.

  • Involving everyone encourages them to stay engaged with the project (simple yet effective gamification)

continuous integration tools

jenkins

travis ci

ci tools

travis ci

  • Minimal
  • Simple configuration
  • Broad language support
  • Sufficient reporting
  • Used for most independent and open source projects
  • Unlimited and free for public repositories
  • Supports ONLY GitHub

ci tools

Jenkins

  • Comprehensive
    • It comes at a price: a steeper learning curve
  • Many plugins and reporters: 
    • Code quality
    • Code style
  • Self-Contained executable
    • You need a private server

ci tools: travis ci

build pipeline: node-js

  • Everything starts with a very simple .travis.yml file
language: node_js
node_js:
  - "7"
  • This file must be added to the root of the repository

  • Each build has two phases
    • install: installs dependencies; defaults to the language's setup command (npm install)
    • script: runs the tests; defaults to the language's test command (npm test)
  • Both can be overridden: 

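language: node_js
node_js:
  - "7"

# install: a single command ...
install: ./install-dependencies.sh
# ... or a list of commands (use one form, not both):
# install:
#   - bundle install --path vendor/bundle
#   - npm install

# script: a single command ...
script: ./custom-test.sh
# ... or a list of commands (use one form, not both):
# script:
#   - mytest --run
#   - npm test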

ci tools: travis ci

build pipeline: node-js

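Hooks can be added around the install and script phases of the build process:

install: ./foo.sh

before_script:
  # install Redis and start it in the background so the build can continue
  - sudo apt-get install -y redis-server
  - redis-server --daemonize yes

after_success:
  - ./yoohoo.sh
after_failure:
  - ./revert-all.sh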

ci tools: travis ci

build pipeline: node-js

before_deploy: ./clean-up.sh

deploy:
  provider: npm

after_deploy: ./update-doc.sh

Optional deployments can be added using deployment providers (e.g. npm)

ci tools: travis ci

build pipeline: node-js

before_install:
  - sudo apt-get update -qq
  - sudo apt-get install -qq [packages list]

Packages can (and should) be installed using one of the hooks, depending on the type of package

ci tools: travis ci

build pipeline: target branch

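# blocklist: build every branch except these
branches:
  except:
  - legacy
  - experimental

# safelist: build only these branches (an alternative to the blocklist above)
branches:
  only:
  - master
  - stable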
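
The effect of the two lists can be sketched as a small decision function. This is a sketch under stated assumptions, not Travis CI's implementation: `branch_is_built` is a hypothetical helper, it only handles exact branch names (pattern matching is ignored), and it assumes the safelist wins when both lists are given.

```python
def branch_is_built(branch, only=None, except_=None):
    """Decide whether a push to `branch` should trigger a build.

    only    -- safelist: when given, only these branches are built
    except_ -- blocklist: when given, these branches are skipped
    Assumption: when both are given, the safelist takes precedence.
    """
    if only:
        return branch in only
    if except_:
        return branch not in except_
    return True  # no restriction configured: build every branch
```

For example, `branch_is_built("legacy", except_=["legacy", "experimental"])` is False, while `branch_is_built("master", only=["master", "stable"])` is True.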

ci tools: travis ci

build pipeline: skipping a commit

git commit -am "this will be ignored [ci skip]"
git commit -am "also this [skip ci]"
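
The skip rule can be sketched as a check on the commit message. This is only an illustration of the two markers shown above: `should_skip_build` is a hypothetical helper, and matching the markers as exact lowercase substrings anywhere in the message is an assumption of this sketch.

```python
SKIP_MARKERS = ("[ci skip]", "[skip ci]")


def should_skip_build(commit_message):
    """Return True if the commit message contains one of the skip markers."""
    return any(marker in commit_message for marker in SKIP_MARKERS)
```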

ci tools: travis ci

build pipeline: Build matrix

language: ruby
rvm:
- 1.9.3
- 2.0.0
- 2.1.0
env:
- DB=mongodb
- DB=redis
- DB=mysql
gemfile:
- Gemfile
- gemfiles/rails4.gemfile
- gemfiles/rails31.gemfile
- gemfiles/rails32.gemfile

ci tools: travis ci

build pipeline: Build matrix

  • The previous .travis.yml file expands to 3 * 3 * 4 = 36 tasks
  • The matrix can be trimmed further with exclusions
matrix:
  exclude:
  - rvm: 2.0.0
    gemfile: Gemfile
# which is equivalent to:
matrix:
  exclude:
  - rvm: 2.0.0
    gemfile: Gemfile
    env: DB=mongodb
  - rvm: 2.0.0
    gemfile: Gemfile
    env: DB=redis
  - rvm: 2.0.0
    gemfile: Gemfile
    env: DB=mysql
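
The arithmetic behind the matrix can be checked with a short sketch that expands the three axes and applies the exclusion rules. `expand_matrix` is a hypothetical helper written for this illustration, not part of Travis CI; a rule excludes every task whose values match all the keys the rule lists.

```python
from itertools import product

AXES = {
    "rvm": ["1.9.3", "2.0.0", "2.1.0"],
    "env": ["DB=mongodb", "DB=redis", "DB=mysql"],
    "gemfile": [
        "Gemfile",
        "gemfiles/rails4.gemfile",
        "gemfiles/rails31.gemfile",
        "gemfiles/rails32.gemfile",
    ],
}


def expand_matrix(axes, exclude=()):
    """Return every combination of axis values, minus the excluded ones."""
    keys = list(axes)
    tasks = [dict(zip(keys, combo)) for combo in product(*(axes[k] for k in keys))]

    def is_excluded(task):
        # A rule matches when all of its key/value pairs match the task.
        return any(all(task[k] == v for k, v in rule.items()) for rule in exclude)

    return [task for task in tasks if not is_excluded(task)]


short_rule = [{"rvm": "2.0.0", "gemfile": "Gemfile"}]
long_rules = [
    {"rvm": "2.0.0", "gemfile": "Gemfile", "env": env} for env in AXES["env"]
]
```

With no exclusions the matrix has 3 * 3 * 4 = 36 tasks. The short rule removes the 3 tasks that combine rvm 2.0.0 with Gemfile (one per env value), leaving 33, and the longer three-rule form removes exactly the same tasks.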

ci tools: travis ci

example

  • Start with an existing project
  • Create a new branch
  • Add a commit -> Test locally 
  • Push to new branch 
  • Observe build status online
  • Create a pull request

ci tools: travis ci

additional

  • Many features were omitted: 
  • Docker support 
  • GUI testing: Sauce Labs
  • Cron Jobs
  • 1 Build -> n Tasks
  • Pull request integration
  • Badges
  • Command line interface

recap: CI benefits

  • The trouble with deferred integration is that it's very hard to predict how long it will take to do, and worse it's very hard to see how far you are through the process (complete blind spot). 
    • With CI there is no long integration phase, so the blind spot is completely eliminated.
  • Continuous Integration doesn't get rid of bugs, but it does make them dramatically easier to find and remove.
  • As a result, projects with Continuous Integration tend to have dramatically fewer bugs, both in production and in process

  • CI removes one of the biggest barriers to frequent deployment.
  • Significant effect on team efficiency.

CI - Software Engineering Lab

By Kian Peymani
