CONTINUOUS

INTEGRATION

Software Engineering Lab.
Spring 2017

definition

Continuous Integration (CI) is a development practice that requires developers to integrate code into a shared repository several times a day. Each check-in is then verified by an automated build, allowing teams to detect problems early.
It is NOT 'yet another testing method' (not to be confused with Integration Testing)
- It defines: when/how to run our test cases

what we'll do

TL;DR: talk through the basics and answer some fundamental questions:
- What exactly is it?
- Why should it be considered?
- For what cost? Is it worth it?
- How should it be applied in practice?
Example tools:
1. TravisCI
2. Jenkins
Recap by looking at the advantages and disadvantages

The basics

the holy integration process

Integration Testing:
Assuring that the different components/services that make up an artifact cooperate with each other without errors
Continuous Integration:
Assuring that a new piece of code will not break anything once it is merged with what's already stabilized
- Integration will probably include both Unit and Integration testing

The basics

the holy integration process

An inevitable part of software development
- Even if you are developing by yourself
Long, Frustrating, Redundant
Reason: A large number of codebase changes accumulate, waiting to be merged into the original codebase
Solution: what we intuitively do against accumulation
- Instead of postponing a ponderous task to the future, on the mistaken pretext that it is 'for the greater good', do it so frequently that it becomes a normal event

The basics

an example

Let's assume I have to do something to a piece of software. Assume it's small and can be done in a few hours.

1. Get a clone of the stable version from source control
2. Do whatever needs to be done. Should include:
- Changes in source code
- Changes in Test Cases
3. Build and Test locally
4. Pull the latest version from the mainline stream (why?)
5. Rebuild and Re-test
6. Fix the conflicts if step 5 fails

The basics

an example

7. Push the changes to the Integration Machine to be built and tested again

8. Wait for test results

9. Fix the conflicts if step 8 fails

10. It's Friday. Go home.

This example might seem both naive and overcomplicated.

Let's look at some practices used in adopting this testing habit.

practice #1: Maintain a Single Source Repository

EVERYTHING must be included in the core source code repository
- "The basic rule of thumb is that you should be able to walk up to the project with a virgin machine, do a checkout, and be able to fully build the system" -- Martin Fowler
- Including even IDE configurations is recommended
- Build scripts are compulsory
  - Makefiles
- Committing the build result is not recommended
  - Note that third-party libraries also require build processes
An absolute must-have for maintaining stable/staging versions of the source code

practice #2: Automate the Build

Elaborating on the last quote:
- "The basic rule of thumb is that you should be able to walk up to the project with a virgin machine, do a checkout, and be able to fully build the system with a single command" -- Martin Fowler
An initial goal was speed, and automation certainly helps with that.
IDE build tools should be avoided in the main tests.
Long builds should be avoided
- Ideally, a good build tool analyzes what has changed as part of the process and only recompiles those parts.
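
The "only compile what changed" idea can be sketched as follows. This is a minimal illustration, assuming a make-style comparison of modification times; the function name, the rule layout, and the file names are all hypothetical, and real build tools track dependencies in far more detail.

```python
def stale_targets(rules, mtimes):
    """Return the targets that must be rebuilt, in the order the rules are given.

    rules  : dict mapping target name -> list of source names (hypothetical layout)
    mtimes : dict mapping file name -> last-modified timestamp; a missing
             entry means the file does not exist yet
    """
    stale = []
    for target, sources in rules.items():
        built_at = mtimes.get(target)
        # Rebuild when the target is missing or any source is newer than it.
        if built_at is None or any(mtimes[src] > built_at for src in sources):
            stale.append(target)
    return stale


rules = {"parser.o": ["parser.c"], "lexer.o": ["lexer.c"], "cli.o": ["cli.c"]}
mtimes = {
    "parser.c": 100, "parser.o": 150,   # up to date
    "lexer.c": 200, "lexer.o": 150,     # source edited after the last build
    "cli.c": 120,                       # never built
}
print(stale_targets(rules, mtimes))     # ['lexer.o', 'cli.o']
```

Only two of the three targets are rebuilt, which is what keeps the build short as the project grows.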

practice #3: Make Your Build Self-Testing

Traditionally a build means compiling, linking, and all the additional stuff required to get a program to execute. A program may run, but that doesn't mean it does the right thing.
A good way to catch bugs more quickly and efficiently is to include automated tests in the build process.
The rise of TDD and XP has had a great impact on what's called Self-Testing code/build
- They both emphasize: writing tests before code
- We have weaker requirements for Self-Testing code:
  - Good coverage
  - Simple to run
  - Embeddable in the build process
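
A self-testing build can be as small as the sketch below. It assumes Python's standard unittest module; apply_discount, DiscountTest, and build are hypothetical names chosen for illustration, not part of any particular project.

```python
import unittest


def apply_discount(price, percent):
    """Code under test (a hypothetical example function)."""
    return price * (100 - percent) / 100


class DiscountTest(unittest.TestCase):
    def test_ten_percent(self):
        self.assertEqual(apply_discount(200, 10), 180)

    def test_zero_percent(self):
        self.assertEqual(apply_discount(50, 0), 50)


def build():
    """Run the tests as part of the build; return 0 on success, 1 on failure."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DiscountTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return 0 if result.wasSuccessful() else 1


exit_code = build()
```

In a real build script the return value would typically be passed to sys.exit, so that a failing test makes the whole build fail.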

practice #3: Make Your Build Self-Testing

Of course you can't count on tests to find everything
- "Imperfect tests, run frequently, are much better than perfect tests that are never written at all"

practice #4: Everyone Commits To the Mainline Every Day

Integration is primarily about communication and responsibility
Integration allows developers to tell other developers about the changes they have made. Frequent communication allows people to know quickly as changes develop.
By keeping the time between changes short, bugs are easier to find, because the changes are not widespread
- "The key to fixing problems quickly is finding them quickly"
The more frequently you commit, the fewer places you have to look for conflict errors, and the more rapidly you fix conflicts.
Frequent commits encourage developers to break down their work into small chunks of a few hours each

practice #5: Every Commit Should Build the Mainline on an Integration Machine

Using daily commits, a team gets frequent tested builds. This ought to mean that the mainline stays in a healthy state. In practice, however, things still do go wrong.
- An untested commit
- Development machine variation
As a result, each commit should be tested on a separate machine (a.k.a. a Continuous Integration Server)
A CI server monitors the mainline source code
- Checks out a new version after every commit, builds it, tests it, and notifies the developer
- Scheduled builds and tests
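
The checkout/build/test/notify loop can be sketched as a single polling cycle. This is an illustration only, assuming the build, test, and notify steps are passed in as plain functions; ci_cycle and the commit ids are hypothetical, and real CI servers are usually triggered by repository hooks rather than polling.

```python
def ci_cycle(last_built, head, build, test, notify):
    """One polling cycle of a hypothetical CI server.

    last_built : id of the last commit that was built
    head       : id of the newest commit on the mainline
    build/test : callables taking a commit id and returning True on success
    notify     : callable taking (commit id, status string)
    Returns the commit id that is now considered built.
    """
    if head == last_built:
        return last_built            # nothing new on the mainline
    # `and` short-circuits: tests only run if the build succeeded.
    ok = build(head) and test(head)
    notify(head, "passed" if ok else "failed")
    return head


log = []
state = "a1"
for head in ["a1", "b2", "c3"]:
    state = ci_cycle(
        state, head,
        build=lambda commit: True,
        test=lambda commit: commit != "c3",   # pretend commit c3 breaks a test
        notify=lambda commit, status: log.append((commit, status)),
    )
print(log)   # [('b2', 'passed'), ('c3', 'failed')]
```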

practice #6: Fix Broken Builds Immediately

The mainline is the Holy Grail of development
If a commit breaks it, it should be fixed immediately
- Most of the time, due to the short gap between current commit and the last one, the reason is obvious
- Might lead to reverting the mainline and giving the commit a second thought.
"nobody has a higher priority task than fixing the build" -- Kent Beck
- Not everyone needs to stop doing what they do and struggle to fix the build
- It further encourages strong communication and collaboration for a unified goal

practice #7: Keep the Build Fast

The whole point of Continuous Integration is to provide rapid feedback
- A build time of at most 10 minutes is within reason
Most small projects will build within minutes, but not enterprise applications
- End to End testing
- Service Discovery Scenarios
- DB testing: inserting and modifying millions of records
- Load Testing: benchmarking server's response time under a huge load for a long time
An accepted solution is to have multiple build stages

practice #7: Keep the Build Fast

Stage 1:
- A commit will be tested against fast Unit Tests
- The mainline is updated for other developers, although the end product might not be. The developer can go home afterwards
Stage 2:
- Long tests will run later, perhaps in parallel
- Bugs caught by stage n should be turned into small, fast tests and migrated into stage m, where m < n
- Tests must progress/improve through time
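
The staged build can be sketched as a pipeline that stops at the first failing stage. This is a minimal illustration, assuming each test is a zero-argument function returning True or False; run_pipeline and the example checks are hypothetical.

```python
def run_pipeline(stages):
    """Run stages in order; stop at the first stage with a failing check.

    stages : list of (stage name, list of zero-argument callables returning bool)
    Returns (names of the stages that passed, name of the failing stage or None).
    """
    passed = []
    for name, checks in stages:
        if not all(check() for check in checks):
            return passed, name
        passed.append(name)
    return passed, None


fast_unit_tests = [lambda: 1 + 1 == 2, lambda: "ci".upper() == "CI"]
slow_tests = [lambda: sum(range(1_000_000)) == 499_999_500_000]

result = run_pipeline([("stage 1", fast_unit_tests), ("stage 2", slow_tests)])
print(result)   # (['stage 1', 'stage 2'], None)
```

Because stage 1 runs first and is cheap, a broken commit is usually reported long before the slow tests would have finished.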

practice #8: Test in a Clone of the Production Environment

The CI server was introduced primarily for two reasons:
- Avoid dependency on development environment
- 24/7 monitoring over the mainline
Taking this further leads to striving to duplicate everything from the production machine.
This might not always be possible: mimicking every single parameter of the production environment is time-consuming (as discussed in #7)
- Nowadays, most CI servers do this up to a certain degree

practice #9: Everyone can see what's happening

One of the most important things to communicate is the state of the mainline build
Most open source projects have multiple GitHub badges
Some companies that use internal source control and CI servers change the ambience of their room based on build status.
Making everyone involved encourages them to stay involved with the project (simple, yet effective Gamification)

continuous integration tools

jenkins

travis ci

ci tools

travis ci

Minimal
Simple configuration
Broad language support
Sufficient reporting
Used for most independent and open source projects
Unlimited and free for public repositories
Supports ONLY GitHub

ci tools

Jenkins

Comprehensive
- It comes at a price: Learning curve
Many plugins and reporters:
- Code quality
- Code style
Self-Contained executable
- You need a private server

ci tools: travis ci

build pipeline: node-js

Everything starts with a very simple .travis.yml file

language: node_js
node_js:
  - "7"

This file must be added to the root of the source code

ci tools: travis ci

build pipeline: node-js

Each build has two phases
- install: the default setup script of the language (npm install)
- script: the default test script of the language (npm test)
Both can be overridden:

ci tools: travis ci

build pipeline: node-js

language: node_js
node_js:
  - "7"

install: ./install-dependencies.sh
# or
install:
  - bundle install --path vendor/bundle
  - npm install

script: ./custom-test.sh
# or
script: 
  - mytest --run 
  - npm test

ci tools: travis ci

build pipeline: node-js

install: ./foo.sh

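before_script:
  # install without prompting, then start Redis in the background so the build can continue
  - sudo apt-get install -y redis-server
  - redis-server --daemonize yes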

after_success: 
  - ./yoohoo.sh
after_failure: 
  - ./revert-all.sh

Hooks can be added around the main phases of the build process (install and script)

ci tools: travis ci

build pipeline: node-js

before_deploy: ./clean-up.sh

deploy:
  provider: npm

after_deploy: ./update-doc.sh

Optional deployment steps can be added using continuous deployment providers (e.g. npm)

ci tools: travis ci

build pipeline: node-js

before_install:
  - sudo apt-get update -qq
  - sudo apt-get install -qq [packages list]

Packages can (and should) be installed using one of the hooks, depending on their type

ci tools: travis ci

build pipeline: target branch

# blocklist
branches:
  except:
  - legacy
  - experimental

# safelist
branches:
  only:
  - master
  - stable
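
The effect of the two lists can be sketched as a simple filter. This is an illustration, not Travis CI's implementation: should_build is a hypothetical helper, it only handles exact branch names, and letting the safelist win when both lists are given is an assumption made for the sketch.

```python
def should_build(branch, only=None, excluded=None):
    """Decide whether a branch is built (hypothetical helper).

    only     : safelist of branch names, or None if no safelist is configured
    excluded : blocklist of branch names, or None if no blocklist is configured
    """
    if only is not None:
        # Assumption for this sketch: a safelist, when present, decides alone.
        return branch in only
    if excluded is not None:
        return branch not in excluded
    return True   # no filter configured: every branch is built


print(should_build("master", only=["master", "stable"]))            # True
print(should_build("feature-x", only=["master", "stable"]))         # False
print(should_build("legacy", excluded=["legacy", "experimental"]))  # False
print(should_build("feature-x", excluded=["legacy", "experimental"]))  # True
```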

ci tools: travis ci

build pipeline: skipping a commit

git commit -am "this will be ignored [ci skip]"
git commit -am "also this [skip ci]"
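
The skip check itself amounts to looking for a marker in the commit message. A minimal sketch, assuming only the two markers shown above and an exact, case-sensitive match; should_skip and SKIP_MARKERS are hypothetical names, and the CI service may accept further variants.

```python
SKIP_MARKERS = ("[ci skip]", "[skip ci]")


def should_skip(commit_message):
    """Return True if the commit message contains one of the skip markers."""
    return any(marker in commit_message for marker in SKIP_MARKERS)


print(should_skip("this will be ignored [ci skip]"))   # True
print(should_skip("also this [skip ci]"))              # True
print(should_skip("fix typo in README"))               # False
```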

ci tools: travis ci

build pipeline: Build matrix

language: ruby
rvm:
- 1.9.3
- 2.0.0
- 2.1.0
env:
- DB=mongodb
- DB=redis
- DB=mysql
gemfile:
- Gemfile
- gemfiles/rails4.gemfile
- gemfiles/rails31.gemfile
- gemfiles/rails32.gemfile

ci tools: travis ci

build pipeline: Build matrix

The previous .travis.yml file expands to 3 * 3 * 4 = 36 tasks
This set can be trimmed further by excluding combinations

matrix:
  exclude:
  - rvm: 2.0.0
    gemfile: Gemfile

# which is equivalent to listing every env value explicitly:
matrix:
  exclude:
  - rvm: 2.0.0
    gemfile: Gemfile
    env: DB=mongodb
  - rvm: 2.0.0
    gemfile: Gemfile
    env: DB=redis
  - rvm: 2.0.0
    gemfile: Gemfile
    env: DB=mysql
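
The arithmetic behind the build matrix can be checked with a short sketch. It expands the three lists from the Ruby example with itertools.product and then applies the exclusion rule; is_excluded is a hypothetical helper, and matching a job when every key of an exclusion entry matches is an assumption about how the rule is meant to behave.

```python
from itertools import product

rvm = ["1.9.3", "2.0.0", "2.1.0"]
env = ["DB=mongodb", "DB=redis", "DB=mysql"]
gemfile = ["Gemfile", "gemfiles/rails4.gemfile",
           "gemfiles/rails31.gemfile", "gemfiles/rails32.gemfile"]

# One job per combination of the three axes.
jobs = [{"rvm": r, "env": e, "gemfile": g} for r, e, g in product(rvm, env, gemfile)]


def is_excluded(job, exclusions):
    """A job is excluded when every key of some exclusion entry matches it."""
    return any(all(job[key] == value for key, value in rule.items())
               for rule in exclusions)


exclusions = [{"rvm": "2.0.0", "gemfile": "Gemfile"}]
remaining = [job for job in jobs if not is_excluded(job, exclusions)]
print(len(jobs), len(remaining))   # 36 33
```

The single exclusion entry removes three jobs, one for each DB value, which is why the short and the spelled-out forms describe the same matrix.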

ci tools: travis ci

example

Start with an existing project
Create a new branch
Add a commit -> Test locally
Push to new branch
Observe build status online
Create a pull request

ci tools: travis ci

additional

Many features were omitted:
Docker support
GUI testing: Sauce Labs
Cron Jobs
1 Build -> n Tasks
Pull request integration
Badges
Command line interface

recap: CI benefits

The trouble with deferred integration is that it's very hard to predict how long it will take, and worse, it's very hard to see how far you are through the process (a complete blind spot).
- With CI there is no long integration phase, so the blind spot is eliminated completely.
Continuous Integration doesn't get rid of bugs, but it does make them dramatically easier to find and remove.
As a result, projects with Continuous Integration tend to have dramatically fewer bugs, both in production and in process

recap: CI benefits

CI removes one of the biggest barriers to frequent deployment.
Significant effect on team efficiency.