Using bash more efficiently

Primer for predocs - 2019

Michael Hall - Iqbal Group

TLDR

Simplified man pages with practical examples

$  tldr awk

# awk

  A versatile programming language for working on files.

- Print the fifth column (a.k.a. field) in a space-separated file:

  awk '{print $5}' filename

- Print the second column of the lines containing "something" in a space-separated file:

  awk '/something/ {print $2}' filename

- Sum the values in the first column of a file and print the total:

  awk '{s+=$1} END {print s}' filename

- Sum the values in the first column and pretty-print the values and then the total:

  awk '{s+=$1; print $1} END {print "--------"; print s}' filename

- Print every third line starting from the first line:

  awk 'NR%3==1' filename

$  tldr tar

# tar

  Archiving utility.
  Often combined with a compression method, such as gzip or bzip2.

- Create an archive from files:

  tar cf target.tar file1 file2 file3

- Create a gzipped archive:

  tar czf target.tar.gz file1 file2 file3

- Extract an archive in a target folder:

  tar xf source.tar -C folder

- Extract a gzipped archive in the current directory:

  tar xzf source.tar.gz

- Extract a bzipped archive in the current directory:

  tar xjf source.tar.bz2
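Here is a minimal round-trip sketch of the commands above. It runs in a throwaway directory and the file names are placeholders. It creates a gzipped archive, lists its contents with the `t` operation, and then extracts it into a separate folder with `-C`.

```shell
#!/usr/bin/env bash
# Round-trip a gzipped tar archive in a temporary directory.
cd "$(mktemp -d)" || exit 1
echo ">seq1" > file1.fa                 # placeholder input files
echo ">seq2" > file2.fa

tar czf target.tar.gz file1.fa file2.fa # c = create, z = gzip, f = archive name
tar tzf target.tar.gz                   # t = list contents without extracting

mkdir -p folder
tar xzf target.tar.gz -C folder         # x = extract, -C = into this folder
ls folder
```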

Online version and docs - https://tldr.sh/

Install

Python client:  pip3 install tldr

NPM client:   npm install -g tldr

Alias

"Brevity is the soul of wit"

- William Shakespeare

$  tldr alias

# alias

  Creates aliases -- words that are replaced by a command string.
  Aliases expire with the current shell session, unless they're defined in the shell's configuration file, e.g. ~/.bashrc.

- Create a generic alias:

  alias word="command"

- View the command associated with a given alias:

  alias word

- Remove an aliased command:

  unalias word

- List all aliased words:

  alias -p

- Turn rm into an interactive command:

  alias rm="rm -i"

- Create la as a shortcut for ls -a:

  alias la="ls -a"

Alias

Useful aliases


# ALIASES
alias ..="cd .."  # go up one directory
alias ...="cd ../.."  # go up two directories
alias ....="cd ../../.."  # go up three directories
alias .....="cd ../../../.."  # you get the idea
alias c="clear"  # clear the screen
alias ls="ls --color=auto"  # colourise the output of ls
alias la="ls -a"  # show all files
alias ll="ls -la" # use long list format
alias bc="bc -l"  # start calculator with maths support
alias df="df -H"  # human readable df by default
alias gst="git status"
alias untar="tar -zxvf" # very common option combo
alias hs="history | grep" # will be useful later
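A trailing space inside an alias value changes how bash expands the command. Bash then also tries alias expansion on the next word. With `..` aliased, `ls ..` could therefore turn into `ls --color=auto cd ..`. The `say` aliases below are hypothetical and exist only for this sketch. The `shopt -s expand_aliases` line is needed only because scripts do not expand aliases by default.

```shell
#!/usr/bin/env bash
# Aliases are only expanded in interactive shells unless this option is set.
shopt -s expand_aliases

alias ..="cd .."
alias say="echo "       # value ends in a space: the next word is alias-expanded too
alias say_plain="echo"  # no trailing space: the next word is left alone

say ..         # expands to: echo cd ..   (prints "cd ..")
say_plain ..   # expands to: echo ..      (prints "..")
```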

Bash functions

More complex aliasing

# make directory and change into it.
mkcd() { 
    mkdir -p "$1" && cd "$1"; 
}

# count number of sequences in fasta/fastq (assumes 4 lines per fastq record)
cs() {
    if [ -f "$1" ]; then
        case "$1" in
            *.fastq|*.fq)        wc -l "$1" | awk '{print $1/4}' ;;
            *.fastq.gz|*.fq.gz)  zcat "$1" | wc -l | awk '{print $1/4}' ;;
            *.fasta|*.fa)        grep -c "^>" "$1" ;;
            *.fasta.gz|*.fa.gz)  zcat "$1" | grep -c "^>" ;;
            *)                   echo "don't know how to count sequences in '$1'..." >&2 ;;
        esac
    else
        echo "'$1' is not a valid file" >&2
    fi
}
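The `cs` function checks its argument with `[ -f ... ]`. This small sketch shows why that argument should be double-quoted: with an empty argument, the unquoted form silently passes the check.

```shell
#!/usr/bin/env bash
# With an empty or missing argument, an unquoted [ -f $1 ] collapses to [ -f ],
# a one-argument test that is true whenever "-f" is a non-empty string.
empty=""

if [ -f $empty ]; then
  echo "unquoted: true"    # runs, although no file was named
fi

if [ -f "$empty" ]; then
  echo "quoted: true"
else
  echo "quoted: false"     # runs: "" is not an existing regular file
fi
```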

for loops

for value in {1..5}
do
  echo "$value"
done

for value in {1..5}
do
  touch sample_"$value".fa
done

# use the glob directly rather than parsing the output of ls
for fname in *.fa
do
  echo "$fname"
done
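Looping over `$(ls *.fa)` word-splits the output of `ls`, so a filename containing a space becomes two loop items. The bare glob `*.fa` hands each file to the loop whole. Here is a sketch in a temporary directory with one such file:

```shell
#!/usr/bin/env bash
# Compare looping over $(ls ...) with looping over the glob itself.
cd "$(mktemp -d)" || exit 1
touch "my sample.fa"          # a filename containing a space

n_ls=0
for fname in $(ls *.fa); do   # word-splits "my sample.fa" into "my" and "sample.fa"
  n_ls=$((n_ls + 1))
done

n_glob=0
for fname in *.fa; do         # the glob yields exactly one item per file
  n_glob=$((n_glob + 1))
done

echo "ls loop saw $n_ls items, glob loop saw $n_glob"
```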

realpath

Display the resolved absolute path for a file or directory.

$ realpath sample_1.fa
/full/path/to/sample_1.fa

Mac users:

# install via homebrew
brew install coreutils
# or from github repo
git clone https://github.com/harto/realpath-osx.git
cd realpath-osx
make
sudo make install
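If neither `realpath` nor Homebrew is available, you can build a rough fallback from `cd` and `pwd -P`. This sketch assumes the target's parent directory exists. The `abspath` name is hypothetical. Unlike `realpath`, it does not resolve a symlink in the final path component.

```shell
#!/usr/bin/env bash
# Hypothetical fallback for realpath: resolve the parent directory with cd/pwd -P,
# then re-attach the file name. Runs in a subshell so the caller's cwd is unchanged.
abspath() {
  (
    cd -- "$(dirname -- "$1")" || exit 1
    printf '%s/%s\n' "$(pwd -P)" "$(basename -- "$1")"
  )
}

cd "$(mktemp -d)" || exit 1
touch sample_1.fa
abspath sample_1.fa     # e.g. /tmp/tmp.XXXX/sample_1.fa
```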

Exercise

Echo the absolute path of every fasta file in the current directory.

for fname in *.fa
do
  realpath "$fname"   # realpath prints the path itself; no echo needed
done

# or

for fname in *.fa
do
  fullname=$(realpath "$fname")
  echo "$fullname"
done

Path manipulation

filename=$(realpath sample_3.fa)
echo "$filename"

base=$(basename -- "$filename")
echo "$base"

dir=$(dirname -- "$filename")
echo "$dir"

Note: `--` marks the end of options, so a filename beginning with '-' is not misinterpreted as a program option.
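`basename` also takes an optional second argument, a suffix to strip, which is handy for turning a fasta path into a bare sample name. Here is a sketch on a hypothetical path:

```shell
#!/usr/bin/env bash
filename=/data/project/sample_3.fa   # hypothetical absolute path

base=$(basename -- "$filename")          # sample_3.fa
stem=$(basename -- "$filename" .fa)      # sample_3: the second argument strips that suffix
dir=$(dirname -- "$filename")            # /data/project

echo "$base" "$stem" "$dir"
```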

Path manipulation

filename=$(realpath sample_3.fa)
echo "$filename"

# base=$(basename -- "$filename")
base="${filename##*/}"
echo "$base"

# dir=$(dirname -- "$filename")
dir="${filename%/*}"
echo "$dir"

For more on parameter substitution in bash see https://www.tldp.org/LDP/abs/html/parameter-substitution.html

${var##Pattern} Remove from $var the longest part of $Pattern that matches the front end of $var.

${var%Pattern} Remove from $var the shortest part of $Pattern that matches the back end of $var.
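The family has two more members. A single `#` or `%` removes the shortest match, and a doubled `##` or `%%` removes the longest. Here is a sketch on a hypothetical path with two extensions:

```shell
#!/usr/bin/env bash
path=/data/reads/sample_3.fastq.gz   # hypothetical path

echo "${path#*/}"    # shortest front match of */ removed:  data/reads/sample_3.fastq.gz
echo "${path##*/}"   # longest front match of */ removed:   sample_3.fastq.gz
echo "${path%.*}"    # shortest back match of .* removed:   /data/reads/sample_3.fastq
echo "${path%%.*}"   # longest back match of .* removed:    /data/reads/sample_3
                     # (%% would also cut at a dot in a directory name)
```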

Exercise

Get the file extension of a path.

filename=$(realpath sample_3.fa)
echo "$filename"

extension="${filename##*.}"
echo "$extension"
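`${filename##*.}` has two weaknesses. It returns the whole string when the file name has no dot, and it can pick up a dot from a directory name. Here is a sketch of a slightly more careful version. `get_ext` is a hypothetical helper, not a standard command, and a dotfile such as `.bashrc` is still reported as having the extension `bashrc`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the extension of a path, or an empty line if it has none.
get_ext() {
  local name=${1##*/}           # strip directories first, so dots in them are ignored
  case $name in
    *.*) printf '%s\n' "${name##*.}" ;;
    *)   printf '\n' ;;         # no dot in the file name: no extension
  esac
}

get_ext /data/sample_3.fa       # fa
get_ext /data/v1.2/README       # empty line, where ${filename##*.} alone gives "2/README"
```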

Exercise

Write a loop that creates a matching VCF file for each fasta.

for fname in *.fa
do
  [ -e "$fname" ] || continue   # skip the literal '*.fa' if nothing matched
  # Remove extension from path
  base=${fname%.*}
  touch "$base".vcf
done

ls
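Looping over a glob such as `*.fa` has one caveat. If nothing matches, bash leaves the pattern unexpanded, and the loop runs once with the literal string `*.fa`. Bash's `nullglob` option makes an unmatched glob expand to nothing instead. Here is a sketch in an empty temporary directory:

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1      # an empty directory: no .fa files here

runs_default=0
for fname in *.fa; do            # no match: loops once with fname='*.fa'
  runs_default=$((runs_default + 1))
done

shopt -s nullglob                # unmatched globs now expand to nothing
runs_nullglob=0
for fname in *.fa; do            # no match: the loop body never runs
  runs_nullglob=$((runs_nullglob + 1))
done
shopt -u nullglob                # restore the default behaviour

echo "default: $runs_default, nullglob: $runs_nullglob"
```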

find and xargs

Find files and execute commands on a per-file basis

mkdir deleteme
touch deleteme/ex.fa
find . -name '*.fa'
# limit the search depth
find . -maxdepth 1 -name '*.fa' 

find and xargs

find . -name '*.fa' | xargs -I @ realpath @

xargs reads items from its standard input (here, the file paths printed by find, one per line) and runs the given command once per item; -I @ makes @ the placeholder that is replaced by each item.
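For comparison, `find` can also run the command itself with `-exec`, without `xargs`. In this sketch, `{}` stands for each path. Ending the command with `+` passes many paths to a single `realpath` call, while `\;` runs it once per file.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1
mkdir deleteme
touch sample_1.fa deleteme/ex.fa

find . -name '*.fa' -exec realpath {} +      # one realpath call for many paths
find . -name '*.fa' -exec realpath {} \;     # one realpath call per path
```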

find and xargs

find . -name '*.fa' | \
  xargs -I @ sh -c 'p=$(realpath "$1"); touch "${p%.*}.vcf"' sh @

Exercise

Remove all VCF and fasta files using find and xargs

find . -name '*.fa' -or -name '*.vcf' | xargs -I § rm §
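Paths reach `xargs` one per line, so a newline in a file name (and, without `-I`, a space) can split one path into several. A more robust version of the same clean-up pairs find's `-print0` with `xargs -0`, both supported by GNU and BSD tools. It groups the two `-name` tests in parentheses and uses `rm -f` so that an empty match list is not an error. Here is a sketch in a temporary directory:

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1
touch "sample 1.fa" "sample 1.vcf" keep.txt   # names containing a space

# The parentheses matter: without them, -print0 would bind only to -name '*.vcf'.
# -print0/-0 separate paths with NUL bytes, which cannot occur in a file name.
find . \( -name '*.fa' -o -name '*.vcf' \) -print0 | xargs -0 rm -f

ls    # only keep.txt remains
```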

All-in-one

# create text file with one sample name per line
echo -e "sample_1\nsample_2\nsample_3" > samples.txt

# create fasta and vcf files with sample names
cat samples.txt | \
  xargs -I @ sh -c 'touch @.fa;touch @.vcf'
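Because `touch` accepts several files, the `sh -c` wrapper is not strictly needed here. With `-I`, xargs substitutes the placeholder in every argument. This sketch reads samples.txt directly instead of piping it through `cat`, and uses `printf` as a portable alternative to `echo -e`:

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1
printf 'sample_1\nsample_2\nsample_3\n' > samples.txt

# -I @ runs touch once per input line, replacing @ in every argument.
xargs -I @ touch @.fa @.vcf < samples.txt

ls    # sample_1.fa  sample_1.vcf  ...  samples.txt
```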

Using history

"Those who cannot remember the past are condemned to repeat it"

- George Santayana

history

Using history

history | grep find
# we also created an alias `hs`

Rather than copying and pasting a line of interest, we can use 'Event Designators':

  697  find . -name '*.{fa,vcf}'
  698  # create text file with one sample name per line
  699  echo -e "sample_1\nsample_2\nsample_3" > samples.txt
  700  # create fasta and vcf files with sample names
  701  cat samples.txt |   xargs -I @ sh -c 'touch @.fa;touch @.vcf'
  702  ls

$ !699
echo -e "sample_1\nsample_2\nsample_3" > samples.txt

Using history

# rerun the last command
$ !!
# can be useful with other commands such as `sudo` or `time`
$ time !!
echo -e "sample_1\nsample_2\nsample_3" > samples.txt

# reuse the last argument passed to the previous command
$ ls -l !$
ls -l samples.txt

Rather than copying and pasting previous commands and arguments of interest, we can use 'Word Designators'.

Using history

Reverse search with control-r

Searches backwards through history as you type! Press control-r again to cycle through older matches.

(reverse-i-search)`f': cd dnadiff_reports/
(reverse-i-search)`fi': bsub.py 2 "$log" "$script" "$assembly" "$query" "$prefix"
(reverse-i-search)`fin': mv final.vcf pandora_genotyped.vcf
(reverse-i-search)`find': find . -name '*.fa' -or -name '*.vcf' | xargs -I § rm §
(reverse-i-search)`find': find . -name '*.fa' -or -name '*.vcf'
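Reverse search is only as useful as the history bash keeps. Here is a possible ~/.bashrc fragment that keeps more lines and appends to the history file rather than overwriting it. The sizes are arbitrary choices, not recommendations from these slides.

```shell
# Possible additions to ~/.bashrc; the numbers are arbitrary.
HISTSIZE=10000            # commands kept in memory for the current session
HISTFILESIZE=20000        # lines kept in the history file (~/.bash_history)
HISTCONTROL=ignoredups    # don't store a command identical to the previous one
shopt -s histappend       # append to the history file on exit instead of overwriting it
```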

Singularity

A very quick introduction to containers 📦📦📦 

Singularity

Bootstrap: shub
From: mbhall88/Singularity_recipes:template

%environment
  export LC_ALL=C.UTF-8
  export LANG=C.UTF-8

%post
  apt update
  apt install -y software-properties-common
  apt-add-repository universe
  apt update
  apt install -y cmatrix cowsay fortune lolcat
  echo 'export PATH=/usr/games:$PATH' >> "$SINGULARITY_ENVIRONMENT"
Singularity.fun

Singularity

sudo singularity build fun.simg Singularity.fun
singularity exec fun.simg cmatrix
singularity exec fun.simg fortune
singularity exec fun.simg sh -c "fortune | cowsay"
singularity exec fun.simg lolcat Singularity.fun

Singularity

Bootstrap: shub
From: mbhall88/Singularity_recipes:template

%environment
    export LC_ALL=C.UTF-8
    export LANG=C.UTF-8

%post
    apt update
    apt install -y software-properties-common
    apt-add-repository universe
    apt update
    apt install -y wget build-essential

    # ================================
    # INSTALL samtools
    # ================================
    VERSION="1.9"
    URL=https://github.com/samtools/samtools/releases/download/${VERSION}/samtools-${VERSION}.tar.bz2
    apt install -y libncurses5-dev \
        libbz2-dev \
        liblzma-dev \
        zlib1g-dev
    wget "$URL" -O - | tar -jxf -
    cd samtools*
    ./configure --prefix=/usr/local
    make
    make install
    make test
    echo 'export PATH=/usr/local/bin:$PATH' >> "$SINGULARITY_ENVIRONMENT"

Singularity

wget https://raw.githubusercontent.com/mbhall88/Singularity_recipes/master/recipes/Singularity.samtools
wget https://raw.githubusercontent.com/samtools/samtools/develop/examples/toy.sam
sudo singularity build samtools.simg Singularity.samtools
singularity exec samtools.simg samtools flagstat toy.sam

Exercise

Make your own container
