Michael Hall
I am a Bioinformatics PhD student in Zam Iqbal’s lab at EMBL-EBI. I currently work on using nanopore data and genome graphs to better call variation in bacterial genomes and to compare pan-genomes.
bash more efficiently
$ tldr awk
# awk
A versatile programming language for working on files.
- Print the fifth column (a.k.a. field) in a space-separated file:
awk '{print $5}' filename
- Print the second column of the lines containing "something" in a space-separated file:
awk '/something/ {print $2}' filename
- Sum the values in the first column of a file and print the total:
awk '{s+=$1} END {print s}' filename
- Sum the values in the first column and pretty-print the values and then the total:
awk '{s+=$1; print $1} END {print "--------"; print s}' filename
- Print every third line starting from the first line:
awk 'NR%3==1' filename
$ tldr tar
# tar
Archiving utility.
Often combined with a compression method, such as gzip or bzip.
- Create an archive from files:
tar cf target.tar file1 file2 file3
- Create a gzipped archive:
tar czf target.tar.gz file1 file2 file3
- Extract an archive in a target folder:
tar xf source.tar -C folder
- Extract a gzipped archive in the current directory:
tar xzf source.tar.gz
- Extract a bzipped archive in the current directory:
tar xjf source.tar.bz2
Online version and docs - https://tldr.sh/
Python client: pip3 install tldr
NPM client:
npm install -g tldr
"Brevity is the soul of wit"
- William Shakespeare
$ tldr alias
alias
Creates aliases -- words that are replaced by a command string.
Aliases expire with the current shell session, unless they're defined in the shell's
configuration file, e.g. ~/.bashrc.
- Create a generic alias:
alias word="command"
- View the command associated to a given alias:
alias word
- Remove an aliased command:
unalias word
- List all aliased words:
alias -p
- Turn rm into an interactive command:
alias rm="rm -i"
- Create la as a shortcut for ls -a:
alias la="ls -a"
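To make the point about the configuration file concrete, here is a minimal sketch. It uses a temporary file as a stand-in for ~/.bashrc so that nothing real is modified; the temporary file and the variable name `rc` are assumptions for illustration only. In practice the alias line would go into the real configuration file.

```bash
# a temporary file stands in for ~/.bashrc (assumption for this sketch)
rc=$(mktemp)
echo 'alias la="ls -a"' >> "$rc"

# load the file into the current shell, much as an interactive bash
# reads ~/.bashrc when it starts
. "$rc"

# confirm the alias is now defined in this session
alias la
```

Any new terminal that reads the same file would pick up the alias in the same way.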
# ALIASES
alias ..="cd .." # go up one directory
alias ...="cd ../.." # go up two directories
alias ....="cd ../../.." # go up three directories
alias .....="cd ../../../.." # you get the idea
alias c="clear" # clear the screen
alias ls="ls --color=auto " # colourise the output of ls
alias la="ls -a " # show all files
alias ll="ls -la " # use long list format
alias bc="bc -l" # start calculator with maths support
alias df="df -H " # human readable df by default
alias gst="git status"
alias untar="tar -zxvf " # very common option combo
alias hs="history | grep " # will be useful later
# make directory and change into it.
mkcd() {
mkdir -p "$1" && cd "$1";
}
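A short usage sketch for mkcd, run inside a temporary directory so it leaves no trace. The directory names (`project/data/raw`) and the variable `workdir` are hypothetical.

```bash
# same function as above, repeated so the example is self-contained
mkcd() {
    mkdir -p "$1" && cd "$1"
}

# work in a throwaway location (assumption for this sketch)
workdir=$(mktemp -d)

# create a nested directory and move into it in one step
mkcd "$workdir/project/data/raw"
pwd
```

Because of `mkdir -p`, the intermediate directories are created as needed, and the `cd` only happens if the `mkdir` succeeded.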
# count number of sequences in fasta/fastq
cs () {
    if [ -f "$1" ] ; then
        case "$1" in
            *.fastq) wc -l "$1" | awk '{print $1/4}' ;;
            *.fq) wc -l "$1" | awk '{print $1/4}' ;;
            *.fastq.gz) zcat "$1" | wc -l | awk '{print $1/4}' ;;
            *.fq.gz) zcat "$1" | wc -l | awk '{print $1/4}' ;;
            *.fasta) grep -c "^>" "$1" ;;
            *.fa) grep -c "^>" "$1" ;;
            *.fasta.gz) zcat "$1" | grep -c "^>" ;;
            *.fa.gz) zcat "$1" | grep -c "^>" ;;
            *) echo "don't know how to count sequences in '$1'..." ;;
        esac
    else
        echo "'$1' is not a valid file"
    fi
}
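The counting function can be written more compactly by merging patterns with `|`. The sketch below is one possible variant, not a replacement for the function above: the name `count_seqs`, the use of `gzip -dc` in place of `zcat`, and the toy files are all assumptions made for illustration.

```bash
# compact variant (hypothetical name): one case branch per file type
count_seqs() {
    if [ ! -f "$1" ]; then
        echo "'$1' is not a valid file" >&2
        return 1
    fi
    case "$1" in
        *.fastq|*.fq)       awk 'END {print NR/4}' "$1" ;;
        *.fastq.gz|*.fq.gz) gzip -dc "$1" | awk 'END {print NR/4}' ;;
        *.fasta|*.fa)       grep -c '^>' "$1" ;;
        *.fasta.gz|*.fa.gz) gzip -dc "$1" | grep -c '^>' ;;
        *) echo "don't know how to count sequences in '$1'..." >&2
           return 1 ;;
    esac
}

# toy input files in a throwaway directory
tmp=$(mktemp -d)
printf '>seq1\nACGT\n>seq2\nGGCC\n' > "$tmp/toy.fa"
printf '@read1\nACGT\n+\nIIII\n' > "$tmp/toy.fq"
gzip -c "$tmp/toy.fa" > "$tmp/toy.fa.gz"

count_seqs "$tmp/toy.fa"      # two '>' header lines
count_seqs "$tmp/toy.fq"      # four lines, i.e. one record
count_seqs "$tmp/toy.fa.gz"   # same count as the uncompressed fasta
```

Dividing the line count by four assumes each fastq record occupies exactly four lines, as in the original function.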
for loops
for value in {1..5}
do
echo "$value"
done
for value in {1..5}
do
touch sample_"$value".fa
done
for fname in *.fa # glob directly; parsing ls output breaks on spaces
do
echo "$fname"
done
realpath
$ realpath sample_1.fa
full/path/to/sample_1.fa
Mac users:
# install via homebrew
brew install coreutils
# or from github repo
git clone https://github.com/harto/realpath-osx.git
cd realpath-osx
make
sudo make install
Echo the absolute path for all fasta files in the directory.
for fname in *.fa
do
echo "$(realpath "$fname")"
done
# or
for fname in *.fa
do
fullname=$(realpath "$fname")
echo "$fullname"
done
filename=$(realpath sample_3.fa)
echo "$filename"
base=$(basename -- "$filename")
echo "$base"
dir=$(dirname -- "$filename")
echo "$dir"
Note: -- handles an edge case where the filename begins with a '-' and is interpreted as a program option.
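A small sketch of that edge case. The file name `-sample.fa` is hypothetical and the file does not need to exist, since basename and dirname only manipulate the string.

```bash
# a name that starts with '-' (hypothetical)
name="-sample.fa"

# '--' marks the end of options, so the name is treated as an operand
basename -- "$name"
dirname -- "$name"

# prefixing the name with ./ is another common way around the problem
basename "./$name"
```

Without `--`, the leading `-` would be parsed as the start of an option and the command would most likely report an invalid option.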
filename=$(realpath sample_3.fa)
echo "$filename"
# base=$(basename -- "$filename")
base="${filename##*/}"
echo "$base"
# dir=$(dirname -- "$filename")
dir="${filename%/*}"
echo "$dir"
For more on parameter substitution in bash see https://www.tldp.org/LDP/abs/html/parameter-substitution.html
${var##Pattern} Remove from $var the longest part of $Pattern that matches the front end of $var.
${var%Pattern} Remove from $var the shortest part of $Pattern that matches the back end of $var.
filename=$(realpath sample_3.fa)
echo "$filename"
extension="${filename##*.}"
echo "$extension"
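The operators come in shortest-match (`#`, `%`) and longest-match (`##`, `%%`) forms, which matters for names with more than one dot. A minimal sketch on a hypothetical path (no file needs to exist):

```bash
# hypothetical path with a double extension
filename="/data/run1/sample_3.fastq.gz"

echo "${filename##*/}"   # sample_3.fastq.gz        (longest match from the front)
echo "${filename%/*}"    # /data/run1               (shortest match from the back)
echo "${filename##*.}"   # gz                       (last extension only)
echo "${filename#*.}"    # fastq.gz                 (everything after the first dot)
echo "${filename%.*}"    # /data/run1/sample_3.fastq (drop the last extension)
echo "${filename%%.*}"   # /data/run1/sample_3      (drop all extensions)
```

Note that `%%.*` and `#*.` cut at the first dot anywhere in the path, so they only behave as shown when the directory names contain no dots.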
for fname in *.fa
do
# Remove extension from path
base=${fname%.*}
touch "$base".vcf
done
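A loop over files can also iterate over the glob itself, which keeps file names containing spaces intact. The sketch below runs in a temporary directory with made-up file names (`workdir` and the sample names are assumptions) and guards against the case where nothing matches.

```bash
# throwaway directory with two toy files; one name contains a space on purpose
workdir=$(mktemp -d)
cd "$workdir"
touch "sample_1.fa" "sample 2.fa"

for fname in *.fa
do
    # if nothing matches, the pattern stays literal; skip that case
    [ -e "$fname" ] || continue
    # swap the extension and create the matching vcf file
    touch "${fname%.*}.vcf"
done
```

Quoting `"${fname%.*}.vcf"` is what keeps `sample 2.fa` from being split into two words.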
ls
find and xargs
mkdir deleteme
touch deleteme/ex.fa
find . -name '*.fa'
# limit the search depth
find . -maxdepth 1 -name '*.fa'
find . -name '*.fa' | xargs -I @ realpath @
xargs takes input from a pipe and executes the specified command on each element (a file path in this case).
find . -name '*.fa' | \
xargs -I @ sh -c 'p=$(realpath @);touch ${p%.*}.vcf'
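The find and xargs pattern can be made robust to spaces in paths. The sketch below is one way to do that, assuming a find and xargs that support `-print0` and `-0` (GNU and BSD versions do). It passes each path to `sh` as the positional parameter `$1` instead of pasting it into the script text. The directory `workdir` and the file names are made up for the example.

```bash
# throwaway directory with two toy files; one name contains a space on purpose
workdir=$(mktemp -d)
touch "$workdir/sample_1.fa" "$workdir/sample 2.fa"

# -print0 / -0 separate paths with a NUL byte instead of whitespace;
# '_' fills $0, so the path that replaces @ arrives in the script as $1
find "$workdir" -name '*.fa' -print0 | \
    xargs -0 -I @ sh -c 'touch "${1%.*}.vcf"' _ @

ls "$workdir"
```

Passing the path as an argument also means that unusual characters in a file name are never interpreted as shell code.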
find . -name '*.fa' -or -name '*.vcf' | xargs -I § rm §
# create text file with one sample name per line
echo -e "sample_1\nsample_2\nsample_3" > samples.txt
# create fasta and vcf files with sample names
cat samples.txt | \
xargs -I @ sh -c 'touch @.fa;touch @.vcf'
"Those who cannot remember the past are condemned to repeat it"
- George Santayana
history
history | grep find
# we also created an alias `hs`
Rather than copy and paste a line of interest, we can use 'Event Designators'.
697 find . -name '*.{fa,vcf}'
698 # create text file with one sample name per line
699 echo -e "sample_1\nsample_2\nsample_3" > samples.txt
700 # create fasta and vcf files with sample names
701 cat samples.txt | xargs -I @ sh -c 'touch @.fa;touch @.vcf'
702 ls
$ !699
echo -e "sample_1\nsample_2\nsample_3" > samples.txt
# rerun the last command
$ !!
# can be useful with other commands such as `sudo` or `time`
$ time !!
echo -e "sample_1\nsample_2\nsample_3" > samples.txt
# reuse the last argument passed to the previous command
$ ls -l !$
ls -l samples.txt
Rather than copy and paste previous commands and arguments of interest, we can use 'Word Designators'.
control-r
Searches backwards through history - as you type! Repeat to cycle through options.
(reverse-i-search)`f': cd dnadiff_reports/
(reverse-i-search)`fi': bsub.py 2 "$log" "$script" "$assembly" "$query" "$prefix"
(reverse-i-search)`fin': mv final.vcf pandora_genotyped.vcf
(reverse-i-search)`find': find . -name '*.fa' -or -name '*.vcf' | xargs -I § rm §
(reverse-i-search)`find': find . -name '*.fa' -or -name '*.vcf'
Bootstrap: shub
From: mbhall88/Singularity_recipes:template
%environment
export LC_ALL=C.UTF-8
export LANG=C.UTF-8
%post
apt update
apt install -y software-properties-common
apt-add-repository universe
apt update
apt install -y cmatrix cowsay fortune lolcat
echo "export PATH=/usr/games:$PATH" >> $SINGULARITY_ENVIRONMENT
Singularity.fun
sudo singularity build fun.simg Singularity.fun
singularity exec fun.simg cmatrix
singularity exec fun.simg fortune
singularity exec fun.simg sh -c "fortune | cowsay"
singularity exec fun.simg lolcat Singularity.fun
Bootstrap: shub
From: mbhall88/Singularity_recipes:template
%environment
export LC_ALL=C.UTF-8
export LANG=C.UTF-8
%post
apt update
apt install -y software-properties-common
apt-add-repository universe
apt update
apt install -y wget build-essential
# ================================
# INSTALL samtools
# ================================
VERSION="1.9"
URL=https://github.com/samtools/samtools/releases/download/${VERSION}/samtools-${VERSION}.tar.bz2
apt install -y libncurses5-dev \
libbz2-dev \
liblzma-dev \
zlib1g-dev
wget "$URL" -O - | tar -jxf -
cd samtools*
./configure --prefix=/usr/local
make
make install
make test
echo "export PATH=/usr/local/bin:$PATH" >> $SINGULARITY_ENVIRONMENT
wget https://raw.githubusercontent.com/mbhall88/Singularity_recipes/master/recipes/Singularity.samtools
wget https://raw.githubusercontent.com/samtools/samtools/develop/examples/toy.sam
sudo singularity build samtools.simg Singularity.samtools
singularity exec samtools.simg samtools flagstat toy.sam