UNIX & bash

by Nathan LeClaire


Rob Pike and Brian Kernighan (credit: http://farm1.staticflickr.com/86/282176411_7928541066_z.jpg)

First off, why should I care?

You touch UNIX every day.




It's one of the most wildly successful operating systems of all time. Developed at Bell Labs around 1970, it continues to power countless servers and desktops around the world.

It runs on the servers you interact with

We use it here at work all the time (and so does everyone else)




(technically Linux is a "UNIX-like" operating system, but arguably UNIX provided the foundation for everything that Linux is built on and continues to be)

It even powers shiny MacBooks


Which can levitate.  With the power of UNIX.

What is UNIX? (in a narrow sense)

What is 'UNIX'?  In the narrowest sense, it is a time-sharing operating system kernel: a program that controls the resources of a computer and allocates them among its users.


Users can:

  • Run their programs
  • Control peripheral devices (discs, terminals, printers, etc.)
  • Interact with a file system that manages the long-term storage of information such as programs, data, and documents



What is UNIX? (in a broader sense)


In a broader sense, UNIX is often taken to include not only the kernel, but also essential programs like compilers, editors, command languages, programs for copying and printing files, and so on . . . UNIX may even include programs developed by you or other users to be run on your system . . .

Back up a second - what's a kernel?


(http://upload.wikimedia.org/wikipedia/commons/8/8f/Kernel_Layout.svg)

A kernel is a very low-level abstraction over the hardware that the operating system runs on.

It manages memory, program execution, and input/output so you don't have to.

So what?


the kernel is the main component of most computer operating systems; it is a bridge between applications and the actual data processing done at the hardware level. The kernel's responsibilities include managing the system's resources (the communication between hardware and software components)

- Wikipedia

http://en.wikipedia.org/wiki/Kernel_%28computing%29

HOW DO I USE THIS SO-CALLED "KERNEL"?


Slow down there, Sparky.


It's rare that you will want to talk to the kernel directly.


You probably want something else.  A higher level of abstraction, perhaps.


Something a bit more interactive.


Something you can type commands into, and see the output from those commands.

You want a shell


A shell is the program that interprets your requests to run programs.  It manifests in the form of what most of us know as the "command line" or "terminal prompt". 


By typing commands, you can see the output of running those programs, and interact with your computer.  Let's look at some examples.


Examples

  The simplest shell command is a single word:

$ who
you        tty2        Sep 28 07:51
jpl        tty4        Sep 28 08:32

$ date; who
Wed Sep 28 09:07:15 EDT 1983
you        tty2        Sep 28 07:51
jpl        tty4        Sep 28 08:32

$ whoami
nathanl


$ ls
autojump Desktop Documents Downloads go gofun hello.go Music

Benefits of the shell


  • Filename shorthands: pick a set of files for a program to operate on using a wildcard pattern such as *.c (shell patterns are simpler than full regular expressions)
  • Input-output redirection: arrange for the output of a program to go into a file instead of onto the terminal, or for its input to come from a file instead of the terminal. You can also connect the output of one program to the input of another.
  • Customizability: define your own commands and shorthands.
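All three ideas fit in a few lines of bash. Here is a minimal sketch, run in a throwaway directory; the file names and the function name countlines are made up for illustration.

```shell
#!/bin/bash
# Work in a scratch directory so nothing real is touched
cd "$(mktemp -d)" || exit 1

# Input-output redirection: > sends echo's output into a file instead of the terminal
echo "one" > a.txt
echo "two" > b.txt
echo "notes" > readme.md

# Filename shorthand: *.txt expands to a.txt b.txt (readme.md does not match)
cat *.txt

# Customizability: define your own command as a shell function (hypothetical name)
countlines() {
  cat "$@" | wc -l
}
countlines *.txt   # counts the lines of every .txt file (macOS pads the number with spaces)
```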

Wait, what?

Let's learn by example.  You can use the cat command to print the contents of a file to the screen.  It's shorthand for "concatenate", because you can print out the contents of multiple files this way, and cat sticks them all together.

$ cat hello_world.c
#include <stdio.h>

int main() {
    printf("hello, world!\n");
    return 0;
}
$ cat hello_world.c goodbye_world.c
#include <stdio.h>

int main() {
    printf("hello, world!\n");
    return 0;
}
#include <stdio.h>

int main() {
    printf("goodbye, world!\n");
    return 0;
}
$

Wildcard


You can use the splat (*), otherwise known as the wildcard character, to mean "any sequence of characters, including none" when telling the shell which files to operate on. This is a very powerful tool, and must be used wisely.

Usage

Be careful about using this character with rm, the command to remove files!

$ cat *.c
#include <stdio.h>

int main() {
    printf("goodbye, world!\n");
    return 0;
}
#include <stdio.h>

int main() {
    printf("hello, world!\n");
    return 0;
}
$ rm *
$ echo "Whoops, I just deleted my project"
Whoops, I just deleted my project
$
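The shell, not rm, expands the *, so one cautious habit is to preview the expansion with echo before handing the same pattern to rm. A minimal sketch in a scratch directory; the file names are illustrative only:

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1
touch hello_world.c goodbye_world.c notes.txt .hidden

# The shell expands each pattern before the command runs; echo just prints the result
echo *.c        # goodbye_world.c hello_world.c
echo *          # goodbye_world.c hello_world.c notes.txt  (* skips dotfiles like .hidden)

# Only after checking the list, delete exactly those files
# (-- stops rm from treating a file name that starts with "-" as a flag)
rm -- *.c
ls
```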

The filesystem

Speaking of files, let's talk a bit about UNIX's filesystem model.

Files are arranged in a directory hierarchy that starts at /.



Getting Around

You can see which directory you are currently in with the pwd command.  

$ pwd
/home/nathanl

The ls command lists the files in the directory you specify (by default, the current directory).

$ ls
autojump Desktop Documents Downloads go gofun hello.go Music

Arguments to UNIX commands that start with a dash ("-") are optional "flags" that slightly change how the command behaves.

$ ls -a
. .. autojump .bashrc Desktop Documents Downloads go gofun hello.go Music .vimrc

Getting around (continued)


The filenames . and .. have a special meaning in UNIX. They mean "the current directory" and "the parent directory" (the one above this one), respectively. Filenames that start with a dot are "hidden": ls skips them unless you pass -a.


You can use the cd command to navigate to any directory in your filesystem. 


POSSIBLE INTERVIEW QUESTION ALERT!

The tilde ~ character is shorthand for your home directory (typically /home/yourusername on Linux), which is where you start out when you log in.
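Here is a minimal sketch of getting around with cd, ., .., and ~. It builds a throwaway directory tree first; the directory names projects and demo are hypothetical.

```shell
#!/bin/bash
# Build a small throwaway tree to walk around in: <tmp>/projects/demo
base=$(mktemp -d)
mkdir -p "$base/projects/demo"

cd "$base/projects/demo" || exit 1
pwd            # .../projects/demo

cd ..          # .. is the parent directory
pwd            # .../projects

cd ./demo      # . is the current directory, so ./demo means "demo, right here"
pwd            # .../projects/demo

ls -a          # lists . and .. (plus any hidden dotfiles); plain ls hides them

echo ~         # ~ expands to your home directory, the same value as $HOME
```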

manpages

To learn more about any UNIX command, just type 

$ man commandname

at the command prompt.  You will be greeted with a manual page with exhaustive information about the command.  Handy if you are without an Internet connection, or just need to review a few of the command line options.


And yes, you can type man man. So meta!

A cautionary tale (Rob Pike)

Long ago, as the design of the Unix file system was being worked out, the entries . and .. appeared, to make navigation easier. I'm not sure but I believe .. went in during the Version 2 rewrite, when the file system became hierarchical (it had a very different structure early on).  When one typed ls, however, these files appeared, so either Ken or Dennis added a simple test to the program. It was in assembler then, but the code in question was equivalent to something like this:
   if (name[0] == '.') continue;
This statement was a little shorter than what it should have been, which is
   if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0) continue;
but hey, it was easy.

A cautionary tale (continued)

First, a bad precedent was set. A lot of other lazy programmers introduced bugs by making the same simplification. Actual files beginning with periods are often skipped when they should be counted.

Second, and much worse, the idea of a "hidden" or "dot" file was created. As a consequence, more lazy programmers started dropping files into everyone's home directory . . .


. . .I'm pretty sure the concept of a hidden file was an unintended consequence. It was certainly a mistake.


Don't just hack it in

How many bugs and wasted CPU cycles and instances of human frustration (not to mention bad design) have resulted from that one small shortcut about 40 years ago?

Keep that in mind next time you want to cut a corner in your code.

Pipes


(and input-output redirection)

The UNIX way is to build modular components

"The Unix philosophy emphasizes building short, simple, clear, modular, and extendable code that can be easily maintained and repurposed by developers other than its creators."

- http://en.wikipedia.org/wiki/Unix_philosophy


As an example, let's say you have a list of hundreds of names in random order in a text file.  You need them sorted, a task which is maddening and prone to error when done by hand.


With UNIX, we can accomplish this easily.

Use I/O redirection

UNIX provides pipes, which let you feed the output of one command into the input of another. To do so, use the | operator.


You can direct the output of a command into a file instead of the terminal with the > operator, and read input from a file with the < operator (try visualizing the flow of data following the direction the arrow is pointing).


Also, the >> operator will append the output to the contents of a file instead of writing over them.  Let's see this stuff in action.

Let's put things together

$ cat names.txt
Bob
Suzy
Fred
Nathan
Anthony
Dignan
Sterling
$ cat names.txt | sort
Anthony
Bob
Dignan
Fred
Nathan
Sterling
Suzy
$ cat names.txt | sort > sorted_names.txt
$ ls
names.txt sorted_names.txt
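The transcript above uses | and >; the other two operators, < and >>, can be sketched the same way. This is a minimal example in a scratch directory, and the extra name Zelda is made up:

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1
printf '%s\n' Bob Suzy Fred > names.txt      # > creates (or overwrites) names.txt

# < feeds the file to sort's standard input, so no cat is needed
sort < names.txt > sorted_names.txt

# >> appends to the file instead of overwriting it
echo "Zelda" >> sorted_names.txt
cat sorted_names.txt
# Bob
# Fred
# Suzy
# Zelda
```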

Grep it

There's a very useful command called grep that comes with almost every UNIX-based operating system.


The name grep comes from the ed editor command g/re/p: "globally search for a regular expression and print" the matching lines.


It will find the lines of a file which match a certain pattern.  This is insanely useful if you have to search for all instances of, say, FooBarWidgetFactory in your codebase.

$ cat debts.txt | wc -l
20000
$ cat debts.txt | grep "Johnny"
8/16 Johnny owes Maple $40
9/24 Nils owes Johnny $45
$ cat debts.txt | grep -n "Johnny"
2456:8/16 Johnny owes Maple $40
15689:9/24 Nils owes Johnny $45
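grep can also read files directly and search whole directory trees, which is how you would hunt for FooBarWidgetFactory across a codebase. A minimal sketch; the src/ layout and file contents are hypothetical:

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1
mkdir -p src/widgets
echo "class FooBarWidgetFactory {}" > src/widgets/factory.js
echo "new FooBarWidgetFactory();"   > src/main.js
echo "console.log('hi');"           > src/util.js

# grep can open the file itself, so piping through cat is optional
grep -n "FooBarWidgetFactory" src/main.js     # 1:new FooBarWidgetFactory();

# -r searches every file under a directory; -l prints only the names of matching files.
# The order -r visits files in is not guaranteed, so sort the result.
grep -rl "FooBarWidgetFactory" src | sort
# src/main.js
# src/widgets/factory.js
```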

Enter the bash

bash stands for "Bourne Again Shell", and it's the shell that will most likely be running by default if you happen to find yourself at a terminal.  There are other shells, such as zsh (which is pretty awesome), that offer slightly different features, but bash is extremely dominant.


Handy things to know

bash uses emacs-style keyboard shortcuts out of the box. As all good StarCraft players know, hotkeys are absolutely essential to operational efficiency, so learning the shortcuts will serve you very well as time goes on!


<C-a> : Move to the front of the prompt

<C-e> : Move to the end of the prompt

<M-f> : Move forward by a word

<M-b> : Move backward by a word

<C-k> : Kill (delete) everything on the prompt after the cursor

More handy things to know

You can run a program in the background by appending an ampersand (&) to the end of the command. This is very handy for, say, servers, which otherwise block the prompt until you stop them with <C-c>.

$ python -m SimpleHTTPServer
Serving HTTP on 0.0.0.0 port 8000 ...
^C
$ python -m SimpleHTTPServer &
$ echo "Yay, I can continue executing commands at this prompt"
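In a script, the same & trick pairs naturally with $! (the process ID of the last background job) and wait. A minimal sketch, using sleep as a stand-in for a slow program such as a server:

```shell
#!/bin/bash
# Start a slow command in the background; the script continues immediately
sleep 1 &
pid=$!                       # $! holds the process ID of the most recent background job
echo "sleep is running in the background as PID $pid"

# ... other work can happen here ...

# wait blocks until that background job finishes, then $? holds its exit status
wait "$pid"
echo "background job finished with status $?"
```

At an interactive prompt, jobs lists your background jobs and fg brings one back to the foreground.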

Joining commands with a double ampersand (&&) runs them in sequence, and each command runs only if the one before it succeeded.

$ cd ~/awesomescripts/dir/node/jsboilerstrap && ./boilerstrap.js
Running JS Boilerstrap...
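The difference between ;, &&, and || comes down to exit status (0 means success). A minimal sketch using the built-in true and false commands; the path /no/such/dir is a deliberately nonexistent placeholder:

```shell
#!/bin/bash
# ; runs the next command no matter what
false; echo "runs anyway"               # prints: runs anyway

# && runs the next command only if the previous one succeeded
true && echo "true succeeded"           # prints: true succeeded
false && echo "never printed"           # prints nothing

# || runs the next command only if the previous one failed
cd /no/such/dir 2>/dev/null || echo "cd failed, so the script was skipped"
```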

Even more handy things to know

!! expands to the last command you ran (bash prints the expanded command before running it).

$ echo "I'm awesome"
I'm awesome
$ !!
echo "I'm awesome"
I'm awesome
$ apt-get install quux
E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied)
E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?
$ sudo !!
sudo apt-get install quux

Note: some people discourage sudo !!, because it reruns your previous command with root privileges before you get a chance to review it. Make your own decisions, but remember that with great power comes great responsibility.


Even more handy things to know?

You can use the history command to see a numbered list of the commands you have typed at the prompt. Piping its output through grep is a quick way to find an old command, and you can rerun any entry by its number (just put a ! in front of the number).

$ history
1802 apt-get install node
1803 sudo apt-get install node
1804 cd ~/some_long/directory/path/to/a/script && ./run_the_script.sh
1805 fortune
1806 grep -r FooBarWidgetFactory
$ !1804
cd ~/some_long/directory/path/to/a/script && ./run_the_script.sh

sed

sed is a stream editor. You pipe text into it, it replaces the patterns you specify with other text, and it prints the result to standard output. Many people now use Python or Perl for tasks of this nature, but sed is still an extremely useful tool, and it's available out of the box on every UNIX under the sun.


$ echo "all your base are belong to us" | sed 's/us/me/g'
all your base are belong to me


sed can also edit files in place (with the -i flag) instead of writing the result to standard output.
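In-place editing is where GNU sed (Linux) and BSD sed (macOS) differ slightly: GNU accepts a bare -i, while BSD expects a backup suffix. Attaching a suffix directly, as in -i.bak, works with both. A minimal sketch; base.txt is a made-up file name:

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1
echo "all your base are belong to us" > base.txt

# -i edits the file in place; the attached ".bak" suffix keeps a backup of the original
sed -i.bak 's/us/me/g' base.txt

cat base.txt       # all your base are belong to me
cat base.txt.bak   # all your base are belong to us
```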

Let's talk a tiny bit about scripting


Much of bash's power, and one reason it has stayed popular for so long, is that it is highly scriptable. It is extremely useful to have a list of shell commands that you can run over and over at "the push of a button" to automate otherwise repetitive developer or QA tasks such as data setup.


bash also provides a variety of programming language constructs to extend and enhance this functionality.


Let's walk through the construction of a simple script to demonstrate this utility.

How do I know which song ID I'm listening to on SoundCloud?



On SoundCloud (a music sharing/listening website), each track is identified by an eight-digit number, which their publicly accessible APIs use.


I was curious to see if I could scrape the page at a song's SoundCloud URL to quickly extract this ID for use with their API.


Scripts start off with a shebang

Scripts, be they bash or otherwise, usually start with a line telling the system which interpreter should run them, and consequently how to understand their instructions.


This line starts with a "shebang" (pronounced shuh-bang), made of the characters # (the she) and ! (the bang), followed by the path to the interpreter to use.


bash scripts usually start with:

#!/bin/bash

Then, you just tell bash a sequence of commands to run, as if you were typing them into the command line yourself.

The SoundCloud script ended up looking like this:

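#!/bin/bash

# Swap https for http in the URL given as the first argument
URL_USE_HTTP=$(echo "$1" | sed 's/https/http/g')
# Fetch the page, keep the first data-sc-track="<digits>" match, then keep only the digits
SONGID=$(curl -s "$URL_USE_HTTP" \
  | grep -Eo 'data-sc-track="[0-9]+"' \
  | head -n 1 \
  | grep -Eo '[0-9]+')
echo "$SONGID"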

Variables in bash scripts are traditionally all caps, and are assigned with the = operator (with no spaces around it). You can interpolate a variable's value into a string or a command by prefixing its name with $, as with $URL_USE_HTTP above.
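A minimal sketch of variables and command substitution, runnable offline: the hard-coded HTML snippet is a made-up stand-in for a SoundCloud page (not real page markup), and the variable names GREETING, NAME, and HTML are hypothetical.

```shell
#!/bin/bash
# Assignment: no spaces around = (GREETING = "hi" would try to run a command named GREETING)
GREETING="hello"
NAME="world"

# $NAME (or ${NAME}) substitutes the value; double quotes keep it as one word
echo "$GREETING, $NAME"             # hello, world
echo "${GREETING}s"                 # hellos  (braces mark where the variable name ends)

# Command substitution: $(...) captures a command's output
# (backticks `...` are an older spelling of the same thing)
HTML='<div data-sc-track="12345678"></div>'   # hypothetical snippet, not a real page
SONGID=$(echo "$HTML" | grep -Eo 'data-sc-track="[0-9]+"' | head -n 1 | grep -Eo '[0-9]+')
echo "$SONGID"                      # 12345678
```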

Conditionals and loops


You have access to most of the same styles of conditionals and loops in bash as you do in full-fledged programming languages.


#!/bin/bash

echo "This script checks the existence of the messages file."
echo "Checking..."
if [ -f /var/log/messages ]; then
  echo "/var/log/messages exists."
fi
echo
echo "...done."
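The messages check above is a conditional; loops use the same keyword pairs (for ... do ... done, while ... do ... done). A minimal sketch in a scratch directory, with hypothetical file names:

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1
touch a.log b.log c.txt

# for: run the body once per item; here the items come from a wildcard
for f in *.log; do
  echo "found log file: $f"
done

# while: repeat as long as the test succeeds
count=3
while [ "$count" -gt 0 ]; do
  echo "countdown: $count"
  count=$((count - 1))
done

# if/else with a file test, in the same style as the messages check
if [ -f c.txt ]; then
  echo "c.txt exists"
else
  echo "c.txt is missing"
fi
```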

Running your script

To run your very own script, you first need to make it executable. This can be accomplished with

$ chmod +x my_script.sh

There are other ways to represent permissions that chmod understands as well. For example, you can set them numerically, using octal digits.
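Here is a sketch of chmod's numeric (octal) form, assuming the common 755 mode for a script. Each digit covers owner, group, and others in that order, and is the sum of read=4, write=2, execute=1. The script contents are a made-up placeholder.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1
printf '#!/bin/bash\necho "hi from my_script"\n' > my_script.sh

# 755 = rwx (4+2+1) for the owner, r-x (4+1) for the group, r-x (4+1) for everyone else
chmod 755 my_script.sh
ls -l my_script.sh | cut -c1-10    # -rwxr-xr-x

./my_script.sh                     # hi from my_script
```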


Then to run your script:

$ ./my_script.sh

One last thing: don't copy and paste from the interwebs into your terminal

See here: 

http://thejh.net/misc/website-terminal-copy-paste

The "bad guys" can and will try to trick you into running code / exploits which you will be helpless to defend against since you told the system it was okay!

ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go)"

You're at risk when using an installation procedure such as the above, and you should get into the habit of typing things from the Internet into the terminal by hand for security.



That's all, folks


Questions?

