Lecture/Practice 2 - Bash

Introduction to the (unix) command line: bash

Peer Herholz (he/him)
Postdoctoral researcher - NeuroDataScience lab at MNI/McGill, UNIQUE
Member - BIDS, ReproNim, Brainhack, Neuromod, OHBM SEA-SIG

26/04/2021

@peerherholz

Before we get started 1…


  • most of what you’ll see within this lecture was prepared by Ross Markello and further adapted by Peer Herholz

  • based on the Software Carpentries “Introduction to the Shell” under CC-BY 4.0

Before we get started 2…

We’re going to be working with a dataset from https://swcarpentry.github.io/shell-novice/data/data-shell.zip.

Download that file and unzip it on your Desktop!

(The link will be posted so you can just click on it.)

Goals

  • learn basic and efficient usage of the shell for various tasks

    • navigating directories

    • file handling: copy, paste, create, delete

What is the “shell”?

  • The shell is a command-line interface (CLI) to your computer

    • This is in contrast to the graphical user interfaces (GUIs) that you normally use!

  • The shell is also a scripting language that can be used to automate repetitive tasks

But what’s this “bash shell”?

It’s one of many available shells!

  • sh - Bourne SHell

  • ksh - Korn SHell

  • dash - Debian Almquist SHell

  • csh - C SHell

  • tcsh - TENEX C SHell

  • zsh - Z SHell

  • bash - Bourne Again SHell <– We’ll focus on this one!

WHY so many?

  • They all have different strengths / weaknesses

  • You will see many of them throughout much of neuroimaging software, too!

    • sh is most frequently used in FSL

    • csh/tcsh is very common in FreeSurfer and AFNI

So we’re going to focus on the bash shell?

Yes! It’s perhaps the most common shell, available on almost every OS:

  • It’s the default shell on most Linux systems

  • It’s the default shell in the Windows Subsystem for Linux (WSL)

  • It’s the default shell on macOS up to 10.14 (Mojave)

    • zsh is the new default as of macOS 10.15 (Catalina), for licensing reasons 🙄

    • But bash is still available!!

Alright, but why use the shell at all?

Isn’t the GUI good enough?

  • Yes, but the shell is very powerful

  • Sequences of shell commands can be strung together to quickly and reproducibly make powerful pipelines

  • Also, you need to use the shell to access remote machines / high-performance computing environments (like Compute Canada)

NOTE: We will not be able to cover all (or even most) aspects of the shell today.

But, we’ll get through some basics that you can build on going forward.

The (bash) shell

Now, let’s open up your terminal!

  • Windows: Open the Ubuntu application

  • Mac/Linux: Open the Terminal

When the shell is first opened, you are presented with a prompt, indicating that the shell is waiting for input:

$

The shell typically uses $ as the prompt, but may use a different symbol.

IMPORTANT: When typing commands, either in this lesson or from other sources, do not type the prompt, only the commands that follow it!

Am I using bash?

Let’s check! You can use the following command to determine what shell you’re using:

echo $SHELL

If that doesn’t say something like /bin/bash:

  • simply type bash and press Enter; then run echo $0, which should now print bash ($SHELL names your login shell, so echo $SHELL itself will not change)

  • there might be other ways depending on your OS/installation, please let us know

Note: The echo command does exactly what its name implies: it simply echoes whatever we provide it to the screen!

(It’s like print in Python / R or disp in MATLAB or printf in C or …)

What’s with the $SHELL?

  • Things prefixed with $ in bash are (mostly) environment variables

    • All programming languages have variables!

  • We can assign variables in bash but when we want to reference them we need to add the $ prefix

  • We’ll dig into this a bit more later, but by default our shell comes with some preset variables

    • $SHELL is one of them!
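
A minimal sketch of assigning and referencing a variable. The name greeting is made up for illustration; HOME is another preset variable that, like SHELL, is normally set for you.

```shell
# Assign a variable: no spaces around "=", and no "$" on the left-hand side
greeting="hello from bash"

# Reference it: the "$" prefix asks the shell for the stored value
echo "$greeting"

# Without the "$", echo just prints the literal word "greeting"
echo greeting

# Preset variables are referenced the same way
echo "$HOME"
```

Leaving out the $ is a common slip: the shell then treats the name as ordinary text rather than looking up its value.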

Soooo, let’s try our ~first~ second command in bash!

This command lists the contents of our current directory:

ls

What happens if we make a typo? Or if the program we want isn’t installed on our computer?

Will the computer magically understand what we were trying to do?

ks

Nope! But you will get a (moderately) helpful error message 😁
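
If you are unsure whether a command exists, you can ask the shell before running it. This optional sketch uses the built-in command -v, which is not covered elsewhere in this lecture; it succeeds when the shell knows the name and fails otherwise.

```shell
# Check whether each name is a command the shell can find.
# "ks" is the typo from above; on most systems it will not exist.
for name in ls ks; do
    if command -v "$name" > /dev/null 2>&1; then
        echo "$name: found"
    else
        echo "$name: not found (typo, or not installed?)"
    fi
done
```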

The cons of the CLI

  • You need to know the names of the commands you want to run!

  • Sometimes, commands are not immediately obvious

    • E.g., why ls over list_contents?

Key Points

  • A shell is a program whose primary purpose is to accept commands and run programs

  • The shell’s main advantages are its high action-to-keystroke ratio, its support for automating repetitive tasks, and its capacity to access remote machines

  • The shell’s main disadvantages are its primarily textual nature and how cryptic its commands and operation can be

Working with Files and Directories

How do we actually make new files and directories from the command line?

First, let’s remind ourselves of where we are:

cd ~/Desktop/data-shell
pwd
ls -F

Creating a directory

We can create new directories with the mkdir (make directory) command:

mkdir thesis

Since we provided a relative path, we can expect that to have been created in our current working directory:

ls -F

(You could have also opened up the file explorer and made a new folder that way, too!)

Good naming conventions

  1. Don’t use spaces

  2. Don’t begin the name with -

  3. Stick with letters, numbers, ., -, and _

    • That is, avoid other special characters like ~!@#$%^&*()

Creating a text file

Let’s navigate into our (empty) thesis directory and create a new file:

cd thesis

We can make a file via the following command:

touch draft.txt

touch creates an empty file. We can see that with ls -l:

ls -l
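
The steps so far can be tried safely in a scratch directory. This is only a sketch: mktemp -d (which creates an empty temporary directory) is an addition for illustration, so that your real data-shell folder stays untouched.

```shell
# Work in a throwaway directory (mktemp -d is not part of the lecture itself)
workdir=$(mktemp -d)
cd "$workdir"

mkdir thesis          # relative path: created inside the current directory
ls -F                 # directories are marked with a trailing /

cd thesis
touch draft.txt       # creates an empty file
ls -l                 # the size column for draft.txt shows 0
```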

Moving files and directories

Let’s start by going back to the data-shell directory:

cd ~/Desktop/data-shell

We now have a thesis/draft.txt file, which isn’t very informatively named. Let’s move it:

mv thesis/draft.txt thesis/quotes.txt

The first argument of mv is the file we’re moving, and the last argument is where we want it to go!

Let’s make sure that worked:

ls thesis

Note: we can provide more than two arguments to mv, as long as the final argument is a directory! That would mean “move all these things into this directory”.

Also note: mv is quite dangerous, because it will silently overwrite files if the destination already exists! Refer to the -i flag for “interactive” moving (with warnings!).
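
A small sketch of both notes, using made-up file names in a scratch directory. Instead of the interactive -i flag (which waits for you to answer), it checks for the destination with [ -e ... ] first; that check is one possible guard for scripts, not the only one.

```shell
# Scratch directory with made-up file names (assumptions for this sketch)
workdir=$(mktemp -d)
cd "$workdir"
mkdir notes
touch a.txt b.txt

# More than two arguments: everything before the last one moves INTO the last one
mv a.txt b.txt notes
ls notes

# Guard against silently overwriting: only move if the destination is free
echo "new text" > a.txt
if [ -e notes/a.txt ]; then
    echo "notes/a.txt already exists, not overwriting"
else
    mv a.txt notes/a.txt
fi
```

When typing commands by hand, mv -i gives you the same protection by asking before it overwrites anything.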

More on mv

Note that we can also use mv to move files to a different directory (rather than just renaming them):

mv thesis/quotes.txt .

The . means “the current directory”, so we should have moved quotes.txt out of the thesis directory into our current directory.

Let’s check that worked as expected:

ls thesis
ls quotes.txt

(Note: providing a filename to ls instead of a directory will list only that filename if it exists. Otherwise, it will throw an error.)

Exercise: Moving files to a new folder

After running the following commands, Jamie realizes that she put the files sucrose.dat and maltose.dat into the wrong folder. The files should have been placed in the raw folder.

$ ls -F
analyzed/ raw/
$ ls -F analyzed
fructose.dat glucose.dat maltose.dat sucrose.dat
$ cd analyzed

Fill in the blanks to move these files to the raw/ folder (i.e. the one she forgot to put them in):

$ mv sucrose.dat maltose.dat ____/____
mv sucrose.dat maltose.dat ../raw

Remember, the .. refers to the parent directory (i.e., one above the current directory)

Copying files and directories

The cp (copy) command is like mv, but copies instead of moving!

cp quotes.txt thesis/quotations.txt
ls quotes.txt thesis/quotations.txt

We can use the -r (recursive) flag to copy a directory and all its contents:

cp -r thesis thesis_backup
ls thesis thesis_backup
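
The same two copies, sketched in a scratch directory so they can be run without the lesson data (the file content is made up):

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir thesis
echo "a quote" > quotes.txt

cp quotes.txt thesis/quotations.txt   # the original stays where it was
cp -r thesis thesis_backup            # copies the directory and everything in it

ls quotes.txt thesis thesis_backup
```

Unlike mv, the source is still there afterwards, so you end up with three copies of the same text.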

Exercise: Renaming files

Suppose that you created a plain-text file in your current directory to contain a list of the statistical tests you will need to do to analyze your data, and named it: statstics.txt

After creating and saving this file you realize you misspelled the filename! You want to correct the mistake and remove the incorrectly named file. Which of the following commands could you use to do so?

  1. cp statstics.txt statistics.txt

  2. mv statstics.txt statistics.txt

  3. mv statstics.txt .

  4. cp statstics.txt .

  1. No: this would create a file with the correct name but would not remove the incorrectly named file.

  2. Yes: this would rename the file!

  3. No: the . indicates where to move the file but does not provide a new name.

  4. No: the . indicates where to copy the file but does not provide a new name.

Moving and Copying

What is the output of the closing ls command in the sequence shown below?

$ pwd
/Users/jamie/data
$ ls
proteins.dat
$ mkdir recombine
$ mv proteins.dat recombine
$ cp recombine/proteins.dat ../proteins-saved.dat
$ ls

  1. proteins-saved.dat recombine

  2. recombine

  3. proteins.dat recombine

  4. proteins-saved.dat

  1. No: proteins-saved.dat is located at /Users/jamie

  2. Yes!

  3. No: proteins.dat is located at /Users/jamie/data/recombine

  4. No: proteins-saved.dat is located at /Users/jamie
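
You can check this answer yourself by replaying the sequence. In this sketch a temporary directory (the variable fakehome, made up for illustration) stands in for /Users/jamie.

```shell
# A temporary directory stands in for /Users/jamie (assumption for this sketch)
fakehome=$(mktemp -d)
mkdir "$fakehome/data"
cd "$fakehome/data"
touch proteins.dat

mkdir recombine
mv proteins.dat recombine
cp recombine/proteins.dat ../proteins-saved.dat

ls        # contents of data: only recombine is left
ls ..     # the copy landed one level up, next to data
```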

Removing files

Let’s go back to data-shell and remove the quotes.txt file we created:

cd ~/Desktop/data-shell
rm quotes.txt

The rm command deletes files. Let’s check that the file is gone:

ls quotes.txt

Deleting is FOREVER 💀💀

  • The shell DOES NOT HAVE A TRASH BIN.

  • You CANNOT recover files that have been deleted with rm

  • But, you can use the -i flag to do things a bit more safely!

    • This will prompt you to type Y or N before every file that is going to be deleted.

Removing directories

Let’s try and remove the thesis directory:

rm thesis

rm only works on files, by default, but we can tell it to recursively delete a directory and all its contents with the -r flag:

rm -r thesis

Because deleting is forever 💀💀, the rm -r command should be used with GREAT CAUTION.
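
A cautious way to get a feel for rm is to practise in a scratch directory first. The sketch below uses mktemp -d (an addition for illustration) so that only throwaway files are ever deleted.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir thesis
touch thesis/draft.txt quotes.txt

rm quotes.txt                 # files can be removed directly
rm thesis 2> /dev/null || echo "rm refused: thesis is a directory"
rm -r thesis                  # -r removes the directory and its contents

ls -A                         # prints nothing: the scratch directory is empty again
```

Adding -i to either rm call would make it ask for confirmation before each deletion.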

Operations with multiple files and directories

Oftentimes you need to copy or move several files at once. You can do this by specifying a list of filenames.

Exercise: Copy with Multiple Filenames

(Work through these in the data-shell/data directory.)

In the example below, what does cp do when given several filenames and a directory name?

$ mkdir backup
$ cp amino-acids.txt animals.txt backup/

What does cp do when given three or more filenames?

$ ls
amino-acids.txt  animals.txt  backup/  elements/  morse.txt  pdb/  planets.txt  salmon.txt  sunspot.txt
$ cp amino-acids.txt animals.txt morse.txt

  1. When given multiple filenames followed by a directory, all the files are copied into the directory.

  2. When given multiple filenames with no directory, cp throws an error:

cp: target morse.txt is not a directory
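
Both cases can be reproduced without the lesson data. This sketch uses empty stand-in files with the same names in a scratch directory; the exact wording of the error message may differ between systems.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch amino-acids.txt animals.txt morse.txt   # empty stand-ins for the real data files
mkdir backup

# Last argument is a directory: both files are copied into it
cp amino-acids.txt animals.txt backup/
ls backup

# Last argument is a regular file: cp reports an error and copies nothing
cp amino-acids.txt animals.txt morse.txt || echo "cp failed as expected"
```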

Using wildcards for accessing multiple files at once

* is a wildcard which matches zero or more characters.

Consider the data-shell/molecules directory:

ls molecules/*

This matches every file in the molecules directory.

ls molecules/*pdb

This matches every file in the molecules directory whose name ends in pdb (here, the .pdb files).

ls molecules/p*.pdb

This matches all files in the molecules directory starting with p and ending with .pdb

Using wildcards for accessing multiple files at once (cont’d)

? is a wildcard matching exactly one character.

ls molecules/?ethane.pdb

This matches any file in molecules that has one character followed by ethane.pdb. Compare to:

ls molecules/*ethane.pdb

Which matches any file in molecules that ends in ethane.pdb.

Using wildcards for accessing multiple files at once (cont’d)

You can string wildcards together, too!

ls molecules/???ane.pdb

This matches any file in molecules that has any three characters and ends in ane.pdb

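Wildcards are said to be “expanded” to create a list of matching files. This happens before running the relevant command. For example, the following command will fail because no file name in molecules ends in pdf: the pattern is handed to ls unchanged, and ls reports that no such file exists: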

ls molecules/*pdf
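
To see expansion in action, echo is handy: it prints exactly what the shell turned a pattern into. The sketch below builds a stand-in molecules directory with empty files; the file names are assumed to mirror the lesson data.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/molecules"
cd "$workdir"
# Empty stand-in files; the names are assumed to mirror the lesson data
touch molecules/cubane.pdb molecules/ethane.pdb molecules/methane.pdb \
      molecules/octane.pdb molecules/pentane.pdb molecules/propane.pdb

echo molecules/p*.pdb        # molecules/pentane.pdb molecules/propane.pdb
echo molecules/?ethane.pdb   # molecules/methane.pdb
echo molecules/???ane.pdb    # molecules/cubane.pdb molecules/ethane.pdb molecules/octane.pdb

# No file ends in pdf, so bash (with default settings) leaves the pattern as typed
echo molecules/*pdf
```

That last line is why ls fails: it receives the literal text molecules/*pdf and looks for a file with that exact name.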

Exercise: List filenames matching a pattern

When run in the molecules directory, which ls command(s) will produce this output?

ethane.pdb methane.pdb

  1. ls *t*ane.pdb

  2. ls *t?ne.*

  3. ls *t??ne.pdb

  4. ls ethane.*

  1. No: this will give ethane.pdb methane.pdb octane.pdb pentane.pdb

  2. No: this will give octane.pdb pentane.pdb

  3. Yes!

  4. No: this only shows files starting with ethane.
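
If you want to double-check these answers, the four patterns can be tried in one go. This sketch again uses empty stand-in files (names assumed to mirror the lesson data) and prints what each pattern expands to.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb

for pattern in '*t*ane.pdb' '*t?ne.*' '*t??ne.pdb' 'ethane.*'; do
    # $pattern is left unquoted on purpose so the shell expands it here
    echo "$pattern ->" $pattern
done
```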

Key points

  • cp old new copies a file

  • mkdir path creates a new directory

  • mv old new moves (renames) a file or directory

  • rm path removes (deletes) a file

  • * matches zero or more characters in a filename, so *.txt matches all files ending in .txt

  • ? matches any single character in a filename, so ?.txt matches a.txt but not any.txt

  • The shell does not have a trash bin: once something is deleted, it’s really gone

Summary

  • The bash shell is very powerful!

  • It offers a command-line interface to your computer and file system

  • It makes it easy to operate on files quickly and efficiently (copying, renaming, etc.)

  • Sequences of shell commands can be strung together to quickly and reproducibly make powerful pipelines

Soapbox

  • Bash is fantastic and you will (likely) find yourself using it a lot!

  • However, for complex pipelines and programs we would strongly encourage you to use a “newer” programming language

    • Like Python, which will also be discussed in this workshop!

  • There are a number of reasons for this (e.g., better control flow, error handling, and debugging)

References

There are lots of excellent resources online for learning more about bash:

Finding Things

Oftentimes, our file system can be quite complex, with sub-directories inside sub-directories inside sub-directories.

What happens if we want to find one (or several) files, without having to type ls hundreds or thousands of times?

First, let’s navigate to the data-shell/writing directory:

cd ~/Desktop/data-shell/writing

Let’s get our bearings with ls:

ls

Unfortunately, this doesn’t list any of the files in the sub-directories. Enter find:

find .

Remember, . means “the current working directory”. Here, find provides us a full list of the entire directory structure!

Filtering find

We can add some helpful options to find to filter things a bit:

find . -type d

This will list only the directories underneath our current directory (including sub-directories).

Alternatively, we can list only the files with:

find . -type f

We can also match things by name:

find . -name *.txt

Why didn’t this also get the other files??

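Remember: wildcards are expanded BEFORE being passed to the command. The shell replaced *.txt with the matching file names in the current directory, so find never saw the pattern itself. So, we really want: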

find . -name "*.txt"
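
The difference is easy to reproduce in a scratch directory. The file and directory names below are made up for this sketch; what matters is that exactly one .txt file sits in the current directory and the others are in sub-directories.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p data thesis
touch haiku.txt data/one.txt thesis/notes.txt   # made-up file names

# Unquoted: the shell expands *.txt to haiku.txt first, so find only looks for that name
find . -name *.txt

# Quoted: find receives the pattern itself and applies it in every sub-directory
find . -name "*.txt"
```

The first call prints only ./haiku.txt, while the second lists all three files.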

Executing with find

What if we want to perform some operation on the output of our find command? Say, list the file sizes for each file (as in ls -lh)?

We can do that with a bit of extra work:

find . -name "*.txt" -exec ls -lh {} \;

Note the very funky syntax:

  • The -exec option means execute the following command,

  • ls -lh is the command we want to execute,

  • {} signifies where the output of find should go so as to be provided to the command we’re executing, and

  • \; means “this is the end of the command we want to execute”
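
A runnable sketch of the same idea, using made-up files in a scratch directory. It also shows a second standard way to end -exec, a plus sign, which is not used elsewhere in this lecture: it hands all the files to a single call of the command instead of running the command once per file.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p data
echo "short" > haiku.txt
echo "a slightly longer line" > data/one.txt

# \; runs ls -lh once for every file that find reports
find . -name "*.txt" -exec ls -lh {} \;

# + collects the files and runs ls -lh once with all of them as arguments
find . -name "*.txt" -exec ls -lh {} +
```

For a handful of files the two forms give the same listing; with many files the + form is usually faster because far fewer programs are started.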

We can also hand the output of find to the ls -lh command using command substitution:

ls -lh $( find . -name "*.txt" )

Here, the $( ) syntax means “run this command first and insert its output here”, so ls -lh is provided the output of the find . -name "*.txt" command as arguments.
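
The $( ) syntax works anywhere the shell expects text, including variable assignments. A minimal sketch with made-up file names (the variable found is also just for illustration):

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch haiku.txt notes.txt

# The inner command runs first; its output is stored as text in the variable
found=$(find . -name "*.txt" | sort)
echo "$found"

# Used directly as arguments: fine here because the names contain no spaces
ls -lh $(find . -name "*.txt")
```

One caveat: the shell splits the inserted text at spaces, so a file called my notes.txt would reach ls as two separate arguments. The -exec form does not have this problem.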