Command-line tools

Preamble

Shell, terminal and console

Every Operating System (Linux, MacOS, Windows, …) comes with a program able to interpret and run command lines.

  • The shell is the program that processes commands and returns output, e.g., Bash, zsh, etc…

  • A terminal refers to a wrapper program that runs a shell.

  • The console is a special sort of terminal (low-level).

Reference: super user: shell, console and terminal

Prompt

As you already know, the bash prompt is a $ sign when you are a standard user. When you are an administrator (often called root user) the prompt is a #.

EXERCISE: recognize different prompts

Find the different prompts for R, Python and the terminal on Windows

The Unix directory structure

Reference: The Linux Directory Structure, Explained, by Chris Hoffman.

Some aliases:

  • ~ is an alias to your home directory.
  • . is an alias to the current directory.
  • .. is an alias to the parent directory.

For instance,

$ cd ~
$ pwd
$ cd ../../home/../etc/../home/
$ pwd
EXERCISE:
  1. What is the difference between cd ./toto/tata, cd toto/tata, cd ~/toto/tata and cd /toto/tata
  2. Use the which command to determine which instance of Python is used when you use the python command. Same question with python2.

Getting help

To get some help on a command, please use the man command. You may also use the --help option as in

$ man ls
$ ls --help

Paging programs

A paging program displays, one windowful at a time, the contents of a file on a terminal. It pauses after each windowful and prints on the window status line, the screen, the file name, the current line number, and the percentage of the file so far displayed. This is not an editor (no modification of the file can be done).

more (deprecated) less (best choice) most (default on your machine, more features than less, but bad keybindings).

$ man less
$ man most

Useful tips:

  • To search for a word type s. To go to the next (resp. previous) occurrence type n (resp. N).

  • [less only] to go down type j, to go up type k.

  • To go to the beginning of a file, type g, to the end G.

  • To quit type q.

  • to change the default paging program to less.

    $ export MANPAGER=less

Reference: What are the differences between most, more and less? on StackExchange.

Pattern matching (part I): Pathname expansion (a.k.a. globbing)

It is often very useful to select some files whose filename contains (or not!) a specific pattern. Shells (bash, zsh, etc.) come with a “pattern matching” syntax allowing us to express such constraints on the filenames.

This syntax is commonly called globs and is quite simple (more advanced syntaxes called regexp will be introduced later on). globs are shell commands and can be transmitted to various programs (ls, grep, find, etc…). For instance to display all the files with an extension in .txt in the current directory:

$ ls *.txt

Most shells have similar glob rules, and they usually consist of:

  • A marker for zero-or-more characters: *
  • A marker for exactly one character: ?
  • A way to express one of a certain set of characters: [...]
  • A way to express a choice of one or more strings: {...,...}
  • A way to escape any of the above special characters: \
EXERCISE: listing
  1. Go to /usr/lib/R/bin/ and list every file starting with the letter R and containing i
  2. Go to /usr/lib/R/bin/ and list every file containing the letter c, then any character, and then an n (e.g. config or javareconf)
  3. Go to /var/log/ and list every file with a double extension: the former one is a dot followed by a number, and the last one is .log (e.g. Xorg.3.log or Xorg.0.log)
  4. Got to /var/log/ and list every file with a name starting with an a and containing at least a digit

Listing files

To list the files in a folder use the command ls.

EXERCISE: ls options
  1. Describe the option -a.
  2. Describe the option -R.
  3. Describe the option -lh.
  4. List all the files in the directory /usr/lib/ without cd in it.

The file command can be used to display the information of a file (if not given by the extension itself).

EXERCISE: ls on a directory
  1. List all the files in the directory /usr/lib/R/bin and sort them by size.
  2. Display the type information of the files in /var/log/ with one call to file.

Users

To list the groups you belong to, in a terminal use the command

$ groups

To list the connected user on your machine

$ w
$ who

File permissions

Each file has an owner (a user) and a group (a group of users). To change the user that owns use chown and to change the group use chgrp. There are 3 types of permissions:

  1. read r
  2. write w
  3. execute x

There are three permissions triads

  1. First triad: what the user can do (the letter u)
  2. Second triad: what the group members can do (the letter g)
  3. Third triad: what other users can do (the letter o)

Each triad

  1. Tirst character r: readable
  2. Second character w: writable
  3. Third character x: executable

To change the permissions of a file, use the chmod. For instance, to add execution x right to the owner u:

$ chmod u+x toto.txt
EXERCISE: permissions
  1. Create an empty file called foo.py in the current directory
  2. Display its owner, group and permissions
  3. Change the group of foo.py to pulse
  4. Add read and write permissions to users in the group pulse

Reference: File system permission on Wikipedia. See also chown and chgrp.

Environment variables

An environment variable (in short env or envs) is a dynamic-named value that can affect the way running processes will behave on a computer. Many options of bash may be changed with the command envs. To print all the defined envs:

$ printenv

To display a single variable, you may use the prefix $. For instance, to display the content of PATH

$ echo ${PATH}

To set a new variable (in bash)

$ export ENV_NAME=toto:tata

Lists are often separated by :. To append a new value at the end

$ export ENV_NAME=${ENV_NAME}:tutu
$ echo ${ENV_NAME}

Reference: How To Read and Set Environmental and Shell Variables on Linux by Justin Ellingwood.

EXERCISE: path
  1. Display the PATH env
  2. Is the order of the list important?

Useful tips: To avoid setting up an env every time you open a terminal, you can append the export MYENV=xxxxx command to the ~/.bashrc file.

Text editor

In bash, many configuration files are in fact text files. You may need to choose a text editor to modify them. Very powerful (and thus complicated) text editors exist: emacs, vim, but we will focus on nano (gedit is another alternative):

$ nano

or joe (default on your system).

EXERCISE: default editor
  1. Set nano as your default text editor

Useful unix commands

  • List files and get information: ls, file, find
  • Display text content: echo, cat, head, tail, grep, fgrep, rgrep
  • File handling: touch, mv, cp, rsync, rename
  • Unix admin: which, who, top, htop, kill, pkill, killall

System

Getting system information

To display the system information

$ uname -a

To show the system hostname you may use hostname command.

To show information about your processor use lscpu and to list the devices connected to your machine use lspci.

EXERCISE: hardware info
  1. Determine how many physical cores you have on your machine.
  2. Determine the vendor of the network card of your machine.

Process

Here we learn how to use ps, top, htop, kill, pkill, etc.

Reference: Tutorials Point: Unix/Linux - Process management

EXERCISE: Linux shortcuts
  1. Describe the effect of Ctrl+C in a terminal
  2. Describe the effect of Ctrl+Z in a terminal
  3. Describe the effect of Ctrl+D in a terminal

Display text content

Get the data

The dataset we are going to use is a modified version of the dataset available on the data.gouv.fr platform. We will focus on bicycle accidents in France between 2011 and 2018.

EXERCISE: downloading using command lines
  1. Create a folder data_bicycle and cd to it.
  2. Download the compressed database (this is a .xz file) available at the following URL: as bicycle_db.csv.xz (use the option -O of wget).
  3. Uncompress the .xz file using the xz command (use documentation!)

Text commands: tail, head, cat, wc and split

Please read the manual of tail, head, cat, wc and split

EXERCISE: data analysis in bash
  1. Use the word count wc command to display the number of lines of bicycle_db.csv
  2. Display the 53 first line with the head command. Same with the 30 last lines (see tail)
  3. Use the split command and its options -d -l and --additional-suffix to create files with a maximum number of lines of 10000 (e.g., : if the number of lines is 55379, you should get only 6 files with names bike00.csv, …, bike05.csv)

The grep command

grep prints lines of a file matching a pattern (regex).

$ man grep
EXERCISE: grep
  1. Count the number of accidents in 2015 using the command grep (hint: remarks that each line starts with the string "YYYY where YYYY is the year)
  2. Display the line number of the accident occurring on a Wednesday, in October 2017 using a regular expression.

The find command

The find command searches for files in a directory hierarchy. Read the manual. For instance, the following command lists all the files in /usr/lib/ containing the qt5 string in its name:

$ find /usr/lib/ -name "*qt5*" -type f
EXERCISE: permission on files
  1. What is the aim of the -exec option?
  2. Change the permissions of any file with extensions .csv in your home to 777

Reference: TecMint, 35 Practical Examples of Linux Find Command

Pipes and redirections

stream image

The I/O of any program launch through the bash is organized in three data streams:

  • STDIN (0): standard input (input)
  • STDOUT (1): standard output (data output by the command and printed in the terminal)
  • STDERR (2): standard error (reserved for error messages, also printed in the terminal)

Piping and redirection is the process used to connect these streams between programs and files.

Reference: Piping and Redirection! by Ryan Chadwick.

Pipes

In bash, the pipe operator is denoted |. It allows one to compose (mathematically) the output of a program as an input of another one. For instance to display the 10 largest files given by du (disk use)

$ du | sort -nr | head

or display it in a pager

$ du | sort -nr | less
EXERCISE: head and tail
  1. Display the last 15 accidents occurring with Vent fort condition
  2. Display the lines with type of crossing being Intersection en X or Intersection en T of the accident occurring in 2015.

Redirection

The operator > redirects the stdout of a command (LHS) into a file (RHS). Warning! it erases the file content. The operator >> appends the output of the LHS to a file.

$ ls /etc > toto.txt
$ cat toto.txt
$ wc -l toto.txt >> toto.txt
$ cat toto.txt

Finally, the operator < reads from the file (RHS) and sends the content to stdin (LHS)

$ wc -l < toto.txt
EXERCISE: CSV creation
  1. Create a single file bike2016.csv containing all the accidents that occurred in 2016.
  2. Append the accidents from 2017 to the previous file and then rename it bike2016_17.csv.

The xargs command

A Unix killer feature! xargs reads items from the standard input and executes a command given by the user on each component of this list. For instance, this command

echo 'one two three' | xargs mkdir

creates 3 folders named one, two and three. Caveat: If the list items contain spaces or newline characters, it may behave badly with the xargs command. There is a special option -0 or -d to help the end user deal with this.

The most common usage of xargs is to use it with the find command. This uses find to search for files or directories and then uses xargs to operate on the results. Typical examples of this are changing the ownership of files or moving files.

find and xargs can be used together to operate on files that match certain attributes. In the following example files older than two weeks in the temp folder are found and then piped to the xargs command which runs the rm command on each file and removes them.

find /tmp -mtime +14 | xargs rm

Reference: Examples with xargs

Pattern matching (part II): Regexp

A regular expression (shortened as regex or regexp; also referred to as rational expression) is a sequence of characters that define a search pattern. Many languages implement such syntaxes (beware, there may be some differences!). Some of the most common regular expressions (shared by almost all implementations) are

  • \ escape character
  • ^ start of a line
  • . any single character
  • $ end of line
  • x* zero or more occurrence of character x
  • x+ one or more occurrences of character x
  • x? zero or one occurrence of character x
  • x{n} exactly n occurrence of character x
  • [...] range of characters (e.g. [a-z], [A-Z], [a-zA-Z], [0-9], etc…)
  • [^...] forbidden characters range
  • (...) marked subexpression. The string matched within the parentheses can be recalled later (see the next entry, ). A marked subexpression is also called a block or capturing group. …

For instance, to capture all the words starting with a capital letter in a text, you may use the regexp:

([A-Z][a-zA-Z0-9_]*)+

References:

  • regexr.
  • Wikipedia on regular expressions.
  • The sed, awk programs and the perl language documentations.
EXERCISE: regexp
  1. Go to https://regex101.com/ and copy/paste the following list (in the TEST STRING frame):
'01 !!!!!!!.flac'
'02 bad guy.flac'
'03 xanny.flac'
'04 you should see me in a crown.flac'
'05 all the good girls go to hell.flac'
'06 wish you were gay.flac'
"07 when the party's over.flac"
'08 8.flac'
'09 my strange addiction.flac'
'10 bury a friend.flac'
'11 ilomilo.flac'
'12 listen before i go.flac'
'13 i love you.flac'
'14 goodbye.flac'
  1. Why the name of the 7th song is double-quoted (” instead of ’)?

  2. Capture with a regexp all the song names (between the track number and the extension). You should get this in the MATCH INFORMATION frame on the right:

result of regexp capture

Secure Shell

Back to top