Linux Introduction
Contents
Shell Scripts
More Advanced Tools
Editing Files
What is Linux?
Linux is a computer operating system created by Linus Torvalds in 1991 that has become an essential part of modern computing all around the world. A key philosophy of Linux is that the source code is "open", meaning that software developers can add to or change the software as needed. While the term Linux mainly refers to the operating system kernel that interacts with computer hardware, over time, thousands of other open source software packages have been developed to run on Linux. From cloud computing to cell phones, Linux and open source software have made possible many of the key technologies we depend on.
Linux has become important in research computing because it can run on a huge variety of computer hardware, and it provides a rich assortment of software development tools. This is particularly true in High Performance Computing clusters where essentially all clusters today run Linux.
Basic Commands and Concepts
Topics
The Shell
Print working directory [ pwd ]
Change directory [ cd ]
The Linux directory structure
Absolute versus Relative Paths
Parent directories and ..
File and directory names on Linux
Creating and removing directories [ mkdir, rmdir ]
Listing files and directories [ ls ]
Manual pages [ man ]
Long listings of files and directories [ ls -l ]
File and directory permissions [ chmod ]
Copying files and directories [ cp ]
Using Rsync to copy files and directories [ rsync ]
Move or rename files and directories [ mv ]
Removing files [ rm ]
The Shell
The most direct way to interact with Linux is by typing commands in a shell. A shell is the program that interprets the commands you type and executes them. By default, users will automatically be running a shell called bash when they log in using SSH. When you first log in to the cluster head node using SSH, you will be given a shell session with a prompt similar to this:
[jedicker@nova ~]$
This prompt shows the name of the account that is logged in (jedicker), the name of the computer (nova), and the current working directory (in Linux, a folder is known as a directory, and the tilde ~ is shorthand for your home directory). Commands are entered after the $.
pwd - Print working directory
When you first log in, your working directory location will be set to your home directory. In Linux, every user must have a home directory for storing personal settings files. The home directory can store regular data files as well, though we strongly recommend using your group's work directory under the /work path. On most Linux systems, a user's home directory is set to /home/<username>, where the <username> is your account name. To see what your current working directory is, use the pwd command:
$ pwd
/home/jedicker
Also note that file paths in Linux use forward slashes (/) to separate different parts of the path, while Windows uses a backslash (\).
The Linux directory structure
This is a good time to talk about the general layout of directories on Linux. It's not too complicated. It can be helpful to think of the file system structure as an upside down tree with the base or root of the tree at the top, and the branches at the bottom. At the top of the structure is the "root" directory represented by the / character. Beneath the root directory you will usually find the following branches or sub-directory names:
/bin - In Linux, programs are often placed in a directory called bin. There are often multiple bin directory paths on a system including /usr/bin and /usr/local/bin.
/boot - This is where Linux stores its boot software.
/dev - This is a directory where "device files" used by the operating system kernel are located.
/etc - In Linux, almost all of the system configuration files are stored under /etc.
/home - Home directories for all regular users are stored under /home/<username>
/lib - A lib directory usually stores software libraries that may be shared by one or more applications.
/opt - A common location for installing third-party software (often commercial applications).
/root - The home directory of the superuser account called "root". On Linux, the root account owns all of the operating system files and has the power to do just about anything on the system.
/usr - This is the path under which most Linux user software is installed, including /usr/bin and /usr/local/bin.
/work - On the ISU HPC clusters, the /work directory is where all research group work directories are located.
cd - Change directory
To change your working directory, use the cd command.
$ cd /work/ccresearch/jedicker
$ pwd
/work/ccresearch/jedicker
Absolute vs. Relative Paths
When specifying a path to a directory or file, you can use an absolute path or a relative path. The difference is that an absolute path always starts with a / while a relative path does not. Relative paths are always assumed to be relative to the working directory. Here's a simple example. Let's say I'd like to go to my /work/ccresearch/jedicker/project1 folder. Using an absolute path, I can go directly there:
$ cd /work/ccresearch/jedicker/project1
$ pwd
/work/ccresearch/jedicker/project1
But I can also do:
$ cd /work/ccresearch
$ pwd
/work/ccresearch
$ cd jedicker/project1
$ pwd
/work/ccresearch/jedicker/project1
In the second cd command above, jedicker/project1 is a relative path. It refers to the jedicker/project1 directory underneath the working directory, /work/ccresearch.
Parent Directories and ..
When you want to go to the parent directory of the current directory, you can use .. to mean "the parent directory".
$ pwd
/work/ccresearch/jedicker/project1
$ cd ..
$ pwd
/work/ccresearch/jedicker
You can use the .. shorthand anywhere in a path to refer to the parent directory of a directory:
$ pwd
/work/ccresearch/jedicker/project1
$ cd ../../..
$ pwd
/work
$ cd /work/ccresearch/jedicker/project1/../..
$ pwd
/work/ccresearch
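The path rules above can be tried safely in a scratch directory. The following is only a sketch: the directory names group, user, and project1 are made up for illustration, and mktemp -d is used to create an empty temporary directory so that nothing in your real directories is touched.

```shell
# Hypothetical scratch tree; mktemp -d creates a fresh, empty directory.
start_dir=$(pwd)
demo=$(mktemp -d)
mkdir -p "$demo/group/user/project1"

cd "$demo/group"        # absolute path: it starts with /
cd user/project1        # relative path: resolved from the working directory
rel_result=$(pwd)

cd ../..                # up two levels, back to the group directory
up_result=$(pwd)

echo "$rel_result"
echo "$up_result"

# Clean up the scratch tree.
cd "$start_dir"
rm -r "$demo"
```

The first path printed ends in group/user/project1 and the second ends in group, which shows that the relative path and the .. shorthand were both resolved from the working directory at the time each cd ran.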
File and directory names on Linux
There are some important things to understand about the names of files and directories in Linux:
- Files and directory names are case sensitive. This means that upper- and lower-case letters matter:
$ touch ABC.txt
$ ls ABC.txt
ABC.txt
$ ls abc.txt
ls: cannot access 'abc.txt': No such file or directory
The file abc.txt is NOT the same as ABC.txt.
- Avoid using spaces in names. In Linux, using spaces in file and directory names is generally frowned upon. Although you can create files and directories with spaces in their names, when you use these files on the command line you will often need to put quotes around the name of the file to avoid confusion. This can be tedious.
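To see why spaces are tedious, here is a small sketch run in a scratch directory (the file name "my results.txt" is just an illustrative choice). Without quotes, the shell splits the name at the space and treats it as two separate arguments.

```shell
start_dir=$(pwd)
demo=$(mktemp -d)       # scratch directory so no real files are affected
cd "$demo"

touch "my results.txt"  # quoted: one file whose name contains a space
touch my results.txt    # unquoted: two files, "my" and "results.txt"

ls
file_count=$(ls | wc -l)
echo "number of files: $file_count"

cd "$start_dir"
rm -r "$demo"
```

The directory ends up with three files rather than one, which is the kind of confusion that quoting (or avoiding spaces altogether) prevents.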
Creating and removing directories [ mkdir, rmdir ]
The mkdir command is used to create a directory. The rmdir command is used to remove a directory. The path to the directory can be a relative or absolute path.
mkdir
To create a directory called simulation1 in the current directory using a relative path:
$ mkdir simulation1
To create the directory using an absolute path:
$ mkdir /work/ccresearch/jedicker/simulation1
rmdir
The rmdir command can be used to remove a directory, though the directory must be empty:
$ rmdir /work/ccresearch/jedicker/simulation1
rm -r
The rm command with the -r (recursive) option can also be used to remove a directory. The command below will remove the simulation1 directory from /work/ccresearch/jedicker:
$ rm -r /work/ccresearch/jedicker/simulation1
The above will remove simulation1 and any files/folders it contains.
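The difference between rmdir and rm -r can be sketched as follows. This example works in a temporary directory created with mktemp -d, and the file name output.dat is hypothetical.

```shell
demo=$(mktemp -d)                       # scratch directory
mkdir "$demo/simulation1"
touch "$demo/simulation1/output.dat"    # the directory is now non-empty

# rmdir only removes empty directories, so this attempt is expected to fail.
if rmdir "$demo/simulation1" 2>/dev/null; then
    rmdir_result="removed"
else
    rmdir_result="refused"
fi

# rm -r removes the directory together with everything inside it.
rm -r "$demo/simulation1"
if [ -e "$demo/simulation1" ]; then
    rm_result="still there"
else
    rm_result="removed"
fi

echo "rmdir: $rmdir_result, rm -r: $rm_result"
rm -r "$demo"
```

Because rm -r deletes without asking, it is worth double-checking the path before pressing Enter.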
Listing files and directories [ ls ]
The ls command is used to list files and directories. To list what's in your current working directory, just type
$ ls
This will list any files or directories in the current working directory except any files and directories whose name begins with a dot (.). Linux has traditionally stored configuration settings for various applications inside files or directories that begin with a dot. By default, these "dot files" are not shown when you type ls.
To list all files, including those that begin with a dot, you can use the -a option:
$ ls -a
You can list the contents of any path:
$ ls /work/ccresearch/jedicker
You can also list files whose name matches a particular pattern. To demonstrate this, we'll first create an empty directory:
$ mkdir ~/testdir
$ cd ~/testdir
$ ls
Note that since the testdir directory has no files in it, the ls command above will have no output. Next, we will use the touch command to create a bunch of empty text files. The primary purpose of the touch command is to update the timestamps on a file. However, if the file doesn't exist, touch will create the desired file with nothing in it, a zero length file. So we can use touch to easily create several (empty) files:
$ touch apple.txt orange.doc banana.csv file1.txt file2.txt file3.txt
Now let's list all the contents of testdir:
$ ls
apple.txt banana.csv file1.txt file2.txt file3.txt orange.doc
You can also use wildcard patterns in the file name. For instance, to list all files that begin with "file":
$ ls file*
file1.txt file2.txt file3.txt
List all files that end with .txt:
$ ls *.txt
apple.txt file1.txt file2.txt file3.txt
In the above commands, the asterisk ( * ) means "match zero or more characters". Like many commands in Linux, the ls command has a number of useful options. Before we delve into these options, this is a good time to mention man pages.
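It can help to know that the shell, not ls, expands wildcard patterns before the command runs. The sketch below uses echo to show exactly what a pattern expands to. It works in a scratch directory, and the extra file named file is added only to illustrate one detail.

```shell
start_dir=$(pwd)
demo=$(mktemp -d)       # scratch directory
cd "$demo"
touch file file1.txt file2.txt apple.txt

# The shell replaces each pattern with the matching names before echo runs.
starts_with_file=$(echo file*)
ends_with_txt=$(echo *.txt)

echo "file* -> $starts_with_file"
echo "*.txt -> $ends_with_txt"

cd "$start_dir"
rm -r "$demo"
```

Note that file* also matches the file named exactly file, since the asterisk is allowed to match nothing at all.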
Man Pages
Linux has a built-in reference manual system called man pages that provides detailed explanations of how to use most Linux commands. To view a man page for a command, do:
$ man <command>
For instance, to find all of the options for the ls command, you can do this:
$ man ls
LS(1) User Commands LS(1)
NAME
ls - list directory contents
SYNOPSIS
ls [OPTION]... [FILE]...
DESCRIPTION
List information about the FILEs (the current directory by default).
Sort entries alphabetically if none of -cftuvSUX nor --sort is speci‐
fied.
Mandatory arguments to long options are mandatory for short options
too.
-a, --all
This will display the man page for ls, one screenful of text at a time. You can press the spacebar to move to the next page, and press q to quit.
Long listings of files and directories
The ls man page tells us that the -l option provides a long listing format. Let's examine the output of the long listing using the testdir directory we made in our home directory earlier:
$ ls -l testdir
total 0
-rw-r--r--. 1 jedicker domain users 0 May 24 16:31 apple.txt
-rw-r--r--. 1 jedicker domain users 0 May 24 16:31 banana.csv
-rw-r--r--. 1 jedicker domain users 0 May 24 16:31 file1.txt
-rw-r--r--. 1 jedicker domain users 0 May 24 16:31 file2.txt
-rw-r--r--. 1 jedicker domain users 0 May 24 16:31 file3.txt
-rw-r--r--. 1 jedicker domain users 0 May 24 16:31 orange.doc
This output takes some explanation. The first line of the output, total 0, shows us how much disk space, in kilobytes, the files in the directory use. (This value is zero here because we used touch to create several empty files.)
Next, each item is shown on its own line with the name of the file or directory in the last column. We'll explain each column of the list, left to right:
-
The first character indicates what type of item this is. If it's a dash (-), then the item listed is a file. If the line begins with a d, the item is a directory.
rw-r--r--.
This represents the permissions on the file or directory. We'll cover this below.
1
This is the hard link count. The number is usually 1 if the item is a file. If the item is a directory, the count on most file systems is 2 plus the number of sub-directories it contains.
jedicker
The file owner (what user owns the file).
domain users
The group owner (what group owns the file). The ISU Active Directory sets a primary group for all users to be "domain users". Note that it is unusual for a group in Linux to have a space in the name.
0
The file size in bytes (we created empty files so they have zero bytes).
May 24
The date the file was last modified.
16:31
The time the file was last modified.
apple.txt
The name of the file or directory.
File and directory permissions
In Linux, all files have three sets of permissions:
- The permissions assigned to the user who owns the file or directory.
- The permissions assigned to the group owner of the file/directory.
- The permissions given to "other" ( not the owner or group ). These are also known as "world" permissions.
Within each of these sets, there are three privileges: read, write, and execute. It is common for the file owner to have read and write privileges, the group to have read privileges, and other to have no permissions. When you view a long listing, these permissions are displayed using a compact format such as rw-r-----. Notice that there are 9 characters. The first three characters, rw-, show the owner has read (r) and write (w) privileges, but does not have execute privilege (-). The second three characters, r--, show that the file is readable by members of the group, but not writable or executable. The last three characters of the permissions (---) are all dashes, meaning a user who is not the file owner or a member of the group has no privileges on the file.
The execute privilege has a different meaning depending on whether the item is a file or directory. The execute permission on a file means that the file can be run (executed) as a program. Naturally, in order to be executable, the file must be a compiled program or a script. With a directory, execute privilege has an entirely different meaning. When a user has execute privilege on a directory, the user can navigate through the directory into a sub-directory.
Changing the permissions on a file or directory can be done with the chmod command. There are two ways to specify the desired privileges. First, we can represent the permissions as a four-digit octal number where each digit sets a different group of permissions:
The first digit sets special permissions on files and directories. Most often, it is used to set the "sticky" permissions on directories like /tmp that need to be writable by anyone. We recommend reading the man page on chmod to learn more about some of these advanced permissions.
The second digit of the octal number sets the permissions for the user.
The third digit sets the permissions for the group.
The fourth digit sets permissions on other.
The values of the octal number are fairly easy to understand: read privilege has an octal value of 4, write privilege has a value of 2, and execute privilege has a value of 1. The overall privileges on the item are obtained by summing up the desired values. Let's say we want to set privileges so the owner has read and write (4 + 2 = 6), the group has read only (4), and others have no privileges (0). We can apply the desired permissions to a file with the chmod command like so:
$ chmod 0640 abc.txt
$ ls -l abc.txt
-rw-r-----. 1 jedicker domain users 0 Jun 6 16:32 abc.txt
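The octal arithmetic can be checked with a short sketch. It builds the mode from the individual values and then uses stat to print the result in both octal and symbolic form. The -c option and the %a and %A format codes are those of GNU stat, which is assumed here because it is what most Linux systems provide.

```shell
demo=$(mktemp -d)           # scratch directory
touch "$demo/abc.txt"

owner=$((4 + 2))            # read (4) + write (2)
group=4                     # read only
other=0                     # no privileges
mode="0${owner}${group}${other}"

chmod "$mode" "$demo/abc.txt"

# GNU stat: %a prints the octal permissions, %A the symbolic form.
perms=$(stat -c '%a %A' "$demo/abc.txt")
echo "$mode -> $perms"

rm -r "$demo"
```

The symbolic form printed by stat should match what ls -l shows in its first column, apart from the trailing dot that ls adds on some systems.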
The second way to specify privileges in the chmod command is with a symbolic form. In place of octal digits, the permission specification is closer to what you see in the long listing form. The permission specification would be something like
u=rw,g=r,o=r
where u is for user (owner) privileges, g is for group privileges, and o is for others. The r, w, and x correspond to read, write, and execute privileges.
It's probably easiest to explain with examples.
Example 1:
To set the permissions so that you, the owner, can read and write the file (rw), a member of your group can read but not modify it (r), and anyone else has no privileges (-), you can do this:
$ ls -l abc.txt
-rw-r--r--. 1 jedicker jedicker 0 Jun 6 18:30 abc.txt
$ chmod u=rw,g=r,o= abc.txt
$ ls -l abc.txt
-rw-r-----. 1 jedicker jedicker 0 Jun 6 18:30 abc.txt
In the above, note that chmod 0640 abc.txt will produce the same results as chmod u=rw,g=r,o= abc.txt. Note that o= is empty, meaning others have no privileges.
Example 2:
Let's say you want to make a file modifiable by someone else in your group. You can add the write privilege for the group while leaving other permissions unchanged like so:
$ chmod g+w abc.txt
To make a file executable, it is common to add the execute privilege with +x:
$ chmod +x abc.txt
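The symbolic forms can be followed step by step with the sketch below, which records the permissions after each chmod. It works on a scratch file and again assumes GNU stat for the -c '%A' format.

```shell
demo=$(mktemp -d)               # scratch directory
touch "$demo/abc.txt"

chmod u=rw,g=r,o= "$demo/abc.txt"       # set all three sets explicitly
step1=$(stat -c '%A' "$demo/abc.txt")

chmod g+w "$demo/abc.txt"               # add write for the group only
step2=$(stat -c '%A' "$demo/abc.txt")

chmod u+x "$demo/abc.txt"               # add execute for the owner only
step3=$(stat -c '%A' "$demo/abc.txt")

echo "$step1"
echo "$step2"
echo "$step3"

rm -r "$demo"
```

The = form replaces a whole set of permissions, while + (and its counterpart -) changes only the named privilege and leaves everything else as it was.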
Copying files and directories
Files
The most basic way to copy a file is with the cp command.
$ cp abc.txt abc.txt.backup
$ ls abc*
abc.txt abc.txt.backup
If you want to copy abc.txt in the current directory to the Documents folder in your home directory:
$ cp abc.txt ~/Documents
If you want to copy the file Makefile from your Downloads directory to your current directory:
$ cp ~/Downloads/Makefile .
NOTE: In the above, we use the single dot ( a period ) to mean "the current location". The use of a single dot to indicate the current directory is a very common idiom in Linux commands.
Directories
We can also use the cp command to copy directories, but there is a nuance: cp will not copy a directory, even an empty one, unless you include the -r (recursive) option:
$ mkdir testdir
$ touch testdir/testfile
$ cp testdir backup
cp: -r not specified; omitting directory 'testdir'
The cp command above didn't succeed because testdir is a directory and the -r option was not given. To copy a directory and everything in it, use the -r option:
$ cp -r testdir backup
$ ls backup
testfile
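The same behavior can be confirmed in a script by checking whether cp succeeds. This is a sketch in a scratch directory that reuses the testdir and backup names from above.

```shell
start_dir=$(pwd)
demo=$(mktemp -d)           # scratch directory
cd "$demo"
mkdir testdir
touch testdir/testfile

# Without -r, cp refuses to copy a directory and reports failure.
if cp testdir backup 2>/dev/null; then
    first_try="copied"
else
    first_try="refused"
fi

# With -r, the directory and its contents are copied.
cp -r testdir backup
copied=$(ls backup)

echo "without -r: $first_try"
echo "backup contains: $copied"

cd "$start_dir"
rm -r "$demo"
```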
Using rsync to copy files and directories
Rsync is one of the most useful commands in Linux. If you find yourself copying a lot of files, you should know how to use rsync to do it efficiently. The basic way to use Rsync is to provide the source and destination, like so:
$ rsync <source> <destination>
where <source> and <destination> are the name of what you want to copy and where you want it to end up, respectively.
Archive and Verbose options
While there are many options you can give Rsync (see man rsync ), most of the time you want Rsync to be verbose about what it is copying, and to copy everything exactly like an archive. For this reason we usually use the -v and -a options with rsync:
$ rsync -av <source> <destination>
Example 1: Copy a directory
Let's start with a simple case. Let's say you want to copy a directory called master and all the files in it to a backup directory called backup. You could do it like this:
$ rsync -av master backup
sending incremental file list
created directory backup
master/
master/data001
master/data002
master/data003
...
(full list truncated)
master/data098
master/data099
master/data100
sent 5,305 bytes received 1,948 bytes 14,506.00 bytes/sec
total size is 0 speedup is 0.00
From the output we can see that rsync created a directory called master under the backup directory, then copied all of the files in the master directory to the new master directory. The important thing to know is that if you run the command a second time, no files will be copied because the destination files are already up to date. That is, if the files in both the source and the destination are exactly the same, there is no reason to copy anything. So Rsync can be used to keep two directories in sync, while only copying the files that are different.
Example 2: Copy the files in a directory
Let's say that instead of copying the folder, we just want to copy the files in the folder. This is done by adding a trailing / to the source.
$ rsync -av master/ backup
This time, instead of creating a directory called master in the backup directory, only the files within master are copied. This is a subtle difference that is important to understand. Adding a / to the source directory completely changes how the files are copied.
Example 3: Copy a directory exactly, removing any extra files in the destination.
In the last example, we're going to add a --delete flag that tells Rsync to delete any files in the destination that are not present in the source directory. So if there are files in the backup folder that aren't present in the source, Rsync will delete them.
$ rsync -av --delete master backup
Moving (or renaming) files or directories [ mv ]
Now that we have an easy way to create files we can easily show how to use the mv command to move files from one location to another.
Starting in our current directory, let's move up one level and create a directory called test2:
$ pwd
$ cd ..
$ mkdir test2
$ ls test2
The ls command above will return no output because test2 is empty. Now let's use the mv command to move all the files in the test directory to the test2 directory:
$ mv test/* test2
$ ls test2
apple.txt banana.csv file1.txt file2.txt file3.txt linux.words myfile.txt orange.doc
In the above example we have used the mv command to move files from one folder (test) to another (test2). The mv command can also be used to rename a file or directory:
$ cd test2
$ pwd
$ mv myfile.txt newtest.txt
$ ls
apple.txt banana.csv file1.txt file2.txt file3.txt linux.words newtest.txt orange.doc
When using the mv command to move (or rename) a file, if there is already a file with the new name, mv will simply overwrite the existing file.
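Because mv overwrites silently, it can be worth seeing the behavior once on throwaway files. The sketch below also uses the -n (no-clobber) option of GNU mv, which is not covered above and is shown here only as one possible safeguard. The file names report.txt and draft.txt are hypothetical.

```shell
start_dir=$(pwd)
demo=$(mktemp -d)               # scratch directory
cd "$demo"
echo "old" > report.txt
echo "new" > draft.txt

# With -n, mv skips the move when the destination already exists.
# (|| true: whether a skip counts as an error differs between coreutils versions.)
mv -n draft.txt report.txt || true
after_n=$(cat report.txt)

# Plain mv replaces the existing report.txt without asking.
mv draft.txt report.txt
after_mv=$(cat report.txt)

echo "after mv -n: $after_n"
echo "after mv:    $after_mv"

cd "$start_dir"
rm -r "$demo"
```

After the plain mv, the original contents of report.txt are gone and cannot be recovered from the shell.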
Note that if the destination directory already exists, the source file or directory being moved will be placed into the existing directory:
$ mv test test2
$ ls test2
Removing Files and Directories [rm]
Removing files is done with the rm command as below:
$ ls abc.txt
abc.txt
$ rm abc.txt
$ ls abc.txt
ls: cannot access 'abc.txt': No such file or directory
The second ls command reports that there is no such file, which means the file is no longer there.
Topics:
Environment Variables
Environment variables play an important role in Linux. They are named values that the shell makes available to applications, and they can be used to customize how applications behave.
Setting and Displaying Variables
Creating and assigning variables is pretty straightforward:
$ PROJECTNAME="My Project"
$ echo $PROJECTNAME
My Project
The echo command is simply used to output text. In this case, we want to see the value of the PROJECTNAME variable, so we must place a dollar sign ($) in front of the variable name. So echo $PROJECTNAME outputs the value of PROJECTNAME.
To see how important environment variables can be, let's explore one of the most important environment variables, PATH. The PATH variable contains a colon-separated list of software paths that the shell will look through to find the programs a user runs.
$ echo $PATH
/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/cuda/bin
Conversely, the which command can be used to show the path to a specific command:
$ which date
/usr/bin/date
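Since PATH is a single colon-separated string, it can be easier to read with one entry per line. The sketch below uses tr to replace each colon with a newline. It also uses command -v, a shell built-in that reports where a command is found, much like which does. It assumes PATH is set and not empty.

```shell
# Print each PATH entry on its own line, in the order the shell searches them.
echo "$PATH" | tr ':' '\n'

# Count the entries.
entry_count=$(echo "$PATH" | tr ':' '\n' | wc -l)
echo "PATH has $entry_count entries"

# Ask the shell where it finds the date command.
date_path=$(command -v date)
echo "date is found at: $date_path"
```

The shell searches the entries from top to bottom and runs the first match it finds, so the order of the list matters.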
On this system the date program is found in the directory /usr/bin, which is where hundreds of common Linux commands are stored. However, in many environments, major applications are usually not in the default PATH locations. In HPC environments, the module command is often used to add the directory paths for different applications to the user's PATH.
$ which matlab
/usr/bin/which: no matlab in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/cuda/bin)
The message above indicates that the command matlab is not in any of the default path locations. We can use the module command to add the directory where the matlab program is stored to the PATH variable:
$ module load matlab
$ which matlab
/opt/rit/proprietary/matlab/R2022a/bin/matlab
In this case, we can see that the module command adds a new software path to the PATH variable:
$ echo $PATH
/opt/rit/proprietary/matlab/R2022a/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/cuda/bin
Notice that the path /opt/rit/proprietary/matlab/R2022a/bin was added. It's instructive to point out that you don't need to use a module command to add a path to the PATH variable. You can just do it yourself:
$ export PATH=/opt/rit/proprietary/matlab/R2022a/bin:$PATH
The above command creates a new value for PATH. It starts with the new desired path /opt/rit/proprietary/matlab/R2022a/bin, then adds a colon separator (:), and then appends the existing value of $PATH. Placing the new path at the front of the list means that the bash shell will find the matlab command in /opt/rit/proprietary/matlab/R2022a/bin first. Another detail to note is the use of the export command. The export command ensures that this variable will be passed on to any child processes.
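The effect of export can be seen by asking a child process whether it can see a variable. In this sketch, bash -c starts a new bash process that prints the variable, or the word unset if the variable was not passed down. The ${PROJECTNAME:-unset} syntax simply substitutes a default when the variable is empty or missing.

```shell
unset PROJECTNAME               # start from a clean state
PROJECTNAME="My Project"        # a shell variable, not yet exported

# The child process does not see the variable yet.
child_before=$(bash -c 'echo "${PROJECTNAME:-unset}"')

export PROJECTNAME              # mark the variable for export to child processes

# Now the child process receives it.
child_after=$(bash -c 'echo "${PROJECTNAME:-unset}"')

echo "before export: $child_before"
echo "after export:  $child_after"
```

This is why PATH changes are made with export: programs started from the shell need to receive the updated value.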
Some Terminology
Here are some terms that you should become familiar with:
bash | The most common shell on Linux. The path is usually /bin/bash |
directory | What Linux calls a folder. A specific location in the file system tree. |
home directory | The primary place where a user's files and configuration settings are stored. This is usually /home/<username>, though HPC users will store most of their data in their work directory, /work/<groupname>. |
file system | The organizational structure used by the storage system to keep track of files and folders. |
prompt | Text that the shell displays when it is ready for you to enter another command. |
shell | A program that interprets the commands you type and enables you to interact with the system. |
root | In Linux, root is the name of the "super user" account that has privileges to install software and make changes to the system. We also use the term "root" to refer to the top of the file system tree. |
working directory | When you are working in the shell, your working directory is the path in the file system tree to where you are working. Users always start out in /home/<username>. As you move around the file system, your working directory gets updated. |
The File System
Linux uses a straightforward hierarchical file system structure. The top of the file system is called "slash" or "root" and has the path of /. All other paths are sub-directories under /. For instance, most Linux systems will typically contain the following subdirectories under /:
/boot | Where the software for booting the system is stored. |
/dev | Where special "device files" used by the operating system kernel are stored. |
/etc | Where system configuration files are located. Usually only the root user can change things there. |
/home | Path to users' home directories. Each regular user has their own home directory set to /home/<username>. |
/opt | A path under which software is often installed. On the ISU clusters, many applications are installed under /opt/rit. |
/tmp | Historically, a place for "temporary" files that are used for a very short time. The use of /tmp is discouraged. We recommend that you use a path under /work/<groupname> to store temporary data files. |
/usr | Historically, this is the path under which most Linux software used by regular users is installed. |
/usr/bin | The path where the most common Linux programs are found. "bin" is short for "binary". For instance, the default version of python on the system has a path of: /usr/bin/python |
/work | All research groups on ISU clusters have a directory under /work/<groupname> for storing their group data. |
Viewing Text Files
Use Grep to search for Patterns in Files
Using Wildcards
Output Redirection
Processes