UNIX USER TRAINING
Session 2 : The UNIX Environment
This session will cover the following topics
The Shell
Types of Shell
The C shell
What happens at login time?
Everything you never wanted to know about variables
Command History
C shell Aliases
Input/Output Redirection
C shell Pipes
Command Substitution
Running Commands in the Background
C shell Limitations
1) The Shell - A Command Interpreter
When you type commands at the UNIX prompt they are 'captured' by a program called the shell and interpreted before being passed to the operating system kernel for execution. The kernel then runs the command and displays the output (if any) on your terminal. But what is the shell and what is it doing when it interprets your commands?
The shell is a standard UNIX program like any other. When you log into a UNIX system the shell is the first program (or process) that you run. You do not need to deliberately run this first shell program; the login process does that for you. Each time you log into a UNIX system, a shell process will be set up. On UNIX workstations or PCs running some sort of X Windows emulator, each terminal window you run also constitutes a shell process.
The shell interprets your commands. This is quite a big job for such a small program. Some of the tasks it performs include
Expanding filename wildcards
Expanding shell and environment variables
Searching for the command you've just entered
Setting up a new process for the command
Passing control to the kernel for the command to run
Dealing with input and output redirection
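The first of these tasks is easy to see for yourself. The short script below is only a sketch: the file names are hypothetical and it is best run in an empty scratch directory. It shows that it is the shell, not the command, that expands a wildcard. The lines starting with # are comments; the C shell only recognises them when the commands are read from a script file, so leave them out when typing at the prompt.

```shell
# Create three empty files with distinctive (hypothetical) names.
touch demo_notes.txt demo_todo.txt demo_image.gif

# The shell expands the wildcard; echo just prints the arguments it is given.
echo demo_*.txt
# prints: demo_notes.txt demo_todo.txt

# Quoting suppresses the expansion, so echo receives the pattern itself.
echo 'demo_*.txt'
# prints: demo_*.txt
```

Because echo does nothing but print its arguments, it is a handy way to check what the shell will make of a command line before running the real command.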
2) Types of Shell
Solaris includes a number of different shells. The most important of these are described briefly below:
The Bourne Shell (sh) - The original UNIX shell. Good for writing shell scripts but poor for interactive use.
The C shell (csh) - A more advanced shell program. This has good facilities for interactive use, such as a command history mechanism, filename completion and a rudimentary command line editing facility. For writing shell scripts, csh uses a syntax similar to the C programming language.
The Korn Shell (ksh) - A re-write of the Bourne Shell, designed to combine the best programming features of the Bourne Shell with the interactive facilities of the C-shell. It has a more flexible command line editing facility (vi based!), command history and aliasing.
In addition to these, a number of other shells have been developed over time. These include the Bourne-Again Shell (bash) and the TENEX C shell (tcsh), neither of which is supported under Solaris.
Your user account, when initially set up by IT, will have the C shell as your default login shell. The remainder of this course will, therefore, concentrate on csh . It is important to understand that, nine times out of ten, the shell you are using will make no difference whatsoever to the output of commands like ls , cat , rm and all the others you will be using on a regular basis.
3) The C Shell
a) At login time
When you log in to UNIX with the C shell as your command interpreter, several scripts are run before you even see your prompt. Briefly, these scripts are:
/etc/.login - A system-wide parameter file
~/.cshrc - Owned by the user, the .cshrc file can contain your own settings
~/.login - Again owned by the user, .login can also contain your settings
Thereafter, such as when running a new shell or a command terminal under X, only the .cshrc file is run.
Why have two user-owned files to set up the C shell environment? The difference is subtle: the .login file is run only at login time; the .cshrc file is run every time a new C shell is started. Therefore, place only those commands that need to be run once in the .login file. Typically, .login contains commands to specify the terminal type and environment.
NB: the ~ (tilde) symbol is a C shell abbreviation for your UNIX home directory (e.g. /home/colinbr)
b) Everything you never wanted to know about variables
There are two types of variable.
Environment variables are exported to the environment and will be inherited by subsequent shell processes.
Shell variables are local to that shell process and are not inherited by child shells.
An example to clarify these statements will follow. But first, some basic information on variables.
Legitimate characters that may be used in variable names are
A-Z , a-z , 0-9 , _
By convention, environment variables are defined in UPPER CASE while shell variables are usually lower or MixedCase. Variable names must start with a letter; the _ (underscore) is considered a letter.
Use the env or setenv commands to display all the environment variables currently in force. This can generate quite a lot of output, particularly during an X Windows session (CDE or OpenWindows) where the login process sets a lot of display and library related environment variables. To refer to the value of a variable, simply precede the variable name with a $ (dollar) symbol. The examples below use the echo command to display a couple of basic variables.
mclaren% echo $HOME
/home/colinbr
mclaren% echo $PATH
/bin:/usr/bin:/usr/ucb:/usr/sbin:/etc:.:/opt/dbs/oracle/product/816/bin
NB: .login does NOT get called by CDE logins!
All shells support the following environment variables, although the way they are set differs from shell to shell:
HOME - your home directory (eg /home/colinbr)
PATH - directory search path for executables
TERM - the terminal type
SHELL - the shell interpreter you are using
USER - your username
Other important variables
LD_LIBRARY_PATH - path to software library files
EDITOR - the default editor to use
VISUAL - the default full-screen editor to use
PAGER - the page-display program (normally pg or more)
Due to the vagaries of C shell start up, it's probably best to set all environment variables in your .cshrc file.
For example
setenv PATH /bin:/usr/bin:/usr/ucb:.
setenv PAGER more
setenv LD_LIBRARY_PATH /opt/dbs/oracle/product/816/lib:/usr/local/lib
Environment variables can reference themselves. Given the PATH variable shown above, the following command can append /usr/local/bin to the end of the path:
setenv PATH ${PATH}:/usr/local/bin
Note the use of ${PATH} . The curly brackets are necessary to separate the variable name from the : that follows it. In the C shell, a : placed directly after a variable name is not treated as ordinary text; it introduces a variable modifier (such as :h or :t ). Without the brackets, the command would be
setenv PATH $PATH:/usr/local/bin
This will fail because the shell will try to interpret :/ as a modifier applied to $PATH and will report an error instead of appending the directory.
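The effect of the curly brackets can be tried out safely on a shell variable before touching PATH . The following C shell script is a minimal sketch; dir is a hypothetical variable name used only for illustration, and :t is the standard csh modifier that keeps the last component of a path name.

```csh
#!/bin/csh -f
# dir is a hypothetical shell variable used only for this sketch.
set dir = /usr/local

# Braces mark where the name ends, so _old is ordinary text.
echo ${dir}_old
# prints /usr/local_old

# Without braces the colon introduces a modifier; :t keeps the last
# component of the path name.
echo $dir:t
# prints local

# With braces the colon is ordinary text again.
echo ${dir}:bin
# prints /usr/local:bin
```

The last form is the one used when appending to PATH .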
To remove an environment variable, simply use the unsetenv command, for example
unsetenv PAGER
Unsetting HOME , USER , PATH and SHELL is not recommended!
Within your .cshrc file, you can set variables using the set command. Built into csh are several useful variables
filec - file name completion
prompt - the prompt displayed on the command line
history - the number of commands kept in the history buffer
ignoreeof - the shell will ignore control-D; you must type exit to exit a shell
noclobber - prevents > redirections overwriting existing files and >> redirections creating new ones
savehist - the number of commands to be saved in the $HOME/.history file
Most of these are simple switches: they are either on or off and can be set as follows, typically within the .cshrc file.
set noclobber
set ignoreeof
set filec
The prompt , savehist and history variables need extra parameters. For history and savehist, this is the number of commands to be saved in the buffer, for example
set history=50
set savehist=50
For prompt , this is a string which will be displayed at the start of each command line, replacing the default % sign used by the C shell.
set prompt="Whaddyawant? "
or
set prompt="! ${USER}@`hostname`> "
which gives a prompt like:
3 colinbr@mclaren>
The values of the USER , TERM and PATH environment variables are automatically set from the corresponding C shell variables user , term and path .
It is possible to set any variable in your environment - just don't override any of the important system variables. In many cases, these can save a lot of tedious typing. Consider the path
/home/colinbr/PROJECTS/ORACLE/oracle8i/SQL
Define an environment variable to hold it
setenv SQ /home/colinbr/PROJECTS/ORACLE/oracle8i/SQL
and type
cd $SQ
Further, the path can be appended to. If there is a "plsql" directory under the one shown, use the following
cd $SQ/plsql
A couple of simple examples
Running under the C shell, first set a variable called MY_NUMBER
5 colinbr@mclaren> set MY_NUMBER=12345
Spaces, or the lack thereof, are important: either put no spaces around the = sign, as here, or put a space on both sides of it.
6 colinbr@mclaren> echo $MY_NUMBER
12345
Now run a new C shell process. This is called a child process; the original shell is referred to as the parent.
7 colinbr@mclaren> csh
1 colinbr@mclaren>
In this case, you can see the current command number is reset from 7 to 1, indicating no commands have yet been typed in this shell process. Now display the value of MY_NUMBER .
1 colinbr@mclaren> echo $MY_NUMBER
MY_NUMBER: Undefined variable
2 colinbr@mclaren>
In the new shell, the value of MY_NUMBER has not been set: we say that its value has not been inherited from the parent shell.
Exit from the child shell using ctrl-D (or by typing exit if ignoreeof is set). Now, back in the parent shell, check and then unset the value of MY_NUMBER
8 colinbr@mclaren> echo $MY_NUMBER
12345
9 colinbr@mclaren> unset MY_NUMBER
10 colinbr@mclaren> echo $MY_NUMBER
MY_NUMBER: Undefined variable
Then create MY_NUMBER as an environment variable using setenv
11 colinbr@mclaren> setenv MY_NUMBER 12345
Note the difference in syntax here. No "=" sign. Now run another child shell and check the value of MY_NUMBER .
12 colinbr@mclaren> csh
1 colinbr@mclaren> echo $MY_NUMBER
12345
2 colinbr@mclaren>
And you can see that MY_NUMBER has been inherited by the child process.
What if you run a different shell?
13 colinbr@mclaren> echo $MY_NUMBER
12345
14 colinbr@mclaren> sh
$ echo $MY_NUMBER
12345
$
MY_NUMBER is still inherited, in this case by a Bourne shell child process.
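The same experiment can be repeated without typing at several prompts. The script below is a sketch, assuming it is saved in a file and run with csh ; the variable names are hypothetical. It uses $?name , which the C shell replaces with 1 if the variable is set and 0 if it is not, and csh -f -c to run a single command in a child C shell without reading .cshrc .

```csh
#!/bin/csh -f
# Hypothetical variable names, chosen only for this sketch.

# A shell variable: private to this shell process.
set local_number = 12345

# An environment variable: passed on to child processes.
setenv EXPORTED_NUMBER 12345

# Ask a child C shell which of the two it can see.
# The single quotes stop the parent shell expanding $? itself.
csh -f -c 'echo "local_number set: $?local_number"'
# prints local_number set: 0
csh -f -c 'echo "EXPORTED_NUMBER set: $?EXPORTED_NUMBER"'
# prints EXPORTED_NUMBER set: 1
```

Testing with $?name avoids the "Undefined variable" error that a plain echo of an unset variable produces.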
c) Command History Mechanism
The C shell includes a mechanism that can store previously run commands so that they can be conveniently recalled and re-run, a feature which saves much tedious typing. Command history, however, is not enabled by default and requires two or three changes to be made in the ~/.cshrc file. Add the following to your .cshrc file
set history=50
alias h 'history'
Optionally add the following
set savehist=50
The first two commands, respectively, set the history list length to 50 commands and set an alias so the saved commands can be recalled simply by typing h . The third command specifies that 50 commands should be saved in the file ~/.history as a persistent record of commands.
When these settings are in effect, you can recall your history list and re-run commands from that list. For example:
5 colinbr@williams> h
1 cd MISC
2 ls
3 ls pci*
4 vi pcift_archive_index
5 h
6 colinbr@williams>
To then re-run the command ls pci* , simply type
6 colinbr@williams> !3
The ! symbol instructs the C shell to re-run command number 3. The re-run of this command then appears in the history list, thus:
6 colinbr@williams> !3
ls pci*
pci_cica_errors pcift_archive_index pcift_sep
pci_opt pcift_archive_readme pcift_uploads
pci_segs_cds pcift_archives pcift_useful
pciesp_useful pcift_march
7 colinbr@williams> h
1 cd MISC
2 ls
3 ls pci*
4 vi pcift_archive_index
5 h
6 ls pci*
7 h
8 colinbr@williams>
Note that the chosen command is displayed before being run.
Commands from the history list can also be appended to quite easily. Command number 2 in the above list can be recalled and piped (see the section on pipes, below) into the more command, like this:
8 colinbr@williams> !2 | more
This runs the command ls | more , where ls is derived from the substitution of command number 2 from the history list.
A very simple way of running your most recent command is to use !! (two exclamation marks) as shown below:
9 colinbr@williams> !!
There are a number of other useful history manipulations, some of which are described briefly below. In these descriptions str is any string of characters.
!str - re-run the last command that started with str
!str additional - re-run the last command starting with str and add the additional characters
!?str? - re-run the last command containing str
!?str? additional - as above but append the additional characters to the new command
d) C shell Aliases
Another method for reducing tedious typing is the command line alias in which
a long or complex command line can be reduced to a few characters. Aliases are
defined in the .cshrc file. Simple examples include
alias lsl 'ls -alF'
alias lsf 'ls -aF'
alias cls 'clear'
More useful aliases include
alias cp 'cp -i'
alias mv 'mv -i'
alias rm 'rm -i'
It is not normally sensible to alias a command back to itself, although the following is legitimate syntax:
alias ls 'ls -a'
If you then write a script whose behaviour depends on ls acting in this way and then port that script to a machine where the alias does not exist, the script will fail (or at least produce strange results).
The mv , cp and rm aliases shown above are an exception, because they can prevent horrible mistakes. Should you wish to switch off an alias on a temporary basis (for example when copying a hundred files to a new directory, which would be tedious with cp -i ), use a backslash character to 'escape' the special meaning of the alias. For example
\cp *.tif ../Backups
e) Input/Output Redirection in the C shell
What is I/O redirection? In a standard command line UNIX session, the shell will read its input from the keyboard and send its output back to the screen. The keyboard is referred to as the standard input device (abbreviated std.in) and the screen may be called the standard output device (std.out). In addition, error messages are usually sent to the screen, so the display is also called the standard error device (std.err). It is important to realize that, even though std.out and std.err are the same device (i.e. the screen) UNIX treats them as two different output streams and handles each separately.
Within all the UNIX shells, it is possible to change std.in , std.out and std.err to be a file, another command or even a device like a tape drive. This is I/O redirection.
Simple I/O redirections in C shell are performed as follows:
< filename - Redirect standard input from filename
> filename - Redirect standard output to filename
>> filename - Redirect standard output and append to filename
Examples of these simple cases:
13 colinbr@williams> pg < install_gcc
This will read the file install_gcc with the pg command. In reality, this is a nonsensical example as the command
14 colinbr@williams> pg install_gcc
performs the same task without the redirection. Such input redirections are more useful when pg is replaced with a shell script of your own devising.
The following command writes a directory listing to a file
15 colinbr@williams> ls -al > my_file.lst
Interestingly, if you then cat my_file.lst you will see an entry similar to that shown below:
-rw-r--r-- 1 colinbr it 0 Sep 5 11:19 my_file.lst
This indicates that the shell has created a new, empty my_file.lst , ready to receive the redirected data before the ls command is executed by the kernel.
More data can then be appended to my_file.lst as follows
16 colinbr@williams> head -20 jordan >> my_file.lst
or, perhaps more simply
17 colinbr@williams> echo "hello world" >> my_file.lst
The matter of redirecting standard error output to a file is one of the areas in which the C shell differs slightly from the Bourne and Korn shells. In the C shell, use the following syntax
>& filename - Redirect std.out and std.err to filename
>>& filename - Redirect std.out and std.err and append to filename
To take a simple example, consider the directory listing below:
-rw-r--r-- 1 colinbr it 2415 Aug 16 13:54 binaries.html
-rw-r--r-- 1 colinbr it 7421 Aug 16 13:54 build.html
Try the following command
24 colinbr@williams> ls -l b* x
x: No such file or directory
-rw-r--r-- 1 colinbr it 2415 Aug 16 13:54 binaries.html
-rw-r--r-- 1 colinbr it 7421 Aug 16 13:54 build.html
We get the message "x: No such file or directory" because x does not exist. All this output appears on the screen. Now try
25 colinbr@williams> ls -l b* x > error1
x: No such file or directory
26 colinbr@williams> cat error1
-rw-r--r-- 1 colinbr it 2415 Aug 16 13:54 binaries.html
-rw-r--r-- 1 colinbr it 7421 Aug 16 13:54 build.html
In command number 25, the error message appears on the screen. The output of the successful part of the command line ( ls -l b* ) is written to the file error1 (as seen in command 26). Standard error has not been redirected. Then use the following command:
27 colinbr@williams> ls -l b* x >& error2
28 colinbr@williams> cat error2
x: No such file or directory
-rw-r--r-- 1 colinbr it 2415 Aug 16 13:54 binaries.html
-rw-r--r-- 1 colinbr it 7421 Aug 16 13:54 build.html
In command 27 we see no output at all. Both standard output and standard error have been written to the file error2 (as displayed by command 28).
It is possible, but slightly tricky in C shell, to redirect standard output to one destination and standard error to another. This involves three steps: run the original command in a sub-shell of its own; within that sub-shell, redirect std.out ; then, outside the sub-shell, redirect what remains with >& . The description is a mouthful but the syntax is not that complex. For example:
30 colinbr@williams> (ls -l b* x > error3) >& error4
32 colinbr@williams> cat error3
-rw-r--r-- 1 colinbr it 2415 Aug 16 13:54 binaries.html
-rw-r--r-- 1 colinbr it 7421 Aug 16 13:54 build.html
33 colinbr@williams> cat error4
x: No such file or directory
In command 30, the parentheses instruct the C shell to run the ls command in a sub-shell and place its standard output in the file error3 . The >& outside the parentheses then captures everything the sub-shell still writes; because its standard output has already gone to error3 , only the standard error is left to go into the file error4 . Commands 32 and 33 display the two files.
This may seem to be a nit-picking point but there may be times when you need to capture the errors along with the output, for example, of a custom written script.
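The three forms discussed above can be compared side by side with a short script. This is only a sketch: it assumes an otherwise empty scratch directory in which no file called x exists, and the exact wording of the error message depends on your system.

```shell
# Two files that exist; "x" is assumed not to exist in this directory.
touch binaries.html build.html

# Standard output goes to error1; the error message still reaches the terminal.
ls -l b* x > error1

# Standard output and standard error both go to error2.
ls -l b* x >& error2

# The sub-shell sends standard output to error3; what is left (standard
# error) is caught by the outer >& and written to error4.
(ls -l b* x > error3) >& error4

# Display the separated results.
cat error3
cat error4
```

After running it, error3 should hold only the two directory entries and error4 only the complaint about x .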
Some general points about I/O redirection
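With both the >> and >>& forms, if filename does not exist, it will be created (unless noclobber is set, in which case the redirection fails).
If the noclobber C shell variable is set, redirection with > or >& will fail if the target file already exists, and redirection with >> or >>& will fail if it does not, unless one of the following forms is used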
>! >&! >>! >>&!
In effect, the ! overrides the noclobber setting
Redirecting std.out and std.err to different files is easier under the Bourne and Korn shells than under the C shell.
f) C shell Pipes
The preceding section claimed that standard input, standard output and standard error could be changed to be, amongst other things, another command, but then went on to discuss (at exhausting length) only redirection to files. When commands are connected together, such that the output of the first command becomes the input to a second command, the result is called a pipeline, or simply a pipe.
In a command line the pipe operation is represented by the | character (vertical bar) as shown in this simple example
72 colinbr@mclaren> cd /usr/bin
73 colinbr@mclaren> ls -la | more
This command directs the output of the ls -la command into the input of the more command, which then sends its output to the screen. The practical effect is to display a long directory listing a page at a time.
More complex pipelines are often best built up in stages, with a little bit of trial and error. Consider the following: the wc command prints the number of lines, words and characters in a file (or files), respectively, as shown in the following output.
85 colinbr@mclaren> wc *.txt
38 356 2196 clancy_bio.txt
36 269 1727 elf.txt
22 175 1100 news.txt
96 800 5023 total
What if we want the total number of characters these files amount to? The answer is 5023, the third field of the last line. For just three files, it would be a trivial matter to use
86 colinbr@mclaren> ls -l *.txt
and add up the file sizes on paper! However, this becomes tedious in cases where there are hundreds of *.txt files. So try the following, to isolate the last line of the output
86 colinbr@mclaren> wc *.txt | tail -1
96 800 5023 total
But we are still only interested in the total, the third column of this output. Getting into UNIX esoterica at this point, use
87 colinbr@mclaren> wc *.txt | tail -1 | awk '{print $3}'
5023
The awk command, in the simple form above, isolates column 3 of the output (awk can do an awful lot more than this, however).
Why might this be useful? Apart from estimating disk usage, the value output by this command can also be assigned to a shell or environment variable. We will look at this in more detail in a later section.
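To practise building a pipeline in stages where the answer is known in advance, try the sketch below. The two file names are hypothetical; the files hold 6 and 11 characters (including the newline that echo adds), so the final stage should report 17.

```shell
# Two small files with known contents (6 and 11 characters, newline included).
echo hello > count_first.txt
echo world wide > count_second.txt

# Stage 1: counts for each file plus a "total" line.
wc count_*.txt

# Stage 2: keep only the last line.
wc count_*.txt | tail -1

# Stage 3: keep only the third field, the character count.
wc count_*.txt | tail -1 | awk '{print $3}'
# prints 17
```

Checking each stage before adding the next makes it much easier to see which command is at fault when a pipeline does not produce what you expect.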
Here is another example of pipes in action:
48 colinbr@mclaren> ps -ef | grep colinbr | grep -v grep
colinbr 3447 3445 0 Aug 28 pts/11 0:00 -csh
colinbr 12060 12058 0 Aug 31 pts/20 0:00 -csh
colinbr 22157 22155 0 11:03:11 pts/28 0:00 -csh
colinbr 21870 21868 0 10:40:38 pts/22 0:00 -csh
This command line is an example of a filter. It uses the ps -ef command to display all the processes on the system. The first grep then selects any lines that contain colinbr , i.e. we are looking only for processes owned by Colin Brett. The last portion of the command line then filters out the grep command itself (the -v option to grep inverts the search). We are left with a break-down of what Colin Brett is up to on mclaren (running four C shell processes).
Finally in this section, the output of a pipeline can itself be redirected to a file or device. As a simple example try
91 colinbr@mclaren> who | grep colinbr > cols_logins
92 colinbr@mclaren> cat cols_logins
colinbr pts/22 Sep 6 10:40 (williams)
colinbr pts/28 Sep 6 11:03 (williams)
colinbr pts/11 Aug 28 14:36 (t130)
colinbr pts/20 Aug 31 14:36 (brabham)
Command 91 uses the who command to determine who is logged in and filters that output for the user colinbr before writing that output to the file cols_logins . Basically, what we get here is proof that Colin Brett has been logged into mclaren from various machines (williams, brabham and t130), in some cases for over a week!
g) Command Substitution
We have already seen one example of command substitution in the section on C shell variables. To recap, that example was:
set prompt="! ${USER}@`hostname`> "
What we see here is the shell prompt being set to the string enclosed in double quotes "" but within that string is the expression `hostname` enclosed in single backward quotes `` . These quotes, variously called grave accents, backquotes or backticks, are the mechanism by which command substitution occurs.
When the shell parses a command line, it substitutes in variables, expands file names, resolves aliases and looks for commands in these backquotes. If such commands exist, a sub-shell is spawned, the backquoted command is run, and its output is substituted into the original command line. So, in the above example, the shell is performing the following actions
Carries out the hostname command
Works out the value of the $USER variable
Gets the value of the current command number and substitutes that for the ! character
Parses the set prompt command
The final result, on screen, will look something like
100 colinbr@mclaren>
where
100 is the current command number
colinbr is the value of $USER
mclaren is the output of the hostname command
The @ and > characters are treated as literal characters
In the section on Pipes, we looked also at the command
87 colinbr@mclaren> wc *.txt | tail -1 | awk '{print $3}'
5023
Using command substitution within a shell script, we could assign the output of this command to a variable
set text_size=`wc *.txt | tail -1 | awk '{print $3}'`
The value can then be manipulated within the script, perhaps dividing it by 1024 to get sizes in kilobytes, or comparing it to a fixed value for determining file size limits.
Command substitution is an immensely useful facility, though perhaps more useful within a script than in a command line UNIX session.
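As a sketch of how such a script might look, the following C shell script captures the total with command substitution, converts it to kilobytes and compares it with a limit. The variable names and the 100 KB limit are hypothetical, and the script assumes at least one .txt file exists in the current directory. The @ command is the C shell's built-in for integer arithmetic on shell variables.

```csh
#!/bin/csh -f
# Capture the total character count of the .txt files in a shell variable.
set text_size = `wc *.txt | tail -1 | awk '{print $3}'`

# The @ command performs integer arithmetic; the result is rounded down.
@ kilobytes = $text_size / 1024

# limit_kb is a hypothetical threshold chosen for this sketch.
set limit_kb = 100

if ($kilobytes > $limit_kb) then
    echo "Text files use ${kilobytes} KB, over the ${limit_kb} KB limit"
else
    echo "Text files use ${kilobytes} KB"
endif
```

With the three files from the section on pipes (5023 characters in total), the script would report 4 KB, since 5023 divided by 1024 is rounded down.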
h) Running Commands in the Background
If you have a command that you know (or suspect) will take a long time to run, consider redirecting the output to a file and running the command in the background. In this way, the C shell begins executing the command and returns your prompt so you can continue working. The syntax to run a command in the background is to add an & (ampersand) character to the end of the command line. As a simple example, consider a command to recursively list the contents of your home directory:
20 colinbr@williams> ls -alR > my_files &
[1] 4687
21 colinbr@williams> ps -ef | grep ls
colinbr 4687 3327 4 15:48:19 pts/16 0:01 ls -alR
colinbr 4689 3327 0 15:48:23 pts/16 0:00 grep ls
Command 20 is the recursive ls command, redirected into the file my_files . The & character instructs the C shell to start the ls command as a child process and carry on without waiting for it to finish. We see no output from this command (as would be expected because std.out is redirected) except for the following numbers:
[1] 4687
The [1] indicates this is job number 1 that has been spawned from the current shell and the 4687 is the process ID number (PID) of the child process. After this output is displayed we are returned to the prompt ready for the next command.
With the ps command (21) we can see PID 4687 running. While the command runs we can be doing other things, for example
23 colinbr@williams> ls -l my_files
-rw-r--r-- 1 colinbr it 434176 Sep 11 15:48 my_files
24 colinbr@williams> !!
ls -l my_files
-rw-r--r-- 1 colinbr it 458752 Sep 11 15:48 my_files
We can see the size of my_files increasing as more output is written to it. Eventually the ls command will finish. On pressing Return or Enter at the end of a subsequent command, we see output like
[1] + Done ls -alR > my_files
28 colinbr@williams>
This indicates that job number 1 is completed and the command that was running is also echoed to the screen.
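Inside a script there is no prompt to return to, so it can be useful to pause until the background work has finished. The sketch below assumes the built-in wait command, which is not otherwise covered in these notes; it waits for all background jobs started by the current shell. The sleep command merely stands in for a long-running job, and job_result is a hypothetical file name.

```shell
# Start a slow command in the background; sleep stands in for real work.
(sleep 1; echo finished > job_result) &

# The shell carries on at once, so other commands can run in the meantime.
echo "still working in the foreground"

# wait pauses until all background jobs of this shell have completed.
wait

# The background job's output is now complete.
cat job_result
# prints finished
```

Without the wait , the cat command would probably run before the background job had written its file.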
i) C shell limitations
As stated in the C shell man page:
Although robust enough for general use, adventures into the
esoteric periphery of the C shell may reveal unexpected
quirks.
This does not necessarily mean the C shell is full of bugs. But there are certain limitations that apply, some of which are summarized below.
A word (i.e. a single string of characters) cannot exceed 1024 characters
An argument list cannot exceed 1,048,576 characters
The maximum number of arguments to a command for which filename expansion applies is 1706
Probably the most commonly-met limit is the last one. For example, an attempt to remove all the files in a directory with the command
32 colinbr@williams> rm *
will fail if there are more than 1706 files in that directory.
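One common way round this limit, offered here only as a sketch, is to let the find command visit the files itself instead of asking the shell to expand a wildcard. The directory and file names below are hypothetical, with three files standing in for several thousand.

```shell
# Hypothetical scratch directory with a few files standing in for thousands.
mkdir limit_demo
touch limit_demo/a limit_demo/b limit_demo/c

# find visits each file itself and runs rm once per file, so the shell
# never has to expand a wildcard into a long argument list.
find limit_demo -type f -exec rm {} \;

# The directory is now empty, so this prints nothing.
ls limit_demo
```

Note that find also descends into any sub-directories, so check what it will match (for example by replacing rm with echo ) before deleting anything important.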