Beginning Perl Class

And so our journey begins...

What is Perl?

Perl is a "Practical Extraction and Report Language" that is freely available for many platforms. It fills the niche between the classic interpreted shell languages (sh, csh, DCL) and the compiled languages (C, Pascal, ...). Perl itself is an interpreted language, but it compiles on the fly, so as to give you reasonably fast programs without sacrificing source code availability.

Course Requisites and Goals

This class assumes that participants have some experience in programming languages such as C, Pascal or Basic, and access to a system with Perl 5 installed. After having completed this course, participants should expect to be able to do the following:

Where to Get Perl

You're probably going to need to do a little work if you're not on the Sun network (i.e., the one I run). You need a current version of Perl. This means Perl 5, not Perl 4. If you are stuck in the Microsoft world, I pity you, but in spite of that, you can still get real work done. Here is a list of places you should look at to get Perl and other Perl-related resources.

The Very Beginnings

Data Structures

In Perl there are a number of different kinds of variables that you need to be aware of. These are as follows:
$bob		# Standard scalar variable, can be string or number
BOB		# Filehandle
@bob		# A numerically subscripted array
%bob		# A string subscripted array
&bob		# A subroutine
Scalar Variables
The first variable $bob is a standard scalar variable. It can be just about any value that you would like. Look at the following examples:
$bob = "I hate Fred";	# A string of characters
$bob = "I";		# A single character
$bob = 1;		# A number
$bob = 3.1;		# Pi defined in Tennessee
$bob = 6.023e23;	# Number in scientific notation
$bob = "I hate $fred";  # A string with interpolation
$bob = `uname -n`;	# The output of a command
One of the wonderful things about Perl is that it doesn't care what you mean until you say it. As a matter of fact, you could do one of these assignments and then immediately follow it with a different one, and Perl would happily comply.
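To see this in action, here is a small sketch that keeps reusing a single scalar and lets Perl convert between strings and numbers as needed. It adds use strict, use warnings and my declarations, which are good Perl 5 habits but are not otherwise used in this class:

```perl
use strict;
use warnings;

my $bob = "42";            # holds a string
my $sum = $bob + 8;        # used as a number: 50
$bob = 3.1;                # now holds a number
my $msg = "bob is $bob";   # interpolated into a string: "bob is 3.1"
$bob = "I hate Fred";      # and now a plain string again

print "$sum\n$msg\n$bob\n";
```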

Looking Ahead: You'll notice that I included a ; after every statement. This marks the end of a statement to Perl. You'll also notice that I used a # after the semicolon for my comments. # is the comment character in Perl; everything from the # to the end of the line is ignored.

Filehandles
Filehandles are a convenient way to deal with files, input and output. They provide the interface between your Perl program and the outside world: files, the terminal, and so on. By convention, filehandle names are written in all capital letters. There are 3 filehandles that Perl opens for you by default: STDIN, STDOUT and STDERR. These correspond to stdin, stdout and stderr in Unix.

By default the Perl line separator is a newline (\n), which is stored in the special variable $/. This is very convenient, as most text files have lines that are delimited by newlines, and users have been trained to hit enter or return after typing in input. This allows us to write convenient statements like the following:

$answer = <STDIN>;

This says to put whatever the user typed in, up to and including the newline, into the variable $answer.

Note: This last point bears repeating. Perl keeps all of the user's input, including the trailing newline. I guarantee that at least once, this will bite you. To get around this, make sure that you use the chomp function to remove the trailing newline. The syntax is as follows:

chomp $answer;
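Here is a small sketch of the trap and the fix. So that it runs without anyone typing, it reads from an in-memory filehandle opened on a string (a Perl 5.8-and-later feature, assumed here) instead of the real STDIN; the <...> operator and chomp behave the same way:

```perl
use strict;
use warnings;

# Simulate a user typing "yes" and pressing Enter.
my $typed = "yes\n";
open(my $in, '<', \$typed) or die "Can't open in-memory handle: $!";

my $answer = <$in>;                  # "yes\n": the newline comes along
my $raw_length = length $answer;     # 4
chomp $answer;                       # strip the trailing newline
my $clean_length = length $answer;   # 3
close($in);

print "You said '$answer'\n";
```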

As for output, you will normally want to print either to a file or to the user's screen. To print to the screen, you will normally do something like this:

print STDOUT "Hello world\n";
print STDERR "Hello world\n";
print "Hello world\n";
"What is that!?!?", you say? It would appear that the third print statement is missing a filehandle to print to. In fact, it prints to the currently selected filehandle, which is usually STDOUT. (The first statement prints to standard output, the second to standard error.) Perl tries to assume logical defaults, so that you do not have to type as much. In this case it assumes that, by default, print output should go to STDOUT.

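The currently selected filehandle can be changed with the select function, which returns the previously selected handle so that you can restore it afterwards. A minimal sketch, assuming Perl 5.8 or later for the in-memory filehandle used to capture the output:

```perl
use strict;
use warnings;

my $buffer = '';
open(my $out, '>', \$buffer) or die "Can't open in-memory handle: $!";

my $previous = select($out);   # make $out the currently selected handle
print "Hello world\n";         # no filehandle given, so this goes to $out
select($previous);             # restore the old default (normally STDOUT)
close($out);

print "Captured: $buffer";     # back on STDOUT again
```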
Now, if you want to print to a file, you need to do a couple of extra things. First, Perl needs a way to know what file you want to either read from, write to, or append to. Here is how this is done for all three cases.

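# If open fails, "or die" stops the program; $! holds the system's reason.
open(INPUT,"input.txt") or die "Can't read input.txt: $!";	# Read
open(OUTPUT,">output.txt") or die "Can't write output.txt: $!";	# Write (overwrites)
open(APPEND,">>append.txt") or die "Can't append to append.txt: $!";	# Append

print OUTPUT "Hello world\n";
print APPEND "Hello world\n";

close(INPUT);
close(OUTPUT);
close(APPEND);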
You should notice several things here:
Arrays
Perl has several kinds of arrays. The two most commonly used are the typical numerically subscripted array, and the associative array.

The numerically subscripted array looks like this:

@bob
With individual elements set or retrieved as follows:
$bob[0] = "fred";
$bob[2] = "jane";

print "$bob[2]\n";

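Note: Please don't use @bob[1] when you mean $bob[1]. @bob[1] is a slice: a list that happens to contain one element. It often seems to work, which is why this is a common mistake, but it imposes list context, so a statement like @bob[1] = <STDIN>; reads every remaining line of input instead of just one.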

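If you want to know how many elements are in an array, use scalar(@bob). The related $#bob gives the index of the last element, which is one less than the number of elements, since indices start at 0.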
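A quick sketch showing the difference between the two on an array with a gap in it:

```perl
use strict;
use warnings;

my @bob;
$bob[0] = "fred";
$bob[2] = "jane";          # $bob[1] now exists but is undefined

my $last_index = $#bob;    # 2: index of the last element
my $count = scalar(@bob);  # 3: number of elements
print "Last index: $last_index, element count: $count\n";
```

Assigning to $bob[2] automatically extends the array, and $bob[1] is left undefined, which is why the count is 3 even though only two elements were set.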

Now, several other things you may want to do are illustrated below:

$bob[0] = 1;		# Set the first element of @bob to 1
$bob[1] = 2;		# Set the second element of @bob to 2
$bob[2] = 3;		# Set the third element of @bob to 3
@bob = (1,2,3);		# Same result as the previous 3 examples
@bob = (6..9);		# This is the same as: @bob = (6,7,8,9);

@fred = (2,3);		# Set $fred[0] = 2 and $fred[1] = 3
@bob = (1,@fred,4);	# Same as @bob = (1,2,3,4)
The other array type is the associative array (or hash). These arrays use strings, rather than numbers, as their indices (called keys). Examples are as follows:
$weekdays{'Wednesday'} = 4;	# Wednesday is the 4th day of the week
$daysofmonth{'January'} = 31;	# There are 31 days in January

To refer to an associative array as a whole, you use %arrayname. In our last examples, that means %weekdays and %daysofmonth.
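A slightly fuller sketch of working with a whole associative array. The months are just sample data (February is fixed at 28 days for simplicity); keys, sort and exists are standard Perl functions:

```perl
use strict;
use warnings;

my %daysofmonth = (
    'January'  => 31,
    'February' => 28,   # ignoring leap years in this sketch
    'March'    => 31,
);
$daysofmonth{'April'} = 30;            # add one more key/value pair

my $total = 0;
foreach my $month (sort keys %daysofmonth) {
    $total += $daysofmonth{$month};
}
my $count = keys %daysofmonth;         # keys in scalar context: number of pairs

print "$count months, $total days\n";
print "April is there\n" if exists $daysofmonth{'April'};
```

Perl does not keep hash keys in any particular order, which is why the loop sorts them.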

Relational Operators

One of the places where people get into the most trouble is using the wrong operator in the wrong context. An equally common mistake is to use an assignment where you mean a relational operator. Here are some examples:
if ($bob = 1) { return;}	# This is wrong
if ($bob == 1) { return;}	# This is right. (Assuming $bob has a number)
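The difference between these two is that the first one assigns $bob the value 1, whereas the second one tests whether $bob is equal to 1. Because the value of the assignment is 1, which is true, the first condition is always true no matter what $bob held before.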

Looking Ahead: You'll notice this is the first use of a conditional in our examples. The if means you should only do what is in the {}'s if the expression in the ()'s is true. In Perl, the {}'s take the place of the IF ... THEN DO ... END; ELSE IF ... THEN DO ... END. Unlike C, the {}'s are required even when there is only one statement inside them.

Note: Perl does have an ELSE IF conditional, but it is spelled elsif. Notice that there is only 1 e! It won't work as elseif, and you'll get a red spot on your head from beating it on the table if you try to use elseif.
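As a sketch, here is an if/elsif/else chain wrapped in a small subroutine. The name classify is made up for this example; subroutines themselves are covered later in this class:

```perl
use strict;
use warnings;

# Note the spelling: elsif, not elseif.
sub classify {
    my ($n) = @_;
    if ($n < 0) {
        return "negative";
    } elsif ($n == 0) {
        return "zero";
    } else {
        return "positive";
    }
}

print classify($_), "\n" for (-5, 0, 7);
```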

The following is a list of relational operators that are commonly used:

Numeric		String	Meaning
-------		------	-------
 ==		  eq	Equal to
 !=		  ne	Not equal to
 >		  gt	Greater than
 >=		  ge	Greater than or equal to
 <		  lt	Less than
 <=		  le	Less than or equal to
 <=>		  cmp	Compare: returns -1, 0 or 1 (less, equal, greater)
It is very important that you not confuse the numeric relational operators with the string relational operators; otherwise you're not going to get what you want. For instance, let's look at the following program:
#!/usr/local/bin/perl
$bob = "aaa";	# Make $bob a string
$fred = "bbb";	# Make $fred a string
if ($bob == $fred) {print "This is probably not what you wanted.\n";}
if ($bob eq $fred) {print "This is not true.\n";}
if ($bob ne $fred) {print "This is true.\n";}
Which produces the following:
node72% perl /tmp/test-program
This is probably not what you wanted.
This is true.

Here, the first test prints something, the second does not, and the third does. The first prints because neither string looks like a number, so in a numeric context both are treated as 0, and 0 == 0 is true. (Run perl with the -w switch and it will warn you that the arguments aren't numeric.) The second does not print because, compared as strings, $bob and $fred are not equal. The third prints for that same reason.

Note: Check it out, our first fully functional Perl program! Of course it doesn't do much, but it is a complete program. Notice that I chose to execute it by typing perl programname. You could also make the file executable and run it directly, which works because the #! line at the top tells the system which interpreter to use. Either way you get the same result:

node72% chmod 755 /tmp/test-program
node72% /tmp/test-program
This is probably not what you wanted.
This is true.
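The same numeric-versus-string distinction shows up when sorting. This sketch sorts one list both ways, using <=> and cmp from the table above inside sort's comparison block, where Perl provides the two values being compared as $a and $b:

```perl
use strict;
use warnings;

my @numbers = (10, 9, 100, 1);

my @as_strings = sort { $a cmp $b } @numbers;   # character by character
my @as_numbers = sort { $a <=> $b } @numbers;   # by numeric value

print "cmp: @as_strings\n";   # cmp: 1 10 100 9
print "<=>: @as_numbers\n";   # <=>: 1 9 10 100

# Both operators return -1, 0 or 1:
my @results = (3 <=> 5, 5 <=> 5, "b" cmp "a");   # (-1, 0, 1)
```

With cmp, "100" sorts before "9" because the strings are compared character by character, and "1" comes before "9".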

Loops and I/O

More often than not, the reason that you want to write a program is to do some repetitive and menial task while you surf the net. ;) To do this you need to have handy loops. Here are a few of the more popular:
The for loop
The for loop is often the loop that people are most familiar with. Its basic action is to start at one value, do some action and increment a counter, until the counter reaches the end value. Perl allows you to do this in a bunch of ways. Here are a few:
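for ($i = 1; $i <= 7; $i++) { print "$i\n"; }	# C style
for (1,2,3,4,5,6,7) { print "$_\n"; }		# A list
for (1..7) { print "$_\n"; }			# A range
These all print the numbers 1 through 7, just in different ways that might be easier to understand. In the last two, each value in turn is put into $_ (more on $_ later).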

The foreach loop
In Perl, foreach is simply another name for for; the two keywords are interchangeable, and by convention foreach is used when stepping through a list. Here is an example:
foreach $fred (@bob) { print "$fred\n"; }
What this does is step through @bob, putting each successive element into $fred and printing it.
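One detail worth knowing: the loop variable is not a copy of each element but an alias to it, so changing it inside the loop changes the array itself. A small sketch:

```perl
use strict;
use warnings;

my @bob = (1, 2, 3);

# $fred is an alias for each element in turn, not a copy,
# so multiplying it rewrites the element inside @bob.
foreach my $fred (@bob) {
    $fred *= 10;
}

print "@bob\n";   # 10 20 30
```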

The while loop
The while loop runs as long as its condition is true. If the condition reads from a filehandle, that normally means until it hits the end of the file (EOF). If the condition is an array, it means until the array is empty. Here are two examples of this:
while(<>) {
  chomp;
  print "The line was: $_\n";
}

while(@ARGV) {
  $bob = shift @ARGV;
  print "The argument is: $bob\n";
}
OK, I admit it, I'm cheating. I'm sneaking in little lessons you can use later into my examples, to confuse you and to make me look smarter than I am. ;) Let's go over these examples in a little detail.

There are two places in this first example that I use defaults that really reduce readability if you aren't familiar with Perl, but save several characters of typing. They are:

What I Said	What it Meant
-----------	-------------
while(<>)	while (defined($_ = <ARGV>))
chomp;		chomp $_;
The other concept I'm introducing is $_. $_ is a variable just like $fred or $bob. The difference is that Perl uses it a lot. As a matter of fact, most of the time if you don't specify a variable to use for a function or operation that requires one, it will use $_ by default. Put more precisely, $_ is the default input and pattern-searching space. This means that if no variable is specified for either input or a pattern search, $_ is assumed.

This first program reads each line from the files named on the command line (or from STDIN if there aren't any), gets rid of the newline, and then prints what it got with a newline added back on. Not very useful, but it illustrates the idea.

The second example is only a little more subtle in trying to slide in helpful lessons for later. The first question is: what is this @ARGV thing? It is the list of all arguments passed to the program on the command line. If you want your programs to act differently depending on which switches the user provides, this is how you will do it. This program goes through the list of arguments, pulls them out of @ARGV one at a time with shift (which removes and returns the first element), and prints what they were. Once @ARGV is empty, the loop ends. Again, not very useful in what it does, but the loop is very useful.

Note: Unlike C, Perl does not include the program name in the argument list. So, $ARGV[0] is NOT the name of the program. The name of the program actually is stored in $0.
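Putting @ARGV, shift and the while loop together, here is a sketch of simple switch handling. The -v switch and the file names are made up for illustration, and @ARGV is filled in by hand so the example runs the same way every time; in a real program Perl fills it from the command line:

```perl
use strict;
use warnings;

# Pretend the program was run as:  program -v alpha beta
@ARGV = ('-v', 'alpha', 'beta');

my $verbose = 0;
my @files;
while (@ARGV) {              # an array is true while it has elements
    my $arg = shift @ARGV;   # remove and return the first element
    if ($arg eq '-v') {      # hypothetical "verbose" switch
        $verbose = 1;
    } else {
        push @files, $arg;
    }
}

print "Running $0\n";        # the program's own name is in $0, not @ARGV
print "verbose=$verbose files=@files\n";
```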

Pattern Matching

I sincerely hope I haven't lost you up to this point. We are about to start on pattern matching, or regular expressions. These are arguably the heart and soul of Perl. Before you get used to them, you'll hate them; once you know them, you'll love them.

REGular EXPressions (regexp) are really almost a separate language. So, let's cover the vocabulary of the language first, then we'll move to the grammar. Here are the basic elements:

Expression	Meaning
----------	-------
.		Matches any character except newline
[a-z]		Matches any character in the set
[^a-z]		Matches any character not in the set
^		Anchor to the beginning of the line (negation as the first character inside [])
$		Anchor to the end of the line
\d		Matches a digit => [0-9]
\w		Matches any word character => [a-zA-Z0-9_]
\s		Matches any whitespace (tab, space, newline)
\n		Matches a newline
\D		Matches any non-digit => [^0-9]
\W		Matches any non-word => [^a-zA-Z0-9_]
\S		Matches any non-whitespace
x?		Matches 0 or 1 occurrences of x
x*		Matches 0 or more occurrences of x
x+		Matches 1 or more occurrences of x
()		Groups and captures a match for later use ($1, $2, ...)
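A few of these elements in action. Each test below matches a literal string, and the ? : turns the match result into 1 or 0 so it is easy to print:

```perl
use strict;
use warnings;

my $all_word   = ("abc123" =~ /^\w+$/)   ? 1 : 0;  # ^...$ anchor the whole string
my $all_digits = ("abc123" =~ /^\d+$/)   ? 1 : 0;  # fails: starts with letters
my $british    = ("colour" =~ /colou?r/) ? 1 : 0;  # u? : zero or one "u"
my $american   = ("color"  =~ /colou?r/) ? 1 : 0;
my $not_lower  = ("x9"     =~ /[^a-z]/)  ? 1 : 0;  # [^a-z] matches the "9"
my @digits     = ("a1b22c333" =~ /(\d+)/g);         # ( ) captures; /g finds every run

print "$all_word $all_digits $british $american $not_lower\n";   # 1 0 1 1 1
print "@digits\n";                                                # 1 22 333
```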
The easiest way to understand regular expressions is to just jump in and see how they work. It's painful, but necessary. Let's make a list of strings that we might be interested in manipulating.
6.2345		90	30
400,600		300,90
/net/switchyard/disk/disk3/local_design/l141_NaN
Here are some sample code fragments to parse these three examples:

if ($_ =~ /([0-9]+)\.([0-9]+)\s+([0-9]+)\s+([0-9]+)/) {
  $before_decimal = $1;	# 6
  $after_decimal = $2;	# 2345
  $second_number = $3;	# 90
  $third_number = $4;	# 30
}

if (/([0-9]+,[0-9]+)\s+([0-9]+,[0-9]+)/) {
  $first_pair = $1;	# 400,600
  $second_pair = $2;	# 300,90
}

if (m#^(.*)/disk3/local_design/(.*)$#) {
  $beginning = $1;	# /net/switchyard/disk
  $design_name = $2;	# l141_NaN
}
Yep, you guessed it. Another insidious attempt on my part to sneak more information into easy examples. Especially nasty is my use of defaults again.

Perl has two pattern matching operators, substitution and matching. In these examples we are just using the matching operator, m//. If you use plain slashes as the pattern delimiters, you can leave off the m. The final example illustrates using something other than / (here, #) as the pattern delimiter, which requires the explicit m. We did that because our example uses the / extensively. With slashes as delimiters, we would have had to escape every / in the pattern with a \, which would have made our third example look as follows:


if (/^(.*)\/disk3\/local_design\/(.*)$/) {
  $beginning = $1;	# /net/switchyard/disk
  $design_name = $2;	# l141_NaN
}
This is more difficult to read than the original solution. Also, you'll notice that I didn't indicate what string we should be doing pattern matching on, except in the first example. As was indicated before, a good assumption is that the default is $_.

Finally, you'll notice that the =~ operator used in the first example doesn't appear in our original list. It binds a string to a pattern match. The regexp binding operators are:

Operator	Meaning
--------	-------
=~		True if the string matches the pattern
!~		True if the string does not match the pattern
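Here is a sketch that runs the first and third fragments against the sample strings (assuming the first sample's fields are tab-separated, as they appear above) and exercises both binding operators. One addition not used above: a successful match in list context returns the captured groups directly, which saves copying $1, $2, ... by hand:

```perl
use strict;
use warnings;

my $line = "6.2345\t\t90\t30";
my $path = "/net/switchyard/disk/disk3/local_design/l141_NaN";

# In list context, a successful match returns ($1, $2, ...).
my @parts = ($line =~ /([0-9]+)\.([0-9]+)\s+([0-9]+)\s+([0-9]+)/);
my ($beginning, $design_name) = ($path =~ m#^(.*)/disk3/local_design/(.*)$#);

my $has_comment = ("# a comment" =~ /^#/);  # =~ : true when the pattern matches
my $no_digits   = ("abc" !~ /\d/);          # !~ : true when it does NOT match

print "@parts\n";                    # 6 2345 90 30
print "$beginning $design_name\n";   # /net/switchyard/disk l141_NaN
```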

Search and Replace

The second part of the regexp usage is search and replace, done with the s/// operator. This is handy when you want to replace text that matches one pattern with some other text. The replacement can even be empty, which lets you delete whatever matched. Let's look at some lines you might want to operate on:
mroe
teh quik borwn fox jumped over teh lazy dre god hi mom
convienent
Let's assume we have these lines in a file, and we want to write a program to fix all of our spelling mistakes.
#!/usr/local/bin/perl
# examples/fix-spelling.pl

$filename = "bob.txt";
$output = "fred.txt";

open(INPUT,"$filename") or die "Can't read $filename: $!";
open(OUTPUT,">$output") or die "Can't write $output: $!";

while (<INPUT>) {
  chomp;

  s/mroe/more/;
  s/teh/the/g;		# Global
  s/quik/quick/;
  s/borwn/brown/;
  s#convienent#convenient#;
  s/dre/red/;
  s/god/dog/;	
  s/\s*hi mom//;	# Erase the "hi mom" and the space before it

  print OUTPUT "$_\n";
}

close(INPUT);
close(OUTPUT);
And when we run the program, look what we get:
node72% ./test-program
node72% cat fred.txt
more
the quick brown fox jumped over the lazy red dog
convenient
The only real trick that you haven't seen before is the use of s///g. The g indicates a global replace. Normally s/// replaces only the first match it encounters, but since we needed to fix both occurrences of teh, we needed the g. (The s#convienent#convenient# line also shows that, as with m//, you can pick a different delimiter.)
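A sketch contrasting s/// with and without g on one string, and showing that s/// returns the number of replacements it made. The (my $copy = $text) =~ s/// form copies the string first, so the original is left alone:

```perl
use strict;
use warnings;

my $text = "teh cat and teh dog";

(my $once = $text) =~ s/teh/the/;    # only the first "teh" changes
(my $all  = $text) =~ s/teh/the/g;   # every "teh" changes

# s/// returns how many substitutions it made.
my $copy  = $text;
my $count = ($copy =~ s/teh/the/g);

print "$once\n$all\n$count\n";
```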

Subroutines

For some reason early in my Perl career, I was deathly afraid of subroutines. Why? I haven't the foggiest, but I want to make sure that other people don't experience the same thing I did and have to re-write their programs later when they realize that they've been idiots.

Subroutines are great for when you have a task you would like to do within a program, and you want to do it over and over again. You could just duplicate the code a bunch of times, but that could severely cut into your net surfing. ;) This is the perfect application for a subroutine. So, now let's get to a contrived little example to show how to do subroutines.

Let's say you have an input file that looks something like this:

bob	10 10 20
fred	20 20 5 90 1000
jane	5 4 30
The first column is the name of one of your "friends" and the numbers are how much money you've won from them the last several poker games that they haven't paid up on. Now, before you decide to send "Kneecap Manny" after them, you want to send them one last bill so they can pay. So, we're going to write a program to go through this list, and total up all of the money they owe you. Here goes:
#!/usr/local/bin/perl
# examples/manny.pl

$input = "manny.inp";		# The data file from our example

open(INPUT,"$input") or die "Can't read $input: $!";	# Open the file to read

while(<INPUT>) {		# While we still have moochers
  chomp;			# remove the \n

  if (/([a-z]+)\s+(.*)/) {	# Parse up the line into the name and the rest
    $total = &total_loot($2);	# Call our subroutine with the numbers

    # Print a warning message
    print "Listen up $1, you little worm, send me $total dollars before\n";
    print "I send Kneecap Manny for you!\n\n";
  }	#end if

} #end while

close(INPUT); 			# Close our filehandle

# Our subroutine that returns the total that they owe us
sub total_loot {

  my (@mulah,$total,$amt);	# Declare lexical (my) variables so that
				# they don't interfere with our globals

  @mulah = split(/\s+/,$_[0]);	# $_[0] contains the rest of the line.  We
				# want to get those numbers into our array
				# so that we can add them

  foreach $amt (@mulah) {	# For every number in our array
    $total += $amt;		# add it to the total
  }

  return $total;		# return the total
}		# end total_loot()	
If we run this program on our input, we get:
node72% ./test-program
Listen up bob, you little worm, send me 40 dollars before
I send Kneecap Manny for you!

Listen up fred, you little worm, send me 1135 dollars before
I send Kneecap Manny for you!

Listen up jane, you little worm, send me 39 dollars before
I send Kneecap Manny for you!
So, this program has a small main loop (the while loop) and it calls our subroutine total_loot for every line in our file that matches our basic pattern. Pretty simple, eh?
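Subroutines receive all of their arguments in the array @_, not just $_[0]. As a sketch, here is a hypothetical variant of total_loot (named total_loot_list) that is handed the amounts as separate arguments, run against the same sample data held in a list instead of a file:

```perl
use strict;
use warnings;

# Hypothetical variant: every argument arrives in @_.
sub total_loot_list {
    my $total = 0;
    foreach my $amt (@_) {
        $total += $amt;
    }
    return $total;
}

my %owed;
foreach my $line ("bob\t10 10 20", "fred\t20 20 5 90 1000", "jane\t5 4 30") {
    my ($name, @amounts) = split /\s+/, $line;   # first field is the name
    $owed{$name} = total_loot_list(@amounts);
}

print "$_ owes $owed{$_}\n" for sort keys %owed;
```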

Homework

Here is a list of problems you might want to try to do for the next session. If you take the time to write the programs to solve these problems it should help you make sure you understand the basics of the language.
Problem #1
Given the following input, you should create output as shown below. Additionally, any line that starts with a # sign should be considered a comment and be discarded.
# This is a comment with donkeys,2,fuscia that should be ignored
rabbits,4,brown
dogs,2,purple
cats,28,red
horses,100,green
This should produce output like:
There are 4 brown rabbits.
There are 2 purple dogs.
There are 28 red cats.
There are 100 green horses.
Finally, there are 134 total animals.
Problem #2
Write a program that will print out the name of the largest file in your current directory.

Hints:

Problem #3
One of the original encryption algorithms is something known as ROT-13. It is a standard alphabetic substitution: each letter is replaced by the letter 13 places further along in the alphabet, wrapping around from z back to a. This means that the letter a becomes the letter n, b becomes o, etc... Your assignment is to write a program that will ROT-13 encrypt input from the user.

Problem #4
This one should be easy. This program should simply de-ROT-13 user input, i.e., translate o to b, n to a, etc...

Problem #5
This program should allow you to either ROT-13 or de-ROT-13 user input, as specified on the command line. The command options should be -encrypt and -decrypt. You should leverage your previous programs to accomplish this.

Credits

Credit where credit is due.
Questions, comments, suggestions or corrections to josh@bofh.com
http://www.bofh.com/perl-class/index.html

© Copyright 1996 by Jot Powers