And so our journey begins...
Perl is a "Practical Extraction and Report Language" that is freely available for many platforms. It fills the niche between the classic interpreted shell languages (sh, csh, DCL) and the compiled languages (C, Pascal, ...). Perl itself is an interpreted language, but it compiles on the fly, so as to give you reasonably fast programs without sacrificing source code availability.
This class assumes that participants have some experience with a programming language such as C, Pascal or Basic, and access to a system with Perl 5 installed. After completing this course, participants should expect to be able to do the following:
Data Structures
In Perl there are a number of different kinds of variables that you need
to be aware of. These are as follows:
$bob   # Standard scalar variable, can be string or number
BOB    # Filehandle
@bob   # A numerically subscripted array
%bob   # A string subscripted array
&bob   # A subroutine
$bob = "I hate Fred";   # A string of characters
$bob = "I";             # A single character
$bob = 1;               # A number
$bob = 3.1;             # Pi defined in Tennessee
$bob = 6.023e23;        # Number in scientific notation
$bob = "I hate $fred";  # A string with interpolation
$bob = `uname -n`;      # The output of a command
One of the wonderful things about Perl is that it doesn't care what you mean until you say it. As a matter of fact, you could do one of these as an assignment and then immediately follow it with a different one, and Perl would happily comply.
Looking Ahead: You'll notice that I included a ; after every command. This indicates the end of a command to Perl. You'll also notice that I used a # after the semicolon for my comments. # is the comment character in Perl.
By default the Perl line separator is a newline (\n). This is very convenient, as most text files have lines that are delimited by newlines, and users have been trained to hit enter or return after typing in input. This allows us to do convenient actions like the following:
$answer = <STDIN>;
This says to put whatever the user typed in, up to and including the newline, into the variable $answer.
Note: This last point bears repeating. Perl keeps all of the user's input, including the newline. I guarantee that at least once, this will bite you. To get around this, use the chomp function to remove the trailing newline. The syntax is as follows:
chomp $answer;
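Here is a minimal sketch of what chomp actually changes. To keep it runnable without anyone typing, it assumes a hard-coded string stands in for what <STDIN> would have returned; the variable names other than $answer are made up for illustration.

```perl
use strict;
use warnings;

# Stand-in for a line read from <STDIN>: the trailing newline is included.
my $answer = "yes\n";

my $before = length $answer;   # three letters plus the newline
chomp $answer;                 # removes the trailing newline, if there is one
my $after = length $answer;    # just the three letters

print "before: $before, after: $after\n";

# Without the chomp this test would fail, because "yes\n" is not "yes".
print "The user agreed.\n" if $answer eq "yes";
```

Calling chomp on a string that has no trailing newline is harmless, so it is safe to use it routinely on input.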
As for output, you will normally want to print either to a file or to the user's screen. To print to the screen, you will do something like this:
print STDOUT "Hello world\n";
print STDERR "Hello world\n";
print "Hello world\n";
"What is that!?!?", you say? It would appear that the third print statement is missing a filehandle to print to. In fact, it will print to the currently selected filehandle, usually STDOUT. Perl tries to assume logical defaults, so that you do not have to type as much. In this case it assumes that by default prints should go to STDOUT.
Now, if you want to print to a file, you need to do a couple of extra things. First, Perl needs a way to know what file you want to either read from, write to, or append to. Here is how this is done for all three cases.
open(INPUT,"input.txt");
open(OUTPUT,">output.txt");
open(APPEND,">>append.txt");
print OUTPUT "Hello world\n";
print APPEND "Hello world\n";
close(INPUT);
close(OUTPUT);
close(APPEND);
You should notice several things here:
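One thing worth adding is that none of the open calls above check whether they succeeded. The sketch below shows one common way to do that, using "or die" and the $! variable, which holds the system's error message. The file name hello-demo.txt is made up for illustration, and the script deletes it when it is done.

```perl
use strict;
use warnings;

my $file = "hello-demo.txt";   # hypothetical file name, used only for this demo

# open returns false on failure, and $! says why.
open(OUTPUT, ">$file") or die "Cannot write $file: $!";
print OUTPUT "Hello world\n";
close(OUTPUT);

# >> appends to the end instead of starting the file over.
open(APPEND, ">>$file") or die "Cannot append to $file: $!";
print APPEND "Hello again\n";
close(APPEND);

open(INPUT, $file) or die "Cannot read $file: $!";
my @lines = <INPUT>;           # in list context, <INPUT> returns every line
close(INPUT);

print "Read ", scalar(@lines), " lines\n";
unlink $file;                  # clean up after ourselves
```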
The numerically subscripted array looks like this:
@bob
With individual elements set or retrieved as follows:
$bob[0] = "fred"; $bob[2] = "jane"; print "$bob[2]\n";
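Note: Please don't use @bob[1] when you mean a single element. This is a common mistake: @bob[1] is an array slice (a list that happens to hold one element), not the scalar $bob[1].
If you want to know the index of the last element of an array, use $#bob. The number of elements is one more than that ($#bob + 1), which is also what scalar(@bob) returns.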
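To be precise about $#bob: it holds the index of the last element, so the element count is one more than that. A small sketch using the same assignments as above (the names $last_index and $count are made up for illustration):

```perl
use strict;
use warnings;

my @bob;
$bob[0] = "fred";
$bob[2] = "jane";               # $bob[1] now exists too, but has no value

my $last_index = $#bob;         # index of the last element
my $count      = scalar(@bob);  # number of elements, i.e. $#bob + 1

print "last index: $last_index, count: $count\n";
```

Because indexes start at 0, an array whose last index is 2 has 3 elements, even though only two of them were assigned.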
Now, several other things you may want to do are illustrated below:
$bob[0] = 1;          # Set the first element of @bob to 1
$bob[1] = 2;          # Set the second element of @bob to 2
$bob[2] = 3;          # Set the third element of @bob to 3
@bob = (1,2,3);       # Same result as the previous 3 examples
@bob = (6..9);        # This is the same as: @bob = (6,7,8,9);
@fred = (2,3);        # Set $fred[0] = 2 and $fred[1] = 3
@bob = (1,@fred,4);   # Same as @bob = (1,2,3,4)
The other array type is the associative array (or hash array). These arrays are indexed by scalars, typically strings. Examples are as follows:
$weekdays{'Wednesday'} = 4; # Wednesday is the 4th day of the Week
$daysofmonth{'January'} = 31; # There are 31 days in January
In order to address the whole array for an associative array, you use
%arrayname. This means in our last examples we would have
had %weekdays and %daysofmonth.
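As a rough sketch of working with the whole associative array, the following fills %daysofmonth in one statement and then walks over it. The keys, sort and exists functions are standard Perl; the extra months and the variable $entries are added here only for illustration.

```perl
use strict;
use warnings;

my %daysofmonth = (
    'January'  => 31,
    'February' => 28,   # ignoring leap years
    'March'    => 31,
);

$daysofmonth{'April'} = 30;    # add one more entry

# keys returns the indexes in no particular order, so sort them for display.
foreach my $month (sort keys %daysofmonth) {
    print "$month has $daysofmonth{$month} days\n";
}

my $entries = scalar(keys %daysofmonth);   # how many entries there are
print "The hash holds $entries months\n";

# exists tests for an index without creating it.
print "No entry for May\n" unless exists $daysofmonth{'May'};
```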
Relational Operators
One of the places that people get into the most trouble is using
the wrong operator in the wrong context. An equally common mistake
is to use an assignment where you mean a relational operator. Here are some
examples:
if ($bob = 1) { return;} # This is wrong
if ($bob == 1) { return;} # This is right. (Assuming $bob has a number)
The difference between these two is that the first one assigns
$bob the value 1 (so the test is always true), whereas the second one
tests whether $bob is equal to 1.
Looking Ahead: You'll notice this is the first use of a conditional in our examples. The use of if indicates you should only do what is in the {}'s if the argument in the ()'s is true. In Perl, the {}'s take the place of the IF ... THEN DO ... END; ELSE IF ... THEN DO ... END.
Note: Perl does have an ELSE IF conditional, but it is elsif. Notice that there is only 1 e! It won't work as elseif and you'll get a red spot on your head from beating it on the table if you try to use elseif.
The following is a list of relational operators that are commonly used:
Numeric  String  Meaning
-------  ------  -------
==       eq      Equal to
!=       ne      Not equal to
>        gt      Greater than
>=       ge      Greater than or equal to
<        lt      Less than
<=       le      Less than or equal to
<=>      cmp     Comparison with signed result (-1, 0 or 1)
It is very important that you not confuse the numeric relational operators with the string relational operators, otherwise you're not going to get what you want. For instance, let's look at the following program:
#!/usr/local/bin/perl
$bob = "aaa"; # Make $bob a string
$fred = "bbb"; # Make $fred a string
if ($bob == $fred) {print "This is probably not what you wanted.\n";}
if ($bob eq $fred) {print "This is not true.\n";}
if ($bob ne $fred) {print "This is true.\n";}
Which produces the following:
node72% perl /tmp/test-program
This is probably not what you wanted.
This is true.
Here, the first test prints something, the second does not and the third does. The first prints because neither string makes sense in a numeric context, so Perl treats both as 0, and 0 == 0 is true. The second does not print because, as strings, $bob and $fred are not equal. The third prints for the same reason: they are not equal.
Note: Check it out, our first fully functional Perl program! Of course it doesn't do much, but it is a complete program. If you'll notice I chose to execute this by typing perl programname. You could also do the following, and get the same result:
node72% chmod 755 /tmp/test-program
node72% /tmp/test-program
This is probably not what you wanted.
This is true.
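The <=> and cmp operators from the table are most often seen inside sort, which is a handy place to watch the numeric and string comparisons disagree. This is only a sketch; the list of numbers is made up, and $a and $b are the names sort itself supplies for the two values being compared.

```perl
use strict;
use warnings;

my @numbers = (10, 9, 100, 2);

my @as_numbers = sort { $a <=> $b } @numbers;   # numeric comparison
my @as_strings = sort { $a cmp $b } @numbers;   # string comparison

print "numeric: @as_numbers\n";   # 2 9 10 100
print "string:  @as_strings\n";   # 10 100 2 9

# The same pair of values can be equal one way and unequal the other.
print "1.0 and 1 are equal as numbers\n"     if "1.0" == 1;
print "1.0 and 1 are not equal as strings\n" if "1.0" ne "1";
```

As strings, "10" sorts before "2" because the comparison goes character by character and "1" comes before "2".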
Loops and I/O
More often than not, the reason that you want to write a program is to
do some repetitive and menial task while you surf the net.
;)
To do this you need to have handy loops. Here are a few of the more popular:
for($i=1;$i<=7;$i++) # C style
for(1,2,3,4,5,6,7)
for(1..7)
These all say the same thing, just in different ways that might be easier to understand.
foreach $fred (@bob)
What this does is step through @bob and put each successive element into $fred.
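The loop headers above need a body in braces before they will run. Here is a small sketch that gives each one a body and checks that the three for forms really do cover the same values; the counter names and the contents of @bob are made up for illustration.

```perl
use strict;
use warnings;

my ($c_style, $list, $range) = (0, 0, 0);

for (my $i = 1; $i <= 7; $i++) { $c_style += $i; }   # C style
for (1,2,3,4,5,6,7)            { $list    += $_; }   # explicit list
for (1..7)                     { $range   += $_; }   # range

print "$c_style $list $range\n";   # 28 28 28

my @bob = ("fred", "jane", "manny");
foreach my $fred (@bob) {          # $fred takes each element in turn
    print "Element: $fred\n";
}
```

In the second and third forms no loop variable is named, so Perl puts each value in $_.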
while(<>) {
chomp;
print "The line was: $_\n";
}
while(@ARGV) {
$bob = shift @ARGV;
print "The argument is: $bob\n";
}
OK, I admit it, I'm cheating. I'm sneaking in little lessons you can use
later into my examples, to confuse you and to make me look smarter than I
am. ;)
Let's go over these examples in a little detail.
There are two places in this first example where I use defaults that really reduce readability if you aren't familiar with Perl, but save several characters of typing. They are:
What I Said    What It Meant
-----------    -------------
<>             $_ = <STDIN>
chomp;         chomp $_;
(Strictly speaking, <> reads from any files named on the command line and falls back to STDIN only when there are none.)
The other concept I'm introducing is $_. $_ is a variable just like $fred or $bob. The difference is that Perl uses it a lot. As a matter of fact, most of the time if you don't specify a variable to use for a function or operation that requires one, it will use $_ by default. Put more precisely, $_ is the default input and pattern-searching space. This means, if no variable is specified for either input or a pattern search, $_ is assumed.
This first program just reads from STDIN, gets rid of the newline and then prints what it got and adds a newline back on. Not very useful, but it illustrates an example.
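For comparison, here is roughly what that loop looks like with the defaults spelled out and a named variable in place of $_. So that it runs without any typing, this sketch reads from a string opened as a file, which assumes Perl 5.8 or later; the names $text, $in, $line and @seen are made up for illustration.

```perl
use strict;
use warnings;

# A string standing in for the input, opened as if it were a file.
my $text = "first line\nsecond line\n";
open(my $in, '<', \$text) or die "Cannot open in-memory file: $!";

my @seen;
while (defined(my $line = <$in>)) {   # explicit form of while(<>)
    chomp $line;                      # explicit form of a bare chomp
    push @seen, $line;
    print "The line was: $line\n";
}
close($in);
```

The defined test is what Perl adds for you behind the scenes, so that a line containing only "0" does not end the loop early.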
The second example is only a little more subtle in trying to slide in helpful lessons for later. The first question is what is this @ARGV thing? It is the list of all arguments passed on the command line to the program. If you write programs that you want to have different actions depending on what the users provide as switches this is how you will do it. This program goes through the list of arguments, pulls them out of @ARGV and prints what they were. Again, not very useful in what it does, but the loop is very useful.
Note: Unlike C, Perl does not include the program name in the argument list. So, $ARGV[0] is NOT the name of the program. The name of the program actually is stored in $0.
Pattern Matching
I sincerely hope I haven't lost you to this point. We are about to start
on pattern matching, or regular expressions. These are arguably the heart
and soul of Perl. Before you get used to them, you'll hate them; once you
know them, you'll love them.
REGular EXPressions (regexp) are really almost a separate language. So, let's cover the vocabulary of the language first, then we'll move to the grammar. Here are the basic elements:
Expression  Meaning
----------  -------
.           Matches any character except newline
[a-z]       Matches any character in the set
[^a-z]      Matches any character not in the set
^           Negation (inside []) or beginning of line
$           Anchor to the end of the line
\d          Matches a digit => [0-9]
\w          Matches any word character => [a-zA-Z0-9_]
\s          Matches any whitespace (tab, space, newline)
\n          Matches a newline
\D          Matches any non-digit => [^0-9]
\W          Matches any non-word => [^a-zA-Z0-9_]
\S          Matches any non-whitespace
x?          Matches 0 or 1 occurrences of x
x*          Matches 0 or more occurrences of x
x+          Matches 1 or more occurrences of x
()          Used for later backreferences
The easiest way to understand regular expressions is to just jump in and see how they work. It's painful, but necessary. Let's make a list of potentially interesting strings that we are interested in manipulating.
6.2345 90 30
400,600 300,90
/net/switchyard/disk/disk3/local_design/l141_NaN
Here are some sample code fragments to parse these three examples:
if ($_ =~ /([0-9]+)\.([0-9]+)\s+([0-9]+)\s+([0-9]+)/) {
$before_decimal = $1; # 6
$after_decimal = $2; # 2345 (the . is matched outside the parentheses)
$second_number = $3; # 90
$third_number = $4; # 30
}
if (/([0-9]+,[0-9]+)\s+([0-9]+,[0-9]+)/) {
$first_pair = $1; # 400,600
$second_pair = $2; # 300,90
}
if (m#^(.*)/disk3/local_design/(.*)$#) {
$beginning = $1; # /net/switchyard/disk
$design_name = $2; # l141_NaN
}
Yep, you guessed it. Another insidious attempt on my part to sneak
more information into easy examples. Especially nasty is my use
of defaults again.
Perl has two pattern matching operators, substitution and matching. In these examples we are just using the matching operator, m//. However, if you use / as the delimiter, the m is assumed and can be left off. The final example illustrates using something other than / as the pattern delimiter. We chose to do that because our example makes extensive use of /, and each of those would otherwise have to be escaped with a \. That would have made our third example look as follows:
if (/^(.*)\/disk3\/local_design\/(.*)$/) {
$beginning = $1; # /net/switchyard/disk
$design_name = $2; # l141_NaN
}
This is more difficult to read than the original solution. Also, you'll
notice that I didn't indicate what string we should be doing pattern matching
on, except in the first example. As was indicated before, a good assumption
is that the default is $_.
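To see exactly what each pair of parentheses captures, here is a sketch that runs the three patterns against the three sample strings by assigning each string to $_ first. The patterns and strings are the ones from the examples; printing the captures with | between them is just for illustration.

```perl
use strict;
use warnings;

my ($after_decimal, $first_pair, $beginning, $design_name);

$_ = "6.2345 90 30";
if (/([0-9]+)\.([0-9]+)\s+([0-9]+)\s+([0-9]+)/) {
    $after_decimal = $2;
    print "$1 | $2 | $3 | $4\n";     # 6 | 2345 | 90 | 30
}

$_ = "400,600 300,90";
if (/([0-9]+,[0-9]+)\s+([0-9]+,[0-9]+)/) {
    $first_pair = $1;
    print "$1 | $2\n";                # 400,600 | 300,90
}

$_ = "/net/switchyard/disk/disk3/local_design/l141_NaN";
if (m#^(.*)/disk3/local_design/(.*)$#) {
    ($beginning, $design_name) = ($1, $2);
    print "$1 | $2\n";                # /net/switchyard/disk | l141_NaN
}
```

Note that the literal text in a pattern (the \. in the first one, /disk3/local_design/ in the third) has to match but is not part of any capture.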
Finally, you'll notice that the relational operator doesn't appear in our original list. The regexp relational operators are:
Operator  Meaning
--------  -------
=~        Matches the pattern
!~        Does not match the pattern
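A short sketch of both operators applied to a named variable instead of $_. The sample lines and the array names are made up for illustration.

```perl
use strict;
use warnings;

my @lines = ("# a comment", "rabbits 4", "dogs 2");
my (@comments, @data);

foreach my $line (@lines) {
    if ($line =~ /^#/) {        # true when $line starts with #
        push @comments, $line;
    }
    if ($line !~ /^#/) {        # true when $line does not start with #
        push @data, $line;
    }
}

print scalar(@comments), " comment, ", scalar(@data), " data lines\n";
```

Every line satisfies exactly one of the two tests, since !~ is simply the negation of =~.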
mroe teh quik borwn fox jumped over teh lazy dre god hi mom convienent
Let's assume we have these lines in a file, and we want to write a program to fix all of our spelling mistakes.
#!/usr/local/bin/perl
# examples/fix-spelling.pl
$filename = "bob.txt";
$output = "fred.txt";
open(INPUT,"$filename");
open(OUTPUT,">$output");
while (<INPUT>) {
chomp;
s/mroe/more/;
s/teh/the/g; # Global
s/quik/quick/;
s/borwn/brown/;
s#convienent#convenient#;
s/dre/red/;
s/god/dog/;
s/hi mom//; # Erase the "hi mom"
print OUTPUT "$_\n";
}
close(INPUT);
close(OUTPUT);
And, when we run the program look what we get:
node72% ./test-program
node72% cat fred.txt
more the quick brown fox jumped over the lazy red dog convenient
The only real trick that you haven't seen before is the use of s///g. The g indicates a global replace. Normally the substitution will only replace the first match that it encounters, but since we needed to fix both occurrences of teh we needed to use the g.
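The effect of the g is easy to see side by side. This sketch applies the same substitution with and without it to a made-up sentence; it also relies on the fact that s/// returns the number of replacements it made.

```perl
use strict;
use warnings;

my $once   = "teh quik fox jumped over teh lazy dog";
my $global = $once;

$once =~ s/teh/the/;                    # first occurrence only
my $count = ($global =~ s/teh/the/g);   # every occurrence; returns how many

print "$once\n";             # the quik fox jumped over teh lazy dog
print "$global\n";           # the quik fox jumped over the lazy dog
print "replaced $count\n";   # replaced 2
```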
Subroutines are great for when you have a task you would like to do within a program, and you want to do it over and over again. You could just duplicate the code a bunch of times, but that could severely cut into your net surfing. ;) This is the perfect application for a subroutine. So, now let's get to a contrived little example to show how to do subroutines.
Let's say you have an input file that looks something like this:
bob 10 10 20
fred 20 20 5 90 1000
jane 5 4 30
The first column is the name of one of your "friends" and the numbers are how much money you've won from them in the last several poker games that they haven't paid up on. Now, before you decide to send "Kneecap Manny" after them, you want to send them one last bill so they can pay. So, we're going to write a program to go through this list, and total up all of the money they owe you. Here goes:
#!/usr/local/bin/perl
# examples/manny.pl
$input = "manny.inp"; # The data file from our example
open(INPUT,"$input"); # Open the file to read
while(<INPUT>) { # While we still have moochers
chomp; # remove the \n
if (/([a-z]+)\s+(.*)/) { # Parse up the line into the name and the rest
$total = &total_loot($2); # Call our subroutine with the numbers
# Print a warning message
print "Listen up $1, you little worm, send me $total dollars before\n";
print "I send Kneecap Manny for you!\n\n";
} #end if
} #end while
close(INPUT); # Close our filehandle
# Our subroutine that returns the total that they owe us
sub total_loot {
my (@mulah,$total,$amt); # Set up private variables so that they
# don't interfere with our globals
@mulah = split(/\s+/,$_[0]); # $_[0] contains the rest of the line. We
# want to get those numbers into our array
# so that we can add them
foreach $amt (@mulah) { # For every number in our array
$total += $amt; # add it to the total
}
return $total; # return the total
} # end total_loot()
If we run this program on our input, we get:
node72% ./test-program
Listen up bob, you little worm, send me 40 dollars before
I send Kneecap Manny for you!

Listen up fred, you little worm, send me 1135 dollars before
I send Kneecap Manny for you!

Listen up jane, you little worm, send me 39 dollars before
I send Kneecap Manny for you!

So, this program has a small main loop (the while loop) and it calls our subroutine total_loot for every line in our file that matches our basic pattern. Pretty simple, eh?
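The subroutine can also be tried on its own, without the input file. The sketch below is a slightly more compact variant, not the version used in the program: it copies its argument out of @_ into a named variable ($numbers, a name made up here) and feeds split straight into the loop.

```perl
use strict;
use warnings;

# Return the sum of the whitespace-separated numbers in the first argument.
sub total_loot {
    my ($numbers) = @_;          # first argument, the same value as $_[0]
    my $total = 0;
    foreach my $amt (split(/\s+/, $numbers)) {
        $total += $amt;          # add each number to the total
    }
    return $total;
}

print total_loot("10 10 20"), "\n";          # 40
print total_loot("20 20 5 90 1000"), "\n";   # 1135
print total_loot("5 4 30"), "\n";            # 39
```

Because every variable is declared with my, nothing inside the subroutine can clash with a variable of the same name in the main program.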
# This is a comment with donkeys,2,fuscia that should be ignored
rabbits,4,brown
dogs,2,purple
cats,28,red
horses,100,green
This should produce output like:
There are 4 brown rabbits.
There are 2 purple dogs.
There are 28 red cats.
There are 100 green horses.
Finally, there are 134 total animals.
Hints:
Credits
Credit where credit is due.
© Copyright 1996 by Jot Powers