In the last Perl Practicum, I covered the new Perl5 construct, references. Because the syntax and usage of references are confusing in places, I promised an extended, real-world example that would exercise all the material I had introduced.
The problem is to "marshall" data into a format such that if we do

    $string = marshall($some_ref);
    eval("\$other_ref = $string");

then the data structure pointed to by $other_ref will have the same contents as the data structure pointed to by $some_ref. If this is true, then we can save $string to a file and read it back in later in some other application. We use this kind of functionality here at NetMarket to allow Web CGI applications to share data; because the encoded string is just Perl source text, the format is extremely portable.
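To see the eval() half of the round trip before marshall() exists, here is a minimal sketch that starts from a hand-written string in the target format, rebuilds the data structure, and then saves the string to a temporary file and reads it back, as a second application would. The sample string and the use of the core File::Temp module are assumptions for illustration, and the code is written for a modern Perl (5.6 or later) rather than the Perl5 of this column.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A string written by hand in the format marshall() is meant to produce.
my $string = '[ "1", [ "a", "b", ], "3", ]';

# Rebuild a data structure from the string.
my $other_ref;
eval("\$other_ref = $string");
die "eval failed: $@" if $@;
print $other_ref->[1][0], "\n";    # prints a

# Save the string to a temporary file (deleted when the program exits)...
my ($out, $file) = tempfile(UNLINK => 1);
print $out $string;
close($out) or die "Can't close $file: $!";

# ...and read it back the way some other application might.
open(my $in, '<', $file) or die "Can't read $file: $!";
my $saved = do { local $/; <$in> };
close($in);

my $restored;
eval("\$restored = $saved");
die "eval failed: $@" if $@;
print scalar(@$restored), "\n";    # prints 3
```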
Remember from last time that the way to create a reference to an anonymous list was

    $a_ref = [1, 2, 3];

where the elements of the list could themselves be lists (or associative arrays, etc.):

    $other_ref = [1, ["a", "b"], 3];

In general, then, an anonymous list reference is assigned with

    $some_ref = [EXPR1, ..., EXPRn];

where EXPR1, ..., EXPRn are each either a scalar or a reference to some other complex data structure (list, associative array, list of lists, etc.).
This is where recursive thinking comes in. We assume that we already have a marshall() function that can encode EXPR1, ..., EXPRn properly. With this simplifying assumption, we can quickly write a function that creates a string containing the right-hand side of the expression above. All we have to do is put commas after successive calls to the marshall() function and put square brackets around the whole affair:

    sub encode_list {
        my($list_ref) = @_;
        my($string);

        $string = "[ ";
        for (@$list_ref) {
            $string .= marshall($_);
            $string .= ", ";
        }
        $string .= "]";
        return($string);
    }

Notice the @$list_ref in the for loop: remember that you can use a scalar variable holding a reference anywhere you would use the name of an identifier. Yes, there is a dangling comma after the last element of the list; Perl ignores it. If you are thinking this is all handwaving and I have not even begun solving the problem, bear with me.
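If you want to try encode_list() before the rest of the code exists, you can stand in for the assumed marshall() with a throwaway version. In the sketch below, the marshall() is a hypothetical stub, not the real routine: it only wraps plain scalars in quotes (with no escaping at all) and hands list references back to encode_list().

```perl
use strict;
use warnings;

# encode_list() as given above.
sub encode_list {
    my ($list_ref) = @_;
    my $string = "[ ";
    for (@$list_ref) {
        $string .= marshall($_);
        $string .= ", ";
    }
    $string .= "]";
    return $string;
}

# Hypothetical stub for marshall(): recurse on list references,
# otherwise just put double quotes around the value (no escaping).
sub marshall {
    my ($thing) = @_;
    return encode_list($thing) if ref($thing) eq "ARRAY";
    return "\"$thing\"";
}

my $encoded = encode_list([1, ["a", "b"], 3]);
print "$encoded\n";    # prints [ "1", [ "a", "b", ], "3", ]
```

Even this crude stub shows the recursive structure: the inner list is handled by a second call to encode_list() made on its behalf by marshall().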
Associative arrays work much the same way, except that anonymous hash references are built with curly braces:

    $mail_info = {
        "hal"  => "[email protected]",
        "tina" => "[email protected]",
        "rob"  => "[email protected]",
    };

Here the values in the hash can actually be references to complex data structures, but the keys have to be scalars. To state things generally, we assign hash references with expressions like:

    $some_ref = { KEY1 => EXPR1, ..., KEYn => EXPRn };

We are already assuming we have marshall() lying around to encode EXPR1, ..., EXPRn. I promise that I will write an encode_scalar() function next to handle encoding KEY1, ..., KEYn. With those two assumptions, encoding an associative array reference is now just a matter of putting commas in the right place:

    sub encode_hash {
        my($hash_ref) = @_;
        my($string, $key);

        $string = "{ ";
        foreach $key (keys(%$hash_ref)) {
            $string .= encode_scalar($key);
            $string .= "=> ";
            $string .= marshall($$hash_ref{$key});
            $string .= ", ";
        }
        $string .= "}";
        return($string);
    }
Now for encode_scalar(). From the $mail_info example above, you might guess that you can just throw quote marks around each value and be done. Well, suppose your scalar value is one of these strings:

    got"ya
    $variable

You will end up with encodings like:

    "got"ya"       # dangling quote
    "$variable"    # evaluates $variable

The safest thing to do is to backslash every non-word character (everything matched by \W) in the scalar:

    sub encode_scalar {
        my($scalar) = @_;

        $scalar =~ s/(\W)/\\$1/g;
        return("\"$scalar\"");
    }

Now our strings will get encoded as:

    "got\"ya"
    "\$variable"
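A quick way to convince yourself that the escaping works is to eval() each encoded string and compare the result with the original. This sketch repeats encode_scalar() in modern my/return style and tries it on the two troublesome strings plus an @ sign, which would otherwise trigger array interpolation inside double quotes; the test strings are arbitrary examples.

```perl
use strict;
use warnings;

# encode_scalar() as above: backslash each non-word character, then quote.
sub encode_scalar {
    my ($scalar) = @_;
    $scalar =~ s/(\W)/\\$1/g;
    return "\"$scalar\"";
}

# Round-trip each test string through encode_scalar() and eval().
for my $original ('got"ya', '$variable', '@list') {
    my $encoded = encode_scalar($original);
    my $decoded = eval($encoded);
    die "eval failed: $@" if $@;
    die "mismatch for $original" unless $decoded eq $original;
    print "$encoded decodes to $decoded\n";
}
```

The approach relies on the Perl rule that a backslash before a non-word character inside double quotes simply yields that character, so over-escaping is harmless.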
That leaves only the marshall() routine. It turns out that we can define marshall() in terms of the encoding routines:

    sub marshall {
        my($thing) = @_;
        my($type);

        $type = ref($thing);
        if ($type eq "ARRAY") {
            return(encode_list($thing));
        } elsif ($type eq "HASH") {
            return(encode_hash($thing));
        } elsif (!$type) {
            return(encode_scalar($thing));
        } else {
            die("Can't handle $type\n");
        }
    }

Remember that ref() returns the type of data structure its argument points to (such as ARRAY or HASH), and returns the empty string, which is false, if its argument is not a reference.
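Putting the pieces together, here is a minimal sketch of the whole encoder, rewritten for a modern Perl under use strict; it is meant to behave like the routines above, not to reproduce them character for character. It marshalls a small nested structure (hypothetical sample data in the spirit of $mail_info), rebuilds it with eval(), and checks a few values. Because Perl does not guarantee the order in which keys() returns a hash's keys, the check looks at individual values rather than comparing whole encoded strings.

```perl
use strict;
use warnings;

# Quote a plain scalar, backslashing every non-word character.
sub encode_scalar {
    my ($scalar) = @_;
    $scalar =~ s/(\W)/\\$1/g;
    return "\"$scalar\"";
}

# "[ elem1, elem2, ..., ]" with each element marshalled recursively.
sub encode_list {
    my ($list_ref) = @_;
    my $string = "[ ";
    $string .= marshall($_) . ", " for @$list_ref;
    return $string . "]";
}

# "{ key1=> value1, ..., }" with keys quoted and values marshalled.
sub encode_hash {
    my ($hash_ref) = @_;
    my $string = "{ ";
    foreach my $key (keys %$hash_ref) {
        $string .= encode_scalar($key) . "=> "
                 . marshall($hash_ref->{$key}) . ", ";
    }
    return $string . "}";
}

# Dispatch on the type reported by ref().
sub marshall {
    my ($thing) = @_;
    my $type = ref($thing);
    return encode_list($thing)   if $type eq "ARRAY";
    return encode_hash($thing)   if $type eq "HASH";
    return encode_scalar($thing) if !$type;
    die "Can't handle $type\n";
}

# Hypothetical sample data mixing hashes, lists, and awkward strings.
my $some_ref = {
    hal => ['got"ya', 42],
    rob => { note => '$variable costs 5%' },
};

my $string = marshall($some_ref);
my $other_ref;
eval("\$other_ref = $string");
die "eval failed: $@" if $@;

print "$other_ref->{hal}[0]\n";       # prints got"ya
print "$other_ref->{rob}{note}\n";    # prints $variable costs 5%
```

Note that the number 42 comes back as the string "42"; since Perl converts between strings and numbers as needed, that is usually harmless.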
How can this possibly work? Things will become clearer when we walk through a simple example. Suppose we do:

    $simple = [1, ["a", "b"], 3];
    $output = marshall($simple);

In the first call to marshall(), $simple is identified as a reference to a list, and marshall() calls encode_list().
The encode_list() function starts building up $string. First the function sets $string to the initial opening bracket. Then encode_list() begins walking through the list and calling marshall() on each element.
The first element of the list is a scalar, 1. When encode_list() calls marshall() on this element, marshall() immediately calls encode_scalar(), and encode_scalar() returns the string:

    "1"

marshall() simply returns this value back up to the encode_list() function, which appends it to the value of $string. At the end of one iteration of the loop, $string looks like this:

    [ "1",

Now encode_list() calls marshall() on the second element of the list. This element is a list reference, so marshall() calls encode_list() recursively.
This second call to encode_list() starts building a new $string. First it initializes this new $string with the opening square bracket. Next it calls marshall() on each of the list elements in turn, both of which are scalars. After the first iteration of the loop, the new $string looks like this:

    [ "a",

After the second iteration, we have:

    [ "a", "b",

The loop terminates, and the encode_list() function appends a closing bracket and returns

    [ "a", "b", ]

to the marshall() function that called it. In turn, marshall() returns this value to the original encode_list() call. That function appends the string above to its own $string, which now looks like this:

    [ "1", [ "a", "b", ],

The outer encode_list() function now moves on to the third and final element of the list in this example. This is just another scalar, so after the third iteration we have this:

    [ "1", [ "a", "b", ], "3",

We have exhausted the elements of the example list, so we fall out of the for loop. encode_list() appends the closing bracket and returns the string:

    [ "1", [ "a", "b", ], "3", ]

Try working out a simple example for yourself, possibly with an associative array this time. Or just type in the code and try running some examples.
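If you do type in the code, a short script like the following checks the walkthrough's result exactly and tries the suggested associative-array variant. It is a sketch in modern Perl: the subroutines repeat the ones above, the sample hash is hypothetical, and it uses a single key because the order of several keys would depend on Perl's internal hash ordering.

```perl
use strict;
use warnings;

sub encode_scalar {
    my ($scalar) = @_;
    $scalar =~ s/(\W)/\\$1/g;
    return "\"$scalar\"";
}

sub encode_list {
    my ($list_ref) = @_;
    my $string = "[ ";
    for (@$list_ref) {
        $string .= marshall($_);
        $string .= ", ";
    }
    return $string . "]";
}

sub encode_hash {
    my ($hash_ref) = @_;
    my $string = "{ ";
    foreach my $key (keys %$hash_ref) {
        $string .= encode_scalar($key) . "=> ";
        $string .= marshall($hash_ref->{$key}) . ", ";
    }
    return $string . "}";
}

sub marshall {
    my ($thing) = @_;
    my $type = ref($thing);
    return encode_list($thing)   if $type eq "ARRAY";
    return encode_hash($thing)   if $type eq "HASH";
    return encode_scalar($thing) if !$type;
    die "Can't handle $type\n";
}

# The walkthrough example.
my $simple = [1, ["a", "b"], 3];
my $output = marshall($simple);
print "$output\n";    # prints [ "1", [ "a", "b", ], "3", ]

# The suggested exercise, with a one-key associative array.
my $hashed = marshall({ "hal" => [1, 2] });
print "$hashed\n";    # prints { "hal"=> [ "1", "2", ], }
```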
There is a much terser way to write the marshall() function. First, we build a "jump table": an associative array that links certain keywords to various function references. In this case, we link the strings returned by ref() to the function used to encode each data type:

    %encode = ("SCALAR" => \&encode_scalar,
               "ARRAY"  => \&encode_list,
               "HASH"   => \&encode_hash,);
marshall() now becomes extremely terse:

    sub marshall {
        my($thing) = @_;
        my($type, $func_ref);

        $type = ref($thing) || "SCALAR";
        $func_ref = $encode{$type};
        return(&$func_ref($thing)) if ($func_ref);
        die("Can't handle $type\n");
    }

First we assign $type to be whatever gets returned by ref(), or "SCALAR" if ref() returns the empty string. Next we extract the appropriate function reference from %encode and call that function on the argument to marshall(). If we cannot find an appropriate function to call, we die().
To make this even more terse, remember that you can use a block inside curly braces in place of a scalar variable holding a reference. In other words, we can use

    {$encode{$type}}

in place of $func_ref. Our new version of the function is:

    sub marshall {
        my($thing) = @_;
        my($type);

        $type = ref($thing) || "SCALAR";
        return(&{$encode{$type}}($thing)) if (defined($encode{$type}));
        die("Can't handle $type\n");
    }

I generally hate using extra variables, but this version is harder for the human eye to comprehend than the one that uses $func_ref.
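Here is a self-contained sketch of the jump-table version in modern Perl. It keeps %encode in a file-scoped my variable (an assumption made so the code runs under use strict; the column uses a package variable) and shows what happens with a data type that has no table entry: a code reference makes marshall() die, and eval {} catches the message.

```perl
use strict;
use warnings;

sub encode_scalar {
    my ($scalar) = @_;
    $scalar =~ s/(\W)/\\$1/g;
    return "\"$scalar\"";
}

sub encode_list {
    my ($list_ref) = @_;
    my $string = "[ ";
    $string .= marshall($_) . ", " for @$list_ref;
    return $string . "]";
}

sub encode_hash {
    my ($hash_ref) = @_;
    my $string = "{ ";
    foreach my $key (keys %$hash_ref) {
        $string .= encode_scalar($key) . "=> "
                 . marshall($hash_ref->{$key}) . ", ";
    }
    return $string . "}";
}

# Jump table: type name from ref() => encoding function.
my %encode = (
    "SCALAR" => \&encode_scalar,
    "ARRAY"  => \&encode_list,
    "HASH"   => \&encode_hash,
);

sub marshall {
    my ($thing) = @_;
    # ref() returns "" for a plain scalar, so fall back to "SCALAR".
    my $type = ref($thing) || "SCALAR";
    return &{$encode{$type}}($thing) if defined($encode{$type});
    die "Can't handle $type\n";
}

my $encoded = marshall([1, { "hal" => "x" }]);
print "$encoded\n";    # prints [ "1", { "hal"=> "x", }, ]

# A code reference has no entry in %encode, so marshall() dies.
eval { marshall(sub { 1 }) };
my $error = $@;
print "Error: $error";    # prints Error: Can't handle CODE
```

One consequence of filing plain values under "SCALAR": a genuine scalar reference such as \"text" also makes ref() return SCALAR, so encode_scalar() receives the reference itself and encodes its printed form (something like SCALAR(0x...)) rather than the text it points to.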
Reproduced from ;login: Vol. 21 No. 2, April 1996.
12/3/96ah