In this chapter, we'll learn some brief yet useful guidelines for Lisp style, followed by practical advice on tradeoffs among debugging, performance, and readability.
The art of Lisp style is simpler and more fruitful than in most other languages. Lisp's simple, consistent syntax eliminates the need for the rules of style that plague more complicated languages. And the direct, standardized availability of complex functionality within Lisp helps to drive down the size of programs, thereby providing improved readability through brevity. (In my experience, a Lisp program may range in size from 5 percent to 20 percent of an equivalent C++ program.)
Furthermore, the universal availability of Lisp-aware program editing tools -- such as Emacs or its equivalent built into many Lisp IDEs -- means that you can let the computer handle the details of indentation that are so important to the ability of a person to comprehend the structure of a Lisp program. I've said this before, but it bears repeating: you should not be programming in Lisp without the aid of a Lisp-aware editor.
So, if we don't have to worry about conventions for spelling, capitalization, indentation, and other such mundane details, then what remains for us to discuss as elements of Lisp style? How about things that are truly important? Lisp programming style is about the choice of proper abstractions, and about communicating not just with the compiler, but with other people who will eventually read your program. (For that matter, good style will help you read your own program some months or years in the future.)
A long, long time ago the capabilities of a typical Lisp implementation were very much less than what you'll find in any Common Lisp system today. After all, Lisp has been around for over forty years since John McCarthy first invented the notations (see Chapter 34). Strangely, when Lisp is taught at all in computer science curricula, it is taught using a circa-1965 view of the state of Lisp implementations: interpreted execution, limited data structures, and no real application beyond the manipulation of symbols.
Unfortunately, authors and publishers of Lisp textbooks did little to help correct these misperceptions, ignoring Common Lisp (and indeed, many of its recent forebears) in highly-recommended Lisp textbooks published as recently as 1989.
In the bad old days -- when Lisp didn't have arrays, vectors, hash tables, structures, or CLOS -- programmers learned to rely heavily on property lists as an important mechanism for structuring data. You'll still find -- in bookstores and on the shelves of college libraries -- Lisp and AI books that recommend the use of property lists as the underlying basis for looking up values identified by a symbolic key.
A property list is a list of alternating keys and values. For example:
(SEX MALE PARENTS (BOB JANE) OCCUPATION MUSICIAN)
establishes these relations:
relation    value
--------    -----
sex         male
parents     (bob jane)
occupation  musician

When you attach these relations to a symbol, say JAMES, then you have a way to get information related to the symbol JAMES, namely the properties having the names SEX, PARENTS, and OCCUPATION.
In Common Lisp you can retrieve a symbol's property using

(GET symbol property-name)

and set a property using

(SETF (GET symbol property-name) value)
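A short interactive sketch of these accessors, using the JAMES example from above:

```lisp
;; Attach properties to the symbol JAMES, then read them back.
(setf (get 'james 'sex) 'male)
(setf (get 'james 'parents) '(bob jane))
(setf (get 'james 'occupation) 'musician)

(get 'james 'occupation)      ; => MUSICIAN
(get 'james 'favorite-color)  ; => NIL (no such property)
(symbol-plist 'james)         ; the raw property list, keys alternating with values
```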
While property lists were useful substitutes for more capable
data structures in ancient Lisp implementations, they find few uses
in modern Lisp programs. One problem is efficiency. Every time you
ask Lisp to retrieve a property value, it must locate the symbol's
property list and then search the list for a matching key. If you
have five or six properties on a symbol, the search may or may not
be faster than using a hash table; the exact crossover point will
depend upon your particular Lisp implementation. The other problem
is that properties are shared among all parts of your program. It's
not too difficult to imagine two programmers using a property named
COLOR in two different ways in two parts of the same
program. Imagine their surprise when they discover the conflict...
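As a hypothetical illustration of that conflict (the symbol WIDGET and both values are invented for this sketch):

```lisp
;; Programmer A stores a color name on the symbol.
(setf (get 'widget 'color) 'red)

;; Programmer B, elsewhere in the same program, stores an RGB triple
;; under the same property name -- silently clobbering A's value.
(setf (get 'widget 'color) '(255 0 0))

(get 'widget 'color)   ; => (255 0 0), not the RED that programmer A expects
```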
At any rate, you should become familiar with all of the capabilities of Common Lisp, and learn to recognize the advice about older Lisp implementations that still haunts us through the mists of history.
Common Lisp defines the following declarations: SPECIAL, TYPE, FTYPE, INLINE, NOTINLINE, IGNORE, IGNORABLE, DYNAMIC-EXTENT, OPTIMIZE, and DECLARATION.
Of these, only the first and last must be implemented. The rest are advisory; depending upon the implementation, the compiler may or may not honor the given advice. If you've programmed in other languages, you may find it strange that most of Lisp's declarations are advisory rather than mandatory. So let's dig a bit deeper and see what this really means to you.
Lisp by default must have the capability to determine the type of every variable at runtime. This is not to say that a sufficiently smart compiler can't infer at compile time that a variable will always be of a particular type and generate code that does not need to check types at run time. However, an actual "sufficiently smart compiler" remains an elusive creature, much like the Yeti, Bigfoot and the Loch Ness Monster.
Declarations allow the programmer to pass metainformation to the compiler. This is not part of the program, but rather information about the program. Declarations can help the compiler to generate better code by providing information about the programmer's intent.
For example, if you declare that a variable will always be a
FIXNUM (an integer value that fits in a single machine
word) then the compiler can emit code to load that variable directly
into a register in preparation for the next operation. If you
declare the result of the operation to also be a
FIXNUM, then the compiler can generate code to perform
the operation and store the result using simple machine instructions
without first checking the
actual type of the value.
Given such declarations, a good Lisp compiler can generate code
comparable to a low-level language in which operations and types in
the language map directly onto the underlying machine.
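A sketch of the kind of declaration described above (the function name is invented, and the actual speedup depends entirely on your compiler):

```lisp
(defun add-fixnums (a b)
  ;; Promise the compiler that both arguments are fixnums, and that
  ;; speed matters more than usual for this function.
  (declare (fixnum a b)
           (optimize (speed 3)))
  ;; THE asserts that the result also fits in a fixnum.
  (the fixnum (+ a b)))
```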
But there's a risk. If you declare certain types, and the compiler emits code that optimizes the program according to your declarations, and the program then contradicts those declarations by providing a value of a different type at runtime, then bad things will happen. Tell the compiler to expect two numbers to add, then pass it a number and a symbol, and all bets are off.
Fortunately, the declarations that guide the compiler are
themselves moderated by the
OPTIMIZE declaration. The
OPTIMIZE declaration lets you instruct the compiler
about the relative importance of certain properties of the program.
You can specify the relative importance of the SPEED of the running
program versus the SPACE (size) of the generated code. You can specify
whether you'd like to allow the compiler to spend extra time doing a
better job, or to emphasize COMPILATION-SPEED. You can specify the
importance of being able to DEBUG your program, which may cause the
compiler to produce code that is simpler or interacts well with the
debugger.

Values range from 0 to 3 for the OPTIMIZE qualities, with 0 meaning
"totally unimportant" and 3 meaning "most important". The default value
is 1, meaning "of normal importance". Bear in mind that for something to
be relatively more important, something else must be less important; it
won't give the compiler any useful guidance to specify values of 3 for
all of the OPTIMIZE qualities.
Of all the
OPTIMIZE declarations, the most important is
SAFETY, since this affects the amount of trust the
compiler is willing to extend to your type declarations. A high
SAFETY setting generally compels the compiler to check
the type of every value that it can't absolutely determine at
compile time. Lower
SAFETY values put increasing weight
upon your abilities as a programmer to guarantee that type
declarations are correct, array bounds are always in range, etc.
The exact effect of declarations (with the exception of
NOTINLINE) varies among Lisp
implementations; consult your reference manual for details.
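Pulling the OPTIMIZE qualities together, a function intended for production use might be declared like this (the function and the particular weights are illustrative, not a recommendation):

```lisp
(defun production-hot-spot (x)
  ;; Emphasize runtime speed; keep normal safety until the code is
  ;; thoroughly tested, and accept slower compilation and less
  ;; debugger-friendly code in exchange.
  (declare (optimize (speed 3) (compilation-speed 0) (debug 1) (safety 1)))
  (* x x))
```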
Although not required by the Common Lisp standard, almost all implementations require that you load code from a file. (The one exception that I know of is the Venue Medley environment, which normally saves the entire Lisp world when you end a session. Medley also keeps track of new definitions created in the listener and allows you to save just those definitions to a file.)
In a file-based Lisp environment, you'll normally add definitions to a file of source code. One reason for so doing is to periodically save your work; unless you're debugging FFI code or running buggy Lisp code with a low optimization value for safety, your Lisp environment will almost never crash. However, other disasters can happen -- another application could crash and bring down the system in an unprotected OS such as the Mac OS or Windows, the power could fail, or your cat could walk across the keyboard when you leave to refill your coffee.
As your program gets larger, you may find that it's useful to reload an entire source file after making a series of changes. Most Lisp environments also let you evaluate one definition at a time in any open window. This is quite useful because you can edit, then recompile, one definition at a time. But sometimes you'll forget, and then it's easier to just reload the entire file than to spend time figuring out which definition you might have forgotten to recompile after last changing its definition.
But you may also be in the midst of debugging your program when you'd like to reload its source code. If your program uses any global variables to keep track of its state, you really don't want to reinitialize these in the midst of your debugging session. So, how do you handle this? You could put definitions of your program's state variables in a separate file, but that increases your mental workload and increases debugging time by splitting clearly interrelated parts of your program into two separate files. (I know that this is an accepted practice in many programming languages, but it really does increase the amount of work you do as a programmer. Imagine how much less pleasurable reading a novel would be if the novel was delivered as a set of pamphlets, one per character, and you had to follow page references to get to the next part of the dialog.)
Fortunately, Lisp has grown through decades of real-world programming
experience, and has a very simple mechanism to handle whether variables
get reinitialized or not when you load a file. You use DEFVAR
to declare variables with values that need to be initialized only once.
To declare a variable with an initial value that gets set every time its
defining form is evaluated, use DEFPARAMETER.
One final note: as a matter of form, you should name global variables
with a leading and trailing asterisk, as in *MY-GLOBAL*.
Think of this convention as a courtesy to those who will maintain your
code at some future date.
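The difference is easy to see at the listener (the variable names below are invented for this sketch):

```lisp
;; DEFVAR initializes only if the variable is not already bound.
(defvar *event-log* '())

;; DEFPARAMETER re-initializes on every evaluation, e.g. at each LOAD.
(defparameter *debug-level* 2)

;; Simulate a debugging session that accumulates state...
(push 'started *event-log*)
(setf *debug-level* 3)

;; ...then re-evaluate the defining forms, as reloading the file would:
(defvar *event-log* '())        ; *event-log* still holds (STARTED)
(defparameter *debug-level* 2)  ; *debug-level* is reset to 2
```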
You should define global constants using
DEFCONSTANT. From the viewpoint of reading a Lisp
program, the distinction between DEFPARAMETER and
DEFCONSTANT is that a value defined by
DEFPARAMETER could conceivably be altered by the user
after the program is compiled, but a DEFCONSTANT value
will never change. A good Lisp compiler will take advantage of
DEFCONSTANT declarations to perform classical
optimizations such as constant folding or compiling immediate load
instructions.

Fewer Lisp programmers follow a naming convention for constants.
The one I use puts a leading and trailing plus sign on the name of
the constant, as in +MY-CONSTANT+.
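Putting DEFCONSTANT and the naming convention together (the constant here is invented for illustration):

```lisp
(defconstant +seconds-per-day+ (* 24 60 60)
  "Number of seconds in one day.")

;; A compiler may fold +SECONDS-PER-DAY+ into call sites as the
;; literal 86400 rather than fetching a variable at runtime.
(defun days->seconds (days)
  (* days +seconds-per-day+))
```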
Most Lisp systems include both an interpreter and a compiler; when both are available, you'll normally find that it's easier to debug interpreted code. Consult your vendor's documentation to learn how to switch between the interpreter and the compiler.
Of course, when performance is important, you'll want to run your code compiled once it's debugged. But see the earlier cautions about running buggy code with low safety settings.
When you're writing Lisp code to run on multiple platforms, it's
safest to assume that code will run interpreted unless you call
COMPILE-FILE. For this reason,
you should develop the practice of writing (or using) a system
definition procedure that first loads all of your Lisp source files,
then compiles them, then loads the compiled files. This is usually
overkill, but it's a very safe, conservative approach. With suitable
source code organization and proper use of EVAL-WHEN,
you can reduce the number of source files that must first be loaded;
the main idea is to ensure that all macros are defined before
compiling code that uses the macros, but there are other possible
situations that can depend upon the current state of the Lisp world.
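A minimal sketch of such a load-then-compile-then-load procedure (the function name is an assumption for this example; real system-definition tools do much more, such as tracking dependencies and skipping up-to-date files):

```lisp
(defun build-system (source-files)
  ;; Pass 1: load every source file, so all macros are defined
  ;; before any file that uses them gets compiled.
  (dolist (file source-files)
    (load file))
  ;; Pass 2: compile each file to a fasl.
  (dolist (file source-files)
    (compile-file file))
  ;; Pass 3: load the compiled files, replacing the interpreted
  ;; definitions with compiled ones.
  (dolist (file source-files)
    (load (compile-file-pathname file))))
```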
Interpreted programs are easier to debug because it's easier for the debugger to access the actual source code at the point of an error. Once you've compiled your program, the debugger typically has less source information available; you may find yourself puzzling over a transformed version of the source code or grovelling through assembly-language instructions to find the source of the error. Fortunately, the need for such low-level debugging will be rare if you follow some simple advice:
- Don't use low SAFETY settings on untested code.
- Save high SPEED settings, and the lower SAFETY settings they encourage, for code that is already debugged.
Once your code is running well, then you should compile it and adjust the optimization declarations for performance. If you find that simply compiling your program provides adequate performance, leave it alone. If the performance of the compiled program falls far below your expectations, first improve the algorithm; optimization declarations typically have a fractional impact upon performance.
The first rule of efficiency in any programming language is to start with an efficient algorithm. It's a little harder to spot inefficiencies in a Lisp program because the underlying operations don't usually map directly onto a hardware instruction. But with a certain amount of knowledge and practice, you should be able to tell why the following four programs have radically different resource requirements.
These four programs return the sum of a list of numbers, but do
it in different ways. In each case, we test the program with the
TIME form, which reports run time and memory allocation.
Each program is tested twice, once with a list of ten thousand elements,
then again with one hundred thousand.
;; Runtime increases as the square of the number of elements
? (defun sum-list-bad-1 (list)
    (let ((result 0))
      (dotimes (i (length list))
        (incf result (elt list i)))
      result))
SUM-LIST-BAD-1
? (let ((list (make-list 10000 :initial-element 1)))
    (time (sum-list-bad-1 list)))
(SUM-LIST-BAD-1 LIST) took 2,199 milliseconds (2.199 seconds) to run.
Of that, 102 milliseconds (0.102 seconds) were spent in The Cooperative Multitasking Experience.
 16 bytes of memory allocated.
10000
? (let ((list (make-list 100000 :initial-element 1)))
    (time (sum-list-bad-1 list)))
(SUM-LIST-BAD-1 LIST) took 336,650 milliseconds (336.650 seconds) to run.
Of that, 15,680 milliseconds (15.680 seconds) were spent in The Cooperative Multitasking Experience.
 2,704 bytes of memory allocated.
100000

;; Recursive version works when compiler does tail-call optimization
? (defun sum-list-bad-2 (list)
    (labels ((do-sum (rest-list sum)
               (if (null rest-list)
                   sum
                   (do-sum (rest rest-list) (+ sum (first rest-list))))))
      (do-sum list 0)))
SUM-LIST-BAD-2
? (let ((list (make-list 10000 :initial-element 1)))
    (time (sum-list-bad-2 list)))
(SUM-LIST-BAD-2 LIST) took 2 milliseconds (0.002 seconds) to run.
10000
? (let ((list (make-list 100000 :initial-element 1)))
    (time (sum-list-bad-2 list)))
(SUM-LIST-BAD-2 LIST) took 21 milliseconds (0.021 seconds) to run.
100000

;; The recursive version can fail w/o tail-call optimization
? (defun sum-list-bad-3 (list)
    (declare (optimize (debug 3)))  ; disable tail-call optimization
    (labels ((do-sum (rest-list sum)
               (if (null rest-list)
                   sum
                   (do-sum (rest rest-list) (+ sum (first rest-list))))))
      (do-sum list 0)))
SUM-LIST-BAD-3
? (let ((list (make-list 10000 :initial-element 1)))
    (time (sum-list-bad-3 list)))
> Error: Stack overflow on control stack.

;; The iterative version is not as elegant, but it's fast!
? (defun sum-list-good (list)
    (let ((sum 0))
      (do ((list list (rest list)))
          ((endp list) sum)
        (incf sum (first list)))))
SUM-LIST-GOOD
? (let ((list (make-list 10000 :initial-element 1)))
    (time (sum-list-good list)))
(SUM-LIST-GOOD LIST) took 1 milliseconds (0.001 seconds) to run.
10000
? (let ((list (make-list 100000 :initial-element 1)))
    (time (sum-list-good list)))
(SUM-LIST-GOOD LIST) took 10 milliseconds (0.010 seconds) to run.
100000
The first version, SUM-LIST-BAD-1, harbors a hidden
inefficiency: (ELT LIST I) must search
LIST from the beginning for each value of
I. In other words, ELT must examine one
element when I is 1, two elements when I
is 2, and so on. For a list of length N, the loop
will examine on the order of N-squared elements. Have a look at the
runtimes for 10,000 and 100,000 elements.
The second version is coded using an awareness of how lists are
accessed; the helper function
DO-SUM calls itself
recursively with the tail of the list it is given. In
SUM-LIST-BAD-2, the runtime increases linearly with the
length of the input list. So why is this a bad example?
DO-SUM calls itself as the last form it executes;
this is known as tail recursion. Some compilers can compile
tail recursion as a jump instruction instead of a function call;
this eliminates the growth of the control (function return) stack
that would otherwise occur in a recursive call. However, the Common
Lisp standard does not require that tail calls be optimized.
The third version shows what can happen when the compiler does
not optimize tail recursion. The compiler in my Lisp system disables
tail recursion optimizations when the DEBUG
optimization quality is set to 3. Without tail recursion optimization,
SUM-LIST-BAD-3 consumes a function call frame for each
recursive call, and the program fails -- exhausting stack space --
before reaching the end of the test list.
The final version,
SUM-LIST-GOOD, uses iteration
instead of recursion for its control loop, and walks down the input
list element by element. It runs slightly faster than
SUM-LIST-BAD-2 and doesn't fail if the compiler doesn't
support tail recursion optimization.
Avoidance is the best defense against inefficiency. Use appropriate data structures and control techniques. When you're not sure, put together a small test program and time it with a variety of inputs.
Every Common Lisp implementation provides the TIME macro
that we used to show the differences among the SUM-LIST
functions. You can use this to examine and tune small portions of a program.
Once you have assembled a larger program, you may need to find the bottleneck that causes unexpectedly low performance. For this, you'll need a profiler. A profiler watches the execution of your whole program and generates a report to show where the program spends its time. Some profilers also report on memory allocation. A profiler is not a standard part of Lisp, but most vendors provide one. Consult your vendor's documentation.
Lisp provides abstractions that help you solve problems. You'll find that you don't have to make a tradeoff between readability and performance; an efficient Lisp program is usually one that is written using the most appropriate abstractions and operations to solve a given problem.