Using Common Lisp to Model and Validate Mission Critical Software

David B. Lamkins

Senior Member of Technical Staff
Dynamics Research Corporation
19545 NW Von Neumann Pkwy., Ste. 130
Beaverton, OR 97006

November 11, 1998


We used Common Lisp to build an independent model of a large C++ system, then compared the production system's operation to that of the model. This exposed several latent flaws in the production code, which we were able to correct before delivering the system to our customer. The simplicity and compactness of the Lisp model were an aid to understanding and validating system operation; two thousand lines of Common Lisp code captured the essential features of 30- to 40-thousand lines of C++ code. The model's throughput was initially within a factor of three of the production system when executed on identical platforms. At the end of the eight-week project, one simple optimization reduced the performance difference to less than ten percent. Thanks to our positive experiences, we anticipate additional uses of the model code as a testbed for new feature development, as a portable sales tool deployed on laptop computers, and as a core around which a low-cost product can be developed for a new target market.


  1. Introduction
  2. Background
  3. The Premise
  4. The Experiment
  5. The Model
  6. The Results
  7. Management Perspective
  8. Follow-on Projects
  9. Conclusions


Introduction

Dynamics Research Corporation (DRC) is a leading manufacturer of telecommunications fraud detection systems used by Regional Bell Operating Companies (RBOCs), private corporations, and state agencies to reduce revenues lost to intentional misuse of telephone systems. Fraud control systems examine large volumes of data -- typically tens of millions of call records per day. When fraud goes undetected due to a system failure, the resulting losses can run upwards of $100,000 per day.

During the development of its second-generation fraud control system, DRC used Common Lisp to develop a model of the essential functional characteristics of the new fraud detection engine. The model was operated side-by-side with the production system, using the same input data and fraud detection rules. Differences in the outputs were investigated; this led to the discovery and repair of subtle design and implementation flaws which had remained undetected despite the use of traditional validation techniques such as peer review, unit testing, black-box testing, regression testing, and alpha testing.

The model was considerably smaller than the production code and therefore easier for one person to comprehend. This helped to expose mismatched interfaces which were previously undetected because of misunderstandings or miscommunications among the original developers. Also, the immediacy of code changes using Common Lisp allowed us to quickly instrument the model to answer "what-if" kinds of questions; this helped us to detect and eliminate gaps in our test data's coverage.


Background

I was hired by DRC to assess the development readiness of the server side of its second-generation fraud control system, and to function in the roles of advisor, architect, and troubleshooter. Examination of the existing code revealed that the developers had a clear understanding of their subsystems' functions, but lacked certain engineering and organizational disciplines necessary to ensure the successful deployment of a complex system.

As the development and QA teams prepared the product for alpha release, I expressed my concern -- having taken time to understand both the implementation and the group dynamics -- that certain classes of defects had probably escaped, and would continue to escape, detection. I believed that we could flush out these latent errors by constructing an operational model.

The Premise

The model's design would be based upon a functional description which I had previously reverse-engineered from the production system. No code would be shared between the model and the production system. When given the same inputs, differences between the outputs of the model and the production system would expose flaws in either the model or the production system. Following identification and correction of flaws, the process would be repeated until the production system and the model produced identical results. Inspection of the model would help to establish confidence in the correctness of both the model and the production system.

There were just two problems. The first was that the production system had been under development for almost two years, and it consisted of about a quarter million lines of code. The second was that we had only eight weeks remaining before our first system delivery.

Fortunately, the production system contained a lot of code which did not directly contribute to the detection of fraud from telephone call records. Server framework code provided support for distributed parallel processing; this was essential to the performance, availability, and scalability of the production system, but irrelevant to the model. Database code supplied an interface to client software and maintained an audit trail used to substantiate allegations of fraud; again, this was important to the production system but not to the model. Finally, the client software supplied a user interface and embodied the process flow of fraud investigation (e.g. collecting and presenting related fraud events, prioritizing cases to be investigated, and recording the actions taken to resolve a case); this, too, was unimportant to the model.

The Experiment

Eliminating the framework, the user interface, and the database still left an estimated 30- to 40-thousand lines of code to be validated in the server within our eight-week project timeline. The "feed" portion of the server in the production system processes telephone call records to extract, decode, and normalize fields of interest. The "detector" portion uses a rule program to detect suspected fraud in the normalized call records.

By implementing the model in Common Lisp we expected that its code could be made relatively transparent. Common Lisp's rich selection of abstract data types and control-flow constructs would let us write the model's code to emphasize the function of the model, rather than the mechanisms necessary to support its operation. Transparency of the model would prove to be important when we discovered differences between its behavior and the behavior of the production system.

The Model

The model (like the production system) reads BellCore Automated Message Accounting (AMA) records containing telephone call-completion data. For each record read, the model decodes its tagged format and populates a set of variables which are visible to the rule program.
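
This decode step can be sketched in a few lines, assuming a record already parsed into (tag . value) pairs; the tag keywords and variable names below are hypothetical illustrations, not the actual AMA layout:

```lisp
;; Hypothetical sketch: decode one tagged record into special
;; variables visible to the rule program.  The field tags (:orig,
;; :dur) and variable names are illustrative, not the AMA format.
(defvar *originating-number* nil)
(defvar *call-duration* nil)

(defun decode-record (record)
  "RECORD is a list of (tag . value) pairs standing in for one
parsed AMA record.  Each recognized tag populates a rule variable."
  (dolist (field record)
    (case (car field)
      (:orig (setf *originating-number* (cdr field)))
      (:dur  (setf *call-duration* (cdr field))))))

(decode-record '((:orig . "5035551234") (:dur . 120)))
```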

The rule program is a set of Horn clauses. The clauses are written to match patterns in certain fields decoded from a single AMA record. Most of these patterns only suggest fraudulent activity; repeated activation within a limited time period is usually necessary for the system to generate a diagnosis.

The activity-vs-time detection is aided by a memory of calls, recorded by account number and the unique identifier of the triggered rule, along with call statistics. The memory is updated and queried by the invocation of built-in functions that appear as clauses in the rule program. Fraud is diagnosed as the violation of a threshold -- either call count or total call duration -- for a particular rule and account number.
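
The memory and threshold check might look roughly like this; the (account . rule) key layout and the threshold defaults are illustrative assumptions, not the production values:

```lisp
;; Hypothetical sketch of the activity-vs-time memory.  Calls are
;; tallied per (account . rule) key; fraud is diagnosed when either
;; the call-count or the total-duration threshold is violated.
(defvar *call-memory* (make-hash-table :test #'equal))

(defun record-call (account rule-id duration)
  "Update the call count and total duration for ACCOUNT under RULE-ID."
  (let* ((key (cons account rule-id))
         (stats (or (gethash key *call-memory*)
                    (setf (gethash key *call-memory*) (list 0 0)))))
    (incf (first stats))               ; call count
    (incf (second stats) duration)     ; total duration, seconds
    stats))

(defun fraud-p (account rule-id &key (max-count 5) (max-duration 3600))
  "True when either threshold is exceeded for this account and rule."
  (let ((stats (gethash (cons account rule-id) *call-memory*)))
    (and stats
         (or (> (first stats) max-count)
             (> (second stats) max-duration)))))
```

A single 4000-second call under a hypothetical rule trips the duration threshold: after (record-call "555-0100" 'rule-17 4000), the call (fraud-p "555-0100" 'rule-17) yields true.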

Additional built-in functions perform input, output, and varied housekeeping functions. Matching and backtracking are used to implement control flow and looping in the rule program.

The model's implementation consists of a compiler, a runtime system, and a (minimal) user interface.

The compiler reads a rule program file which contains Horn clauses and references to runtime functions; the runtime functions provide memory and I/O services. The compiler performs lexical analysis on the rule file all at once, and passes a buffer of tokens to a recursive descent parser. The parser generates an annotated program tree.

Various routines traverse the program tree to collect and extract rules and goals. Each rule or goal is further decomposed; its components are analyzed and inserted into macro templates that generate Lisp source code, which is then passed to the Lisp compiler to produce machine code.
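
A toy macro template in this spirit, with a deliberately simplified rule shape (one field, one pattern, one body) standing in for the components of the real annotated tree:

```lisp
;; Hypothetical macro template: a parsed rule component is spliced
;; into a template that expands to ordinary Lisp, which the Lisp
;; compiler then turns into machine code.
(defmacro define-field-rule (name (field pattern) &body body)
  `(defun ,name (record)
     (let ((value (getf record ,field)))
       (when (equal value ,pattern)
         ,@body))))

;; An invented rule: flag calls to a premium-rate prefix.
(define-field-rule premium-rate-rule (:called-prefix "900")
  :suspect)
```

Here (premium-rate-rule '(:called-prefix "900")) returns :SUSPECT, and any non-matching prefix returns NIL.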

The generated code is a continuation-passing style implementation of the rules program. Matching and binding of resolution variables is performed at runtime. The compiler optionally generates a listing of the compiled Lisp source code for inspection or verification. The machine code may be inspected using Lisp's disassembler.
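
The continuation-passing arrangement can be illustrated with a pair of hand-written clauses; in the real system such clauses are emitted by the compiler, and the clause bodies below are invented:

```lisp
;; Hypothetical sketch of continuation-passing style: each clause
;; takes a success continuation and calls it on a match, returning
;; NIL on failure so the driver can backtrack to the next clause.
(defun try-clauses (clauses input success)
  "Run each clause against INPUT until one calls SUCCESS."
  (dolist (clause clauses)
    (let ((result (funcall clause input success)))
      (when result (return result)))))

(defun short-call-clause (input success)
  (when (< (getf input :duration) 10)
    (funcall success :short-call)))

(defun long-call-clause (input success)
  (when (> (getf input :duration) 3600)
    (funcall success :long-call)))
```

With #'identity as the continuation, (try-clauses (list #'short-call-clause #'long-call-clause) '(:duration 4000) #'identity) backtracks past the first clause and returns :LONG-CALL.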

The runtime system implements the detector's built-in functions for I/O, memory, and comparisons. Lower-level support functions provide runtime matching and binding for resolution variables, plus platform-independent program tracing.

The I/O system reads telephone call record files and fills global variables using the information decoded from each record. These variables are read under control of the rule program. Just prior to reading a new record, every variable is initialized to a value which, if used, will force a parameter mismatch. This provides a runtime guard against uninitialized or stale data and is helpful in flushing out rules which reference unset variables.
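
A sketch of this guard, with hypothetical variable names; the sentinel is an uninterned symbol that no built-in function is specialized to accept:

```lisp
;; Hypothetical sketch of the stale-data guard.  Before each record
;; is decoded, every rule variable is reset to a sentinel that will
;; force a parameter mismatch if a rule reads it before it is set.
(defvar *unset* '#:unset
  "Sentinel value no built-in function accepts.")

(defvar *originating-number* nil)
(defvar *call-duration* nil)

(defvar *rule-variables* '(*originating-number* *call-duration*)
  "Every special variable the rule program may read.")

(defun reset-rule-variables ()
  "Called just before each new record is read."
  (dolist (var *rule-variables*)
    (setf (symbol-value var) *unset*)))
```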

All built-in functions are implemented as CLOS methods with their arguments specialized the same way as in the production detector. A default method for each built-in function, not specialized on any arguments, will catch and signal parameter mismatches at runtime.
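
For instance, one built-in comparison might be written as follows; the generic-function name, condition class, and argument types are illustrative rather than the production detector's:

```lisp
;; Hypothetical sketch of a built-in as a CLOS generic function.
;; The specialized method mirrors the production signature; the
;; unspecialized default method catches mismatches at runtime.
(define-condition parameter-mismatch (error)
  ((name :initarg :name :reader mismatched-name)))

(defgeneric duration-exceeds-p (duration threshold)
  (:documentation "True when DURATION exceeds THRESHOLD, in seconds."))

(defmethod duration-exceeds-p ((duration integer) (threshold integer))
  (> duration threshold))

(defmethod duration-exceeds-p (duration threshold)  ; default method
  (declare (ignore duration threshold))
  (error 'parameter-mismatch :name 'duration-exceeds-p))
```

A call such as (duration-exceeds-p "4000" 3600), with a string where an integer belongs, falls through to the default method and signals PARAMETER-MISMATCH instead of silently misbehaving.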

The user interface consists of a single Lisp function which compiles a specified rule file and uses the resulting program to process a directory full of call record files. Output is logged for later comparison with comparable tests performed on the production system.

The Results

I wrote the first version of the model in just three weeks. While I was still developing the model, a unit test uncovered flaws in the fraud detection rules.

Running interpreted under CMUCL 18a on a Sun UltraSparc 1, the initial version of the model managed to process only 100 call records per minute. This was one third of the performance obtained on a Power Macintosh 7200/75 running MCL 4.2, which was where most of the initial development was done. I realized then that I had been spoiled by MCL's "always-on" compiler. The second release, therefore, was all about file compilation; the performance improved by a factor of nine on the Sun/CMUCL platform (and, as expected, not at all on the Macintosh/MCL platform).

To my surprise, the second release had a memory footprint only one-third larger than the corresponding subset of production components on the Sun platform, and the model's performance was already within a factor of three of the production system. This was especially notable since I had made no effort to optimize performance beyond the use of file compilation.

The remaining five weeks were spent running comparison tests between the production server and the model. About once a week we would complete another series of tests, then reconcile differences. Sometimes the model was wrong, other times the production server was wrong. All of the defects found and corrected in the production server were in areas that had been tested previously using other techniques.

By the end of the sixth week, the model and the production server were functioning identically. We spent the next two weeks scrutinizing the model for further design flaws and found none. However, we did add additional instrumentation to the model, using it to find and eliminate gaps in our test coverage.

In the eighth week, having achieved the project's original objectives, I turned my attention to improving the performance of the model. In an hour's time, with the aid of a profiler, I identified one function which was responsible for three quarters of the model's storage allocation. Moving the offending computation to compile time -- which is where it had belonged all along -- brought the throughput of the model to within ten percent of the production server's throughput on identical platforms.
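
The flavor of such a fix can be shown with LOAD-TIME-VALUE, one standard way to hoist a computation out of the per-record path; the function and the hoisted value below are hypothetical, not the actual culprit:

```lisp
;; Hypothetical before/after.  The slow version conses a fresh list
;; on every call; the fast version computes the list once, when the
;; compiled code is loaded, and shares it thereafter.
(defun rule-prefixes-slow ()
  (list "900" "976" "550"))

(defun rule-prefixes-fast ()
  (load-time-value (list "900" "976" "550") t))
```

In compiled code the fast version performs no per-call allocation at all, which is the kind of saving that closed most of the model's remaining performance gap.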

Management Perspective

Why did my manager support this project? From a business perspective, he was not concerned with the long-term viability of this project; he supported the short-term approach of validating the production system against an independent model.

I had previously established my credibility through a successful four-year working relationship with this manager, during which time I had led other short-term projects using Common Lisp as an appropriate tool. The lack of readily available Common Lisp development talent was not an issue, since this project had a short, well-defined lifespan.

The project began to yield useful results three weeks after its inception, when it exposed the first previously unknown defect in the production system. Because Common Lisp provides a highly productive environment for rapid development, additional improvements were typically made in a matter of minutes, rather than the hours required for corresponding changes to the production code.

The project has proven itself as a QA vehicle; it has helped us to eliminate several subtle defects from the production system prior to its delivery to the customer, and it has further helped to establish our confidence in the correctness of the system through inspection of the short, relatively transparent model code. The success of this project has opened the door to several follow-on projects.

Follow-on Projects

The model will serve as a testbed for proposed enhancements to the fraud detection component of the next product release. We will be able to test changes quickly thanks to the small size of the model and the rapid turnaround inherent in the Common Lisp development environments.

The availability of Common Lisp for laptop computers will allow us to deploy the model as a sales tool. The model will be provided for use by potential customers, who will supply it with sample data from their own telephone switches. The model's indications of suspected fraud will give the prospective customer a measure of the magnitude of undetected losses and a baseline by which to evaluate the business benefits of DRC's production fraud detection system.

Because the model's performance is surprisingly good, we may invest some additional effort to improve its performance and package it in a low-end system suitable for use by Competitive Local Exchange Carriers (CLECs); there are over 1,200 known CLECs providing phone service in the United States alone.


Conclusions

Conventionally, software is validated by comparison with an independently developed system only in cases where the system has an impact upon human life or safety. Our experience has shown that Common Lisp makes the technique accessible to software development organizations in which cost and time to market are important issues. I strongly recommend the use of this technique to validate complex algorithmic code, a domain in which Common Lisp does very well.

Common Lisp provided a means for one person to construct an independent, fully-functional model of a subsystem containing 30- to 40-thousand lines of C++ code. The model consists of about two thousand lines of Common Lisp code. Its throughput is comparable to the production C++ code on equivalent platforms despite an emphasis on coding the model for clarity and abstraction at the expense of performance.

Much of the credit for this achievement is due to Common Lisp's rich set of built-in data and control abstractions; these supported me in my efforts to express the functions of the model without first having to build the code for supporting mechanisms and infrastructure. Furthermore, the Common Lisp specification is stable and has been mostly based upon proven practices; this meant that I didn't have to deal with the problems of library and compiler glitches and instabilities which hounded the developers of the C++ production system. The good performance of the unoptimized model is probably due to good compiler technology and a close integration between the Common Lisp compiler and runtime libraries.

The success of any project is measured by its results. The project was designed to expose latent flaws in a large system. The first useful results appeared within three weeks of the project's inception. Over the next three weeks, the model and the production system were brought into agreement for all test cases. During the final two weeks, inspection of the model helped us to develop confidence that the production system is correct. Because the initial results were achieved with time to spare, we obtained an unanticipated benefit by instrumenting the model to audit and improve test coverage; both the model and the production system passed the additional tests, further reinforcing our confidence in the correctness of both.

Going forward, we have identified several additional uses for the model: as a testbed for development of new features, as a sales tool to quantify business benefits for potential customers, and as the core of a low-cost product for a new target market. Although we anticipated none of these uses at the project's inception, I find some satisfaction in the knowledge that all of these additional uses will be realized by making incremental changes to the model's code.