Interpreter Design Pattern in Delphi

This session consists of the development of a small application to read and pretty-print XML and CSV files. Along the way, we explain and demonstrate the use of the following patterns: State, Interpreter, Visitor, Strategy, Command, Memento, and Facade.

Given a language, define a representation for its grammar along with an interpreter that uses the representation to interpret sentences in the grammar

The term “interpret” here is pretty broad. In a BASIC interpreter, it would mean to go through executing instructions in some runtime environment. However, we can use an interpreter for other things that require an understanding of the structure of a language.

This pattern works best if that structure is not too complex. Because we will define a class for every element of the grammar, the class hierarchy can get very large (usually very shallow and very wide). It can be a quite inefficient way to represent and work with the data. In my opinion, these are also the conditions under which recursive descent compilers are appropriate, so they can be a good match. But you would not write a Delphi compiler this way. The Dragon Book discusses more efficient methods.

However, for us, with our small grammar, the Interpreter pattern is fine.

Grammar

We will not be able to parse all XML documents. In particular, we will ignore DTDs, attributes, the structure of the contents of a prolog, escaped characters (e.g. <) and empty element tags (e.g. ). We will be able to cope with empty files, though.

I've used a variant of Backus Naur Form (BNF) to define the grammar. Here an empty string is denoted by ε, zero or more occurrences by * (called Kleene closure, in case you're interested), one or more by + (positive closure), and zero or one by 0..1 (no known aliases).

XmlDoc     -> Prolog0..1 TagList0..1
Prolog     -> 
TagList    -> Node*
Node       -> StartTag [Data | TagList] EndTag
StartTag   -> 
EndTag     -> 
PrologData -> [Any printable characters except <,>,/ and ? ]*
Data       -> [Any printable characters except <,> and / ]*
TagName    -> [Any printable characters except <,>,/,space ]

An example of the sort of file that we will be able to interpret is:



Stuff 1 is here
Stuff 2 is here
Stuff 3 is here
Stuff 4 is here

Implementation

The Interpreter pattern normally requires a client to build the syntax tree. In our case, this will be the XML parser in XmlParser.pas. Time doesn't permit a detailed description of this, but briefly, we have tokens which will be single characters, or the end of document marker. The lexical analyser class extracts these, and passes them to the XML parser. Like all recursive descent parsers, this has a procedure declared for each element of the grammar. These procedures check for appropriate tokens and report errors if necessary, and call procedures corresponding to other grammatical structures when they should appear in the source text. This parser adds nodes to the syntax tree as necessary.

The syntax tree is the essence of the Interpreter pattern. Astute readers will notice that it is a special case of the Composite pattern. The basis of the tree is the abstract expression class, which defines an abstract method to perform the Interpret operation. In our case this will be a search and replace (I told you the definition could be pretty broad). We will allow the operation just in data or in both tags and data. This is where the requirement to understand the structure of the document comes in.

We then go through our grammar defining subclasses, for each grammar element. There are two types of classes, although I don't see the point of defining them in code, as there is not normally any difference between them that can be inherited. The first type is for terminal expressions, which are grammar elements that cannot be reduced further. In our grammar these are the PrologData, Data and TagName elements. These classes in fact turned out to be so trivial that I ended up refactoring them out, and they are now just string properties of the relevant non-terminal expression classes.

There is one class for each of the other grammar elements. Besides implementing the SearchAndReplace method, these classes have as properties instances of other expression classes from which they are constructed. The declarations of the interpreter classes are as follows (ignore the Accept routine for now).

// Forward declaration of base visitor class
TXmlInterpreterVisitor = class;
 
// Abstract base expression class
TXmlExpression = class(TObject)
private
protected  
  function DoSearchAndReplace(const TargetStr,SearchStr,ReplaceStr : string) : string;
public
  // Declaring these methods abstract forces descendant classes to implement them
  procedure SearchAndReplace(const SearchStr,ReplaceStr : string;
                                       DoTags : Boolean = False); virtual; abstract;
  procedure Accept(Visitor : TxmlInterpreterVisitor); virtual; abstract;
end;

TXmlStartTag = class(TXmlExpression)
private
  FTagName : string;
protected
public
  procedure SearchAndReplace(const SearchStr,ReplaceStr : string;
                                                DoTags : Boolean = False); override;
  procedure Accept(Visitor : TxmlInterpreterVisitor); override;
  property TagName : string read FTagName write FTagName;
end;

TXmlEndTag = class(TXmlExpression)
private
  FTagName : string;
protected
public
  procedure SearchAndReplace(const SearchStr, ReplaceStr : string;
                                                DoTags : Boolean = False); override;
  procedure Accept(Visitor : TxmlInterpreterVisitor); override;
  property TagName : string read FTagName write FTagName;
end;

TXmlTagList = class;
 
TXmlNode = class(TXmlExpression)
private
  FStartTag : TXmlStartTag;
  FData     : string;
  FTagList  : TXmlTagList;
  FEndTag   : TXmlEndTag;
public
  destructor Destroy; override;
  procedure SearchAndReplace(const SearchStr, ReplaceStr : string;
                                                DoTags : Boolean = False); override;
  procedure Accept(Visitor : TxmlInterpreterVisitor); override;
  property StartTag : TXmlStartTag read FStartTag write FStartTag;
  property EndTag   : TXmlEndTag read FEndTag write FEndTag;
  property Data     : string read FData write FData;
  property TagList  : TXmlTagList read FTagList write FTagList;
end;
 
TXmlTagList = class(TXmlExpression)
private
  FList : TObjectList;   
  function GetItem(Index : Integer) : TXmlNode;
protected
public
  constructor Create;
  destructor  Destroy; override;
  function Add : TXmlNode;
  procedure SearchAndReplace(const SearchStr,ReplaceStr : string;
                                                DoTags : Boolean = False); override;
  procedure Accept(Visitor : TxmlInterpreterVisitor); override;
  property Items[Index : Integer] : TXmlNode read GetItem; default;
end;

TXmlProlog = class(TXmlExpression)
private
  FData : string;
protected
public
  procedure SearchAndReplace(const SearchStr,ReplaceStr : string;
                                                DoTags : Boolean = False); override;
  procedure Accept(Visitor : TxmlInterpreterVisitor); override;
  property Data : string read FData write FData;
end;

TXmlDoc = class(TXmlExpression)
private
  FProlog  : TXmlProlog;
  FTagList : TXmlTagList;
protected
public
  destructor Destroy; override;
  procedure Clear;
  procedure SearchAndReplace(const SearchStr,ReplaceStr : string;
                                                DoTags : Boolean = False); override;
  procedure Accept(Visitor : TxmlInterpreterVisitor); override;
  property Prolog  : TXmlProlog read FProlog write FProlog;
  property TagList : TXmlTagList read FTagList write FTagList;
end;

// Equates to Client in the Interpreter pattern
TXmlInterpreter = class(TObject)
private
  FXmlDoc : TXmlDoc;
protected
public
  constructor Create;
  destructor  Destroy; override;
  property XmlDoc : TXmlDoc read FXmlDoc write FXmlDoc;
end;
 
EXmlInterpreterError = class(Exception);

Note how the class definitions follow the grammar. The only variation is TXmlTagList which includes a function to add new nodes to the list. Oh, and TXmlDoc has a method to allow us to clear the syntax tree. Note that any lists we define are of type TObjectList, and they are constructed such that they will free the items in the list when they themselves are destroyed.

If we have a look at a couple of examples of the SearchAndReplace method, we will see the power of this pattern. Here is the version in TXmlDoc:

procedure TXmlDoc.SearchAndReplace(const SearchStr, ReplaceStr : string;
                                                                  DoTags : Boolean);
begin
  if Assigned(Prolog) then begin
    Prolog.SearchAndReplace(SearchStr,ReplaceStr,DoTags);
  end;
 
  if Assigned(TagList) then begin
    TagList.SearchAndReplace(SearchStr,ReplaceStr,DoTags);
  end;
end;

All we do is call the same method on the elements that make up this expression, that is, the Prolog and TagList properties. In this case, since they are optional, we check if they're assigned first. This is the case in the other classes whenever there is a non-terminal expression. Whenever there is a terminal expression, we can actually perform the operation. For instance, here is the end tag method:

procedure TXmlEndTag.SearchAndReplace(const SearchStr, ReplaceStr : string;
                                                                  DoTags : Boolean);
begin
  if not DoTags then begin
    Exit;
  end;
  TagName := DoSearchAndReplace(TagName,SearchStr,ReplaceStr);
end;

The DoSearchAndReplace method is declared in the base class, as it is used in several places.

There is one last participant that you may sometimes need, which is the context. This holds any global information that the interpreter needs. We haven't got any, so there isn't one in the example. If you do have one, it is normally passed as a parameter to the interpret operation methods.

And that's it. Just define classes for each expression in the grammar, which have properties corresponding to the sub-expressions. To implement an interpret operation define an abstract method in the base expression class. This forces descendant classes to implement it. In each implementation, call the method on all the sub-expression properties. For more complex operations than our example, there may have to be work done in some of the methods as well. And although we didn't need it, sometimes it's useful to have a reference to the parent node.

You can see that extending the grammar is pretty easy; you just need new classes, each of which is very similar, so the implementation is quite easy. At least, this is true until the number of classes gets too high – you would have hundreds in a Delphi interpreter, for instance. There are a number of issues about how to handle child operations, too, that Interpreter has in common with Composite, but we won't go into them here.

Also, if you need to add several ways to interpret the syntax tree, then you will find that the code is practically identical for each. In that case, it's time to use a different pattern.

Code examples