State Design Pattern in Delphi

This session consists of the development of a small application to read and pretty-print XML and CSV files. Along the way, we explain and demonstrate the use of the following patterns: State, Interpreter, Visitor, Strategy, Command, Memento, and Facade.

Allow an object to alter its behaviour when its internal state changes. The object will appear to change its class

The State pattern is used when an object's behaviour changes at run-time depending on its state. Indicators of the potential for using the pattern are long case statements or lists of conditional statements (the Switch Statements "bad smell", to use refactoring parlance). In Delphi (as in most languages) a given object cannot actually change its class, so we have to use other schemes to mimic that behaviour, as we shall see.

The participants in an implementation are the context and the states. The context is the interface presented to clients of the subsystem being modelled by the State pattern. In our case this will be the TCsvParser class. Clients will never see the states, allowing us to change them at will. The only interface client subsystems are interested in is extracting the fields from a line of text.

We do this by using a finite state machine (FSM). Essentially, an FSM is a model of a set of states. From each state, particular inputs can cause transitions to other states. There are two sorts of special states. The Start state is the state the FSM is in before beginning work. End states are those where the processing finishes, and are usually denoted by double circles. The FSM for the parser is shown below:

In the State pattern, each of the states becomes a subclass of the base state class. Each subclass must implement the abstract method ProcessChar which handles the input character and decides on the next state.

Implementation

The interface section source code for the State pattern code to parse CSV files is:

unit CsvParser;

interface
uses Classes;

type

  TCsvParser = class;  // Forward declaration
  TParserStateClass = class of TCsvParserState;

  TCsvParserState = class(TObject)
  private
    FParser : TCsvParser;

    procedure ChangeState(NewState : TParserStateClass);
    procedure AddCharToCurrField(Ch : Char);
    procedure AddCurrFieldToList;
  public
    constructor Create(AParser : TCsvParser);
    procedure ProcessChar(Ch : AnsiChar;Pos : Integer); virtual; abstract;
  end;
 
  TCsvParserFieldStartState = class(TCsvParserState)
  private
  public
    procedure ProcessChar(Ch : AnsiChar;Pos : Integer); override;
  end;
 
  TCsvParserScanFieldState = class(TCsvParserState)
  private
  public
    procedure ProcessChar(Ch : AnsiChar;Pos : Integer); override;
  end;
 
  TCsvParserScanQuotedState = class(TCsvParserState)
  private
  public
    procedure ProcessChar(Ch : AnsiChar;Pos : Integer); override;
  end;
 
  TCsvParserEndQuotedState = class(TCsvParserState)
  private
  public
    procedure ProcessChar(Ch : AnsiChar;Pos : Integer); override;
  end;
 
  TCsvParserGotErrorState = class(TCsvParserState)
  private
  public
    procedure ProcessChar(Ch : AnsiChar;Pos : Integer); override;
  end;

  TCsvParser = class(TObject)
  private
    FState           : TCsvParserState;
    // Cache state objects for greater performance
    FFieldStartState : TCsvParserFieldStartState;
    FScanFieldState  : TCsvParserScanFieldState;
    FScanQuotedState : TCsvParserScanQuotedState;
    FEndQuotedState  : TCsvParserEndQuotedState;
    FGotErrorState   : TCsvParserGotErrorState;
    // Fields used during parsing
    FCurrField       : string;
    FFieldList       : TStrings;

    function GetState : TParserStateClass;
    procedure SetState(const Value : TParserStateClass);
  protected
    procedure AddCharToCurrField(Ch : Char);
    procedure AddCurrFieldToList;
    property  State : TParserStateClass read GetState write SetState;
  public
    constructor Create;
    destructor  Destroy; override;

    procedure ExtractFields(const s : string;AFieldList : TStrings);
  published
  end;

If we examine the parser class first, we see that we have a private instance of each of the state subclasses. In our case, where we could be parsing very long files, and the state is changing frequently, it makes sense to create all the objects once, and keep track of the current state.

If you have a situation where you have very many states (which is when this pattern really starts making a difference), especially if they are only needed occasionally, then it makes more sense to create and free the states on the fly. This might be an opportunity to use the automatic garbage collection property of interfaces, but be careful not to mix class and interface access to the state objects. It might also be a time to consider the Flyweight pattern (I'm going to refer you to the GoF for that).

Note that we are keeping track of the state using the class of the current state object. We can use a protected property (an example of the Self Encapsulate Field refactoring, as it happens) to access the field. The parser class also keeps the current field and the list of extracted fields. The states will use the protected methods to update them.

The states can manage this because the parser is passed as a parameter in the constructor. It is quite common for state objects to need access to the context in which they are being used. The base abstract state class defines methods for changing state, and updating the parser. Descendant classes only need to implement the character processing routine.

Let's have a look at one of these routines, for the start state.

procedure TCsvParserFieldStartState.ProcessChar(Ch : AnsiChar;Pos : Integer);
begin
  case Ch of
    '"' : ChangeState(TCsvParserScanQuotedState);
    ',' : AddCurrFieldToList;
  else
    AddCharToCurrField(Ch);
    ChangeState(TCsvParserScanFieldState);
  end;
end;

If we get a double quote, then the FSM goes into the Scan Quoted state, a comma means we have come to the end of the field, so we should add it to the list, and anything else means we are starting a new field.

However, in the Scan Quoted state shown below, the transition when we get a double quote is different. This is what we mean by the behaviour depending on the state.

procedure TCsvParserScanQuotedState.ProcessChar(Ch : AnsiChar;Pos : Integer);
begin
  if (Ch = '"') then begin
    ChangeState(TCsvParserEndQuotedState);
  end else begin
    AddCharToCurrField(Ch);
  end;
end;

The rest of the code is quite straightforward. The only slightly different state is the Error state, where we raise an exception. The parser has one long method, only because it has to handle validity checks, setting up, and so on. The essential lines of ExtractFields are:

  // Read through all the characters in the string
  for i := 1 to Length(s) do begin
    // Get the next character
    Ch := s[i];
    FState.ProcessChar(Ch,i);
  end;

This reads through the input line s, sending each character to the current state. Some sort of processing loop like this is not uncommon. I'll leave the rest of the code to go through at your leisure. It's all in CsvParser.pas.

Code examples