2014/10/21

Scripting, Part 3: Implementing the Interpreter

"Get coding already!" I hear you scream, and it is finally time to comply. For now we'll look at the underlying algorithm in detail, rather than the interpreter's interface. I'll trust you with writing that up yourself. All code shown here is written in C# for .NET framework 4.5, though you should be able to follow along if you know any other C-like language.

The byte code pattern

This pattern is the meat of our interpreter. The idea is pretty simple and essentially how CPUs work (although it's implemented in hardware there). Instructions are stored in a simple binary format and are processed one after another by simply going through the binary code:
 
 byte[] byteCode;  
 int programCounter;

 public void NextCommand()  
 {  
   switch (byteCode[programCounter++]) {  
     case 0x00:  
     // Implement execution logic for command 0x00  
     break;  
     // Implement other commands...  
 }  

This is probably the most straightforward implementation you're going to get away with. It would probably really only work for very simple languages for which you want to implement the Interpreter interface from scratch. Besides, subclassing this to implement the switch statement yourself is not exactly ideal either, so this won't work for our BaseInterpreter.

Parsing byte code

Reading arguments of any type other than byte or char from a byte array is also a pain, since you have to be aware of big and little endian formats and the specific amount of bytes taken by a type. Plus, you have to keep track of the program counter with every single access to the array. Forget it once and you have a hard-to-find, breaking bug.

The BinaryReader class from System.IO, which you probably already use to load the byte code in the first place, could take care of that. But if you were to continue using it, you'd have to keep open the FileStream, which would probably slow done every access to the code.

Enter the MemoryStream. After you load the byte code from a file, store it in a byte array and pass that to the constructor of MemoryStream, which you then pass to the constructor of BinaryReader. You now have a BinaryReader with all its features available for reading the byte code, which makes things a lot easier.

You no longer have direct access to the program counter though, which will probably become a nuisance, possibly a major problem (e.g. when you need to keep track of access to it, which you would otherwise do with a property). My solution:

 public BinaryReader DataBus { get; private set; }  

 public int ProgramCounter  
 {  
   get { return (int) DataBus.BaseStream.Position; }  
   set { DataBus.BaseStream.Position = value; }  
 }

It is not perfect - it's still possible to access the PC without going through the property (assuming it's possible to access the BinaryReader) - but works well enough. I actually ended up exploiting this in my own debugger code, so I'm not sure whether that's good or bad.

The instruction set

As I mentioned earlier, reading an opcode and then performing a switch statement on that does not really lend itself to a template method, or any other inheritance-based pattern I know of.

So instead of making it work, we ditch it and look at alternatives. We will definetely need to read an opcode before we can determine what function to call, so we might as well make the most of it by directly mapping the number to a method. The simplest way to map numbers to objects is initializing an array, and the simplest object to represent a method is a delegate. We don't even have to define a delegate type for this, since the .NET framework already provides the Action type.
 
 BinaryReader DataBus;
 Action[] InstructionSet;

 public void NextCommand()  
 {  
   byte opcode = DataBus.ReadByte();
   InstructionSet[opcode]();
 }  

Other than initialization and a class header (and the ProgramCounter property), this is basically all we need for our BaseInterpreter. Initialization of InstructionSet will be deferred to the concrete interpreter class, and DecompileCommand() basically works the same, except it uses an array of type Func<string> instead of Action.

No comments:

Post a Comment