2014/10/26

Scripting, Part 7: The Perks of Building a Compiler

Now that we have designed a language and built a full interpreter around it, you might think that it's time to build a compiler for it. Instead, we'll discuss why this may be a bad idea and what alternatives there are.

The inner workings

To get a grip on what's bad about coding a compiler, one should know what actually needs to be coded. The front end of a modern compiler for a high-level language consists of two main parts: scanning and parsing.
Scanning refers to grouping the source into tokens and identifying their type, and parsing refers to analysing the tokens' syntactical structure. This is some complex stuff, and both steps can be terrifying to implement - there's a reason you can major in compiler construction, and I'm not going to pretend that I can cover that with a blog post or three.
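For a rough idea of what the scanning half looks like in the simplest possible case, here is a small Python sketch. The token types (`STRING`, `NUMBER`, `LABEL`, `IDENT`, `COLON`) and their patterns are assumptions made up for this example, not the grammar of any real language, and a real scanner has to deal with a lot more (escapes, multi-line input, error recovery).

```python
import re

# Hypothetical token types for a tiny script language (assumed for this sketch).
TOKEN_SPEC = [
    ("STRING", r'"[^"\n]*"'),
    ("NUMBER", r"\d+"),
    ("LABEL", r"\.[A-Za-z_]\w*"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("COLON", r":"),
    ("SKIP", r"[ \t]+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def scan(line):
    """Group one line of source into (type, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(line):
        match = TOKEN_RE.match(line, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {line[pos]!r} at column {pos}")
        # Whitespace separates tokens but is not a token itself.
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Note that this only identifies tokens. Deciding whether a sequence of tokens forms a valid statement is the parser's job, and that is where most of the complexity lives.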

What I implemented is technically an assembler, i.e. instructions can be compiled independently, since there is no overarching syntax; each instruction in code equals exactly one instruction in byte code.
So far, my assembler consists of these basic steps:
  • Format the code for easier parsing (remove unneeded whitespace, comments),
  • cut it into instructions, which are then cut into key word + arguments,
  • look up command definitions by key word,
  • compile the instruction according to definition.
I also added support for macros by adding a fallback to the command lookup: When no command definition is found, see if there's a macro defined, and recursively call the compile function on the macro code. By using format strings, you can easily pass parameters to the macros, too.
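To make these steps a bit more concrete, here is a minimal Python sketch of such an assembler, including the macro fallback. Everything specific in it is an assumption for illustration: the command table and its opcodes, the `#` comment marker, the one-instruction-per-line rule, the integer-only arguments and the `Pair` macro are all made up and are not the definitions used in this series.

```python
# Hypothetical command table: key word -> (opcode, number of integer arguments).
COMMANDS = {
    "Action": (0x01, 1),
    "Sequence": (0x02, 1),
    "Selector": (0x03, 1),
    "Succeeder": (0x04, 0),
    "End": (0xFF, 0),
}

# Hypothetical macros: key word -> format string that expands to more source code.
MACROS = {
    "Pair": "Action {0}\nAction {1}\nSequence 2",
}

COMMENT = "#"  # assumed comment marker


def assemble(source):
    """Compile source text into a flat list of byte code values."""
    code = []
    for raw_line in source.splitlines():
        # Step 1: strip comments and surrounding whitespace.
        line = raw_line.split(COMMENT, 1)[0].strip()
        if not line:
            continue
        # Step 2: one instruction per line, cut into key word + arguments.
        keyword, *args = line.split()
        # Step 3: look up the command definition by key word.
        if keyword in COMMANDS:
            opcode, arg_count = COMMANDS[keyword]
            if len(args) != arg_count:
                raise SyntaxError(f"{keyword} expects {arg_count} argument(s): {line!r}")
            # Step 4: compile the instruction according to its definition.
            code.append(opcode)
            code.extend(int(arg) for arg in args)
        elif keyword in MACROS:
            # Fallback: fill in the macro's parameters and compile it recursively.
            code.extend(assemble(MACROS[keyword].format(*args)))
        else:
            raise SyntaxError(f"unknown key word: {keyword!r}")
    return code
```

With these made-up tables, `assemble("Pair 2 3\nEnd")` expands the macro into two actions and a sequence before compiling `End`. A macro that refers to itself would recurse forever here, so a real implementation would want a depth limit.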

Finally, there are several features you may want to add, depending on the language. For control flow, you'll need labels to jump to; in most languages, symbols (i.e. constants) will be useful, and so on.
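Labels and symbols can both be handled with a simple two-pass approach: collect the names first, then substitute them. The Python sketch below assumes a made-up syntax in which a line like `.name:` defines a label, and it uses instruction indices as jump targets instead of real byte offsets; the instruction names in the usage example are hypothetical, too.

```python
def resolve_names(lines, symbols=None):
    """Replace label and symbol references with numbers in two passes."""
    names = dict(symbols or {})  # predefined symbols, i.e. constants
    instructions = []
    # Pass 1: remember the index of the instruction that follows each label.
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(".") and line.endswith(":"):
            label = line[:-1]
            if label in names:
                raise SyntaxError(f"duplicate name: {label}")
            names[label] = len(instructions)
        else:
            instructions.append(line.split())
    # Pass 2: substitute every argument that names a label or symbol.
    return [
        [parts[0]] + [str(names.get(arg, arg)) for arg in parts[1:]]
        for parts in instructions
    ]
```

Because all labels are known before the second pass starts, jumping forward to a label that is defined further down works without any extra effort:

```python
script = ["wait DELAY", "jumpIfYes .yes", "say 1", "end", ".yes:", "say 2", "end"]
resolved = resolve_names(script, {"DELAY": 30})
# resolved[0] is ["wait", "30"], resolved[1] is ["jumpIfYes", "4"]
```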

Assembly language

Coding an assembler may be easier, but it also forces users to write in an equivalent of assembly. This has the opposite effect of what scripting should be - you want to code on a higher level, not a lower one.
Sure, our instructions are based around higher level concepts, but in the end, you're still compiling code like this:
 Action 1  
 Action 2  
 Action 3  
 Action 4  
 Succeeder  
 Sequence 3  
 Action 5  
 Selector 3  
 End  
And that's for describing a tree!

To be fair, some use cases are more suitable for this type of language. This is a conceivable NPC script in an arbitrary language:
 facePlayer  
 showMsg "Hello World! Do you like assembly?"  
 showYesNoBox  
 if YES goto .yesAnswer  
 showMsg "Too bad..."  
 end  
   
 .yesAnswer:  
 showMsg "Me too!"  
 end  
And don't run away now because I'm using goto; it really isn't as bad as people make it out to be, especially in short scripts like this.

Alternatives to code

The point still stands, though: designing a language and building a compiler for it is not the optimal path in a lot of cases. But you need to produce the code somehow, right? Well, I can think of three alternatives:

Dedicated editors or parsers
Suppose we're still on the case of our behaviour tree markup language. Instead of people coding tree creation, it's much simpler for everyone (except maybe the tools programmer) to build a dedicated tool for visual construction of behaviour trees.
Even when a dedicated tool is not an option, parsing the XML or JSON output of existing tools is probably no more complicated than building an assembler, and the emerging tool chain is much more user friendly for everyone involved.
Assuming the tree structure already exists in memory, generating the above code is dead simple and can be done with a single recursive method.
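As a sketch of what that recursive method could look like, here it is in Python. The `Node` class is a hypothetical stand-in for whatever your tree looks like in memory, and the output rules are my reading of the listing further up: children are emitted before their parent, composites carry their child count, `Succeeder` takes no argument, and `End` closes the script.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical in-memory behaviour tree node (assumed for this sketch)."""
    kind: str                          # "Action", "Sequence", "Selector", "Succeeder"
    action_id: Optional[int] = None    # only used by Action leaves
    children: List["Node"] = field(default_factory=list)


def emit(node, lines):
    """Append the instructions for node and its subtree to lines, children first."""
    for child in node.children:
        emit(child, lines)
    if node.kind == "Action":
        lines.append(f"Action {node.action_id}")
    elif node.kind == "Succeeder":
        lines.append("Succeeder")
    else:
        # Composite nodes tell the loader how many children to take.
        lines.append(f"{node.kind} {len(node.children)}")


def generate(root):
    """Return the script text for a whole tree."""
    lines = []
    emit(root, lines)
    lines.append("End")
    return "\n".join(lines)
```

Under those assumptions, the following tree produces exactly the nine-line listing shown earlier:

```python
tree = Node("Selector", children=[
    Node("Action", 1),
    Node("Sequence", children=[
        Node("Action", 2),
        Node("Action", 3),
        Node("Succeeder", children=[Node("Action", 4)]),
    ]),
    Node("Action", 5),
])
print(generate(tree))
```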

Dedicated GUI script editor
Possibly the easiest option, if also the most code-heavy, is a button-based script editor similar to the one used by RPG Maker:


This lends itself really well to the MVC pattern, too. When you're operating on a list of command objects, created by the respective buttons / dialogs, compiling becomes a matter of looping through that list, syntax highlighting can be done on object creation, and syntax errors become virtually impossible.
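Here is a minimal Python sketch of the model side of such an editor, to show how little is left of "compiling" once the script is a list of objects. The command classes, their opcodes and the byte layout are invented for this example; the GUI that would create these objects is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class FacePlayer:
    def compile(self):
        return bytes([0x11])  # hypothetical opcode


@dataclass
class ShowMessage:
    text: str

    def compile(self):
        data = self.text.encode("utf-8")
        if len(data) > 255:
            # The editor dialog could enforce this before the object even exists.
            raise ValueError("message too long for a one-byte length prefix")
        return bytes([0x10, len(data)]) + data  # opcode, length, text


@dataclass
class End:
    def compile(self):
        return bytes([0xFF])  # hypothetical opcode


def compile_script(commands):
    """Compiling is just looping through the list of command objects."""
    return b"".join(command.compile() for command in commands)
```

A script built by the editor's buttons would then be compiled with something like `compile_script([FacePlayer(), ShowMessage("Hello World!"), End()])`. Since every object is created by a dialog that validates its input, there is no text to get wrong in the first place.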
If you're not comfortable in this kind of environment, imagine the same principle in a console. Now translate the console to a single text box, while the actual code appears in the same kind of list as shown above. Add some functionality to the up and down arrow keys and some shortcuts and you're good to go.

Flow charts
If none of that is for you, see if you can get your hands on a good tool for flow charts and its output format. Parse and compile it right and you essentially get Unreal's Blueprints. Scripting with flow charts is a very visual, intuitive programming method and allows most people to get the gist of it all quite quickly.
It is also the most difficult option to get working, let alone working right, and I haven't really done anything like it myself, so I will refrain from commenting on it further.

And that's it for now! If you still want to get a compiler going, the next three posts will deal with just that, and that will be the conclusion of this series. If not, I guess this journey ends here for you. In that case, I hope I had something useful to say to you, and wish you all happy coding, or whatever programmers wish each other.
