2015/07/26

The Game State Machine

This post is about lots of things: It's about words with so many meanings they become meaningless. It's about what code runs every frame in the game. It's about one reason I don't use Unity.
Ultimately, it boils down to flow control. I'm not talking about conditional jumps and loops, I'm talking about transitions between conceptually and logically distinct phases of the game.

Of Game States, Screens & Scenes

One of the problems in discussing this, like much else, is language. All of the terms above are somewhat overloaded, but the worst offender, I think, is game state. Most people you might ask will have some idea of what it means, but there are lots of ideas to be had:
  • Mathematicians and game designers might think of game state in terms of rules: every valid arrangement of chess pieces on a board is a game state of chess, and different actions by the players will transform the game state;
  • A computer scientist might think of video games as programs, so the game state should refer to the program state of the game, i.e. all the information stored in its memory;
  • Some programmers, including the XNA developers, seem to think of phases of the game's execution, where the game behaves differently.
You could model your engine after that first definition, but you would probably end up with an engine that can only deal with a small set of turn based games. Modeling the engine after the second definition doesn't really make sense: Managing the program state is the job of the program, no matter how you model it.
The third definition, I find, makes the most sense in the context of programming. Modeling your engine around this requires you to turn the game logic subsystem into a state machine, hence the name game state.

Looking at the Ruby source code of the RPG Maker engine, you'll find that it works just like that, only that it calls the states Scenes. Sounds familiar?
Unity is all about scenes, but they use a different definition, essentially equating them with sections of the game world. Unity is not modeled around game states at all: you use a scene to represent your title screen, another to represent your menu screen, one for World 1-1, another for World 1-2. It assumes that the conceptual difference between the world map and a level accessed through that map is the same as the difference between any two levels. I find this one-size-fits-all solution to be very annoying to work with, so I don't.
To get back on topic, Unity's definition of a scene is similar to that used in 3D modeling: A scene refers to a collection of objects and their relative positions. In fact, it's also similar to the definition used in theatre: There's a description of what's on stage (background, actors) and what happens (the script). If the actors or the background change you go to the next scene.

If you look at the terms I used here, title screen and menu screen among them, you might ask why I don't call states screens and what I would define a screen as. Among the three, it is for sure the least used term among developers, but it is the most common among players: they talk about inventory screens, battle screens, status screens, loading screens...
In my mind, screens are substates of game states, with some acknowledgment of difference in content as well. For example, in the inventory state, you might come across an item list screen, an item description screen, and, in some RPGs, a party screen ("Who do you want to use this item?").

Managing Game State

So if it's not clear already, I want my engine to be aware of game state. So now we need to make a plan as to how to actually implement it.
Just to give you an idea of two extremes I've seen in practice: the Pokémon games I used to mod literally had an integer variable and a jump table as state management. Compare that to the RPG Maker's SceneManager, which uses a fully object oriented model of states, and a stack to keep track of what states came before the current one.

According to automata theory, there's a couple of sensible models:
  • The simplest construct is the finite state machine. Therein, there is only the current state, and the next state is solely determined by the current state and the input it received. However, since we're working in a Turing complete language, the logic to determine the next state can be arbitrarily complex. If one of your states wants to return to the previous state, it can keep track of the previous state itself, while the manager still operates like an FSM.
  • A pushdown automaton has that logic baked in: Formally, it extends the finite state machine with a stack, shared by all states, that can be filled with arbitrary data. Transitions may depend on current state, input and the top of the stack. In practice, since we have RAM, which is shared by all states anyway, we'd really only push new states in and pop them off when we want to return to the previous one, considering the top of the stack as the current state.
  • In a hierarchical state machine, a state may define substates (the current substate being on top of the stack), which extend the logic of the superstate, and cascade down the stack. As in the linked example, you could implement that with inheritance as well.
We could go on like this for quite a while, all the way up to Turing machines, but I guess this is where I draw the line. We're already working in a Turing complete language, so why build a Turing complete state manager? Telling it what to do would be equivalent to just hacking in the feature you want, except you wouldn't be used to the semantics, and the code running in the background wouldn't be battle tested.
A finite state machine, unintuitively, suffers from the same problem: Its feature set is so minimalistic, you'd be hacking in just as much, except this time, you're doing it in a familiar language. Better, but still bad.
Both the pushdown automaton and the hierarchical state machine share a desirable property: they facilitate code reuse to simplify your code. We will use a stack based hierarchical state machine, as it is essentially a pushdown automaton with random access to the previous state; having a Previous property on a GameState is a very low cost compared to the additional power it provides.

Implementation Details

Now that we've decided on the theoretical construct behind our manager, we need to consider all the nitty gritty details; looking back at the two extremes, they don't only display different theoretical models, they also showcase different paradigms of programming.
Way back when I was writing the byte code interpreter, I used an array of delegates and a MemoryStream, the closest I could get to a jump table. This time, however, I feel like the object oriented approach is more viable; for one, state objects will bundle lots of methods together, and the interface is easier to enforce with an abstract class. More importantly, since the states hold some actual data they should manage, representing them as data structures (although delegates technically are data structures as well) just makes more sense intuitively.

So we start with two classes that might look like this. We'll talk about why I'm not passing around GameTime in a later blog post, but there are more pressing issues at hand.
First and foremost, we need the option to change the stack to turn this from a layer of indirection to an actual state machine. The basic stack operations are Push and Pop. In this context, they refer to entering a substate and returning to the superstate. To change substates of the same superstate, you'd first pop the current state and immediately push the new state. We'll also provide a method for that.
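A minimal sketch of the three stack operations described above, not the engine's actual code. The names Switch, Current and CountingState are assumptions made for illustration, and GameState is reduced to a single parameterless Update method.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, stripped-down base class: one method is enough to show the flow.
public abstract class GameState
{
    public abstract void Update();
}

public class GameStateMachine
{
    private readonly Stack<GameState> states = new Stack<GameState>();

    // The top of the stack is considered the current state.
    public GameState Current
    {
        get { return states.Count > 0 ? states.Peek() : null; }
    }

    // Enter a substate of the current state.
    public void Push(GameState state)
    {
        if (state == null) throw new ArgumentNullException("state");
        states.Push(state);
    }

    // Return to the superstate; throws InvalidOperationException if the stack is empty.
    public GameState Pop()
    {
        return states.Pop();
    }

    // Change substates of the same superstate: pop, then immediately push.
    // "Switch" is an assumed name for the third method.
    public void Switch(GameState state)
    {
        if (state == null) throw new ArgumentNullException("state");
        states.Pop();
        states.Push(state);
    }

    // Only the current state runs each frame.
    public void Update()
    {
        if (states.Count > 0) states.Peek().Update();
    }
}

// Example state used only to observe which state gets updated.
public class CountingState : GameState
{
    public int Updates;
    public override void Update() { Updates++; }
}
```

Pushing a map state and then a menu state, and calling Update once, should only tick the menu; popping the menu makes the map current again.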
But just deciding on these three methods doesn't answer all our questions. For instance, how should you refer to the game state you want to transition to?
You could just pass in the GameState instance you want and be done with it, but then we still need to talk about where to get that from; we can't make game states static classes because they need to inherit GameState, but we could make them singletons or provide them as properties of GameStateMachine. We could use a dictionary and refer to them with string handles or an enum.

Or we don't enforce any such limit and allow new instances to be pushed freely.
That has the advantage that, if you can come up with a scenario where the same state has to be present in the stack multiple times (I couldn't), that's no more difficult than any other. It also makes it easy to pass info into the state you want to push, since you are responsible for its construction.
The issue I see with it, though, is that then you can't very well initialize all states upfront, and constructing them on demand might be a bit too costly for fluid transitions. But, if we allow states to be initialized freely, we also allow them to be singletons or whatever else, so the states that need that optimization can get it individually.

So, if we can get info into the state we're pushing, how do we get it out when we return?
Well, if we provide random access to the previous state, there isn't much I could do to stop you from casting to specific substates and calling arbitrary functions, but I think that's backwards. A function doesn't care who called it, and returns whatever it wants. It's the caller's job to figure out what the output means, should there be uncertainties.
So, the substate provides its output to the GameStateMachine.ExitSubstate method, and the state machine makes sure the superstate gets it. One bug I anticipate with this would occur when you switch substates (i.e. pop and immediately push a new one) and the new substate exits with some output data the superstate didn't expect because it called a different substate. If it didn't expect any output, it'd probably discard it and be done with it. But if it expected a different kind of output, we'll get undefined behaviour, depending on how the state handles the input.
There's no easy fix for it either, since you might deliberately design your state transitions around exactly that (think polymorphism). The only solution I see for this problem is to be aware of it, and make sure the logging makes clear what happens, in case it ever happens accidentally.
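One way the output hand-off could look, as a sketch under assumptions: the post only names GameStateMachine.ExitSubstate, so passing the output as an object to a Return(object) method, the Log delegate, and the two example states are all hypothetical. The superstate checks the type of what it receives instead of casting blindly, which is one way to live with the mismatch described above.

```csharp
using System;
using System.Collections.Generic;

public abstract class GameState
{
    // Called on the superstate after its substate exited. The output may be
    // null or of an unexpected type, so implementations should check it.
    public virtual void Return(object output) { }
}

public class GameStateMachine
{
    private readonly Stack<GameState> states = new Stack<GameState>();

    // Hypothetical logging hook, standing in for whatever tracer the engine uses.
    public Action<string> Log = delegate { };

    public void Push(GameState state)
    {
        states.Push(state);
    }

    // The substate hands over its output; the machine delivers it to the superstate.
    public void ExitSubstate(object output)
    {
        GameState exited = states.Pop();
        if (states.Count == 0) return; // nothing to return to

        GameState superstate = states.Peek();
        Log(exited.GetType().Name + " -> " + superstate.GetType().Name + ": "
            + (output == null ? "no output" : output.GetType().Name));
        superstate.Return(output);
    }
}

// Example superstate that expects an item index from its substate.
public class InventoryState : GameState
{
    public int? ChosenItem;

    public override void Return(object output)
    {
        // Anything that is not an int is ignored rather than cast blindly.
        if (output is int) ChosenItem = (int)output;
    }
}

// Example substate; it would call ExitSubstate with the chosen index.
public class ItemListState : GameState { }
```

With the log line in place, an accidental mismatch at least shows up as something like "ItemListState -> InventoryState: String" instead of failing silently.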

Finishing Touches

With the elephant out of the room, what do we need to do to wrap this up?
  • States need to have access to their superstate. We will add the public GameState Previous property I promised earlier and have the GameStateMachine set it when the state is pushed.
  • States should be notified when they are entered, so we will add two empty virtual methods Enter and Return to the GameState class. The GameStateMachine will call these after a state has been pushed or its substate popped, respectively.
  • Some states may want to override input handling but delegate update logic to the superstate. To allow this scenario in particular, we'll add GameState.HandleInput, which will be called before GameState.Update in the state machine's own Update method.
  • Both classes still lack initialization code, and the GameStateMachine has to be set up in the MainGame.
Those should be some pretty simple touch-ups to do on your own, in case anyone is actually following along here. This post is long enough as it is, and I don't want to split it so we can move along a little more quickly from now on.
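For anyone following along, the touch-ups listed above might be sketched like this. It is an illustration under assumptions, not the engine's code: MapState and DialogState are made-up example states, initialization in MainGame is left out, and Previous uses an internal setter so that only the state machine's assembly can assign it.

```csharp
using System;
using System.Collections.Generic;

public abstract class GameState
{
    // Set by the state machine when this state is pushed.
    public GameState Previous { get; internal set; }

    public virtual void Enter() { }       // called after this state was pushed
    public virtual void Return() { }      // called after a substate was popped
    public virtual void HandleInput() { } // called before Update
    public virtual void Update() { }
}

public class GameStateMachine
{
    private readonly Stack<GameState> states = new Stack<GameState>();

    public void Push(GameState state)
    {
        state.Previous = states.Count > 0 ? states.Peek() : null;
        states.Push(state);
        state.Enter();
    }

    public void Pop()
    {
        states.Pop();
        if (states.Count > 0) states.Peek().Return();
    }

    public void Update()
    {
        if (states.Count == 0) return;
        GameState current = states.Peek();
        current.HandleInput();
        current.Update();
    }
}

// Example superstate: counts its updates and notices when it is returned to.
public class MapState : GameState
{
    public int Ticks;
    public bool Returned;
    public override void Update() { Ticks++; }
    public override void Return() { Returned = true; }
}

// Example substate: overrides input handling, delegates update logic upward.
public class DialogState : GameState
{
    public bool InputHandled;
    public override void HandleInput() { InputHandled = true; }
    public override void Update()
    {
        if (Previous != null) Previous.Update();
    }
}
```

Note that the superstate's own HandleInput is never called while the dialog is on top, which is the scenario the HandleInput/Update split is meant to allow.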

2015/07/21

Content Management: Assets and Localization

Okay, so this will be the last post in this series about Content Management. I'm sick of it, you're bored of it, and we should Keep It Stupidly Simple anyway, so here's what's gonna happen:

Partial unloading of assets will happen, but we'll take an unusually optimistic approach, with just two ContentManagers; one that only gets unloaded at the end of the game holding global content, another that is used to load level-specific assets, which gets unloaded whenever there's both a fade to black and a map change. Makes seamless map changing less painful, and we can still do optimizations later. Or ask uncle Bill for help.
Parallel loading, if I decide to do it at all, we can talk about when we talk about parallelization in general. Until then, premature optimization and all that. Further research suggests that the ContentManager isn't thread safe anyway. I've seen people do it before, might just be more trouble than it's worth.

Localization

is the only thing left on my list then. I was first introduced to the troubles of localization at a summer job where I was programming in Java with another intern. We were tasked with building an installer, which would parse an XML file, build a MySQL database according to it, upload some images into BLOBs in that database and then run all SQL scripts (which we had to fix by hand, because most contained faulty syntax...) inside another directory.
After coding all that in surprisingly little time, including a half-decent, installer-like GUI (in fucking AWT, I might add), we were told, "Great, now translate it into German and give our translator a file he can translate into Polish!".

Luckily for us, we were working in Eclipse and quickly came across String Externalization. The concept is simple: Eclipse parses all your code files and replaces constant strings with calls into a static subclass of ResourceBundle, which, upon startup, is loaded from a *.properties file. Syntactically, it's roughly equal to a *.ini file, so really simple stuff.
These calls would look something like RESOURCES.getString("main_window.title"), and internally, the ResourceBundle would be implemented as a hash map.

We'll do it similarly, with these changes:
  • Our TextBank class won't be static, so you can instantiate different text banks and load only those you actually need at any one time.
  • We will use an indexer instead of a method. Everybody who knows what a TextBank does should be able to deduce what is meant. If they don't know, calling the method GetString won't help much - if they know the language, they can tell that a string is returned already.
  • We won't use a hash map with string keys, but a plain string[], with integer keys. This is slightly more performant, but the main motivation here is event scripting; a script can refer to a specific text without the interpreter having to learn about string literals.
The resulting class is pretty simple. The XML schema for custom content formats is as well; I used this for testing. Thing is... getting MonoGame to load the damn thing proved a bit of a challenge.

Project set up, revisited

The problem I was running into was a compile time error message: MGCB, the MonoGame Content Builder, couldn't resolve the type TextBank. Which is weird, because the Content.mgcb file was in the same project - by default - and should know types from the corresponding assembly, right?
Wrong. You need to manually add a reference to the project for it to build those types properly. However, you can't just go into the references property in MGCB and add in the assembly name (which will just result in an additional error, telling you that it can't find an assembly with that name), nor the file name (which gets you the same error), nor the full path to the built assembly (which will tell you that the assembly appears to be corrupted or incompatible - probably because it's not finished building because the content isn't built yet).
Copying the built files into the content directory won't work either, again because the assembly is supposedly corrupted/incompatible. So, google to the rescue. Except I didn't find anyone with the same issue (in MonoGame, at least. Somebody managed to fuck it up in XNA. Their solution wasn't applicable to MonoGame). So I looked through the issues on github, SOMEONE must have tried to build custom XML content before, right?
Right. Sort of. At least they were pointed into the right wrong direction. The developer in that thread claims that the error stems from problems with dependencies and the build order. The proposed workaround is similar to what the XNA project templates had you doing, with an additional content assembly containing the *.mgcb file. Don't worry, you don't actually have to do this. I did, however, spend quite a while experimenting with that, fighting nonsensical error messages. In the end, the solution was changing the target platform of my assembly from x86 to Any CPU.

Fixing the TextBank class

The next build error was an easy fix: TextBank lacks a parameterless constructor, necessary to deserialize by reflection. Just add one that nulls the texts field.
Then, another build error, caused by an XmlException in MGCB. This was a bit confusing, because most google results suggested the XML was invalid. In fact, all but Shawn Hargreaves' awesome blog, which reminded me that the content serializer, by default, ignores private members, meaning it didn't expect any child nodes to the <Asset> node. Adding the [ContentSerializer] attribute to the texts field actually let me build the content project.
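Putting the pieces from the last two sections together, a TextBank along these lines would fit the description. This is a sketch, not the original class: the second constructor and the Count property are assumptions, and the [ContentSerializer] attribute is only mentioned in a comment so that the example compiles without MonoGame.

```csharp
using System;

public class TextBank
{
    // In the MonoGame project this private field carries [ContentSerializer],
    // because the content serializer ignores private members by default.
    private string[] texts;

    // Parameterless constructor, needed for deserialization by reflection.
    public TextBank()
    {
        texts = null;
    }

    // Convenience constructor for tests and tools (an assumption).
    public TextBank(string[] texts)
    {
        this.texts = texts;
    }

    // Number of texts in the bank; 0 while nothing has been loaded.
    public int Count
    {
        get { return texts == null ? 0 : texts.Length; }
    }

    // Integer keys instead of string handles, so a script can refer
    // to a text without needing string literals.
    public string this[int index]
    {
        get { return texts[index]; }
    }
}
```

A call site then reads like bank[2] rather than bank.GetString("some.key").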

A quick test run reveals that this actually works so far. So we're done here, right?

Introducing: The Localization Tool

Wrong again.
If you can't tell from my test XML, there isn't really any way to tell which text you're editing. There isn't even an indicator for the integer key of the text. Once there's more than five or six strings in the bank, you'll be counting to make sure you have the right one; once there's more than a couple dozen, this becomes infeasible.
I'll be doing some tools programming anyway, so I might as well start easy with this one. The bare minimum should be a table with the index and an editable text field, as well as an Add String button. There's other features you could want for this tool, but we'll see how far down the rabbit hole I'll go.

Even excluding tooling, there are some other localization details that need to be figured out: How do we initially determine what language to use (Use English and hope for the best? Check the system locale on first launch?), how do we elegantly determine the asset names of the relevant TextBank XNBs?
This is a matter of taste, among other things, but what I'll probably end up doing is using an extension method ContentManager.LoadTextBank() which appends a language identifier given in the configuration file to the asset name. Alternatively a static method GetLocalizedAssetName() on TextBank, which does the same, except you still need to pass the returned string into ContentManager.Load().
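The second variant is easy to sketch without any MonoGame types. Everything specific here is an assumption: the class name LocalizedAssets, the "." separator, taking the language identifier as a parameter instead of reading it from the configuration file, and falling back to the plain asset name when no language is set.

```csharp
using System;

public static class LocalizedAssets
{
    // Appends a language identifier to an asset name, e.g.
    // ("Text/Intro", "de") becomes "Text/Intro.de".
    public static string GetLocalizedAssetName(string assetName, string language)
    {
        if (assetName == null) throw new ArgumentNullException("assetName");

        // Assumed fallback: without a language, use the asset name unchanged.
        if (string.IsNullOrEmpty(language)) return assetName;

        return assetName + "." + language;
    }
}
```

The returned string would then be passed into ContentManager.Load() by the caller; the extension method variant would simply do both steps in one call.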

Anyway, that should wrap up this series nicely. Next, we'll get into high level flow control in the game: game state management, and what that even means.

2015/07/12

Content Management: Save Game Serialization

As promised, I looked into Xna.Framework.Storage, and now I know why no one ever seems to bother with it: The API is messy, it revolves around Gamer Services (which aren't exactly supported on Windows) and the Xbox 360 UI. Also, it still leaves the serialization up to you, so we'll just do that and build our own, more lightweight API around it.

Save Game Data

Before we start serializing stuff, we need to consider what we need to save. I can safely say now that this game will try to be to 16 bit RPGs as Shovel Knight was to 8 bit platformers, so let's turn to those for help.
We will need to store the player's position in the world, the state of the party and their inventory. Until the map, combat and inventory systems are coded, however, we don't know how they should be stored specifically. More importantly, anyway, is story progression, and since I already have a pretty good idea how the scripting system should work, I can already tell that we will need, at minimum, an array of ints, used as persistent variables in scripts. I'll also add an array of bools, serving as simple flags, to keep track of which events have been triggered already.

To serialize our data, we then need to apply the [Serializable] attribute to our class. But we're not quite done: When saving the game in the background, i.e. in another thread, we might want to change it in the main thread, so we'll clone our SaveGameData before writing it.
At this point, the class looks like this. Nothing spectacularly ingenious, but this is just the data representation. Not that the serialization itself will be spectacularly ingenious either.
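As a rough sketch of such a data class (field names, the constructor and the array sizes are assumptions, and position, party and inventory are left out as in the text):

```csharp
using System;

[Serializable]
public class SaveGameData
{
    // Persistent variables used by scripts.
    public int[] Variables;

    // Simple flags tracking which events have been triggered already.
    public bool[] Flags;

    public SaveGameData(int variableCount, int flagCount)
    {
        Variables = new int[variableCount];
        Flags = new bool[flagCount];
    }

    // Deep copy: the arrays are duplicated, so a background save can write
    // the snapshot while the main thread keeps changing the original.
    public SaveGameData Clone()
    {
        var copy = new SaveGameData(Variables.Length, Flags.Length);
        Array.Copy(Variables, copy.Variables, Variables.Length);
        Array.Copy(Flags, copy.Flags, Flags.Length);
        return copy;
    }
}
```

The important detail is that Clone copies the arrays themselves; a shallow copy would share them with the live data and defeat the purpose.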

Save File Manager

Object Serialization in .NET is a well documented, exceedingly simple process: You'll need a formatter, a stream, and a [Serializable] class; open the stream, use the formatter to (de)serialize and close the stream. You can keep the formatter around for later use.

The problem is that all three steps are potential failure points. Luckily for us - sort of -, there aren't very many recovery options: essentially, all of them amount to: tell the player that the read/write could not be completed.
However, we don't have the facilities to do that available in our SaveGameData class, nor should we make them so: Informing the user of the failure should be the job of the UI, so we won't bother catching any exceptions in our serialization code.
Still, we'll want to log the reading and writing process, so we can get an idea of what caused the failure. To that end, we make use of the TracerCollection again. Since I don't want to chuck around a TraceSource when cloning the SaveGameData, and the formatter and file streams shouldn't concern the SaveGameData, I'll make a class SaveFileManager that does all that for me.

The source code for that class can be found here. You'll notice that this class doesn't concern itself with threading. This is because I will probably write a job system for fire-and-forget operations like this, as well as purely functional data crunching (e.g. path finding); I've already tested it in that kind of context, and didn't find any modifications necessary.

2015/07/05

Content Management: Integrating System.Configuration

The .NET Framework comes with a library dedicated to application configuration, in the System.Configuration assembly. Its primary use, as intended by Microsoft, seems to be application level, design time configuration of the framework, but dynamic reading and writing, and user level configuration are also supported.
Using it is pretty simple: You create a file called App.config in your project and Visual Studio will recognize it, and copy and rename it appropriately at build time. The file itself is plain XML. Like most XML schemes, it is kind of verbose, annoying to write, and hard to read.
Similarly, the code to manually read it is a bit confusing, and you need to jump through hoops to make lasting changes to the config file. So our mission this time is to create a layer of abstraction above System.Configuration to alleviate some of these weaknesses. Since System.Configuration is not fully implemented in the Mono Framework, this might also come in handy if there is porting work to be done.

The abstracted API

For a given executable *.exe there is exactly one *.exe.config file, so there is very little sense in instantiating some class for this: multiple instances would just step on each other's toes. Using a static class is also more convenient, since no instances have to be passed around. It's easy to shoot yourself in the foot with global state (which is why I avoid static classes most of the time), but I think this is a pretty good use case for it.
We want to abstract away the complexities of loading the config file in the beginning of the program's life time, as well as the saving after reconfiguration by the user. So we will make two empty methods Load() and Save().
Finally, we don't want our game to crash because of an invalid or incomplete config file. We want to access config data in a type safe manner with default values to fall back on. To do that, we want clearly defined properties or methods on this class, rather than a key/value system.

Getting and setting values

There's at least four different ways to get at the actual values: There's the static ConfigurationManager, which you can use to make Configuration objects. You can use either of those to get string values or to deserialize classes derived from ConfigurationSection.
Sections are type safe and Configuration objects allow you to save changes back into the config file, so we will use that approach. There isn't a whole lot of code yet, so there isn't a lot of configuration data I could define, but I created a class DiagnosticsConfigurationSection : ConfigurationSection, and added a public property of that type to my static class. Note that this class can't be an inner type, for some reason. To make it serialize properly, you also have to add the [ConfigurationProperty] attribute to the properties you want it to serialize. This is also where you define default values and such.

Loading and saving App.config

Loading it isn't really difficult: Just create a private Configuration instance, which we will initialize in the Load() method, and route all your section properties to its GetSection() methods.
Writing it back is a bit tricky, however, and there seems to be this misconception that it isn't even supported. Makes you wonder why Configuration.Save() exists. The root of this rumor, I suspect, is an issue with the debugger. See, what's called App.config in your project will actually be called *.config after build, where the wild card is replaced by the file name of the executable.
If you open up your debug build in the explorer, you'll notice that there are two executables (*.exe and *.vshost.exe), each with its own config file.
This is because Visual Studio has a hosting process running in the background, which, to my understanding, dynamically loads your executable for optimization purposes in the debugger. So, when you naively call Save() on your Configuration object, you will write to the right config file, but the wrong one is loaded on start up. Or vice-versa, depending on your point of view.
There's two solutions to this problem: disable the hosting process in your project settings, or load a config file other than App.config. If you opt for the former, you'll get a noticeable increase in start up time when debugging. The latter option is not better, I'm afraid: I didn't find a way to abuse this overload to load an arbitrary *.config file, so the only other files available are user level files in their AppData folder, which I would rather not clutter up.
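A sketch of the wrapper described in the last three sections, under several assumptions: it needs a reference to the System.Configuration assembly, the static class is called GameConfiguration here only to avoid clashing with the framework's Configuration type, the section name "diagnostics" and the attribute name "callTreeNodeMaxCount" are made up, and the default of 128 is simply taken from the old hard-coded profiler array size.

```csharp
using System;
using System.Configuration;

// Top-level class, since the section type can't be an inner type.
public sealed class DiagnosticsConfigurationSection : ConfigurationSection
{
    // The attribute also defines the default used when the value is missing.
    [ConfigurationProperty("callTreeNodeMaxCount", DefaultValue = 128)]
    public int CallTreeNodeMaxCount
    {
        get { return (int)this["callTreeNodeMaxCount"]; }
        set { this["callTreeNodeMaxCount"] = value; }
    }
}

public static class GameConfiguration
{
    // The Configuration object for the executable's *.exe.config file.
    private static Configuration file;

    public static void Load()
    {
        file = ConfigurationManager.OpenExeConfiguration(ConfigurationUserLevel.None);
    }

    public static void Save()
    {
        file.Save(ConfigurationSaveMode.Modified);
    }

    public static DiagnosticsConfigurationSection Diagnostics
    {
        get
        {
            DiagnosticsConfigurationSection section = file == null
                ? null
                : file.GetSection("diagnostics") as DiagnosticsConfigurationSection;

            // Fallback so a missing or undeclared section yields defaults
            // instead of a crash; changes to this fallback are not saved.
            return section ?? new DiagnosticsConfigurationSection();
        }
    }
}
```

For GetSection to find the section, the config file has to declare it under configSections; without that declaration this sketch quietly falls back to the defaults.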

Finishing touches

There's two things left for us to do: Write up the config file and actually use the configured data. The XML format is pretty well documented here, but here is the little file I have at the moment, as a quick reference:
The two values I defined there are used by my profiler and tracer collection. Integrating this into the profiler was pretty simple, just changing this line
callTree = new CallTreeNode[128];
to this
callTree = new CallTreeNode[
    Configuration.Diagnostics.CallTreeNodeMaxCount];

Configuring the TracerCollection was a little tricky because System.Diagnostics tracing isn't particularly well designed, or simply not comprehensive enough. I ended up implementing two new subclasses of TraceFilter, just to (1) attach multiple filters to a listener; and (2) whitelist multiple sources, rather than just the one allowed by SourceFilter.
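Two filters of that kind could be sketched as follows. The class names are invented, and requiring every attached filter to pass (rather than any one of them) is an assumption about the intended semantics.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// (1) Lets a listener carry several filters: an event passes only if
// every attached filter lets it through.
public class CompositeFilter : TraceFilter
{
    private readonly List<TraceFilter> filters = new List<TraceFilter>();

    public CompositeFilter(params TraceFilter[] filters)
    {
        this.filters.AddRange(filters);
    }

    public override bool ShouldTrace(TraceEventCache cache, string source,
        TraceEventType eventType, int id, string formatOrMessage,
        object[] args, object data1, object[] data)
    {
        foreach (TraceFilter filter in filters)
        {
            if (!filter.ShouldTrace(cache, source, eventType, id,
                formatOrMessage, args, data1, data))
            {
                return false;
            }
        }
        return true;
    }
}

// (2) Whitelists several source names instead of the single one
// that SourceFilter allows.
public class MultiSourceFilter : TraceFilter
{
    private readonly HashSet<string> sources;

    public MultiSourceFilter(params string[] sources)
    {
        this.sources = new HashSet<string>(sources);
    }

    public override bool ShouldTrace(TraceEventCache cache, string source,
        TraceEventType eventType, int id, string formatOrMessage,
        object[] args, object data1, object[] data)
    {
        return source != null && sources.Contains(source);
    }
}
```

A listener would then get something like listener.Filter = new CompositeFilter(new EventTypeFilter(SourceLevels.Warning), new MultiSourceFilter("Profiler", "SaveFile")); where the two source names are placeholders.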

Altogether, I feel like this is a pretty robust and intuitive system, and it took only one evening of lazy work to implement. The downside to this type safe approach is that it takes a bit of work for every configurable variable, but I feel like it's worth it, since you even get IntelliSense on the available config values.