Tuesday, May 24, 2016

Game optimization by space partitioning: Part II

Reduce rendering calls

Note that last time we reduced the number of objects returned in collision detection using a bin-ing strategy from 10020 to just around 31 objects. But then after profiling the application we noticed that the most time was now being spent in the rendering loop. Upon examination both loops were largely inefficient they were in fact both performing loops on 10020+ objects each call which was happening at both 33fps and the updates were as often as possible. 

Since the strategy existed already it was easy enough to figure out how a bounding box of objects to grab each time we render the screen and only render those objects since they are the only ones that are visible.

One problem that arose from this was the fact that the way the views work the actual objects on screen were significantly scaled.

Handle scaling

Screen size was 640x480 but each tile is 8x8 and is scaled. Blown up by a factor of 5. So grabbing a 640x480 section of from the SpacePartitionStrategy resulted in returning over 5000 objects.

If we take the inverse scale factor for the x and y directions which where 1/5 since each x and y was multiplied by 5. Then we get a grid size of 128x96 so instead of about 80*60 (4800) objects we now are only rendering 16*12 or 192 objects approximately. Of course it comes out to more like 200 or so objects based on bin boundaries and that will fluctuate when we add in smooth scrolling on maps.

The savings are incredible and give many many more benefits for future development.

 Check out the commits

You can check out the diffs of the four commits created for these two posts.

Abstract partitioning scheme                  32994afbc7d8bf2c34f4e3246473acb9eace8d7f
Space partitioning  that does not work   4e09bed3aff2f03856ff282ec4c6e1d91ca6ed58
Fix integer bin partition strategy             e62a1760dd9c59d6d8619b88f03a24f573c23e37
Reduce Rendering using Partitioning      fb896ef2098972b6007bb1cb0b03d74e14f93ce6


Friday, May 20, 2016

Game optimization by space partitioning

But first, a recap of LD#35


I participated in the last ludumdare competition. I love to participate in the competitions, they are an incredible challenge which forces me to focus and learn about how unprepared I am. Also how few public code I've actually released. Most of my entries I have started from scratch and each time I get a little bit farther than I did before, but still way behind where I need to be in order to actually make a game during the competition.

I tend to dive into writing a lot of boilerplate code that would already be written for me in an engine though I've never really found an engine I like. Though I am contemplating going back to SDL one of the few graphics libraries that actually provided all the necessary drudge work to write a game on top of. Otherwise I will stick with writing a game for ludumdare with the code from my last entry.


Pup8Bit an engine


So yeah I did not create a game I only created an engine and not the perfect one at that. It had a couple of bugs. Including no transparent images which was a huge time sync during the competition. Silly Java. Besides needing to make transparent images work, there was a huge issue with collision detection where the character movement code actually was checking it's current position instead of checking the next position for validity. But now that is working correctly.

Did some refactoring of the main Game object to off load all event handlers, updaters and rendering queues to game state objects.

Oh and this may be a little odd and out there but during the competition I added javascript support for scripting. Then quickly added coffeescript support after that. This uses Javas "Nashorn" javascript engine.  And grabbed the javascript distribution from the github repo.

Long story short with that, I intend to use the coffeescript files in a resources/states directory as init scripts for game states so all your objects can be added to the game state via coffeescript. This will allow me to reload the game states and watch the file system for changes to the scripts automatically restarting the state if the script changes. There is already a game initialization script that is loaded right now and does all the keyboard input handling for the current test game.


Ok, the problem - Performance Profiling

So the project is moving along, I have already resolved a couple of major performance bottlenecks with the code, as in somebody (me) was generating new transformation matricies in GameView everytime an object was rendered. Obviously this caused some major gc issues. But that has been resolved storing the transformations efficiently. But there are still occasional pauses when moving about the screen. Using jvisualvm helped catch the last performance issue so lets see if it helps in this circumstance.


So most of the time is being spent in the GamePosition.isClose method. Though there really isn't much to optimize there which is, maybe in getBounds call which does instantiate a new bounding box when there isn't one already but that should not take up too much time after the first call.

Heres the code for isClose.

  public boolean isClose(Renderable r) {
    return distSq(r.getPosition()) < (r.getBounds().getRadiusSq() * 10d);
  }
  public double distSq(GamePosition update) {
    double dx = (getX() - update.getX());
    double dy = (getY() - update.getY());
    return dx * dx + dy * dy;
  }

Really not much we can do about optimizing the calculations like my previous post indicated through experimentation. So really we need to go a step above that to the either the GameState.isClose or the GameState.neighbors method.

isClose just purely asks an individual position if a renderable object is close to it

private boolean isClose(Renderable r, GamePosition update) {
return update.isClose(r);
}

So nothing we can do to optimize that except eliminate the function call. Definitely not our concerning code.

And then here we have it. Where all of our time is being spent.

public Renderable[] neighbors(GenericGameObject go, GamePosition update) {
Set nearbys = new HashSet(); for (Set rSet : byLayer.values()) { for (Renderable r : rSet) {
if (isClose(r, go.getPosition()) && r != go) {
nearbys.add(r);
}
}
}
return nearbys.toArray(new Renderable[nearbys.size()]);
}

So pretty ugly here we are going through each layer and checking every renderable object to see if its close to the position we care about. Oh and all the while doing a bunch of memory allocations in to boot.

What we need is to reduce the overall number of checks. Yes we could pass back a reference to the hashset or store a field for the hashset and reuse that clearing it every update cycle etc. Or some such non-sense but as long as the number of items that we are iterating over is small there isn't really anything wrong with this code. So we shouldn't try to do crazy abnormal optimizations in order to eliminate the memory allocation. I think our biggest way to identify why this is a problem set of code is to output how many renderables get iterated over in this check.

So I added in some code to found how many Renderables in the inner loop were called.

And got a whole bunch of the following

Iterating over 10020
Iterating over 10020
Iterating over 10020
Iterating over 10020

This is for a small map from Tiled. So the app then performs fairly well for doing all those iterations every update on top of rendering at 33fps. But we are kind of wasting vast amounts of cycles here.

We need those extra cycles for doing input handling, AI logic, handling particle movement, etc, etc. So lets get them back shall we.


Ideas...anyone...anyone...Bueller!!!!

Ok so how about we drop the byLayers and just have a separate collection for all 10020 objects. Then what if instead of just one collection how about we use small collections of objects that are close together position wise. I mean this is mostly a filtering out unlikely candidates to get to a small number of objects that significantly reduces the impact of the loops.

A lot of people like to suggest quad trees or other data structures. I on the other hand like to use a grid based partitioning scheme where lookup time is constant time and aggregation is O(n) where n is the number of objects in the area of space we are grabbing. Though on the flip side a quad tree would give you O(log n) lookup time where n is the number nodes in the quad tree you have to traverse but then there is no aggregation time if each node has a list of objects in that section of the tree. Otherwise you have to continue traversing and still build a collection. I might eventually implement a combined method using bins of a certain size but each bin is a quad tree and the quad tree has nodes that contain the entire list of objects in that space. Removal time might be a problem though so might need to keep non-static objects out of it.

Defining the space partitions

We will say out space partitions are 4 tiles wide by 4 tiles high. Can go in any direction and the spacial grid starts at 0,0 and goes to MAX_INTEGER/2, MAX_INTEGER/2 only positive values. We want to be able to add renderable objects to the partition strategy. And get all objects within any bounds, regardless of layer. We want to allow for negative positioned renderable objects so we'll put renderables centered at MAX_INTEGER/4, MAX_INTEGER/4 in our grid system. We will also want our code to handle any bin size. So we can optimize for our bins. Obviously with a quad tree system our bins would only be allowed to contain a maximum number of elements thereby helping make iterations and performance more constant. With bins though the bins can have any number of objects in them so it would be based solely on how many objects occupy that space. Good for top-down rpgs where the maps are pretty static and there are not going to get very busy in one section of the maps. RTS' might benefit from quad tree with an implementation that reduces insertion time an keep back-references to the node an objects in for updating purposes.

Game types to use with bins

  • Turn-based Strategy
  • Top-down RPG
  • Simulation Games
  • 2D Shooters are ok
  • Platformer
I think all of the above should perform fine with binning as a space partition strategy.

Ok now it's time for....

Math

This ends up being our bin calculation.
private int getBin(int x, int y) {
return BIN_OFFSET + (y * BIN_COLUMNS + x);
        }
We calculate the bin by taking a base number BIN_OFFSET adding the number of elements for the beginning of the row in the grid we are currently in so y * BIN_COLUMNS where columns are the number of columns in our grid. I actually made the size of the bin grid just the square root of the BIN_OFFSET
// just make the number of possible bins really large
private static final int BIN_OFFSET = Integer.MAX_VALUE / 4;
private static final int BIN_COLUMNS = (int) Math.sqrt(BIN_OFFSET);
This is still pretty large for game maps. It's 23170 elements.

Benefits of Space Partitioning

So here we go now what are the benefits of this strategy. Before we were iterating over "Iterating over 10020" each update loop

Now we are

Iterating over 31
Iterating over 31
Iterating over 31
Iterating over 31
Iterating over 31

So wow we dropped 9989 iterations. 

Check Profiling

Now lets check the profiling to see if there is anything else we can optimize.


Looks like a lot of time is now spent in rendering. And that reminds me that we were looking at only the collision detection code and not the rendering code due to the most significant time user.

We could easily now use the abstract SpacePartitionStrategy and the methods on it to reduce the number of rendering calls the application is making every render which is now the most significant path.

I will let you know those benefits next time.