Tuesday, May 24, 2016

Game optimization by space partitioning: Part II

Reduce rendering calls

Note that last time we reduced the number of objects returned in collision detection using a bin-ing strategy from 10020 to just around 31 objects. But then after profiling the application we noticed that the most time was now being spent in the rendering loop. Upon examination both loops were largely inefficient they were in fact both performing loops on 10020+ objects each call which was happening at both 33fps and the updates were as often as possible. 

Since the strategy existed already it was easy enough to figure out how a bounding box of objects to grab each time we render the screen and only render those objects since they are the only ones that are visible.

One problem that arose from this was the fact that the way the views work the actual objects on screen were significantly scaled.

Handle scaling

Screen size was 640x480 but each tile is 8x8 and is scaled. Blown up by a factor of 5. So grabbing a 640x480 section of from the SpacePartitionStrategy resulted in returning over 5000 objects.

If we take the inverse scale factor for the x and y directions which where 1/5 since each x and y was multiplied by 5. Then we get a grid size of 128x96 so instead of about 80*60 (4800) objects we now are only rendering 16*12 or 192 objects approximately. Of course it comes out to more like 200 or so objects based on bin boundaries and that will fluctuate when we add in smooth scrolling on maps.

The savings are incredible and give many many more benefits for future development.

 Check out the commits

You can check out the diffs of the four commits created for these two posts.

Abstract partitioning scheme                  32994afbc7d8bf2c34f4e3246473acb9eace8d7f
Space partitioning  that does not work   4e09bed3aff2f03856ff282ec4c6e1d91ca6ed58
Fix integer bin partition strategy             e62a1760dd9c59d6d8619b88f03a24f573c23e37
Reduce Rendering using Partitioning      fb896ef2098972b6007bb1cb0b03d74e14f93ce6


Friday, May 20, 2016

Game optimization by space partitioning

But first, a recap of LD#35


I participated in the last ludumdare competition. I love to participate in the competitions, they are an incredible challenge which forces me to focus and learn about how unprepared I am. Also how few public code I've actually released. Most of my entries I have started from scratch and each time I get a little bit farther than I did before, but still way behind where I need to be in order to actually make a game during the competition.

I tend to dive into writing a lot of boilerplate code that would already be written for me in an engine though I've never really found an engine I like. Though I am contemplating going back to SDL one of the few graphics libraries that actually provided all the necessary drudge work to write a game on top of. Otherwise I will stick with writing a game for ludumdare with the code from my last entry.


Pup8Bit an engine


So yeah I did not create a game I only created an engine and not the perfect one at that. It had a couple of bugs. Including no transparent images which was a huge time sync during the competition. Silly Java. Besides needing to make transparent images work, there was a huge issue with collision detection where the character movement code actually was checking it's current position instead of checking the next position for validity. But now that is working correctly.

Did some refactoring of the main Game object to off load all event handlers, updaters and rendering queues to game state objects.

Oh and this may be a little odd and out there but during the competition I added javascript support for scripting. Then quickly added coffeescript support after that. This uses Javas "Nashorn" javascript engine.  And grabbed the javascript distribution from the github repo.

Long story short with that, I intend to use the coffeescript files in a resources/states directory as init scripts for game states so all your objects can be added to the game state via coffeescript. This will allow me to reload the game states and watch the file system for changes to the scripts automatically restarting the state if the script changes. There is already a game initialization script that is loaded right now and does all the keyboard input handling for the current test game.


Ok, the problem - Performance Profiling

So the project is moving along, I have already resolved a couple of major performance bottlenecks with the code, as in somebody (me) was generating new transformation matricies in GameView everytime an object was rendered. Obviously this caused some major gc issues. But that has been resolved storing the transformations efficiently. But there are still occasional pauses when moving about the screen. Using jvisualvm helped catch the last performance issue so lets see if it helps in this circumstance.


So most of the time is being spent in the GamePosition.isClose method. Though there really isn't much to optimize there which is, maybe in getBounds call which does instantiate a new bounding box when there isn't one already but that should not take up too much time after the first call.

Heres the code for isClose.

  public boolean isClose(Renderable r) {
    return distSq(r.getPosition()) < (r.getBounds().getRadiusSq() * 10d);
  }
  public double distSq(GamePosition update) {
    double dx = (getX() - update.getX());
    double dy = (getY() - update.getY());
    return dx * dx + dy * dy;
  }

Really not much we can do about optimizing the calculations like my previous post indicated through experimentation. So really we need to go a step above that to the either the GameState.isClose or the GameState.neighbors method.

isClose just purely asks an individual position if a renderable object is close to it

private boolean isClose(Renderable r, GamePosition update) {
return update.isClose(r);
}

So nothing we can do to optimize that except eliminate the function call. Definitely not our concerning code.

And then here we have it. Where all of our time is being spent.

public Renderable[] neighbors(GenericGameObject go, GamePosition update) {
Set nearbys = new HashSet(); for (Set rSet : byLayer.values()) { for (Renderable r : rSet) {
if (isClose(r, go.getPosition()) && r != go) {
nearbys.add(r);
}
}
}
return nearbys.toArray(new Renderable[nearbys.size()]);
}

So pretty ugly here we are going through each layer and checking every renderable object to see if its close to the position we care about. Oh and all the while doing a bunch of memory allocations in to boot.

What we need is to reduce the overall number of checks. Yes we could pass back a reference to the hashset or store a field for the hashset and reuse that clearing it every update cycle etc. Or some such non-sense but as long as the number of items that we are iterating over is small there isn't really anything wrong with this code. So we shouldn't try to do crazy abnormal optimizations in order to eliminate the memory allocation. I think our biggest way to identify why this is a problem set of code is to output how many renderables get iterated over in this check.

So I added in some code to found how many Renderables in the inner loop were called.

And got a whole bunch of the following

Iterating over 10020
Iterating over 10020
Iterating over 10020
Iterating over 10020

This is for a small map from Tiled. So the app then performs fairly well for doing all those iterations every update on top of rendering at 33fps. But we are kind of wasting vast amounts of cycles here.

We need those extra cycles for doing input handling, AI logic, handling particle movement, etc, etc. So lets get them back shall we.


Ideas...anyone...anyone...Bueller!!!!

Ok so how about we drop the byLayers and just have a separate collection for all 10020 objects. Then what if instead of just one collection how about we use small collections of objects that are close together position wise. I mean this is mostly a filtering out unlikely candidates to get to a small number of objects that significantly reduces the impact of the loops.

A lot of people like to suggest quad trees or other data structures. I on the other hand like to use a grid based partitioning scheme where lookup time is constant time and aggregation is O(n) where n is the number of objects in the area of space we are grabbing. Though on the flip side a quad tree would give you O(log n) lookup time where n is the number nodes in the quad tree you have to traverse but then there is no aggregation time if each node has a list of objects in that section of the tree. Otherwise you have to continue traversing and still build a collection. I might eventually implement a combined method using bins of a certain size but each bin is a quad tree and the quad tree has nodes that contain the entire list of objects in that space. Removal time might be a problem though so might need to keep non-static objects out of it.

Defining the space partitions

We will say out space partitions are 4 tiles wide by 4 tiles high. Can go in any direction and the spacial grid starts at 0,0 and goes to MAX_INTEGER/2, MAX_INTEGER/2 only positive values. We want to be able to add renderable objects to the partition strategy. And get all objects within any bounds, regardless of layer. We want to allow for negative positioned renderable objects so we'll put renderables centered at MAX_INTEGER/4, MAX_INTEGER/4 in our grid system. We will also want our code to handle any bin size. So we can optimize for our bins. Obviously with a quad tree system our bins would only be allowed to contain a maximum number of elements thereby helping make iterations and performance more constant. With bins though the bins can have any number of objects in them so it would be based solely on how many objects occupy that space. Good for top-down rpgs where the maps are pretty static and there are not going to get very busy in one section of the maps. RTS' might benefit from quad tree with an implementation that reduces insertion time an keep back-references to the node an objects in for updating purposes.

Game types to use with bins

  • Turn-based Strategy
  • Top-down RPG
  • Simulation Games
  • 2D Shooters are ok
  • Platformer
I think all of the above should perform fine with binning as a space partition strategy.

Ok now it's time for....

Math

This ends up being our bin calculation.
private int getBin(int x, int y) {
return BIN_OFFSET + (y * BIN_COLUMNS + x);
        }
We calculate the bin by taking a base number BIN_OFFSET adding the number of elements for the beginning of the row in the grid we are currently in so y * BIN_COLUMNS where columns are the number of columns in our grid. I actually made the size of the bin grid just the square root of the BIN_OFFSET
// just make the number of possible bins really large
private static final int BIN_OFFSET = Integer.MAX_VALUE / 4;
private static final int BIN_COLUMNS = (int) Math.sqrt(BIN_OFFSET);
This is still pretty large for game maps. It's 23170 elements.

Benefits of Space Partitioning

So here we go now what are the benefits of this strategy. Before we were iterating over "Iterating over 10020" each update loop

Now we are

Iterating over 31
Iterating over 31
Iterating over 31
Iterating over 31
Iterating over 31

So wow we dropped 9989 iterations. 

Check Profiling

Now lets check the profiling to see if there is anything else we can optimize.


Looks like a lot of time is now spent in rendering. And that reminds me that we were looking at only the collision detection code and not the rendering code due to the most significant time user.

We could easily now use the abstract SpacePartitionStrategy and the methods on it to reduce the number of rendering calls the application is making every render which is now the most significant path.

I will let you know those benefits next time.

Sunday, May 15, 2016

Quick distance checking!

Recap

A while ago (ah hem 8 years ago) I made a post about how to implement basic bounding box coliision detection. I thought I would make a post about how to use a little short-cut method to determine if two objects are nearby each other.

The Problem

If we have a world full of objects, which are anything from the trees, and grass to the little particles racing around the screen we do not want to needlessly check every object with every other object if we do not have to. For instance, an object on the left side of the screen could not possibly collide with an object on the right side of the screen.

You want to eliminate as many of these expensive mathematical calculations as possible. Or perhaps you just want to see if two objects will affect each other anytime soon. There are a couple things we need to know about the objects in order to quickly make the determination if we want to check them. We might be currently checking collisions on objects that will never touch each other. Each time we can eliminate a future check it is can save us from wasting a check. Although not as big of a problem with modern day graphics if you use the GPU for rendering. Still it will help update cycles which that we run on the CPU and anything that is still prone to bottlenecks.

There may also be other areas where you want to quickly find all objects close or nearby each other. I plan to cover some partitioning strategies in the future that will help eliminate massive amounts of loops from being necessary.

The Math

We want to determine if two objects are close to each other. We need to at least have a radius around the objects in order to calculate whether or not they are within another objects radius. So we use the distance between the two objects which is calculated as

// SquareRootCalculator
double d = Math.sqrt( Math.pow(object1.x - object2.x, 2) + Math.pow(object1.y - object2.y,2));

We can improve the speed of this a little bit by short-cutting the math and applying some variables.

// SingleDiffCalculator
double diffX = object1.x - object2.x;
double diffY = object1.y - object2.y;
double distSquared = diffX * diffX + diffY * diffY;

No Math routine calls and eliminate the need to calculate difference twice. Although the compiler and JIT may not use a register for diffX and diffY they may instead decide to use the stack which is less optimal but probably negligent difference in performance. You can expand it out to be a long equation.

// DirectFromObjectsCalculator
double distSquared = (object1.x - object2.x) * (object1.x - object2.x) + (object1.y - object2.y) * (object1.y - object2.y);

And because I like to take things farther than they need to be here accessing the x and y variables may not be efficient enough since the objects were probably allocated on the heap and may not be in cpu cache lets force them to be put on the stack which is probably in cache still.

// DirectFromStackCalculator
double ox1 = object1.x;
double oy1 = object1.y;
double ox2 = object2.x;
double oy2 = object2.y;
double distSquared = (ox1-ox2) * (ox1-ox2) + (oy1-oy2) * (oy1-oy2);

Because I like to be thorough lets see if that matters at all lets put them into a utility class and use a Unit Test and get some output. And lets do it in an object oriented way.

Ok so there was not much of a difference or enough to be noticable between methods.

283 ms for 10000000 using DirectFromObjectsCalculator
263 ms for 10000000 using DirectFromStackCalculator
283 ms for 10000000 using SingleDiffCalculator
268 ms for 10000000 using MathCalculator
304 ms for 10000000 using SquareRootCalculator

Another run:

336 ms for 12000000 using DirectFromObjectsCalculator
347 ms for 12000000 using DirectFromStackCalculator
391 ms for 12000000 using SingleDiffCalculator
394 ms for 12000000 using MathCalculator
409 ms for 12000000 using SquareRootCalculator

These results did not change much on repeated runs and varied a bit between winners the only obvious looser was the SquareRootCalculator obviously.

So no need to go drastically changing your coding style use variables if you want to clean things up. Doing temporary calculations like the SingleDiffCalculator.

If you would like to fiddle with the code then you can find it here.


The Check

So the final piece which I should have mentioned before going into all that testing above, is the comparison of against the radius.

The calculation returns the squared distance so we need to see if the square of the distance from the objects centers is less than the square of the sum of the radii.

Something like

double radiusSquared = Math.pow(object1.radius + object2.radius, 2);
if(distSquared < radiusSquared) {
  // objects could be colliding
}else{
  // objects not colliding
}

That should be about it. Hopefully I will get some time to back fill that repo with examples from earlier snippets I posted.