Friday, January 28, 2005

Profiling + Vector Math = Performance

So there was a student who implemented a flocking simulation for a virtual fishtank using Croquet. Worked fine with 20 fishes. With 50 fishes it became rather sluggish. Putting in more than 100 fishes the framerate could be measured in seconds per frame. So the rendering in Croquet is too slow, right?

Not quite. There are two things you have to keep in mind when it comes to performance:
  1. When in doubt, measure.
  2. You are in doubt.
So we fired up a message tally (world menu - debug - start MessageTally). Turns out only 12 percent of the time were spent in rendering. So it's not the rendering at all. A whopping 80 percent was taken by the flock's step methods. The leaves of the message tally showed that 15 percent of the time went into each of the methods #x, #y, and #z!

So we looked at the code. Every fish was told to swim in the main direction of its neighbors, that is the fishes within a certain radius. Processing continued only for those fishes in range.

Sounds quite reasonable, doesn't it? The only problem was that to find the neighbors for a fish, the distance to each other fish was compared to the radius. Not only does this result in a quadratic complexity (every fish was tested against every other) but the distance check was this:
(((thisFish position x - otherFish position x) *
(thisFish position x - otherFish position x))+
((thisFish position y - otherFish position y) *
(thisFish position y - otherFish position y))+
((thisFish position z - otherFish position z) *
(thisFish position z - otherFish position z))) sqrt
This guy knows how to take the distance of two points in 3D, and he knows that the square is the product of a number with itself. Very well, but very slow, too.

Vectors in Croquet are represented as FloatArrays, and for a good reason. The reason is that you can compute sums, products, etc. of a zillion numbers very rapidly, just by invoking a single method. This is way faster than doing a component-wise computation on your own. There's a drawback too, which is that because the internal representation of a float in a FloatArray and the one in a normal Float object differs, it is rather expensive to read or write individual elements from such an array.

So to make your math go fast, use vectors instead of home-brew arithmetic. The whole mess above can be replaced by this:
(thisFish position - otherFish position) length
which is way faster.

Of course, one can further improve this by omitting the square root inside #length, by using #squaredLength and comparing to the squared radius, but that would be nit-picking. And of course one should reduce the overall algorithmic complexity from O(n²) to O(n log n), but that's just common sense.

The point I want to drive home is this: Use the class library wisely. If something is too slow, chances are you're doing it the wrong way. Look for examples in the image, there almost always is one. For a nice example of vector math look at the particle system (TParticle).

2 comments:

orion said...

neat!

i just discovered that about 55% of my code's time is being used in TAvatar >> extent, (which boils down to 30% in TGroup >> boundingBox and another 20% tacked on by TMesh >> boundingBox).

I call TAvatar extent once per frame per avatar per collision object.

.. A quick change to Latch the value of extent just knocked that 55% down to 0%, and it's visibly quite faster when connected.

CroqueTweaker said...

Bounding boxes are second-class citizens in Croquet. The primary bounding volume is the sphere, indeed a hierarchy of spheres organized as sphere tree. So depending on what you need the avatar's extent for you might be better off using its bounding sphere.