Fast math
Pretty soon after spinning his first cube, the 3D programmer realizes he needs a primitive library to handle all the math voodoo. The first primitive he probably wants is the vector. At its core, a vector is just three values:
struct vector3f {
    float x, y, z;
};
One of the most common things to do to a vector is normalization: scaling it to length 1 while keeping its direction. As a member function of vector3f, normalizing can be written as follows:
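// Member function of vector3f; sqrt comes from <cmath>.
void normalize() {
    float length = sqrt((x * x) + (y * y) + (z * z));
    // Note: a zero-length vector would divide by zero here.
    x /= length;
    y /= length;
    z /= length;
}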
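Before reaching for anything fancier, the scalar version itself can be trimmed a little. The following is a minimal sketch, not code from any library: it computes the reciprocal of the length once and multiplies, trading three divisions for one. The length_squared helper, the zero-length check, and the bool return value are my own assumptions added for illustration.

```cpp
#include <cassert>
#include <cmath>

struct vector3f {
    float x, y, z;

    // Squared length: three multiplications and two additions, no square root.
    float length_squared() const { return (x * x) + (y * y) + (z * z); }

    // Normalize in place. Returns false and leaves the vector untouched
    // when the length is zero, so we never divide by zero.
    // (The bool return is an illustrative choice, not part of the original.)
    bool normalize() {
        const float len_sq = length_squared();
        if (len_sq == 0.0f) {
            return false;
        }
        // One division up front, then three multiplications.
        const float inv_length = 1.0f / std::sqrt(len_sq);
        x *= inv_length;
        y *= inv_length;
        z *= inv_length;
        return true;
    }
};
```

Whether this is measurably faster depends on the compiler and CPU, so treat it as something to benchmark rather than a guaranteed win.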
Hmm, that's three multiplications, two additions, a square root, and three divisions... a decently expensive function. If we want more speed we can use SSE. SSE is a set of SIMD instructions and 128-bit registers built into the CPU; each register holds four floats, so a single instruction can operate on four values at once. For example, the _mm_mul_ps intrinsic gives us four multiplications with just one instruction, and _mm_div_ps similarly gives us four divisions. So while using SSE can be a pain, since the intrinsics map almost one-to-one onto assembly instructions, it can offer some serious speed improvements.
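To make that concrete, here is a rough sketch of the same normalization written with SSE intrinsics from <xmmintrin.h> (where the names are spelled with a single leading underscore, such as _mm_mul_ps). This is not how the Sony library does it; the function name normalize_sse and the VEC_DEMO_HAVE_SSE macro are hypothetical, and the scalar fallback is only there so the sketch still compiles on a CPU without SSE.

```cpp
#include <cassert>
#include <cmath>

// VEC_DEMO_HAVE_SSE is a made-up macro for this sketch.
#if defined(__SSE__) || defined(_M_X64) || defined(_M_AMD64)
#include <xmmintrin.h>
#define VEC_DEMO_HAVE_SSE 1
#endif

struct vector3f {
    float x, y, z;
};

// Hypothetical free function: normalizes v in place.
// Like the plain version in the post, it does not guard against zero length.
void normalize_sse(vector3f& v) {
#ifdef VEC_DEMO_HAVE_SSE
    // _mm_set_ps takes lanes from highest to lowest: register is [x, y, z, 0].
    const __m128 vec = _mm_set_ps(0.0f, v.z, v.y, v.x);

    // Four multiplications in one instruction: [x*x, y*y, z*z, 0].
    const __m128 sq = _mm_mul_ps(vec, vec);

    // Horizontal sum with two shuffle/add pairs (SSE1 has no single
    // instruction for this). Swap neighbours: [y*y, x*x, 0, z*z].
    __m128 shuf = _mm_shuffle_ps(sq, sq, _MM_SHUFFLE(2, 3, 0, 1));
    __m128 sums = _mm_add_ps(sq, shuf);  // [xx+yy, xx+yy, zz, zz]

    // Swap the two halves and add again: every lane is xx + yy + zz.
    shuf = _mm_shuffle_ps(sums, sums, _MM_SHUFFLE(1, 0, 3, 2));
    sums = _mm_add_ps(sums, shuf);

    // Square root in every lane, then four divisions in one instruction.
    const __m128 length = _mm_sqrt_ps(sums);
    const __m128 result = _mm_div_ps(vec, length);

    // Copy the lanes back out; the fourth lane is padding and is ignored.
    float out[4];
    _mm_storeu_ps(out, result);
    v.x = out[0];
    v.y = out[1];
    v.z = out[2];
#else
    // Scalar fallback for targets without SSE.
    const float length = std::sqrt((v.x * v.x) + (v.y * v.y) + (v.z * v.z));
    v.x /= length;
    v.y /= length;
    v.z /= length;
#endif
}
```

Notice how much of the code is just shuffling data into and out of registers. For a single three-component vector that overhead eats into the gain, which is part of why a library that keeps its vectors in SSE registers throughout is attractive.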
I think it's good for a programmer to write his own primitive routines first to better understand the math involved. Especially when it comes to something nasty like quaternions, I would never have understood them without manually coding each operation. However, at some point I thought, "Surely this has been done before, and by people who are much better programmers than me." Especially when it comes to topics like SSE, there is no way I could do it justice. This brought me to the Sony Vectormath library. This library has primarily been used in the PlayStation 3 SDK, but was also donated to the Bullet Physics project for everyone to use.
I wrote a basic app to see just how much of a speed increase I might see in math-heavy routines. It spins 4096 cubes around a random axis 60 times a second. The result looks pretty sweet, and I've included videos at the bottom of the post. Check out these numbers:
Sony w/ SSE : 0.000485
Sony w/o SSE : 0.000796
sinprim: 0.000837
The SSE lib is roughly 1.7 times as fast as my naive implementation! The non-SSE Sony build, by contrast, is only about 5% ahead of mine, so nearly all of the gain comes from SSE itself. Granted, most routines are not going to be so purposefully math-heavy as this test, but the room for improvement is there. One thing to keep in mind is that an executable compiled with SSE can only run on a CPU that supports it. Personally I will use up to SSE2 until I run into trouble, since the Pentium 4 and pretty much everything after it support it.
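If you would rather check for SSE2 at runtime than find out from a crash report, GCC and Clang offer the __builtin_cpu_supports built-in on x86. Here is a minimal sketch, assuming one of those compilers; the cpu_has_sse2 name is hypothetical, and on any other compiler or architecture the function conservatively reports false.

```cpp
#include <cassert>

// Hypothetical helper: true when the running CPU reports SSE2 support.
// Relies on a GCC/Clang built-in that only exists for x86 targets.
bool cpu_has_sse2() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("sse2") != 0;
#else
    // Unknown compiler or non-x86 target: assume no SSE2.
    return false;
#endif
}
```

A program could call this once at startup and then either pick a non-SSE code path or exit with a readable error message. Note that the check itself has to live in code that was compiled without SSE2 instructions, otherwise it may never get the chance to run.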
You can check out the latest version of the Sony Vectormath library from the Bullet Physics SVN repository:
svn co http://bullet.googlecode.com/svn/trunk/Extras/vectormathlibrary
My changes to the library to get it to work with GCC are here:
http://sinoth.net/code/vectormath_lib.zip
Now the fancy vids :)
