Here's the thing: I've read in a lot of places that it is generally a good idea to create "interleaved" array data for rendering performance. For those who don't know what that means, it simply means organizing the geometry data so that it is finally fed to the GL as a kind of "array of structs" instead of as separate, independent arrays.
So, for example, say your geometry is made up of positions, normals, and one texture coordinate set. You then have two choices for feeding all that information to the GL:
- either you create three big arrays, one per attribute: a positions array, a normals array and a UVs array. Then, at render time, you set up your position, normal and texcoord pointers so they point to your three arrays, set each stride to match that array's element type (or to 0, which the GL treats as "tightly packed"), and you're done.
- or you create a struct containing position, normal and UV members, and then one big array made up of those structs. In this case, at render time, you do basically the same thing, but this time each pointer points to the corresponding member of the first element in the struct array, and every stride is set to the size of the struct.
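The two options differ only in the start address and stride handed to the GL for each attribute. A minimal sketch in C, assuming a hypothetical `Vertex` struct with all-float members and two hypothetical helpers, `separate_attrib` and `interleaved_attrib`, that compute those two values; the GL calls themselves are left out so the code compiles without GL headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vertex format: position (xyz), normal (xyz) and one
   texture coordinate set (uv), all floats. */
typedef struct {
    float position[3];
    float normal[3];
    float uv[2];
} Vertex;

/* All members are floats, so no padding is expected; fail the build if there is. */
static_assert(sizeof(Vertex) == 8 * sizeof(float), "unexpected padding in Vertex");

/* The (stride, start address) pair one would pass to glVertexPointer,
   glNormalPointer or glTexCoordPointer for a single attribute. */
typedef struct {
    size_t stride;       /* bytes from one vertex's attribute to the next */
    const void *pointer; /* address of the attribute for vertex 0 */
} AttribPointer;

/* Option 1: separate arrays. Each array is tightly packed, so the stride
   is just the size of one element (GL also accepts 0 to mean this). */
AttribPointer separate_attrib(const float *array, size_t components) {
    AttribPointer p = { components * sizeof(float), array };
    return p;
}

/* Option 2: interleaved array. Each pointer starts at the matching member
   of element 0, and the stride is the size of the whole struct. */
AttribPointer interleaved_attrib(const Vertex *vertices, size_t member_offset) {
    AttribPointer p = { sizeof(Vertex), (const char *)vertices + member_offset };
    return p;
}
```

For instance, `interleaved_attrib(verts, offsetof(Vertex, normal))` yields a stride of 32 bytes and an address 12 bytes into the array. With OpenGL ES 1.1 headers available, those values would go into `glNormalPointer(GL_FLOAT, (GLsizei)p.stride, p.pointer)`; positions and UVs use `glVertexPointer(3, GL_FLOAT, ...)` and `glTexCoordPointer(2, GL_FLOAT, ...)` the same way.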
Both approaches yield the same visual results, and many people would say the second one performs better. No one can say how much better, though, because it depends on so many other factors: how you use the geometry data (dynamic vs. static), whether you are bandwidth-limited, or how many cache lines the underlying hardware has... This last consideration, though, seems to be the ultimate reason why you'd want to interleave your data (or at least I couldn't think of a better one).
As I couldn't find any actual figures regarding how interleaving affects performance, I decided to put together a quick benchmark of my own, using OpenGL ES 1.1 and the iPhone 3GS as the target platform. And the results were somewhat... well... disappointing.
I set up a sample program that performed five draw calls to render five large geometry objects, with the camera placed at a reasonable distance so that all GL work is focused on pure geometry processing (that is, vertex data fetching and T&L).
I also tried varying the geometry data size, just to make sure the results were consistent. The figures I got are displayed in the graph below.
As you can see, no matter how big the batches are, there is little or no difference between the regular and the interleaved data setup. In some cases the performance was even a bit better with the regular setup (though the difference was negligible, that's true :).
I tried several geometry settings, just to check whether at some point during GPU data fetching the interleaved setup would take advantage of the vertex attributes being near each other. I thought that having both the position and the normal close together would speed up the whole process, since the normal could already be in the cache right after reading the position. I expected some kind of improvement, even if it was just a small one... I knew it was not going to be huge, but... it was certainly disappointing that there was no difference AT ALL.
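The "attributes near each other" intuition can be put into numbers with a toy cache model. The sketch below is only an illustration under stated assumptions: every array starts on a cache-line boundary, the line size is a free parameter (64 bytes is merely an example value, not the iPhone 3GS figure), a loaded line stays cached until the vertex stream moves past it, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Byte sizes for the example vertex: 3 floats, 3 floats, 2 floats. */
enum { POS_BYTES = 12, NRM_BYTES = 12, UV_BYTES = 8,
       VERTEX_BYTES = POS_BYTES + NRM_BYTES + UV_BYTES };

/* Distinct cache lines covered by the byte range [begin, begin + len). */
size_t lines_spanned(size_t begin, size_t len, size_t line_size) {
    assert(len > 0 && line_size > 0);
    return (begin + len - 1) / line_size - begin / line_size + 1;
}

/* Lines that must be present to fetch every attribute of vertex i
   when the attributes sit side by side in one struct. */
size_t lines_for_vertex_interleaved(size_t i, size_t line_size) {
    return lines_spanned(i * VERTEX_BYTES, VERTEX_BYTES, line_size);
}

/* Same, but with three independent arrays: at least one line from each. */
size_t lines_for_vertex_separate(size_t i, size_t line_size) {
    return lines_spanned(i * POS_BYTES, POS_BYTES, line_size)
         + lines_spanned(i * NRM_BYTES, NRM_BYTES, line_size)
         + lines_spanned(i * UV_BYTES, UV_BYTES, line_size);
}

/* Lines needed to stream `bytes` bytes from a line-aligned start. */
size_t lines_for_bytes(size_t bytes, size_t line_size) {
    return (bytes + line_size - 1) / line_size;
}

/* Total lines streamed when drawing n vertices in order. */
size_t total_lines_interleaved(size_t n, size_t line_size) {
    return lines_for_bytes(n * VERTEX_BYTES, line_size);
}

size_t total_lines_separate(size_t n, size_t line_size) {
    return lines_for_bytes(n * POS_BYTES, line_size)
         + lines_for_bytes(n * NRM_BYTES, line_size)
         + lines_for_bytes(n * UV_BYTES, line_size);
}
```

With 64-byte lines, vertex 0 needs 1 line when interleaved but 3 when separate, which is the locality the interleaved layout is supposed to buy. Over a whole in-order stream, however, 1000 vertices cost 500 lines interleaved versus 501 separate. In this toy model the total traffic is nearly identical, which is at least consistent with the measurements above, though it says nothing definitive about the real hardware.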
Maybe it's because the iPhone GPU is designed in some specific way that makes this kind of "optimization" useless, or maybe the "driver" is already performing some kind of automatic interleaving behind the scenes (highly improbable, though). The fact is that making your graphics engine "interleave-aware" is definitely not worth it. Not if you are targeting iPhone 3GS+ platforms.
I was just about to undertake a major engine redesign to allow for geometry data interleaving. Thank God I ran this little test first.
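For context, making an engine "interleave-aware" would largely come down to repacking separate attribute arrays into one struct array at load time. A minimal sketch, assuming the same hypothetical `Vertex` layout (three-float position, three-float normal, two-float UV) and a hypothetical `interleave` helper; it is not taken from the author's engine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical vertex format: position (xyz), normal (xyz), uv. */
typedef struct {
    float position[3];
    float normal[3];
    float uv[2];
} Vertex;

/* Packs three tightly packed attribute arrays (3, 3 and 2 floats per
   vertex) into one interleaved array. `out` must hold `count` vertices. */
void interleave(const float *positions, const float *normals,
                const float *uvs, size_t count, Vertex *out) {
    assert(out != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        /* Copy each attribute of vertex i into its slot in the struct. */
        memcpy(out[i].position, positions + 3 * i, sizeof out[i].position);
        memcpy(out[i].normal,   normals   + 3 * i, sizeof out[i].normal);
        memcpy(out[i].uv,       uvs       + 2 * i, sizeof out[i].uv);
    }
}
```

After packing, the array would be drawn exactly as in the second option: each attribute pointer set to its member of `out[0]`, with the stride set to `sizeof(Vertex)`.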

