21 Jun 2013

Sane C++

TL;DR: An attempt to outline the 'good parts' of C++ from my experience of porting Nebula3 to various platforms over the years. Some of it controversial.

Update: some explanation why STL and C++11 is currently "forbidden", see below!


C++

...is relatively famous for how easy it is to shoot yourself in the foot in many interesting ways. The types of bugs that are simply impossible in other languages are legion.
So then, why is C++ so damn popular in game development? One of the most important reasons (IMHO) is that C++ lets you write both very high-level and very low-level code. If needed, you can have full control over the memory layout, over when and how dynamic memory is allocated and freed, and over how exactly memory is accessed. At the same time, with the right framework you can write very clean, high-level code and not care about memory management at all.
The significance of low-level programming in particular, e.g. controlling the exact memory layout of your data, is often ignored by other, higher-level languages, even though it can have a dramatic effect on performance.
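To make the memory-layout point a bit more concrete, here's a minimal sketch (the types and names are made up for illustration, not actual engine code):

```cpp
// Hypothetical sketch: a tightly packed vertex struct with a known,
// explicit layout. An array of these is one contiguous block of memory,
// iterates cache-friendly and can be copied into a GPU vertex buffer as-is.
struct Vertex {
    float pos[3];        // 12 bytes
    float uv[2];         //  8 bytes
    unsigned int color;  //  4 bytes -> 24 bytes per vertex, no padding on typical platforms
};

// contiguous: &vertices[1] is exactly sizeof(Vertex) bytes after &vertices[0]
Vertex vertices[1024];

// versus the "convenient" version: every vertex is a separate heap object,
// iterating chases pointers all over the address space
Vertex* scatteredVertices[1024];
```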
One of the most common C++ newbie errors is to tackle a big software project without a proper high-level "toolbox". C++ doesn't come with a luxurious standard framework like all those fancy-pants modern languages.
And with only hello_world.cpp under their belt, newbies quickly end up with the typical mess of object-ownership problems, spaghetti inheritance, seg-faults, memory leaks and redundant code all over the place after just a few ten-thousand lines of code.
On the other hand, it is incredibly easy to write really slow code in a high-level environment since you don't really know (or need to care) what's going on under all those layers of convenience.
The most important rule when diving into C++ is: Know when to write high-level and when to write low-level code, these are completely different beasts!
So what's the difference between high-level and low-level C++ code? I think there's no clear-cut separation line, but a good rule of thumb is: if it needs to run a few thousand times per frame, it better be really well optimised low-level code!
  • If you look at a typical rendering pipeline, there's this typical cascade where every stage in the pipeline is executed at least an order of magnitude more often than the previous one: outer-most there's stuff that happens only once per frame, next is code that is executed once per graphics object, then once per bone/joint, then per vertex, and finally per pixel. The realm of low-level code starts somewhere between per-object and per-bone (IMHO).
  • Typical high-level code to me is "game play logic". This is also where thinking object-oriented still makes the most sense (as opposed to a more data-oriented approach). You have a couple of "game objects" which need to interact with each other in fairly complex ways. On this level you don't want to think about object ownership or memory layout, and high-level concepts like events, delegates, properties etc. start to make sense. Shit starts to hit the fan when you have thousands of such game objects.
  • It is of course desirable to get the performance advantages of low-level code combined with the simplicity and convenience of high-level code. This is basically the holy grail of games programming. Hiding complex or complicated code under simple interfaces is a good start.
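As a (purely illustrative) sketch of that last point: imagine a mesh class whose public interface stays small and high-level, while the implementation is free to use whatever low-level tricks it needs:

```cpp
// Hypothetical sketch: the caller only sees a handful of simple methods...
class Mesh {
public:
    bool Setup(const char* resourceName);   // load data, create GPU buffers
    void Discard();                         // release everything
    void Draw();                            // issue the draw call
private:
    // ...while the implementation is free to use packed structs, raw
    // buffers and platform-specific APIs without the caller noticing
    struct PackedVertex { float pos[3]; float uv[2]; };
    void* vertexBuffer;
    int numVertices;
};
```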
Ok, so before I drift completely into the metaphysical, here's a simple check-list:


Forbidden C++:

This stuff is completely forbidden in our coding-style:
  • exceptions
  • RTTI
  • STL
  • multiple inheritance
  • iostream
  • C++11
That's right, we're not using C++ exceptions, RTTI, multiple inheritance or the STL. C++11 is pretty cool, but still too fresh. Most of these restrictions will make your multiplatform life a lot easier (and not much of importance is lost, IMHO).

Update: I should have explained why the STL and C++11 are on this list. First, the STL: historically the STL came with a lot of problems because quality differed a lot between compilers, porting to non-PC platforms was difficult if your code depended on the STL, and I am reluctant to pull more complex dependencies (like boost, for example) into the engine. Today's STL implementations are much better, so on most platforms this is probably no longer an issue.

Personally, I think the STL is an ugly library, *at least* the container classes. You have to admire its orthogonality and flexibility, but in reality a single project only ever needs 3 or 4 specialisations. What we did was write a handful of container classes (Array, Dictionary, Queue, Stack, List) in the spirit of C#'s container classes (they are probably not as flexible as STL containers, but they look nicer, and the generated code should be the same in most cases). Beautiful-looking source code is important, I think. This may all change with C++11 though. C++11 is extremely cool, but I think it is still too early to jump on it if we need to cover a lot of platforms. But C++11 together with the STL is much more powerful than either of them alone, so I will very likely revert my stance on the STL once we switch to C++11.
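To give a rough idea what I mean, here's a stripped-down sketch of such a C#-style container (illustrative only, not the actual Nebula3 Array class, and with copying left out):

```cpp
#include <cassert>  // stand-in; the engine would use its own assert which also fires in release builds

// a minimal C#-style dynamic array: Append(), Size(), bounds-checked operator[]
template<class TYPE> class Array {
public:
    Array() : elements(0), numElements(0), capacity(0) { }
    ~Array() { delete[] this->elements; }

    void Append(const TYPE& elm) {
        if (this->numElements == this->capacity) {
            this->Grow();
        }
        this->elements[this->numElements++] = elm;
    }
    int Size() const { return this->numElements; }
    TYPE& operator[](int index) {
        assert((index >= 0) && (index < this->numElements));   // boundary check
        return this->elements[index];
    }
private:
    Array(const Array& rhs);            // copying left out of the sketch
    void operator=(const Array& rhs);
    void Grow() {
        int newCapacity = (this->capacity > 0) ? (this->capacity * 2) : 16;
        TYPE* newElements = new TYPE[newCapacity];
        for (int i = 0; i < this->numElements; i++) {
            newElements[i] = this->elements[i];
        }
        delete[] this->elements;
        this->elements = newElements;
        this->capacity = newCapacity;
    }
    TYPE* elements;
    int numElements;
    int capacity;
};
```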

But I think this switch should be done throughout the entire engine (starting at the core with the new move semantics, which are really useful for containers, up to the new threading support, lambdas, function objects and so on), so switching to C++11 will involve a major rewrite of Nebula3, maybe even justify a major version number switch. I don't think it makes sense to sprinkle bits and pieces of C++11 and STL here and there into the code.


Tolerated C++:

Use with care, don't go crazy:
  • templates
  • operator overloading
  • new/delete
  • virtual methods
Templates are very powerful: they can make your code both more readable AND faster, because more type information is known at compile time. But you really need to keep an eye on the generated code size. Don't nest them too deeply, and keep it simple.
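As a (hypothetical) example, this is the kind of simple, shallow template I mean:

```cpp
// a simple, shallow template: the type is known at compile time, so the
// compiler can inline and optimise, and the call site stays readable
template<class TYPE>
const TYPE& n_max(const TYPE& a, const TYPE& b) {
    return (a > b) ? a : b;
}

// works for any comparable type, no virtual call, no code-size explosion
int   largerInt   = n_max(3, 7);
float largerFloat = n_max(1.5f, 0.5f);
```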
Operator overloading is restricted to very few places (containers and items in containers). We're NOT using operator overloading in our math library: dot(vec,vec) is much more readable than vec*vec.
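A small sketch of what that looks like in practice (float4 and dot here are illustrative stand-ins, not the actual math library):

```cpp
// a free dot() function instead of an overloaded operator*, so the
// intent is obvious at the call site
struct float4 {
    float x, y, z, w;
};

inline float dot(const float4& a, const float4& b) {
    return a.x*b.x + a.y*b.y + a.z*b.z + a.w*b.w;
}

// float d = dot(v0, v1);   // clearly a dot product
// float d = v0 * v1;       // dot product? component-wise multiply? who knows
```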
Not using new/delete in C++ code sounds a bit crazy, I know. But most of the time when you need to create an object on the heap, you'll also want to hand its pointer somewhere else, which quickly introduces ownership problems. That's why we're using smart pointers to heap objects which hide the delete call. And since a new without its delete looks a bit silly, we're also hiding the new behind a static Create() method. It's better to avoid heap objects altogether though, especially in low-level code.
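Here's a minimal sketch of the pattern; RefCounted, Ptr and Create() are stripped-down stand-ins for the real thing:

```cpp
// minimal ref-counted base class
class RefCounted {
public:
    RefCounted() : refCount(0) { }
    void AddRef() { ++this->refCount; }
    void Release() { if (0 == --this->refCount) { delete this; } }
protected:
    virtual ~RefCounted() { }   // only Release() is allowed to delete
private:
    int refCount;
};

// minimal smart pointer which calls AddRef()/Release() automatically
template<class TYPE> class Ptr {
public:
    Ptr() : ptr(0) { }
    Ptr(TYPE* p) : ptr(p) { if (this->ptr) this->ptr->AddRef(); }
    Ptr(const Ptr& rhs) : ptr(rhs.ptr) { if (this->ptr) this->ptr->AddRef(); }
    ~Ptr() { if (this->ptr) this->ptr->Release(); }
    void operator=(const Ptr& rhs) {
        if (rhs.ptr) rhs.ptr->AddRef();     // AddRef first also handles self-assignment
        if (this->ptr) this->ptr->Release();
        this->ptr = rhs.ptr;
    }
    TYPE* operator->() const { return this->ptr; }
private:
    TYPE* ptr;
};

// a heap object: created through Create(), never new'd or deleted directly
class MyObject : public RefCounted {
public:
    static MyObject* Create() { return new MyObject(); }
    void DoStuff() { }
};

void UseIt() {
    Ptr<MyObject> obj = MyObject::Create();   // no naked new at the call site
    obj->DoStuff();
}   // obj goes out of scope, refcount hits 0, delete happens inside Release()
```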
Virtual methods are important of course, BUT: Just spend a second to think about whether a method really must be virtual (or more importantly: do you really need run-time polymorphism, or is compile-time polymorphism enough?). The more "static" your code is, the more optimisation options the compiler has.
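As a (hypothetical) illustration of the difference between run-time and compile-time polymorphism:

```cpp
// run-time polymorphism: every Update() goes through the vtable
class Updatable {
public:
    virtual ~Updatable() { }
    virtual void Update(float dt) = 0;
};

// compile-time polymorphism: the behaviour is baked in as a template
// parameter, so calls can be inlined and there's no vtable at all
template<class UPDATE_POLICY> class Entity {
public:
    void Update(float dt) { this->policy.Update(dt); }
private:
    UPDATE_POLICY policy;
};

struct StaticUpdate  { void Update(float /*dt*/) { /* nothing to do */ } };
struct PhysicsUpdate { void Update(float /*dt*/) { /* integrate motion, etc. */ } };

// the compiler can see through Entity<StaticUpdate>::Update() completely
Entity<StaticUpdate>  prop;
Entity<PhysicsUpdate> crate;
```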


Forbidden C:

Some unusual stuff here as well:
  • all CRT functions like fopen() or strcmp() are forbidden, except the math.h functions
  • directly calling malloc()/free() is forbidden
Most of the CRT functions are downright terrible (strpbrk, strtok, ...) and/or dangerous (strcpy), so we're wrapping them all away and/or using better platform-specific functions under the hood (this can also reduce executable size, which is always good).
Routing malloc/free through central wrapper functions is really useful once you need to do memory debugging and profiling, and it also makes it easier to try out different memory-allocator libs.
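A minimal sketch of what such wrappers can look like (names and the statistic are made up for illustration; a real version would be thread-safe and track a lot more):

```cpp
#include <cstdlib>

// every allocation funnels through one place, which makes it cheap to add
// leak tracking, per-allocation statistics, or a different allocator lib
namespace Memory {

static int allocCount = 0;  // simplistic stat (not thread-safe), just to show the idea

inline void* Alloc(std::size_t numBytes) {
    ++allocCount;
    return std::malloc(numBytes);   // swap in dlmalloc, jemalloc, ... here
}

inline void Free(void* ptr) {
    --allocCount;
    std::free(ptr);
}

} // namespace Memory
```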


Tolerated C:

Some "dangerous" stuff is only allowed in performance-critical low-level code:
  • raw pointers and pointer arithmetic
  • raw C arrays
  • raw memory buffers
These are all recipes for disaster in the hands of an inexperienced programmer (or an experienced programmer who needs to juggle too many things in his head). Instead of pointers, use smart pointers to refcounted objects (see above), or indices into containers. Instead of raw arrays, use containers. Never directly allocate and access memory buffers in high-level code.
All of these "dangerous techniques" are essential for really performance-critical low-level code, but that's only a handful of places in the code, and when the really mysterious kind of crashes happen, at least you know where to look.
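As a small (hypothetical) example of the "indices instead of pointers" idea mentioned above:

```cpp
#include <cassert>

// a mesh pool: the only place that owns and touches the storage
struct Mesh { /* vertex/index buffers, etc. */ };

class MeshPool {
public:
    MeshPool() : numMeshes(0) { }
    int Add(const Mesh& mesh) {             // hands out an index, not a pointer
        assert(this->numMeshes < MaxMeshes);
        this->meshes[this->numMeshes] = mesh;
        return this->numMeshes++;
    }
    Mesh& Get(int index) {                  // bounds-checked access
        assert((index >= 0) && (index < this->numMeshes));
        return this->meshes[index];
    }
private:
    static const int MaxMeshes = 1024;
    Mesh meshes[MaxMeshes];
    int numMeshes;
};

// high-level code refers to the mesh by index; nothing can dangle if the
// pool reorganises its storage internally
struct GameObject {
    int meshIndex;      // index into the MeshPool, -1 if none
};
```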


The End

One last point: our code is riddled with asserts which are also enabled in release mode (this hardly makes a performance difference, but the uncompressed executable is up to 20% larger because of the expression strings; thankfully those strings compress very well).
The essential, must-have assert checks are for invalid smart pointer accesses (null pointers), boundary checks in container classes and checking for valid method parameters.
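For illustration, a minimal sketch of such a release-mode assert (not the actual Nebula3 macro):

```cpp
#include <cstdio>
#include <cstdlib>

// unlike the CRT assert(), this stays active in release builds; the
// expression string and source location end up in the executable, which
// is where the size increase mentioned above comes from
#define n_assert(exp) \
    do { \
        if (!(exp)) { \
            std::printf("*** assertion failed: %s\n  file: %s\n  line: %d\n", \
                        #exp, __FILE__, __LINE__); \
            std::abort(); \
        } \
    } while (0)

// usage, e.g. inside a container's operator[]:
//   n_assert((index >= 0) && (index < this->numElements));
```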
With all of the above, we're rarely ever hitting a seg-fault (maybe twice a year on the server-side). If something breaks, then it is very likely an assertion check which got hit, and this is usually very easy to post-mortem-debug since it comes with a call-stack and method signature.