You have probably noticed that video games have an issue with threads. This article explains why games are stuck with two cores. The overwhelming majority of video games and real-time 3D applications do not scale well beyond 2 execution cores. Some benchmarks, such as ray tracers, easily scale even beyond 32 cores; games, however, do not. There are some exceptions, like the newest Tomb Raider, but they are very rare. This article will try to explain this phenomenon quickly.
How a typical 3D game engine works
In a typical 3D game, you will have a collision engine, an animation engine, and the code for object and texture management. Besides this, you will have the game engine itself, which controls the game. In a 3D car racing game, this code includes an AI to control your opponents, detects when you damage your car, and so on. In an RTS game, you will have a lot of little units, each with its own little AI, plus an AI that controls the actions of the enemy group. You may think this is no reason for games to be stuck with two cores, as you can just throw everything onto threads.
Let's throw everything onto threads!
OK, so you have decided to move your AI to a different thread, while the rest of your engine takes care of the 3D and content management. The problem is that now you can't call the functions that modify the location of an object, as the objects are currently being processed on the first core - changing them in the middle of rendering would cause glitches or crashes. What do you do then? Well, you don't use any engine-level commands to manage your units directly, but then you need an additional pass after the rendering has finished to execute every action you computed. In a car game, your AI is very simple, and you don't have to care about this. In an FPS, your enemies can think for maybe half a second before you issue an action for them. However, you will not be able to keep your threads constantly busy with AI, as most of the time the enemy will just sit around or walk about, or you simply don't have enough enemies in range to fill your threads with AI computation.
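The "additional pass" idea can be sketched roughly like this. It is only a minimal illustration, not how any particular engine does it: `MoveCommand`, `ai_worker`, `render` and `run_frame` are made-up names, and the "rendering" is just a snapshot of the positions. The AI thread never touches the scene; it only queues up what it wants to do, and the main thread applies the queued actions once rendering is done.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class MoveCommand:
    # Hypothetical action record: "move this unit by (dx, dy)".
    unit_id: int
    dx: float
    dy: float


def ai_worker(unit_ids, commands):
    # Stand-in AI: decides one move per unit and queues it.
    # It never writes to the shared positions itself.
    for uid in unit_ids:
        commands.put(MoveCommand(uid, dx=1.0, dy=0.0))


def render(positions):
    # Stand-in for rendering: it only reads the positions.
    return dict(positions)


def run_frame(positions, unit_ids):
    commands = queue.Queue()
    ai = threading.Thread(target=ai_worker, args=(unit_ids, commands))
    ai.start()                   # AI thinks while the frame is rendered
    frame = render(positions)
    ai.join()
    # Additional pass: apply every queued action after rendering.
    while not commands.empty():
        cmd = commands.get()
        x, y = positions[cmd.unit_id]
        positions[cmd.unit_id] = (x + cmd.dx, y + cmd.dy)
    return frame
```

The price is visible in the structure: the actions only take effect after the frame they were computed in, and the apply pass itself runs on a single thread.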
OK, so let's throw the physics and collision onto a different thread!
Well, that's problematic. If your physics and animation engine is very advanced, you will have to recalculate the collision boxes of the models. Maybe you need to recalculate them just to be able to frustum cull the non-visible objects from your scene. While the physics is being calculated, you can't do anything else - you can't modify the positions of your objects from your game engine, and you can't start rendering the objects, as they are being recalculated. Despite the common belief, you can't just run your physics and collision engine on a separate thread.
OK, so let's split everything into multiple threads
Yeah, that works. Up to a point. You can calculate one part of your animated objects on one thread and another part on another thread. In the AI, you can process some of your units on one thread, other units on other threads, and so on. This sounds great, right? You may think this will speed things up, but the problem is not yet solved.
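As a rough sketch of this kind of split, assuming a made-up per-unit AI step (`think`) and units represented as simple `(position, target)` pairs, the unit list can be cut into one slice per thread. Note that in standard CPython the global interpreter lock keeps threads from running Python code in parallel, so this shows the structure only, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def think(unit):
    # Hypothetical per-unit AI step: move one step toward the target.
    x, target = unit
    step = 1 if target > x else -1 if target < x else 0
    return (x + step, target)


def update_units(units, thread_count):
    # Static split: thread k handles units k, k + thread_count, ...
    slices = [units[k::thread_count] for k in range(thread_count)]
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        updated = list(pool.map(lambda part: [think(u) for u in part], slices))
    # Stitch the slices back together in the original order.
    result = [None] * len(units)
    for k, part in enumerate(updated):
        result[k::thread_count] = part
    return result
```

The split is decided before any work starts, so it only balances well if every unit costs about the same.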
Let's measure a bit
Let's say that in your code the 3D rendering itself takes 30% of the time, the AI takes 20%, the animation takes 10%, and the collision engine takes 20%. There will be non-threadable parts in your code that take at least 20% of the time. If an unthreadable part of the code makes up 20% of the total work, it already limits your speedup to less than 5 times, even if you use all the CPU cores in the universe.
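This limit is what is usually called Amdahl's law. A small sketch makes the numbers concrete; it assumes, optimistically, that the remaining 80% of the work splits perfectly across the cores, and the function name is made up for the example.

```python
def amdahl_speedup(serial_fraction, cores):
    # The serial part always takes its full time; only the rest
    # is divided among the cores (assumed to split perfectly).
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


if __name__ == "__main__":
    for cores in (1, 2, 4, 8, 32):
        print(cores, round(amdahl_speedup(0.2, cores), 2))
    # 1 1.0
    # 2 1.67
    # 4 2.5
    # 8 3.33
    # 32 4.44
```

Even in this ideal case, going from 2 to 4 cores only moves the speedup from about 1.67 to 2.5, and 32 cores still do not reach 5.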
And the other parts?
Starting threads takes some time, and you can't spread the load evenly. Maybe in your animation code you have 7 or 8 objects with a high polygon count, while the others are irrelevant or not animated - in this case, some of your cores will process multiple heavy objects and some will get less work. As a result, one thread will run longer than the others, and you have to wait for all of them to finish before you can continue.
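The effect of uneven work can be sketched with a toy model. The costs below are invented for illustration (7 heavy objects at 10 time units each, 20 light ones at 0.1 each), and the greedy assignment is just one simple way to hand out work, not what a real engine necessarily does. The frame can only continue when the busiest core is done.

```python
def frame_time(task_costs, cores):
    # Greedy assignment: give each task, biggest first, to the core
    # with the least work so far. The frame takes as long as the
    # most loaded core.
    loads = [0.0] * cores
    for cost in sorted(task_costs, reverse=True):
        least_loaded = loads.index(min(loads))
        loads[least_loaded] += cost
    return max(loads)


# Invented workload: 7 heavy animated objects, 20 trivial ones.
costs = [10.0] * 7 + [0.1] * 20

if __name__ == "__main__":
    for cores in (1, 2, 4, 6, 8):
        print(cores, round(frame_time(costs, cores), 1))
```

With these numbers, 4 cores and 6 cores both finish in 20 time units, because in either case some core ends up with two heavy objects. The two extra cores buy nothing.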
The outcome
You created plenty of threads while rendering, but it was very rare for many cores to be kept busy for long. You were not really able to fill even 2-3 cores with tasks. The second core gives you a notable boost, but the third core barely gives you any. You would have to give the engine a lot more work to scale at this point, but then your FPS would already be below playable frame rates.
How to scale above 2 cores?
Writing a real-time 3D engine that scales above 2 cores is extremely hard. You can't use any of the traditional threading ideas explained above. You will have to write code that relies heavily on locks and semaphores. Everything has to work in a monolithic, watchdog-like manner, and you need a method to dynamically distribute all of the work across threads. Code like this is a mess, and it's hard to make it stable. With such methods, the code will be more efficient and will be able to scale up to 4-6 threads. This isn't a real solution though: due to the large number of semaphores, non-standard code, watchdogs, and cache synchronization, such code will be slower on lower core counts.
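A very reduced sketch of dynamic work sharing, assuming the work can be expressed as independent callables: instead of deciding up front which thread gets which task, every worker pulls the next task from a shared queue whenever it is free, and a lock guards the shared result list. The names are made up, a real engine would need far more than this (dependencies between tasks, per-frame synchronization), and in standard CPython the global interpreter lock prevents real parallel speedup, so treat it as a picture of the structure only.

```python
import queue
import threading


def run_tasks(tasks, worker_count):
    work = queue.Queue()
    for task in tasks:
        work.put(task)

    results = []
    results_lock = threading.Lock()   # guards the shared results list

    def worker():
        while True:
            try:
                task = work.get_nowait()   # grab the next free task
            except queue.Empty:
                return                     # nothing left, worker exits
            value = task()
            with results_lock:
                results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Every task now pays for a queue access and a lock, which is the overhead mentioned above: it helps when there are many cores to feed, and it is pure cost when there are only one or two.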