What's in an (Graphics) API?
In the beginning, there was software rendering
Ever since the late 1970s and early 1980s, we have had both video game consoles and personal computers. One was obviously geared towards entertainment; the other, not so much. As such, from the very beginning, video game consoles had built-in - or, you could even say, hardware-accelerated - support for things like sprites, scrolling, tiles, and whatnot... Personal computers, on the other hand, did not, or only to a much more limited degree.
// How to draw an image into a bitmap framebuffer.
// We're assuming the 8-bit 320x200 VGA mode (mode 13h, framebuffer at 0xA0000),
// and ignoring things like bounds-checking for the sake of simplicity.
#include <stdint.h>

uint8_t* SCREENMAP = (uint8_t*)0xA0000;
const uint8_t* image = /* your 32x32, 8-bit image data goes here */;
const int SCREENWIDTH = 320;
const int SCREENHEIGHT = 200;
const int IMGWIDTH = 32;
const int IMGHEIGHT = 32;
const int IMG_OFFSET_X = 100;
const int IMG_OFFSET_Y = 150;

for(int y = 0; y < IMGHEIGHT; ++y) {
    uint8_t * const screen_scanline = &SCREENMAP[(IMG_OFFSET_Y + y) * SCREENWIDTH];
    const uint8_t * const image_scanline = &image[y * IMGWIDTH];
    for(int x = 0; x < IMGWIDTH; ++x) {
        screen_scanline[IMG_OFFSET_X + x] = image_scanline[x];
    }
}
Personal computers, in the very beginning, only had text mode (rendering characters as minimal graphical elements), but later they added bitmap modes, which allowed for direct manipulation of the pixels displayed on the screen. However, with some notable exceptions - such as the C64 and the MSX - personal computers did not provide any direct hardware support for things like sprites and scrolling. In fact, the IBM PC - which is what I will be focusing on in this article - never did.
But just because the hardware doesn't support a feature doesn't mean you couldn't emulate it in software. Yes, it was slower, but it was at least possible.
But of course, PCs evolved, their CPUs became faster, and it was only a matter of time before developers realized that they could use software rendering to do polygonal 3D graphics as well:
On PCs, there was only software rendering until 1996 or so; then software rendering coexisted with hardware-accelerated rendering until around 2002. To see how software rendering can be done, check out my series on software rendering.
In theory, software rendering lets you do whatever you want: you have literally full control over the whole pipeline - over how vertices are transformed, what format vertices have, how triangles are rasterized, and so on - with the only limitations being your own imagination... and the RAM... and the CPU cycles.
And the last one was the biggest issue: speed. You see, 3D graphics is a computationally expensive task that involves lots of (matrix) multiplications and various other repeated operations that can - and should - be done in parallel. However, the x86 CPU - and general-purpose CPUs in general - is only good at processing instructions and doing math one operation at a time, and tends to scale poorly. In particular, the Pentium 1 struggled to produce acceptable framerates at 640x480... So what was the solution?
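Just to make the workload concrete, here is a little sketch of mine (purely illustrative, not any particular engine's code) of what the CPU had to chew through for every single vertex, every single frame:
// A rough sketch of per-vertex work in a software transform pipeline.
// The names and layout are my own, purely for illustration.
typedef struct { float m[4][4]; } Mat4;
typedef struct { float x, y, z, w; } Vec4;

Vec4 transform(const Mat4* m, Vec4 v) {
    Vec4 r;
    r.x = m->m[0][0]*v.x + m->m[0][1]*v.y + m->m[0][2]*v.z + m->m[0][3]*v.w;
    r.y = m->m[1][0]*v.x + m->m[1][1]*v.y + m->m[1][2]*v.z + m->m[1][3]*v.w;
    r.z = m->m[2][0]*v.x + m->m[2][1]*v.y + m->m[2][2]*v.z + m->m[2][3]*v.w;
    r.w = m->m[3][0]*v.x + m->m[3][1]*v.y + m->m[3][2]*v.z + m->m[3][3]*v.w;
    return r;
}

// 16 multiplications and 12 additions per vertex, tens of thousands of
// vertices per frame, 30+ frames per second -- all on a single in-order
// Pentium core, before we even get to rasterizing a single pixel.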
Well... they "solved" the problem by creating a new type of coprocessor that the task of rasterizing triangles could be offloaded to. As both Geri and I have already pointed out, in hindsight, this was a big mistake, and it continues to haunt us to this very day.
The Rise and Fall of the Fixed-Function Pipeline
GPUs - or Graphics Processing Units - aren't necessarily a new thing. You could argue that early video game consoles - such as the NES - had GPUs too, because they had hardware-accelerated rendering of tiles, sprites, scrolling, etc. instead of just letting you draw pixels directly onto the screen.
The term "Graphics Processing Unit" also appeared in the technical documentation of the PlayStation 1 in 1994, which was the second console to support hardware-accelerated 3D graphics (the 3DO beat it to the punch by a single year).
The venerable PS1 had a GPU that could render between 90,000 and 360,000 polygons per second (divide that by the number of frames you want per second to get the maximum recommended number of polygons per frame), and also a math coprocessor called the GTE (Geometry Transformation Engine) that could do the hard matrix/vector math for transforming vertices and computing lighting.
It was a truly fixed-function pipeline, supporting only four rendering modes and, correspondingly, four vertex types (sketched in code after this list):
Flat-shaded, where colour was per-polygon rather than per-vertex, and thus each vertex consisted only of a 3D position.
Gouraud-shaded, where each vertex contained a 3D position and a colour that was interpolated for per-vertex colouring/lighting.
Texture-mapped, where each vertex contained a 3D position and a 2D texture coordinate, which was interpolated for texture-mapped polygons. The PlayStation 1 supported only affine texture-mapping without perspective-correction, resulting in warped and wobbly textures.
Texture-mapped & Gouraud-shaded, a combination of the previous two, with each vertex having a 3D position, a 2D texture coordinate and a vertex colour.
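In C-like terms, the four vertex layouts would look roughly like this (my own illustration of the data involved, not the actual PS1 GPU packet format):
#include <stdint.h>

// Illustrative only -- not the real PS1 command/packet layout.
struct FlatVertex            { int16_t x, y, z; };                      /* colour stored per-polygon */
struct GouraudVertex         { int16_t x, y, z; uint8_t r, g, b; };
struct TexturedVertex        { int16_t x, y, z; uint8_t u, v; };
struct TexturedGouraudVertex { int16_t x, y, z; uint8_t u, v, r, g, b; };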
This was eerily similar to the behavior of OpenGL 1.0, but we're getting ahead of ourselves....
And this was just the first of the many limitations of the fixed-function pipeline I detest so much: the fact that it was either the GPU's way or the highway (software rendering). Yes, I said it: the fixed-function pipeline was a huge step back, a huge downgrade from the infinite flexibility of software renderers. Sure, it was much faster (with some exceptions), but at what cost? At the cost of not just flexibility, but also avenues for optimization, putting game developers completely at the mercy of hardware vendors and whoever wrote the drivers for the GPUs.
For instance, fixed-function hardware at the time couldn't do voxels, which is why games like Delta Force and Outcast relied on software rendering, even as late as 1999.
Anyway, in the aforementioned OpenGL, you basically render a triangle kinda like this:
glBegin(GL_TRIANGLES);
    glVertex3f( 0.0f, 1.0f, 0.0f);
    glVertex3f(-1.0f,-1.0f, 0.0f);
    glVertex3f( 1.0f,-1.0f, 0.0f);
glEnd();
Or, alternatively, in modern C++:
#include <span>  // C++20

struct Vertex {
    float x, y, z;
    float r, g, b;
};

// std::span is cheap to copy, so we pass it by value rather than by reference.
void renderVertices(std::span<const Vertex> vertices) {
    glBegin(GL_TRIANGLES);
    for(const auto& vertex : vertices) {
        glColor3f(vertex.r, vertex.g, vertex.b);
        glVertex3f(vertex.x, vertex.y, vertex.z);
    }
    glEnd();
}
Or, in older, more C-style code:
void renderVertices(const Vertex* vertices, size_t vcount) {
    glBegin(GL_TRIANGLES);
    for(size_t i = 0; i < vcount; ++i) {
        glColor3f(vertices[i].r, vertices[i].g, vertices[i].b);
        glVertex3f(vertices[i].x, vertices[i].y, vertices[i].z);
    }
    glEnd();
}
Now, there are problems with this piece of code. The most salient one is that we are required to store all the vertices in RAM and then constantly re-send them to the GPU every frame, which is taxing on the CPU, the GPU and the memory bus alike. This is known as "immediate-mode rendering", and is contrasted with so-called "retained-mode rendering", which I will talk more about later.
This is fine if your scene consists of just a couple thousand polygons, but obviously, as time progresses and video game graphics start becoming more sophisticated, this will eventually become unsustainable and a source of problems.
Another issue is one I already mentioned: only a finite number of vertex formats are acceptable, and they were particularly poorly documented in OpenGL 1.0. Yet, this is how they did it. It was either this, or software rendering.
Direct3D was much the same - albeit somewhat lower-level, with a more involved initialization process than OpenGL - except that it also supported retained-mode rendering: keeping the graphics-to-be-rendered in buffers that would be reused from frame to frame (which had the problem of making it practically impossible to animate anything without downloading and re-uploading the data, defeating the whole purpose).
Early 3D accelerators on the PC were a mixed bag. First came the infamous S3 ViRGE, which has been mocked endlessly and dubbed a 3D "decelerator", because its performance was inferior to software rendering in many cases. Arguably, the 3dfx Voodoo provided much better performance, but...
Well, the Voodoo at first only supported its own proprietary API (Glide), with support for the Direct3D and OpenGL APIs coming later. Also, it basically only accepted one type of vertex, containing a screen-space coordinate, a texture coordinate and an RGBA colour. Yes, that was it; it was that limited. Other than that, the programmer had a limited amount of control over how alpha-blending was done, as well as how the textures were blended with the vertex colours.
But of course, graphics kept evolving, and soon, GPUs for the PC started to support using two or more textures at the same time, which also came with controls over how to blend them together.
By modern standards, this is rather primitive, rudimentary and crude, but by the standards of the time, it was obviously state-of-the-art flexibility. More specifically, Nvidia implemented a feature called "texture environments" on the Riva TNT (1998), which was essentially a set of blending settings for each texture.
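In OpenGL terms, this surfaced as the texture environment ("combine") settings. A hedged sketch of modulating a base texture with a lightmap across two texture units might look something like this (the texture handles are made up):
// Unit 0: the base/diffuse texture, modulated with the vertex colour.
glActiveTexture(GL_TEXTURE0);
glEnable(GL_TEXTURE_2D);
glBindTexture(GL_TEXTURE_2D, baseTexture);       // hypothetical handle
glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);

// Unit 1: multiply the result of unit 0 by a lightmap.
glActiveTexture(GL_TEXTURE1);
glEnable(GL_TEXTURE_2D);
glBindTexture(GL_TEXTURE_2D, lightmapTexture);   // hypothetical handle
glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_COMBINE);
glTexEnvi(GL_TEXTURE_ENV, GL_COMBINE_RGB, GL_MODULATE);
glTexEnvi(GL_TEXTURE_ENV, GL_SRC0_RGB, GL_PREVIOUS);  // output of unit 0
glTexEnvi(GL_TEXTURE_ENV, GL_SRC1_RGB, GL_TEXTURE);   // this unit's own texture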
GPUs were evolving, and APIs were evolving with them in order to accommodate the increased amount of features and functionality. However, in the age of fixed-function pipelines, this meant that APIs became convoluted, Rube-Goldberg-like state machines that had to pile layers of complexity on top of layers of complexity to accommodate the new features. The pinnacle of this over-complication of the fixed-function pipeline was the so-called "register combiners" of the NVIDIA GeForce 256 (1999), something I still don't quite understand.
Fragment/pixel processing, colour blending modes and textures weren't the only places where GPUs were evolving, though: initially, GPUs only accepted screen-space or clip-space coordinates in vertices, which meant that vertices had to be transformed in software. While the PS1 had its GTE to hardware-accelerate these transformations, on PC GPUs this happened on the CPU until the release of the aforementioned GeForce 256, which supported hardware-accelerated "transform & lighting". ATI added support for the same feature, while 3dfx and S3 fell behind and did not support it, emulating it in software instead. Not everyone used hardware T&L though, because it was inflexible, and programmers liked having control over how their vertices were transformed, which meant that many continued to do the vertex transformations in software anyway.
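For reference, this is roughly how you fed the fixed-function transform & lighting stage in OpenGL; whether the math then ran on the GPU (GeForce 256 and later) or in the driver on the CPU was entirely out of your hands (the matrices and light values here are placeholders):
// Fixed-function T&L: you hand over matrices and lights,
// and the implementation decides where and how the math happens.
glMatrixMode(GL_PROJECTION);
glLoadMatrixf(projectionMatrix);   // placeholder 4x4 matrix, column-major
glMatrixMode(GL_MODELVIEW);
glLoadMatrixf(modelViewMatrix);    // placeholder 4x4 matrix

glEnable(GL_LIGHTING);
glEnable(GL_LIGHT0);
glLightfv(GL_LIGHT0, GL_POSITION, lightPosition);  // placeholder vec4
glLightfv(GL_LIGHT0, GL_DIFFUSE,  lightColour);    // placeholder vec4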
A point of contention between me and my friend Geri is that he still favours the old fixed-function pipeline and immediate mode, while I personally prefer to pretend that it doesn't even exist, and hope to return to pretending that once I am done writing this article. You see, this whole fixed-function pipeline was bleeding from multiple wounds:
If you wanted to do animations - or have fine-tuned control over vertex transformations - you had to upload every single vertex, every single frame, taxing the CPU, the GPU and the memory bus, throttling everything and constituting a major bottleneck. This was fine when we were dealing with only a couple thousand vertices per frame, but with gamers demanding more sophisticated graphics with more triangles, this was simply unsustainable in the long run.
It was initially very inflexible, and while it was evolving to be more flexible, this came at the cost of more and more needless complexity, turning the fixed-function pipeline into a Rube-Goldberg machine.
Poor standardization, with different GPUs supporting different features, and developers having to hack together stuff to make sh*t work on every popular GPU.
Most of the time, programmers were basically just hacking things together to produce good results in spite of the flaws of the fixed-function pipeline.
A good example of the latter is the shader system of the Quake 3 engine, which basically transforms vertices in software (and allows developers to write a sort of script that controls this transformation) and renders models multiple times with different blending modes to produce interesting results. It was slow and costly, and a prime example of the programmer's struggle against the bull**** that was the fixed-function pipeline.
textures/liquids/lava
{
    deformVertexes wave sin 0 3 0 0.1
    tessSize 64
    {
        map textures/common/lava.tga
    }
}
The above script is simply interpreted by the Quake 3 engine, probably compiled into some kind of bytecode, which is executed during the rendering of every frame. Besides the fact that we're transforming vertices on the CPU and sometimes have to render the same model/mesh twice for certain special effects, the fact that we are forced to interpret some kind of "script" on the CPU for rendering 3D graphics adds extra unwanted overhead that no doubt throttles the CPU and is a source of slowdowns. (This gets solved later by vertex and fragment shaders, which get compiled into GPU machine code and run on the GPU.)
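Conceptually, the engine ends up doing something like this on the CPU every frame (a heavily simplified sketch of mine, not the actual Quake 3 source; I'm assuming a vertex type that carries a normal):
#include <math.h>
#include <stddef.h>

// Hypothetical vertex type for this sketch: position plus a normal.
typedef struct { float x, y, z; float nx, ny, nz; } WaveVertex;

// Push every vertex along its normal by a sine wave -- every frame, on the CPU.
void deformWaveSin(WaveVertex* verts, size_t count, float time,
                   float base, float amplitude, float phase, float frequency)
{
    for (size_t i = 0; i < count; ++i) {
        // The real engine also offsets the phase by the vertex position so the
        // surface ripples instead of bobbing uniformly; we fake that crudely here.
        float offset = (verts[i].x + verts[i].y + verts[i].z) * 0.1f;
        float wave = base + amplitude * sinf(phase + offset + time * frequency);
        verts[i].x += verts[i].nx * wave;
        verts[i].y += verts[i].ny * wave;
        verts[i].z += verts[i].nz * wave;
    }
}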
The game seen above is beautiful, and runs nicely on modern GPUs, but back in 2000... it pretty much only ran on newly-released high-end PCs, and that had to do with all the hoops the developers jumped through to produce those beautiful cutting-edge graphical effects on the inherently inflexible fixed-function pipeline. It was a nightmare for programmers, and a prime example of smart developers hacking things to push bull**** to its limits.
Shaders, shaders, and more shaders
So, as I just said, the fixed-function pipeline had plenty of flaws, from CPU bottlenecks to inflexibility problems, to becoming too complex, etc. So what was the solution to these problems?
Well, it turns out, the solution had been staring us in the eye all along, because programmable pipelines are not a new thing at all: in 1986, Texas Instruments released the TMS34010, which was actually programmable. In fact, even the Nintendo 64's GPU allowed programmers to rewrite the microcode, which was functionally the same as having shaders. Hell, even the Rendition Vérité in 1995 was programmable, containing a RISC CPU that could run "microcode" (shaders), giving developers fine-tuned control over both vertex and fragment processing. So why didn't it catch on, while 3dfx's fixed-function pipeline did? The answer is simple: developers just weren't interested back then, the implementation was bad (fixed-function hardware was faster), and the puny 2-4 megabytes of VRAM was a difficult fit for textures, vertices AND the shader microcode.
However, by 2001, GPUs had enough VRAM and enough transistors and clock cycles to make the transition to the programmable pipeline and shaders. In the mid-to-late 90s, the fixed-function pipeline looked impressive enough for most people, but by the early 2000s, game devs wanted more, and got it.
Direct3D 8 started to recommend support for both vertex and pixel/fragment shaders in 2000 (support for pixel shaders was mandatory, support for vertex shaders was initially optional and became mandatory later), and OpenGL 2.0 followed suit in 2004... though, support for shaders was available earlier, albeit as extensions (NV_vertex_program and NV_texture_shader in 2001, ARB_vertex_program and ARB_fragment_program in 2002).
The game has changed: now, all the vertices would be stored in the VRAM, so you wouldn't have to re-upload them every single time you wanted to render them, and you would transform them using a vertex shader, like this:
#version 330 core
layout (location = 0) in vec3 aPos;
layout (location = 1) in vec3 aNormal;
layout (location = 2) in vec2 aTexCoords;

out vec2 TexCoords;

uniform mat4 model;
uniform mat4 view;
uniform mat4 projection;

void main()
{
    TexCoords = aTexCoords;
    gl_Position = projection * view * model * vec4(aPos, 1.0);
}
The GPU would run this vertex shader for you, transforming the vertices, and interpolating vertex attributes. Then, the interpolated vertex attributes would be used as parameters/arguments for the fragment/pixel shader:
#version 330 core
out vec4 FragColor;

in vec2 TexCoords;

uniform sampler2D texture_diffuse;

void main()
{
    FragColor = texture(texture_diffuse, TexCoords);
}
And that's it. No need to worry about throttling your CPU with vertex transformations or glVertex3f calls, or glEnd or glBegin. None of that. Instead, your new worry was about fitting all the textures and vertices in the VRAM.
To actually draw, now you just call:
glUseProgram(momsFavouriteShaderProgram);
glDrawArrays(GL_TRIANGLES, 0, 3);
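Those two calls assume that the vertex data has already been uploaded to VRAM once, ahead of time, instead of being re-sent every frame. A minimal sketch of that one-time setup (only the position attribute is shown; the handles and the `vertices` array are placeholders):
// One-time setup (done at load time, not per frame).
// Assumes `vertices` is a plain array of floats holding the vertex positions.
GLuint VAO = 0, VBO = 0;
glGenVertexArrays(1, &VAO);
glGenBuffers(1, &VBO);

glBindVertexArray(VAO);
glBindBuffer(GL_ARRAY_BUFFER, VBO);
glBufferData(GL_ARRAY_BUFFER, sizeof(vertices), vertices, GL_STATIC_DRAW);

// Describe the layout: location 0 = vec3 position, matching the vertex shader above.
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(float), (void*)0);
glEnableVertexAttribArray(0);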
The programmable pipeline would look kinda like this:
(Mandatory) Vertex Shader: This is the stage where the input vertices are transformed. This is also where skeletal animations take place, if you elect to use hard rigging (which you should, since soft rigging is slow).
(Optional) Tessellation Control & Tessellation Evaluation Shader (OpenGL) / Hull & Domain Shader (Direct3D): Bigger polygons are broken up into smaller ones, whose vertices can then be manipulated based on things like heightmaps. Only available after 2010. Useful for providing more detail to objects when looking at them up close, which can save VRAM.
(Optional) Geometry Shader: Takes a single primitive (triangle, line, point, etc.), and depending on certain conditions, outputs zero, one or multiple primitives. Only a thing since 2008, it's useful for dynamic particle systems (see the sketch after this list).
(Fixed-function) Rasterization: The vertex outputs produced by the vertex shader are clipped, and then interpolated for every pixel that will be actually drawn to. This stage remains fixed-function even to this day, and the programmer has no control over this.
(Mandatory) Fragment Shader (OpenGL) / Pixel Shader (Direct3D): The per-pixel interpolated vertex outputs are utilized - along with any optional textures - to paint the pixels on the screen. This is the part where we sample textures, do any alpha-blending, maybe even dither the alpha, etc.
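To make the geometry shader stage a bit more concrete, here is a minimal GLSL toy example of mine that expands each incoming point into a small screen-space quad - the sort of trick used for particle billboards:
#version 330 core
layout (points) in;
layout (triangle_strip, max_vertices = 4) out;

void main()
{
    // Expand the incoming point into a small quad (two triangles as a strip).
    vec4 p = gl_in[0].gl_Position;
    float s = 0.05; // half-size in clip space, chosen arbitrarily
    gl_Position = p + vec4(-s, -s, 0.0, 0.0); EmitVertex();
    gl_Position = p + vec4( s, -s, 0.0, 0.0); EmitVertex();
    gl_Position = p + vec4(-s,  s, 0.0, 0.0); EmitVertex();
    gl_Position = p + vec4( s,  s, 0.0, 0.0); EmitVertex();
    EndPrimitive();
}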
As I said, tessellation and geometry shaders came later; initially we only had vertex and fragment/pixel shaders.
Anyway, power comes at a price, and while the fixed-function pipeline was fairly simple to get into and learn, the programmable pipeline has quite a steep learning curve, which leads me to a bit of a segue...
You see, I told a simplified version of all this to a friend of mine called Roland, and he somehow came to the conclusion that the fixed-function pipeline was the result of intentionally dumbing things down to accommodate lazy programmers... Maybe I am just bad at explaining, but no, that's not how it happened. There was a time when the FFP was in fact an accurate reflection of how the hardware worked, with much of the functionality being directly burned into the silicon; programmability was only introduced gradually, until they had the revelation that all this new functionality needed a whole new clean-slate API.
And just to showcase how gradual all this evolution was... At first, programmable GPUs used separate compute units for executing vertex and fragment shaders, still relying on the exact same dedicated texture-mapping units as they did back in the fixed-function era (in fact, they still do). At first, one could not even access textures from within the vertex shader, and could only use a limited number of opcodes! It was only around 2006-2007 that the unified shader model became a thing, which mandated that GPUs have generic compute units on which vertex and fragment shaders can use the same functionality.
Still, along with shaders came the mainstreaming of multiple render targets and rendering to textures (not a new thing: the PlayStation 1 could do it too), floating-point textures, and so on. This opened the door for things like HDR and bloom, which were initially overused.
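Just to give you an idea of what "rendering into a floating-point texture" looks like in OpenGL terms, e.g. as the first step of an HDR + bloom pipeline (a sketch of mine; the resolution and handle names are made up):
// Create a floating-point colour texture and attach it to a framebuffer,
// so the scene can be rendered into it instead of directly to the screen.
GLuint hdrFBO, hdrColorTex;
glGenFramebuffers(1, &hdrFBO);
glGenTextures(1, &hdrColorTex);

glBindTexture(GL_TEXTURE_2D, hdrColorTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, 1280, 720, 0, GL_RGBA, GL_FLOAT, NULL);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

glBindFramebuffer(GL_FRAMEBUFFER, hdrFBO);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, hdrColorTex, 0);
// ...render the scene here, then sample hdrColorTex in a post-processing pass
// (tone-mapping, bloom blur, etc.).
glBindFramebuffer(GL_FRAMEBUFFER, 0);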
While video game graphics obviously still don't look photorealistic, and probably never will, the programmable pipeline made it oh so much easier to hide the fact that games are made up of 3D triangles, and made games so much more immersive.
Not to mention, the programmable pipeline also, paradoxically, eases the burden on the programmer: if one wants to produce graphics like those seen in the video, it's probably possible with the fixed-function pipeline, but only at the cost of extreme hacking and lots of multi-pass magic that is no doubt also slow as hell. The programmable pipeline, on the other hand, follows a trend of moving more and more work from the CPU to the GPU, continuing the earlier trend started by hardware T&L (I really hope my readers don't have the memory of a goldfish).
Hardware continued to evolve, not just gaining more VRAM, clock cycles and texture-mapping units, but also adding new features, like the aforementioned tessellation and geometry shaders. Software continued to evolve too, with batch rendering, tiled-forward rendering, "virtual textures", etc., all striving to squeeze more performance out of existing hardware while also trying to get closer to photorealism.
Up until 2016, there were really only two major APIs: Direct3D and OpenGL, both having existed since the mid-90s and having been in competition ever since. There were a few other ones as well, but all of them were hardware-specific proprietary APIs, like Glide, SGL, ATI CIF, etc., resulting in the hardware-agnostic D3D and GL winning out in the end. Direct3D is exclusive to Microsoft Windows and is maintained by Microsoft, while OpenGL is platform-independent, but allows for hardware/vendor-specific extensions, which can be queried at runtime. If you want, you can query which extensions are available and write separate code paths to leverage them, but most of the time, it's more trouble than it's worth.
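For what it's worth, here is roughly what querying extensions looks like in modern OpenGL (a small sketch; in pre-3.0 GL you would instead parse the single big string returned by glGetString(GL_EXTENSIONS)):
#include <string.h>

// Enumerate the extensions the driver exposes (OpenGL 3.0+ style).
int hasExtension(const char* name)
{
    GLint count = 0;
    glGetIntegerv(GL_NUM_EXTENSIONS, &count);
    for (GLint i = 0; i < count; ++i) {
        const char* ext = (const char*)glGetStringi(GL_EXTENSIONS, i);
        if (ext && strcmp(ext, name) == 0)
            return 1;
    }
    return 0;
}

// e.g. if (hasExtension("GL_ARB_bindless_texture")) { /* take the fancy path */ }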
Vulkanic Eruption
In 2015 and 2016 respectively, we saw the release of Direct3D 12 and Vulkan. We have already talked about Direct3D before in this article, so what's so special about the 12th installment? Well, the fact that it is completely different from all previous iterations of the API, having far more in common with Vulkan than with Direct3D 1-11.
Vulkan was created by Khronos, the same folks who took over OpenGL in 2006.
You see, there always comes a time when you just gotta start with a clean slate, because you have accumulated so much cruft. Imagine inheriting a 300+ year old house. It's a nice house, but it's literally three centuries old, and to have warm running water, electricity, access to the phone line and the world-wide web, you have to make so many renovations that demolishing the old house and building/buying a new one makes far more sense.
This is exactly what happened with OpenGL. With OpenGL 2.0 in 2004, OpenGL made the switch from the fixed-function pipeline to the programmable pipeline, but the fixed-function functionality continued to be supported for the sake of old legacy code and applications (hardware no longer supported the fixed-function pipeline, which was instead emulated using built-in shaders). Despite this transition, OpenGL retained the old, outdated state-machine-based model, even though this was no longer an accurate reflection of how hardware worked at the time.
With OpenGL 3.0 (planned around 2007, released in 2008), they intended to replace this old system with a new object system and remove all fixed-function functionality. Sadly, due to programmer outcry, they stuck with the same old state-machine-based API, and instead of removing the fixed-function stuff altogether, they merely marked it as "deprecated", and created a model in which programmers could request either a "Core" context (in which deprecated functionality is removed altogether) or a "Compatibility" context (in which deprecated functionality is simply emulated with built-in shaders). In OpenGL 4.x, after 2010, they did begin introducing new function calls that could bypass the nonsensical state-machine binding system, but by that time, it was too late. OpenGL had its API conventions, and they were out of date.
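In practice, the Core-versus-Compatibility choice happens when you create your context; with GLFW (which I'm using here purely for illustration) it's just a couple of window hints:
// Request an OpenGL 3.3 Core context: the deprecated fixed-function
// entry points are simply not there.
glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);

// ...or ask for a Compatibility profile and keep glBegin()/glEnd() alive:
// glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_COMPAT_PROFILE);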
In 2013, AMD created a new low-level API called "Mantle", which promised better performance, less CPU usage, better scalability and support for multithreading, all at the cost of manual memory management and far more verbosity (even rendering a single triangle takes on the order of a thousand lines of code). AMD later donated this API to Khronos, which reworked and re-branded it as Vulkan, released in 2016 and designated as the successor of OpenGL.
Compared to the OpenGL it intends to replace, Vulkan is a brand new API, and a lower-level one, which gives far more control to the programmer. While previously OpenGL (for PCs) and OpenGL ES (for smartphones and consoles) were two separate APIs, Vulkan provides a unified API for both types of platforms. And instead of one global state machine like GL, Vulkan is object-based, with no global state.
Just for comparison, here is how you render something in OpenGL or OpenGL ES with the programmable pipeline:
glUseProgram(momsFavouriteShaderProgram);
glBindTexture(GL_TEXTURE_2D, texture);
glBindVertexArray(VAO);
glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, 0);
In contrast, here is how you render something in Vulkan:
// (We're assuming the command buffer is already being recorded,
// and that vkCmdBeginRenderPass() has already been called.)
vkCmdBindPipeline(commandBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, graphicsPipeline);
vkCmdDraw(commandBuffer, 3, 1, 0, 0);
vkCmdEndRenderPass(commandBuffer);
While yes, both contain "bind" commands, in Vulkan we bind a named variable to another named variable, which is clean, whereas in OpenGL, we bind a named variable to some sort of global state machine, which is unclean and hard to wrap one's head around. Arguably, Direct3D addressed these complaints of mine much earlier than Vulkan, but still.
Clean, efficient, flexible, and a far more accurate representation of how GPUs work, Vulkan delivered on its promises, and it definitely does provide better performance and less CPU usage in the hands of a skilled programmer....
... in the hands of a skilled programmer. And that's an important addendum, because comparing GL to Vulkan is like comparing an autopilot to manual driving. Yes, manual is better in the hands of a good driver/pilot, but not so much in the hands of a rookie, who will make all the wrong choices and screw everything up.
What does the future hold?
Well, as I said in several previous articles of mine, the line between CPU and GPU is getting blurred. CPUs are being extended with GPU features, GPUs have become able to do general-purpose computing, etc., which suggests that it's not unreasonable to assume that one day the two will merge. Many have prophesied that GPUs will disappear, and instead we'll have hundred-core or thousand-core CPUs running software renderers so fast that we won't be able to tell the difference between that and genuine hardware acceleration.
As a matter of fact, more and more PCs simply lack dedicated GPUs altogether, instead having APUs - Accelerated Processing Units - combining CPU and GPU on a single die, a single chip. While not exactly the same as having a 128-core CPU masquerading as a GPU by doing software rendering, it's not exactly a far cry from that future either.
Even if what I said does not come to pass, GPUs have evolved to be so complex that it was quite a wise choice to create Vulkan and Direct3D 12 as lower-level abstractions of the hardware. This can make it easier for vendors to write drivers in the future: they can just ship an OpenGL-to-Vulkan wrapper... which really brings us back to the 1990s, when 3dfx's OpenGL ICD was just an OpenGL-to-Glide wrapper.
To summarize:
In the beginning, there was only software rendering, but by the late 1990s gamers wanted more, and CPUs - namely the Pentium 1 - were too slow to do software rendering at 640x480 with a large number of polygons.
Hardware acceleration came to the rescue, offering a cheaper alternative to just buying stronger CPUs.
But hardware acceleration mostly only offered inflexible fixed-function pipelines, so some developers stuck with software rendering (e.g. NovaLogic), and some people didn't bother to buy GPUs for a while, so software rendering and hardware acceleration coexisted for a while.
Eventually, GPUs made the transition to the programmable pipeline with shaders, and continued to evolve.
But the APIs - namely, OpenGL and Direct3D 1-11 - were lagging behind the actual evolution of the hardware, necessitating new APIs: Vulkan and Direct3D 12.
With GPUs being able to do general-purpose computing, one could literally write and execute a software-renderer on a GPU, so in a way, the line between GPU and CPU is getting blurred.
In the future, there will probably be no GPUs at all, only 1024-core CPUs that can act like GPUs when need be.