About a month ago, I wrote an article about why hardware acceleration was a mistake. Admittedly, it was a rather lousy article: short, just a three-minute read, arguing that increasing the number of PC components that go obsolete every five years was a terrible idea.
Ever since then, I have felt that I needed to elaborate on the subject and explore it in greater detail, explaining why the venerable PC's greatest strength is also its Achilles heel.
So, without further ado, let's get started.
Instruction Sets, Architectures and Clock Cycles
ISA, RISC vs CISC
What even is the x86? Well, it's a computer Instruction Set Architecture. Unless you're reading this article on an Android phone, an iPhone or a Macintosh, you are most likely reading it on an IBM PC compatible, which contains a CPU based on the x86 architecture.
An Instruction Set Architecture is basically a set of commands - or machine code "instructions" - that the CPU can natively understand and execute. In other words, it's the CPU's native language. The best way to explain it is with the Little Man Computer, where each instruction is made up of a single-digit opcode and a two-digit operand.
For the Little Man Computer, an example program - written in machine code - would be:
510 114 310 000
And in Assembly - which is basically a more human-readable form of machine code - it would be:
LDA 10 ; Load from memory address 10 into the accumulator
ADD 14 ; Add the number stored at memory address 14 to the accumulator
STA 10 ; Store the content of the accumulator at memory address 10, overwriting its existing content
COB ; End of our little program
Assuming that initially, the memory addresses 10 and 14 contain the numbers 20 and 30 respectively, by the end of the program, 10 will instead store 50.
Obviously, real computers work in binary rather than decimal, so instead of a certain number of digits for opcodes and operands, we reserve a certain number of bits - usually a power of two, like 8, 16, 32 or 64.
Instruction Set Architectures can be neatly divided into two major camps: RISC (Reduced Instruction Set Computing) and CISC (Complex Instruction Set Computing).
The former has fixed-size instructions, with the same number of bits reserved for each instruction's opcode and operands, while the latter typically has variable-length instructions.
The former has instructions that do just the basics (more complex tasks have to be built up from multiple instructions), while the latter has more complex instructions that do the heavy lifting for you, e.g. calculating the inverse square root of a number - something that has to be implemented in software on a RISC architecture.
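To make that last point more concrete, here is a minimal C sketch of an inverse square root computed entirely "in software", out of ordinary integer and floating-point instructions - the famous approximation popularised by Quake III, shown purely as an illustration of what a CPU without a dedicated instruction has to do:

#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) using only ordinary instructions,
   the way a CPU without a dedicated rsqrt instruction must. */
static float rsqrt_software(float x)
{
    float half = 0.5f * x;
    uint32_t bits;

    memcpy(&bits, &x, sizeof bits);   /* reinterpret the float's raw bits */
    bits = 0x5f3759df - (bits >> 1);  /* magic-constant initial guess */

    float y;
    memcpy(&y, &bits, sizeof y);
    y = y * (1.5f - half * y * y);    /* one Newton-Raphson refinement step */
    return y;
}

On a CISC machine that offers a dedicated instruction for this, the whole function collapses into a single opcode.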
What is a clock cycle?
A clock cycle is the basic unit of time at which a computer operates. On a RISC architecture, each instruction theoretically takes the same number of clock cycles as every other instruction (not necessarily one!). On CISC, there is no such guarantee, as a single CISC instruction often combines what would take multiple instructions on a RISC machine. In fact, very often, the same instruction takes a variable number of clock cycles (e.g. integer division, integer modulo, floating-point multiplication, etc.).
We measure the speed of a CPU in terms of clock cycles per second, whose unit is Hz or Hertz.
1 Hz (hertz) = 1 clock cycle per second
1 kHz (kilohertz) = 1 000 Hz = 1 000 clock cycles per second
1 MHz (megahertz) = 1 000 kHz = 1 000 000 Hz = 1 000 000 clock cycles per second
1 GHz (gigahertz) = 1 000 MHz = 1 000 000 kHz = 1 000 000 000 Hz, or clock cycles per second
Keep this information in the back of your head, as it'll be important later in the article.
The x86 architecture
The x86 is a CISC architecture introduced in 1978, where each instruction's length varies from 1 byte (8 bits) to 15 bytes (120 bits). That's right, the processor of your personal computer is running on a 43-year-old architecture (not really, but I'll get into that later).
Today, as of 2021, the x86's main rival is the ARM architecture, which is RISC. In the 1980s and 1990s, the x86's best-known rivals were the Z80, the MOS Technology 6502 (used in the Commodore 64, Nintendo Entertainment System, Atari 2600 and Apple II) and the Motorola 68000 (used in the Sega MegaDrive, Apple Macintosh, Atari ST and Commodore Amiga), most of which were CISC.
It's important to note that just because two platforms - e.g. the NES and the C64, or the IBM PC and the PlayStation 4 - have CPUs based on the same architecture (e.g. x86 for the PC and PS4), it does not mean that software will be compatible between them. They almost certainly run two vastly different operating systems, and even at the hardware level, the memory mappings (the way the CPU / operating system / software communicates with hardware peripherals, such as the sound card, video card, keyboard input, etc.) and various other things are going to be very different.
If you have a phone that runs Android, there is a 99.9% chance that it has a CPU based on the ARM architecture.
However, this segue is getting a bit too long, so I won't bother to explain the differences between x86 and ARM.
The IBM PC's Greatest Strength and Weakness
The IBM PC - at least, the typical desktop computer we are all used to seeing - follows a very modular design. A large number of its components can be replaced at will if they malfunction or become obsolete. Yay, that sure saves money, right?
Back in the 1980s, when sound cards were a new thing, they weren't built into the computer like they often are today: you had to buy an AdLib, a SoundBlaster or a Gravis Ultrasound and plop it into one of the ISA slots of your motherboard. It wasn't until the arrival of the AC'97 standard that integrated, built-in sound cards became the norm.
A similar thing can be said for GPUs - yes, PCs have always had video cards conforming to various standards (e.g. MDA, CGA, EGA, VGA, etc.), but dedicated GPUs that offered 3D acceleration didn't arrive until 1995, with the Creative Labs 3D Blaster and the S3 Virge. You can read more about them in Geri's article.
It's important to note that while a lot of people treat the x86 architecture and the IBM PC as practically synonymous, not every device that relies on the x86 architecture is a PC: the original Xbox used a Pentium processor, which is x86, and the PlayStation 4 uses an x86-based CPU too. The IBM PC itself is somewhat of a Ship of Theseus - we went from MDA/CGA/EGA/VGA to 3D acceleration, from ISA slots to PCI to PCI Express, from PC bleepers to integrated sound cards, and so on. We even have a concept called the "legacy-free PC", which refers to modern-day IBM PC compatibles that lack all those legacy components that characterized 1980s IBM PC compatibles, such as ISA slots and floppy drives.
A Winning Edge
The reason the IBM PC compatible eventually won - or more precisely, outlasted - all its competition so far is the aforementioned modular design.
Your GPU isn't rendering those triangles fast enough for the latest AAA game? Just replace it!
Your sound card isn't delivering super-crisp 10.1 audio at the highest fidelity humanly possible? Just replace it!
Not enough RAM to run thirty million instances of Bonzi buddy at once? Just replace it!
Your processor isn't processing all the instructions fast enough? Just replace it!
The PC's biggest strength - and weakness, as we'll later find out - is in this modular design, where components can be replaced at will - upgraded, if you will.
While the Commodore 64, once released, was forever stuck with its 64 kilobytes of RAM, a CPU clocked at 1.023 MHz, a 16-colour display and a 3-channel sound chip, the IBM PC - initially available with a CPU clocked at 4.77 MHz, anywhere between 16 kB and 640 kB of RAM, a monochrome display and a single-channel bleeper - eventually evolved into something much more sophisticated, with modern-day PCs often having 8-32 gigabytes of RAM, CPUs clocked at 3.70 GHz or higher, 32-bit truecolour displays, above-CD-quality audio capabilities, etc.
The IBM PC's 80s and early 90s rivals - the Commodore 64, Commodore Amiga, Atari ST, etc. - were un-upgradeable, while the PC, with its modular design, could evolve.
The Commodore 64 is now in the attic of retro-gamers, while the IBM PC is still going strong, even 40 years after 1981.
I guess you could also make arguments for this modular design from an environmentalist point of view, as you only have to replace one component at a time, instead of the entire machine - but from the point of view of economies of scale, you could make the opposite argument, as integrating every component into a single system-on-a-chip greatly reduces manufacturing costs.
A Mess to Program On
The biggest strength of the IBM PC is also its biggest weakness. This modular design may have allowed it to beat all its competition and last to the ripe old age of 40 as of 2021, but it has also always been - and continues to be - its Achilles heel.
Imagine being a game developer in the late 1980s and early 1990s. Let's say you were developing a game for three specific platforms:
On the Nintendo Entertainment System, you knew exactly what you were developing against: a CPU clocked at 1.79 MHz, a measly two kilobytes of RAM, a 4-channel chiptune synthesiser, a rather sophisticated - for the time, that is - graphics processing unit of sorts that allowed you to put 64 sprites on the screen per frame, colouring them from a palette of 48 colours. Oh, and the screen resolution was a fixed 256x240.
On the Commodore 64, you also knew exactly what you were developing against: a CPU clocked at 1.023 MHz, 64 kilobytes of RAM, a 3-channel chiptune generator, a 16-colour palette, and a screen size of 320x200 or 160x200, depending on the video mode you chose.
On the IBM PC.... ho boy, things were complicated. Every PC was - and still is - an individual, a special snowflake with its own specifications. Are we even talking low-end, or high-end, or what?
In 1988, you could expect even the lowest-end PC to have an Intel 8086-compatible CPU clocked at 4.77 MHz or above, at least 64 kilobytes of RAM, a four-colour CGA display with a resolution of 320x200, and a single-channel PC speaker.
By 1992, this had increased to an Intel 286 CPU clocked at 8 MHz, at least 512 kilobytes of RAM, and an EGA display capable of rendering 16 colours at a 320x200 resolution.
In 1988, the most cutting-edge PC owned by the richest kid on the block would have an Intel 386 CPU clocked at 12 MHz, probably a couple of megabytes of RAM, a VGA video card capable of displaying 256 colours at a 320x240 resolution, and an AdLib sound card capable of producing FM-synthesised music on 9 channels.
By 1992, the most cutting-edge PC would have been powered by a 486 CPU clocked at 66 MHz, 8 MB of RAM, and probably a SoundBlaster 16 sound card, or a Gravis Ultrasound.
The point I'm trying to make here is that, as a software developer, you basically had no idea what kind of PC your customers would have, so you had to specify minimum system requirements.
You basically had four choices:
Write different code paths for different kinds of hardware, implementing support for CGA, EGA and VGA graphics alike, for the PC speaker and AdLib, etc. and maybe just target the lowest common denominator when it comes to CPU and RAM requirements.
Just target the lowest common denominator: 4.77 MHz, 64 kilobytes of RAM, CGA, PC Speaker.
Just do exactly what AAA game developers do these days: target high-end machines. That would mean 12 MHz, 512 kilobytes of RAM, AdLib, VGA, etc.
Just forget about developing for the PC and pick a different platform, like the NES (or later the SNES and Sega MegaDrive), Amiga, etc.
As impressive as the DOS game library is, most game developers obviously picked the fourth option, as the majority of our beloved NES, SNES, Sega MegaDrive, PlayStation, etc. classics did not have PC ports. Before 1992, it seems that the majority of those who did develop PC games went with the first option, and with the third option after 1992.
However, even if you were developing games exclusively for high-end machines at the time, you still had some issues, like SoundBlaster and the Gravis UltraSound being very different beasts with different APIs and so forth. So, you either had to write two separate code paths for them, or rely on a third-party proprietary library like the Miles Sound System.
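In practice, "two separate code paths" usually meant hiding each card behind a common interface and selecting a driver at start-up. Below is a minimal C sketch of that idea - the structures and function names here are hypothetical, not any real card's actual API:

/* A common interface that every sound "driver" in the game implements. */
struct sound_driver {
    void (*init)(void);
    void (*play_note)(int channel, int frequency);
};

/* Two completely separate code paths hidden behind that interface.
   In a real game, each would talk to its hardware in its own way. */
static void sb_init(void)            { /* program the SoundBlaster's DSP */ }
static void sb_play(int ch, int hz)  { /* ... */ }
static void gus_init(void)           { /* upload patches to the Ultrasound */ }
static void gus_play(int ch, int hz) { /* ... */ }

static const struct sound_driver soundblaster = { sb_init,  sb_play  };
static const struct sound_driver ultrasound   = { gus_init, gus_play };

/* Chosen once at start-up, based on detection or a setup program. */
static const struct sound_driver *snd;

void sound_init(int has_gus)
{
    snd = has_gus ? &ultrasound : &soundblaster;
    snd->init();
}

Libraries like the Miles Sound System essentially did this for you, shipping a whole collection of such drivers so that the game only ever talked to one interface.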
Then, in the mid-to-late 90s, 3D acceleration became a thing. At first, all these 3D GPUs supported only their own proprietary APIs: the S3 Virge had SGL, the ATI 3D Rage had the ATICIF, the 3dfx Voodoo had Glide, etc.
Eventually, as DOS was replaced by Windows, these proprietary APIs were replaced by OpenGL and Direct3D, but even those two had - and continue to have - their own problems. Not to mention that, even in spite of this standardization, not every GPU conforms to the standards perfectly: a game that works just fine on a machine with an Nvidia GPU may have glitches on a machine with an AMD GPU, and vice versa.
This situation is still so bad that my friend Geri decided to abandon hardware acceleration altogether and focus on software rendering.
Even by the late 90s, you would have faced a problem similar to the one I described for the DOS era: the PlayStation, Nintendo 64 and Sega Dreamcast all had well-known and widely documented 3D rendering capabilities, while on the PC, you may have had OpenGL and Direct3D support, but you also still had video cards that lacked 3D acceleration altogether - so your best bet was to support Direct3D/OpenGL and also write a fallback software renderer, which is precisely what Quake 2 and Unreal did.
The Myth of Compatibility
So, in this article, I have claimed multiple times that the IBM PC's greatest strength and weakness is its modular nature - the fact that you can simply replace and upgrade your CPU, your RAM, your sound card, your video card, etc.
However, I have a confession to make - I wasn't being completely honest when I made that claim.
Can you take an original IBM PC from 1981 and replace its 8088 CPU with an Intel Core i7? Nope! They're not pin-compatible! You can't put a recently manufactured memory module into it either. Hell, you can't even put a 15-year-old computer's DDR2 RAM module into a modern computer, which expects DDR4 or DDR5 - the two are simply not compatible. Modern PCs also lack ISA slots, relying on PCI Express instead.
So obviously there is a limit to the forward-compatibility of PC motherboards, and you do have to replace the entire machine every generation or so.
But wait, it gets even worse!
The 8086 (5-10 MHz), 286 (5-25 MHz), 386 (12-40 MHz), 486 (16-100 MHz), Pentium (60-300 MHz), Celeron, Xeon, Intel Core i3/i5/i7, etc. all use exactly the same x86 architecture, right? Right?! Nope!
You see, in addition to greater clock speeds (and other improvements that reduced the number of clock cycles needed to complete each instruction), these CPUs also introduced several... ahem... extensions to the instruction set. Each new Intel CPU implemented not just the same instructions as the previous one, but typically a few extensions as well - new instructions added to the x86 instruction set. If you wanted to squeeze as much performance as possible out of your new hardware, you had to make use of these new features, but doing so would lock out those stuck with older hardware altogether.
As a matter of fact, as we're quickly approaching the point where increasing the clock speed of CPUs is no longer sustainable, hardware and software developers alike are shifting their focus to increased parallelization. However, this involves more than just increasing the number of CPU cores and hoping for the best - it also means writing software that benefits from those extra cores by making use of multi-threading. As an alternative - or as a way to squeeze out even more parallel performance - it also means implementing new CPU instructions (and software that makes use of them) that can perform the same operation on multiple pieces of data at the same time: Single Instruction, Multiple Data... or SIMD.
On the x86, this kind of parallelization first became widespread in 1999, when the Pentium III arrived with the first iteration of the Streaming SIMD Extensions (SSE). Truth be told, it was preceded by the introduction of MMX in 1997, but MMX did not have the same impact as SSE.
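To give a feel for what SIMD looks like in practice, here is a minimal C sketch of the same loop written twice - once as plain scalar code and once with SSE intrinsics, which handle four floats per instruction (assuming a compiler and CPU with SSE support; the function names are mine):

#include <xmmintrin.h>  /* SSE intrinsics */

/* Scalar: one addition per loop iteration. */
void add_scalar(const float *a, const float *b, float *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* SSE: four additions per instruction.
   Assumes n is a multiple of 4 to keep the sketch short. */
void add_sse(const float *a, const float *b, float *out, int n)
{
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);
        __m128 vb = _mm_loadu_ps(&b[i]);
        _mm_storeu_ps(&out[i], _mm_add_ps(va, vb));
    }
}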
SSE was eventually superseded by the Advanced Vector Extensions - or AVX - which first appeared in 2008. As of 2021, the latest of these instruction set extensions is the AVX-512, which first appeared in 2015 - however, CPUs actually implementing it are rather expensive, and your CPU probably doesn't support it either.
Typically, PC games don't even bother to have multiple code paths for CPUs that may or may not support different iterations of SSE or AVX: they simply required SSE2 between roughly 2005 and 2016, and AVX2 from 2016 or so onwards.
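If a developer does want multiple code paths, the usual approach is to detect CPU features at run time and dispatch accordingly. A minimal sketch, assuming GCC or Clang (which provide the __builtin_cpu_supports helper) and a hypothetical AVX2 variant of the earlier function:

void add_scalar(const float *a, const float *b, float *out, int n); /* from the SSE example above */
void add_avx2(const float *a, const float *b, float *out, int n);   /* hypothetical AVX2 implementation */

/* Pick the fastest code path the CPU actually supports, at run time. */
void add_dispatch(const float *a, const float *b, float *out, int n)
{
    if (__builtin_cpu_supports("avx2"))
        add_avx2(a, b, out, n);
    else
        add_scalar(a, b, out, n);
}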
Final Thoughts
The PC is a mess. It's a really difficult mess to develop for. But it's a beautiful mess nonetheless. Yes, it can be painful to program, but the beauty of it is that it gives every consumer a unique experience. This was even more true in the DOS era, when sequenced music reigned supreme and MIDI music sounded different on every sound card.
However, I still think that we really went down the wrong track when we decided to add yet another replaceable part to the stack in the mid-90s with the invention of 3D-accelerated GPUs.
It has been prophesied that GPUs are going to disappear, and that instead we'll have CPUs with hundreds of cores running a software renderer, with people unable to tell the difference between that and having a GPU. And you know what? We should have gone down that route from the very start.
Everyone would have benefited. SIMD extensions would have been adopted faster. Multithreaded programming would have been adopted faster. Programmers would have started writing programs that scale for multiple CPU cores much earlier. All that money spent on GPUs would have been spent on stronger and faster CPUs that can do more than just make video games pretty. Software rendering could have driven the need for faster CPUs with more cores and software that scales better.
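For what it's worth, the kind of software that scales across many cores isn't exotic: a software renderer can simply hand each core a horizontal band of the frame. Here is a minimal sketch using POSIX threads - the structure and function names are mine, not any real renderer's:

#include <pthread.h>

#define NUM_THREADS 4   /* in a hundred-core future, this would be much higher */

struct band { int y_start, y_end; };

/* Hypothetical per-thread worker: rasterise one horizontal band of the frame. */
static void *render_band(void *arg)
{
    struct band *b = (struct band *)arg;
    for (int y = b->y_start; y < b->y_end; y++) {
        /* draw_scanline(y); - rasterise one row of pixels */
    }
    return 0;
}

/* Split the frame into bands and render them on separate cores. */
static void render_frame(int height)
{
    pthread_t threads[NUM_THREADS];
    struct band bands[NUM_THREADS];
    int step = height / NUM_THREADS;

    for (int i = 0; i < NUM_THREADS; i++) {
        bands[i].y_start = i * step;
        bands[i].y_end   = (i == NUM_THREADS - 1) ? height : (i + 1) * step;
        pthread_create(&threads[i], 0, render_band, &bands[i]);
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], 0);
}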
I said it before, and I'll say it again: GPUs were a mistake. They were never meant to be.
Or maybe the x86 itself was a mistake? It scales poorly. Maybe it's time to replace it with something different? Something fresh? Something that scales better for multithreaded programming?