Push vs Pull APIs

Metalhead33 · 2 years ago
In multimedia programming - and to an extent, network programming - we typically deal with two kinds of application interfaces: ones that require the programmer to explicitly push data onto the underlying implementation of said API, and ones that pull data from the programmer, usually via callbacks - which are little more than automated requests to push. But hey, at least it's automatic-ish!

I am going to present examples of these two contrasting types of APIs from two domains I typically develop in as a hobby.


Push - OpenAL

Generally, pull APIs that rely on callbacks tend to be higher-level APIs to be contrasted with their push-based low-level counterparts, but audio seems to be an exception.

OpenAL is an audio processing API intentionally designed to be similar to OpenGL, which is arguably not a very clever choice, but oh well.

How do we play audio in OpenAL? Oh, it's very simple....

ALuint buffId;
ALuint sourceId;
alGenBuffers(1, &buffId);
alBufferData(buffId, AL_FORMAT_STEREO16, pcmData, pcmSize, 44100); // upload the whole decoded sample
alGenSources(1, &sourceId);
alSourcei(sourceId, AL_BUFFER, buffId);
alSourcePlay(sourceId);

Except, this is for playing a single sound effect, where the entire sound sample is stored in memory (or in the soundcard's memory, in the case of hardware-based implementations of OpenAL). But what if we want to stream some music from the disk as it's being played, instead of loading it all at once and decoding it into PCM for storage in RAM?

On modern machines, storing a 3-minute song in PCM format - around 30 megabytes - in RAM might be trivial, but on older machines, it was not. Besides, reading it from the disk and decoding it all at once causes a delay, while streaming costs only a marginal amount of performance.

But how do we stream on OpenAL?

Sadly, OpenAL does not natively support streaming, and it does not support ringbuffers either, which means that we are forced to rely on buffer queues.

std::vector<ALuint> buffers(8);
ALuint sourceId;
alGenSources(1, &sourceId);
alGenBuffers((ALsizei)buffers.size(), buffers.data());
size_t offset = 0;
for(const auto& it : buffers) { // pre-fill each buffer with a chunk of audio, then queue it on the source
	alBufferData(it, format, pcmData + offset, chunkSize, frequency);
	alSourceQueueBuffers(sourceId, 1, &it);
	offset += chunkSize;
}

However, this also comes with the added drawback that we have to periodically check which buffers have "expired", unqueue them from the source, fill them with new audio, and queue them back.

ALint expiredBuffers = 0;
ALuint unqueuedBuffer = 0;
alGetSourcei(sourceId, AL_BUFFERS_PROCESSED, &expiredBuffers);
while(expiredBuffers-- > 0) {
	alSourceUnqueueBuffers(sourceId, 1, &unqueuedBuffer);
	// Decode the next chunk - e.g. via soundfile->bufferSound(...) - and refill the buffer...
	alBufferData(unqueuedBuffer, format, nextChunk, chunkSize, frequency);
	// ...then put it back at the end of the queue.
	alSourceQueueBuffers(sourceId, 1, &unqueuedBuffer);
}

This is a really cumbersome solution that basically requires running a checker on another thread to keep the queue topped up - or doing it every frame, if we want to force it into an existing loop.

Don't get me wrong, OpenAL is a great API, but this shows a big shortcoming of the whole push API, as well as OpenAL's shtick about trying to have the same syntax as OpenGL. This kind of thing is best handled with callbacks, or in absence of that, a ringbuffer of sorts. Sadly, OpenAL does not support either.

Pull - SDL and PortAudio

SDL and PortAudio - despite being lower-level APIs that basically only serve as abstractions around the native, platform-dependent audio systems (PulseAudio and ALSA on Linux, DirectSound on Windows, etc.) - don't force you to jump through such hoops to stream audio. It's about as trivial as this:

class Streamer {
public:
	virtual ~Streamer() = default;
	// Fill `destination` with `bufferSize` frames of interleaved float samples.
	virtual void streamTo(void* destination, int framerate, int channels, int bufferSize) = 0;
};

class SDL_Driver {
	SDL_AudioSpec spec;
	SDL_AudioDeviceID devId = 0;
	Streamer* streamer;
	// SDL calls this on its own audio thread whenever it wants more samples.
	static void callback(void* userdata, Uint8* stream, int len) {
		SDL_Driver* driver = reinterpret_cast<SDL_Driver*>(userdata);
		driver->streamer->streamTo(stream, driver->spec.freq,
		                           driver->spec.channels, driver->spec.samples);
	}
public:
	SDL_Driver(Streamer* streamer, int framerate, int channels, int bufferSize)
		: streamer(streamer) {
		SDL_AudioSpec want;
		SDL_zero(want);
		want.freq = framerate;
		want.format = AUDIO_F32SYS;
		want.channels = channels;
		want.samples = bufferSize;
		want.callback = callback;
		want.userdata = this;
		devId = SDL_OpenAudioDevice(nullptr, 0, &want, &spec, SDL_AUDIO_ALLOW_FORMAT_CHANGE);
	}
	~SDL_Driver() {
		if(devId) SDL_CloseAudioDevice(devId);
	}
	void pause() { SDL_PauseAudioDevice(devId, 1); }
	void unpause() { SDL_PauseAudioDevice(devId, 0); }
};

Yes, it is arguably longer code than its OpenAL counterpart, but it's also much more intuitive: you set up your SDL audio context, you provide it with a callback, and every time you need to stream audio, you'll know exactly where to stream, and how many samples to stream. It's a pull API. It pulls audio from whatever you provide it.

It's "automatic" - no need to periodically check up on expired buffers on another thread (or any loop), no need to queue or unqueue buffers, all you need is a single function to do the streaming, and this function gets called automatically every time SDL wants more samples. What more could a man ask for?

Obviously, if you want to mix audio, you need to go through a few hoops, but that's basically just implementing a few classes.

class Playable {
public:
	virtual ~Playable() = default;
	virtual void streamTo(void* destination, int framerate, int channels, int bufferSize) = 0;
};

class Mixer : public Playable {
	std::unordered_map<std::shared_ptr<Playable>, float> playables; // playable -> volume
	std::vector<float> inputBuffer;
public:
	void streamTo(void* destination, int framerate, int channels, int bufferSize) override {
		float* output = static_cast<float*>(destination);
		const size_t sampleCount = size_t(bufferSize) * channels;
		std::fill(output, output + sampleCount, 0.0f); // clear the output buffer
		for(auto& it : playables) {
			inputBuffer.assign(sampleCount, 0.0f); // clear the input buffer
			it.first->streamTo(inputBuffer.data(), framerate, channels, bufferSize); // stream into the input buffer
			// Add the input buffer's contents to the output, multiplied by the volume.
			for(size_t i = 0; i < sampleCount; ++i)
				output[i] += inputBuffer[i] * it.second;
		}
	}
};

Obviously, if we needed fancy effects like echo, reverb, delay, or lowpass and highpass filters, we'd have to implement them manually, while OpenAL already provides them for us - but still, the point stands: if lower-level APIs like SDL and PortAudio can do this right out of the box, why do we have to jump through hoops in OpenAL?


Immediate-mode - Legacy OpenGL and Direct3D

In older 3D rendering APIs - namely, legacy OpenGL, Direct3D 8 and before, Glide, etc. - you typically had a so-called "immediate mode", where you had to push the vertices into the API('s implementation) to render them.

struct Vertex {
	glm::vec3 pos;
	glm::vec2 tex;
	glm::vec4 clr;
};
std::vector<Vertex> vertices;

glBegin(GL_TRIANGLES);
for(const auto& it : vertices) {
	Vertex transformedVertex = transformVertex(it); // transform on the CPU
	// Attributes must be set before glVertex3f(), which is what actually emits the vertex.
	glTexCoord2f(transformedVertex.tex.x, transformedVertex.tex.y);
	glColor4f(transformedVertex.clr.x, transformedVertex.clr.y, transformedVertex.clr.z, transformedVertex.clr.w);
	glVertex3f(transformedVertex.pos.x, transformedVertex.pos.y, transformedVertex.pos.z);
}
glEnd();

And you had to do this every single frame. Fine if you only have a couple thousand polygons on the screen at a time - like in Quake and Quake II, or any PlayStation 1 game - but obviously, we hit a bottleneck at a certain point.

Not only is transforming vertices on the CPU rather costly - you might as well be doing full-on software rendering at that point - but so is the pushing of vertices to your GPU. Those OpenGL API calls - glVertex3f(), glTexCoord2f() and glColor4f() - obviously come with quite a bit of overhead.

Can't we transform our vertices on the GPU instead of the CPU? Well, yes we can - if your GPU has hardware T&L (Transform and Lighting) support - but then you're stuck with whatever the fixed-function pipeline of legacy OpenGL offers you, which is rather limited. Also, the aforementioned three calls are still there, still made for every vertex on every frame.

"Retained" mode - Modern OpenGL and Direct3D

But around the turn of the millennium, GPUs gained support for VBOs (Vertex Buffer Objects), which stored - retained - the vertex data of a mesh for you; support for this feature came in Direct3D 7 in 1999 and in OpenGL 1.5 in 2003 (that's quite a long gap in computer history!).

However, this feature was essentially only useful for static objects, and was quite useless for anything you wanted to animate. Ergo, if the hardware transform & lighting capabilities weren't good enough for you, you were stuck with immediate mode. As a matter of fact, games like Quake III Arena and future games using its engine (e.g. Jedi Outcast, Jedi Academy, Return to Castle Wolfenstein, Medal of Honor Allied Assault, Call of Duty, etc.) still relied on immediate mode (obviously, since Quake III came out in 1999, and it was an OpenGL game).

It wasn't until the arrival of shaders that this retained mode of storing vertices in in-VRAM buffers really caught on. In 2000, Direct3D 8 came out with support for vertex and pixel shaders. The same feature came to OpenGL first in 2002 as two extensions (ARB_vertex_program and ARB_fragment_program), then as core features of the newly released OpenGL 2.0 in 2004.

Forgetting about immediate mode finally made sense for the first time, since you could transform your vertices entirely on the GPU by writing a shader.

Obviously, the trivial example was:

#version 330 core
layout (location = 0) in vec3 aPos;
layout (location = 1) in vec3 aNormal;
layout (location = 2) in vec2 aTexCoords;

out vec2 TexCoords;

uniform mat4 model;
uniform mat4 view;
uniform mat4 projection;

void main()
{
    TexCoords = aTexCoords;
    gl_Position = projection * view * model * vec4(aPos, 1.0);
}

But now that GPUs were programmable and 3D acceleration APIs supported this programmability, you could do all kinds of funky things on the GPU, like animation (whether keyframe or skeletal), and random distortions. Just upload your vertices once, update a few parameters every now and then, and render away!

Except that.... this "retained mode" is... well, still a kind of immediate mode. We just moved the vertex processing from the CPU onto the GPU. But you still have to make a draw call every frame for every single mesh you want to render, you still have to update the shader's uniform variables, etc. This is still very much a push API.

Truly Retained mode - Game engines

What's simple to use for the end user is complex to implement for the programmer, and vice versa. This isn't news. What might be news to the uninitiated is that this applies to different kinds of programmers, too: what's easy to use for a game developer is complex to implement for the game engine's developer, and vice versa; and an API that's easy to use for a game engine developer is going to be hell to implement for GPU vendors. In 2016, they gave up on that altogether, gave us all the middle finger, and released Vulkan: a very low-level, close-to-the-metal API where even drawing a single triangle takes on the order of a thousand lines of code.

But enough of that segue.

So, what do game engines offer us? Well, they offer us something much closer to true retained mode: because the game engines wrap around the lower-level 3D APIs and do all the draw calls for us, we - the game developers - no longer need to make explicit API calls. Instead, we just maintain a list of sorts for all the objects we wish to have rendered, provided they appear on the screen (good engines implement frustum culling). These lists can be as simple as flat lists of objects we want rendered, or more sophisticated solutions, such as scenegraphs that also take care of many of the transforms.

So, what mode is the best?

It depends on what you want. I'm of the opinion that push APIs are fine for 3D rendering (all 3D APIs are push APIs - even software renderers), but rather asinine for audio, which clearly calls for a more pull-based solution.

It's a rather sad state of affairs that OpenAL is forced into being a push-type API by a desperate desire to imitate OpenGL, when it's clear that streaming audio needs a ringbuffer and/or callbacks - two things OpenAL does not support.
