Software Rendering 2: Pipelines and Triangles


In our previous episode, we managed to put pixels onto the screen using nothing but software code written in C++. In this episode, we are finally going to be putting triangles onto the screen.

Now, as I said in the first episode, this won't just be a tutorial on software rendering, but also a bit of an insight into how GPUs work under the hood. So, we'll be using shaders of sorts.

Because this code is getting more complex, I have to put out a disclaimer: I'm not claiming that I wrote every single line of code by hand. Some of it was clearly inspired by code written by others. However, I did modify it all, not just for my own needs, but for the purposes of this tutorial.

Remember the sneak peek?

If you remember yesterday's sneak peek, you're in for a ride: I'm about to explain it.

First, let's ask ourselves a question: how much state do we want our software renderer to hold? Do we want it to be very stateful (storing the viewport, the uniform, the pointer to the framebuffer, the pointer to the Z-Buffer, the shaders, etc. as member variables), or completely stateless (passing all of these as parameters to the rendering functions)?

Just to demonstrate the two extremes, here's the original sneak peek, which is completely stateless:

#ifndef RENDERINGPIPELINE_HPP
#define RENDERINGPIPELINE_HPP
#include <functional>
#include <glm/glm.hpp>
#include "Texture.hpp"

template<typename VertexInType, typename VertexOutType, typename UniformType> struct RenderingPipeline {
	typedef std::function<VertexOutType(const UniformType&, const VertexInType&, const glm::ivec4&)> VertexShader;
	typedef std::function<void(Texture&, const VertexOutType&, const VertexOutType&, const VertexOutType&, float, float, float)> FragmentShader;
	static void rasterize(const UniformType& uniform,
				   const VertexOutType& v0, const VertexOutType& v1, const VertexOutType& v2,
				   Texture& framebuffer, const FragmentShader& frag) {
		// TO BE IMPLEMENTED
	}
	static void renderTriangle(const UniformType& uniform, const VertexShader& vert, const FragmentShader& frag,
						const glm::ivec4& viewport, Texture& framebuffer,
						const VertexInType& i0, const VertexInType& i1, const VertexInType& i2) {
		const VertexOutType o0 = vert(uniform,i0,viewport);
		const VertexOutType o1 = vert(uniform,i1,viewport);
		const VertexOutType o2 = vert(uniform,i2,viewport);
		rasterize(uniform,o0,o1,o2,framebuffer,frag);
	}
	static void renderTriangles(const UniformType& uniform, const VertexShader& vert, const FragmentShader& frag,
						const glm::ivec4& viewport, Texture& framebuffer,
						const VertexInType* vertices, size_t vertexCount) {
		for(size_t i = 0; i < vertexCount; i += 3) {
			renderTriangle(uniform,vert,frag,viewport,framebuffer,
						   vertices[i],vertices[i+1],vertices[i+2]);
		}
	}
	static void renderTriangles(const UniformType& uniform, const VertexShader& vert, const FragmentShader& frag,
						const glm::ivec4& viewport, Texture& framebuffer,
						const VertexInType* vertices, const unsigned* indices, size_t indexCount) {
		for(size_t i = 0; i < indexCount; i += 3) {
			renderTriangle(uniform,vert,frag,viewport,framebuffer,
						   vertices[indices[i]],vertices[indices[i+1]],vertices[indices[i+2]]);
		}
	}
};

#endif // RENDERINGPIPELINE_HPP

In contrast, here is a more stateful variant of the same code:

#ifndef RENDERINGPIPELINE_HPP
#define RENDERINGPIPELINE_HPP
#include <functional>
#include <glm/glm.hpp>
#include "Texture.hpp"

template<typename VertexInType, typename VertexOutType, typename UniformType> struct RenderingPipeline {
	typedef std::function<VertexOutType(const UniformType&, const VertexInType&, const glm::ivec4&)> VertexShader;
	typedef std::function<void(Texture&, const VertexOutType&, const VertexOutType&, const VertexOutType&, float, float, float)> FragmentShader;
	
	UniformType uniform;
	VertexShader vert;
	FragmentShader frag;
	Texture* framebuffer;
	glm::ivec4 viewport;
	
	void rasterize(const VertexOutType& v0, const VertexOutType& v1, const VertexOutType& v2) {
		// TO BE IMPLEMENTED
	}
	void renderTriangle(const VertexInType& i0, const VertexInType& i1, const VertexInType& i2) {
		const VertexOutType o0 = vert(uniform,i0,viewport);
		const VertexOutType o1 = vert(uniform,i1,viewport);
		const VertexOutType o2 = vert(uniform,i2,viewport);
		rasterize(o0,o1,o2);
	}
	void renderTriangles(const VertexInType* vertices, size_t vertexCount) {
		for(size_t i = 0; i < vertexCount; i += 3) {
			renderTriangle( vertices[i],vertices[i+1],vertices[i+2]);
		}
	}
	void renderTriangles(const VertexInType* vertices, const unsigned* indices, size_t indexCount) {
		for(size_t i = 0; i < indexCount; i += 3) {
			renderTriangle(vertices[indices[i]],vertices[indices[i+1]],vertices[indices[i+2]]);
		}
	}
};

#endif // RENDERINGPIPELINE_HPP

Personally, I prefer the stateless approach, but hey, you do you. Still, for the purpose of this tutorial, we'll be going with the stateful approach, because it makes the code slightly more readable.
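To make the difference concrete at the call site, here's a rough sketch of how a draw call would look in each style. The MyVertexIn/MyVertexOut/MyUniform types, the shaders and the buffers are hypothetical placeholders, and both variants share the RenderingPipeline name above, so treat this as illustration rather than compiling code:

// Stateless variant: everything travels through the call itself.
RenderingPipeline<MyVertexIn,MyVertexOut,MyUniform>::renderTriangles(
	myUniform, myVertexShader, myFragmentShader, viewport, framebuffer,
	vertices, vertexCount);

// Stateful variant: configure the pipeline once, then just draw.
RenderingPipeline<MyVertexIn,MyVertexOut,MyUniform> pipeline;
pipeline.uniform = myUniform;
pipeline.vert = myVertexShader;
pipeline.frag = myFragmentShader;
pipeline.viewport = viewport;
pipeline.framebuffer = &framebuffer;
pipeline.renderTriangles(vertices, vertexCount);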

To rasterize a triangle - a dead end

So, what do we need to rasterize a triangle?

There are two main schools of thought.

The simpler - but slower - method is to create a bounding box, iterate over all the pixels it covers, find out whether each pixel is within the triangle via some mathematical black magic, and then proceed. This is not the method we'll be using, but just for demonstration, I'll present it in C++-style pseudocode:

const int left = std::min(std::min(v0.POS.x,v1.POS.x),v2.POS.x);
const int right = std::max(std::max(v0.POS.x,v1.POS.x),v2.POS.x);
const int top = std::min(std::min(v0.POS.y,v1.POS.y),v2.POS.y);
const int bottom = std::max(std::max(v0.POS.y,v1.POS.y),v2.POS.y);

for(int x = left; x < right; ++x) {
	for(int y = top; y < bottom; ++y) {
		if( pixelIsInTriangle(x,y,v0,v1,v2)) {
			// Do rendering
		}
	}
}

So, what makes this method slow?

Well, a pixel is in the triangle if all three of its weights are non-negative. So, the above code should be rewritten to...

const float area = edgeFunction(v0.POS,v1.POS,v2.POS);
const int left = std::min(std::min(v0.POS.x,v1.POS.x),v2.POS.x);
const int right = std::max(std::max(v0.POS.x,v1.POS.x),v2.POS.x);
const int top = std::min(std::min(v0.POS.y,v1.POS.y),v2.POS.y);
const int bottom = std::max(std::max(v0.POS.y,v1.POS.y),v2.POS.y);

for(int x = left; x < right; ++x) {
	for(int y = top; y < bottom; ++y) {
		const glm::vec2 p = glm::vec2(float(x)+0.5f,float(y)+0.5f);
		const float w0 = edgeFunction(v1.POS, v2.POS, p) / area;
		const float w1 = edgeFunction(v2.POS, v0.POS, p) / area;
		const float w2 = edgeFunction(v0.POS, v1.POS, p) / area;
		if (w0 >= 0 && w1 >= 0 && w2 >= 0) {
			// Hurray, the pixel is in the triangle!
			// Now let's render!
		}
	}
}

But what even is this edgeFunction? I'm glad you asked, because we'll be using it in our real pipeline too, in the other algorithm I'll present after demonstrating why this bounding box algorithm is so bad.

edgeFunction.cpp
#include "EdgeFunction.hpp"

float edgeFunction(const glm::vec2 &a, const glm::vec2 &b, const glm::vec2 &c)
{
	return ((c.x - a.x) * (b.y - a.y) - (c.y - a.y) * (b.x - a.x));
}
float edgeFunction(const glm::vec3 &a, const glm::vec3 &b, const glm::vec3 &c)
{
	return ((c.x - a.x) * (b.y - a.y) - (c.y - a.y) * (b.x - a.x));
}
float edgeFunction(const glm::vec3 &a, const glm::vec3 &b, const glm::vec2 &c)
{
	return ((c.x - a.x) * (b.y - a.y) - (c.y - a.y) * (b.x - a.x));
}

So yeah, this is the mathematical black magic I mentioned. With this edgy "edge function", we find out how much weight each vertex has on the current pixel, so we can calculate things like colour (if we're using vertex colouring), texture coordinates, etc. In the bounding box algorithm, we also use these weights to figure out whether the pixel is within the triangle in the first place.

Now, we'll still be using this function in our better algorithm, but clearly, evaluating it for every pixel that may or may not be in the triangle is rather expensive, especially since at least half of the pixels in the bounding box are guaranteed not to be in the triangle at all. We can do better, and we will.
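To get a feel for what these weights actually mean, here's a tiny standalone example - with made-up coordinates and colours, not part of the pipeline - that computes the weights for a single pixel and uses them to blend the three vertex colours:

#include <glm/glm.hpp>
#include <cstdio>

static float edgeFunction(const glm::vec2 &a, const glm::vec2 &b, const glm::vec2 &c) {
	return ((c.x - a.x) * (b.y - a.y) - (c.y - a.y) * (b.x - a.x));
}

int main() {
	// A triangle in screenspace, with a colour attached to each vertex.
	const glm::vec2 p0(0.f, 0.f), p1(100.f, 0.f), p2(0.f, 100.f);
	const glm::vec3 c0(1.f, 0.f, 0.f), c1(0.f, 1.f, 0.f), c2(0.f, 0.f, 1.f);
	const glm::vec2 p(25.5f, 25.5f); // the centre of the pixel being tested
	const float area = edgeFunction(p0, p1, p2);
	const float w0 = edgeFunction(p1, p2, p) / area;
	const float w1 = edgeFunction(p2, p0, p) / area;
	const float w2 = edgeFunction(p0, p1, p) / area;
	// The weights always sum to 1; the pixel is inside if none of them are negative.
	if (w0 >= 0 && w1 >= 0 && w2 >= 0) {
		const glm::vec3 colour = w0 * c0 + w1 * c1 + w2 * c2; // weighted average
		std::printf("inside; blended colour = %.3f %.3f %.3f\n", colour.r, colour.g, colour.b);
	}
	return 0;
}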

Enter the scanline algorithm

Remember when I said that there are two schools of thought for rasterizing triangles, and that the one without the bounding box is better?

Instead of wasting so much precious processing power on calculating the weights of pixels that aren't even in the triangle, we simply render scanline by scanline, already knowing where each scanline begins and ends!

RenderingScanline.hpp
void renderScanline(float areaReciprocal, int y, int minX, int maxX, const VertexOutType& v0, const VertexOutType& v1, const VertexOutType& v2) {
	// Clamp the scanline's left and right ends into the viewport
	minX = std::max(minX,viewport[0]);
	maxX = std::min(maxX,viewport[2]);
	// Okay, let's render!
	for(int x = minX; x < maxX; ++x) {
		const glm::vec2 p = glm::vec2(float(x)+0.5f,float(y)+0.5f);
		const float w0 = edgeFunction(v1.POS, v2.POS, p) * areaReciprocal;
		const float w1 = edgeFunction(v2.POS, v0.POS, p) * areaReciprocal;
		const float w2 = edgeFunction(v0.POS, v1.POS, p) * areaReciprocal;
		frag(*framebuffer,glm::ivec2(x,y),uniform,
			 v0,v1,v2,w0,w1,w2);
	}
}

Floating-point multiplication is generally faster than floating-point division, so if you know you are going to be dividing a lot of different numbers by the same number, I recommend multiplying by its reciprocal instead.
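A minimal standalone illustration of the pattern, with made-up numbers:

#include <cstdio>

int main() {
	const float area = 1234.5f;
	const float values[] = { 10.f, 20.f, 30.f };
	// One division up front (per triangle, in our pipeline)...
	const float areaReciprocal = 1.0f / area;
	for (float v : values) {
		// ...then a cheap multiplication per element (per pixel),
		// instead of computing v / area every single time.
		std::printf("%f\n", v * areaReciprocal);
	}
	return 0;
}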

The more observant among you may have noticed that I also changed the fragment shader signature.

	typedef std::function<void(Texture&, const glm::ivec2&, const UniformType&,
				   const VertexOutType&,const VertexOutType&, const VertexOutType&,
				   float, float, float)> FragmentShader;

So, what do we have now?

What we have is a way to render a single scanline, provided that we already know where the line begins and ends. But how will we know that?

Obviously, we need to somehow divide up our triangle into scanlines. We need some kind of algorithm for iterating over every scanline that the triangle covers. But how?

If the triangle has a flat bottom, the answer is dead simple:

void fillBottomFlatTriangle(float areaReciprocal, const VertexOutType& v0, const VertexOutType& v1, const VertexOutType& v2) {
	const float invslope1 = float(v1.POS.x - v0.POS.x) / float(v1.POS.y - v0.POS.y);
	const float invslope2 = float(v2.POS.x - v0.POS.x) / float(v2.POS.y - v0.POS.y);
	const int minY = std::clamp(int(std::trunc(v0.POS.y)),viewport[1],viewport[3]);
	const int maxY = std::clamp(int(std::trunc(v1.POS.y)),viewport[1],viewport[3]);
	for(int i = minY; i < maxY;++i) {
		const float dy = float(i) - v0.POS.y;
		const float curx1 = v0.POS.x + (invslope1 * dy);
		const float curx2 = v0.POS.x + (invslope2 * dy);
		renderScanline(areaReciprocal,i,int(std::trunc(curx1)),int(std::trunc(curx2)),v0,v1,v2);
	}
}

Okay, maybe not so simple, and there may be some stuff to unpack there.

So, what's going on here? Truth be told, even I was a bit confused at first, but what we can make out of this mathematical black magic is this: we're assuming that v0 is the top vertex, that v1 and v2 share the exact same Y coordinate, and we're calculating the inverse slopes of the two edges running down from v0. Basically, we're finding out where each horizontal scanline begins and ends for a given vertical position within the triangle.
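A quick worked example, with hypothetical numbers:

// Flat-bottom triangle: v0 = {50,10} on top, v1 = {20,90} and v2 = {80,90} below.
// invslope = how many pixels X shifts per one-pixel step down in Y:
//   invslope1 = (20 - 50) / (90 - 10) = -0.375
//   invslope2 = (80 - 50) / (90 - 10) =  0.375
// On scanline y = 50 (i.e. dy = 40 rows below v0):
//   curx1 = 50 + (-0.375 * 40) = 35
//   curx2 = 50 + ( 0.375 * 40) = 65
// So that scanline runs from x = 35 to x = 65.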

But what if the triangle is upside down, and has a flat top?

void fillTopFlatTriangle(float areaReciprocal, const VertexOutType& v0, const VertexOutType& v1, const VertexOutType& v2) {
	const float invslope1 = float(v2.POS.x - v0.POS.x) / float(v2.POS.y - v0.POS.y);
	const float invslope2 = float(v2.POS.x - v1.POS.x) / float(v2.POS.y - v1.POS.y);
	const int minY = std::clamp(int(std::trunc(v0.POS.y)),viewport[1],viewport[3]);
	const int maxY = std::clamp(int(std::trunc(v2.POS.y)),viewport[1],viewport[3]);
	for(int i = minY; i < maxY;++i) {
		const float dy = float(i) - v2.POS.y;
		const float curx1 = v2.POS.x + (invslope1 * dy);
		const float curx2 = v2.POS.x + (invslope2 * dy);
		renderScanline(areaReciprocal,i,int(std::trunc(curx1)),int(std::trunc(curx2)),v0,v1,v2);
	}
}

Basically the same deal, except that the assumptions are reversed: v2 is now the bottom vertex, and v0 and v1 share the same Y coordinate.

Okay, so we can render top-flat and bottom-flat triangles. But what if our triangle isn't either? Or even if it was, how do we know if it is?

Well, first, we'll need a new edge function:

float edgeFunction(const glm::vec2 &a, const glm::vec2 &b, const glm::vec2 &c);
float edgeFunction(const glm::vec3 &a, const glm::vec3 &b, const glm::vec3 &c);
float edgeFunction(const glm::vec3 &a, const glm::vec3 &b, const glm::vec2 &c);
template <typename Vertex> float edgeFunction(const Vertex& v0, const Vertex& v1, const Vertex& v2) {
	return edgeFunction(v0.POS,v1.POS,v2.POS);
}

Now, after this, we can finally work on the real deal...

void rasterize(const VertexOutType& v0, const VertexOutType& v1, const VertexOutType& v2) {
	const VertexOutType *t = &v0;
	const VertexOutType *m = &v1;
	const VertexOutType *b = &v2;
	// Sort by Y, so that t is the top vertex, m the middle one, b the bottom one
	if (t->POS.y > m->POS.y) std::swap(t, m);
	if (m->POS.y > b->POS.y) std::swap(m, b);
	if (t->POS.y > m->POS.y) std::swap(t, m);
	const float dy = (b->POS.y - t->POS.y);
	const float iy = (m->POS.y - t->POS.y);
	if (m->POS.y == t->POS.y) {
		// Flat-top triangle: sort the two top vertices by X
		const VertexOutType *l = m, *r = t;
		if (l->POS.x > r->POS.x) std::swap(l, r);
		const float area = edgeFunction(*l, *r, *b);
		const float rArea = 1.0f / area;
		fillTopFlatTriangle(rArea, *l, *r, *b);
	}
	else if (m->POS.y == b->POS.y) {
		// Flat-bottom triangle: sort the two bottom vertices by X
		const VertexOutType *l = m, *r = b;
		if (l->POS.x > r->POS.x) std::swap(l, r);
		const float area = edgeFunction(*t, *l, *r);
		const float rArea = 1.0f / area;
		fillBottomFlatTriangle(rArea, *t, *l, *r);
	}
	else {
		// General case: split the triangle into a flat-bottom and a flat-top half
		VertexOutType v4 = VertexOutType::split(*t, *m, *b, dy, iy);
		const VertexOutType *l = m, *r = &v4;
		if (l->POS.x > r->POS.x) std::swap(l, r);
		const float area1 = edgeFunction(*t, *l, *r);
		const float area2 = edgeFunction(*l, *r, *b);
		const float rArea1 = 1.0f / area1;
		const float rArea2 = 1.0f / area2;
		fillBottomFlatTriangle(rArea1, *t, *l, *r);
		fillTopFlatTriangle(rArea2, *l, *r, *b);
	}
}

Ho boy.... There's quite a lot to unpack there...

Say whaaaat?!

Okay, so to unpack everything:

  • It's assumed that after going through the vertex shader, the vertex coordinates are in screenspace. Basically, we're implying that {-1,-1} and {1,1} were respectively translated to {0,640} and {480,0} in the vertex shader (assuming a 480x640 viewport).

  • We first sort our vertices by their Y-coordinates. Because 0 is at the top and 640 is at the bottom, ensuring that the first vertex is the top one means checking for the Y-coordinate being less, not greater. t is supposed to be the top one, while m and b are the ones below it.

  • We reserve two variables - dy and iy - for the height difference from the top vertex to the bottom vertex and to the middle vertex, respectively.

  • If m has the same Y-coordinate as t, then hurray, our triangle has a flat top. Before rendering it, we sort m and t by their X-coordinates, calling the X-sorted vertices l and r; l is on the left, r is on the right, obviously. After all the sorting, we calculate the triangle's area and its reciprocal, and call fillTopFlatTriangle().

  • Alternatively, if m has the same Y-coordinate as b, we're dealing with a flat-bottom triangle. The same thing applies as before, except we X-sort m and b into l and r, and call fillBottomFlatTriangle().

  • If the triangle is neither flat-bottom nor flat-top, we need to split it into two triangles: one flat-top, one flat-bottom. This means that our vertex type needs to be a class or struct with a static split() function implemented, returning that vertex type - this is also where the aforementioned variables dy and iy come in (see the sketch after this list).

    • Once we have our fourth vertex, we just sort the vertices by X, calculate the areas of the two triangles, and render each.
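To visualize the split: the fourth vertex sits on the long edge running from t to b, at the same height as m, and iy/dy tells us how far down that edge the cut happens. For a hypothetical position-only vertex type, a minimal split() could look like this (attribute interpolation works the same way, as BasicVertexOut::split below will show):

#include <glm/glm.hpp>

struct PosOnlyVertex {
	glm::vec2 POS;
	static PosOnlyVertex split(const PosOnlyVertex& t, const PosOnlyVertex& m,
							   const PosOnlyVertex& b, float dy, float iy) {
		// X is interpolated along the t->b edge by the fraction iy/dy;
		// Y is simply m's Y, since the cut is horizontal.
		return { glm::vec2(t.POS.x + ((b.POS.x - t.POS.x) / dy) * iy, m.POS.y) };
	}
};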

Finally, some action!

Not so fast, you impatient child. Just like with the "new" (15 years old, as of 2021) programmable pipelines of OpenGL and Direct3D, we need to define our uniforms, our vertex input and output types, and write our shaders. Except that our shaders will be written in C++, and will sit in our application's memory instead of being loaded from a file.

So, basically...

BasicPipeline.hpp
#ifndef BASICPIPELINE_HPP
#define BASICPIPELINE_HPP
#include <glm/glm.hpp>
#include "Texture.hpp"
#include "RenderingPipeline.hpp"

struct BasicUniform {
	int dummy; // I'm only here because something has to be.
};
struct BasicVertexIn {
	glm::vec3 POS;
	glm::vec4 COLOUR;
};
struct BasicVertexOut {
	glm::vec2 POS;
	glm::vec4 COLOUR;
	inline static BasicVertexOut split(const BasicVertexOut& t, const BasicVertexOut& m, const BasicVertexOut& b, float dy, float iy) {
		return { glm::vec2(
				t.POS.x + ((b.POS.x - t.POS.x) / dy) * iy,
				m.POS.y
			),
			glm::vec4(
				t.COLOUR.r + ((b.COLOUR.r - t.COLOUR.r) / dy) * iy,
				t.COLOUR.g + ((b.COLOUR.g - t.COLOUR.g) / dy) * iy,
				t.COLOUR.b + ((b.COLOUR.b - t.COLOUR.b) / dy) * iy,
				t.COLOUR.a + ((b.COLOUR.a - t.COLOUR.a) / dy) * iy
			) };
	}
};

typedef RenderingPipeline<BasicVertexIn,BasicVertexOut,BasicUniform> BasicPipeline;
BasicVertexOut basicVertexShader(const BasicUniform& uniform, const BasicVertexIn& vertex, const glm::ivec4& viewport);
void basicFragmentShader(Texture& framebuffer, const glm::ivec2& point, const BasicUniform& uniform,
						 const BasicVertexOut& v0,const BasicVertexOut& v1, const BasicVertexOut& v2, float w0, float w1, float w2);


#endif // BASICPIPELINE_HPP
BasicPipeline.cpp
#include "BasicPipeline.hpp"

BasicVertexOut basicVertexShader(const BasicUniform &uniform, const BasicVertexIn &vertex, const glm::ivec4 &viewport)
{
	const int viewportW = viewport[2] - viewport[0];
	const int viewportH = viewport[3] - viewport[1];
	return { glm::vec2(
			((vertex.POS[0] + 1.0f) / 2.0f * viewportW) + viewport[0] ,
			(((vertex.POS[1]-1.0f) / -2.0f) * viewportH) + viewport[1]
			), vertex.COLOUR };
}

void basicFragmentShader(Texture &framebuffer, const glm::ivec2 &point, const BasicUniform &uniform,
						 const BasicVertexOut &v0, const BasicVertexOut &v1, const BasicVertexOut &v2, float w0, float w1, float w2)
{
	if(point.x < 0 || point.y < 0) return;
	glm::vec4 colourKernel = {
			(w0 * v0.COLOUR.r) + (w1 * v1.COLOUR.r) + (w2 * v2.COLOUR.r),
			(w0 * v0.COLOUR.g) + (w1 * v1.COLOUR.g) + (w2 * v2.COLOUR.g),
			(w0 * v0.COLOUR.b) + (w1 * v1.COLOUR.b) + (w2 * v2.COLOUR.b),
			(w0 * v0.COLOUR.a) + (w1 * v1.COLOUR.a) + (w2 * v2.COLOUR.a)
			};
	framebuffer.setPixel(point,colourKernel);
}

Okay, so to unpack everything:

  • Our uniform contains nothing but a useless dummy integer that completely gets ignored.

  • Our vertex input has a position and an associated colour.

  • Ditto for our vertex output.

  • When we split our triangle, we have to interpolate, hence the dy and iy.

  • Our vertex shader's only job is currently to project our triangle onto the screen, translating coordinates like {0,1}, {-1,-1} and {1,-1} into {240,0}, {0,640} and {480,640} (again assuming a 480x640 viewport; see the quick check after this list).

  • The fragment shader interpolates the vertex colours based on the vertex weights. It takes a weighted average of the vertex colours, basically.
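If you want to convince yourself of that projection, here's a tiny sanity check of the vertex shader (assuming a 480x640 viewport, matching the numbers above):

#include <cassert>
#include <glm/glm.hpp>
#include "BasicPipeline.hpp"

int main() {
	const glm::ivec4 viewport(0, 0, 480, 640);
	const BasicUniform uniform{ 0 };
	const BasicVertexIn topCentre{ glm::vec3(0.f, 1.f, 0.f), glm::vec4(1.f) };
	const BasicVertexOut out = basicVertexShader(uniform, topCentre, viewport);
	// NDC {0,1} (top centre) should land at screenspace {240,0}.
	assert(out.POS.x == 240.f && out.POS.y == 0.f);
	return 0;
}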

But does it actually work?

Well, yes it does!

But obviously, for that to happen, we'll need to modify our render system.

SoftwareRendererSystem.hpp
#ifndef SOFTWARERENDERERSYSTEM_HPP
#define SOFTWARERENDERERSYSTEM_HPP
#include "AppSystem.hpp"
#include "StandardPixelType.hpp"
#include "BasicPipeline.hpp"

class SoftwareRendererSystem : public AppSystem
{
public:
	typedef std::unique_ptr<SDL_Texture,decltype(&SDL_DestroyTexture)> uSdlTexture;
	typedef std::unique_ptr<SDL_Renderer,decltype(&SDL_DestroyRenderer)> uSdlRenderer;
protected:
	uSdlRenderer renderer;
	uSdlTexture framebuffer;
	TextureRgba8 renderBuffer;
	BasicPipeline pipeline;
	void processEvent(const SDL_Event& ev, bool& causesExit);
	void updateLogic();
	void render();
public:
	SoftwareRendererSystem(int width, int height);
};

#endif // SOFTWARERENDERERSYSTEM_HPP
SoftwareRendererSystem.cpp
#include "SoftwareRendererSystem.hpp"


SoftwareRendererSystem::SoftwareRendererSystem(int width, int height)
	: AppSystem("Software Renderer Demo",0,0,width,height,0),
	renderer(SDL_CreateRenderer(this->window.get(),0,0),SDL_DestroyRenderer),
	framebuffer(SDL_CreateTexture(this->renderer.get(), SDL_PIXELFORMAT_RGBA8888, SDL_TEXTUREACCESS_STREAMING, width,height),SDL_DestroyTexture),
	renderBuffer(width,height)
{
	pipeline.viewport[0] = 0;
	pipeline.viewport[1] = 0;
	pipeline.viewport[2] = width;
	pipeline.viewport[3] = height;
	pipeline.vert = basicVertexShader;
	pipeline.frag = basicFragmentShader;
	pipeline.framebuffer = &renderBuffer;
	pipeline.uniform = { 0 };
}

void SoftwareRendererSystem::processEvent(const SDL_Event &ev, bool &causesExit)
{
	switch(ev.type) {
	case SDL_QUIT: causesExit = true; break;
	default: break;
	}
}

void SoftwareRendererSystem::updateLogic()
{

}

static const BasicVertexIn vertices[] = {
	{ glm::vec3(0.0f, 1.0f,0.f), glm::vec4(1.0f, 0.0f, 0.0f, 1.0f) },
	{ glm::vec3(-1.0f, -1.0f,0.0f), glm::vec4(0.0f, 1.0f, 0.0f, 1.0f) },
	{ glm::vec3(1.0f, -0.8f,0.0f), glm::vec4(0.0f, 0.0f, 1.0f, 1.0f) }
};

void SoftwareRendererSystem::render()
{
	pipeline.renderTriangle(vertices[0],vertices[1],vertices[2]);
	SDL_UpdateTexture(framebuffer.get(), nullptr, renderBuffer.getRawPixels(), renderBuffer.getStride() );
	SDL_RenderCopy(renderer.get(), framebuffer.get(), nullptr, nullptr);
	SDL_RenderPresent(renderer.get());
	SDL_Delay(16);
}

And voilà, we have our first triangle!

Next time, we'll be looking into Z-Buffers and rendering more complex shapes, with some textures added too.

As before, the code for this tutorial is also uploaded to GitHub.
