NVIDIA's Suspicious Lovelace Lineup
On September 20, 2022, NVIDIA announced its new Lovelace video cards: the RTX 4070 (<-- I'll get to this), 4080, and 4090. The Lovelace cards offer higher transistor density, upgraded tensor cores, and upgraded RT cores. If we are to go by NVIDIA's word, Lovelace will supposedly provide up to 2x the AI and ray-tracing performance. The RTX 4090 is an absolute chonker with a total board power of 450W; I would imagine some AIB versions of the 4090 will take up four slots.
In addition, NVIDIA announced DLSS 3.0, which does the same thing as DLSS 2.X plus frame interpolation. The construction of additional frames helps increase the framerate and, effectively, the smoothness of gameplay. RTX Remix allows old games to run with ray-tracing via mods that completely change the way they look.
On the surface, it looks like NVIDIA has a winner on its hands. However, looking past that surface, there are numerous aspects of the company's keynote and advertising that made me suspicious.
Comparing Apples to Oranges
If you have looked at NVIDIA's Lovelace page, you might have come across this performance comparison graph:
At first glance, this looks mightily impressive. The RTX 4090 can dish out double or even quadruple the performance of a 3090 Ti. But upon a closer look at the fine print, it becomes clear that this is not an apples-to-apples comparison. Underneath the graph, we see the phrase "DLSS Frame Generation on RTX 40 series". NVIDIA confirmed that only the Lovelace cards can run DLSS 3.0 because, supposedly, the optical flow hardware on the Ampere (and Turing) cards is not fast enough to handle the frame generation portion.
Because DLSS 3.0 has the unique ability to create additional frames, the results are inherently skewed in 3.0's favor over DLSS 2.4. The graph tells me little about the actual Lovelace chips themselves. Sure, there is more to GPU performance than rasterization, but how does Lovelace stack up against Ampere in a like-for-like workload?
I'll be fair to NVIDIA. If there's an opportunity to gain more performance for a minimal image quality penalty, there is little excuse not to take it. On top of that, the ray-tracing performance is definitely an upgrade over Ampere. That being said, I also have a bunch of questions regarding DLSS 3.0.
Frame (and Latency) Generation
Frame generation itself is not a new concept. In fact, the technology has been around for several years, and you may have already experienced it. Many televisions have frame interpolation technology, like LG's TruMotion, that can add new frames in between existing ones to increase smoothness. The added smoothness is what many people refer to as the "soap opera" effect.
What NVIDIA is trying to accomplish is that, but on steroids. The newest tensor cores and optical flow accelerator are substantially more powerful than what televisions contain under the hood. Theoretically, that means the Lovelace cards can perform frame interpolation more efficiently and accurately. In a Cyberpunk 2077 demonstration, DLSS 3.0 managed to almost quadruple the framerate.
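To make the concept concrete, here is a deliberately naive sketch of frame interpolation in Python. It just averages two rendered frames to manufacture a middle one; this is emphatically not NVIDIA's algorithm (DLSS 3.0 relies on game motion vectors, an optical flow field, and a neural network rather than a plain blend), but the structural idea of slotting a synthesized frame between two real ones is the same.

```python
import numpy as np

def naive_midframe(frame_a: np.ndarray, frame_b: np.ndarray) -> np.ndarray:
    """Fabricate an in-between frame by averaging two rendered frames.

    A real interpolator estimates per-pixel motion and warps pixels along
    those motion vectors instead of blending, which is far more expensive
    and is why dedicated hardware (optical flow accelerators) matters.
    """
    return ((frame_a.astype(np.uint16) + frame_b.astype(np.uint16)) // 2).astype(np.uint8)

# Two placeholder 1080p RGB frames standing in for consecutive rendered frames.
frame_n  = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
frame_n1 = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)

generated = naive_midframe(frame_n, frame_n1)
# Display order becomes: frame_n, generated, frame_n1 -> the framerate doubles.
```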
That amount of "free performance" sounds fantastic, but there are some caveats that need to be mentioned. Frame interpolation inherently adds latency: inserting new frames in between existing ones takes time, and even an efficient interpolator has to hold back the newest rendered frame until the generated one has been displayed. NVIDIA Reflex can mitigate the added latency, but it will not result in a net reduction. As NVIDIA Vice President of Applied Deep Learning Research Bryan Catanzaro stated:
NVIDIA Reflex removes significant latency from the game rendering pipeline by removing the render queue and more tightly synchronizing the CPU and GPU. The combination of NVIDIA Reflex and DLSS3 provides much faster FPS at about the same system latency.
This may pose a problem in action-packed games that require fast reflexes and precise timing. Let's say DLSS 3.0 boosts a game's framerate from 30fps to 60fps. While the game has the perceived smoothness of 60fps, it will have, at best, the responsiveness of 30fps. That's not an issue in something like a turn-based RPG, but it's a big no-no when it comes to shooters and hack n' slash games.
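A rough frame-timing sketch shows why. The numbers below are illustrative assumptions (a flat one-displayed-frame hold-back and no other pipeline overhead), not measurements of DLSS 3.0 itself.

```python
def frame_time_ms(fps: float) -> float:
    """Time between frames at a given framerate, in milliseconds."""
    return 1000.0 / fps

native_fps = 30.0               # frames the GPU actually renders
displayed_fps = 2 * native_fps  # one generated frame inserted per real frame

# Smoothness tracks the display cadence...
print(f"display cadence: {frame_time_ms(displayed_fps):.1f} ms")  # 16.7 ms

# ...but your input only influences the rendered frames, which still arrive
# every 33.3 ms, and the newest rendered frame is held back while the
# generated one is shown first.
input_to_frame_ms = frame_time_ms(native_fps)    # 33.3 ms
hold_back_ms = frame_time_ms(displayed_fps)      # ~16.7 ms extra (assumed)
print(f"responsiveness: ~{input_to_frame_ms:.1f} ms per input-affecting frame, "
      f"plus ~{hold_back_ms:.1f} ms of hold-back")
```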
The other big reservation I have is how well DLSS 3.0 will handle "unpredictable" frames. The Cyberpunk 2077 demonstration merely had the car and motorcyclist driving in a straight line. Camera movement was non-existent and the game environment was sterile. Another weakness of frame interpolation is that if it guesses a frame incorrectly, you get artifacts. The demonstration is too much of a softball to form an accurate assessment; independent testing is required.
The 4080 That's Not Really a 4080
During the keynote, NVIDIA announced two 4080 SKUs. If you were to just look at the presentation slides, you would assume that the only difference between the two SKUs is the RAM: one has 16GB whereas the other has 12GB. But the differences go way beyond that.
These are two entirely different chips with entirely different RAM configurations. The RTX 4080 16GB version uses the AD103 chip and has 26.7% more CUDA cores than the 12GB version, which uses the AD104 chip. There's not just a difference in RAM capacity, but also in the memory bus and speed; the cumulative differences lead to a 31.5% reduction in memory bandwidth for the 12GB card. In other words, the 16GB and 12GB versions are two completely separate performance tiers. You know which Ampere GPU housed the GA104 chip? The RTX 3070, while the RTX 3080 housed the GA102 chip.
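For anyone who wants to check those percentages, here is the arithmetic. The CUDA core counts (9728 vs 7680) are the announced figures; the memory numbers (256-bit at 23 Gbps vs 192-bit at 21 Gbps) are my assumption of the announcement-day specs, and they are what produce the roughly 31.5% bandwidth gap.

```python
# Announcement specs; the per-pin memory speeds are assumptions that
# reproduce the ~31.5% gap cited above.
card_16gb = {"cuda_cores": 9728, "bus_bits": 256, "gbps_per_pin": 23.0}  # AD103
card_12gb = {"cuda_cores": 7680, "bus_bits": 192, "gbps_per_pin": 21.0}  # AD104

def bandwidth_gb_s(card: dict) -> float:
    """Memory bandwidth = bus width (bits) * per-pin rate (Gbps) / 8 bits per byte."""
    return card["bus_bits"] * card["gbps_per_pin"] / 8

core_gap = card_16gb["cuda_cores"] / card_12gb["cuda_cores"] - 1
bw_gap = 1 - bandwidth_gb_s(card_12gb) / bandwidth_gb_s(card_16gb)

print(f"16GB card has {core_gap:.1%} more CUDA cores")      # ~26.7%
print(f"12GB card has {bw_gap:.1%} less memory bandwidth")  # ~31.5%
```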
Ultimately, the 4080 12GB version is an RTX 4070 in disguise and, even worse, NVIDIA is selling it for $899. That's nearly double the RTX 3070's original MSRP of $499. The value of the 16GB 4080 is pretty atrocious, too, at a $1199 MSRP compared to the RTX 3080's original $699.
If you compare Lovelace's prices to Ampere's current rates, it's pretty clear that NVIDIA is trying to clear Ampere stock. The 3090 Ti Founders Edition goes for around $1099, slotting it in between the two 4080s. A few AIB 3080 Tis have been sold for $859 or less. JPR (Jon Peddie Research) published its Q2 2022 GPU report, and NVIDIA, AMD, and Intel all saw significant declines in GPU shipments. NVIDIA, in particular, saw the worst of the declines.
Closing Thoughts: Don't Be Too Hopeful with AMD
From what I can gather, it looks like the Lovelace keynote was not all that well received. The YouTube upload of the keynote has 15 thousand dislikes to 31 thousand likes, which is not a favorable ratio. In a poll of 42 thousand respondents, a majority of Hardware Unboxed's viewers found the RTX 40 series to be of poor value.
With AMD slated to launch its RDNA3 cards on November 3, many are pinning their hopes on NVIDIA's competitor to offer much better value. I do not doubt that RDNA3 will be very competitive against Lovelace in terms of rasterization and will perhaps shrink the gap in ray-tracing. FSR 2.1 appears to be a significant improvement over 2.0, and it is platform-agnostic. If AMD's claim of a >50% power efficiency improvement over RDNA2 is true, then RDNA3 looks to be more efficient than Lovelace.
But corporations are not your friends. We have already seen AMD raise its CPU prices with Zen 3 when Intel's Rocket Lake could not compete. AMD has a good opportunity to claw some market share away from NVIDIA. On the other hand, AMD may just be content to raise its margins and go with an "undercut each performance tier by $50" strategy. Not to mention, the laptop market is more lucrative than the DIY enthusiast market, meaning AMD can gain GPU market share just by selling laptop chips.
And with Intel's future with discrete GPUs uncertain, there is no other regulating force that can drive prices down. The most we can do is wait until November 3 and see what happens.