What's Wrong with NVIDIA?

Getting to the meat of the problem: how can NVIDIA perform so poorly in the native DirectX 9 code path, and do better, but not dramatically better, in its own special "mixed mode"? To understand why, we have to look at the modifications that Valve made for the NV3x code path. Taken directly from Gabe Newell's presentation, here are the three major changes that were made:

Special Mixed Mode for NV3x
- Uses partial-precision registers where appropriate
- Trades off texture fetches for pixel shader instruction count (this is actually backwards, read further to learn more)
- Case-by-case shader code restructuring

So the first change that was made is to use partial-precision registers where appropriate. Well, what does that mean? As we've mentioned in previous articles, NVIDIA's pixel shading pipelines can operate on either 16-bit or 32-bit floating point numbers, with the 32-bit floats providing greater precision. Just like on a CPU, the FPUs in the pixel shader units have a fixed number of local storage locations known as registers. Think of a register as nothing more than a place to store a number. With the NV3x architecture, each register can hold either one 32-bit floating point value or two 16-bit floating point values. Thus, when operating in 16-bit (aka partial precision) mode, you get twice as many usable registers as when you're running in 32-bit mode.
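To make the register trade-off concrete, here is a minimal Python sketch. It is not NVIDIA's hardware logic: the helper names are ours, the slot count in the usage lines is a made-up number, and IEEE 754 half precision is used as a stand-in for the 16-bit format. It round-trips a value through 16-bit and 32-bit storage to show the precision gap, and counts how many values fit when each 32-bit slot holds either one full-precision or two partial-precision values.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Store a value in the given binary float format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def to_fp16(value: float) -> float:
    """Round a value to IEEE 754 half precision (16 bits)."""
    return round_trip(value, "<e")


def to_fp32(value: float) -> float:
    """Round a value to IEEE 754 single precision (32 bits)."""
    return round_trip(value, "<f")


def usable_registers(physical_slots: int, bits: int) -> int:
    """Each 32-bit slot holds one 32-bit value or two 16-bit values."""
    if bits not in (16, 32):
        raise ValueError("bits must be 16 or 32")
    return physical_slots * (32 // bits)


if __name__ == "__main__":
    x = 0.1
    print("fp16 error:", abs(to_fp16(x) - x))
    print("fp32 error:", abs(to_fp32(x) - x))
    # Hypothetical register file with 8 physical 32-bit slots.
    print("registers at 32-bit:", usable_registers(8, 32))
    print("registers at 16-bit:", usable_registers(8, 16))
```

The point of the sketch is only the ratio: halving the width of each value doubles the number of values that fit, at the cost of a larger rounding error per value.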

Note that using 32-bit floating point numbers doesn't increase the amount of memory bandwidth you're using. It simply means that you're cutting down the number of physical registers to which your pixel shader FPUs have access. What happens if you run out of registers? After running out of registers, the functional units (FPUs in this case) must swap data in and out of the graphics card's local memory (or caches), which takes a significantly longer time - causing stalls in the graphics pipeline or underutilization of the full processing power of the chip.
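The spill behavior described above can be modeled roughly as follows. This is a sketch under stated assumptions, not a description of how NV3x actually schedules work: the register file size and the number of live temporaries are hypothetical, and the function name is ours. It simply counts how many values would no longer fit in registers at each precision.

```python
def spilled_values(live_values: int, physical_slots: int, bits: int) -> int:
    """Return how many live values do not fit in the register file.

    Assumes each physical slot is 32 bits wide and can hold either one
    32-bit value or two 16-bit values, as the article describes.
    """
    if bits not in (16, 32):
        raise ValueError("bits must be 16 or 32")
    capacity = physical_slots * (32 // bits)
    return max(0, live_values - capacity)


if __name__ == "__main__":
    # Hypothetical shader that keeps 6 temporaries live at once,
    # running on a hypothetical register file with 4 physical slots.
    live, slots = 6, 4
    for bits in (32, 16):
        print(f"{bits}-bit: {spilled_values(live, slots, bits)} value(s) spilled")
```

In this toy setup the same shader spills at full precision but fits entirely in registers at partial precision, which is the effect the article attributes to the mixed mode.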

The fact that performance increased when moving to partial-precision (16-bit) registers indicates that NVIDIA's NV3x chips may have fewer usable physical registers than ATI's R3x0 series. If we're correct, this is a tradeoff that NVIDIA's engineers made to conserve die space. We're not here to criticize NVIDIA's engineers, but rather to explain NVIDIA's performance here.

 

Next, Gabe listed the tradeoff between pixel shader instruction count and texture fetches. To sum this one up, the developers resorted to burning more texture (memory) bandwidth instead of putting a heavier computational load on the functional units. Note that this approach is much closer to the pre-DX9 method of game development, where we were mainly memory bandwidth bound instead of computationally bound. The fact that NVIDIA benefited from this sort of optimization indicates that the NV3x series may not have as much raw computational power as the R3x0 GPUs (whether that means that it has fewer functional units or it is more picky about what and when it can execute is anyone's guess).
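As an illustration of trading computation for texture fetches, here is a small Python sketch. It is not Valve's shader code: the choice of a power function, the table size, and all function names are our assumptions. The idea is to precompute a function into a table (standing in for a 1D texture) and replace the per-pixel arithmetic with a single lookup.

```python
def build_lookup(size: int, exponent: float) -> list:
    """Precompute x ** exponent for evenly spaced x in [0, 1].

    The table stands in for a 1D lookup texture.
    """
    return [(i / (size - 1)) ** exponent for i in range(size)]


def sample_nearest(table: list, x: float) -> float:
    """Fetch the nearest table entry for x, clamped to [0, 1]."""
    x = min(max(x, 0.0), 1.0)
    return table[round(x * (len(table) - 1))]


def compute_direct(x: float, exponent: float) -> float:
    """Evaluate the same function with arithmetic instead of a fetch."""
    return x ** exponent


if __name__ == "__main__":
    table = build_lookup(256, 16.0)  # hypothetical 256-entry table
    x = 0.9
    print("lookup: ", sample_nearest(table, x))
    print("compute:", compute_direct(x, 16.0))
```

The lookup costs one memory access and some accuracy; the direct version costs arithmetic instructions. Which one wins depends on whether the hardware is short on bandwidth or on computational throughput.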

The final accommodation Valve made for NVIDIA hardware was some restructuring of shader code. There's not much that we can deduce from this other than the obvious - ATI and NVIDIA have different architectures.

111 Comments

  • uturnsam - Friday, November 28, 2003 - link

    #110 continued
    Now I know why the guy behind the counter told me to steer clear of the ATI Radeon cards because of the known compatibility problems when running games.

    (Computer sales guy thinking-I just read the article in the AnandTech post)

    Translated: I have a shit load of Nvidia cards and if I don't lie my ass off to my customers it will be game over for me!!!

    The only reason I started looking at ATI cards was that I decided to spend what I saved on the CRT monitor (over the $$LCD) on a higher-performing card. Mr $Sales$ had me convinced I would be buying an inferior card with ATI. Worth shopping around and scouring reviews :O)
  • uturnsam - Friday, November 28, 2003 - link

    I was going to buy a Geforce5600 but looked at a 9600Pro today. The thing is, I was wondering if I should really blow the budget and lash out on a 9800Pro.
    I am so glad I came across this article. I will stick with the 9600Pro, save some cash, sleep better at night and know that when Half-Life 2 is released I will be getting the best performance for the outlay.

  • Anonymous User - Thursday, October 16, 2003 - link

    you can count on your 9500 being in between the 9800 and the 9600, about 30% frame rate above the 9600. the 4 pipelines will help.
  • Anonymous User - Tuesday, September 30, 2003 - link

    I would like to see a test of the dx8 paths on some of the really older cards for those of us who are too broke for these new ones!!

    For instance, I have a geforce2 GTS that I love very much and works just fine on everything else. I don't want to have to upgrade for one game.
  • Anonymous User - Sunday, September 21, 2003 - link

    I would like to see how they compare with a 5900 using the Detonator 44.03 driver. Yes, I know it's an older driver. But in my tests it provided higher benchmarks than the 45.23 driver.

    Has anybody else noticed this?
  • Anonymous User - Friday, September 19, 2003 - link

    So Nvidia's shaders (16/32-bit) are actually not comparable with ATI's shaders (24-bit, the MS DX9 standard)! Too bad that, one way or another, they try to cheat again and again... Very bad idea!
  • Anonymous User - Tuesday, September 16, 2003 - link

    #104, the benchmarks and anand's analysis show that hl2 is gpu power limited, not memory/fillrate limited... the 9600 will be limited more by that than by memory or fillrate.
  • Anonymous User - Monday, September 15, 2003 - link

    I think #84 mentioned this, but I didn't see a reply. In the benches, the 9600 pro pulled the exact same (to within .1 fps, which could just be roundoff error) frame rates at 1024 and 1280.

    I don't think I've ever seen a card bump up res without taking a measurable hit (unless it was cpu-limited). In every other game, the 9600 takes a hit going from 1024 to 1280. And the 9700 and 9800 slow down when the resolution goes up, even though they're basically the same architecture. Someone screwed up, either the benchmarks or the graphs.
  • Anonymous User - Monday, September 15, 2003 - link

    #61, did you take the time to see that Valve limited their testing? AnandTech had no say in all the tests because they were very time limited. Also, try to make coherent sentences.
  • Anonymous User - Sunday, September 14, 2003 - link

    It's not as if GIFs gobble bandwidth, I (as CAPTAIN DIALUP) don't even notice them loading. They're tiny. Even though I don't have trouble receiving this Flash stuff, it pisses me off, because sometimes the same scores will load for all the pages. Why not have a poll or something on this?
