AMD's 3DNow! - cheap, fast, available 'Now!'
AMD's 3DNow! Technology (which used to be called the much less annoying AMD-3D) has
been released in the AMD K6-2, and both Cyrix and IDT are expected to use 3DNow! in their
new CPUs. It uses the existing MMX registers to store its values and adds support for a
packed format that holds two single precision floating point values in one register.
An advantage of using the MMX registers (which, as you remember, are actually the floating
point registers) is that any operating system which saves the floating point registers when
multi-tasking (i.e. almost all of them) will keep the program's values safe when switching
tasks. Otherwise an OS patch would be required.
| 32 bit float | 32 bit float |
These values can be added, subtracted, multiplied, and converted to and from 32 bit
integers. Maximums, minimums, reciprocals and reciprocal square roots can also be calculated.
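As a rough illustration of the packed format, here is a minimal C sketch that models one 64 bit register as a pair of floats and applies each operation to both halves at once. The type `packed2f` and the `p2f_*` helpers are hypothetical names made up for this example; only the mnemonics in the comments (pfadd, pfmul, pfmax, pi2fd) are real 3DNow! instructions, and the C code merely imitates what they do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 64 bit register holding two 32 bit floats. */
typedef struct { float lo; float hi; } packed2f;
static_assert(sizeof(packed2f) == 8, "two 32 bit floats should fill 64 bits");

/* Element-wise add: both halves are processed by one operation (cf. pfadd). */
static packed2f p2f_add(packed2f a, packed2f b) {
    packed2f r = { a.lo + b.lo, a.hi + b.hi };
    return r;
}

/* Element-wise multiply (cf. pfmul). */
static packed2f p2f_mul(packed2f a, packed2f b) {
    packed2f r = { a.lo * b.lo, a.hi * b.hi };
    return r;
}

/* Element-wise maximum (cf. pfmax). */
static packed2f p2f_max(packed2f a, packed2f b) {
    packed2f r = { a.lo > b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi };
    return r;
}

/* Convert two 32 bit integers to two floats (cf. pi2fd). */
static packed2f p2f_from_i32(int32_t lo, int32_t hi) {
    packed2f r = { (float)lo, (float)hi };
    return r;
}
```

For example, adding the pairs (1, 2) and (3, 4) with `p2f_add` gives (4, 6) in what would be a single instruction on the real hardware.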
You may wonder (you do, don't you?) why dividing is not included when it is required for
perspective in 3D graphics. The answer is that the reciprocal, or 1/x, can be used instead.
When we want to divide y by x, instead of performing y/x, we would find 1/x
then multiply it by y, giving y * (1/x). This is more efficient because a
reciprocal is easier to calculate than a divide, and it is just as useful if we wish to
divide two things by the same value.
E.g. if we want to find the position on the screen (x', y') of a point (x, y, z) we
would calculate:
x' = x * (1/z)
y' = y * (1/z)
And we would only need to calculate 1/z once.
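The calculation above could be sketched in plain C as follows. The types `point3` and `point2` and the function `project` are names invented for this illustration, and an ordinary C division stands in for the reciprocal instruction.

```c
#include <assert.h>

/* Hypothetical types for this sketch. */
typedef struct { float x, y, z; } point3;
typedef struct { float x, y; } point2;

/* Perspective projection using one reciprocal and two multiplies
   instead of two divides. Assumes the point is in front of the viewer. */
static point2 project(point3 p) {
    assert(p.z > 0.0f);
    float inv_z = 1.0f / p.z;                 /* calculated only once */
    point2 s = { p.x * inv_z, p.y * inv_z };  /* x' and y' */
    return s;
}
```

With the point (2, 4, 2), `inv_z` is 0.5 and the projected position is (1, 2).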
In addition, the programmer has more control over the accuracy, because instead of one
reciprocal instruction, the operation is split into 3 parts: pfrcp, pfrcpit1
and pfrcpit2.
The first instruction, pfrcp, finds the result correct to 14 bits of accuracy,
which if we are drawing on a screen of width up to 1024 (needing only 10 bits) is more
than enough. We can stop there if we have enough accuracy, but if the result is needed for
another calculation or if we need greater accuracy, say for anti-aliasing, we can apply
pfrcpit1 and pfrcpit2 to get the full 24 bits of single precision accuracy.
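The idea of a quick estimate followed by optional refinement can be sketched with the textbook Newton-Raphson step for a reciprocal, x1 = x0 * (2 - a * x0). This is only an illustration under stated assumptions: `recip_coarse` fakes a 14 bit estimate by throwing away low mantissa bits of an ordinary division, `recip_refine` is the generic refinement step, and neither claims to reproduce how the hardware divides the work between its instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a low precision estimate: compute 1/a, then clear the low
   9 of the 23 stored mantissa bits, leaving roughly 14 bits of accuracy. */
static float recip_coarse(float a) {
    assert(a != 0.0f);
    float r = 1.0f / a;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= ~((UINT32_C(1) << 9) - 1);
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* One Newton-Raphson step: roughly doubles the number of correct bits,
   up to the limit of single precision. */
static float recip_refine(float a, float x0) {
    return x0 * (2.0f - a * x0);
}
```

A caller that only needs screen coordinates could use `recip_coarse(z)` directly, while one feeding the result into further maths could pass it through `recip_refine` first.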
The square root instructions (strictly speaking they find the reciprocal square root,
1/sqrt(x)), frequently used to calculate dynamic lighting values, function in exactly the
same way.
Apart from a couple of extra MMX instructions (useful but not very important), 3DNow!
adds instructions that allow data to be requested before it is used, so that the processor
spends less time waiting for data from memory when it actually needs it. In applications
with lots of data, e.g. huge sound files, large images or many textures, this can make a
very noticeable difference. Part of the problem with MMX is that it added the ability to
process much more information but didn't help at all in getting the information to and
from the processor.
These new instructions are prefetch, which simply instructs the memory hierarchy
to move some data into the processor cache, and prefetchw, which does the same but
warns that the data will be written to.
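In C, a similar hint can be given with the GCC/Clang builtin `__builtin_prefetch`. This is a compiler feature rather than anything specific to 3DNow!, and whether it turns into a real prefetch instruction depends on the compiler and target. The sketch below is a minimal example under that assumption; the function name and the `PREFETCH_AHEAD` distance are made up and would need tuning on a real machine.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: how many elements ahead to request. Tune per machine. */
#define PREFETCH_AHEAD 16

/* Sum an array while asking for later elements before they are needed. */
static long sum_with_prefetch(const int *data, size_t n) {
    assert(data != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n) {
            /* Second argument 0 = data will be read; 1 would signal an
               intended write, in the spirit of prefetchw. */
            __builtin_prefetch(&data[i + PREFETCH_AHEAD], 0, 1);
        }
        total += data[i];
    }
    return total;
}
```

The prefetch is only a hint: the result is the same with or without it, and only the time spent waiting on memory may change.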
Standard Memory Addressing
| [does some other stuff] | [processor blocked]              |
|                         | (Fetching data from main memory) |
Effects Of Prefetching
| [does some other stuff]                          |
| (Fetching data from main memory) | (value ready) |
You can see how the delay of loading values from main memory into processor cache can be
hidden by prefetching.
The AMD K6-2 chip has two separate 3DNow! units: one deals with multiplies and the
second and third parts of reciprocals and square roots, and the other does everything
else, such as adds, subtracts and integer/float conversions.
This means, though, that things can get slower if you want to multiply twice in a row, or
add twice, or whatever, because only one unit is available for each kind of operation. AMD
have stated that the K6-2 can perform 1.2 GFLOPS (Giga, or billion, FLoating point
OPerations per Second). They work this out by assuming that each instruction operates on
two values, that two instructions are executed at a time, and that the clock speed is
300 MHz, so 2 x 2 x 300M = 1200 MFLOPS or 1.2 GFLOPS.
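The arithmetic behind that figure is simple enough to write down as a tiny C function. The function and parameter names are invented for this sketch; the numbers are the ones quoted above.

```c
#include <assert.h>

/* Peak throughput in MFLOPS = values per instruction
   x instructions per clock x clock speed in MHz. */
static unsigned long peak_mflops(unsigned long values_per_instruction,
                                 unsigned long instructions_per_clock,
                                 unsigned long clock_mhz) {
    assert(clock_mhz > 0);
    return values_per_instruction * instructions_per_clock * clock_mhz;
}
```

`peak_mflops(2, 2, 300)` gives 1200, i.e. 1.2 GFLOPS. If the code can only keep one of the two units busy, the same formula with one instruction per clock gives 600.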
This peak performance will only be possible when the programmer has carefully written the
code to mix the instructions just right. Thankfully, in practice, because most 3D matrix
maths involves lots of adding and multiplying, this should be fairly common. The doubling
of speed when running Quake is a strong testament to the power of 3DNow!