MMX And Beyond

AMD's 3DNow! - cheap, fast, available 'Now!'

AMD's 3DNow! Technology (which used to be called the much less annoying AMD-3D) has been released in the AMD K6-2, and both Cyrix and IDT are expected to use 3DNow! in their new CPUs. It uses the existing MMX registers to store its values and adds support for a packed format that holds two single precision (32-bit) floating point values in each 64-bit register:

An advantage of using the MMX registers, which, as you will remember, are actually the floating point registers, is that any operating system which saves the floating point registers when multi-tasking (i.e. almost all of them) will keep a program's values safe when switching tasks. Otherwise an OS patch would be required.

  [ 32 bit float | 32 bit float ]   (one 64-bit MMX register)

These values can be added, subtracted, multiplied, and converted to and from 32 bit integers, and their maximum, minimum, reciprocal and reciprocal square root can also be calculated.
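
To make the packed format concrete, here is a small C model of one register holding two floats, with functions that mimic what pfadd, pfmul, pfmax, pfmin and pf2id do to each half. The type and function names (packed2f, p_add and so on) are hypothetical, made up for illustration; this is plain C, not 3DNow! code, and the real instructions' handling of overflow and out-of-range conversions is not modelled.

```c
#include <stdint.h>

/* Hypothetical model of one 64-bit MMX register holding two 32-bit floats. */
typedef struct {
    float lo;  /* low 32 bits of the register  */
    float hi;  /* high 32 bits of the register */
} packed2f;

/* Two 32-bit integers in one register, the result of a conversion. */
typedef struct {
    int32_t lo;
    int32_t hi;
} packed2i;

/* Like pfadd: add each half independently. */
packed2f p_add(packed2f a, packed2f b) {
    packed2f r = { a.lo + b.lo, a.hi + b.hi };
    return r;
}

/* Like pfmul: multiply each half independently. */
packed2f p_mul(packed2f a, packed2f b) {
    packed2f r = { a.lo * b.lo, a.hi * b.hi };
    return r;
}

/* Like pfmax: larger value of each pair of halves. */
packed2f p_max(packed2f a, packed2f b) {
    packed2f r = { a.lo > b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi };
    return r;
}

/* Like pfmin: smaller value of each pair of halves. */
packed2f p_min(packed2f a, packed2f b) {
    packed2f r = { a.lo < b.lo ? a.lo : b.lo, a.hi < b.hi ? a.hi : b.hi };
    return r;
}

/* Assumed to behave roughly like pf2id: convert each half to a 32-bit
   integer, truncating toward zero as a C cast does. Values outside the
   int32_t range are not handled here. */
packed2i p_to_int(packed2f a) {
    packed2i r = { (int32_t)a.lo, (int32_t)a.hi };
    return r;
}
```

The point of the packed format is that each of these operations does two calculations for the price of one instruction.
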
You may wonder (you do, don't you?) why dividing is not included when it is required for perspective in 3D graphics. Well, the reciprocal, 1/x, can be used instead. When we want to divide y by x, instead of performing y/x, we find 1/x and then multiply it by y, giving y * (1/x). This is more efficient because a reciprocal is easier to calculate than a divide, and it is even more useful when we wish to divide several values by the same number, since the reciprocal only has to be calculated once.

E.g. if we want to find the position on the screen (x', y') of a point (x, y, z) we would calculate:
  x' = x * (1/z)
  y' = y * (1/z)
And we would only need to calculate 1/z once.
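
A minimal C sketch of the perspective calculation above: one reciprocal, then two multiplies. The point3 and point2 structs are hypothetical, z is assumed to be non-zero (the point is in front of the viewer), and the scaling and centring needed to turn the result into actual pixel coordinates are left out. In C the reciprocal is still written as 1.0f / z; on a 3DNow! chip this is the step pfrcp would approximate.

```c
/* Hypothetical types for a 3D point and its 2D screen position. */
typedef struct { float x, y, z; } point3;
typedef struct { float x, y; } point2;

/* x' = x * (1/z), y' = y * (1/z), with 1/z calculated only once. */
point2 project(point3 p) {
    float inv_z = 1.0f / p.z;                   /* the one reciprocal */
    point2 s = { p.x * inv_z, p.y * inv_z };    /* two cheap multiplies */
    return s;
}
```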

In addition, the programmer has more control over the accuracy because, instead of a single reciprocal instruction, the calculation is split into 3 instructions: pfrcp, pfrcpit1 and pfrcpit2.

The first instruction, pfrcp, finds the result correct to 14 bits of accuracy, which is more than enough if we are drawing on a screen up to 1024 pixels wide (needing only 10 bits). We can stop there if that is accurate enough, but if the result is needed for another calculation, or if we need greater accuracy (say, for anti-aliasing), we can apply pfrcpit1 and pfrcpit2 to get the full 24 bits of single precision accuracy.
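
Why can a rough 14-bit answer be turned into a full 24-bit one so cheaply? The usual technique is a Newton-Raphson step, x1 = x0 * (2 - d * x0), which roughly doubles the number of correct bits. The C sketch below assumes that is what pfrcpit1 and pfrcpit2 carry out between them. The function names are hypothetical, and rcp_estimate is only a stand-in for pfrcp: it computes 1/d exactly and then throws away the low 9 of the 23 fraction bits. The real hardware uses its own method. d is assumed to be finite and non-zero.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for pfrcp: a reciprocal accurate to roughly 14 bits, made by
   clearing the low 9 fraction bits of an exact 1/d. */
float rcp_estimate(float d) {
    float x = 1.0f / d;
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= ~(uint32_t)0x1FF;   /* keep sign, exponent and top 14 fraction bits */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* One Newton-Raphson step for 1/d. If x0 has relative error e, the result
   has relative error about e * e, so ~14 correct bits become ~28, which is
   more than single precision's 24. */
float rcp_refine(float d, float x0) {
    return x0 * (2.0f - d * x0);
}
```

The refinement costs two multiplies and a subtract, which is why it makes sense to leave it optional: a program drawing to a 1024-pixel-wide screen can simply skip it.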

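The square root instructions (strictly, reciprocal square root: pfrsqrt and pfrsqit1), frequently used to calculate dynamic lighting values, work in much the same way: a fast estimate comes first, and it can be refined if more accuracy is needed.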
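
A typical lighting use is scaling a light direction vector to unit length, which takes one reciprocal square root and three multiplies instead of a square root and three divides. The C sketch below shows that, with the Newton-Raphson step for 1/sqrt(x), y1 = y * (1.5 - 0.5 * x * y * y), playing the role of the refinement instructions. All names are hypothetical. The starting estimate is NOT how pfrsqrt works. It is the well-known bit trick with the magic constant 0x5f3759df, used here only because it needs no maths library. Because that estimate is much rougher than pfrsqrt's, two refinement steps are applied.

```c
#include <stdint.h>
#include <string.h>

typedef struct { float x, y, z; } vec3;

/* Rough 1/sqrt(x) for x > 0 using the 0x5f3759df bit trick (a swapped-in
   substitute for pfrsqrt, with an error of a few percent). */
float rsqrt_seed(float x) {
    uint32_t i;
    memcpy(&i, &x, sizeof i);
    i = 0x5f3759dfu - (i >> 1);
    float y;
    memcpy(&y, &i, sizeof y);
    return y;
}

/* One Newton-Raphson step towards 1/sqrt(x); like the reciprocal case,
   it roughly doubles the number of correct bits. */
float rsqrt_refine(float x, float y) {
    return y * (1.5f - 0.5f * x * y * y);
}

/* Scale a non-zero vector to unit length: one reciprocal square root,
   then three multiplies. */
vec3 normalize(vec3 v) {
    float len2 = v.x * v.x + v.y * v.y + v.z * v.z;
    float r = rsqrt_refine(len2, rsqrt_refine(len2, rsqrt_seed(len2)));
    vec3 n = { v.x * r, v.y * r, v.z * r };
    return n;
}
```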

Apart from a couple of extra MMX instructions (useful but not very important), 3DNow! adds instructions that allow data to be requested before it is used, so that the processor spends less time waiting for data from memory when it actually needs it. In applications with lots of data, e.g. huge sound files, large images or many textures, this can make a very noticeable difference. Part of the problem with MMX is that it added the ability to process much more information but didn't help at all in getting the information to and from the processor.

These new instructions are prefetch, which simply instructs the memory hierarchy to move some data into the processor cache, and prefetchw, which does the same but warns that the data will be written to.

Standard Memory Addressing

  [does some other stuff]
  load value       (fetching data from main memory: processor blocked)
  process value

Effects Of Prefetching

  prefetch value   (fetching data from main memory starts)
  [does some other stuff]
  load value       (value ready)
  process value

You can see how the delay of loading values from main memory into the processor cache can be hidden by prefetching.
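
In C, the same idea can be expressed with __builtin_prefetch, a GCC/Clang extension whose second argument of 1 asks for a prefetch in preparation for a write (the prefetchw idea) and 0 for a read. It is only a hint. The compiler chooses the actual instruction for the target CPU, which may or may not be the 3DNow! prefetch or prefetchw. The function name and the prefetch distance below are hypothetical; the best distance depends on the machine and has to be found by measurement.

```c
#include <stddef.h>

/* Hypothetical tuning value: how many floats ahead to request. */
#define PREFETCH_AHEAD 64

/* Scale a large buffer of samples (e.g. a sound file) in place, asking
   for data a little ahead of the loop so that it is, with luck, already
   in the cache when the loop reaches it. */
void scale_samples(float *buf, size_t n, float gain) {
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&buf[i + PREFETCH_AHEAD], 1);  /* 1 = will write */
        buf[i] *= gain;
    }
}
```

For simplicity this issues a prefetch on every iteration; since one request fetches a whole cache line, real code would usually issue one per line.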

The AMD K6-2 chip, as mentioned, has two separate 3DNow! units: one deals with multiplies and the second and third steps of reciprocals and square roots, and the other does everything else, such as adds, subtracts and integer/float conversions.

This means, though, that things can get slower if you want to multiply twice in a row, or add twice in a row, because only one unit can handle each kind of instruction. AMD have stated that the K6-2 can perform 1.2 GFLOPS (Giga, or billion, FLoating point OPerations per Second). They work this out by assuming that each instruction operates on two values, that two instructions are executed at a time, and that the clock speed is 300 MHz, so 2 x 2 x 300M = 1200 MFLOPS, or 1.2 GFLOPS.
This peak performance will only be possible when the programmer has carefully written the code to mix the instructions just right. Thankfully, in practice most 3D matrix maths involves lots of adding and multiplying, so this should be fairly common. The doubling of speed when running Quake is a strong testament to the power of 3DNow!
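
The effect of instruction mix can be shown with a deliberately simplified C model of the two units, alongside the peak-rate arithmetic. The assumptions are mine, not a description of how the K6-2 really schedules: 'M' marks an instruction for the multiply unit and 'A' one for the other unit, instructions issue in order, two neighbouring instructions share a cycle only if they need different units, and every instruction takes one cycle. Latencies and memory delays are ignored. The function names are hypothetical.

```c
#include <stddef.h>

/* Cycles needed for a string of 'M'/'A' instructions under the simple
   pairing rules described above. */
unsigned cycles_needed(const char *ops) {
    unsigned cycles = 0;
    size_t i = 0;
    while (ops[i] != '\0') {
        if (ops[i + 1] != '\0' && ops[i] != ops[i + 1])
            i += 2;   /* one multiply-unit and one other-unit op: issue together */
        else
            i += 1;   /* both need the same unit (or last op): issue alone */
        cycles++;
    }
    return cycles;
}

/* Peak rate from AMD's figures: 2 values per instruction,
   2 instructions per cycle, clock_hz cycles per second. */
double peak_flops(double clock_hz) {
    return 2.0 * 2.0 * clock_hz;
}
```

Under this model "MAMA" takes 2 cycles but "MMAA" takes 3, even though both contain the same instructions, and peak_flops(300e6) gives AMD's 1.2 billion.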

Last Updated 01-01-2001

All trademarks used are properties of their respective owners.
Copyright © 1998-2000 Adrian Wong. All rights reserved.

 
Visit the new Tech ARP @ http://www.techarp.com/ !