Almost everyone must have heard about MMX one way or another, and the only thing most
people know is that they must have it or their machine will be inferior somehow. So what's
behind it, what's it useful for, and what's happening now that all new processors have MMX?
I'll be taking you through these questions and hopefully more.
Introduction To SIMD And MMX
The main idea behind MMX and all of the new 'Multimedia Instruction Sets' is that a
lot of data of the same type is processed, e.g. pixels in images, samples of sound or
polygons in a 3D environment. The software normally performs the same action on all of
these things when it's processing them, e.g. changing a color, adding an echo or
transforming vertices for display on screen. Generally it's faster to deal with many of
these things at once than to do them one after the other, and this is what MMX and the
others allow.
Over the past 30 years many supercomputers have used a technique called vector
processing to handle large amounts of data at the same time. This means taking an
array of items and performing an operation on all of the elements very efficiently. To do
this, many had registers that stored not just one value but many, so perhaps 16 values
could be operated on instead of just one. This is a very effective technique for large
scale simulations, and a lot of time was invested in making compilers that could recognize
where it was safe to do this in programs, and in producing memory systems which could
supply data fast enough.
Nowadays, as has happened many times, our personal computers are only just catching up
with the designs of the supercomputers from decades ago. This approach is generally called
SIMD, meaning Single Instruction, Multiple Data. One instruction is used to
operate on several separate values at the same time.
To exploit this idea, Intel took the 8-entry stack of 80-bit floating point registers
of the x86 and made instructions which operate on them as eight 64-bit registers.
Furthermore, for most instructions these registers are split into smaller parts, so a
register can represent the following things:
| 16 bits | 16 bits | 16 bits | 16 bits |
| 8 bits | 8 bits | 8 bits | 8 bits | 8 bits | 8 bits | 8 bits | 8 bits |
An instruction such as paddw (packed add words) treats each of two registers as
four 16-bit words and adds them pair by pair, producing four sums in a single operation.
This lets us add four 16-bit values as fast as we could normally add one. It's easy
at this stage to see how MMX can speed up many functions - in theory, some image
processing operations could run 8 times faster.
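To make the packed add concrete, here is a minimal sketch in portable C that models what paddw does to a 64-bit register. It is a scalar emulation for illustration only, not real MMX code, and the names pack_words and paddw_model are hypothetical helpers invented for this example.

```c
#include <stdint.h>

/* Hypothetical helper: pack four 16-bit words into one 64-bit value,
   with w0 in the lowest 16 bits. */
uint64_t pack_words(uint16_t w0, uint16_t w1, uint16_t w2, uint16_t w3)
{
    return (uint64_t)w0 | ((uint64_t)w1 << 16) |
           ((uint64_t)w2 << 32) | ((uint64_t)w3 << 48);
}

/* Scalar model of paddw: add the four 16-bit lanes independently.
   Each lane wraps around on overflow and never carries into its neighbour. */
uint64_t paddw_model(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint16_t x = (uint16_t)(a >> (16 * lane));
        uint16_t y = (uint16_t)(b >> (16 * lane));
        uint16_t sum = (uint16_t)(x + y);   /* wraps modulo 65536 */
        result |= (uint64_t)sum << (16 * lane);
    }
    return result;
}
```

The loop here takes four steps, of course; the point of the real instruction is that the hardware does all four lane additions at once.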
The second advantage is saturating arithmetic - it doesn't involve getting wet,
but it does allow us to add two values and know that they will not wrap around, e.g. if
we want to add two pixel values. As they are 8 bit numbers the maximum value they can hold
is 255, so if we add two of these values and the sum comes to more than 255, the result
wraps around to the sum minus 256.
E.g. 100 + 200 would be 44, because 100 + 200 = 300, and 300 - 256 = 44.
If we are adding pixel values, we probably don't want this - we'd prefer if they just got
as bright as they could and stayed at that value, so if we add 250 and 50 we'd want 255,
our maximum brightness. This is where saturating arithmetic comes in. If we use paddusb
(packed add unsigned bytes with saturation), we can add 8 pixel values to 8 others and know
that all the pixels have stayed in the range 0 to 255. In a normal program, we'd have to
add the values, check whether they were over 255, and then change them, which would take
much more time than just doing a simple addition.
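The difference between wrapping and saturating addition can be sketched in plain C as follows. This is a scalar model of the behaviour described above, not actual MMX code; add_wrap, add_saturate and paddusb_model are names made up for this illustration.

```c
#include <stdint.h>

/* Ordinary 8-bit addition: the result wraps around modulo 256. */
uint8_t add_wrap(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Saturating 8-bit addition: the result is clamped to 255. */
uint8_t add_saturate(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return (uint8_t)(sum > 255 ? 255 : sum);
}

/* Scalar model of paddusb: saturating add of eight pixel values.
   The real instruction does all eight lanes in one operation. */
void paddusb_model(const uint8_t a[8], const uint8_t b[8], uint8_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = add_saturate(a[i], b[i]);
}
```

With these definitions, add_wrap(100, 200) gives 44 while add_saturate(100, 200) gives 255. The compare-and-clamp inside add_saturate is exactly the extra work a normal program has to do for every pixel.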
Lastly, MMX can do basic shuffling of the data in its registers with the pack and
unpack instructions. These allow the processor to expand or contract parts of registers so
that they fit the different MMX register formats, e.g. 16 bit word -> 32 bit
dword. What are these instructions good for? Well, they are mostly used to change the
format of data, but they can also be used to quickly transpose (flip along the diagonal
axis) a 2D block of pixels or other values.
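As a rough illustration of the format changes involved, the sketch below models widening bytes to words (what unpacking against a register of zeros achieves) and narrowing words back to bytes with unsigned saturation (as packuswb does). It uses plain C arrays in place of registers, and the function names are hypothetical.

```c
#include <stdint.h>

/* Clamp a signed 16-bit value into the unsigned byte range 0..255. */
uint8_t clamp_to_byte(int16_t v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Model of widening with unpack: interleaving the low four bytes of a
   register with zero bytes turns them into four 16-bit words. */
void unpack_low_bytes_to_words(const uint8_t src[8], uint16_t dst[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] = src[i];
}

/* Model of packuswb: narrow two groups of four signed words into eight
   unsigned bytes, saturating anything outside 0..255. */
void pack_words_to_bytes(const int16_t lo[4], const int16_t hi[4],
                         uint8_t dst[8])
{
    for (int i = 0; i < 4; i++) {
        dst[i]     = clamp_to_byte(lo[i]);
        dst[i + 4] = clamp_to_byte(hi[i]);
    }
}
```

Widening first and narrowing afterwards is the usual pattern: it gives intermediate results room to grow beyond 8 bits before they are squeezed back into pixel format.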
So those are the advantages of MMX. What's wrong with it?
The Drawbacks
Well firstly, although the supercomputers had advanced compilers that could work out
when to use vector processing, personal computers don't, so all the work has to be done by
programmers writing the code in assembly, using instructions like the ones I have
mentioned. I can tell you for sure this takes much longer than writing in C or C++, so
this is one good reason why not all applications use MMX even if it would help.
Next, not all operations work on all types of data, e.g. you can't multiply 8 bit
values together. This means there is more overhead converting things and making sure all
the values are handled correctly. The memory system is still the same too, so the
processor can still only move the same amount of data to and from the screen in a given
time.
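To give a feel for the conversion overhead, here is a sketch of scaling eight pixels by an 8-bit brightness factor when only 16-bit multiplication is available. It is a scalar model in portable C under the assumption that the result wanted is (pixel * factor) / 256; the function name scale_pixels is invented for this example, and real MMX code would need two registers to hold the eight widened values.

```c
#include <stdint.h>

/* Scale eight 8-bit pixels by an 8-bit factor (255 is nearly full
   brightness), going through 16-bit intermediates as MMX code must. */
void scale_pixels(const uint8_t pixels[8], uint8_t factor, uint8_t out[8])
{
    uint16_t wide[8];

    /* Step 1: widen each byte to a 16-bit word (the unpack step). */
    for (int i = 0; i < 8; i++)
        wide[i] = pixels[i];

    /* Step 2: multiply in 16-bit lanes. 255 * 255 = 65025 still fits
       in an unsigned 16-bit word. */
    for (int i = 0; i < 8; i++)
        wide[i] = (uint16_t)(wide[i] * factor);

    /* Step 3: shift right by 8 to divide by 256, then narrow back to
       bytes (the pack step). */
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(wide[i] >> 8);
}
```

Only step 2 does the useful work; steps 1 and 3 are the format conversion that eats into the theoretical speedup.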
Also, because MMX temporarily hijacks the floating point registers, they can't be used
to do floating point operations at the same time (safely at least). And unfortunately, MMX
seems to vandalize the registers so much that it can take tens of cycles for the
processor to reset them when it has to do floating point instructions again. This means
you can't just use floating point math when you need it then do other things right away
with MMX. It doesn't mean you can't do lots of floating point stuff for a while and then
change over and do lots of MMX things - that's quite a good approach. I think it's how the
Unreal software renderer operates - it probably works out where all the polygons should be
with standard floating point operations and then switches to MMX code to do texturing and
transparency effects.
Finally, and most importantly for all the people who play 3D games or design 3D
objects, MMX only works on integer values, with a preference for 16 bit values at that. A
polygon based environment normally uses floating point values of at least 32 bits in size
so that it can represent everything from the tiniest details right up to things the size
of the world (like the background at the horizon). It's not possible to represent all this
and retain accuracy through all of the calculations if you are only using 16 bits.
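A small sketch shows why 16 bits run out so quickly. It assumes a signed 16-bit fixed-point format with 8 fractional bits, which is just one possible split chosen for illustration; to_fixed and from_fixed are hypothetical helpers, not part of any MMX API.

```c
#include <stdint.h>

#define FRAC_BITS 8   /* assumed split: 8 integer bits, 8 fractional bits */

/* Convert x to 16-bit fixed point, rounding to the nearest step.
   Returns 1 on success, or 0 if x is outside the representable range. */
int to_fixed(double x, int16_t *out)
{
    double scaled = x * (1 << FRAC_BITS);
    if (scaled < INT16_MIN || scaled > INT16_MAX)
        return 0;
    *out = (int16_t)(scaled < 0 ? scaled - 0.5 : scaled + 0.5);
    return 1;
}

/* Convert a fixed-point value back to a double. */
double from_fixed(int16_t v)
{
    return (double)v / (1 << FRAC_BITS);
}
```

In this format the smallest step is 1/256 and the largest value is just under 128, so a detail of size 0.001 rounds to zero while a coordinate of 1000 does not fit at all. Moving the split only trades one problem for the other.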
Moving Beyond MMX
So, now that everyone has been convinced to buy a processor that runs Photoshop slightly
faster, what next?
Well, as mentioned, MMX is not great for 3D graphics, especially when you have a
dedicated 3D card to draw the pixels - but this is why people want really fast machines
for their home!
The manufacturers' response to this is the second generation of multimedia
instruction sets, which improve on the previous efforts and, most importantly, add the
ability to operate on multiple single precision floating point values. Now we can seriously
think about speeding up 3D and sending out polygons as fast as the new 3D accelerators
(claim to) deal with them.
Although there has been some support for this kind of thing in MIPS and SPARC
processors for some time, only now are we finding personal computer chips with this
feature.