MMX And Beyond







Almost everyone must have heard about MMX one way or another, and the only thing most people know is that they must have it or their machine will somehow be inferior. So what's behind it, what is it useful for, and what's happening now that all new processors have MMX? I'll be taking you through these questions and hopefully more.

 

Introduction To SIMD And MMX

The main idea behind MMX and all of the new 'Multimedia Instruction Sets' is that a lot of data of the same type is processed, e.g. pixels in images, samples of sound or polygons in a 3D environment. The software normally performs the same action on all of these things when it's processing them, e.g. changing a color, adding an echo or transforming vertices for display on screen. Generally it's faster to deal with many of these things at once than to do them one after the other, and this is what MMX and the others allow.

In the past 30 years many supercomputers have used a technique for processing large amounts of data at the same time called vector processing. This means taking an array of items and performing an operation on all of the elements very efficiently. To do this, many had registers that stored not just one value but many, so perhaps 16 values could be operated on instead of just one. This is a very effective technique for large scale simulations and a lot of time was invested in making compilers that could recognize where it was safe to do this in programs and also producing memory systems which could supply data fast enough.

Nowadays, as has happened many times, our personal computers are only just catching up with the designs of the supercomputers from decades ago. This approach is generally called SIMD, meaning Single Instruction, Multiple Data. One instruction is used to operate on several separate values at the same time.

To exploit this idea, Intel took the 8 entry stack of 80 bit floating point registers of the x86 and added instructions which operate on them as eight 64 bit registers (MM0 to MM7, which occupy the low 64 bits of each floating point register). Furthermore, for most instructions these registers are split into smaller parts, so a register can represent the following things:

One 64 bit value:     64 bits
Two 32 bit values:    32 bits | 32 bits
Four 16 bit values:   16 bits | 16 bits | 16 bits | 16 bits
Eight 8 bit values:   8 bits | 8 bits | 8 bits | 8 bits | 8 bits | 8 bits | 8 bits | 8 bits

An instruction such as paddw (packed add words) would operate like so:

1 2 3 4
+
0 1 2 3
=
1 3 5 7

This lets us add four 16-bit values as fast as we could normally add one. It's easy to see at this stage how MMX can speed up many functions - in theory, some image processing operations on 8 bit values could run 8 times faster.
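To make the lane-by-lane behaviour concrete, here is a minimal sketch in portable C. It is a scalar model of what paddw does, not actual MMX code, and the names paddw_model and pack_words are hypothetical helpers invented for this illustration.

```c
#include <stdint.h>

/* Scalar model of paddw: treat a 64 bit value as four independent
   16 bit lanes and add them lane by lane. Each lane wraps on overflow
   and no carry crosses into the neighbouring lane. */
uint64_t paddw_model(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint16_t x = (uint16_t)(a >> (16 * lane));
        uint16_t y = (uint16_t)(b >> (16 * lane));
        uint16_t sum = (uint16_t)(x + y);          /* wraps modulo 65536 */
        result |= (uint64_t)sum << (16 * lane);
    }
    return result;
}

/* Hypothetical helper: pack four words, with w0 in the lowest lane. */
uint64_t pack_words(uint16_t w0, uint16_t w1, uint16_t w2, uint16_t w3)
{
    return (uint64_t)w0 | ((uint64_t)w1 << 16) |
           ((uint64_t)w2 << 32) | ((uint64_t)w3 << 48);
}
```

Calling paddw_model(pack_words(1, 2, 3, 4), pack_words(0, 1, 2, 3)) gives the same value as pack_words(1, 3, 5, 7). The loop is only there to show the semantics: the real instruction does all four additions in one go.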

The second advantage is saturating arithmetic - it doesn't involve getting wet, but it does allow us to add two values knowing that they will not wrap around, e.g. if we want to add two pixel values. As they are 8 bit numbers, the maximum value they can hold is 255, so with ordinary arithmetic, if we add two of these values and the sum comes to more than 255, the result wraps around to the sum minus 256.

E.g. 100 + 200 would be 44, because 100 + 200 = 300, and 300 - 256 = 44.

If we are adding pixel values, we probably don't want this - we'd prefer if they just got as bright as they could and stayed at that value, so if we add 250 and 50 we'd want 255, our maximum brightness. This is where saturating arithmetic comes in. If we use paddusb (packed add of unsigned bytes with saturation), we can add 8 pixel values to 8 others and know that all the pixels have stayed in the range 0 to 255. In a normal program, we'd have to add the values, check whether they were over 255 and then change them, which would take much more time than just doing a simple addition.
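The difference between wrapping and saturating addition can be sketched in a few lines of portable C. Again this is a scalar model rather than real MMX code, and the function names are made up for the illustration.

```c
#include <stdint.h>

/* Ordinary 8 bit addition: the result wraps modulo 256. */
uint8_t add_wrap_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Saturating 8 bit addition: the result is clamped at 255. */
uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)(sum > 255u ? 255u : sum);
}

/* Scalar model of paddusb: eight unsigned byte lanes, each added
   with saturation. The real instruction has no loop and no compare. */
void paddusb_model(const uint8_t a[8], const uint8_t b[8], uint8_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = add_sat_u8(a[i], b[i]);
}
```

With these definitions add_wrap_u8(100, 200) returns 44, while add_sat_u8(100, 200) and add_sat_u8(250, 50) both return 255. The compare-and-clamp inside add_sat_u8 is exactly the extra work a normal program has to do for every pixel.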

Lastly, MMX can do basic shuffling of the data in its registers with the pack and unpack instructions. These allow the processor to expand or contract parts of registers so that they fit the different formats of MMX registers, e.g. 16 bit word -> 32 bit dword. What are these instructions good for? Well, they are mostly used to change the format of data, but they can also be used to quickly transpose (flip along the diagonal axis) a 2D block of pixels or other values.
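As a rough illustration of the word -> dword case, the sketch below models one of the unpack instructions, punpcklwd, in portable C. The function names are hypothetical, and the code only imitates the result of the instruction on a plain 64 bit integer.

```c
#include <stdint.h>

/* Scalar model of punpcklwd: interleave the two low 16 bit words of
   dst and src. Result lanes from low to high: dst0, src0, dst1, src1. */
uint64_t punpcklwd_model(uint64_t dst, uint64_t src)
{
    uint64_t d0 = dst & 0xFFFFu;
    uint64_t d1 = (dst >> 16) & 0xFFFFu;
    uint64_t s0 = src & 0xFFFFu;
    uint64_t s1 = (src >> 16) & 0xFFFFu;
    return d0 | (s0 << 16) | (d1 << 32) | (s1 << 48);
}

/* Interleaving with zero widens the two low unsigned 16 bit words
   into two 32 bit dwords. */
uint64_t widen_low_words(uint64_t value)
{
    return punpcklwd_model(value, 0);
}
```

For example, widen_low_words(0x4444333322221111) yields 0x0000222200001111: the words 0x1111 and 0x2222 each now sit in their own 32 bit slot. Interleaving two registers of real data instead of a zero register is the building block for the transpose trick mentioned above.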

So these are the advantages of MMX. What's wrong with it?

 

The Drawbacks

Well firstly, although the supercomputers had advanced compilers that could work out when to use vector processing, personal computers don't, so all the work has to be done by programmers writing the code in assembly, using instructions like the ones I have mentioned. I can tell you for sure this takes much longer than writing in C or C++, so this is one good reason why not all applications use MMX even if it would help.

Next, not all operations work on all types of data, e.g. you can't multiply 8 bit values together, because the multiply instructions only work on 16 bit words. This means there is more overhead converting things and making sure all the values are handled correctly. The memory system is still the same too, so the processor can still only move the same amount of data to and from the screen in a given time.
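To give a feel for the conversion overhead, here is a small portable C sketch of scaling 8 bit pixels by an 8 bit factor. It only mirrors the steps MMX code would have to take (widen to words, multiply, narrow back); the function names and the choice of a right shift by 8 for the scaling are assumptions made for this example.

```c
#include <stdint.h>

/* Scale one 8 bit pixel by an 8 bit factor, where 256 would mean 1.0.
   The byte is widened to a 16 bit word before multiplying, because
   there is no 8 bit multiply to use. */
uint8_t scale_pixel(uint8_t pixel, uint8_t factor)
{
    uint16_t wide = pixel;                          /* widen: 8 -> 16 bits  */
    uint16_t product = (uint16_t)(wide * factor);   /* at most 255 * 255    */
    return (uint8_t)(product >> 8);                 /* narrow: keep top byte */
}

/* Four pixels at a time, the number of 16 bit words that fit in one
   64 bit register. */
void scale_pixels(const uint8_t in[4], uint8_t factor, uint8_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = scale_pixel(in[i], factor);
}
```

Here scale_pixel(128, 128) returns 64. Note that although a register holds 8 bytes, only 4 pixels can be multiplied per instruction once they have been widened, so the theoretical 8 times speed up is already halved.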

Also, because MMX temporarily hijacks the floating point registers, they can't be used to do floating point operations at the same time (safely at least). Before floating point code can run again, the registers have to be cleared with the emms instruction, and unfortunately MMX seems to vandalize them so much that it takes hundreds of cycles for the processor to reset them. This means you can't just use floating point math when you need it and then do other things right away with MMX. It doesn't mean you can't do lots of floating point stuff for a while and then change over and do lots of MMX things - that's quite a good approach. I think it's how the Unreal software renderer operates - it probably works out where all the polygons should be with standard floating point operations and then switches to MMX code to do texturing and transparency effects.

Finally, and most importantly for all the people who play 3D games or design 3D objects, MMX only works on integer values, with a preference for 16 bit values at that. A polygon based environment normally uses floating point values of at least 32 bits in size so that it can represent everything from the tiniest details right up to things the size of the world (like the background at the horizon). It's not possible to represent all this and retain accuracy through all of the calculations if you are only using 16 bits.
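A quick way to see the problem is to try storing coordinates in a 16 bit fixed point format. The sketch below uses a signed 8.8 layout (8 integer bits, 8 fraction bits); that split is purely an assumption for illustration, as are the function names.

```c
#include <stdint.h>

/* Convert a value to signed 8.8 fixed point, clamping to the range a
   16 bit integer can hold. The 8.8 split is chosen only for this
   example; any other split trades range against detail. */
int16_t to_fixed_8_8(double value)
{
    double scaled = value * 256.0;
    if (scaled > 32767.0)
        return INT16_MAX;
    if (scaled < -32768.0)
        return INT16_MIN;
    return (int16_t)scaled;          /* truncates toward zero */
}

/* Convert back to a floating point value. */
double from_fixed_8_8(int16_t fixed)
{
    return fixed / 256.0;
}
```

With this format to_fixed_8_8(0.001) comes out as 0, so a tiny detail vanishes completely, while to_fixed_8_8(1000.0) clamps to 32767, which is just under 128 when converted back. A 32 bit floating point value handles both ends without any special care.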

 

Moving Beyond MMX

So, now that everyone has been convinced to buy a processor that runs Photoshop slightly faster, what next?

Well, as mentioned, MMX is not great for 3D graphics, especially when you have a dedicated 3D card to draw the pixels - but this is why people want really fast machines for their home!

The manufacturers' response to this is the second generation of multimedia instruction sets, which improve on the previous efforts and, most importantly, add the ability to operate on multiple single precision floating point values at once. Now we can seriously think about speeding up 3D and sending out polygons as fast as the new 3D accelerators (claim to) deal with them.

Although there has been some support for this kind of thing in MIPS and SPARC processors for some time, only now are we finding personal computer chips with this feature.

Last Updated 02-01-2001

All trademarks used are properties of their respective owners.
Copyright © 1998-2000 Adrian Wong. All rights reserved.

 