Emil Mikulic <darkm...@connexus.apana.org.au> wrote:
>Which is the fastest?
>XOR, MOV or INC/DEC
>I figure that XOR should be the fastest because
>bit-twiddling is a lot faster than MOVing a
>part of the memory or a still number to a register.
A quick look in the Turbo Assembler Quick Reference Guide shows that this would
depend on the CPU (86, 286, 386, 486, etc.), the size of the operand (byte, word,
dword), and the type and location of the operand (register, memory, stack-frame,
etc.).
I assume that XOR and MOV refer to clearing a register, so let's consider the
various ways to clear a register:
                ---- clock cycles ----    size
                 486   386   286    86   (bytes)  example
MOV r8,im8         1     2     2     4      2     mov AL,0
MOV r16,im16       1     2     2     4      3     mov AX,0
    no flags altered
XOR r8,r8          1     2     2     3      2     xor AL,AL
XOR r16,r16        1     2     2     3      2     xor AX,AX
    affects O,S,Z,A,P,C
SUB r8,r8          1     2     2     3      2     sub AL,AL
SUB r16,r16        1     2     2     3      2     sub AX,AX
    affects O,S,Z,A,P,C
LEA r16,m          1     2     2    10      4     lea AX,[DS:0]
    no flags altered
I tossed the last one in for the fun of it.
Anyway, the differences you assumed exist only on the 8086. On the 286, 386,
and 486, every instruction in the table takes the same number of clock cycles.
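
To put numbers on that, here is a rough tally for a made-up fragment that zeroes
four 16-bit registers. The choice of registers is only for illustration, and the
byte and clock counts are the nominal figures from the table above, not measured
times.

```asm
; Variant A: XOR (alters O,S,Z,A,P,C)
        xor  AX,AX      ; 2 bytes, 3 clocks on the 8086
        xor  BX,BX      ; 2 bytes, 3 clocks
        xor  CX,CX      ; 2 bytes, 3 clocks
        xor  DX,DX      ; 2 bytes, 3 clocks
                        ; total: 8 bytes, 12 clocks (8086)

; Variant B: MOV (no flags altered)
        mov  AX,0       ; 3 bytes, 4 clocks on the 8086
        mov  BX,0       ; 3 bytes, 4 clocks
        mov  CX,0       ; 3 bytes, 4 clocks
        mov  DX,0       ; 3 bytes, 4 clocks
                        ; total: 12 bytes, 16 clocks (8086)
```

By the same table, both variants come to 8 clocks on the 286 and 386 and to
4 clocks on the 486, so on those CPUs only the 4-byte size difference remains.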
Actual speed is based upon a lot more. The above times assume:
1) The instruction has been pre-fetched, decoded, and is ready for execution.
2) Bus cycles do not require wait states.
3) There are no bus HOLD requests delaying the processor.
4) No exceptions are detected during instruction execution.
5) Memory operands are aligned.
A series of fast executing (fewer than two cycles per opcode byte) instructions
can drain the prefetch queue and increase execution time. With typical
instruction mixes, actual time will be within 5-10% of the sum of the individual
times. Of course the 8k onboard cache of the 486 would tend to limit penalties
for fast executing instructions.
From the above you can see that most of the time you spend worrying about which
instructions execute faster would be better spent on other activities. Even the
1-cycle difference between XOR and MOV on the 8086 can easily be canceled when
they are mixed with too many fast instructions that drain the prefetch queue
(4 bytes on the 8088, 6 bytes on the 8086, ...). Then again, a single interrupt
can upset the delicate balance of things.
So when you see me or others using "xor AL,AL" or "sub AL,AL" to zero a
register, it may simply be from many years of habit, or it could really be an
efficient way to clear the register and the carry flag in one instruction
(i.e. the equivalent of "mov AL,0" followed by "clc").
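
The flag side effect cuts both ways, so here is a small sketch of the two cases.
The instruction sizes for XOR and MOV are from the table above; CLC is a 1-byte
instruction. The label name is hypothetical, and the fragment is meant to be
read, not assembled as a complete program.

```asm
; Clear AL and the carry flag in one instruction (2 bytes):
        xor  AL,AL      ; AL = 0, CF = 0; O,S,Z,A,P are altered too

; The same result for AL and CF using MOV takes two instructions (3 bytes):
        mov  AL,0       ; AL = 0, no flags altered
        clc             ; CF = 0

; When flags from an earlier instruction are still needed, MOV is the safe one:
        cmp  BX,CX      ; sets the flags tested by the jump below
        mov  AL,0       ; flags from CMP survive
        jb   is_below   ; an XOR in place of the MOV would have cleared CF,
                        ; so this jump would never be taken
is_below:
```

So the habit is harmless when the flags are dead anyway, but a MOV is the one
to reach for between a compare and the jump that depends on it.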
>Correct me if I'm wrong.
>Please, reply by MAIL as I don't visit the newsgroup
>often.
I think not. Normally I would have cc'd a copy, but the selfish nature of this
request got the better of me today. Must be a touch of indigestion. ;-)
....red