
32bit Move() and Fillchar() replacement

In article <Pine.SUN.3.91.950906103313.16872F-100000@verdi>,
Kvantti Piikkisika  <jsha...@nmsu.edu> wrote:

Quote
>On 5 Sep 1995, Trixter / Hornet wrote:
>Kvantti also wrote:
>> >Simple enough... :)  Anyway, this was more info than what you had
>> >requested, but oh well... (Oh, I recommend that you rewrite move() and
>> >fillchar() like I did, since I made it switch to 16-bit math instead of
>> >8-bit :)
>> *That* is a smart move.  :)
>If anyone wants to see my "improved" move() and fillchar() which I put
>into a unit which I almost always link in I'd be happy to post it.  (I
>know that TCA has an even better one though, in case he'd like to post
>his version as well)

Oh, fine.  Now that the world knows, I suppose I have to post it ;)

This is a 32bit move() replacement.  Just throw the procedure in at the
top of your program, and all your move() calls will work a little faster
(about 4 times faster for counts above 128 bytes).

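procedure move(var Source,Dest; Count:word); assembler;
{ Needs a 386 or better.  Copies forward only, so overlapping blocks }
{ where Dest lies above Source are not handled.                       }
asm
        push    ds                 { lds below overwrites ds }
        les     di,Dest            { es:di -> destination }
        lds     si,Source          { ds:si -> source }
        mov     cx,count
        mov     bx,cx              { keep the byte count for the tail }
        shr     cx,2               { cx = number of dwords }
        db 66h; rep movsw          { 66h prefix turns this into rep movsd }
        and     bx,3               { 0..3 leftover bytes }
        mov     cx,bx
        rep     movsb
        pop     ds
end;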
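
For anyone who wants to check the arithmetic outside of Turbo Pascal, here is a rough C sketch of the same split: count div 4 dword-sized chunks first, then the count mod 4 leftover bytes.  The name move32 is made up for illustration and is not from the post, and a real compiler may well turn this into something quite different from the asm above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy count bytes forward: dword-sized chunks first, then the tail.
   Like the asm version, it does not handle overlapping blocks. */
void move32(const void *source, void *dest, size_t count)
{
    const unsigned char *s = source;
    unsigned char *d = dest;
    size_t dwords = count >> 2;   /* same as shr cx,2 */
    size_t tail   = count & 3;    /* same as and bx,3 */

    while (dwords--) {
        uint32_t chunk;
        memcpy(&chunk, s, sizeof chunk);  /* stands in for one movsd */
        memcpy(d, &chunk, sizeof chunk);
        s += 4;
        d += 4;
    }
    while (tail--)
        *d++ = *s++;                      /* stands in for movsb */
}
```

With a count of 11, for example, the first loop runs twice (8 bytes) and the second loop copies the remaining 3 bytes.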

The fillchar() replacement isn't as nice, since with fillchar you can do
things like: fillchar(foo,128,'H'); or fillchar(foo,128,TRUE);  
That makes conversions more difficult.  Still, for those of you who don't
do that:

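procedure fillchar(var X; Count:word; Value:byte); assembler;
{ Needs a 386 or better. }
asm
        les     di,X               { es:di -> block to fill }
        mov     cx,count
        mov     al,value
        mov     ah,al              { ax = value in both bytes }
        mov     bx,ax
        db 66h; shl ax,16          { 66h prefix turns this into shl eax,16 }
        mov     ax,bx              { eax = value in all four bytes }
        mov     bx,cx              { keep the byte count for the tail }
        shr     cx,2               { cx = number of dwords }
        db 66h; rep stosw          { 66h prefix turns this into rep stosd }
        and     bx,3               { 0..3 leftover bytes }
        mov     cx,bx
        rep     stosb
end;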
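
The byte-replication step is the only part that differs from move(), so here is a small C sketch of it.  The name fill32 is hypothetical, and multiplying by 0x01010101 is just one convenient way to get the same pattern that the mov ah,al / shl eax,16 sequence builds.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill count bytes with value: whole dwords first, then the tail. */
void fill32(void *x, size_t count, unsigned char value)
{
    unsigned char *d = x;
    /* Put value into all four bytes of a dword. */
    uint32_t pattern = (uint32_t)value * 0x01010101u;
    size_t dwords = count >> 2;   /* same as shr cx,2 */
    size_t tail   = count & 3;    /* same as and bx,3 */

    while (dwords--) {
        memcpy(d, &pattern, sizeof pattern);  /* stands in for one stosd */
        d += 4;
    }
    while (tail--)
        *d++ = value;                         /* stands in for stosb */
}
```

Filling 7 bytes with 'H' writes one dword of 'HHHH' and then three single bytes.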

To get around the character or boolean types, do ord('H') or ord(TRUE).
(It all compiles to the same code).

This procedure will only increase the speed of fillchar() calls when the
Count is above 128 or so.  It will also fillchar to pointers faster than
arrays (since pointers are aligned, and arrays are [usually] not).  I
just wrote the fillchar() procedure, so it is not as optimized as it
could be.

Quote
>> (Trixter / Hornet)
>Quantum Porcupine

--TCA of NewOrder
newor...@carina.unm.edu
http://www.nmt.edu/~surface

Gotta love those names.

 

Re:32bit Move() and Fillchar() replacement


On 7 Sep 1995, NewOrder Demo Group wrote:

Quote
> In article <Pine.SUN.3.91.950907010140.22861C-100000@verdi>,
> Kvantti Piikkisika  <jsha...@nmsu.edu> wrote:
> >On 6 Sep 1995, NewOrder Demo Group wrote:

> [snip]

> >I remember yours looking a little different than that, maybe I'm thinking
> >of fillchar (later on)... but not to be picky, since dx isn't used,
> >couldn't you use mov dx, ds and mov ds, dx instead of push and pop
> >respectively?

> It takes the same amount of time either way, unless you are in 32bit PM,
> in which case pushing DS actually pushes a doubleword (to keep the
> stack aligned).  Then it would be faster.  Although, I suppose it would
> have been better if I had used dx to store ds (I just wasn't thinking).

Well, I just noticed that dx was never used in your code, and there were
no muls or divs to wipe it out or anything... :)

Quote
> >Hmmm... you must have rewritten both of them since I last saw them, as I
> >remember something utilizing adc in one of those... oh well, these look
> >more logical anyway :)

> Actually, I don't use either one [very much].  I had one place in Wally
> that uses move, and no places that use fillchar (although I did throw the
> new move() in just in case I use it more often).  As for the lack of ADC,
> I coded them about 1 minute before the post (to test them out).  I'd been
> optimizing code for the 486 lately and so I was stuck in that 'mode' of
> thinking.  To get maximum speed out of a pentium, I would have done some
> different things.  To get absolute maximum speed (for 32bit PM), there
> are some other neat tricks.

Oh I see... :)


Quote
> For example:

> lea ebx,[edi+ecx]
> mov edx,[ebx]
> shr ecx,2
> inc ecx
> rep movsd
> mov [ebx],edx

> works great in 32bit mode, but horribly in 16bit mode.  Linus (of Linux fame)
> has said that using all the registers in an unrolled pseudo loop runs
> fastest on the pentium, but I'd hate to handle the overhead associated
> with saving all the regs and setting that up.  Unless you are moving
> megabytes of memory around (which you shouldn't be), then I don't see
> why anyone would use it (other than Linus :).

Well, in 16-bit mode the processor wouldn't want to handle e?x registers
you know :)

Quote
> >How about I put up my fillchar as well, just for comparison's sake (it's,
> >once again, only 16-bit, but it seems to work just fine anyway :)

> >procedure FillChar (var dest; count : word; value : byte); assembler;

> >asm
> >  mov bx, count
> >  mov cx, bx
> >  shr cx, 1
>   ^^^^^^^^^^^^^^this could be a little faster by doing:
>    shr cx,33

That doesn't make any sense whatsoever.  How would you be ABLE to shift
right 33 bits? Or does the processor do carryovers from bx or something?
(if it does then I wasn't aware of this fact)...


Quote

> >  mov al, value
> >  mov ah, al

> >  les di, dest
> >  rep stosw

> >  and bx, 1
> >  jz @NoStosB
> >  stosb
> > @NoStosB:
> >end;

> We now return you to your normally scheduled Pascal coding questions.

Yeah, this is c.l.pascal.borland, isn't it? :)  oh well, the asm
statements ARE part of the compiler you know... :)  (And I've noticed
a couple of BASIC questions in passing lately which DEFINITELY don't
belong here... or I could have, once again, misread something :)

Quote
> --TCA of NewOrder
> newor...@carina.unm.edu
> http://www.nmt.edu/~surface

-----------------\------------------------------------------------------------
Quantum Porcupine \-\  People kept on telling me to get on with it and write
a.k.a. Joshua Shagam > my damn .sig file.  Well I finally got around to it
jsha...@nmsu.edu /--/  since I finally got SIG of it all.
----------------/-------------------------------------------------------------
  Missing Link / Puuttuva Lenkki

Re:32bit Move() and Fillchar() replacement


On 6 Sep 1995, NewOrder Demo Group wrote:

Quote
> In article <Pine.SUN.3.91.950906103313.16872F-100000@verdi>,
> Kvantti Piikkisika  <jsha...@nmsu.edu> wrote:
> >On 5 Sep 1995, Trixter / Hornet wrote:
> >Kvantti also wrote:
> >> >Simple enough... :)  Anyway, this was more info than what you had
> >> >requested, but oh well... (Oh, I recommend that you rewrite move() and
> >> >fillchar() like I did, since I made it switch to 16-bit math instead of
> >> >8-bit :)
> >> *That* is a smart move.  :)
> >If anyone wants to see my "improved" move() and fillchar() which I put
> >into a unit which I almost always link in I'd be happy to post it.  (I
> >know that TCA has an even better one though, in case he'd like to post
> >his version as well)

> Oh, fine.  Now that the world knows, I suppose I have to post it ;)

Thank you for so graciously accepting the terms before anyone wanted me
to post my much more elementary 16-bit version as below:

procedure move (var src, dest; count : word); assembler;

asm
  push ds
    mov bx, count
    mov cx, bx
    shr cx, 1

    les di, dest    { movsw copies from ds:si to es:di }
    lds si, src
    rep movsw

    and bx, 1
    jz @NoMovsB
    movsb
   @NoMovsB:

  pop ds
end;


Quote
> This is a 32bit move() replacement.  Just throw the procedure in at the
> top of your program, and all your move() calls will work a little faster
> (about 4 times faster for counts above 128 bytes).

> procedure move(var Source,Dest; Count:word); assembler;
> asm
>    push    ds
>    les     di,Dest
>    lds     si,Source
>    mov     cx,count
>    mov     bx,cx
>    shr     cx,2
>    db 66h; rep movsw
>    and     bx,3
>    mov     cx,bx
>    rep     movsb
>    pop     ds
> end;

I remember yours looking a little different than that, maybe I'm thinking
of fillchar (later on)... but not to be picky, since dx isn't used,
couldn't you use mov dx, ds and mov ds, dx instead of push and pop
respectively?


Quote
> The fillchar() replacement isn't as nice, since with fillchar you can do
> things like: fillchar(foo,128,'H'); or fillchar(foo,128,TRUE);  
> That makes conversions more difficult.  Still, for those of you who don't
> do that:

> procedure fillchar(var X; Count:word; Value:byte); assembler;
> asm
>    les     di,X
>    mov     cx,count
>    mov     al,value
>    mov     ah,al
>    mov     bx,ax
>    db 66h; shl ax,16
>    mov     ax,bx
>    mov     bx,cx
>    shr     cx,2
>    db 66h; rep stosw
>    and     bx,3
>    mov     cx,bx
>    rep     stosb
> end;

Hmmm... you must have rewritten both of them since I last saw them, as I
remember something utilizing adc in one of those... oh well, these look
more logical anyway :)

Quote
> To get around the character or boolean types, do ord('H') or ord(TRUE).
> (It all compiles to the same code).

> This procedure will only increase the speed of fillchar() calls when the
> Count is above 128 or so.  It will also fillchar to pointers faster than
> arrays (since pointers are aligned, and arrays are [usually] not).  I
> just wrote the fillchar() procedure, so it is not as optimized as it
> could be.

How about I put up my fillchar as well, just for comparison's sake (it's,
once again, only 16-bit, but it seems to work just fine anyway :)

procedure FillChar (var dest; count : word; value : byte); assembler;

asm
  mov bx, count
  mov cx, bx
  shr cx, 1

  mov al, value
  mov ah, al

  les di, dest
  rep stosw

  and bx, 1
  jz @NoStosB
  stosb
 @NoStosB:
end;

In other words it's my move() only with ax = 257*value and, well, a fill
instead of a move. :)

Quote
> >> (Trixter / Hornet)
> >Quantum Porcupine

> --TCA of NewOrder
> newor...@carina.unm.edu
> http://www.nmt.edu/~surface

> Gotta love those names.

And I doubt any one of us has any idea where the hell we got them from. :)

-----------------\------------------------------------------------------------
Quantum Porcupine \-\  People kept on telling me to get on with it and write
a.k.a. Joshua Shagam > my damn .sig file.  Well I finally got around to it
jsha...@nmsu.edu /--/  since I finally got SIG of it all.
----------------/-------------------------------------------------------------
  Missing Link / Puuttuva Lenkki

Re:32bit Move() and Fillchar() replacement


In article <Pine.SUN.3.91.950908011838.21088C-100000@verdi>,
Kvantti Piikkisika  <jsha...@nmsu.edu> wrote:

Quote
>Wow, and now I see why... :) but what about the 64-bit regs on the
>Pentium? (not that I've seen any Pentium compilers yet :)

There are no real 64-bit accessible registers on the pentium.  There
are a few internal 64-bit regs, but they get dumped to edx:eax.

--TCA of NewOrder
newor...@carina.unm.edu
http://www.nmt.edu/~surface

Re:32bit Move() and Fillchar() replacement


In article <42l60a$...@carina.unm.edu>,
NewOrder Demo Group <newor...@unm.edu> wrote:

Quote

>This is a 32bit move() replacement.  Just throw the procedure in at the
>top of your program, and all your move() calls will work a little faster
>(about 4 times faster for counts above 128 bytes).

Um, no.  All your Move() calls where both source and destination are DWord
aligned will be four times faster.  If either is merely word aligned, you
will get much less than four times speedup, and if either is not even word
aligned, you will get only a very tiny speedup.
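
One common way to soften the alignment penalty described above (not something anyone in this thread posted) is to copy single bytes until the destination reaches a 4-byte boundary and only then switch to dword copies.  The C sketch below assumes that approach; move_aligned is a made-up name, and note that it can only align the destination, so the source may still be misaligned if the two addresses differ mod 4.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Forward copy that aligns the destination before using dword chunks. */
void move_aligned(const void *source, void *dest, size_t count)
{
    const unsigned char *s = source;
    unsigned char *d = dest;

    /* Head: byte copies until dest sits on a 4-byte boundary. */
    while (count > 0 && ((uintptr_t)d & 3) != 0) {
        *d++ = *s++;
        count--;
    }
    /* Body: dword-sized chunks to an aligned destination. */
    while (count >= 4) {
        uint32_t chunk;
        memcpy(&chunk, s, sizeof chunk);
        memcpy(d, &chunk, sizeof chunk);
        s += 4;
        d += 4;
        count -= 4;
    }
    /* Tail: the 0..3 bytes that are left. */
    while (count > 0) {
        *d++ = *s++;
        count--;
    }
}
```

The head loop costs at most three byte copies, which is why this kind of fix-up only pays off for larger counts.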
--

http://www.armory.com/~jon                       Personal and Technical Pages
http://www.armory.com/~jon/hs/HomeSchool.html Home School Resource Pages

Re:32bit Move() and Fillchar() replacement


On 7 Sep 1995, NewOrder Demo Group wrote:

Quote
> In article <Pine.SUN.3.91.950907150149.3939E-100000@verdi>,
> Kvantti Piikkisika  <jsha...@nmsu.edu> wrote:
> >On 7 Sep 1995, NewOrder Demo Group wrote:

> >> >procedure FillChar (var dest; count : word; value : byte); assembler;

> >> >asm
> >> >  mov bx, count
> >> >  mov cx, bx
> >> >  shr cx, 1
> >>   ^^^^^^^^^^^^^^this could be a little faster by doing:
> >>    shr cx,33
> >That doesn't make any sense whatsoever.  How would you be ABLE to shift
> >right 33 bits?

> Sure it does.  There are two forms of sh[l/r].  One is a two byte, three
> clock opcode (for shifting by one..it is a throwback to the older 8088/8086
> processors), the other is a three byte, two clock opcode.  Most compilers
> will use the slower opcode when shifting by one.  (This is just the
> default it will use for shifting by one).  Any multi count shifts are
> handled by the three byte opcode.  Internally the processor will do this:
> sh[l/r] reg,count mod 32.  The mod 32 is there because the largest
> register size is 32bits, and if you were able to do something like:
> shl eax,216 it would cause massive system stalls.  By doing sh[l/r]
> reg,33 .. it will automatically use the three byte opcode (since the
> count is greater than 1) and internally only shift by one, thus saving
> you a clock.  I told you people would have a hard time understanding
> my code. :)
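
The "count mod 32" behaviour described in the quote can be modelled in a few lines of C.  This is only a sketch of the masking rule as it applies to 286-and-later processors (the original 8086 does not mask the count); shr16_masked is a hypothetical helper name, and it says nothing about the clock counts.

```c
#include <stdint.h>

/* Model of shr on a 16-bit register with an immediate count:
   only the low 5 bits of the count are used. */
uint16_t shr16_masked(uint16_t value, unsigned count)
{
    unsigned effective = count & 31;   /* count mod 32 */

    /* Shifting a 16-bit register by 16 or more leaves zero. */
    if (effective >= 16)
        return 0;
    return (uint16_t)(value >> effective);
}
```

So a count of 33 behaves like a count of 1, and a count of 32 leaves the register unchanged.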

Wow, and now I see why... :) but what about the 64-bit regs on the
Pentium? (not that I've seen any Pentium compilers yet :)

Quote
> We now return you to your normally scheduled Pascal coding questions.

Hmmm... what's this, like the 4th time you've said this? :)

Quote
> --TCA of NewOrder
> newor...@carina.unm.edu
> http://www.nmt.edu/~surface

-----------------\------------------------------------------------------------
Quantum Porcupine \-\  People kept on telling me to get on with it and write
a.k.a. Joshua Shagam > my damn .sig file.  Well I finally got around to it
jsha...@nmsu.edu /--/  since I finally got SIG of it all.
----------------/-------------------------------------------------------------
  Missing Link / Puuttuva Lenkki

Re:32bit Move() and Fillchar() replacement


Quote
In article <42pr9n$...@news.scruz.net>, Jon Shemitz <j...@armory.com> wrote:
>In article <42l60a$...@carina.unm.edu>,
>NewOrder Demo Group <newor...@unm.edu> wrote:

>>This is a 32bit move() replacement.  Just throw the procedure in at the
>>top of your program, and all your move() calls will work a little faster
>>(about 4 times faster for counts above 128 bytes).

>Um, no.  All your Move() calls where both source and destination are DWord
>aligned will be four times faster.  If either is merely word aligned, you
>will get much less than four times speedup, and if either is not even word
>aligned, you will get only a very tiny speedup.

Yes, forgive me for omitting that.  However, the "wider" moves are still
faster than the "narrower" ones.  I mentioned that fillchar would fill to
pointers faster than arrays, but I forgot to mention the same would happen
with move().

--TCA of NewOrder
newor...@carina.unm.edu
http://www.nmt.edu/~surface

Re:32bit Move() and Fillchar() replacement


Quote
On Fri, 8 Sep 1995, NewOrder Demo Group wrote:
> In article <Pine.SUN.3.91.950908155439.11647A-100000@verdi> you write:
> >On 8 Sep 1995, NewOrder Demo Group wrote:

> >> In article <Pine.SUN.3.91.950908011838.21088C-100000@verdi>,
> >> Kvantti Piikkisika  <jsha...@nmsu.edu> wrote:
> >> >Wow, and now I see why... :) but what about the 64-bit regs on the
> >> >Pentium? (not that I've seen any Pentium compilers yet :)

> >> There are no real 64-bit accessible registers on the pentium.  There
> >> are a few internal 64-bit regs, but they get dumped to edx:eax.

> >That's pretty lame... yeah, now I remember once reading the Pentium
> >instruction set which came with MASM6.11 (complete piece of {*word*99}, I was
> >trying it out at work) and it was funny that I don't remember any
> >extended registers, as all the 64-bit ops get put in either edx:eax or
> >into some location in memory (e.g. CMPXCHG8B, the 64-bit compare and
> >exchange)... what's the point of a new processor when there's no new
> >registers?!?! :)

> Backward compatibility.  If/when you learn more about the Intel processor
> you will understand why there are only 8 registers (ax,bx,cx,dx,bp,sp,si,di).
> The encoding of each instruction [that modifies a register] has a 3 bit
> field which tells the CPU which register to modify.  3 bits is only enough
> to hold 8 values (0 through 7).  To add more registers would require
> changing the length of the field, thus making it not backward compatible.
> The 66h prefix toggles the operand size (doubleword in 16-bit code, word
> in 32-bit code).  You should start reading comp.lang.asm.x86.  (Note: That
> says READING, not posting).
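
To make the 3-bit register field concrete, here is a small C sketch that pulls the fields out of a ModRM byte, the byte that follows the opcode in most register-to-register instructions.  The struct and function names are invented for illustration; the register order in the table is the standard 16-bit encoding order.

```c
#include <stdint.h>

/* The three fields packed into a ModRM byte: 2 + 3 + 3 bits. */
typedef struct {
    unsigned mod;   /* bits 7..6: addressing mode (3 = register) */
    unsigned reg;   /* bits 5..3: register number or opcode extension */
    unsigned rm;    /* bits 2..0: register number or memory form */
} ModRM;

ModRM decode_modrm(uint8_t byte)
{
    ModRM m;
    m.mod = (byte >> 6) & 3;
    m.reg = (byte >> 3) & 7;
    m.rm  = byte & 7;
    return m;
}

/* 16-bit register names in encoding order 0..7. */
const char *reg16_name(unsigned index)
{
    static const char *const names[8] = {
        "ax", "cx", "dx", "bx", "sp", "bp", "si", "di"
    };
    return names[index & 7];
}
```

For instance, mov bx,ax assembles to 89 C3, and decoding C3 gives mod 3, reg 0 (ax) and rm 3 (bx).  With only three bits per field there is no room for a ninth register.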

Ah hah... I was wondering how the processor was supposed to tell the
difference between the same-numbered opcodes... and you've pretty much
told me everything I needed to know, so I won't go to .x86, at least not yet.

Maybe Intel should have put on another prefix for 64-bit in any mode
though... (I doubt there's a 64-bit pmode since 4gig should be enough for
everyone... of course last time someone decided what should be enough for
everyone we got stuck with 640k! :)

Quote
> --TCA of NewOrder
> newor...@carina.unm.edu
> http://www.nmt.edu/~surface

-----------------\------------------------------------------------------------
Quantum Porcupine \-\  People kept on telling me to get on with it and write
a.k.a. Joshua Shagam > my damn .sig file.  Well I finally got around to it
jsha...@nmsu.edu /--/  since I finally got SIG of it all.
----------------/-------------------------------------------------------------
  Missing Link / Puuttuva Lenkki

Re:32bit Move() and Fillchar() replacement


On 8 Sep 1995, NewOrder Demo Group wrote:

Quote
> In article <Pine.SUN.3.91.950908011838.21088C-100000@verdi>,
> Kvantti Piikkisika  <jsha...@nmsu.edu> wrote:
> >Wow, and now I see why... :) but what about the 64-bit regs on the
> >Pentium? (not that I've seen any Pentium compilers yet :)

> There are no real 64-bit accessible registers on the pentium.  There
> are a few internal 64-bit regs, but they get dumped to edx:eax.

That's pretty lame... yeah, now I remember once reading the Pentium
instruction set which came with MASM6.11 (complete piece of {*word*99}, I was
trying it out at work) and it was funny that I don't remember any
extended registers, as all the 64-bit ops get put in either edx:eax or
into some location in memory (e.g. CMPXCHG8B, the 64-bit compare and
exchange)... what's the point of a new processor when there's no new
registers?!?! :)

i.e. back in the PDP days, people would drool on the PDP-6 because it had
12 whole registers instead of however many the PDP-1 had (I think 4 or 6
or something... this is just from reading the book "Hackers" by Steven Levy)

Quote
> --TCA of NewOrder
> newor...@carina.unm.edu
> http://www.nmt.edu/~surface

-----------------\------------------------------------------------------------
Quantum Porcupine \-\  People kept on telling me to get on with it and write
a.k.a. Joshua Shagam > my damn .sig file.  Well I finally got around to it
jsha...@nmsu.edu /--/  since I finally got SIG of it all.
----------------/-------------------------------------------------------------
  Missing Link / Puuttuva Lenkki

Re:32bit Move() and Fillchar() replacement


On 8 Sep 1995, Jon Shemitz wrote:

Quote
> In article <42l60a$...@carina.unm.edu>,
> NewOrder Demo Group <newor...@unm.edu> wrote:

> >This is a 32bit move() replacement.  Just throw the procedure in at the
> >top of your program, and all your move() calls will work a little faster
> >(about 4 times faster for counts above 128 bytes).

> Um, no.  All your Move() calls where both source and destination are DWord
> aligned will be four times faster.  If either is merely word aligned, you
> will get much less than four times speedup, and if either is not even word
> aligned, you will get only a very tiny speedup.

Hey, it's still better than using System.TPU's 8-bit implementation! :)

-----------------\------------------------------------------------------------
Quantum Porcupine \-\  People kept on telling me to get on with it and write
a.k.a. Joshua Shagam > my damn .sig file.  Well I finally got around to it
jsha...@nmsu.edu /--/  since I finally got SIG of it all.
----------------/-------------------------------------------------------------
  Missing Link / Puuttuva Lenkki
