Board index » cppbuilder » Re: Anything faster then memcpy ?
Asger Joergensen
CBuilder Developer |
2007-12-19 10:38:54 PM
Re: Anything faster then memcpy ?
Hi Bob
Bob Gonder says:
Quote: The folks who know these things hang out in [...]
Kind regards
Asger
Bob Gonder
CBuilder Developer |
2007-12-19 11:16:23 PM
Re:Re: Anything faster then memcpy ?
Asger Joergensen wrote:
Quote: /* duplicate line 0 to rest of bmp (vertical stripes) */

Note that 1 looks like 0 by the time it is copied. So the effect is to copy 0 to 2, and 3, and.....

Quote: If src and dest overlap, the behavior of memcpy is undefined.

There are corner cases where memcpy might not do what the code expects, such as when length of line 0 is less than the move-unit size of the function. (If it moves 128 bits at a time, and the line is 64 bits, then it might copy line 1 into the last line.)

Trying to visualize this: memcpy( 2, 1, 8 )

via 4 digit copy
  1234 5678 xxxx
  memcpy4digit( 2, 1 )       copy 1234 to location 2
  1123 4678 xxxx
  memcpy4digit( 2+4, 1+4 )   copy 4678 to location 6
  1123 4467 8xxx

via 2 digit copy
  1234 5678 xxxx
  memcpy2digit( 2, 1 )
  1124 5678 xxxx
  memcpy2digit( 2+2, 1+2 )
  1122 4678 xxxx
  memcpy2digit( 2+4, 1+4 )
  1122 4468 xxxx
  memcpy2digit( 2+6, 1+6 )
  1122 4466 8xxx

Substitute "scanline" for "digit". Point is if the copy reads more than one unit (scanline) per cycle, it can have unintended consequences.

Since memcpy uses 32 bit copies, and the faster SSE versions use 64, 128 (maybe 256 someday) bit copies, it will do what we expect down to a scanline length of 11 24bit pixels (at 256 bit copies).
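The digit diagrams above can be reproduced with a small simulation. This is only a sketch: `chunked_forward_copy` is a hypothetical helper, not how any real memcpy is implemented, and it uses 0-based offsets, so the post's memcpy( 2, 1, 8 ) becomes dst = 1, src = 0. Each step reads a whole chunk into a temporary and then writes it, which keeps the behaviour fully defined even though the ranges overlap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simulate a forward copy that moves `chunk` units per step.
// Each step reads a full chunk into a temporary, then writes it out,
// so later steps may read units that earlier steps already overwrote.
std::string chunked_forward_copy(std::string buf, std::size_t dst,
                                 std::size_t src, std::size_t n,
                                 std::size_t chunk)
{
    assert(chunk > 0);
    assert(dst + n <= buf.size() && src + n <= buf.size());

    std::vector<char> tmp(chunk);
    for (std::size_t done = 0; done < n; done += chunk) {
        const std::size_t len = std::min(chunk, n - done);
        std::copy_n(buf.begin() + src + done, len, tmp.begin());  // read one chunk
        std::copy_n(tmp.begin(), len, buf.begin() + dst + done);  // write one chunk
    }
    return buf;
}
```

With the buffer "12345678xxxx", a chunk of 4 or 2 gives the end states shown in the diagrams, while a chunk of 1 smears the first digit across the whole destination. That last case is the "copy line 0 to every other line" effect the original trick relies on.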
Alan Bellingham
CBuilder Developer |
2007-12-20 12:11:08 AM
Re:Re: Anything faster then memcpy ?
Bob Gonder < XXXX@XXXXX.COM >wrote:
Quote: Asger Joergensen wrote: [...]

[...] also perfectly possible that the underlying code does exactly the opposite. It's quite possible that on some architectures code that decrements an index, and stops when the index passes zero, is more efficient. The specification of memcpy() is such that copying the elements in random order would also be acceptable, though such an implementation would be at least mildly perverse.

Quote: If src and dest overlap, the behavior of memcpy is undefined.

[...] wasn't. That means that your assumption is only that - an assumption. So long as you stay on the same library, you'll *probably* be OK.

Quote: There are slower versions that allow for overlapped true copies.

[...] work out scenarios:
a) No overlap - copy any way desired
b) Total overlap - don't copy at all (it's odd how self-assignment pops up)
c) src > dst ... use incrementing index
d) src < dst ... use decrementing index

Again, if there are strong processor reasons to go for something more complicated, then the library writers can, so long as the result matches what the standard prescribes.

Alan Bellingham
--
Team Browns
ACCU Conference 2008: 2-5 April 2008 - Oxford, UK
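Scenarios a) to d) above can be written out as a short byte-wise routine. This is a minimal sketch under the assumption that a plain loop is acceptable; `overlap_safe_copy` is a hypothetical name, and in real code the standard `std::memmove` already provides this guarantee. `std::less` is used for the pointer comparison because it yields a total order even when the two pointers do not point into the same array.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Copy n bytes from src to dst, choosing the direction so that
// overlapping ranges still end up as a true copy of the original source.
unsigned char* overlap_safe_copy(unsigned char* dst,
                                 const unsigned char* src,
                                 std::size_t n)
{
    assert(n == 0 || (dst != nullptr && src != nullptr));

    if (n == 0 || dst == src) {
        return dst;                       // b) total overlap: nothing to do
    }
    if (std::less<const unsigned char*>()(dst, src)) {
        // c) src > dst: incrementing index never reads a byte already written
        for (std::size_t i = 0; i < n; ++i) {
            dst[i] = src[i];
        }
    } else {
        // d) src < dst: decrementing index never reads a byte already written
        for (std::size_t i = n; i-- > 0; ) {
            dst[i] = src[i];
        }
    }
    // a) no overlap: either branch above is correct
    return dst;
}
```

Note that this gives a true copy, so it would not reproduce the smear effect that the overlapping memcpy trick depends on.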
Alex Bakaev [TeamB]
CBuilder Developer |
2007-12-20 12:51:16 AM
Re:Re: Anything faster then memcpy ?
Remy Lebeau (TeamB) wrote:
QuoteNo, they are not. The size of each element of a scanline depends on the |
Bob Gonder
CBuilder Developer |
2007-12-20 01:12:46 AM
Re:Re: Anything faster then memcpy ?
Alan Bellingham wrote:
Quote: Bob Gonder wrote: [...]

[...] if not sure, and probably comment the UB for future library changes).

Quote: The specification of memcpy() is such that copying the elements in [...]

Fire up your time dilator. Watch the pretty patterns. (Don't you miss the old mainframes with their bits dancing in their bulbs?)

Quote: If src and dest overlap, the behavior of memcpy is undefined.

[...] his library works when he goes about using UB tricks. And memcpy is a UB waiting to happen as it has no checks. (I think if it had checks, it would no longer be UB?)

Quote: If the writers of the C standard had thought [...]

My descriptions are from working knowledge, and often are not StandSpec. If some idiot provider breaks my UB, I'll just have to write my own version.

Quote: That means that your assumption is only that - an assumption. So [...]
Remy Lebeau (TeamB)
CBuilder Developer |
2007-12-20 01:50:34 AM
Re:Re: Anything faster then memcpy ?
"Asger Joergensen" < XXXX@XXXXX.COM >wrote in message
Quote: If I have a bottleneck it is memcpy

[...] else that profiling shows them when they finally try it.

The TBitmap::ScanLine property is more likely to be the real bottleneck. It is not a simple operation internally. Every time you access the property, the underlying image data is freed and regenerated, the Row parameter validated, and the number of bytes per scanline recalculated. Multiply all of that by the height of the bitmap, and that is a lot of work being done.

Gambit
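One common way around a costly per-row accessor is to fetch a row address once and derive the others arithmetically. The sketch below is not VCL code: `dib_stride` and `row_ptr` are hypothetical helpers, and it assumes the usual Windows DIB layout in which every row is padded to a multiple of 4 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes per row of a DIB: the pixel data rounded up to a multiple of 4 bytes.
std::size_t dib_stride(std::size_t width_px, std::size_t bits_per_pixel)
{
    assert(bits_per_pixel > 0);
    return ((width_px * bits_per_pixel + 31) / 32) * 4;
}

// Address of row y, given the address of row 0 and the signed distance
// between consecutive rows (negative when rows are stored bottom-up).
std::uint8_t* row_ptr(std::uint8_t* row0, std::ptrdiff_t step, std::size_t y)
{
    assert(row0 != nullptr);
    return row0 + step * static_cast<std::ptrdiff_t>(y);
}
```

For a 24-bit image, 100 pixels give a 300-byte row and 101 pixels give 304 bytes. The stride must be computed from the full bitmap width, not from the width of the rectangle being filled.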
Asger Joergensen
CBuilder Developer |
2007-12-20 05:03:54 AM
Re:Re: Anything faster then memcpy ?
Hi Remy Lebeau (TeamB)
Remy Lebeau (TeamB) says: Quote
Quote: The TBitmap::ScanLine property is more likely to be the real bottleneck. It [...]

But checking the source showed me that even if the image data isn't freed, there sure is a lot of calculation done per line.
So I came up with the code below, which gave me 10% extra speed on long lines (1000 pix.) and 50% on short lines (100 pix.).
So once again you were right. ~ Dam.&%¤#

I am a little worried though, that I had to do [-(y*LineW)]. I guess the default TBitmap is bottom-up.
How do I check which direction the bitmap has? TBitmapImage in the TBitmap is private.
I found the WinAPI BITMAPINFO in the help, but I can't find the function to get it.

Thanks for your help
Kind regards
Asger

static void __fastcall GradientFillRectH(TBmp *Bmp, TRect &Rct, TAjColor C1, TAjColor C2)
{
    int Bottom = Rct.Bottom;
    int W = Rct.Width();
    int BW = W*3;              // bytes per rect row, assuming 24-bit pixels
    int LineW = (BW+3)&~3;     // rounded up to a multiple of 4; matches the bitmap's
                               // row size only when the rect spans the full bitmap width
    int BlitStart = Rct.Top + MEMCOPY_H;

    TScanColor *ScanLineStart = static_cast<TScanColor*>(Bmp->ScanLine[Rct.Top]);
    TScanColor *ScanLine = &ScanLineStart[Rct.Left];

    GradientArray(ScanLine, W, C1, C2); // calculating colors in the first line

    LPBYTE ByteLine = (LPBYTE)ScanLine;
    for(int y = Rct.Top+1; y < Bottom && y < BlitStart; ++y)
    {
        // Negative offset because the rows are stored bottom-up.
        // y is an absolute row index, so this is only right when Rct.Top == 0.
        LPBYTE NewByteLine = &ByteLine[-(y*LineW)];
        memcpy(NewByteLine, ByteLine, BW);
    }

    if(BlitStart < Bottom)
        FillHorizontal(Bmp->Canvas->Handle, Rct.Left, Rct.Top, W, Rct.Bottom); // using BitBlt if more than 32 lines
}
//-------------------------------------------------------------------------
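One way to avoid guessing the row direction is to measure it from the addresses of two adjacent rows. The following is a sketch on raw pointers only, so it runs without the VCL; `row_step` and `replicate_row` are hypothetical names. In a VCL program the two addresses could presumably be taken from ScanLine[0] and ScanLine[1], but that is an assumption and not something confirmed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Signed distance in bytes from one row to the next, measured from two
// adjacent rows of the same image. A negative result means bottom-up storage.
std::ptrdiff_t row_step(const std::uint8_t* row0, const std::uint8_t* row1)
{
    assert(row0 != nullptr && row1 != nullptr);
    return row1 - row0;
}

// Copy `bytes` bytes starting at `first` (inside row `top`) into the same
// columns of rows top+1 .. bottom-1.
void replicate_row(std::uint8_t* first, std::ptrdiff_t step,
                   int top, int bottom, std::size_t bytes)
{
    for (int y = top + 1; y < bottom; ++y) {
        // The offset is relative to row `top`, not to row 0.
        std::uint8_t* dst = first + step * (y - top);
        // Source and destination are different rows, so the ranges do not
        // overlap as long as `bytes` does not exceed the row size.
        std::memcpy(dst, first, bytes);
    }
}
```

Because the step carries its own sign, the same loop serves top-down and bottom-up bitmaps, and each memcpy call has non-overlapping source and destination.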
Remy Lebeau (TeamB)
CBuilder Developer |
2007-12-20 06:30:50 AM
Re:Re: Anything faster then memcpy ?
"Asger Joergensen" < XXXX@XXXXX.COM >wrote in message
Quote: And how do I do profiling ?

[...] runtime. A profiler can analyze and calculate other metrics of your code as well, such as the number of times a function is called and by whom, etc.

Quote: I am a little worried though, that I had to do [-(y*LineW)]

Quote: How do I check which direction the bitmap has
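Short of a real profiler, a rough timing harness can at least show where the time goes. This is a minimal sketch assuming a C++11 compiler with `std::chrono`; `average_ms` is a hypothetical helper, and on compilers from the time of this thread the WinAPI QueryPerformanceCounter would play the same role.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Run `work` `repeats` times and return the average wall-clock time per
// call in milliseconds. This measures elapsed time only; it does not
// count calls or attribute time to callers the way a profiler does.
template <typename F>
double average_ms(F&& work, std::size_t repeats)
{
    assert(repeats > 0);
    using clock = std::chrono::steady_clock;

    const auto start = clock::now();
    for (std::size_t i = 0; i < repeats; ++i) {
        work();
    }
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    return elapsed.count() / static_cast<double>(repeats);
}
```

Timing the row-copy loop and the per-row accessor separately, with enough repeats to smooth out noise, would be one way to check which of the two dominates.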
Asger Joergensen
CBuilder Developer |
2007-12-20 08:32:10 PM
Re:Re: Anything faster then memcpy ?
Hi Remy
Remy Lebeau (TeamB) says: Quote
Thanks for Your help. Kind regards Asger |
Dennis Jones
CBuilder Developer |
2007-12-21 08:26:24 AM
Re:Re: Anything faster then memcpy ?
"Asger Joergensen" < XXXX@XXXXX.COM >wrote in message
Quote: And how do I do profiling ?

[...] many years ago. If you want a decent profiling tool, you are going to have to pay for it. There may be free or shareware tools out there, but I would be doubtful of their capability. AQTime is a very good profiling tool. A little spendy, yes; but worth the cost if you need it.

- Dennis
Jonathan Benedicto
CBuilder Developer |
2007-12-25 01:16:19 AM
Re:Re: Anything faster then memcpy ?
Asger Joergensen wrote:
Quote: Is there something faster than memcpy ?

[...]

Jon
Asger Joergensen
CBuilder Developer |
2007-12-27 12:27:47 AM
Re:Re: Anything faster then memcpy ?
Hi Jonathan
Jonathan Benedicto says:
Quote: Asger Joergensen wrote: [...]

So I think IPP is a little too pricey for my project.

Kind regards
Asger