
Alternative SZ IntToStrB&V v0.02


2005-09-10 06:35:37 AM
delphi21
Attached new version.
Version v0.02 features:
- Optional benchmark methods:
1. RDTSC
2. Query Performance Counter
- Optional repeated measurement, taking the minimum
- Optional data preparations:
1. 20 digit groups (positive and negative values)
2. 6 number groups (as in Dennis's and Avatar's current benchmarks)
3. Custom range
- Validation
- Optional keeping of previous results for comparison
- Option to run the test continuously a desired number of times
- I have created and added my own function IntToStr_SZ_PAS_1, which is a
slow 'common sense' solution, created primarily for comparison with the RTL.
To do:
- Int64 benchmark support
- Selecting only desired functions for test
- Accuracy report
- Better UI
Last minute note:
- Bug in the Query Performance report: it is calculated from only one loop,
since anything else makes no sense. Ignore the "The fastest loop of 50
attempts" line.
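For readers who have not seen the "fastest loop of N attempts" option before, here is a minimal sketch of the idea in Python. It is not the Delphi code from the attachment: `time_once` and `fastest_of` are hypothetical names, and `time.perf_counter_ns` only stands in for RDTSC or QueryPerformanceCounter.

```python
import time


def time_once(func, values):
    """Time one pass of func over all values, in nanoseconds."""
    start = time.perf_counter_ns()
    for v in values:
        func(v)
    return time.perf_counter_ns() - start


def fastest_of(func, values, attempts=50):
    """Repeat the measurement and keep only the smallest elapsed time."""
    return min(time_once(func, values) for _ in range(attempts))


# str() is only a stand-in for the IntToStr variant under test.
data = list(range(-1250, 1250))
best = fastest_of(str, data, attempts=50)
```

Taking the minimum discards passes disturbed by interrupts or task switches, but it also rewards whatever warm-up the repeated passes provide, which matters for the discussion further down.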
RDTSC report:
--------------------------------------------------------------------------------
SZIntStrB&V v0.02, author Sasa Zeman
Alternative IntToStr 32-bits FastCode functions speed test.
The fastest loop of 50 attempts
Test is provided on 2500 random numbers for any digit - 20 groups
--------------------------------------------------------------------------------
Function name         D1+  D2+  D3+  D4+  D5+  D6+  D7+  D8+  D9+ D10+  D1-  D2-  D3-  D4-  D5-  D6-  D7-  D8-  D9- D10-    Sum Performance index
--------------------------------------------------------------------------------
IntToStr_JOH_IA32_4    45   45   56   57   63   64   73   74   74   74   45   45   56   57   63   64   73   74   75   74   1251   9.9257   1.0000
IntToStr_AI_PAS_12     69   78   77   79   82   88  115  126  166  170   70   79   81   84   86   89  120  149  135  171   2114   5.8737   0.5918
IntToStr_JOH_PAS_4     62   63  103  102  148  149  191  192  233  233   62   63  103  102  148  149  191  192  234  233   2953   4.2049   0.4236
IntToStrOuc_IA32_2    172  219  224  221  226  224  229  227  225  227  215  218  221  221  222  225  225  227  226  227   4421   2.8086   0.2830
IntToStr_AZ_Pas_1     176  189  200  207  218  228  244  249  258  267  196  202  215  219  235  237  253  264  264  273   4594   2.7029   0.2723
IntToStr_LBG_PAS_1    226  265  320  364  411  473  519  558  613  652  228  270  319  365  402  472  517  561  609  660   8804   1.4104   0.1421
IntToStr_SZ_PAS_1     256  303  345  385  443  478  523  562  615  652  254  300  341  380  429  481  521  570  616  651   9105   1.3638   0.1374
IntToStr_RTL          442  495  535  503  560  601  643  750  797  835  460  498  466  508  567  609  723  768  810  847  12417   1.0000   0.1007
--------------------------------------------------------------------------------
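The two numbers under "Performance index" are not explained in the report. Judging from the figures alone, the first appears to be the RTL sum divided by the function's sum, and the second the fastest sum divided by the function's sum. A small sketch under that assumption (`performance_indices` is a hypothetical helper, not part of the B&V), using three of the sums from the table above:

```python
def performance_indices(sums, baseline="IntToStr_RTL"):
    """Map each name to (speed-up over the baseline, speed relative to the fastest)."""
    base = sums[baseline]
    fastest = min(sums.values())
    return {
        name: (round(base / total, 4), round(fastest / total, 4))
        for name, total in sums.items()
    }


# "Sum" column values taken from the 20-group RDTSC report.
sums = {
    "IntToStr_JOH_IA32_4": 1251,
    "IntToStr_AI_PAS_12": 2114,
    "IntToStr_RTL": 12417,
}
indices = performance_indices(sums)
print(indices["IntToStr_AI_PAS_12"])  # (5.8737, 0.5918)
```

These values match the corresponding rows of the report, which supports the reading but does not confirm how the program actually computes them.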
--------------------------------------------------------------------------------
SZIntStrB&V v0.02, author Sasa Zeman
Alternative IntToStr 32-bits FastCode functions speed test.
The fastest loop of 50 attempts
--------------------------------------------------------------------------------
Function name         B1+  B2+  B3+  B4+  B5+  B6+    Sum Performance index
--------------------------------------------------------------------------------
IntToStr_JOH_IA32_4    45   56   64   45   56   73    339   9.4277   1.0000
IntToStr_AI_PAS_12     69   80  148   70   86  174    627   5.0973   0.5407
IntToStr_JOH_PAS_4     62  102  152   62  102  191    671   4.7630   0.5052
IntToStrOuc_IA32_2    178  225  225  219  222  226   1295   2.4680   0.2618
IntToStr_AZ_Pas_1     186  212  239  206  218  258   1319   2.4230   0.2570
IntToStr_LBG_PAS_1    230  337  443  243  323  528   2104   1.5190   0.1611
IntToStr_SZ_PAS_1     256  359  467  265  344  532   2223   1.4377   0.1525
IntToStr_RTL          453  503  580  451  475  734   3196   1.0000   0.1061
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
SZIntStrB&V v0.02, author Sasa Zeman
Alternative IntToStr 32-bits FastCode functions speed test.
The fastest loop of 50 attempts
Test is provided on 250000 random numbers from -2147483648 to 2147483647
--------------------------------------------------------------------------------
Function name           Sum Performance index
--------------------------------------------------------------------------------
IntToStr_JOH_IA32_4      58  11.6897   1.0000
IntToStr_JOH_PAS_4      115   5.8957   0.5043
IntToStr_AI_PAS_12      120   5.6500   0.4833
IntToStrOuc_IA32_2      225   3.0133   0.2578
IntToStr_AZ_Pas_1       238   2.8487   0.2437
IntToStr_LBG_PAS_1      392   1.7296   0.1480
IntToStr_SZ_PAS_1       411   1.6496   0.1411
IntToStr_RTL            678   1.0000   0.0855
--------------------------------------------------------------------------------
QUERY PERFORMANCE REPORTS:
--------------------------------------------------------------------------------
SZIntStrB&V v0.02, author Sasa Zeman
Alternative IntToStr 32-bits FastCode functions speed test.
The fastest loop of 50 attempts
Test is provided on 2500 random numbers for any digit - 20 groups
--------------------------------------------------------------------------------
Function name          D1+   D2+   D3+   D4+   D5+   D6+   D7+   D8+   D9+  D10+   D1-   D2-   D3-   D4-   D5-   D6-   D7-   D8-   D9-  D10-    Sum Performance index
--------------------------------------------------------------------------------
IntToStr_JOH_IA32_4    304   303   391   376   423   422   491   499   468   470   294   300   370   406   424   427   493   500   504   470   8335  10.6049   1.0000
IntToStr_JOH_PAS_4     447   433   703   683  1016  1001  1310  1288  1591  1581   436   430   726   683  1018  1026  1292  1313  1571  1693  20241   4.3670   0.4118
IntToStr_AI_PAS_12     478   749   872  1047  1249  1411  1708  1863  2087  2247   453   707   936  1104  1260  1497  1702  1921  2098  2222  27611   3.2013   0.3019
IntToStrOuc_IA32_2    1421  1531  1489  1493  1510  1600  1530  1568  1606  1636  1449  1456  1525  1573  1510  1567  1575  1569  1679  1558  30845   2.8657   0.2702
IntToStr_AZ_Pas_1     1280  1467  1645  1780  1988  2101  2364  2508  2674  2768  1381  1502  1735  1872  2061  2275  2431  2598  2719  2833  41982   2.1055   0.1985
IntToStr_LBG_PAS_1    1616  1905  2186  2508  2827  3346  3606  3880  4242  4553  1558  1925  2165  2479  2901  3320  3614  3869  4255  4550  61305   1.4418   0.1360
IntToStr_SZ_PAS_1     1940  2110  2357  2658  3051  3361  3647  3955  4244  4551  1735  2097  2378  2637  2937  3422  3640  3912  4224  4563  63419   1.3938   0.1314
IntToStr_RTL          3137  3410  3716  3729  4133  4414  4694  5261  5519  5818  3188  3471  3502  3795  4198  4523  5035  5341  5605  5903  88392   1.0000   0.0943
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
SZIntStrB&V v0.02, author Sasa Zeman
Alternative IntToStr 32-bits FastCode functions speed test.
The fastest loop of 50 attempts
Test is provided on 2500 random numbers - 6 groups
--------------------------------------------------------------------------------
Function name          B1+   B2+   B3+   B4+   B5+   B6+    Sum Performance index
--------------------------------------------------------------------------------
IntToStr_JOH_IA32_4   1016   749  1153  1007   779  1163   5867   4.6199   1.0000
IntToStr_AI_PAS_12     830  1224  2172  1128  1232  2217   8803   3.0791   0.6665
IntToStr_JOH_PAS_4    1245  1274  2248  1272  1299  2254   9592   2.8258   0.6117
IntToStrOuc_IA32_2    2184  1842  2347  2177  1870  2345  12765   2.1234   0.4596
IntToStr_AZ_Pas_1     1591  1942  2765  1723  2011  2809  12841   2.1108   0.4569
IntToStr_LBG_PAS_1    2174  2801  4464  2202  2819  4447  18907   1.4336   0.3103
IntToStr_SZ_PAS_1     2372  3026  4417  2423  2948  4415  19601   1.3828   0.2993
IntToStr_RTL          3658  4156  5658  3723  4171  5739  27105   1.0000   0.2165
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
SZIntStrB&V v0.02, author Sasa Zeman
Alternative IntToStr 32-bits FastCode functions speed test.
The fastest loop of 50 attempts
Test is provided on 250000 random numbers from -2147483648 to 2147483647
--------------------------------------------------------------------------------
Function name            Sum Performance index
--------------------------------------------------------------------------------
IntToStr_JOH_IA32_4   142426   4.1546   1.0000
IntToStr_AI_PAS_12    245550   2.4098   0.5800
IntToStr_JOH_PAS_4    250209   2.3649   0.5692
IntToStrOuc_IA32_2    265401   2.2295   0.5366
IntToStr_AZ_Pas_1     300471   1.9693   0.4740
IntToStr_LBG_PAS_1    462606   1.2791   0.3079
IntToStr_SZ_PAS_1     468935   1.2618   0.3037
IntToStr_RTL          591721   1.0000   0.2407
--------------------------------------------------------------------------------
Sasa
--
www.szutils.net
 
 

Re:Alternative SZ IntToStrB&V v0.02

Quote
Attached new version.

Version v0.02 features:
I fail to see what the point is in making an entire benchmark program
if all you care about is benchmarking. You can easily just create a
benchmark that fits into my B&V and use that. That saves you from
having to implement loads of features.
Note also that a benchmark should not be adjustable - there should be
just one benchmark (possibly split into subbenchmarks) which will test
the functions every time. Note for example that for the benchmark result
to be meaningful (in the sense defined by Dennis) the "Loops" always
has to be set to 1.
 

Re:Alternative SZ IntToStrB&V v0.02

Avatar Zondertau writes:
Quote
I fail to see what the point is in making an entire benchmark program
if all you care about is benchmarking. You can easily just create a
benchmark that fits into my B&V and use that. That saves you from
having to implement loads of features.
You are welcome to make adjustments to the existing B&V if you have time for that.
As I mentioned, it is much faster for me to create a benchmark from the beginning
than to examine the existing one.
However, for now only expanding the groups from 6 to 20 would be useful in your
B&V, to show more realistic results. The RDTSC method will never return a correct
value from only one loop, whatever precautions are taken.
Quote
Note also that a benchmark should not be adjustable - there should be
just one benchmark (possibly split into subbenchmarks) which will test
the functions every time.
A customizable benchmark shows some interesting facts (mentioned below).
That is the main reason I created it that way.
Quote
Note for example that for the benchmark result
to be meaningful (in the sense defined by Dennis) the "Loops" always
has to be set to 1.
Take for example IntToStr_JOH_PAS_4 and IntToStr_AI_PAS_12. Their performance
is quite similar, but with 20 groups RDTSC (fastest of 50) shows that
IntToStr_AI_PAS_12 is faster than IntToStr_JOH_PAS_4, while Query Performance
shows the opposite. Which is actually true?
To examine that question further, take another approach and test these
two functions under the same conditions:
With 2500 numbers per group:
-----------------------------------------
Query performance on:
Groups20 winner is IntToStr_JOH_PAS_4
Groups6 winner is IntToStr_AI_PAS_12
RDTSC (fastest of 50):
Groups20 winner is IntToStr_AI_PAS_12
Groups6 winner is IntToStr_AI_PAS_12
RDTSC (fastest approach disabled):
Groups20 winner is IntToStr_JOH_PAS_4
Groups6 winner is IntToStr_AI_PAS_12
-----------------------------------------
With 25000 numbers per group:
-----------------------------------------
Query performance on:
Groups20 winner is IntToStr_JOH_PAS_4
Groups6 winner is IntToStr_AI_PAS_12
RDTSC (fastest of 50):
Groups20 winner is IntToStr_AI_PAS_12
Groups6 winner is IntToStr_AI_PAS_12
RDTSC (fastest approach disabled):
Groups20 winner is IntToStr_JOH_PAS_4
Groups6 winner is IntToStr_AI_PAS_12
----------------------------------
The only logical answer is that "fastest of N" is not quite realistic.
However, that cannot prove which function is actually fastest, because of
instability in both cases - with a large number of tested values as
well as with few. Of course, I cannot guarantee that I took all
precautionary measures correctly before the benchmark started.
With my benchmark it is also proven that 6 groups are not enough
for a fair/correct benchmark. The reason is simply the different algorithms
used in these functions.
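A toy illustration of how the reported winner can flip with the measurement policy. The timings below are invented for the example (they are not measurements of IntToStr_JOH_PAS_4 or IntToStr_AI_PAS_12), and `winner` and `first_pass` are hypothetical helpers:

```python
# Made-up per-attempt timings: "A" starts slow but warms up well,
# "B" is steadier from the first pass.
timings = {
    "A": [130, 95, 90, 90, 91],
    "B": [100, 99, 100, 98, 99],
}


def first_pass(runs):
    """Policy that trusts only the first attempt (no repeats)."""
    return runs[0]


def winner(timings, policy):
    """Name of the function with the smallest time under the given policy."""
    return min(timings, key=lambda name: policy(timings[name]))


print(winner(timings, min))         # A
print(winner(timings, first_pass))  # B
```

With "fastest of N" the function that profits most from repetition wins; with a single pass the steadier one wins. Neither policy is wrong by itself, they simply answer different questions.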
Sasa
--
www.szutils.net
 

Re:Alternative SZ IntToStrB&V v0.02

Quote
>> Note for example that for the benchmark result
>> to be meaningful (in the sense defined by Dennis) the "Loops" always
>> has to be set to 1.

Take for example IntToStr_JOH_PAS_4 and IntToStr_AI_PAS_12. Their
performance is quite similar, but with 20 groups RDTSC (fastest of
50) shows that IntToStr_AI_PAS_12 is faster than IntToStr_JOH_PAS_4,
but Query performance shows differently. What is actually true?
The case where a function is repeatedly invoked with the same argument is
irrelevant, so the Loops=1 case determines which is faster. One can
probably pick any non-RTL function and find a benchmark that makes that
one look fastest. This doesn't make it true - only realistic
benchmarks give useful results.
In this case IntToStr_AI_PAS_12 apparently has more branching and/or
benefits more from caching than the other.
Choosing between QueryPerformanceCounter and RdTsc should be irrelevant if
the measurement time is long enough. I currently use QueryPerformanceCounter
because other benchmarks (made by Dennis) do so.
Quote
To examine that question further, take another approach and test these
two function with the same conditions:
Both "groups" benchmarks have the serious disadvantage that the
size-selection branch prediction is perfect (Groups20) or better than
in normal situations (Groups6). This means that part of the code is not
tested correctly. This is the disadvantage of using ranges.
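To make the branch-prediction point concrete, here is a rough sketch that measures how predictable the result size is from one call to the next. It does not model a real branch predictor; `same_length_ratio` is a hypothetical proxy, and only five digit groups are generated to keep it short.

```python
import random


def digit_count(n):
    """Number of decimal digits in the magnitude of n."""
    return len(str(abs(n)))


def same_length_ratio(values):
    """Fraction of consecutive pairs whose results have the same digit count."""
    pairs = list(zip(values, values[1:]))
    same = sum(digit_count(a) == digit_count(b) for a, b in pairs)
    return same / len(pairs)


rng = random.Random(1)
# Grouped input: 1000 one-digit values, then 1000 two-digit values, and so on.
grouped = [
    rng.randrange(10 ** (d - 1), 10 ** d)
    for d in range(1, 6)
    for _ in range(1000)
]
# Same values, but in random order.
shuffled = grouped[:]
rng.shuffle(shuffled)

grouped_ratio = same_length_ratio(grouped)
shuffled_ratio = same_length_ratio(shuffled)
```

In the grouped order nearly every call produces a result of the same length as the previous one, so any size-selection branch is trivially predictable; after shuffling, that is true for only a fraction of the calls.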
Quote
The only logical answer is that "fastest of N" is not quite realistic.
However, that cannot prove which function is actually fastest, because of
instability in both cases - with a large number of tested values as
well as with few. Of course, I cannot guarantee that I took all
precautionary measures correctly before the benchmark started.

With my benchmark it is also proven that 6 groups are not enough
for a fair/correct benchmark. The reason is simply the different algorithms
used in these functions.
There is indeed no universal fastest function and neither does a
perfect benchmark exist. The only thing we can do is try and construct
a benchmark in such a way that Dennis' criteria are satisfied, so there
is at least some realism in the benchmarking situation.
IMHO this specific benchmark should:
(1) Test random numbers from the entire range to make result-size
branch prediction hard; this means:
(1a) No testing on a per-subrange basis
(1b) No successive tests with the same argument
(2) Have a realistic distribution, which would IMHO be proportional to
the inverse of the logarithm
(3) Test a large number of different values
(4) Try and make sure the recorded time includes (nearly) only calls to
IntToStr
(5) Be stable
Currently these are satisfied:
Current:         1b 3 4
My alternative:  1a 1b 2 3 4
Groups20:        2 4 5 (1b and 3 depending on settings)
Do you agree with this analysis? If not, which point do you disagree on?
Anyway, I will try to eliminate the strange spread in the results I am
getting ASAP.
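Point (2) above does not spell out the distribution. One literal reading, taken here purely as an assumption, is that the probability of picking n is proportional to 1 / ln(n + 1). A rejection-sampling sketch for that reading (`sample_inverse_log` is a hypothetical name; only positive values are drawn):

```python
import math
import random


def sample_inverse_log(rng, upper=2**31 - 1):
    """Draw n in [1, upper] with probability proportional to 1 / ln(n + 1)."""
    while True:
        n = rng.randint(1, upper)
        # Acceptance probability ln(2) / ln(n + 1): exactly 1 for n = 1
        # and smaller for larger n.
        if rng.random() < math.log(2) / math.log(n + 1):
            return n


rng = random.Random(0)
values = [sample_inverse_log(rng) for _ in range(1000)]
```

Because 1 / ln(n + 1) falls off very slowly, long numbers still dominate under this reading; if short numbers are meant to be common, a different weighting would be needed.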
 

Re:Alternative SZ IntToStrB&V v0.02

Hi Avatar
I think that you are on the right track, but I am somewhat biased.
I would like someone to have a look at the benchmark (and validation) too,
and state his opinion.
Best regards
Dennis
 

Re:Alternative SZ IntToStrB&V v0.02

Avatar Zondertau writes:
Quote

There is indeed no universal fastest function and neither does a
perfect benchmark exist. The only thing we can do is try and construct
a benchmark in such a way that Dennis' criteria are satisfied, so
there is at least some realism in the benchmarking situation.
Is it possible to hook the RTL IntToStr and record the values when running
some 'typical' real programs? Then we could have a playback benchmark, just
as for the MM.
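A rough sketch of what such a record-and-playback benchmark could look like, in Python rather than Delphi. Hooking the real RTL IntToStr is not shown; `recording` and `playback` are hypothetical helpers and `str` stands in for the function being hooked.

```python
import functools
import time


def recording(func, log):
    """Wrap func so that every argument it receives is appended to log."""
    @functools.wraps(func)
    def wrapper(value):
        log.append(value)
        return func(value)
    return wrapper


def playback(func, log):
    """Replay the recorded arguments against func; return elapsed nanoseconds."""
    start = time.perf_counter_ns()
    for value in log:
        func(value)
    return time.perf_counter_ns() - start


recorded = []
int_to_str = recording(str, recorded)  # stand-in for the hooked IntToStr
for n in (7, -42, 2005, 7):            # stand-in for a 'typical' program run
    int_to_str(n)

elapsed = playback(str, recorded)      # time any candidate on the same trace
```

The recorded trace keeps both the value distribution and the call order of the real program, which are the two things the synthetic benchmarks above have to guess.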
--
Anders Isaksson, Sweden
BlockCAD: web.telia.com/~u16122508/proglego.htm
Gallery: web.telia.com/~u16122508/gallery/index.htm
 

Re:Alternative SZ IntToStrB&V v0.02

Avatar Zondertau writes:
I have posted a new version which should comply with almost all the
requirements you listed. I will comment below.
Quote
IMHO this specific benchmark should:
(1) Test random numbers from the entire range to make result-size
branch prediction hard; this means:

(1a) No testing on a per-subrange basis
Fully complies when the random range is used.
Quote
(1b) No successive tests with the same argument
Fully complies, through the benchmark settings.
Quote
(2) Have a realistic distribution, which would IMHO be proportional to
the inverse of the logarithm
Currently, the RandomRange function is used, and the distribution is as good as
that function makes it. This is the easiest thing to change. Alternatively,
checking for example 2.5000.000 random numbers should give quite a good
distribution.
Quote
(3) Test a large number of different values
Fully complies.
Quote
(4) Try and make sure the recorded time includes (nearly) only calls to
IntToStr
Fully complies in the new version. However, that requirement works badly with
QueryPerformanceCounter, since it is quite slow. If we need to preserve
execution speed, we have to sacrifice (5) and go back to building an array of data.
Quote
(5) Be stable
Fully complies.
Quote
Currently these are satisfied:

Current:         1b 3 4
My alternative:  1a 1b 2 3 4
Groups20:        2 4 5 (1b and 3 depending on settings)

Do you agree to this analysis? If not, which point do you disagree on?
Basically I agree with everything. Only (1a) is a bit problematic.
Testing subranges shows clearly how a function performs for a specific
group of parameters. It also shows for which arguments (parameters) the
function gives its best results.
In practice, a function that ranks in the middle of the test can be the best
choice if it is the best at handling the range used most in the application.
Here we are actually trying to find the best universal function, one that
handles all situations equally well and gives the best performance by total
benchmark result (fully complying with (1a)).
However, another function can handle a specific, frequently used range better.
And that is unfortunate. It is very hard to determine which range is
preferable for general use, since relying on it in a specific application
can be a very bad choice.
The comments above partially apply to (2) as well.
All of this implies that the benchmark needs to be a bit flexible (customizable)
to find the best solution after some additional analysis.
Quote
Anyway, I will try to eliminate the strange spread in the results I am
getting ASAP.
I have also fixed some small errors in my new version:
1. Validation did not check Low(Integer)
2. The report is changed a bit and now correctly shows the info data.
Major changes were made to fully comply with (4) and eliminate creating arrays.
That can also negatively influence QueryPerformance. Testing is in progress.
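On the Low(Integer) fix: in 32-bit two's-complement arithmetic the negation of -2147483648 is not representable, so it is the value a validation most needs to include. A minimal sketch of such an edge-case check, in Python for illustration only (Python integers do not overflow, so this shows the shape of the validation, not the overflow itself; `int_to_str_candidate` and `validate` are hypothetical):

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def int_to_str_candidate(n):
    """Hypothetical hand-written conversion under test."""
    if n == 0:
        return "0"
    digits = []
    magnitude = -n if n < 0 else n
    while magnitude:
        magnitude, d = divmod(magnitude, 10)
        digits.append(chr(ord("0") + d))
    if n < 0:
        digits.append("-")
    return "".join(reversed(digits))


def validate(func):
    """Return the edge-case inputs for which func disagrees with str()."""
    edge_cases = [
        INT32_MIN, INT32_MIN + 1, -10, -9, -1,
        0, 1, 9, 10, INT32_MAX - 1, INT32_MAX,
    ]
    return [n for n in edge_cases if func(n) != str(n)]


failures = validate(int_to_str_candidate)  # an empty list means all cases pass
```

Random testing over the full range will almost never hit the two extreme values, so listing them explicitly is cheap insurance.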
Sasa
--
www.szutils.net