
Re: Found something I could not explain while assembling a test-file ....


2004-01-21 02:51:26 AM
Hi Rudy,
[...]
Quote
I have no idea how to respond to you now. :-\ I tell you that I've got
documentation (of a, at that time, respectable source) that has got those
"symbols" tagged to those numbers, and your response is I don't ? I'm
almost sure you did not mean that, but it sure sounds like it ...?
I mean you have got *wrong* documentation. And this fact doesn't
depend on the respectability of the source ;-)
[...]
Quote
The problem is that it looks to me I have to do this check for *every*
command, so I can be sure that that command actually utilizes the prefix,
and what the changes are (if not cast upon a mod-r/m or reg byte).
[...]
Just remember the simple rule: it is important for your disassembler
to understand what the code "wants" to do. It doesn't matter at all
whether the code can work or not! <--- btw, this is *not* a joke ;-)
IMHO the best way for you is setting some internal "flags" on receiving
any prefixes, using the flags in decoding the current instruction (only if
they are necessary) and resetting them before next instruction.
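A minimal sketch of that "flags" idea, in Python purely for illustration. The names PrefixState and read_prefixes are made up for this example and do not come from anyone's actual disassembler; the prefix byte values are the standard x86 ones. The state object is created fresh for every instruction, which is the "resetting" step.

```python
# Sketch only: PrefixState and read_prefixes are hypothetical names.
from dataclasses import dataclass
from typing import Optional, Tuple

SEGMENT_PREFIXES = {0x26: "ES", 0x2E: "CS", 0x36: "SS",
                    0x3E: "DS", 0x64: "FS", 0x65: "GS"}


@dataclass
class PrefixState:
    opsize: bool = False            # 66h seen
    adrsize: bool = False           # 67h seen
    segment: Optional[str] = None   # name of the segment override, if any
    lock: bool = False              # F0h seen
    rep: Optional[str] = None       # "REP" (F3h) or "REPNE" (F2h)


def read_prefixes(code: bytes, pos: int) -> Tuple[PrefixState, int]:
    """Collect prefix flags starting at pos; return them plus the opcode offset."""
    state = PrefixState()           # fresh flags for every instruction
    while pos < len(code):
        b = code[pos]
        if b == 0x66:
            state.opsize = True
        elif b == 0x67:
            state.adrsize = True
        elif b in SEGMENT_PREFIXES:
            state.segment = SEGMENT_PREFIXES[b]
        elif b == 0xF0:
            state.lock = True
        elif b == 0xF3:
            state.rep = "REP"
        elif b == 0xF2:
            state.rep = "REPNE"
        else:
            break                   # first non-prefix byte is the opcode
        pos += 1
    return state, pos
```

The opcode decoder then looks only at the flags it cares about; a flag that nothing reads is simply dropped when the next instruction starts.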
 
 

Re:Re: Found something I could not explain while assembling a test-file ....

<Stepan Polovnikov> wrote in message
XXXX@XXXXX.COM ...
Hello Stepan,
Quote
>Which brings-up another point : What does a processor do if
>it receives a 66h/67h prefix to a command that does not utilize
>it (like, for example, a CLC with a 66h prefix. Or a "push
>data32" with a 67h prefix) ? Does it generate an "illegal
>command" exception, or does it silently ignore it ?

Nothing. The CPU ignores it.
That is what I thought. I do not *like* it, but that is a whole other
matter :-) Hmm ... If you ever want to confuse people when you need a few
NOPs, alternate them with prefix codes. Just make sure the last one is
really a NOP :-)
Quote
>And would that be true for the whole family (if applicable),
>or does it change from processor-to-processor ?
0F (POP CS on the 8086) has been the escape byte for extensions of the
instruction set since the 286.
F3 0F (REP) is used as a prefix for extension of the instruction set (SSE),
and F2 0F (REPNE) likewise (SSE2).
66 0F (OS:) is used as a prefix for extension of the instruction set (SSE2).
40..4F (INC/DEC reg) are used as prefixes for extension of the instruction
set (Athlon 64) (long mode).

A prefix only sets a property of the command.
66 - sets ChangeOpSize
67 - sets ChangeAdrSize
26,2E,36,3E,64,65 - set SegOverride

The prefixes can be mixed.
66 26 64 67 66 B8 90 90 mov ax,9090h - is a normal opcode for execution
(32bit mode)
That's exactly the sort of nonsense I hoped the processor would not accept
...
Thanks for your response.
Regards,
Rudy Wieser
 

Re:Re: Found something I could not explain while assembling a test-file ....

Vasya Pupkin < XXXX@XXXXX.COM > wrote in message
XXXX@XXXXX.COM ...
Quote
Hi Rudy,
Hello Vasya,
Quote
>I have no idea how to respond to you now. :-\ I tell you that I've got
>documentation (of a, at that time, respectable source) that has got those
>"symbols" tagged to those numbers, and your response is I don't ? I'm
>almost sure you did not mean that, but it sure sounds like it ...?

I mean you have got *wrong* documentation. And this fact doesn't
depend on the respectability of the source ;-)
I realized after I sent my message that you must have meant that. Sorry,
I appear to be a bit touchy in this regard.
Quote
[...]
>The problem is that it looks to me I have to do this check for *every*
>command, so I can be sure that that command actually utilizes the prefix,
>and what the changes are (if not cast upon a mod-r/m or reg byte).
[...]

Just remember the simple rule: it is important for your disassembler
to understand what the code "wants" to do. It doesn't matter at all
whether the code can work or not! <--- btw, this is *not* a joke ;-)
I think that on this point your point of view and mine differ : If the
processor can't execute it, I want to know !
Simply, because if the processor can't execute it, the programmer can't have
written it (as part of an actually running program), which would mean that
seeing such an instruction would mean I'm out-of-sync.
Quote
IMHO the best way for you is setting some internal "flags" on receiving
any prefixes, using the flags in decoding the current instruction (only if
they are necessary) and resetting them before next instruction.
Which appears to be almost exactly what I'm doing now :-)
Regards,
Rudy Wieser
 


Re:Re: Found something I could not explain while assembling a test-file ....

Hi Rudy,
[...]
Quote
I think that on this point your and my points-of-vision differ : If the
processor can't execute it, I want to know !
If only you are sure the processor is going to execute it.
The instruction (or its "part") may be inserted on purpose
to cheat you ;-)
Quote
Simply, because if the processor can't execute it, the programmer can't
have written it (as part of an actually running program),
Some programmers can ;-) Moreover, they do it on purpose.
Quote
which would mean that seeing such an instruction would mean I'm
out-of-sync.
No, this wouldn't mean anything. Sometimes you cannot know whether a
particular piece of code will be executed or not. And that is often
impossible to find out without step-by-step tracing the whole program.
However, if you are not going to disassemble any protected programs,
just forget all I wrote above ;-)
 

Re:Re: Found something I could not explain while assembling a test-file ....

Vasya Pupkin wrote in message < XXXX@XXXXX.COM >...
Quote
Hi Rudy,

[...]
>I think that on this point your and my points-of-vision differ : If the
>processor can't execute it, I want to know !

If only you are sure the processor is going to execute it.
The instruction (or its "part") may be inserted on purpose
to cheat you ;-)

>Simply, because if the processor can't execute it, the programmer can't
>have written it (as part of an actually running program),

Some programmers can ;-) Moreover, they do it specially.
You're very right. Use an invalid opcode, follow it with whatever
you can make up and retrieve those bytes in an invalid opcode
handler to do some special processing. It's quite a well known
approach to extending the instruction set.
Robert
--
Robert AH Prins
XXXX@XXXXX.COM
 

Re:Re: Found something I could not explain while assembling a test-file ....

Vasya Pupkin < XXXX@XXXXX.COM > wrote in message
XXXX@XXXXX.COM ...
Quote
Hi Rudy,
Hello Vasya,
Quote
>I think that on this point your and my points-of-vision differ : If the
>processor can't execute it, I want to know !

If only you are sure the processor is going to execute it.
The instruction (or its "part") may be inserted on purpose
to cheat you ;-)
Thanks for making my point :-) If/when I encounter such an instruction, I
can be sure that I've been conned (or have started at a wrong point).
Quote
>Simply, because if the processor can't execute it, the programmer can't
>have written it (as part of an actually running program),

Some programmers can ;-) Moreover, they do it specially.
Not as part of a *running* program they can't ! Or they must have caught
the exception, and use that to continue somewhere else.
Quote
>which would mean that seeing such an instruction would mean I'm
>out-of-sync.

No, this wouldn't mean anything. Sometimes you cannot know whether
partucular code will be executed or not. And that is often impossible to
find out without step-by-step tracing the whole program.

However, if you are not going to disassemble any protected programs,
just forget all I wrote above ;-)
Step-by-step is the only thing a dumb disassembler can do ... But there is
nothing stopping me from writing a wrapper that uses the disassembler to see
what the flow of a program is (by following jumps, calls and the like).
Encountering a non-executable instruction in this way is a sure sign that
user intervention is needed :-)
Regards,
Rudy Wieser
 

Re:Re: Found something I could not explain while assembling a test-file ....

Hi Robert,
[...]
Quote
You're very right. Use an invalid opcode, follow it with whatever
you can make up and retrieve those bytes in an invalid opcode
handler to do some special processing. It's quite a well known
approach to extending the instruction set.
Tsss! Please, don't tell it to Rudy! ;-) Otherwise he will be able
to crack our programs ;-)))
 

Re:Re: Found something I could not explain while assembling a test-file ....

Hi Rudy,
[...]
Quote
>Some programmers can ;-) Moreover, they do it specially.

Not as part of a *running* program they can't ! Or they must have caught
the exception, and use that to continue somewhere else.
[...]
Your first statement says No, but the second one says Yes ;-)
 

Re:Re: Found something I could not explain while assembling a test-file ....

Vasya Pupkin < XXXX@XXXXX.COM > wrote in message
XXXX@XXXXX.COM ...
Quote
Hi Rudy,
Hello Vasya
Quote
>>Some programmers can ;-) Moreover, they do it specially.
>
>Not as part of a *running* program they can't ! Or they must
>have caught the exception, and use that to continue somewhere
>else.

Your first statement says No, but the second one says Yes ;-)
You're quite right :-)
I think that seeing an illegal opcode will be a sign that something is going
amiss, or that the programmer has used an instruction-extender. Either
way, I would want to know about it.
Maybe I will add a switch to my disassembler that tells it to run in "can
be executed", or in "grab as much as you can" -mode :-)
But my aim is to first create the "can be executed" -mode (as
instruction-extenders are not that wide spread).
Regards,
Rudy Wieser
 

Re:Re: Found something I could not explain while assembling a test-file ....

Quote
Most of the data I find or am offered is at best incomplete (even Intel's
own opcode-lists, as they do not bother to, for example, mention at which
commands OpSiz and AdrSiz have effect, nor what they exactly do to the
commands they do have an effect on). Debuggers are unreliable, and searches
on the Web do not reveal anything worthwhile/complete/reliable either.

Simply said : I can't seem to find, generate or cajole a reliable list
against which I can check the output of my own program. And I'm not
prepared to, just like MS and Borland seem to have done, just fake it & hope
I will get away with it.
if i were to just put a binary file together with all possible combinations
of VALID instructions (just the bytes, not the actual disassembly of them),
would that be something you can use?
(naturally, in the case of immediate values or displacements, i'll NOT list
all possible combinations... that would be too large... a 32-bit immediate
has 2^32 (about 4 billion) possible values... multiply by 4 bytes per
value.. that's 16 GB right there, per instruction using a 32-bit immediate,
plus 4 GB for the opcode bytes)
...anyway.. i was thinking of doing something like this,...
starting with all valid 16-bit mod-r/m decodings (minus obvious repeats,
plus necessary repeats)
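db 00h,00h              ; mod=00: no displacement ...
db 00h,01h
db 00h,02h
db 00h,03h
db 00h,04h
db 00h,05h
db 00h,06h,01h,02h      ; ... except r/m=110: direct 16-bit address
db 00h,07h
db 00h,40h,01h          ; mod=01: 8-bit displacement
db 00h,46h,01h
db 00h,80h,01h,02h      ; mod=10: 16-bit displacement
db 00h,86h,01h,02h
db 00h,0C0h             ; mod=11: register operand
db 00h,0C1h
db 00h,0C2h
db 00h,0C3h
db 00h,0C4h
db 00h,0C5h
db 00h,0C6h
db 00h,0C7h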
then all possible 32-bit encodings (done much the same way.. using 67,00 as
the prefix and opcode bytes)... naturally this list will assume you're
decoding in a 16-bit code segment... hence the necessary 67 before the
00....
then i'd run through all the opcode bytes
would this be an acceptable "test file" for you? is this what you're looking
for?
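A list like the one above could also be generated instead of typed by hand. The following is only a sketch under assumptions: the function names modrm16_cases and as_db_line are invented here, the displacement bytes 01h/02h are just the dummy values used in the list, and unlike the hand-made list it emits every r/m value for every mod (32 cases per opcode/reg pair) rather than skipping "obvious repeats".

```python
# Sketch only: modrm16_cases and as_db_line are hypothetical helper names.

def modrm16_cases(opcode=0x00, reg=0):
    """Return one byte string per 16-bit mod-r/m encoding for the given opcode."""
    cases = []
    for mod in range(4):
        for rm in range(8):
            modrm = (mod << 6) | (reg << 3) | rm
            if mod == 0:
                # only r/m=110 (direct address) carries a 16-bit displacement
                disp = bytes([0x01, 0x02]) if rm == 6 else b""
            elif mod == 1:
                disp = bytes([0x01])            # 8-bit displacement
            elif mod == 2:
                disp = bytes([0x01, 0x02])      # 16-bit displacement
            else:
                disp = b""                      # mod=11: register operand
            cases.append(bytes([opcode, modrm]) + disp)
    return cases


def as_db_line(data):
    """Format bytes as an assembler 'db' line with h-suffixed hex values."""
    parts = []
    for b in data:
        text = f"{b:02X}h"
        if text[0] in "ABCDEF":
            text = "0" + text                   # hex literals must start with a digit
        parts.append(text)
    return "db " + ",".join(parts)


if __name__ == "__main__":
    for case in modrm16_cases():
        print(as_db_line(case))
```

Looping the same generator over reg=0..7 and over every opcode that takes a mod-r/m byte would give the rest of the file; the 32-bit forms would need a second generator that also handles SIB bytes.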
 

Re:Re: Found something I could not explain while assembling a test-file ....

"Vasya Pupkin" < XXXX@XXXXX.COM >wrote in message
Quote
Hi Rudy,

[...]
>Well. That is probably the end of my disassembler program ....

Well. To take a correct decision, try to assemble & disassemble
the following simplest example (in 16-bit mode):

mov ax,03EBh
jmp $-2
db 0BAh
... ; insert a "secret" instruction here ;-)

I don't think you can find a *non-interactive* disassembler which is able
to restore the original "code" automatically. So... Do you still want to
create your disassembler? ;-)))

>Most of the data I find or am offered is at best incomplete (even Intel's
>own opcode-lists, as they do not bother to, for example, mention at which
>commands OpSiz and AdrSiz have effect, nor what they exactly do to the
>commands they do have an effect on).

IA-32 Intel Architecture Software Developer's Manual
Volume 2: Instruction Set Reference
CHAPTER 2. INSTRUCTION FORMAT
2.2. INSTRUCTION PREFIXES

[...]
>I surely would like to know how *anyone* was able to write a disassembler.

Most disassemblers expect some "correct" code. And that is their main
mistake. That is why there still are lots of methods to cheat them ;-)

--
Best regards,
Vasya Pupkin.


it took me about 2 minutes (and almost giving into the temptation to fire up
debug for the simple task) to see what that one was doing.... of course,
that one isn't too bad of an example... just create a disassembler that
looks at all branch addresses, and keeps a record of how many bytes to each
next instruction... then when you go back to check that the branch addresses
have already been decoded, you can see that you might have disassembled
through that area, but the code tricked you, and so you have to give the
"alternate disassembly" as well... (also, when you're doing this... you
stop your current disassembly "thread" at unconditional jumps, such as jmp
short, jmp near, jmp long, jmp far, ret, retf, iret, iretd... etc.)
ok.. hopefully i haven't lost anyone in that last paragraph.... if so, i'll
wait while you wander around and find me or another tourguide again...
anyways... what is the trickiest is conditional self-modifying code.... as
well as code or data relocations.... i'm still needing to analyze that sort
of stuff myself, to see what's going on.... if you have a program that's
smart enough to analyze that kind of code, then you basically have an
emulator, as it would have to keep track of all registers and flags and
memory changes.....
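The branch-following idea from this post can be sketched on Vasya's example. This is a toy under stated assumptions: decode(), linear(), trace() and overlaps() are invented names, only four 16-bit opcodes are understood, and the "secret" instruction is assumed here to be a NOP followed by a RET. The point is only to show a plain sweep swallowing the hidden code while a flow-following pass finds it and reports the overlap.

```python
# Illustrative sketch: all function names are hypothetical, and only
# B8..BF (mov r16,imm16), EB (jmp short), 90 (nop) and C3 (ret) are decoded.
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]

# mov ax,03EBh / jmp $-2 / db 0BAh / then NOP, RET as the assumed "secret" code
CODE = bytes([0xB8, 0xEB, 0x03, 0xEB, 0xFC, 0xBA, 0x90, 0xC3])


def decode(code, pos):
    """Return (length, text, branch_target, falls_through) for one instruction."""
    op = code[pos]
    if 0xB8 <= op <= 0xBF:                      # mov r16,imm16
        imm = code[pos + 1] | (code[pos + 2] << 8)
        return 3, f"mov {REG16[op - 0xB8]},{imm:04X}h", None, True
    if op == 0xEB:                              # jmp short rel8
        rel = code[pos + 1]
        if rel >= 0x80:
            rel -= 0x100                        # sign-extend the displacement
        target = pos + 2 + rel
        return 2, f"jmp short {target:04X}h", target, False
    if op == 0x90:
        return 1, "nop", None, True
    if op == 0xC3:
        return 1, "ret", None, False
    raise ValueError(f"opcode {op:02X}h not handled in this sketch")


def linear(code):
    """Plain top-to-bottom sweep, the way a 'dumb' disassembler works."""
    out, pos = [], 0
    while pos < len(code):
        length, text, _, _ = decode(code, pos)
        out.append((pos, text))
        pos += length
    return out


def trace(code, entry=0):
    """Follow control flow; stop each 'thread' at an unconditional jump or ret."""
    found = {}                                  # offset -> (length, text)
    work = [entry]
    while work:
        pos = work.pop()
        while 0 <= pos < len(code) and pos not in found:
            length, text, target, falls_through = decode(code, pos)
            found[pos] = (length, text)
            if target is not None:
                work.append(target)             # decode the branch target later
            if not falls_through:
                break
            pos += length
    return found


def overlaps(found):
    """List (outer, inner) pairs where one instruction starts inside another."""
    pairs = []
    for start, (length, _) in found.items():
        for other in found:
            if start < other < start + length:
                pairs.append((start, other))
    return sorted(pairs)


if __name__ == "__main__":
    print(linear(CODE))
    print(sorted(trace(CODE).items()))
    print(overlaps(trace(CODE)))
```

An overlap pair is the cue to print the "alternate disassembly". Conditional jumps would need both the target and the fall-through queued, and none of this helps with self-modifying code, which is where the emulator remark above comes in.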
 

Re:Re: Found something I could not explain while assembling a test-file ....

"R.Wieser" < XXXX@XXXXX.COM >wrote in message
Quote
<Stepan Polovnikov> wrote in message
XXXX@XXXXX.COM ...

Hello Stepan,

>>Which brings-up another point : What does a processor do if
>>it receives a 66h/67h prefix to a command that does not utilize
>>it (like, for example, a CLC with a 66h prefix. Or a "push
>>data32" with a 67h prefix) ? Does it generate an "illegal
>>command" exception, or does it silently ignore it ?
>
>Nothing. The CPU ignores it.

That is what I thought. I do not *like* it, but that is a whole other
matter :-) Hmm ... If you ever want to confuse people when you need a few
NOPs, alternate them with prefix codes. Just make sure the last one is
really a NOP :-)

of course, REP NOP, i've recently found out, is another command on P4s... =
PAUSE...
and AMD manuals say that you should use only one prefix from each of the 5
sets of prefixes (segment overrides, operand override, address override,
repeats, locks).... use of multiple prefixes from the same set is supposedly
undefined, but i know on the AMD K6-2-550, anything after the first prefix
in each set is just ignored... (with the exception of segment overrides...
the last is used) ....(thus i would decode like this, but would also flag
the instruction in the disassembly (multiple same-set prefix flag).... and
if i go over the 15 byte limit in disassembling, i would flag it with an
instruction-limit-overrun flag as well)..
Quote
>The prefixes can be mixed.
>66 26 64 67 66 B8 90 90 mov ax,9090h - is a normal opcode for execution
>(32bit mode)

That's exactly the sort of nonsense I hoped the processor would not accept
hehe... same here... but they do accept it... let's see what we have...
OSIZE, ES:, FS:, ASIZE, OSIZE, mov (e)ax,9090h
two OSIZE don't negate themselves... my experience is that two osize do the
same as one osize... since he said it's a 32-bit codesegment, we're decoding
this as a 16-bit instruction now.... once we finally get to the opcode
ASIZE is meaningless in this instruction, but that doesn't mean we can't put
it in there anyway.... prefixes are usually ignored if they don't have a
purpose on an instruction... basically look at it like this.. you're setting
a few flags on how the processor will decode the instruction... if we don't
use that flag when we're disassembling the opcode and operands, it doesn't
matter....
ES and FS overrides used (26h is ES:, 64h is FS:)... my decoder would handle multiple segment
overrides like this.... check the segment override present flag... if one is
present, then we set a multiple segment override flag (an error flag telling
us to append info at the end of the line for the user to know that
instruction is "awkward").... then, regardless of whether a segment override
has already been detected, record the name of the new segment that we'll be
using, in another variable.... that way the last segment override will be
used...
now... in this example instruction above... since the only thing being used
in decoding the opcode and operands is the OSIZE... our instruction is
interpreted (in a 32-bit segment) as:
OSIZE (we interpret the instruction as 16-bit).... mov ax,9090
thus all others are basically ignored....
and.... i've lost my train of thought and can't remember my purpose for
typing all this, so i'll just end right here and send this post as is...
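The walk-through in this post can be condensed into a short sketch. Everything here is an assumption made for illustration: the names scan_prefixes and decode_mov_imm, the wording of the notes, and the restriction to the B8..BF "mov reg,imm" opcodes. It follows the behaviour described above (duplicates within a prefix set are flagged, the last segment override wins, unused prefixes are reported rather than rejected, and the 15-byte limit is checked).

```python
# Sketch only: function names and note texts are hypothetical.
GROUPS = {0x66: "osize", 0x67: "asize", 0xF0: "lock", 0xF2: "rep", 0xF3: "rep",
          0x26: "seg", 0x2E: "seg", 0x36: "seg", 0x3E: "seg", 0x64: "seg", 0x65: "seg"}
SEG_NAMES = {0x26: "ES", 0x2E: "CS", 0x36: "SS", 0x3E: "DS", 0x64: "FS", 0x65: "GS"}
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REG32 = ["e" + name for name in REG16]


def scan_prefixes(code, pos=0):
    """Return (groups seen, effective segment, notes, offset of the opcode)."""
    seen, segment, notes = set(), None, []
    while pos < len(code) and code[pos] in GROUPS:
        group = GROUPS[code[pos]]
        note = f"multiple {group} prefixes"
        if group in seen and note not in notes:
            notes.append(note)                  # flag it, but keep decoding
        seen.add(group)
        if group == "seg":
            segment = SEG_NAMES[code[pos]]      # the last override wins
        pos += 1
    return seen, segment, notes, pos


def decode_mov_imm(code, default_bits=32):
    """Decode 'mov reg,imm' (B8..BF) with any prefixes; return (text, length, notes)."""
    seen, segment, notes, pos = scan_prefixes(code)
    op = code[pos]
    if not 0xB8 <= op <= 0xBF:
        raise ValueError("only B8..BF is handled in this sketch")
    bits = default_bits
    if "osize" in seen:                         # one or many 66h: same effect
        bits = 16 if default_bits == 32 else 32
    size = bits // 8
    imm = int.from_bytes(code[pos + 1:pos + 1 + size], "little")
    length = pos + 1 + size
    reg = (REG16 if bits == 16 else REG32)[op - 0xB8]
    if "asize" in seen:
        notes.append("asize without memory reference")
    if segment is not None:
        notes.append(f"{segment}: override without memory reference")
    if length > 15:
        notes.append("instruction longer than 15 bytes")
    return f"mov {reg},{imm:0{size * 2}X}h", length, notes


if __name__ == "__main__":
    print(decode_mov_imm(bytes.fromhex("6626646766B89090")))
```

The notes would be appended to the disassembly line, so the instruction still shows up as mov ax,9090h but the reader sees that it is "awkward".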
 

Re:Re: Found something I could not explain while assembling a test-file ....

"R.Wieser" < XXXX@XXXXX.COM >wrote in message
Quote
Vasya Pupkin < XXXX@XXXXX.COM > wrote in message
XXXX@XXXXX.COM ...
>Hi Rudy,

Hello Vasya,

>>I have no idea how to respond to you now. :-\ I tell you that I've got
>>documentation (of a, at that time, respectable source) that has got those
>>"symbols" tagged to those numbers, and your response is I don't ? I'm
>>almost sure you did not mean that, but it sure sounds like it ...?
>
>I mean you have got *wrong* documentation. And this fact doesn't
>depend on the respectability of the source ;-)

I realized after I sent my message that you must have meant that. Sorry,
I appear to be a bit touchy in this regard.

>[...]
>>The problem is that it looks to me I have to do this check for *every*
>>command, so I can be sure that that command actually utilizes the prefix,
>>and what the changes are (if not cast upon a mod-r/m or reg byte).
>[...]
>
>Just remember the simple rule: it is important for your disassembler
>to understand what the code "wants" to do. It doesn't matter at all
>whether the code can work or not! <--- btw, this is *not* a joke ;-)

I think that on this point your point of view and mine differ : If the
processor can't execute it, I want to know !
so decode to the best of your ability and append a message stating what was
"not proper" in that instruction... like multiple segment overrides.... a
segment override without memory reference.... multiple osize... multiple
asize... multiple locks... multiple reps... asize without memory
reference....
of course, these are non-failing errors, normally..... what i mean is... the
processor will still do something... but these instructions are normally not
output by proper assemblers... someone might've deliberately done it...
then you have the failing-errors... like your very recent favorite.... "CALL
AL"... instructions that would definitely give a #UD.... THIS is where you
would want to interpret to the best of your ability, then supply a message
stating what was wrong with the instruction (or at least... "#UD -- Invalid
Instruction")... but of course, for this, you would have to supply your
decoder with knowledge of what encodings would generate #UD... this
shouldn't be too hard,... it'll just take time, as you would probably want
to read the intel or amd docs,... the instruction reference sections would
tell you that stuff...
Quote
Simply, because if the processor can't execute it, the programmer can't have
written it (as part of an actually running program), which would mean that
seeing such an instruction would mean I'm out-of-sync.
not necessarily... could be self-modifying code to decrypt/decompress... in
which case, you might as well display such stuff as a block of memory and go
on to check branches...
Quote
>IMHO the best way for you is setting some internal "flags" on receiving
>any prefixes, using the flags in decoding the current instruction (only if
>they are necessary) and resetting them before next instruction.

Which appears to be almost exactly what I'm doing now :-)

Regards,
Rudy Wieser



 

Re:Re: Found something I could not explain while assembling a test-file ....

"Robert Prins" < XXXX@XXXXX.COM >wrote in message
Quote
Vasya Pupkin wrote in message < XXXX@XXXXX.COM >...
>Hi Rudy,
>
>[...]
>>I think that on this point your and my points-of-vision differ : If the
>>processor can't execute it, I want to know !
>
>If only you are sure the processor is going to execute it.
>The instruction (or its "part") may be inserted on purpose
>to cheat you ;-)
>
>>Simply, because if the processor can't execute it, the programmer can't
>>have written it (as part of an actually running program),
>
>Some programmers can ;-) Moreover, they do it specially.


You're very right. Use an invalid opcode, follow it with whatever
you can make up and retrieve those bytes in an invalid opcode
handler to do some special processing. It's quite a well known
approach to extending the instruction set.

Robert
--
Robert AH Prins
XXXX@XXXXX.COM

until Intel or AMD decide to re-use invalid encodings for other
instructions... though i doubt stuff like "CALL reg8" would be reassigned by
Intel or AMD to be a different opcode... they have so much empty space in
their current tables... by the time they run out, they'll have moved on to a
more organized setup...
(though... let's hope the IA-64 instruction set does NOT catch on.... i can
see why Microsoft decided to drop their inline assembly from Visual C++...
only the programmers that make compilers, assemblers and optimizers would
want to mess with the IA-64 instruction set... well.. not that they'd WANT
to... but they'd HAVE to...)
 

Re:Re: Found something I could not explain while assembling a test-file ....

Quote
But my aim is to first create the "can be executed" -mode (as
instruction-extenders are not that wide spread).
just curious.. have you created the "decode single instruction" part yet? or
is that step zero?