Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disassembler adjustments, attempt #2 #7

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

anthony-coulter
Copy link
Contributor

The format string change fixes a bug: fsgnj instructions write to floating registers, not integer registers.

I add half-precision floating point instructions because they were recently approved by the RISC-V Foundation and their formats are similar to the other three precisions that you already support. The RISC-V Foundation has approved other extensions (particularly the bit manipulation and vector extensions) but they're more complicated so I haven't added them yet.

The pseudoinstruction cleanup fixes some regressions from last October in which branch instructions were needlessly disassembling to pseudoinstructions. The worst offender was "blt rs1, rs2, LABEL" expanding to "bgt rs1, rs2, LABEL" but there was also a quirk where "bge zero, zero, LABEL" was expanding to "blez zero, LABEL" when "bgez zero, LABEL" is arguably better.

The O(n^2) string builder improvement seems unnecessary but I've been using your disassembler as a reference implementation against which I compare my own output on all 1 billion possible 32-bit instructions. Those repeated calls to strlen were killing me.

The aliases of fsgnj (fmv, fabs, and fneg) have a floating-point destination
register; the old code displayed this as an integer register.
Do not needlessly disassemble "blt rs1, rs2, LABEL" as "bgt rs2, rs1, LABEL".

Likewise, disassemble "blt zero, zero, label" as "blt zero, LABEL" and not
"bgtz zero, LABEL" even though they are equivalent, and disassemble
"bge zero, zero, LABEL" as "bgez zero, LABEL" and not "blez zero, LABEL".
@michaeljclark
Copy link
Owner

oh cool. it mostly looks good from eyeballing the changes. I will check it out and test it. tests are a bit thin in this repo because we originally tested this code by interpreting what we decoded so appreciate if you can share results.

the only thing is with the pseudos, I was semi-deliberate about reversing branch directions and using the branch {gt|lt} zero shorthands. iirc it applies all possible pseudo lifts. I have a memory of writing code that goes in both directions and this disassembler is an extract from that code which was largely machine-generated. I guess we could label the branch direction pseudos as forward-only - i.e. assemble them but don't lift them. I have a feeling that binutils also decodes pseudos in its disassembly output. so if you have a preference to not decode pseudos it might be better to add an option to disable them. disabling the branch label reversing is less contentious than disabling the zero shorthands. what does binutils do there? also when I read the pseudo-instruction cleanup commit fragment I only see a re-ordering of one constraint so I'll have to take a closer look at that.

if you want to disable pseudo wholesale you could add a flags parameter or maybe add an integer set to each of the pseudos that check_constraints could filter on.

    if (flags & flag_pseudos) decode_inst_lift_pseudo(&dec);

or

    decode_inst_lift_pseudo(&dec, pseudo_default_set);

the other changes all look good. can I merge it without the pseudos cleanup fragment? as that's kind of a user preference thing.

@anthony-coulter
Copy link
Contributor Author

I asked version 2.39 of the GNU's RISC-V binutils to assemble and then disassemble this sequence:

hello:  blt     a0,a1, hello
        bge     a0,a1, hello
        bltu    a0,a1, hello
        bgeu    a0,a1, hello
        blt     a0,x0, hello
        bge     a0,x0, hello
        blt     x0,a0, hello
        bge     x0,a0, hello
        blt     x0,x0, hello
        bge     x0,x0, hello
        slt     a0, x0, x0        
        # No surprises below this line
        jal     x0, hello
        jal     x1, hello
        jalr    x0, a0, 0
        jalr    x1, a0, 0
        jalr    x0, x1, 0
        addi    x0, x0, 0
        addi    a0, x0, 0
        addiw   a0, a1, 0
        slt     a0, a1, x0
        sltiu   a0, a1, 1
        fence   iorw, iorw
        fsgnj.s         fa0, fa1, fa1
        fsgnjx.s        fa0, fa1, fa1
        fsgnjn.s        fa0, fa1, fa1

The result was:

0000000000000000 <hello>:
   0:   00b54063                blt     a0,a1,0 <hello>
   4:   feb55ee3                bge     a0,a1,0 <hello>
   8:   feb56ce3                bltu    a0,a1,0 <hello>
   c:   feb57ae3                bgeu    a0,a1,0 <hello>
  10:   fe0548e3                bltz    a0,0 <hello>
  14:   fe0556e3                bgez    a0,0 <hello>
  18:   fea044e3                bgtz    a0,0 <hello>
  1c:   fea052e3                blez    a0,0 <hello>
  20:   fe0040e3                bltz    zero,0 <hello>
  24:   fc005ee3                blez    zero,0 <hello> <-- Surprise!
  28:   00002533                sltz    a0,zero
  2c:   fd5ff06f                j       0 <hello>
  30:   fd1ff0ef                jal     ra,0 <hello>
  34:   00050067                jr      a0
  38:   000500e7                jalr    a0
  3c:   00008067                ret
  40:   00000013                nop
  44:   00000513                li      a0,0
  48:   0005851b                sext.w  a0,a1
  4c:   0005a533                sltz    a0,a1
  50:   0015b513                seqz    a0,a1
  54:   0ff0000f                fence
  58:   20b58553                fmv.s   fa0,fa1
  5c:   20b5a553                fabs.s  fa0,fa1
  60:   20b59553                fneg.s  fa0,fa1

All the non-branch instructions at the bottom got disassembled into pseudoinstructions as expected so let's look at the branches.

  1. When neither argument to a branch is x0 the disassembler prefers the primitive blt[u] and bge[u] over the pseudoinstructions bgt[u] and ble[u].
  2. When exactly one argument to a branch is x0 there is only one possible compare-to-zero pseudoinstruction to disassemble to, and so the disassembler uses it.
  3. When both arguments to a branch (or to slt) are x0' the diassembler has to choose between two compare-with-zero pseudoinstructions; it chooses bltzandblezfor branches andsltz` for the set-less-than op.

My code agrees with the GNU disassembler except on part of point 3: I disassemble to bltz and sltz but I disassemble the middle instruction to bgez instead of blez. Oh, dear.

To be clear---my changes to your code don't affect any psuedoinstruction disassembly except for branches, and in general I make your code behave more like GNU binutils than before by preferring blt over bgt. I honestly didn't know that GNU disassembled bge x0,x0, label to a blez pseudoinstruction until I tested it this morning and I'm not sure how I feel about that---my way says that if both arguments to a branch are x0, keep rs1 as an arbitrary register that happens to be x0 but use rs2=x0 to choose what pseudoinstruction to use. I'm not sure what approach the GNU disassembler is taking; maybe it's just searching through an alphabetized list.

What is your stance on the pseudoinstructions after reading this? Are you sold on the idea of preferring blt over bgt? And if so, what do you think about the bge x0,x0.label issue? Is it worth trying to figure out how to get that particular instruction to match the GNU disassembler, or can we let that slide?

One other completely unrelated thing---I think one of my commits added a hard tab to the file. (I use tabs instead of spaces and the muscle memory is hard to break.) But GitHub is a pain to figure out so I'll try to sort this out if/when I update the other commits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants