Realmode bhyve

I have been poking around bhyve to see what is up, and I came across this article about writing a Linux kvm driver from scratch. The article includes a minimal example program to run as a first test in the kvm driver:

; Output to port 0x3f8
mov dx, 0x3f8

; Store the address of the message in bx, so we can increment it
mov bx, message

loop:
    ; Load a byte from `bx` into the `al` register
    mov al, [bx]

    ; Jump to the `hlt` instruction if we encountered the NUL terminator
    cmp al, 0
    je end

    ; Output to the serial port
    out dx, al
    ; Increment `bx` by one byte to point to the next character
    inc bx

    jmp loop

end:
    hlt

message:
    db "Hello, KVM!", 0

That seems fun, a nice small example of getting some code running. I don't really want to write my own bhyve, I like the one we have, but it might be nice to try and get this running.

I assembled the example:

nasm -fbin nello.S -o nello

Then I looked around to see how to load a bios in bhyve. bhyve(8) has some examples at the end, and it looks like the -l flag can be used to set a bootrom (bios) like so:

$ sudo bhyve -l bootrom,./nello nello

vm exit[0]
        reason          VMX
        rip             0x000000000000fff0
        inst_length     3
        status          0
        exit_reason     48 (EPT violation)
        qualification   0x0000000000000784
        inst_type               0
        inst_error              0

Well that didn't work. I poked a bit in bhyve, but it wasn't clear what to do about an EPT violation. The examples also mentioned using /usr/local/share/uefi-firmware/BHYVE_UEFI_CODE.fd, so I opted for the CSM version:

$ sudo bhyve -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI_CSM.fd hello

I had a poke around the CSM bootrom and while it is always fun to use hexdump, it really didn't help me understand what was wrong with my example assembly.

I tried with BHYVE_UEFI_CSM.fd and guess what I got:

vm exit[0]
        reason          VMX
        rip             0x000000000000fff0
        inst_length     3
        status          0
        exit_reason     48 (EPT violation)
        qualification   0x0000000000000784
        inst_type               0
        inst_error              0

The same trap!

I think that means I need to figure out the minimal viable bhyve command that will run a known-good bootrom before I try running that example. The last example in bhyve(8) is:

Run a UEFI virtual machine with a VARS file to save EFI variables.  Note
that bhyve will write guest modifications to the given VARS file.  Be
sure to create a per-guest copy of the template VARS file from /usr.

      bhyve -c 2 -m 4g -w -H \
        -s 0,hostbridge \
        -s 31,lpc -l com1,stdio \
        -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI_CODE.fd,BHYVE_UEFI_VARS.fd
         uefivm

-w ignores accesses to unimplemented MSRs and -H yields the host CPU when the guest halts, so there is no need for either here. So I tried:

bhyve -s 0,hostbridge -s 31,lpc -l com1,stdio -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI_CSM.fd hello

And that worked:

Boot Failed. CDROM 0
Boot Failed. Harddisk 1
UEFI Interactive Shell v2.1
EDK II
UEFI v2.40 (BHYVE, 0x00010000)
Error. No mapping found
Press ESC in 1 seconds to skip startup.nsh or any other key to continue.

Now to try my bios:

$ sudo bhyve -s 31,lpc -l com1,stdio  -l bootrom,./nello hello
bhyve: ROM size 65552 is not a multiple of the page size
Device emulation initialization error: No such file or directory

32 (the raw, unpadded size of the 16-bit program) is also not a multiple of the page size, so I padded out the example using TIMES 4096 - ($ - $$) db 0, taken from a bootsector nasm example.

This did not succeed.
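The page-size complaint is easy to check by hand. A minimal sketch in Python, assuming 4 KiB pages (the usual amd64 page size); the function names are made up for illustration and are not part of bhyve:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual amd64 page size


def excess_bytes(size: int, page_size: int = PAGE_SIZE) -> int:
    """Number of bytes by which size overshoots a whole number of pages."""
    return size % page_size


def round_up_to_page(size: int, page_size: int = PAGE_SIZE) -> int:
    """Smallest multiple of page_size that is at least size."""
    return -(-size // page_size) * page_size


# 65552 is the ROM size from the bhyve error message, 32 is the raw program.
for size in (65552, 32, 65536):
    print(size, excess_bytes(size), round_up_to_page(size))
```

For the 65552 byte image the excess is 16 bytes, so something 16 bytes long was probably being emitted after the padding rather than inside it.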

Fine, whatever, I will use gdb to look at what is going on. bhyve supports the -G flag to integrate with gdb. I added

-G wlocalhost:1234

to the bhyve command, asking bhyve to listen on localhost port 1234 and wait for gdb to attach before continuing.

(gdb) target remote localhost:1234
Remote debugging using localhost:1234
warning: No executable has been specified and target does not support
determining executable automatically.  Try using the "file" command.
0x000000000000fff0 in ?? ()
(gdb) x/32i 0x000000000000fff0
=> 0xfff0:      add    %al,(%rax)
   0xfff2:      add    %al,(%rax)
   ...
--Type <RET> for more, q to quit, c to continue without paging--q
Quit
(gdb) x/32x 0x000000000000fff0
0xfff0: 0x00000000      0x00000000      0x00000000      0x00000000
0x10000:        0x00000000      0x00000000      0x00000000      0x00000000
...
0x10060:        0x00000000      0x00000000      0x00000000      0x00000000
(gdb) x/32x 0x0
0x0:    0x00000000      0x00000000      0x00000000      0x00000000
0x10:   0x00000000      0x00000000      0x00000000      0x00000000
0x20:   0x00000000      0x00000000      0x00000000      0x00000000
0x30:   0x00000000      0x00000000      0x00000000      0x00000000

Connecting and poking around shows the obvious places are all zeros (or sometimes all 1s).

gdb has a 'find' command for searching memory, and our example is pretty distinctive so it should find it.

It didn't work for me this time.

Stepping immediately just starts the program. For nello we are stopped with rip as 0x000000000000ffef.

0x000000000000ffef in ?? ()
(gdb) x/64x $rip
0xffef: 0x960000ff      0x00ffff00      0x00000200      0x46f00000
0xffff: 0x00000000      0x00000000      0x00000000      0x00000000

Disassembly time. FreeBSD's llvm-objdump doesn't have support for 16 bit x86 (fair), so I grabbed binutils and used a command like this:

x86_64-unknown-freebsd15.0-objdump -b binary -m i386 -D -Maddr16,data16 -Mintel nello

Working from objdump I tweaked some offsets to get bytes into the correct places with padding, but there wasn't an obvious clue what was up. I couldn't associate the memory I could read in gdb to anything from my binary.

$ hexdump -C nello
00000000  ba f8 03 bb 11 00 8a 07  3c 00 74 04 ee 43 eb f6  |........<.t..C..|
00000010  f4 48 65 6c 6c 6f 2c 20  62 68 79 76 65 21 00 90  |.Hello, bhyve!..|
00000020  90 90 90 90 90 90 90 90  90 90 90 90 90 90 90 90  |................|
*
0000fff0  e9 0d 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00010000

I turned to qemu to see if that helped:

$ qemu-system-i386 -bios nello -S -s -nographic

(gdb) target remote localhost:1234
Remote debugging using localhost:1234
warning: No executable has been specified and target does not support
determining executable automatically.  Try using the "file" command.
0x0000fff0 in ?? ()
(gdb) x/32xb 0xffff0000
0xffff0000:     0xba    0xf8    0x03    0xbb    0x11    0x00    0x8a    0x07
0xffff0008:     0x3c    0x00    0x74    0x04    0xee    0x43    0xeb    0xf6
0xffff0010:     0xf4    0x48    0x65    0x6c    0x6c    0x6f    0x2c    0x20
0xffff0018:     0x62    0x68    0x79    0x76    0x65    0x21    0x00    0x90
(gdb) x/16xb 0xfffffff0
0xfffffff0:     0xe9    0x0d    0x00    0x00    0x00    0x00    0x00    0x00
0xfffffff8:     0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
(gdb) c
Continuing.

That all looks good, it matches up with our hexdump of the bios example. If I hit ^C then we stop at 0x00000011.

^C
Program received signal SIGINT, Interrupt.
0x00000011 in ?? ()

If we recall that we are running in 16 bit mode in the last 64k of the address space and convert that offset into the memory dumps, we find the byte value 0xf4 at offset 0x10, the byte just before where we stopped. That is the x86 halt instruction.

"HLT causes the 80386 to stop execution. Following a halt, execution can
only be resumed by the receipt of an enabled interrupt or by a reset of
the computer."

- Programming the 80386
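Both offsets can be checked against the hexdump with a few lines of Python. This is a sketch rather than part of the original debugging session: the byte strings are copied from the hexdump of nello, and decode_jmp_rel16 and stopped_after_hlt are helper names made up for illustration.

```python
# Bytes copied from the hexdump of nello: the first 17 bytes of the image
# and the three-byte instruction sitting at the reset vector (offset 0xfff0).
PROGRAM = bytes.fromhex("baf803bb11008a073c007404ee43ebf6f4")
RESET_VECTOR = bytes.fromhex("e90d00")

HLT = 0xF4  # one-byte opcode for hlt


def decode_jmp_rel16(code: bytes, ip: int) -> int:
    """Return the target of an `E9 rel16` near jump located at offset ip."""
    if len(code) < 3 or code[0] != 0xE9:
        raise ValueError("not a jmp rel16 instruction")
    disp = int.from_bytes(code[1:3], "little", signed=True)
    # The displacement is relative to the next instruction, and the
    # 16-bit instruction pointer wraps around at 64k.
    return (ip + 3 + disp) & 0xFFFF


def stopped_after_hlt(image: bytes, ip: int) -> bool:
    """True if the byte just before ip is a hlt."""
    return ip > 0 and image[ip - 1] == HLT


print(hex(decode_jmp_rel16(RESET_VECTOR, 0xFFF0)))
print(stopped_after_hlt(PROGRAM, 0x11))
```

The jump at the reset vector wraps around to offset 0, the start of the program, and the instruction pointer reported by gdb sits one byte past the hlt, which is where a halted CPU would be expected to rest.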

So we did what we wanted to and stopped, but qemu gave us no output. I think that has confirmed that the bios image is now correct if not functional. So either we are running fine in bhyve and just not getting output, or there is something else up.

In the example minimal Linux hypervisor they just did a straight printf for an IO vmexit. Let's catch the vmexit handlers in bhyve and see what is up:

diff --git a/usr.sbin/bhyve/amd64/vmexit.c b/usr.sbin/bhyve/amd64/vmexit.c
index e0b9aec2d17a..e1669c2b5051 100644
--- a/usr.sbin/bhyve/amd64/vmexit.c
+++ b/usr.sbin/bhyve/amd64/vmexit.c
@@ -72,6 +72,7 @@ vm_inject_fault(struct vcpu *vcpu, int vector, int errcode_valid,
 static int
 vmexit_inout(struct vmctx *ctx, struct vcpu *vcpu, struct vm_run *vmrun)
 {
+fprintf(stderr, "%s:%d\n", __func__, __LINE__);
        struct vm_exit *vme;
        int error;
        int bytes, port, in;

I reconfigured my test script to output serial to /dev/nmdm0A so I would get printfs from bhyve, but nothing.

Our assembly doesn't do what we think it does.

Adding port configuration from this Stack Overflow answer and the OSDev wiki led my modified bhyve to print on calls to vmexit_inout.

$ sudo sh ./run.sh nello
outputting serial to /dev/nmdm0B
waiting for gdb
vmexit_inout:75
vmexit_inout:75
vmexit_inout:75
vmexit_inout:75
vmexit_inout:75
vmexit_inout:75
vmexit_inout:75
vmexit_inout:75
vmexit_inout:75
vmexit_inout:75

Those 10 vmexit_inout lines match up perfectly with the configuration and test example. This is an excellent debugging sign.

With an extreme amount of further faffing I discovered that the loop in the example I started from was not making it to the first print statement. I confirmed this by stripping away all of the configuration and just spitting out some characters explicitly.

In the hexdump nasm was loading the wrong address into bx, but even with the correct address in bx I got no output. As I only wanted to say hello from real mode, I'm done. Debugging segments (segments, not even once) in a pre-bios environment where you can't single step just isn't my idea of fun.
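One possible explanation, offered as a guess rather than something confirmed here: mov al, [bx] reads through DS, and at reset DS and CS point at very different places. The sketch below assumes bhyve follows the architectural x86 reset state (CS base 0xFFFF0000, DS base 0) and does not alias the bootrom into low memory; linear_address is a name made up for illustration.

```python
# Assumption: segment bases follow the architectural x86 reset state.
CS_BASE_AT_RESET = 0xFFFF0000  # code is fetched from the top of 4 GiB
DS_BASE_AT_RESET = 0x00000000  # data loads default to the bottom of memory

MESSAGE_OFFSET = 0x11  # from `bb 11 00` (mov bx, 0x11) in the hexdump


def linear_address(segment_base: int, offset: int) -> int:
    """Linear address reached through a segment base with a 16-bit offset."""
    return (segment_base + (offset & 0xFFFF)) & 0xFFFFFFFF


# Where `mov al, [bx]` would actually read from, via DS.
print(hex(linear_address(DS_BASE_AT_RESET, MESSAGE_OFFSET)))
# Where the message bytes live if the image ends at 4 GiB, as it does in qemu.
print(hex(linear_address(CS_BASE_AT_RESET, MESSAGE_OFFSET)))
```

If that assumption holds, the load comes from guest RAM near address 0, which the gdb dump shows as zeros, so the loop would see a NUL on the first byte and fall straight through to hlt. It would also fit with a program that only uses immediates producing output.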

The example I started from was run from the base address. By writing their own kvm driver they were able to configure the instruction pointer and segments to look sensible. I, an idiot, decided to work with the brain melting x86 hardware as it is.

Most of my fighting here was because gdb connecting to the bhyve stub isn't able to read guest memory in the bios region. Neither qemu nor bhyve let me single step instructions, which just makes debugging here tedious.

OS Dev wiki is a great resource, but it is very annoying to have lots of "you shouldn't do this" everywhere when you stray from their 'perfect path'. I just want to know what I need to know.

If you want to play with real mode in bhyve you can start from this minimal, working example:

; A 64k bios for bhyve which does nothing at all
bits 16
PORT equ 0x3f8

%macro outb 1
        mov al, %1
        out dx, al
%endmacro

start:
        mov dx, PORT                    ; store the port

        outb 0x0a                       ; print a message
        outb 'b'
        outb 'h'
        outb 'y'
        outb 'v'
        outb 'e'
        outb '!'
        outb 0x0a
end:
        hlt                             ; hang around

TIMES 0xFFF0 - ($ - $$) db 0            ; pad out to reset vector
; cpu is going to start from 0xFFF0, with CS set to 0xF000. Basically we are
; going to start at 0xFFFFFFF0, with only 16 bytes to play with, but we can
; jump to the start of the 64k segment reasonably easily.
jmp start

TIMES 0x10000 - ($ - $$) db 0           ; pad out to 64k
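The layout of that image is simple enough to build by hand, which can help when checking what nasm produced. A minimal sketch in Python that lays out the same instructions, assuming the standard 16-bit encodings (BA for mov dx, imm16; B0 for mov al, imm8; EE for out dx, al; F4 for hlt; E9 for jmp rel16). build_rom is a name made up for illustration, and the output is not claimed to be byte-identical to the nasm build.

```python
PORT = 0x3F8           # COM1 data port, as in the assembly
ROM_SIZE = 0x10000     # 64k image
RESET_OFFSET = 0xFFF0  # the CPU starts 16 bytes before the end of the image


def build_rom(message: bytes) -> bytes:
    """Lay out a 64k image that writes message to PORT and then halts."""
    code = bytearray()
    code += b"\xBA" + PORT.to_bytes(2, "little")  # mov dx, PORT
    for ch in message:
        code += bytes([0xB0, ch])                 # mov al, <character>
        code += b"\xEE"                           # out dx, al
    code += b"\xF4"                               # hlt
    if len(code) > RESET_OFFSET:
        raise ValueError("program runs into the reset vector")

    rom = bytearray(ROM_SIZE)                     # zero padding everywhere else
    rom[:len(code)] = code
    # jmp rel16 back to offset 0; the displacement is relative to the
    # instruction after the three-byte jump and wraps at 64k.
    disp = (0 - (RESET_OFFSET + 3)) & 0xFFFF
    rom[RESET_OFFSET:RESET_OFFSET + 3] = b"\xE9" + disp.to_bytes(2, "little")
    return bytes(rom)


rom = build_rom(b"\nbhyve!\n")
print(len(rom), rom[RESET_OFFSET:RESET_OFFSET + 3].hex())
```

Writing the result out with open("nello", "wb") gives a file that can be compared against the nasm output using hexdump -C.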

Hopefully that ending isn't too negative. I had a lot of fun doing this, I just don't want to do any more of it.