When developing Linux kernel code, I often want a test fixture inside the firmware that lets me inspect the values passed into and out of the kernel. I am currently writing one such fixture in QEMU, and I have an interrupt that the Linux kernel is not handling, I think because it is not being delivered.
I have found it quite valuable to run this QEMU process under the GNU Debugger (GDB). Here is how I (with help) got to the bottom of the mystery.
One prep step is to quiet some of GDB's reporting. When GDB starts, it offers to download debug info via debuginfod, but I do not need or want that. By default, GDB stops on every SIGUSR1, and there are too many of them. GDB also prints a message each time a thread starts or exits, and I don't care about that. Add the following lines to ~/.gdbinit:
set debuginfod enabled off
handle SIGUSR1 noprint nostop
set print thread-events off
(Or you can type these into the gdb command prompt.)
Here is how I am running the VM. Note that the first line points to a version of QEMU that I built myself.
gdb --args ../qemu/build/qemu-system-aarch64 \
-machine virt \
-enable-kvm \
-m 16G \
-cpu host \
-smp 16 \
-nographic \
-bios /usr/share/edk2/aarch64/QEMU_EFI.fd \
-drive if=none,file=../virt/my_vm.qcow2,id=hd0 \
-device virtio-blk-device,drive=hd0,bootindex=0 \
-drive file=../virt/Fedora_Server_dvd_aarch64_42_1.1.iso,id=cdrom,if=none,media=cdrom \
-object memory-backend-file,id=mem,size=16G,mem-path=/dev/shm,share=on \
-numa node,memdev=mem \
-chardev socket,id=char0,path=/tmp/virtiofs_socket \
-virtfs local,path=/root/adam/linux,mount_tag=mylinux,security_model=passthrough,id=fs0 \
-device virtio-scsi-device \
2>&1 | tee /tmp/qemu.log
While there is a -gdb flag that you can include on the QEMU command line, I found it did not work for me (that flag exposes a stub for debugging the guest, not the QEMU process itself). Additionally, I may move the gdb --args prefix into an environment variable and use that to switch debugging on or off.
The --args flag tells gdb to pass the remaining command-line arguments to the program when it is run. Thus, once we are at the gdb command prompt, we can set a breakpoint like this:
break pcc_timer_callback
And then simply call run without any parameters.
Since this VM is launching the Linux kernel, there will be points in the process where the guest console accepts input. For example, in GRUB, you can hit Return to skip the countdown timer and boot the selected kernel, or change the selection if you want. For my workflow, I need to log in on the console and then run a test script. It is this test script that triggers the breakpoint I set above.
One benefit of gdb is that it shows which functions are actually assigned to function pointers. For example, in the raise_irq call chain, there is a call to
irq->handler(irq->opaque, irq->n, level);
And stepping through, I can see that it steps into
kvm_arm_gic_set_irq(s->num_irq, irq, level);
And thus I can inspect the irq number:
(gdb) print irq
$1 = 80
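To make the function-pointer point concrete, here is a minimal C sketch of the same dispatch pattern. It is not QEMU's code: the names demo_irq, demo_set_irq, and demo_gic_set_irq are hypothetical stand-ins, and only the shape of the handler(opaque, n, level) call is taken from the line quoted above. Reading the source alone does not tell you which function handler refers to; in gdb, stepping into the call or running print irq->handler shows the symbol it resolves to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an IRQ line object. The field names mirror
 * the call quoted above; this is not QEMU's actual definition. */
typedef void (*demo_irq_handler)(void *opaque, int n, int level);

struct demo_irq {
    demo_irq_handler handler; /* assigned at runtime by whoever wires the line */
    void *opaque;             /* device state passed back to the handler */
    int n;                    /* line number as the board numbers it */
};

/* Records what the handler received so the result can be inspected. */
static int last_n = -1;
static int last_level = -1;

/* Hypothetical handler, playing the role kvm_arm_gic_set_irq played above. */
static void demo_gic_set_irq(void *opaque, int n, int level)
{
    (void)opaque;
    last_n = n;
    last_level = level;
}

/* Dispatches through the pointer, like the irq->handler(...) call above. */
static void demo_set_irq(struct demo_irq *irq, int level)
{
    assert(irq != NULL && irq->handler != NULL);
    irq->handler(irq->opaque, irq->n, level);
}
```

Wiring a line with struct demo_irq irq = { demo_gic_set_irq, NULL, 80 }; and then calling demo_set_irq(&irq, 1) leaves last_n at 80 and last_level at 1, which is the same information the gdb session surfaced from the real call chain.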
This IS the number I assigned. However, later on I see this code executed (hw/intc/arm_gic_kvm.c, starting at line 57):
if (irq < (num_irq - GIC_INTERNAL)) {
    /* External interrupt. The kernel numbers these like the GIC
     * hardware, with external interrupt IDs starting after the
     * internal ones.
     */
    irqtype = KVM_ARM_IRQ_TYPE_SPI;
    cpu = 0;
    irq += GIC_INTERNAL;
}
At first I didn’t think much of it. Later on, though, a coworker and I started looking inside the Linux guest at /proc/interrupts, and I saw these lines:
12: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 80 Level pcc-mbox
13: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 81 Level pcc-mbox
14: 37 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 GICv3 33 Level uart-pl011
So the interrupt handlers are registered, but no interrupts have been delivered. The 80 and 81 are the interrupt numbers. My coworker suggested I look at the next line. The UART has an interrupt number of 33, but inside the QEMU code, I see this:
static const int a15irqmap[] = {
[VIRT_UART0] = 1,
And looking at the code that creates that specific UART:
int irq = vms->irqmap[uart];
...
qemu_fdt_setprop_cells(ms->fdt, nodename, "interrupts",
GIC_FDT_IRQ_TYPE_SPI, irq,
GIC_FDT_IRQ_FLAGS_LEVEL_HI);
The UART is registered as interrupt 33 inside the Linux kernel, but as interrupt 1 inside QEMU. Let’s go look at the value of GIC_INTERNAL:
#define GIC_INTERNAL 32
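The arithmetic can be written out as a small C sketch. The helper names spi_index_to_intid and intid_to_spi_index are hypothetical and not part of QEMU; only the value 32 and the UART numbers come from the text above. Assuming my device follows the same numbering as the UART, the line I raise as 80 inside QEMU would reach the kernel as 112, while the kernel was told to listen on 80.

```c
#include <assert.h>

#define GIC_INTERNAL 32 /* value quoted above from the QEMU source */

/* Hypothetical helper: convert a QEMU board-level interrupt index
 * (the value stored in the irqmap) into the interrupt ID the guest
 * kernel sees. */
static int spi_index_to_intid(int spi_index)
{
    assert(spi_index >= 0);
    return spi_index + GIC_INTERNAL;
}

/* Hypothetical inverse: the QEMU-side index that corresponds to an
 * interrupt ID shown in the guest's /proc/interrupts. */
static int intid_to_spi_index(int intid)
{
    assert(intid >= GIC_INTERNAL);
    return intid - GIC_INTERNAL;
}
```

With these helpers, spi_index_to_intid(1) gives 33, matching the uart-pl011 row, and spi_index_to_intid(80) gives 112. Going the other way, intid_to_spi_index(80) gives 48, so the pcc-mbox handler registered on 80 is waiting on a line that QEMU would call 48, not the one my fixture raises.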
What happens if we add 32 to the interrupt value in the code that reports the interrupt ID to the kernel? My test runs. I can’t look at the interrupt delivery right now, as I have an infinite loop, which is no surprise since this code is still under development.