A PCC driver in Qemu

In order to perform test driven development, you need a way to drive your code that can isolate behavior. Linux Kernel drivers that communicate with hardware devices can be hard to test: you might not have access to the hardware from your test systems, or the hardware may be flaky. I have exactly this set of issues with the Platform Communication Channel (PCC) drivers I am working with.

My primary work has been with a network driver that only exists on the newest hardware. However, I also need to be able to handle some drivers that would only work against old hardware. There are also PCC based drivers for hardware that my company does not support or have access to. I might want to write a test to ensure that changes to the Linux Kernel PCC driver do not change its behavior against these drivers. No system exists where all of these drivers would be supported. But I can build one with Qemu.

The Qemu based driver might not completely simulate the hardware exactly as implemented, and that is OK: I want to be able to do things with Qemu I cannot do with current hardware. For example, the MCTP-over-PCC driver should be able to handle a wide array of messages, but the hardware I have access to only supports a very limited subset of message types.

The full code for the device is here.

Here is how I went about building a Qemu based PCC driver.

Qemu Device Basics

I want this code to run on Aarch64 (ARM64) natively. That means that I run the machine specified in hw/arm/virt.c. Thus, the first line of my run script is:

../qemu/build/qemu-system-aarch64 -machine virt \

The device itself lives in hw/arm/pcc.c. It was originally called mctp-pcc.c, but I soon realized that there was no reason to make it MCTP specific. While the code is testing type 3/4 devices, I suspect it would work fine for a type 2 or other driver with a minimum of changes.

Every device has to hang off a bus. I started by creating a device the way this article suggests, off the system bus: SysBusDevice parent_obj; This differs from some of the other examples out there, where you create, say, a PCIe device. There is a way to dynamically load PCIe devices, but you cannot dynamically load SysBus devices, at least not in the default AArch64 Qemu virt machine. Thus, I have to modify the virt.c code to add my device.

ACPI Tables

I had to generate two new ACPI tables: a Secondary System Description Table (SSDT) and a Platform Communications Channel Table (PCCT). These tables are generated from a call in virt.c to create_pcc_devices. This function probably should be moved to a PCC specific file so it could potentially be shared by other virtual machine types, but for now it lives in virt.c as well. It is also hard coded to build only the one device. This is obviously not going to scale. I will talk about how to improve this at the end of the article.

The bulk of the code in the driver is for generating the entries for the PCCT. The data in the PCCT has the address of the shared memory registers and data buffer, and the IRQ ID used to communicate between the OS and the platform. The information is stored in a structure called PcctExtMemSubtable, which is then written to the PCCT using ACPI primitives. This structure is filled in during the device realize function, mctp_pcc_realize.
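To give a sense of how much data one of these entries carries, here is a sketch of the extended PCC subspace (type 3/4) layout as a packed C struct. The field list is my reconstruction of the ACPI specification's table and should be checked against the spec before use. The type names Gas and ExtPccSubspace are made up for this example, and this is not the PcctExtMemSubtable from the device code, which may be organized differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ACPI Generic Address Structure: 12 bytes.
 * __attribute__((packed)) is a GCC/Clang extension. */
typedef struct __attribute__((packed)) {
    uint8_t  space_id;
    uint8_t  bit_width;
    uint8_t  bit_offset;
    uint8_t  access_width;
    uint64_t address;
} Gas;

/* Hypothetical name; layout reconstructed from the ACPI extended
 * PCC subspace table (types 3 and 4). */
typedef struct __attribute__((packed)) {
    uint8_t  type;                     /* 3 or 4 */
    uint8_t  length;                   /* 164 for this subspace type */
    uint32_t platform_interrupt;       /* the IRQ ID */
    uint8_t  platform_interrupt_flags;
    uint8_t  reserved1;
    uint64_t base_address;             /* shared memory region */
    uint32_t mem_length;
    Gas      doorbell_register;
    uint64_t doorbell_preserve;
    uint64_t doorbell_write;
    uint32_t nominal_latency;
    uint32_t max_periodic_access_rate;
    uint32_t min_request_turnaround_time;
    Gas      platform_ack_register;
    uint64_t platform_ack_preserve;
    uint64_t platform_ack_set;
    uint64_t reserved2;
    Gas      cmd_complete_register;
    uint64_t cmd_complete_mask;
    Gas      cmd_update_register;
    uint64_t cmd_update_preserve;
    uint64_t cmd_update_set;
    Gas      error_status_register;
    uint64_t error_status_mask;
} ExtPccSubspace;

/* Compile-time checks that the packed layout has the expected sizes. */
_Static_assert(sizeof(Gas) == 12, "GAS must be 12 bytes");
_Static_assert(sizeof(ExtPccSubspace) == 164, "subspace must be 164 bytes");
```

The static assertions are the useful part: if a field is added or reordered by mistake, the build fails instead of the guest silently misparsing the table.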

The SSDT is a bit more free form, and does not have a structure to support it, but probably should. Right now I am just writing the direct primitives for the entry.

Memory Mapped IO

Both the outbox and inbox channels are mapped to a single, contiguous block of memory. When reads or writes happen, Qemu forwards them to custom functions. I can then use the memory offset to identify whether the access is to a register or to the shared buffer. One of these memory offsets is the doorbell, and is used to implement the IRQ processing.
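The offset-based dispatch can be sketched in a few lines of plain C. Qemu's read and write callbacks receive an offset relative to the start of the mapped region, so the decision reduces to comparing that offset against the layout. The sizes and the doorbell position below are assumptions chosen for illustration, not the values used in pcc.c, and pcc_classify_offset is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of one channel inside the MMIO window.
 * These numbers are illustrative assumptions only. */
#define PCC_DOORBELL_OFF  0x00u   /* assumed: doorbell is the first register */
#define PCC_REG_BYTES     0x40u   /* assumed: size of the register block */
#define PCC_CHANNEL_BYTES 0x4000u /* assumed: registers plus shared buffer */

typedef enum {
    PCC_REGION_DOORBELL,
    PCC_REGION_REGISTER,
    PCC_REGION_BUFFER,
    PCC_REGION_INVALID
} PccRegion;

/* Classify an offset, relative to the start of the channel. */
static PccRegion pcc_classify_offset(uint64_t offset)
{
    if (offset >= PCC_CHANNEL_BYTES) {
        return PCC_REGION_INVALID;
    }
    if (offset == PCC_DOORBELL_OFF) {
        return PCC_REGION_DOORBELL;   /* a write here raises the IRQ path */
    }
    if (offset < PCC_REG_BYTES) {
        return PCC_REGION_REGISTER;
    }
    return PCC_REGION_BUFFER;         /* everything else is message data */
}
```

A write handler would call this first and then either update a register, copy bytes into the buffer, or start the doorbell processing.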

Each machine type in Qemu has a memory map table. In virt.c it is called

static const MemMapEntry base_memmap[] = {...

I found a space in the middle of the table that was unclaimed and used it for both of the channels of the PCCT. The code looks like this:

    [VIRT_MMIO] =               { 0x0a000000, 0x00000200 },
    /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
    [VIRT_PCC] =                { 0x0a008000, 0x00008000 },
    [VIRT_PLATFORM_BUS] =       { 0x0c000000, 0x02000000 },

NUM_VIRTIO_TRANSPORTS is set to 32 (0x20). Multiplied by 0x200 that is 0x4000 bytes, so the VIRT_MMIO transports end at 0x0a004000, which leaves plenty of room before VIRT_PCC starts at 0x0a008000. There is also enough room between the end of VIRT_PCC and VIRT_PLATFORM_BUS for multiple PCC entries.
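The arithmetic is easy to get wrong by a digit, so here is a small check written as plain C. The base and size values are the ones from the table excerpt; the Region type and the helper functions are hypothetical names that only mirror the two fields of a MemMapEntry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for MemMapEntry: a base address and a size. */
typedef struct {
    uint64_t base;
    uint64_t size;
} Region;

#define NUM_VIRTIO_TRANSPORTS 32

/* First address past the end of the region. */
static uint64_t region_end(Region r)
{
    return r.base + r.size;
}

/* Two half-open ranges overlap when each starts before the other ends. */
static bool regions_overlap(Region a, Region b)
{
    return a.base < region_end(b) && b.base < region_end(a);
}

/* All virtio-mmio transports together: 32 slots of 0x200 bytes. */
static const Region virt_mmio_all     = { 0x0a000000, 0x200 * NUM_VIRTIO_TRANSPORTS };
static const Region virt_pcc          = { 0x0a008000, 0x00008000 };
static const Region virt_platform_bus = { 0x0c000000, 0x02000000 };
```

With these values region_end(virt_mmio_all) is 0x0a004000 and region_end(virt_pcc) is 0x0a010000, and neither pair of neighbours overlaps.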

Mapping IRQs

Just as the machine has a mapping for memory mapped IO, the machine has a table for IRQs. For virt.c this table is defined as

static const int a15irqmap[] = {

I added the PCC IRQs in like this:

    [VIRT_SMMU] = 74,    /* ...to 74 + NUM_SMMU_IRQS - 1 */
    [VIRT_PCC] = 80,     /* and 81 */
    [VIRT_PLATFORM_BUS] = 112, /* ...to 112 + PLATFORM_BUS_NUM_IRQS -1 */

Since NUM_SMMU_IRQS is defined as 4, we have enough room for 2 IRQs at 80.

The ARM64 virtual machine uses a GIC. It has an internal offset, so ID 1 inside Qemu becomes IRQ 33 inside the Linux virtual machine. Thus the actual mapping takes place inside create_pcc_devices:

    qemu_irq out_q_irq = qdev_get_gpio_in(vms->gic, outbox_irq);
    sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, out_q_irq);
    qemu_irq in_q_irq = qdev_get_gpio_in(vms->gic, inbox_irq);
    sysbus_connect_irq(SYS_BUS_DEVICE(dev), 1, in_q_irq);
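The offset mentioned above comes from the GIC numbering: interrupt IDs 0 through 31 are reserved for per-CPU interrupts, so shared peripheral interrupts (SPIs) start at 32. A minimal sketch of the conversion follows; GIC_SPI_BASE and spi_to_intid are names invented for this example, and I am assuming that the value the guest sees is the GIC interrupt ID.

```c
#include <assert.h>
#include <stdint.h>

/* Interrupt IDs 0..31 are SGIs and PPIs; SPIs start at 32.
 * GIC_SPI_BASE is a hypothetical name for this example. */
#define GIC_SPI_BASE 32u

/* Convert an index from the machine's IRQ map into the
 * interrupt ID seen on the guest side of the GIC. */
static uint32_t spi_to_intid(uint32_t spi)
{
    return spi + GIC_SPI_BASE;
}
```

With the table entries above, the two PCC interrupts at 80 and 81 would appear to the guest as 112 and 113.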

The outbox is designed to be triggered from the OS, and the device then triggers the OS back once the message has been processed. The inbox is for sending messages to the OS.

One thing that is not well done yet is that these numbers are not then communicated to the device: right now we use magic constants to keep them in sync. This is something to improve in the future.

Flattened Device Tree

Qemu has a standard way to represent all hardware devices. Even though ACPI can play this role in a physical machine, Qemu goes with the more uniform Flattened Device Tree (FDT). Thus, for each device we create, we need to create an FDT entry. This includes knowing about the interrupts assigned.

    qemu_fdt_add_subnode(ms->fdt, nodename);
    qemu_fdt_setprop_string(ms->fdt, nodename, "compatible", TYPE_MCTP_PCC);
    qemu_fdt_setprop_sized_cells(ms->fdt, nodename, "reg", 2, base, 2, size);
    qemu_fdt_setprop_cells(ms->fdt, nodename, "interrupts",
                           GIC_FDT_IRQ_TYPE_SPI, outbox_irq, GIC_FDT_IRQ_FLAGS_LEVEL_HI,
                           GIC_FDT_IRQ_TYPE_SPI,  inbox_irq, GIC_FDT_IRQ_FLAGS_LEVEL_HI);

Communication outside Qemu

When an interrupt comes from the OS to Qemu, I copy the contents of the shared buffer to a file in /tmp/pcc/outbox. I have written a program called PCCD which runs as an external process. PCCD uses Inotify to identify that a new file has been written and closed, and will then process the file. PCCD responds by posting a message to an inbox directory. Qemu also uses Inotify to identify that there is a new message, and stores it in the shared buffer. It then triggers an IRQ in the OS which tells the OS that there is a message to read. All file names are generated from timestamps.
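The "written and closed" part matters: watching for IN_CLOSE_WRITE rather than file creation means the reader never sees a half-written message. Here is a minimal sketch of that watch loop using the Linux inotify API. PccWatch, pcc_watch_open, and pcc_watch_next are hypothetical names, and neither PCCD nor the Qemu device is necessarily structured this way.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/inotify.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical watcher state: the inotify fd plus a buffer of
 * events that have been read but not yet handed out.
 * The aligned attribute is a GCC/Clang extension. */
typedef struct {
    int fd;
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    ssize_t len;   /* bytes of valid event data in buf */
    ssize_t pos;   /* offset of the next unread event */
} PccWatch;

/* Create the directory if needed and watch it for files that were
 * opened for writing and then closed. Returns 0 on success. */
static int pcc_watch_open(PccWatch *w, const char *dir)
{
    w->len = w->pos = 0;
    w->fd = -1;
    if (mkdir(dir, 0700) < 0 && errno != EEXIST) {
        return -1;
    }
    w->fd = inotify_init();
    if (w->fd < 0) {
        return -1;
    }
    if (inotify_add_watch(w->fd, dir, IN_CLOSE_WRITE) < 0) {
        close(w->fd);
        w->fd = -1;
        return -1;
    }
    return 0;
}

/* Block until a file has been written and closed, then copy its
 * name (relative to the watched directory) into name. */
static int pcc_watch_next(PccWatch *w, char *name, size_t name_len)
{
    for (;;) {
        if (w->pos >= w->len) {            /* buffer drained: read more */
            w->len = read(w->fd, w->buf, sizeof(w->buf));
            w->pos = 0;
            if (w->len <= 0) {
                w->len = 0;
                return -1;
            }
        }
        const struct inotify_event *ev =
            (const struct inotify_event *)(w->buf + w->pos);
        w->pos += (ssize_t)(sizeof(*ev) + ev->len);
        if ((ev->mask & IN_CLOSE_WRITE) && ev->len > 0) {
            snprintf(name, name_len, "%s", ev->name);
            return 0;
        }
    }
}
```

The caller would join the returned name with the directory path, read the file into the shared buffer, and then raise the IRQ.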

Testing the system

I was able to reuse a shell script I had written for the MCTP over PCC driver to send messages to the Kernel. I copied this inside the VM. This is essentially the same test I use against the physical hardware implementation. However, now I can extend it to send messages that the hardware does not implement. To do this, I can implement the messages in PCCD.

Future Improvements

The PCCT itself could be thought of as a type of bus. It may make sense to create a new Bus Type to support it and the devices that hang off it. That would allow a way to scope in PCC specific behavior.

There is a mechanism to create DSDT entries for ACPI device interfaces. It loops through all the devices on a Bus and checks to see if the device implements the AcpiDeviceIf interface. If it does, it adds a couple functions to the device. While our devices are ACPI devices, we do not need those functions. Instead, we can take the pattern and create a PCC interface that allows the device to define its own values.

struct AcpiDevPccIfClass {
    /* <private> */
    InterfaceClass parent_class;

    /* <public> */
    dev_pcc_fn build_dev_pcc;
    dev_ssd_fn build_dev_ssd;
    dev_pcct_ext_mem_subtable_fn get_ext_mem_subtable;
};

This interface could be hung off of the SystemBus, but then we need to enumerate each SystemBusDevice to see if it has this interface.
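The enumeration pattern can be modelled in plain C to show its shape. This is only a model: real Qemu code would use QOM interfaces and a dynamic cast such as object_dynamic_cast rather than a nullable pointer, and every name below (Device, PccIf, PccEntry, collect_pcc_entries) is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The values a PCC device would contribute to the PCCT. */
typedef struct {
    uint64_t base;
    uint32_t irq;
} PccEntry;

typedef struct Device Device;

/* Hypothetical stand-in for the proposed PCC interface. */
typedef struct {
    void (*get_entry)(const Device *dev, PccEntry *out);
} PccIf;

struct Device {
    const char *name;
    const PccIf *pcc_if;   /* NULL when the device is not a PCC device */
    PccEntry cfg;
};

/* Example implementation: report the device's own configuration. */
static void demo_get_entry(const Device *dev, PccEntry *out)
{
    *out = dev->cfg;
}

static const PccIf demo_pcc_if = { demo_get_entry };

/* Walk every device on the bus and collect an entry from each one
 * that implements the interface. Returns the number collected. */
static size_t collect_pcc_entries(const Device *devs, size_t n_devs,
                                  PccEntry *out, size_t max_out)
{
    size_t found = 0;
    for (size_t i = 0; i < n_devs && found < max_out; i++) {
        if (devs[i].pcc_if == NULL) {
            continue;              /* skip devices without the interface */
        }
        devs[i].pcc_if->get_entry(&devs[i], &out[found]);
        found++;
    }
    return found;
}
```

The cost the paragraph above mentions is visible in the loop: every device on the bus gets visited, even though only a few implement the interface.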

Both options seem viable.

The benefit to going with the PCCBus is we should be able to then make the devices loadable at run time via command line parameters. To do that with SystemBus would require a change to virt.c that might not be acceptable.

And I need a struct for the SSDT.
