PCIe CXL investigation

I’ve been looking into PCIe and CXL. These are my notes.

There is a cxl_test module in the Linux tree under tools/testing/cxl/.

There is a cxl command line tool. On Ubuntu and CentOS you install it via the ndctl package. ndctl is the management tool for libnvdimm, the kernel’s non-volatile memory (NVDIMM) subsystem. I think it is needed for the CXL kernel tests, but it is interesting in its own right, too.

When trying to build the cxl_test module from its directory, I got:

make -C ../../.. M=$PWD
/home/ayoung/linux/tools/testing/cxl/config_check.c: In function ‘check’:
././include/linux/compiler_types.h:352:45: error: call to ‘__compiletime_assert_117’ declared with attribute error: BUILD_BUG_ON failed: !IS_MODULE(CONFIG_CXL_BUS)
  352 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)

This means I need to change the CXL_BUS option in the kernel configuration from ‘y’ to ‘m’ so that it is built as a module. The make menuconfig search function shows the output below. Note that PCI Support is the top menu item on the Device Drivers page.

   Symbol: CXL_BUS [=y]
   Type  : tristate
   Defined at drivers/cxl/Kconfig:2
     Prompt: CXL (Compute Express Link) Devices Support
     Depends on: PCI [=y]
       Main menu
         -> Device Drivers
   (1)     -> PCI support (PCI [=y])
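
A quick way to confirm the current setting before rebuilding is to read it straight out of the .config file. The helper below is only a minimal sketch: the function name config_state is made up for this example, and it assumes a standard .config at the path you pass in.

```shell
# Report how a tristate option is set in a kernel .config file.
# Prints "y" or "m" when the option is set, and "n" when it is
# missing or marked "is not set".
# $1: path to the .config file
# $2: option name without the CONFIG_ prefix (for example CXL_BUS)
config_state() {
    state=$(sed -n "s/^CONFIG_$2=\([ym]\)\$/\1/p" "$1" | tail -n 1)
    echo "${state:-n}"
}
```

Running config_state .config CXL_BUS from the top of the kernel tree would print y for the configuration shown above. The kernel tree's scripts/config helper can then flip the option without going through menuconfig: scripts/config --module CXL_BUS, followed by make olddefconfig.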

Once made, there are a bunch of .ko files in the subdir:

$ find . -name \*.ko
$ sudo insmod test/cxl_test.ko 
insmod: ERROR: could not insert module test/cxl_test.ko: Unknown symbol in module
[11204.608668] cxl_test: Unknown symbol cxl_decoder_autoremove (err -2)
[11204.615136] cxl_test: Unknown symbol devm_cxl_add_dport (err -2)
[11204.621236] cxl_test: Unknown symbol is_cxl_memdev (err -2)
[11204.626927] cxl_test: Unknown symbol cxl_decoder_add_locked (err -2)
[11204.633917] cxl_test: Unknown symbol cxl_switch_decoder_alloc (err -2)
[11204.640706] cxl_test: Unknown symbol cxl_endpoint_decoder_alloc (err -2)
[11204.647649] cxl_test: Unknown symbol to_cxl_port (err -2)
[11204.653117] cxl_test: Unknown symbol register_cxl_mock_ops (err -2)
[11204.659676] cxl_test: Unknown symbol unregister_cxl_mock_ops (err -2)
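
To see at a glance which symbols a module is missing, the kernel log lines above can be boiled down to a sorted list of names. This is a rough sketch that assumes the log format shown in this post; the function name unknown_symbols is my own.

```shell
# Read kernel log text on stdin and print the unique symbol names that
# the given module reported as unknown, one per line.
# $1: module name as it appears in the log prefix (for example cxl_test)
unknown_symbols() {
    sed -n "s/.*\] $1: Unknown symbol \([A-Za-z0-9_]*\).*/\1/p" | sort -u
}
```

Typical use would be sudo dmesg | unknown_symbols cxl_test. Each name can then be looked up in the Module.symvers file of the kernel build tree to find the module that exports it, which tells you what has to be built and loaded first.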

The mock module reports the error

[11573.093178] cxl_mock: Unknown symbol nvdimm_bus_register

So I built ../nvdimm using the same approach as above. This symbol is listed in

../nvdimm/nfit.mod.c:105:	{ 0xe9117c1f, "nvdimm_bus_register" },

That brings up the errors

[11907.753694] libnvdimm: Unknown symbol __wrap_devm_memunmap (err -2)
[11907.760070] libnvdimm: Unknown symbol __wrap___release_region (err -2)
[11907.766676] libnvdimm: Unknown symbol __wrap___devm_request_region (err -2)
[11907.773764] libnvdimm: Unknown symbol __wrap_memunmap (err -2)
[11907.779997] libnvdimm: Unknown symbol __wrap___devm_release_region (err -2)
[11907.787085] libnvdimm: Unknown symbol __wrap_memremap (err -2)
[11907.793345] libnvdimm: Unknown symbol __wrap_iounmap (err -2)
[11907.799217] libnvdimm: Unknown symbol __wrap___request_region (err -2)
[11907.806304] libnvdimm: Unknown symbol __wrap_devm_memremap (err -2)

Some guidance from Dan Williams on how to run the test: https://github.com/pmem/ndctl/blob/main/README.md. To build the nvdimm code:

make M=tools/testing/nvdimm
make M=tools/testing/cxl/
sudo make M=tools/testing/nvdimm modules_install

Both of those give:

depmod: WARNING: /lib/modules/5.19.0_ampcxl_+/extra/test/nfit_test.ko needs unknown symbol libnvdimm_test
depmod: WARNING: /lib/modules/5.19.0_ampcxl_+/extra/test/nfit_test.ko needs unknown symbol acpi_nfit_test
depmod: WARNING: /lib/modules/5.19.0_ampcxl_+/extra/test/nfit_test.ko needs unknown symbol pmem_test
depmod: WARNING: /lib/modules/5.19.0_ampcxl_+/extra/test/nfit_test.ko needs unknown symbol device_dax_test
depmod: WARNING: /lib/modules/5.19.0_ampcxl_+/extra/test/nfit_test.ko needs unknown symbol dax_pmem_test

When I try to run the ndctl test:

sudo meson test -C build

The tests are skipped, due to:

libkmod: DEBUG libkmod/libkmod-module.c:202 kmod_module_parse_depline: 1 dependencies for nfit
test/init: ndctl_test_init: nfit.ko: appears to be production version: /lib/modules/5.19.0_ampcxl_+/kernel/drivers/acpi/nfit/nfit.ko
__ndctl_test_skip: explicit skip test_libndctl:2600
nfit_test unavailable skipping tests
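
The log complains that nfit.ko resolves to the copy under kernel/, while the depmod output earlier shows the test builds landing under extra/. A small check along those lines is sketched below. It only looks at the path, so it is an assumption about what distinguishes the two builds, not a copy of the logic ndctl itself uses; the function name module_origin is made up.

```shell
# Classify a module file path by the directory it was installed into.
# Prints "test" for paths under extra/, "production" for paths under
# kernel/, and "unknown" for anything else.
# $1: full path to a .ko file
module_origin() {
    case "$1" in
        */extra/*)  echo "test" ;;
        */kernel/*) echo "production" ;;
        *)          echo "unknown" ;;
    esac
}
```

On a live system the path can come from modinfo, for example module_origin "$(modinfo -n nfit)". For the machine in this post that would report production until the test modules are installed.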

The instructions above showed the way forward: I needed to run modules_install for the modules built for the test (tools/testing/nvdimm and tools/testing/cxl, including explicitly installing the ones in tools/testing/nvdimm/test) before the tests would run. This is clearly stated in the instructions.
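
Put together, the build and install steps for the two top-level test directories can be wrapped up as below. This is a sketch under a few assumptions: the function name and the DRY_RUN switch are mine, the make invocations are the ones used earlier in this post, and the extra install step for tools/testing/nvdimm/test is left out because I have not written down the exact command for it here.

```shell
# Build and install the out-of-tree test modules from a kernel tree.
# $1: path to the top of the kernel source tree
# Set DRY_RUN=1 to print the commands instead of running them.
build_test_modules() {
    run=""
    if [ "${DRY_RUN:-0}" = 1 ]; then
        run="echo"
    fi
    (
        cd "$1" || exit 1
        for dir in tools/testing/nvdimm tools/testing/cxl; do
            $run make M="$dir" || exit 1
            $run sudo make M="$dir" modules_install || exit 1
        done
    )
}
```

Running DRY_RUN=1 build_test_modules ~/linux first is a cheap way to see the four commands before letting them loose with sudo.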

The error in the log file now shows that the code is x86_64 specific: the module nd_e820, which is related to memory management on x86_64 platforms, fails to load. The file ndctl/test/core.c has the following lines:

        if (access("/sys/bus/acpi", F_OK) == 0)
                family = NVDIMM_FAMILY_INTEL;

and then later

                        if (family != NVDIMM_FAMILY_INTEL &&
                            (strcmp(name, "nfit") == 0 ||
                             strcmp(name, "nd_e820") == 0))

However, my machine does have the path /sys/bus/acpi but will not build or load the nd_e820 module. This at least indicates where to start working on the test: adding an appropriate AARCH64 family to the core test framework. I suspect the right thing is to add a check of something like /proc/cpuinfo and look at the manufacturer. Alternatively, I could look at uname -m and see what architecture the kernel is running on, if the solution is less vendor specific than what x86_64 requires. Tasks for future days.

For now, I am just going to hijack this check and have it set family to NVDIMM_FAMILY_AARCH64. With that, the first test passes, and maybe some others; I have not looked that closely yet.
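
The architecture-based alternative could look roughly like the decision below. It is written as shell only to show the logic; the real change would have to be made in C in ndctl/test/core.c. NVDIMM_FAMILY_AARCH64 is the hypothetical name used in this post, not an existing constant, and the function name family_for_machine is also made up.

```shell
# Map a machine string, as printed by `uname -m`, to a family label.
# $1: machine string (for example x86_64 or aarch64)
family_for_machine() {
    case "$1" in
        x86_64)  echo "NVDIMM_FAMILY_INTEL" ;;
        aarch64) echo "NVDIMM_FAMILY_AARCH64" ;;  # hypothetical family name
        *)       echo "unknown" ;;
    esac
}
```

On the running system this would be called as family_for_machine "$(uname -m)". Keying off the architecture rather than /sys/bus/acpi avoids the problem described above, where an AArch64 machine with ACPI gets treated as Intel.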

Next up, I will continue through the tests and see what else I can hammer into place to get them to pass.
