Last post I showed how to do multiplication for a vector of integers using ARM64 instructions. Lots of use cases require these kinds of operations to be performed in bulk. The Neon coprocessor has instructions that allow for the parallel loading and multiplication of numbers. Here’s my simplistic test of these instructions.
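As a sketch of that kind of bulk operation, the C below uses the Neon intrinsics from `arm_neon.h` to load, multiply, and store four 32-bit integers at a time. It is not necessarily how the test in the post is written, and the function name `vec_mul_i32` is hypothetical. On builds where `__ARM_NEON` is not defined, it falls back to a plain scalar loop. That loop also handles any leftover elements when the length is not a multiple of four.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* out[i] = a[i] * b[i] for n elements. */
void vec_mul_i32(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* A 128-bit q register holds four 32-bit lanes: load 4, multiply 4, store 4. */
    for (; i + 4 <= n; i += 4) {
        int32x4_t va = vld1q_s32(a + i);
        int32x4_t vb = vld1q_s32(b + i);
        vst1q_s32(out + i, vmulq_s32(va, vb));
    }
#endif
    /* Scalar tail on Neon builds; the whole array everywhere else. */
    for (; i < n; i++)
        out[i] = a[i] * b[i];
}
```

On an arm64 machine, compiling with `gcc -O2 -c` and running `objdump -d` should show 128-bit vector loads and stores around a `mul` operating on `.4s` lanes. The exact instruction selection is up to the compiler.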
Category Archives: Software
Vector Multiplication in ARM64 assembly
Let’s start with the basics: we can multiply the cells of two vectors in C code and disassemble the resulting binary. This is a trivial operation: each cell in the first vector is multiplied by the corresponding cell in the second vector. It is not a cross product. This will be a naive implementation, but it should get us started.
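A naive version of that C might look like the following. `multiply_vectors` is a hypothetical name, not necessarily the code from the post.

```c
#include <stddef.h>

/* Element-wise product: out[i] = a[i] * b[i].
 * Not a cross product; each cell is multiplied only by its partner. */
void multiply_vectors(const int *a, const int *b, int *out, size_t len)
{
    for (size_t i = 0; i < len; i++)
        out[i] = a[i] * b[i];
}
```

Compiling with something like `gcc -O1 -c mul.c` and running `objdump -d mul.o` shows the generated loop. At higher optimization levels GCC may auto-vectorize the loop, which hides the simple scalar pattern.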
Agile is not agile
Based on previous experiences, I would be hard pressed to join a team that has Scrum masters.
I expect that my previous experience was extreme, but not outside expectations for an organization that felt the need to officially embrace Agile(tm) and the concept of agile practitioners as part of the development process. It has been a bit over a year, and my frustration at the process has had a bit of time to settle, but I still feel my virtual hackles rise up when I think about the process.
perf option to test branch records
perf record --branch-filter any,save_type,u true |
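The same request can also be expressed through the `perf_event_open(2)` interface that `perf` is built on. The sketch below only fills in a `struct perf_event_attr`. The mapping of `any`, `save_type`, and `u` onto `PERF_SAMPLE_BRANCH_ANY`, `PERF_SAMPLE_BRANCH_TYPE_SAVE`, and `PERF_SAMPLE_BRANCH_USER` follows the perf tool's branch-filter names. The cycles event, the sample period, and the helper name `setup_branch_sampling` are assumptions of this example. Actually opening the event would still need a `syscall(SYS_perf_event_open, ...)` call on hardware that supports branch records.

```c
#include <string.h>
#include <linux/perf_event.h>

/* Fill attr to sample CPU cycles with a branch stack attached, roughly
 * mirroring `perf record --branch-filter any,save_type,u`. This only builds
 * the request; it does not open the event. */
void setup_branch_sampling(struct perf_event_attr *attr)
{
    memset(attr, 0, sizeof *attr);
    attr->size = sizeof *attr;
    attr->type = PERF_TYPE_HARDWARE;
    attr->config = PERF_COUNT_HW_CPU_CYCLES;   /* assumed sampling event */
    attr->sample_period = 100000;              /* hypothetical period */
    attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_BRANCH_STACK;

    /* The three --branch-filter terms from the command line. */
    attr->branch_sample_type = PERF_SAMPLE_BRANCH_ANY        /* any       */
                             | PERF_SAMPLE_BRANCH_TYPE_SAVE  /* save_type */
                             | PERF_SAMPLE_BRANCH_USER;      /* u         */

    attr->exclude_kernel = 1;  /* this sketch also keeps the event itself in user space */
    attr->exclude_hv = 1;
    attr->disabled = 1;        /* start disabled; enable later via ioctl */
}
```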
The Overtone Scale?
If you look at the overtone series, specifically the 8th through the 16th harmonics, you’ll notice that they fit into an octave and almost, but not quite, make up a major scale. I’d like to look a little closer at that difference.
If we start on a C, the notes of the series are (almost):
C D E F# G Ab Bb B C
I say almost because they are not actually these notes…these are merely the closest approximations to the notes if you are playing a well-tempered clavier or its descendants. If you are playing something a little more flexible, such as a fretless string instrument like the violin, you can adjust your playing and play exactly these pitches.
Poking at Performance Events from User land
Linux has a set of events you can query to look at the performance of… well, lots of things. It’s a generic mechanism. Here’s a quick peek at the set of values I can see on an Altra Max running Fedora 36.
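One way to browse those events yourself is through sysfs. Each PMU the kernel registers appears under `/sys/bus/event_source/devices/`, and many of them carry an `events/` directory of named events. The walk below is a minimal sketch of that approach, not necessarily how the listing in the post was produced, and the helper names are hypothetical. `perf list` presents similar information.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Print each entry of dir_path except "." and "..".
 * Returns the number printed, or -1 if the directory can't be opened. */
int list_dir_entries(const char *dir_path)
{
    DIR *dir = opendir(dir_path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        printf("  %s\n", entry->d_name);
        count++;
    }
    closedir(dir);
    return count;
}

/* For every PMU under root (normally /sys/bus/event_source/devices),
 * print its name and the named events in its events/ subdirectory.
 * Returns the total number of events printed, or -1 if root is missing. */
int list_pmu_events(const char *root)
{
    DIR *dir = opendir(root);
    if (dir == NULL)
        return -1;

    int total = 0;
    char path[4096];
    struct dirent *pmu;
    while ((pmu = readdir(dir)) != NULL) {
        if (pmu->d_name[0] == '.')
            continue;
        snprintf(path, sizeof path, "%s/%s/events", root, pmu->d_name);
        printf("%s:\n", pmu->d_name);
        int n = list_dir_entries(path);
        if (n > 0)          /* PMUs without an events/ directory give -1 */
            total += n;
    }
    closedir(dir);
    return total;
}
```

Calling `list_pmu_events("/sys/bus/event_source/devices")` prints every PMU followed by its events. Each file in an `events/` directory holds the encoding that `perf` uses for that event.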
Apply Linux Kernel Patches from LKML
Linux kernel work can call for you to test out a patch set that someone has posted to the Linux Kernel Mailing List (LKML). If the patch set is long enough, you want to apply it all together rather than download each patch individually. I recently worked through this, and here’s how I got things to work.
PCIe CXL investigation
I’ve been looking into PCIe+CXL. These are my notes.
Starting CPUs on ARM64
The systems I am working with have 80 or more cores. I’ve recently had to investigate the process of starting up those cores. Here are my notes.
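A quick user-space check that complements this is to compare the number of CPUs the system is configured with against the number the kernel currently has online. This only reports the end result of start-up and says nothing about the boot protocol itself. The helper names are hypothetical. `_SC_NPROCESSORS_CONF` and `_SC_NPROCESSORS_ONLN` are widely supported extensions to `sysconf()` rather than POSIX.

```c
#include <stdio.h>
#include <unistd.h>

/* CPUs the system is configured with (sysconf extension, not POSIX). */
long cpus_configured(void)
{
    return sysconf(_SC_NPROCESSORS_CONF);
}

/* CPUs the kernel reports as currently online. */
long cpus_online(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Print both counts so a core that failed to come up stands out. */
void report_cpu_counts(void)
{
    printf("%ld of %ld configured CPUs are online\n",
           cpus_online(), cpus_configured());
}
```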
Enabling ARM64 CPU Capabilities in the Linux Kernel
The ARM64 architecture defines features long before there is a CPU that implements them. Since the ARM ecosystem is so varied, there are many different CPU designs out there with different capabilities. A general-purpose Linux kernel build put out by a major distribution has to work across a wide array of chips from a large number of vendors. Thus, the kernel contains an enumeration of the capabilities and a mechanism for describing how to probe each of them.
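From user space, the result of that probing is visible through the hardware-capability word that the kernel passes in the auxiliary vector. The sketch below reads it with `getauxval(AT_HWCAP)` and checks a handful of arm64 `HWCAP_*` bits from `<asm/hwcap.h>`. The selection of features and the function names are arbitrary choices for illustration. The kernel's internal capability table is a separate, richer structure that this does not show. On non-arm64 builds, the feature check is compiled out because the meaning of the bits is architecture specific.

```c
#include <stdio.h>
#include <sys/auxv.h>       /* getauxval(), AT_HWCAP */

#if defined(__aarch64__)
#include <asm/hwcap.h>      /* arm64 HWCAP_* bit definitions */
#endif

/* The AT_HWCAP word the kernel exported to this process. */
unsigned long read_hwcaps(void)
{
    return getauxval(AT_HWCAP);
}

/* Print a few arm64 feature bits; returns how many were checked. */
int report_arm64_features(unsigned long hwcap)
{
#if defined(__aarch64__)
    struct { const char *name; unsigned long bit; } features[] = {
        { "fp",      HWCAP_FP },
        { "asimd",   HWCAP_ASIMD },
        { "aes",     HWCAP_AES },
        { "atomics", HWCAP_ATOMICS },
        { "sve",     HWCAP_SVE },
    };
    int n = (int)(sizeof features / sizeof features[0]);
    for (int i = 0; i < n; i++)
        printf("%-8s %s\n", features[i].name,
               (hwcap & features[i].bit) ? "yes" : "no");
    return n;
#else
    (void)hwcap;
    printf("not an arm64 build; AT_HWCAP bits differ per architecture\n");
    return 0;
#endif
}
```

On arm64, the same bits also surface as the `Features` line of `/proc/cpuinfo`.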