These are my notes for compile time proxies generated from C++. I’m not sure I will be able to understand them in the future, so good luck to you if you feel the need to read them.
Java Dynamic proxies are a well-established means of reducing code by extracting a cross-cutting concern. The C++ philosophy is more “Why put off to runtime that which can be performed at compile time?” How would we get the same kind of flexibility from C++ as we get from Java Dynamic proxies?
First, we would need a handful of helper classes that mimic the introspection API of Java. If we have the simple classes of Method, Field, Parameter, and Class, we can perform much of the logic we need. Refer to the Java reflection API to see roughly what these classes should contain and what they do.
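As a rough idea of what those helper classes might look like, here is a minimal sketch. None of this comes from a real library: the member names, signature(), findMethod() and describeOneFunc() are all made up for illustration, and types are kept as plain strings to stay short. The OneFunc struct it describes is the one that shows up later in these notes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the java.lang.reflect types.
// A code generator would emit one populated Class per C++ struct it finds.
struct Parameter {
    std::string type;  // spelled type, e.g. "int"
    std::string name;
};

struct Field {
    std::string type;
    std::string name;
};

struct Method {
    std::string returnType;
    std::string name;
    std::vector<Parameter> parameters;

    // Renders a declaration-like string such as "int narf()".
    std::string signature() const {
        std::string out = returnType + " " + name + "(";
        for (std::size_t i = 0; i < parameters.size(); ++i) {
            if (i != 0) out += ", ";
            out += parameters[i].type + " " + parameters[i].name;
        }
        return out + ")";
    }
};

struct Class {
    std::string name;
    std::vector<Field> fields;
    std::vector<Method> methods;

    // Linear lookup by name; returns nullptr when no method matches.
    const Method* findMethod(const std::string& wanted) const {
        for (const Method& m : methods) {
            if (m.name == wanted) return &m;
        }
        return nullptr;
    }
};

// What a generator might emit for: struct OneFunc { int narf(); };
inline Class describeOneFunc() {
    Class c;
    c.name = "OneFunc";
    Method narf;
    narf.returnType = "int";
    narf.name = "narf";
    c.methods.push_back(narf);
    return c;
}
```

A proxy generator could then hand a Method to an interception hook the way Java hands one to an InvocationHandler, but that part is left out here.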
Code generation is the obvious approach, and the lack of introspection in C++ makes abstract syntax tree analysis the only viable approach currently available. We can get all the information we require from g++ if we just ask nicely. For example, if we add the flag -fdump-translation-unit to g++, we get a file with the AST in an ultra-normalized form. Say I want to find all of the classes defined in the file generated when I compile ExampleTestCase.cpp. The file ExampleTestCase.cpp.t00.tu on line 2414 has:
@1086 identifier_node strg: ExampleTestCase lngt: 15
If we then search for what @1086 means:
adyoung@adyoung-devd$ grep -n "@1086 " ExampleTestCase.cpp.t00.tu
1749:@783 type_decl name: @1086 type: @554 srcp: ExampleTestCase.h:14
1762:@787 function_decl name: @1086 type: @1093 scpe: @554
2414:@1086 identifier_node strg: ExampleTestCase lngt: 15
4237:@1932 type_decl name: @1086 type: @554 scpe: @554
4242:@1935 function_decl name: @1086 mngl: @2450 type: @2451
28445:@13185 function_decl name: @1086 mngl: @14801 type: @14802
We see that this identifier is used in several places, but the two interesting ones are the type_decl lines, and they both refer to entry @554. Most likely the function_decl entries are something like the constructors. This is the data on that record:
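The grep step can also be done in code, which is where a generator would need it anyway. This is only a sketch: readLines and linesMentioning are names invented here. It matches the reference as a whole whitespace-delimited token, so it also catches a reference at the very end of a line, which the grep pattern with its trailing space would miss.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Reads a dump file into memory, one string per physical line.
// Returns an empty vector if the file cannot be opened.
std::vector<std::string> readLines(const std::string& path) {
    std::vector<std::string> lines;
    std::ifstream in(path.c_str());
    std::string line;
    while (std::getline(in, line)) lines.push_back(line);
    return lines;
}

// Returns the 1-based numbers of every line that mentions `ref`
// (for example "@1086") as a whole token, so "@1086" does not
// match "@10861".
std::vector<int> linesMentioning(const std::vector<std::string>& lines,
                                 const std::string& ref) {
    std::vector<int> hits;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        std::istringstream in(lines[i]);
        std::string tok;
        while (in >> tok) {
            if (tok == ref) {
                hits.push_back(static_cast<int>(i) + 1);
                break;  // report each line once, like grep -n
            }
        }
    }
    return hits;
}
```

Calling linesMentioning(readLines("ExampleTestCase.cpp.t00.tu"), "@1086") should give the same line numbers as the grep, assuming the file is laid out as shown.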
@554 record_type name: @783 size: @43 algn: 64 vfld: @784 base: @785 accs: priv tag : struct flds: @786 fncs: @787 binf: @788
It needs some prettying up to get it all on one line, but other than that, it looks right. The big thing is the tag: struct that tells us this is a C struct. C++ must be forced to conform to C at some point, so classes become structs.
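Once a record is on one line, it splits fairly easily into an id, a node kind and a set of key/value pairs. The sketch below is one way to do that; Node and parseNode are names made up for this example. It assumes values never contain spaces and that a key appears at most once per record, which holds for the record above but may not hold for every node kind. Note that some keys are padded, as in "tag : struct", so the colon can arrive as its own token.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

struct Node {
    int id;
    std::string kind;
    std::map<std::string, std::string> attrs;
};

// Parses one logical record such as
//   "@554 record_type name: @783 size: @43 algn: 64 tag : struct"
// Returns false if the line does not start with "@<digits> <kind>".
bool parseNode(const std::string& line, Node& out) {
    std::istringstream in(line);
    std::string tok;
    if (!(in >> tok) || tok.size() < 2 || tok[0] != '@') return false;
    for (std::size_t i = 1; i < tok.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(tok[i]))) return false;
    }
    Node n;
    n.id = std::stoi(tok.substr(1));  // throws only on absurdly long ids
    if (!(in >> n.kind)) return false;

    std::string key;   // key still waiting for its value
    std::string bare;  // last word that was neither a key nor a value
    while (in >> tok) {
        if (tok == ":") {
            key = bare;  // padded key: "tag : struct"
            bare.clear();
        } else if (key.empty() && tok[tok.size() - 1] == ':') {
            key = tok.substr(0, tok.size() - 1);  // normal key: "name:"
        } else if (!key.empty()) {
            n.attrs[key] = tok;
            key.clear();
        } else {
            bare = tok;
        }
    }
    out = n;
    return true;
}
```

Values such as ExampleTestCase.h:14 survive intact because only a trailing colon marks a key.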
Let’s make it even simpler. If we make an empty C++ file called empty.cpp and compile it with:
g++  -fdump-translation-unit  -c -o empty.o empty.cpp
we get a file with a lot of standard symbols defined:
grep identifier empty.cpp.001t.tu | wc -l
1215
If we add a single static variable, the venerable xyzzy, we can easily find it in the file:
adam@frenzy:~/devel/cpp/proxy$ echo "static int xyzzy;" >> xyzzy.cpp
adam@frenzy:~/devel/cpp/proxy$ g++  -fdump-translation-unit  -c -o xyzzy.o xyzzy.cpp
adam@frenzy:~/devel/cpp/proxy$ grep identifier xyzzy.cpp.001t.tu | wc -l
1216
We’ve only added a single identifier line, which looks like this:
@4     identifier_node strg: xyzzy   lngt: 5
If we now add a Noop struct to that, we get a little bit more info:
adam@frenzy:~/devel/cpp/proxy$ echo "struct Noop{}; static int xyzzy;" >> Noop.cpp
adam@frenzy:~/devel/cpp/proxy$ make Noop.o
g++ -fdump-translation-unit   -c -o Noop.o Noop.cpp
adam@frenzy:~/devel/cpp/proxy$ grep identifier Noop.cpp.001t.tu | wc -l
1217
Note that I’ve added -fdump-translation-unit to the CPPFLAGS in a Makefile.
Each change has a significant effect on the resultant file:
adam@frenzy:~/devel/cpp/proxy$ wc -l Noop.cpp.001t.tu
6853 Noop.cpp.001t.tu
adam@frenzy:~/devel/cpp/proxy$ wc -l xyzzy.cpp.001t.tu
6845 xyzzy.cpp.001t.tu
adam@frenzy:~/devel/cpp/proxy$ wc -l empty.cpp.001t.tu
6841 empty.cpp.001t.tu
Because the symbol gets added early (@4) it bumps all of the other symbols in the file up one, so a diff would take a little parsing. A visual inspection quickly shows that the following section has been added to xyzzy.cpp.001t.tu:
@3     var_decl        name: @4      type: @5      srcp: xyzzy.cpp:1
chan: @6      link: static  size: @7
algn: 32      used: 0
@4     identifier_node strg: xyzzy   lngt: 5
@5     integer_type    name: @8      size: @7      algn: 32
prec: 32      sign: signed  min : @9
max : @10
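Since a node like @3 above spills over several physical lines, a first processing step is to glue continuation lines back onto the record they belong to. A minimal sketch, assuming every record starts with '@' and no continuation line does (true of the excerpts in these notes, since continuation lines start with a key). joinRecords is a made-up name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Collapses physical lines into one string per node. A line whose first
// non-blank character is '@' starts a new record; any other non-blank
// line is appended to the record before it.
std::vector<std::string> joinRecords(const std::vector<std::string>& lines) {
    std::vector<std::string> records;
    for (const std::string& raw : lines) {
        std::size_t start = raw.find_first_not_of(" \t\r");
        if (start == std::string::npos) continue;  // blank line
        std::size_t end = raw.find_last_not_of(" \t\r");
        std::string text = raw.substr(start, end - start + 1);
        if (text[0] == '@') {
            records.push_back(text);
        } else if (!records.empty()) {
            records.back() += " " + text;
        }
        // Anything before the first '@' line is ignored.
    }
    return records;
}
```

The size of the returned vector is the number of nodes in the dump, which is a different figure from the number of lines that merely contain an @ somewhere.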
If we compare the two files based on the @ signs:
adam@frenzy:~/devel/cpp/proxy$ grep -- @ xyzzy.cpp.001t.tu | wc -l
4427
adam@frenzy:~/devel/cpp/proxy$ grep -- @ empty.cpp.001t.tu | wc -l
4424
We can see we have added three, which corresponds with what we have above.
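Because node numbers shift whenever a symbol is added early, a plain diff of two dumps is noisy. One comparison that does not care about numbering is to count nodes per kind in each file and subtract. This is a sketch with invented names (KindCounts, countKinds, diffKinds); it only looks at lines that begin with an @N id, so continuation lines are skipped.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

typedef std::map<std::string, int> KindCounts;

// Counts nodes per kind. Only lines whose first token begins with '@'
// open a node; the second token on such a line is the node kind.
KindCounts countKinds(const std::vector<std::string>& lines) {
    KindCounts counts;
    for (const std::string& line : lines) {
        std::istringstream in(line);
        std::string id, kind;
        if ((in >> id >> kind) && id[0] == '@') ++counts[kind];
    }
    return counts;
}

// Returns after[kind] - before[kind] for every kind whose count changed.
KindCounts diffKinds(const KindCounts& before, const KindCounts& after) {
    KindCounts delta;
    for (KindCounts::const_iterator it = after.begin(); it != after.end(); ++it) {
        KindCounts::const_iterator old = before.find(it->first);
        int change = it->second - (old == before.end() ? 0 : old->second);
        if (change != 0) delta[it->first] = change;
    }
    // Kinds that disappeared entirely show up as negative counts.
    for (KindCounts::const_iterator it = before.begin(); it != before.end(); ++it) {
        if (after.find(it->first) == after.end()) delta[it->first] = -it->second;
    }
    return delta;
}
```

Feeding it the lines of empty.cpp.001t.tu and xyzzy.cpp.001t.tu would show which node kinds the static variable introduced, without having to line up the renumbered ids.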
Just adding the empty struct brings the count to 4434, which is 10 more than empty.cpp and 7 more than xyzzy.cpp:
adam@frenzy:~/devel/cpp/proxy$ grep -- @ Noop.cpp.001t.tu | wc -l
4434
To make it a little easier, I went in and put a line break after struct Noop{}; Now I can look for Noop.cpp:1 or Noop.cpp:2.
This seems to be the set of lines added for struct Noop:
@6     type_decl       name: @11     type: @12     srcp: Noop.cpp:1
note: artificial             chan: @13
@7     integer_cst     type: @14     low : 32
@8     type_decl       name: @15     type: @5      srcp: <built-in>:0
note: artificial
@9     integer_cst     type: @5      high: -1      low : -2147483648
@10    integer_cst     type: @5      low : 2147483647
@11    identifier_node strg: Noop    lngt: 4
@12    record_type     name: @6      size: @16     algn: 8
tag : struct  flds: @17     binf: @18
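The chain in that listing is record_type, then its name (a type_decl), then that node's name (an identifier_node), then strg. Walking it for every record_type gives the list of classes the original goal asked for. A sketch under the assumption that the nodes have already been parsed into a table keyed by id; Node, NodeTable, follow and structNames are invented for this example.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

struct Node {
    std::string kind;
    std::map<std::string, std::string> attrs;
};
typedef std::map<int, Node> NodeTable;  // keyed by the number after '@'

// Follows an attribute holding "@N" to the node it points at.
// Returns nullptr if the attribute is absent, malformed, or dangling.
const Node* follow(const NodeTable& table, const Node& from,
                   const std::string& key) {
    std::map<std::string, std::string>::const_iterator a = from.attrs.find(key);
    if (a == from.attrs.end() || a->second.size() < 2 || a->second[0] != '@') {
        return nullptr;
    }
    NodeTable::const_iterator n = table.find(std::atoi(a->second.c_str() + 1));
    return n == table.end() ? nullptr : &n->second;
}

// record_type --name--> type_decl --name--> identifier_node --strg--> text
std::vector<std::string> structNames(const NodeTable& table) {
    std::vector<std::string> names;
    for (NodeTable::const_iterator it = table.begin(); it != table.end(); ++it) {
        const Node& rec = it->second;
        if (rec.kind != "record_type") continue;
        const Node* decl = follow(table, rec, "name");
        const Node* ident = decl ? follow(table, *decl, "name") : nullptr;
        if (!ident) continue;  // unnamed or incomplete record
        std::map<std::string, std::string>::const_iterator s =
            ident->attrs.find("strg");
        if (s != ident->attrs.end()) names.push_back(s->second);
    }
    return names;
}
```

On a real dump this would also return record types pulled in from headers, so filtering on the srcp of the type_decl is probably needed as well.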
Let’s see what happens if we add a field.
Here’s OneOp.cpp:
struct OneOp{
int aaa;
};
static int xyzzy;
adam@frenzy:~/devel/cpp/proxy$ grep -- @ Noop.cpp.001t.tu | wc -l
4434
adam@frenzy:~/devel/cpp/proxy$ grep -- @ OneOp.cpp.001t.tu | wc -l
4439
We get another five lines. Let’s see if this is linear.
adam@frenzy:~/devel/cpp/proxy$ grep -- @ TwoOp.cpp.001t.tu | wc -l
4444
adam@frenzy:~/devel/cpp/proxy$ grep -- @ ThreeOp.cpp.001t.tu | wc -l
4449
So far it looks linear: each step adds another five lines.
Let’s try a function now.
adam@frenzy:~/devel/cpp/proxy$ cat OneFunc.cpp
struct OneFunc{
int narf();
};
static int xyzzy;
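adam@frenzy:~/devel/cpp/proxy$ grep -- @ OneOp.cpp.001t.tu | wc -l
4439
adam@frenzy:~/devel/cpp/proxy$ grep -- @ OneFunc.cpp.001t.tu | wc -l
4448
That is 9 more lines than the one-field struct and 14 more than the empty struct, so a member function costs nearly three times what a field does.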
My next goal will be to diagram out the data structures we have here using UML.
Things look fairly straightforward in the deciphering until we get to function_type. There, we have a reference to retn, which in this case happens to be a void but could conceivably be any of the data types.
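One way to cope with retn pointing at arbitrary types is a small recursive function that spells out whatever node it lands on. The sketch below handles only named types (via name, then type_decl, then identifier_node) and pointers. The "ptd" attribute for the pointee is my assumption about the dump format and is not shown anywhere in these notes; Node, NodeTable, follow, typeName and returnTypeOf are invented names.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>

struct Node {
    std::string kind;
    std::map<std::string, std::string> attrs;
};
typedef std::map<int, Node> NodeTable;  // keyed by the number after '@'

// Looks up the node referenced by attribute `key` ("@N"), or nullptr.
const Node* follow(const NodeTable& table, const Node& from,
                   const std::string& key) {
    std::map<std::string, std::string>::const_iterator a = from.attrs.find(key);
    if (a == from.attrs.end() || a->second.size() < 2 || a->second[0] != '@') {
        return nullptr;
    }
    NodeTable::const_iterator n = table.find(std::atoi(a->second.c_str() + 1));
    return n == table.end() ? nullptr : &n->second;
}

// Spells a type node as text, or "?" when it cannot be resolved.
// `depth` guards against reference cycles in a malformed table.
std::string typeName(const NodeTable& table, const Node& type, int depth = 0) {
    if (depth > 16) return "?";
    if (type.kind == "pointer_type") {
        // Assumption: the pointee is stored under "ptd".
        const Node* target = follow(table, type, "ptd");
        return target ? typeName(table, *target, depth + 1) + "*" : "?";
    }
    const Node* decl = follow(table, type, "name");
    const Node* ident = decl ? follow(table, *decl, "name") : nullptr;
    if (!ident) return "?";
    std::map<std::string, std::string>::const_iterator s =
        ident->attrs.find("strg");
    return s == ident->attrs.end() ? "?" : s->second;
}

// function_decl --type--> function_type --retn--> some type node
std::string returnTypeOf(const NodeTable& table, const Node& functionDecl) {
    const Node* fnType = follow(table, functionDecl, "type");
    const Node* ret = fnType ? follow(table, *fnType, "retn") : nullptr;
    return ret ? typeName(table, *ret) : "?";
}
```

References, arrays, and const or volatile qualifiers would each need their own branch, which is about where the deciphering stops being straightforward.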
I have long since abandoned this approach, but may pick it back up again some day, so I will publish this and let the great crawlers out there make it available to some poor sap who wants to continue it. If you do so, please let me know.