MMXX Internals Tutorial

Introduction

Please read the overview file before you read this tutorial. Since this is a preliminary development release of MMXX, the main focus in the current documentation is to provide advanced information necessary to people who intend to work on MMXX itself, rather than on MMXX-enabled applications.

In the included "basictest" example, there are only two MMXX-enabled classes, according to the following hierarchy:

In module testhost: class testme;

In module Module.dll: class derius : public testme;

The derivation relationship crosses a module boundary. This fact is central to how the classes are built, yet code that manipulates instances of those classes can largely ignore it. The same is nominally true of traditional C++ dynamic linking, but in that context not only cross-module inheritance but even mere cross-module method invocation is enough to trigger the fragile base class (FBC) problem. Fortunately, both are possible, FBC-free, and relatively transparent if MMXX is used.

Application code does need to use MMXX facilities in two kinds of situations. The first covers what C++ just can't handle directly, such as instantiating a class that was entirely invisible to the compiler at compile time. The second covers those circumstances where the C++ constructs are simply insufficient for cross-module operation in the MMXX scenario: for example, object pointers and references in cross-module function signatures must be replaced with module-safe object handles (x_ptr<>, x_cptr<>, x_ref<> and x_cref<>), the VirtualDelete() method must be used instead of delete, etc.

However, you'll find that familiarizing yourself with these features won't take long, and that the MMXX infrastructure does not limit C++ freedom nearly as much as one might expect.

MMXX Internals (advanced)

In order to learn to use MMXX it's helpful to understand how MMXX implements cross-module operation. Instead of relying on C++ dynamic linking (which suffers from the fragile base class problem), MMXX relies on its own dynamic linker. This linker uses multiple virtual method tables (vtables) for each class, rather than a single shared one as C++ implementations do. Since single-vtable compilers cannot cope with this arrangement directly, MMXX isolates the modules entirely at the binary level and constructs a system of support information and code that binds the modules together. Thus the C++ compiler is entirely unaware that the code it generates will need to be linked with other code before it can run. As a side effect of this property, MMXX can sometimes make otherwise incompatible compilers produce modules that work together (circumstances permitting; in particular, the compilers must use matching calling conventions).

Example 1: Shadow classes

Since classes play most of their roles at compile time, they must somehow be visible to the compiler even though their true definitions are to remain foreign until runtime. MMXX uses the concept of a shadow class, a class whose implementation consists entirely of linker support code and data. Shadow classes are declared and implemented automatically by the support code generation process. In a sense, they are "stub" classes; their interfaces are almost identical to those of their real counterparts that they provide access to, but their implementations are MMXX-generated and they contain no data members other than for MMXX runtime support.

From the standpoint of the C++ compiler, the hierarchy above is modified by introducing a shadow class for class testme in Module.dll (to differentiate between the real and shadow testme classes, we'll use the $real and $shadow suffixes):


             +---------------+
testhost:    |  testme$real  |
             +---------------+
                    /|\	
                     |A	
---------------------|-----------------------------------------------
(module boundary)    |
                    \|/	
             + - - - - - - - +                +---------------+
Module.dll:  : testme$shadow : <------------- |    derius     |
             + - - - - - - - +        B       +---------------+

In the arrows above, B indicates the usual C++ derivation relationship. The C++ compiler used to build Module.dll is aware of this relationship, and manages it in the traditional fashion. The A arrow indicates a peer-shadow cross-module relationship instead. This relationship is implemented entirely in MMXX support code and metadata, and the C++ compiler is unaware of it.

An important implication of this is that whenever class derius is instantiated, MMXX ends up creating two objects from the compiler's perspective - an instance of testme$real in testhost, and an instance of derius in Module.dll - which are permanently bound together and indeed behave like a single object (and should in all circumstances be treated as such) from the application's perspective. For the remainder of this document, we'll use the terms compiler object or compiler instance to refer to what the compiler sees as distinct objects, but which are really only portions of what an application sees as a single logical object or instance.

Method Dispatching

To illustrate how this arrangement works in practice, consider the following methods:


class testme {
...
public:
    double divideby(double);
        // non-virtual, implemented in testme
    virtual double queryval() const = 0;
        // pure virtual, to be implemented by subclass
    virtual double getscaling();
        // virtual, can be overridden by subclass
...	
};
		
class derius : public testme {
...	
    virtual double queryval() const;
        // implements pure virtual method from testme
    virtual double getscaling();
        // overrides testme::getscaling()
...	
};

This is what happens when those methods are invoked by code in testhost:


// inside testhost:
    testme *deriusObj = ...;
        // assume deriusObj is an instance of derius manipulated
        // through a testme* base class pointer
    double a = deriusObj->divideby(10.0);
        // not too special - regular C++ behavior, goes to
        // testme$real::divideby(), no support code gets invoked
    double b = deriusObj->queryval();
        // this invokes testme$real::queryval(), whose definition
        // was automatically provided by MMXX (the =0 was omitted
        // in the declaration.) The support code crosses the
        // module boundary and invokes testme$shadow::queryval().
        // As a matter of fact, the MMXX auto-generated definition
        // for testme$shadow::queryval() simply returns 0.0, BUT
        // ...it never gets invoked, since the method was
        // overridden by derius::queryval(), which is invoked
        // instead (through regular C++ inheritance.)

This is what happens when these methods are invoked by code in Module.dll instead:


// inside Module.dll
    derius *deriusObj = ...;
        // since Module.dll has knowledge of class derius, it
        // can use a derius* to refer to this instance,
        // however even if we used a testme* the behavior would
        // be identical since all these methods are declared in
        // class testme.
    double a = deriusObj->divideby(10.0);
        // since there is no derius::divideby(), this
        // invokes testme$shadow::divideby() through
        // regular C++ inheritance, whose definition was
        // automatically provided by MMXX. The support
        // code there crosses the module boundary and
        // invokes testme$real::divideby().
    double b = deriusObj->queryval();
        // not too special - regular C++ behavior, goes to
        // derius::queryval(), no support code gets invoked.
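The two walkthroughs above can be mimicked in plain C++. The following is only a sketch of the forwarding idea, not MMXX's generated code: the names testme_real, testme_shadow and logical_derius are stand-ins invented here (the $ suffixes are avoided because $ is not portable in identifiers), the peer pointers model the A arrow, and the stored value and return values are made up for illustration. A trace vector records which functions actually run.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records which functions ran, so the dispatch path can be inspected.
static std::vector<std::string> trace;

struct testme_shadow;  // module-side stand-in, defined below

// "testhost" side: models testme$real.
struct testme_real {
    testme_shadow* peer = nullptr;  // models the A (peer-shadow) link
    double value = 100.0;           // hypothetical state, for illustration

    // Non-virtual, implemented locally: a plain call, no support code.
    double divideby(double d) {
        trace.push_back("testme_real::divideby");
        return value / d;
    }
    // Models the generated definition that crosses the boundary.
    double queryval() const;
};

// "Module.dll" side: models testme$shadow.
struct testme_shadow {
    testme_real* peer = nullptr;
    virtual ~testme_shadow() = default;

    // Models the generated forwarder back to the real class.
    double divideby(double d) {
        trace.push_back("testme_shadow::divideby");
        assert(peer != nullptr);
        return peer->divideby(d);
    }
    // Generated default; only runs if no subclass overrides it.
    virtual double queryval() const {
        trace.push_back("testme_shadow::queryval");
        return 0.0;
    }
};

double testme_real::queryval() const {
    trace.push_back("testme_real::queryval");
    assert(peer != nullptr);
    return peer->queryval();  // virtual call resolves inside the module
}

// User class in Module.dll, derived from the shadow (the B arrow).
struct derius : testme_shadow {
    double queryval() const override {
        trace.push_back("derius::queryval");
        return 42.0;  // hypothetical value
    }
};

// Binds the two compiler objects into one logical object.
struct logical_derius {
    testme_real real;
    derius module_part;
    logical_derius() {
        real.peer = &module_part;
        module_part.peer = &real;
    }
    logical_derius(const logical_derius&) = delete;
};
```

Calling real.queryval() on a logical_derius leaves testme_real::queryval followed by derius::queryval in the trace, and calling module_part.divideby(10.0) leaves testme_shadow::divideby followed by testme_real::divideby, matching the two comment walkthroughs.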

Location independence

The resulting behavior is exactly the same as one would obtain if there were no module boundary, a single testme class, and no MMXX-generated support code. In fact, it's very easy to move implementations around, rearranging them into modules arbitrarily, as long as the MMXX conventions and support infrastructure are used; all MMXX constructs behave correctly whether the implementations are module-local or not. All that needs to be done to move a class is to switch its module location setting! (i.e. the module-specific values of the _MMXX_SHADOW_<class_name> macros, more on this later.)

Virtual methods

Notice that we've neglected the virtual getscaling() method. This method is virtual but not pure virtual, and there is a "gotcha" here: the interface and implementation of such a method must be named differently, for the simple reason that MMXX must generate support code that gets to run when the method is invoked even if the method is implemented locally. For this reason, while the interface to the method remains double r = obj->getscaling(); the implementation must be provided as double testme::_I_getscaling() { ... } that is to say with an _I_ prefix (the declaration of _I_getscaling() within the class declaration itself is provided automatically.) Here is what happens when getscaling() is invoked in testhost:


// inside testhost:
    double c = deriusObj->getscaling();
        // the definition of testme$real::getscaling() was
        // automatically provided by MMXX. If the method
        // hadn't been overridden by derius::getscaling(),
        // MMXX would invoke the user-provided
        // testme$real::_I_getscaling() implementation,
        // but since it has, the support code crosses the
        // module boundary and invokes
        // testme$shadow::getscaling(), which was itself
        // provided by MMXX. However, the latter never gets to
        // run, since it's overridden by derius::getscaling(),
        // which itself simply invokes derius::_I_getscaling()
        // which is the correct user-provided implementation.

Here's what happens when getscaling() is invoked from Module.dll code instead:


// inside Module.dll:
    double c = deriusObj->getscaling();
        // this invokes derius::getscaling(), which itself
        // simply invokes derius::_I_getscaling() which is
        // the correct user-provided implementation.

To be complete in our exposition, let's assume hypothetically that getscaling() had not been overridden by class derius. In that case, invoking it from testhost would result in the MMXX-provided testme$real::getscaling() method calling the user-provided testme$real::_I_getscaling() method directly. Invoking it from Module.dll instead would result in the invocation of the MMXX-provided testme$shadow::getscaling() method, which itself would (by way of the MMXX-provided testme$shadow::_I_getscaling() method to be precise) cross the module boundary and result in the invocation of testme$real::_I_getscaling() as well.
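The interface/implementation split can be sketched in plain C++ as follows. This is an illustration under assumptions, not the code MMXX generates: impl_getscaling stands in for _I_getscaling (identifiers that begin with an underscore and a capital letter are reserved in standard C++), the overridden_in_module flag is a hypothetical way of recording whether a subclass overrides the method (the source does not say how MMXX tracks this), and the return values 1.0 and 2.5 are made up.

```cpp
#include <cassert>

struct testme_shadow;  // module-side stand-in, defined below

// Host side: models testme$real.
struct testme_real {
    testme_shadow* peer = nullptr;
    bool overridden_in_module = false;  // hypothetical bookkeeping flag

    double getscaling();                      // models the generated stub
    double impl_getscaling() { return 1.0; }  // user-provided body
};

// Module side: models testme$shadow.
struct testme_shadow {
    testme_real* peer = nullptr;
    virtual ~testme_shadow() = default;

    // Models the generated interface stub.
    virtual double getscaling() { return impl_getscaling(); }
    // Models the generated implementation stub: crosses the boundary.
    double impl_getscaling() {
        assert(peer != nullptr);
        return peer->impl_getscaling();
    }
};

double testme_real::getscaling() {
    if (overridden_in_module) {
        assert(peer != nullptr);
        return peer->getscaling();  // virtual call resolves in the module
    }
    return impl_getscaling();       // local user-provided body
}

// Subclass that overrides getscaling().
struct derius : testme_shadow {
    double getscaling() override { return impl_getscaling(); }  // stub
    double impl_getscaling() { return 2.5; }                    // user body
};

// The hypothetical case: a subclass that does not override it.
struct derius_no_override : testme_shadow {};

// Binds the host part and the module part into one logical object.
template <class ModulePart>
struct logical_object {
    testme_real real;
    ModulePart part;
    explicit logical_object(bool overridden) {
        real.peer = &part;
        part.peer = &real;
        real.overridden_in_module = overridden;
    }
    logical_object(const logical_object&) = delete;
};
```

With logical_object<derius> built with overridden set to true, both real.getscaling() and part.getscaling() reach the 2.5 body; with logical_object<derius_no_override> built with false, both reach the host's 1.0 body, whichever side makes the call.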

The moral of this long exposition is that through various "magic tricks," MMXX filled in enough support code for us not to have to worry about whether our implementations are module-local or not. Once you've acquired enough confidence in MMXX and have gotten used to its required constructs, you can safely forget about shadow classes and support code altogether, just like you don't keep vtables and inheritance layouts in mind when you write C++ code.

Example 2: Multiple inheritance

Let us now consider a few more elaborate examples, starting with multiple inheritance across module boundaries. Consider the following derivation graph:

In module 1: class A { }; class B { };

In module 2: class C : public A, public B { };

From the application's perspective, this works in the same way as the previous example - namely that the implementation locality of the classes should be treated as largely insignificant. In this case, MMXX uses three compiler objects to make up a single application object:


              +----------+               +----------+
   Module 1:  |  A$real  |               |  B$real  |
              +----------+               +----------+
                   /|\                        /|\
                    |                          |
--------------------|--------------------------|------------
module boundary     |P/S                       |P/S
                    |                          |
                   \|/                        \|/
              +- - - - - +               +- - - - - +
              : A$shadow :               : B$shadow :
              + - - - - -+               + - - - - -+
                    |\~                      ~/|
                      \                      /
                       \                    /
   Module 2:            \Der               /Der
                         \                /
                          \              /
                           +------------+
                           |     C      |
                           +------------+

Note that we've taken to marking the arrows as either P/S for peer/shadow relationships or Der for derivation relationships. A subtle implication of this arrangement is that an instance of class C is super-polymorphic from the perspective of module 1, since it's both an A and a B even though the compiler had no knowledge of any class C that derives from both A and B. For this reason, dynamic casts between A* and B* performed through MMXX_DynCast() can succeed even in some circumstances where they would fail if dynamic_cast were used.
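The source does not say how MMXX_DynCast() finds the sibling compiler instance. One way to picture it, purely as an assumption, is a per-object record that lists every compiler instance belonging to the logical object, keyed by type. In the sketch below, object_record, part_base, dyn_cast_sketch and logical_c are all hypothetical names; A_part and B_part model A$real and B$real, which the compiler sees as unrelated objects.

```cpp
#include <cassert>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

// Hypothetical per-object record: every compiler instance that makes up
// one logical object, keyed by its static type.
struct object_record {
    std::unordered_map<std::type_index, void*> parts;
};

// Each compiler instance knows which logical object it belongs to.
struct part_base {
    object_record* record = nullptr;
};

struct A_part : part_base { int a = 1; };  // models A$real
struct B_part : part_base { int b = 2; };  // models B$real
struct Unrelated : part_base {};           // not part of the object

// Looks up a sibling compiler instance through the shared record.
// Returns nullptr when the logical object has no part of type To.
template <class To, class From>
To* dyn_cast_sketch(From* from) {
    if (from == nullptr || from->record == nullptr) return nullptr;
    auto it = from->record->parts.find(std::type_index(typeid(To)));
    if (it == from->record->parts.end()) return nullptr;
    return static_cast<To*>(it->second);  // stored as exactly To*
}

// Module 1's view of an instance of C: two separate compiler objects
// registered under one record.
struct logical_c {
    object_record record;
    A_part a;
    B_part b;
    logical_c() {
        a.record = &record;
        b.record = &record;
        record.parts[std::type_index(typeid(A_part))] = &a;
        record.parts[std::type_index(typeid(B_part))] = &b;
    }
    logical_c(const logical_c&) = delete;
};
```

Given a logical_c named c, dyn_cast_sketch<B_part>(&c.a) returns &c.b even though A_part and B_part share no derived class the compiler knows about, which is the situation where a plain dynamic_cast has nothing to work with.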

Example 3: Inheritance Depth

Now consider a deeper inheritance hierarchy:

In module 1: class D { }; class E { }; class F : public D { };

In module 2: class G : public E, public F { };

This is constructed as follows:


             +----------+        +----------+       +----------+
   Module 1: |  D$real  | <----- |  F$real  |       |  E$real  |
             +----------+   Der  +----------+       +----------+
                  /|\                 /|\                /|\
                   |                   |                  |
-------------------|-------------------|------------------|-------
module boundary    |P/S                |P/S               |P/S
                   |                   |                  |
                  \|/                 \|/                \|/
             +- - - - - +        +- - - - - +       +- - - - - +
             : D$shadow : <----- : F$shadow :       : E$shadow :
             + - - - - -+   Der  + - - - - -+       + - - - - -+
                                       |\~              ~/|
   Module 2:                             \Der           /Der
                                          \            /
                                           +----------+
                                           |    G     |
                                           +----------+

Note that the hierarchy of shadows closely mimics the hierarchy of real classes. This particular arrangement allows us to always dispatch methods correctly, in a fashion that's compatible with a variety of inheritance layouts. In this case, there are also 3 compiler objects (a compiler instance of F$real and one of E$real in module 1, and one of G in module 2) which as always are bound into a single indivisible application object.

Example 4: More than two modules

Now consider multiple modules:

In module 1: class H { };

In module 2: class I : public H { };

In module 3: class J : public I { }; class K : public H { };

This is more complicated in that the linker will utilize different peer-shadow link arrangements depending on the particular class being instantiated. This is constructed as follows:


             +------------+
   Module 1: |   H$real   |
             +------------+
             /|\        /|\
              |          |
              |      +---|----------------------------------------
              |      :   |P/S2                        1/2 boundary
              |      :   `------.      Module 2:
              |      :          |
              |      :         \|/
              |      :    +- - - - - - +        +------------+
--------------|------:    : H$shadow@2 : <----- |   I$real   |
1/3 boundary  |      :    + - - - - - -+   Der  +------------+
              |P/S1  :         /|\                /|\
              |      :          |                  |
              |      +----------|------------------|--------------
              |                 |                  |  2/3 boundary
              |          .------'                  |
              |          |P/S3                     |P/S4
   Module 3:  |          |                         |
             \|/        \|/                       \|/
             +- - - - - - +                     +- - - - - - +
             : H$shadow@3 : <------------------ : I$shadow@3 :
             + - - - - - -+         Der         + - - - - - -+ 
                   /|\                                /|\
                    |                                  |
                    |Der                               |Der
                    |                                  |
             +------------+                     +------------+
             |      K     |                     |      J     |
             +------------+                     +------------+

When K is instantiated, the linker installs the P/S1 peer-shadow link, and everything works as in the previous examples. Notice that module 2 is not at all involved with instances of K. However, when J is instantiated, class I and its entire derivation hierarchy must be shadowed through module 2. Since the derivation hierarchy includes H, the linker installs the P/S2 link to shadow H into module 2, and the P/S3 link to shadow H's module 2 shadow into module 3. It also installs the P/S4 link to shadow I into module 3. Once again, this all happens transparently from the standpoint of the application.
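The rule described above (shadow the base and its entire derivation hierarchy into the deriving module, then repeat from the base's own module) can be written down as a small function. This is a sketch of the bookkeeping only, limited to single inheritance as in this example; class_info, ps_link, links_for and example4 are names invented here, and the order in which links are listed is an artifact of the sketch, not something the source specifies.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Where a class is implemented and what it derives from (single base).
struct class_info {
    int home_module;
    std::string base;  // empty when the class has no base
};

// One peer-shadow link: class `cls` as seen in from_module is shadowed
// into to_module.
struct ps_link {
    std::string cls;
    int from_module;
    int to_module;
    bool operator==(const ps_link& o) const {
        return cls == o.cls && from_module == o.from_module &&
               to_module == o.to_module;
    }
};

using class_table = std::map<std::string, class_info>;

// Lists the peer-shadow links needed to instantiate `cls`.
std::vector<ps_link> links_for(const class_table& classes,
                               const std::string& cls) {
    std::vector<ps_link> links;
    std::string current = cls;
    while (true) {
        const class_info& info = classes.at(current);
        if (info.base.empty()) break;
        int base_home = classes.at(info.base).home_module;
        if (base_home != info.home_module) {
            // Shadow the base and all of its ancestors into this module.
            for (std::string x = info.base; !x.empty();
                 x = classes.at(x).base) {
                links.push_back({x, base_home, info.home_module});
            }
        }
        current = info.base;  // continue from the base's own module
    }
    return links;
}

// The hierarchy of Example 4.
class_table example4() {
    return {{"H", {1, ""}}, {"I", {2, "H"}}, {"J", {3, "I"}}, {"K", {3, "H"}}};
}
```

For K this yields the single link H 1->3 (P/S1). For J it yields I 2->3 (P/S4), H 2->3 (P/S3) and H 1->2 (P/S2), and module 2 never appears in K's list.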

Cross-module references and Ghosts

So far, we've only considered cross-module inheritance, and have not touched on mere cross-module reference. What if in the original testhost example, Module.dll wanted to access an instance of class testme, rather than an instance of the derived class derius? In this case, there are two scenarios. In the first, testme is instantiated inside Module.dll. In the second, testme is instantiated inside testhost, then a reference provided to Module.dll. The first scenario is easily described in terms of what we've already seen: the testme$shadow class is instantiated, it is bound to a testme$real instance through a peer-shadow relationship, and the support code machinery works as it did in the case where its derived class derius was instantiated instead (except that the pure virtual method testme$real::queryval() is bound to an empty stub which simply returns 0.0, and that the virtual method testme$real::getscaling() is not overridden.)

The second scenario is more general. It applies not only when a handle to a host-created object is passed to a module, but in every circumstance in which a handle to an object is passed to a module in the form of a pointer or reference to a class Z (which may be the object's true class or one of its bases), and the module does not contribute a shadow of Z to that object's active peer-shadow relationships.

Note that in order to obtain a Z pointer or reference, the module in question already has access to Z as a visible MMXX-enabled class. In other words, there is a shadow of Z available to the compiler as a class; this shadow simply does not take part in the network of peer-shadow relationships for the object in question. More simply, there is no local compiler instance. In this scenario, MMXX creates a ghost, that is, an extra compiler instance of the module's Z shadow, and attaches it to the object by installing a ghost link (to be specific, the link is connected to the compiler instance of Z$real.) Once again, this happens transparently from the standpoint of the application, which always sees a single application object.

Example 5: Ghosts

This is best illustrated with an example. Suppose that in Example 4, module 3 passed a pointer to an instance of class J to module 2 in the form of a class H base class pointer, that is to say, an x_ptr<H>. In this case, module 2's class H shadow (H$shadow@2) takes part in the peer-shadow relationships graph, since instances of class J go through it to refer to H$real. Here, no ghost is created, and the x_ptr<H> simply dereferences into a pointer to the existing compiler instance of H$shadow@2 (contained, of course, within the I$real compiler instance, since I is the subclass of H that J actually derives from.)

Now suppose that module 3 were to pass a pointer to an instance of class K instead, again to module 2 and again in the form of an x_ptr<H>. This time, H$shadow@2 is not involved in the peer-shadow relationships graph, since H$shadow@3 is bound directly to H$real for instances of class K. In this case, as soon as module 2 dereferences the x_ptr<H>, MMXX will instantiate H$shadow@2 into a ghost compiler object, then link it to H$real with a ghost link. Here's an illustration of the layout after the ghost is installed:


             +------------+
   Module 1: |   H$real   |
             +------------+
             /|\        /|\
              |          |
              |      +---|----------------------------------------
              |      :   |GHOST LINK                  1/2 boundary
              |      :   `------.      Module 2:
              |      :          |
              |      :          |
              |      :    +- - - - - - +
--------------|------:    : H$shadow@2 :
1/3 boundary  |      :    + - - - - - -+
              |P/S1  :
              |      :
              |      +--------------------------------------------
              |                                       2/3 boundary
              |
              |
   Module 3:  |
             \|/
             +- - - - - - +
             : H$shadow@3 :
             + - - - - - -+
                   /|\
                    |
                    |Der
                    |
             +------------+
             |      K     |
             +------------+

Method invocations on ghosts are routed through the corresponding methods on the module/compiler instance combination that the ghost link points to, which results in the expected behavior. Note that each module that gains access to an object in this fashion will result in a ghost being created, unless an appropriate ghost is already available in which case it's returned again (indeed, dereferenced pointers can even be tested for equality as usual.) All ghosts attached to the MMXX class graph are retained until the object is deleted.

It is not unusual for more than one ghost to be created for the same object/module combination. In the second example (the one illustrating classes A, B and C), a hypothetical third module might have shadows for both A and B but not for C. If this module were separately passed an x_ptr<A> and an x_ptr<B>, both referring to the same C instance by way of different base class pointers, MMXX would create two ghosts for the third module alone. Even in the context of this third module, MMXX_DynCast() would be able to convert back and forth between pointers to the two ghosts, so the particular arrangement employed by MMXX remains transparent to application code. Note that if the third module did have a shadow for C, a single ghost of C would be created and ghost-linked to the C$real compiler instance (ghosts are placed at the most specific joining point available.)
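The ghost lifecycle described above (created on first dereference, reused afterwards, retained until the object is deleted) behaves like a lazily filled cache owned by the logical object. The sketch below models only that bookkeeping and assumes C++14 for std::make_unique; ghost, logical_object and ghost_for are hypothetical names, and keying the cache by module and class is an assumption chosen to match the two-ghost case in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// A ghost: an extra compiler instance of a module's shadow class,
// attached to the object by a ghost link.
struct ghost {
    int module;
    std::string shadowed_class;
    const void* link_target;  // compiler instance the ghost link points to
};

class logical_object {
public:
    // Returns the ghost for (module, class), creating it on first use.
    // Later requests for the same pair return the same pointer, so
    // dereferenced pointers compare equal.
    ghost* ghost_for(int module, const std::string& cls,
                     const void* target) {
        auto key = std::make_pair(module, cls);
        auto it = ghosts_.find(key);
        if (it == ghosts_.end()) {
            it = ghosts_.emplace(
                key, std::make_unique<ghost>(ghost{module, cls, target}))
                     .first;
        }
        return it->second.get();
    }

    std::size_t ghost_count() const { return ghosts_.size(); }

private:
    // Ghosts live as long as the logical object does.
    std::map<std::pair<int, std::string>, std::unique_ptr<ghost>> ghosts_;
};
```

For an instance of C handed to a third module once as an A and once as a B, two calls with the keys (3, "A") and (3, "B") produce two distinct ghosts, and repeating either call returns the ghost created the first time.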
