Templates in the Wild

A Postmortem

Jonathan Crapuchettes
Senior Infrastructure Developer

Who am I?

  • Produced economic models for US, UK, and Canada
  • Helped push D into usage at EMSI
  • Been developing with D since version 1

What is EMSI?

  • Software and data company
  • Specializes in labor market data and economics
  • Started in Moscow, ID around 2000
  • Recently purchased by CareerBuilder
  • ~100 employees in the US and UK
  • Major clients include education and workforce professionals

Labor Market Data

Labor market data forms the basis for understanding the connection between people, economies, and work.

EMSI Data

Jobs, earnings, sales, value added, and population for...
  • ~1,100 industries
  • ~750 occupations
  • 16 demographic groups
  • ~3,100 counties
  • ~40,000 zip codes
  • 51 states (including the District of Columbia)
  • Years from 2001 to 2024

And that is just for the US!

History of EMSI's Data Processing

Why D?

Three main goals for data framework

  • Improve developer productivity
  • Improve data quality guarantees
  • Improve processing efficiency

Why D?

  • Accessible to developers with a wide range of skills
  • Feature set (e.g., inline assembly, associative arrays)
  • Powerful templating capabilities
  • Optimized for specific use cases
  • Easy C function calling
  • Fast compile times
  • Built-in unit testing

Data Processing Structures

  • Hierarchies
  • Dimensions
  • Stores
  • Cubes

Data Processing Structures

Hierarchies
 Root
   |
   +- Child0
   |    |
   |    +- Child0.0
   |    +- Child0.1
   |    +- Child0.2
   +- Child1
   |    |
   |    +- Child1.0
   |         |
   |         +- Child1.0.0
   +- Child2
A hierarchy is a connected, directed, acyclic graph.
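
A minimal sketch of such a hierarchy, assuming each node stores a code, a parent pointer, and a list of children. The names `Node`, `addChild`, and `depth` are hypothetical and only illustrate the tree shape drawn above; they are not the framework's API.

```d
import std.stdio : writeln;

// Hypothetical node type: one parent and many children, so the graph is
// connected, directed (parent to child), and acyclic.
struct Node
{
    string id;
    Node* parent;
    Node*[] children;

    // Attach a new child and return it so calls can be chained.
    Node* addChild(string childId)
    {
        auto child = new Node(childId, &this);
        children ~= child;
        return child;
    }

    // Number of edges between this node and the root.
    size_t depth() const
    {
        size_t d = 0;
        for (const(Node)* n = parent; n !is null; n = n.parent)
            ++d;
        return d;
    }
}

void main()
{
    auto root = new Node("Root");
    root.addChild("Child0");
    auto leaf = root.addChild("Child1").addChild("Child1.0").addChild("Child1.0.0");
    writeln(leaf.id, " is at depth ", leaf.depth);
}
```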

Data Processing Structures

Dimensions

  • Abstract container for hierarchies
  • Have templated names that become part of the type
  • Used for filtering, grouping, and labeling
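
To illustrate what "names that become part of the type" can mean, here is a small sketch. The `Dimension` struct below is a simplified stand-in, and `isDimension` is written with `std.traits.isInstanceOf` as an assumption about how such a trait might look.

```d
import std.meta : allSatisfy;
import std.stdio : writeln;
import std.traits : isInstanceOf;

// Simplified stand-in: the name is a compile-time string parameter,
// so two dimensions with different names are different types.
struct Dimension(string Name, NodeT)
{
    enum name = Name;
    alias CodeT = NodeT;
}

// True only for instances of the Dimension template above.
enum isDimension(T) = isInstanceOf!(Dimension, T);

alias Area = Dimension!("Area", uint);
alias Industry = Dimension!("Industry", uint);

// Same code type, different name: the compiler keeps them apart.
static assert(!is(Area == Industry));
static assert(allSatisfy!(isDimension, Area, Industry));
static assert(!isDimension!int);

void main()
{
    writeln(Area.name, " and ", Industry.name, " are distinct types");
}
```

Because the name is part of the type, passing an industry dimension where an area dimension is expected fails at compile time rather than producing wrong numbers at run time.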

Data Processing Structures

Cubes and Stores

  • Cubes contain stores and offer common functionality
  • Stores
    • Combine dimensions to create address spaces
    • Store measures

    Data Processing Structures


    Templated Structures

    struct Hierarchy(CodeT) {}
        
    struct Dimension(string Name, NodeT) {}
    
    class DataStore(DataT, Dimensions...) if (Dimensions.length > 0
        && allSatisfy!(isDimension, Dimensions)) {}
        
    struct Cube(StorageT) {}
    struct NodeMeasures
    {
        double employment = 0;
        double earnings = 0;
    }
    alias CT = Cube!(DataStore!(NodeMeasures,
                        Dimension!("Area", BasicNode!uint),
                        Dimension!("Industry", BasicNode!uint)));
    CT cube = denseCube!NodeMeasures(dimArea, dimInd);

    Data Framework

    • Library of templated structs, classes, and functions
    • ~10K LoC
    • Used for...
      • Mathematical operations
      • Dimension manipulation
      • Cube serialization
      • Data optimizations
      • Testing
      • And other functions...

    1: Data Store Basics

    class DenseDataStore(DataT, Dimensions...)
    {
        //...
        alias Address = staticMap!(GetType, Dimensions);
        //...
        immutable Address rootAddress;
        //...
        this(Dims...)(Dims dims) {  //Keep mutability of dimensions
            foreach (I, dim; dims)
                rootAddress[I] = dim.root.id;
        }
        //...
        DataT opIndex(const Address addr);
        //...
    }
    template GetType(DimT) if (isDimension!DimT)
    {
        alias GetType = DimT.CodeT;
    }
    auto cube = denseCube!double(dimArea, dimInd);
    // double node = cube["Latah", "Industrial Construction"]; // error: an Address holds codes, not labels
    double node = cube[16057, 236220];
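
A self-contained sketch of how an address type can be derived from the dimension types. The `Dimension` struct and the `describe` function are hypothetical stand-ins; only the `staticMap`/`GetType` pattern comes from the slide.

```d
import std.conv : text;
import std.meta : AliasSeq, staticMap;
import std.stdio : writeln;

// Hypothetical stand-in for the framework's dimension type.
struct Dimension(string Name, NodeT)
{
    alias CodeT = NodeT;
}

// Map a dimension type to the type of its hierarchy codes.
alias GetType(DimT) = DimT.CodeT;

// One code type per dimension, in dimension order.
alias Address = staticMap!(GetType,
                           Dimension!("Area", uint),
                           Dimension!("Industry", uint));

static assert(is(Address == AliasSeq!(uint, uint)));

// A parameter declared with a type sequence expands to one parameter
// per dimension, so callers pass the codes positionally.
string describe(Address addr)
{
    return text(addr[0], "/", addr[1]);
}

// Labels are rejected at compile time because they are not codes.
static assert(!__traits(compiles, describe("Latah", "Industrial Construction")));

void main()
{
    writeln(describe(16057, 236220));
}
```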

    2: Dense Cube Addition

    Left Cube + Right Cube = New Cube

    2: Dense Cube Addition

    auto denseAdd(LeftCube, RightCube)(const LeftCube lCube, const RightCube rCube)
        if (isCube!LeftCube && isCube!RightCube)
    {
        static assert(is(LeftCube.Dimensions == RightCube.Dimensions),
                    "The two cubes must have the same dimensions.");
        //...
        
        alias T = GetCubeScalarDataType!LeftCube;
        //...
        
        auto newCube = denseCube!(LeftCube.MeasureT)(rCube.dimensions);
        
        (cast(T[])newCube.data)[] = (cast(T[])lCube.data)[] + (cast(T[])rCube.data)[];
        
        //...
        
        return newCube;
    }
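
The cast-and-add line can be tried in isolation. The sketch below assumes the measure struct contains only `double` fields, so an array of structs can be reinterpreted as a flat `double[]`; `addDense` is a hypothetical name and operates on plain arrays rather than cubes.

```d
import std.stdio : writeln;

struct NodeMeasures
{
    double employment = 0;
    double earnings = 0;
}

// The reinterpretation below is only valid if the struct is exactly
// two doubles with no padding.
static assert(NodeMeasures.sizeof == 2 * double.sizeof);

NodeMeasures[] addDense(const NodeMeasures[] lhs, const NodeMeasures[] rhs)
{
    assert(lhs.length == rhs.length, "Both stores must have the same size.");
    auto result = new NodeMeasures[lhs.length];

    // View all three arrays as flat double arrays and let the
    // array operation add every measure in one statement.
    auto flat = cast(double[]) result;
    flat[] = (cast(const(double)[]) lhs)[] + (cast(const(double)[]) rhs)[];
    return result;
}

void main()
{
    auto a = [NodeMeasures(1, 10), NodeMeasures(2, 20)];
    auto b = [NodeMeasures(3, 30), NodeMeasures(4, 40)];
    writeln(addDense(a, b));
}
```

The single array operation replaces a per-node, per-measure loop, which is one reason a dense layout suits this kind of arithmetic.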

    3: Getting Measures

    template GetMeasures(DataT)
    {
        alias GetMeasures = Filter!(IsMeasure, MemberNames!DataT);
    }
    template IsMeasure(string Name)
    {
        enum LowerName = Name.toLower();
        static if (LowerName.endsWith("_min") ||
                    LowerName.endsWith("_max") ||
                    LowerName.endsWith("_disc") ||
                    LowerName.endsWith("_conf"))
            enum IsMeasure = false;
        else
            enum IsMeasure = true;
    }
    DataT ret;
    foreach (Measure; GetMeasures!DataT)
        mixin(`ret.` ~ Measure ~ ` = 0;`);
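
A runnable variant of the same idea, offered as a sketch. It assumes `std.traits.FieldNameTuple` in place of the slide's `MemberNames`, and the `NodeMeasures` fields and the `zeroedMeasures` helper are hypothetical.

```d
import std.algorithm.searching : endsWith;
import std.meta : Filter;
import std.stdio : writeln;
import std.traits : FieldNameTuple;
import std.uni : toLower;

// A field counts as a measure unless its name carries one of the
// bookkeeping suffixes.
template IsMeasure(string Name)
{
    enum lower = Name.toLower();
    enum IsMeasure = !(lower.endsWith("_min") || lower.endsWith("_max")
                    || lower.endsWith("_disc") || lower.endsWith("_conf"));
}

alias GetMeasures(DataT) = Filter!(IsMeasure, FieldNameTuple!DataT);

// Hypothetical data type; doubles default to NaN in D, so zeroing
// only the measures is observable.
struct NodeMeasures
{
    double employment;
    double employment_min;
    double employment_max;
    double earnings;
}

static assert([GetMeasures!NodeMeasures] == ["employment", "earnings"]);

// Set every measure to zero and leave the other fields untouched.
DataT zeroedMeasures(DataT)()
{
    DataT ret;
    foreach (name; GetMeasures!DataT)
        mixin(`ret.` ~ name ~ ` = 0;`);
    return ret;
}

void main()
{
    writeln(zeroedMeasures!NodeMeasures());
}
```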
    

    4: N-Dimensional Tree Traversal

    struct ValidSparseNode
    {
        Address addr;           //TypeTuple of hierarchy codes
        DataT data;             //User defined data type
    
        RepeatType!(ValidSparseNode*, Dims.length) parent;
        RepeatType!(ValidSparseNode*, Dims.length) child;
        RepeatType!(ValidSparseNode*, Dims.length) sibling;
    }
    template RepeatType(Type, size_t Times)
    {
        static if (Times == 0)
            alias RepeatType = TypeTuple!();
        else
            alias RepeatType = TypeTuple!(Type, RepeatType!(Type, Times - 1));
    }
    void preOrderTraversal(size_t Dim, F)
        (ValidSparseNode* node, F callback)
    {
        static if (Dim < Dimensions.length - 1)
            preOrderTraversal!(Dim + 1, F)(node, callback);
        else
            callback(node);
    
        ValidSparseNode* child = node.child[Dim];
        while (child !is null) {
           preOrderTraversal!(Dim, F)(child, callback);
           child = child.sibling[Dim];
        }
    }
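
To see the visiting order, the traversal can be exercised on a tiny two-dimensional example. This sketch replaces the `RepeatType` tuples with fixed-size pointer arrays and uses a hypothetical `Node` type with string labels; the recursion itself follows the slide.

```d
import std.stdio : writeln;

enum size_t nDims = 2; // assumption: two dimensions, e.g. Area x Industry

struct Node
{
    string label;
    Node*[nDims] child;   // first child along each dimension
    Node*[nDims] sibling; // next sibling along each dimension
}

void preOrderTraversal(size_t Dim, F)(Node* node, F callback)
{
    // Descend through the remaining dimensions first; the callback
    // fires once the last dimension is reached.
    static if (Dim < nDims - 1)
        preOrderTraversal!(Dim + 1, F)(node, callback);
    else
        callback(node);

    // Then walk the children of this node along the current dimension.
    for (Node* c = node.child[Dim]; c !is null; c = c.sibling[Dim])
        preOrderTraversal!(Dim, F)(c, callback);
}

// Build a 2 x 2 grid (areas A, A1; industries I, I1) and record the visits.
string[] visitOrder()
{
    auto root = new Node("A/I");
    auto areaChild = new Node("A1/I");
    auto industryChild = new Node("A/I1");
    auto both = new Node("A1/I1");
    root.child[0] = areaChild;
    root.child[1] = industryChild;
    areaChild.child[1] = both;

    string[] visited;
    preOrderTraversal!0(root, (Node* n) { visited ~= n.label; });
    return visited;
}

void main()
{
    writeln(visitOrder());
}
```

Each address is visited exactly once because a call at dimension `Dim` only follows links for `Dim` and higher.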

    5: Wrap Mutability

    template WrapMutability(RefT, WrapT)
    {
        static if (is(RefT == const))
            alias WrapMutability = const(WrapT);
        else static if (is(RefT == immutable))
            alias WrapMutability = immutable(WrapT);
        else
            alias WrapMutability = WrapT;
    }
    struct Cube(StorageT)
    {
        //...
        auto parentOf(string DimName, this This)(const Address address) inout
        {
            alias RetT = WrapMutability!(This, DataT);
            enum DimIndex = GetDimensionIndex!(Dimensions, DimName);
            auto addrNode = address[DimIndex] in dimensions[DimIndex];
            
            Address parentAddr = address[0..$];
            parentAddr[DimIndex] = addrNode.parent.id;
            return cast(RetT)this[parentAddr];
        }
    }
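
The behavior of `WrapMutability` can be checked at compile time. The sketch below repeats the template from the slide and adds a few `static assert`s; the `int` and `double` arguments are arbitrary placeholders.

```d
import std.stdio : writeln;

template WrapMutability(RefT, WrapT)
{
    static if (is(RefT == const))
        alias WrapMutability = const(WrapT);
    else static if (is(RefT == immutable))
        alias WrapMutability = immutable(WrapT);
    else
        alias WrapMutability = WrapT;
}

// The qualifier of the first argument is copied onto the second.
static assert(is(WrapMutability!(const(int), double) == const(double)));
static assert(is(WrapMutability!(immutable(int), double) == immutable(double)));
static assert(is(WrapMutability!(int, double) == double));

void main()
{
    writeln(WrapMutability!(const(int), double).stringof);
}
```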

    Issues with D

    • Const/Immutable Ranges
    • Required templating of constructors to get mutability
    • Backtraces
    • Can't get member names from Tuples
    • Explaining ExpressionTuples vs. TypeTuples vs. Tuples
    • GC
    • No enhanced getopt

    Issues with D

    Demangle This!
    _D4aias4cube7storage12valid_sparse261__T20ValidSparseDataStoreTdTS4aias4cube9dimension63__T9DimensionVAyaa4_46495053TS4aias4cube7testing4fips8FipsCodeZ9DimensionTS4aias4cube9dimension101__T9DimensionVAyaa13_436c6173734f66576f726b6572TS4aias4cube7storage12valid_sparse17ClassOfWorkerNodeZ9DimensionZ20ValidSparseDataStore246__T6__ctorTxS4aias4cube9dimension63__T9DimensionVAyaa4_46495053TS4aias4cube7testing4fips8FipsCodeZ9DimensionTxS4aias4cube9dimension101__T9DimensionVAyaa13_436c6173734f66576f726b6572TS4aias4cube7storage12valid_sparse17ClassOfWorkerNodeZ9DimensionZ6__ctorMFKxS4aias4cube9dimension63__T9DimensionVAyaa4_46495053TS4aias4cube7testing4fips8FipsCodeZ9DimensionKxS4aias4cube9dimension101__T9DimensionVAyaa13_436c6173734f66576f726b6572TS4aias4cube7storage12valid_sparse17ClassOfWorkerNodeZ9DimensionZC4aias4cube7storage12valid_sparse261__T20ValidSparseDataStoreTdTS4aias4cube9dimension63__T9DimensionVAyaa4_46495053TS4aias4cube7testing4fips8FipsCodeZ9DimensionTS4aias4cube9dimension101__T9DimensionVAyaa13_436c6173734f66576f726b6572TS4aias4cube7storage12valid_sparse17ClassOfWorkerNodeZ9DimensionZ20ValidSparseDataStore
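
For reference, druntime ships a demangler in `core.demangle`, and the `ddemangle` tool from the D tools repository applies it to text on standard input. The sketch below uses a short, hand-written mangled name (an assumption for illustration, not taken from the framework) to show the call.

```d
import core.demangle : demangle;
import std.stdio : writeln;

// Mangled form of a function `bar` in module `foo` that takes no
// arguments and returns void.
enum mangled = "_D3foo3barFZv";

void main()
{
    // demangle returns the input unchanged if it cannot decode it.
    writeln(demangle(mangled));
}
```

Demangling helps with reading a backtrace, but a heavily templated name like the one above stays long after demangling because every template argument is spelled out.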

    Conclusion

    • Overall, the library and its use of D have been a success
    • Significantly improved...
      • Error checking
      • Processing speed
      • Development speed




    Questions?

    dconf talk
