Templates in the Wild
A Postmortem
Jonathan Crapuchettes
Senior Infrastructure Developer
Who am I?
- Produced economic models for US, UK, and Canada
- Helped push D into use at EMSI
- Been developing with D since version 1
What is EMSI?
- Software and data company
- Specializes in labor market data and economics
- Started in Moscow, ID around 2000
- Recently purchased by CareerBuilder
- ~100 employees in the US and UK
- Major clients include education and workforce professionals
Labor Market Data
Labor market data forms the basis for understanding the connection between people, economies, and work.
EMSI Data
Jobs, earnings, sales, value added, and population for...
- ~1,100 industries
- ~750 occupations
- 16 demographic groups
- ~3,100 counties
- ~40,000 zip codes
- 51 states
- Years from 2001 to 2024
And that is just for the US!
History of EMSI's Data Processing
Why D?
Three main goals for the data framework
- Improve developer productivity
- Improve data quality guarantees
- Improve processing efficiency
Why D?
- Accessible to developers with a wide range of skills
- Feature set (e.g., inline assembly, associative arrays)
- Powerful templating capabilities
- Optimized for specific use cases
- Easy C function calling
- Fast compile times
- Built-in unit testing
Data Processing Structures
- Hierarchies
- Dimensions
- Stores
- Cubes
Data Processing Structures
Hierarchies
Root
|
+- Child0
|  |
|  +- Child0.0
|  +- Child0.1
|  +- Child0.2
+- Child1
|  |
|  +- Child1.0
|     |
|     +- Child1.0.0
+- Child2
A hierarchy is a connected, directed, acyclic graph in which every node except the root has exactly one parent (a rooted tree).
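The following is a minimal D sketch of such a hierarchy. It assumes each node keeps its code, a single parent pointer, and a list of children. The class name `HNode` and the methods `addChild` and `depth` are hypothetical and are not taken from EMSI's library. The single `parent` field is what enforces the one-parent rule.

```d
import std.stdio : writeln;

/// One node of a hierarchy. CodeT is the node's code type (for example a
/// county or industry code). Hypothetical layout; the library's may differ.
class HNode(CodeT)
{
    CodeT code;
    HNode parent;      // null only for the root
    HNode[] children;

    this(CodeT code) { this.code = code; }

    /// Attach a new child and return it. Every node gets exactly one
    /// parent, which is what keeps the graph a tree.
    HNode addChild(CodeT childCode)
    {
        auto c = new HNode(childCode);
        c.parent = this;
        children ~= c;
        return c;
    }

    /// Number of edges between this node and the root.
    size_t depth() const
    {
        return parent is null ? 0 : 1 + parent.depth();
    }
}

void main()
{
    // The tree drawn above
    auto root = new HNode!string("Root");
    auto child0 = root.addChild("Child0");
    foreach (code; ["Child0.0", "Child0.1", "Child0.2"])
        child0.addChild(code);
    auto child1 = root.addChild("Child1");
    auto child100 = child1.addChild("Child1.0").addChild("Child1.0.0");
    root.addChild("Child2");

    assert(root.children.length == 3);
    assert(child100.parent.parent is child1);
    assert(child100.depth == 3);
    writeln(child100.code, " is at depth ", child100.depth);
}
```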
Data Processing Structures
Dimensions
- Abstract container for hierarchies
- Have templated names that become part of the type
- Used for filtering, grouping, and labeling
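A small sketch shows how the names become part of the type. `BasicNode` and the members of `Dimension` (`name`, `CodeT`, `root`) are assumptions made for illustration. Only the template signature `Dimension(string Name, NodeT)` comes from the slides.

```d
/// Hypothetical stand-ins for the library's node and dimension types.
struct BasicNode(CodeT) { CodeT id; }

struct Dimension(string Name, NodeT)
{
    enum name = Name;                      // compile-time constant, part of the type
    alias CodeT = typeof(NodeT.init.id);
    NodeT root;
}

alias AreaDim     = Dimension!("Area", BasicNode!uint);
alias IndustryDim = Dimension!("Industry", BasicNode!uint);

// Same node type, different names: two distinct types.
static assert(!is(AreaDim == IndustryDim));
static assert(is(AreaDim.CodeT == IndustryDim.CodeT));

/// Accepts only the Area dimension.
uint rootCode(AreaDim dim) { return dim.root.id; }

void main()
{
    auto area = AreaDim(BasicNode!uint(16057));
    assert(rootCode(area) == 16057);
    assert(AreaDim.name == "Area");

    // Passing the Industry dimension where Area is expected does not compile.
    static assert(!__traits(compiles, rootCode(IndustryDim.init)));
}
```

Because the name is a template argument, mixing up the Area and Industry dimensions is a compile-time error rather than a silently wrong result.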
Data Processing Structures
Cubes and Stores
- Cubes contain stores and offer common functionality
- Stores
  - Combine dimensions to create address spaces
  - Store measures
Data Processing Structures
Templated Structures
import std.meta : allSatisfy;
// isDimension, BasicNode, and denseCube come from the library (not shown)

struct Hierarchy(CodeT) {}
struct Dimension(string Name, NodeT) {}
class DataStore(DataT, Dimensions...)
    if (Dimensions.length > 0 && allSatisfy!(isDimension, Dimensions)) {}
struct Cube(StorageT) {}

struct NodeMeasures
{
    double employment = 0;
    double earnings = 0;
}

alias CT = Cube!(DataStore!(NodeMeasures,
                            Dimension!("Area", BasicNode!uint),
                            Dimension!("Industry", BasicNode!uint)));
CT cube = denseCube!NodeMeasures(dimArea, dimInd);
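The `DataStore` constraint above relies on an `isDimension` trait that the slides do not show. The self-contained sketch below assumes that `isDimension` simply tests whether a type is some instantiation of `Dimension`. The library's real trait may check more, and `BasicNode` and the `data` field are illustrative.

```d
import std.meta : allSatisfy;

/// Hypothetical minimal versions of the library's types.
struct BasicNode(CodeT) { CodeT id; }
struct Dimension(string Name, NodeT) { NodeT root; }

/// True when T is some instantiation of Dimension (a guess at isDimension).
enum isDimension(T) = is(T == Dimension!(Name, NodeT), string Name, NodeT);

class DataStore(DataT, Dimensions...)
    if (Dimensions.length > 0 && allSatisfy!(isDimension, Dimensions))
{
    DataT[] data;
}

struct NodeMeasures
{
    double employment = 0;
    double earnings = 0;
}

alias AreaDim = Dimension!("Area", BasicNode!uint);
alias IndDim  = Dimension!("Industry", BasicNode!uint);

void main()
{
    auto store = new DataStore!(NodeMeasures, AreaDim, IndDim);
    assert(store.data.length == 0);

    // The constraint rejects stores with no dimensions or with non-dimension arguments.
    static assert(!__traits(compiles, DataStore!NodeMeasures));
    static assert(!__traits(compiles, DataStore!(NodeMeasures, AreaDim, int)));
}
```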
Data Framework
- Library of templated structs, classes, and functions
- ~10K LoC
- Used for...
  - Mathematical operations
  - Dimension manipulation
  - Cube serialization
  - Data optimizations
  - Testing
  - And other functions...
1: Data Store Basics
import std.meta : staticMap;

class DenseDataStore(DataT, Dimensions...)
{
    //...
    // One code type per dimension, e.g. (uint, uint) for Area x Industry
    alias Address = staticMap!(GetType, Dimensions);
    //...
    immutable Address rootAddress;
    //...
    this(Dims...)(Dims dims) { // Keep mutability of dimensions
        foreach (I, dim; dims)
            rootAddress[I] = dim.root.id;
    }
    //...
    DataT opIndex(const Address addr);
    //...
}

template GetType(DimT) if (isDimension!DimT)
{
    alias GetType = DimT.CodeT;
}

auto cube = denseCube!double(dimArea, dimInd);
// double node = cube["Latah", "Industrial Construction"]; // compile error: Address holds codes, not names
double node = cube[16057, 236220]; // OK: area code, industry code
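One way `opIndex` could turn an address tuple into a slot in a dense store is row-major flattening. This is a sketch under stated assumptions, not the library's implementation. The `Dim` type with its `position` map, the `DenseStore` class, and the codes in `main` are hypothetical. Only the idea of building `Address` with `staticMap!(GetType, Dimensions)` comes from the slide.

```d
import std.meta : staticMap;

/// Hypothetical dimension: its codes, with each code's position along the axis.
struct Dim(string Name, CodeType)
{
    alias CodeT = CodeType;
    size_t[CodeT] position;   // code -> 0-based slot along this dimension

    this(CodeT[] codes)
    {
        foreach (i, c; codes)
            position[c] = i;
    }
}

alias GetType(DimT) = DimT.CodeT;

/// Minimal dense store: one DataT per point of the address space.
class DenseStore(DataT, Dimensions...)
{
    alias Address = staticMap!(GetType, Dimensions);   // e.g. (uint, uint)
    Dimensions dims;
    DataT[] data;

    this(Dimensions ds)
    {
        size_t total = 1;
        foreach (i, d; ds)
        {
            dims[i] = d;
            total *= d.position.length;   // size of the Cartesian product
        }
        data = new DataT[total];
    }

    /// Row-major offset: the last dimension varies fastest.
    private size_t offset(const Address addr) const
    {
        size_t off = 0;
        foreach (i, ref d; dims)
            off = off * d.position.length + d.position[addr[i]];
        return off;
    }

    ref DataT opIndex(const Address addr) { return data[offset(addr)]; }
}

void main()
{
    alias AreaDim = Dim!("Area", uint);
    alias IndDim  = Dim!("Industry", uint);

    // Illustrative codes only: two areas by three industries
    auto store = new DenseStore!(double, AreaDim, IndDim)(
        AreaDim([16057u, 16001u]), IndDim([236220u, 236210u, 238110u]));
    assert(store.data.length == 6);

    store[16057, 236220] = 42.0;
    store[16001, 238110] = 1.5;
    assert(store.data[0] == 42.0 && store.data[5] == 1.5);

    // Names are not codes, so this is rejected at compile time.
    static assert(!__traits(compiles, store["Latah", "Industrial Construction"]));
}
```

The dense store allocates the full Cartesian product of its dimensions up front, so memory grows with the product of the dimension sizes.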
2: Dense Cube Addition
[Figure: Left Cube + Right Cube = New Cube]
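auto denseAdd(LeftCube, RightCube)(const LeftCube lCube, const RightCube rCube)
    if (isCube!LeftCube && isCube!RightCube)
{
    static assert(is(LeftCube.Dimensions == RightCube.Dimensions),
        "The two cubes must have the same dimensions.");
    //...
    // Scalar type inside the measure struct, e.g. double for NodeMeasures
    alias T = GetCubeScalarDataType!LeftCube;
    //...
    auto newCube = denseCube!(LeftCube.MeasureT)(rCube.dimensions);
    // View each cube's measures as one flat T[] and add them with a single array operation;
    // the inputs are cast to const(T)[] so they stay read-only.
    (cast(T[]) newCube.data)[] = (cast(const(T)[]) lCube.data)[] + (cast(const(T)[]) rCube.data)[];
    //...
    return newCube;
}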
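The key trick in `denseAdd` is viewing an array of measure structs as a flat array of scalars, so that a single array operation adds whole cubes. The sketch below works on plain arrays and assumes a measure struct made only of `double`s. `denseAddData` is a hypothetical helper standing in for the cube-level function.

```d
struct NodeMeasures
{
    double employment = 0;
    double earnings = 0;
}

/// Element-wise sum of two dense measure arrays (hypothetical helper).
NodeMeasures[] denseAddData(const(NodeMeasures)[] lhs, const(NodeMeasures)[] rhs)
{
    // The reinterpretation below is only valid if the struct is nothing but doubles.
    static assert(NodeMeasures.sizeof == 2 * double.sizeof);
    assert(lhs.length == rhs.length, "The two cubes must have the same shape.");

    auto result = new NodeMeasures[lhs.length];
    // Casting a struct array to a scalar array rescales its length, so each
    // cube becomes one flat double[] that an array operation can add.
    (cast(double[]) result)[] = (cast(const(double)[]) lhs)[] + (cast(const(double)[]) rhs)[];
    return result;
}

void main()
{
    auto a = [NodeMeasures(10, 100), NodeMeasures(20, 200)];
    auto b = [NodeMeasures(1, 2), NodeMeasures(3, 4)];
    auto sum = denseAddData(a, b);
    assert(sum[0] == NodeMeasures(11, 102));
    assert(sum[1] == NodeMeasures(23, 204));
}
```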
3: Getting Measures
import std.algorithm.searching : endsWith;
import std.meta : Filter;
import std.uni : toLower;

template GetMeasures(DataT)
{
    alias GetMeasures = Filter!(IsMeasure, MemberNames!DataT);
}

template IsMeasure(string Name)
{
    enum LowerName = Name.toLower();
    static if (LowerName.endsWith("_min") ||
               LowerName.endsWith("_max") ||
               LowerName.endsWith("_disc") ||
               LowerName.endsWith("_conf"))
        enum IsMeasure = false;
    else
        enum IsMeasure = true;
}

// Zero every measure; the mixin expands to e.g. `ret.employment = 0;`
DataT ret;
foreach (Measure; GetMeasures!DataT)
    mixin(`ret.` ~ Measure ~ ` = 0;`);
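Here is a runnable sketch of the same filter. It assumes Phobos' `std.traits.FieldNameTuple` in place of the library's `MemberNames`, whose definition is not shown. It also writes the zeroing loop with `static foreach` and `__traits(getMember, ...)` instead of a string mixin. The suffix rule is the one from `IsMeasure`, and the `_min`, `_max`, and `_conf` fields are made up for the example.

```d
import std.algorithm.searching : endsWith;
import std.meta : Filter;
import std.traits : FieldNameTuple;
import std.uni : toLower;

/// A field is a measure unless its name marks it as a bound, discrepancy,
/// or confidence value (endsWith returns 0 when no suffix matches).
enum IsMeasure(string Name) = !Name.toLower.endsWith("_min", "_max", "_disc", "_conf");

/// FieldNameTuple lists only data fields, so methods are never picked up.
alias GetMeasures(DataT) = Filter!(IsMeasure, FieldNameTuple!DataT);

struct NodeMeasures
{
    double employment = 0;
    double employment_min = 0;
    double employment_max = 0;
    double earnings = 0;
    double earnings_conf = 0;
}

/// Zero the measures of a record and leave bounds and confidence values alone.
DataT zeroMeasures(DataT)(DataT value)
{
    static foreach (Measure; GetMeasures!DataT)
        __traits(getMember, value, Measure) = 0;
    return value;
}

void main()
{
    NodeMeasures m = { employment: 5, employment_min: 4, employment_max: 6,
                       earnings: 7, earnings_conf: 0.9 };
    auto z = zeroMeasures(m);
    assert(z.employment == 0 && z.earnings == 0);
    assert(z.employment_min == 4 && z.employment_max == 6 && z.earnings_conf == 0.9);
}
```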
4: N-Dimensional Tree Traversal
struct ValidSparseNode
{
    Address addr;  // TypeTuple of hierarchy codes
    DataT data;    // User-defined data type
    // One parent, child, and sibling pointer per dimension
    RepeatType!(ValidSparseNode*, Dimensions.length) parent;
    RepeatType!(ValidSparseNode*, Dimensions.length) child;
    RepeatType!(ValidSparseNode*, Dimensions.length) sibling;
}

template RepeatType(Type, size_t Times)
{
    static if (Times == 0)
        alias RepeatType = TypeTuple!();
    else
        alias RepeatType = TypeTuple!(Type, RepeatType!(Type, Times - 1));
}

void preOrderTraversal(size_t Dim, F)
    (ValidSparseNode* node, F callback)
{
    // Visit this node along the later dimensions first...
    static if (Dim < Dimensions.length - 1)
        preOrderTraversal!(Dim + 1, F)(node, callback);
    else
        callback(node);
    // ...then recurse into its children along dimension Dim
    ValidSparseNode* child = node.child[Dim];
    while (child !is null) {
        preOrderTraversal!(Dim, F)(child, callback);
        child = child.sibling[Dim];
    }
}
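The self-contained two-dimensional version of `preOrderTraversal` below shows why every node is visited exactly once. It assumes static arrays of pointers (`Node*[NumDims]`) instead of the `RepeatType` tuples, and the four-node address space is hypothetical. The traversal only ever follows links in non-decreasing dimension order. A node with a parent in both dimensions is therefore reached by a single path and reported once.

```d
import std.stdio : writeln;

enum NumDims = 2;   // hypothetical: a two-dimensional sparse store

/// Each node is linked into one tree per dimension.
struct Node
{
    string[NumDims] addr;      // one hierarchy code per dimension
    Node*[NumDims] child;      // first child along each dimension
    Node*[NumDims] sibling;    // next sibling along each dimension
}

/// Same shape as preOrderTraversal: later dimensions first, then this
/// dimension's children.
void preOrder(size_t Dim, F)(Node* node, F callback)
{
    static if (Dim < NumDims - 1)
        preOrder!(Dim + 1, F)(node, callback);
    else
        callback(node);

    for (Node* c = node.child[Dim]; c !is null; c = c.sibling[Dim])
        preOrder!(Dim, F)(c, callback);
}

/// Builds a 2 x 2 address space (Root/Child per dimension) and records visit order.
string[] visitOrder()
{
    auto rr = new Node(["Root", "Root"]);
    auto cr = new Node(["Child", "Root"]);
    auto rc = new Node(["Root", "Child"]);
    auto cc = new Node(["Child", "Child"]);
    rr.child = [cr, rc];   // cr is below rr in dim 0, rc is below rr in dim 1
    cr.child[1] = cc;      // cc is below cr in dim 1...
    rc.child[0] = cc;      // ...and below rc in dim 0

    string[] seen;
    preOrder!0(rr, (Node* n) { seen ~= n.addr[0] ~ "/" ~ n.addr[1]; });
    return seen;
}

void main()
{
    auto order = visitOrder();
    // cc has a parent in each dimension but appears only once.
    assert(order == ["Root/Root", "Root/Child", "Child/Root", "Child/Child"]);
    writeln(order);
}
```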
5: Wrap Mutability
template WrapMutability(RefT, WrapT)
{
    static if (is(RefT == const))
        alias WrapMutability = const(WrapT);
    else static if (is(RefT == immutable))
        alias WrapMutability = immutable(WrapT);
    else
        alias WrapMutability = WrapT;
}

struct Cube(StorageT)
{
    //...
    auto parentOf(string DimName, this This)(const Address address) inout
    {
        alias RetT = WrapMutability!(This, DataT);
        enum DimIndex = GetDimensionIndex!(Dimensions, DimName);
        auto addrNode = address[DimIndex] in dimensions[DimIndex];
        Address parentAddr = address[0..$];
        parentAddr[DimIndex] = addrNode.parent.id;
        return cast(RetT)this[parentAddr];
    }
}
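The following self-contained check shows how `WrapMutability` and a `this This` template parameter work together. `MiniCube` and its `first` accessor are hypothetical. Unlike `parentOf`, the accessor is marked `const` rather than `inout`, and its return type is written out instead of `auto`. Phobos' `std.traits` offers a similar `CopyConstness` template.

```d
template WrapMutability(RefT, WrapT)
{
    static if (is(RefT == const))
        alias WrapMutability = const(WrapT);
    else static if (is(RefT == immutable))
        alias WrapMutability = immutable(WrapT);
    else
        alias WrapMutability = WrapT;
}

struct Measures { double employment = 0; }

/// Hypothetical miniature cube with a single accessor.
struct MiniCube
{
    Measures[] data;

    /// `This` is deduced from the object at the call site, so the returned
    /// value carries the same qualifier as the cube it came from.
    WrapMutability!(This, Measures) first(this This)() const
    {
        return cast(typeof(return)) data[0];
    }
}

void main()
{
    auto m = MiniCube([Measures(1)]);
    const c = MiniCube([Measures(2)]);
    static assert(is(typeof(m.first()) == Measures));
    static assert(is(typeof(c.first()) == const(Measures)));
    assert(m.first().employment == 1 && c.first().employment == 2);
}
```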
Issues with D
- Const/immutable ranges (a const range cannot be advanced)
- Constructors had to be templated to preserve the mutability of their arguments
- Backtraces
- Can't get member names from Tuples
- Explaining ExpressionTuples vs. TypeTuples vs. Tuples
- GC
- No enhanced getopt
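This sketch assumes the first issue refers to the usual problem that a `const` range cannot be advanced, because its `popFront` mutates it. It shows the failure and two common workarounds using only Phobos.

```d
import std.algorithm.iteration : sum;
import std.range : iota, isInputRange;
import std.traits : Unqual;

void main()
{
    const r = iota(3);   // a const range: popFront cannot mutate it
    static assert(!isInputRange!(typeof(r)));

    // Workaround 1: a range without indirections can be copied into a mutable value.
    Unqual!(typeof(r)) copy = r;
    assert(copy.sum == 3);   // 0 + 1 + 2

    // Workaround 2: slicing a const array yields a tail-const slice, which is a range.
    const int[] arr = [1, 2, 3];
    static assert(!isInputRange!(typeof(arr)));
    static assert(isInputRange!(typeof(arr[])));
    assert(arr[].sum == 6);
}
```

The copy only works because `iota`'s range holds no indirections. A const range that wraps references cannot be copied into a mutable one this way.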
Issues with D
Demangle This!
_D4aias4cube7storage12valid_sparse261__T20ValidSparseDataStoreTdTS4aias4cube9dimension63__T9DimensionVAyaa4_46495053TS4aias4cube7testing4fips8FipsCodeZ9DimensionTS4aias4cube9dimension101__T9DimensionVAyaa13_436c6173734f66576f726b6572TS4aias4cube7storage12valid_sparse17ClassOfWorkerNodeZ9DimensionZ20ValidSparseDataStore246__T6__ctorTxS4aias4cube9dimension63__T9DimensionVAyaa4_46495053TS4aias4cube7testing4fips8FipsCodeZ9DimensionTxS4aias4cube9dimension101__T9DimensionVAyaa13_436c6173734f66576f726b6572TS4aias4cube7storage12valid_sparse17ClassOfWorkerNodeZ9DimensionZ6__ctorMFKxS4aias4cube9dimension63__T9DimensionVAyaa4_46495053TS4aias4cube7testing4fips8FipsCodeZ9DimensionKxS4aias4cube9dimension101__T9DimensionVAyaa13_436c6173734f66576f726b6572TS4aias4cube7storage12valid_sparse17ClassOfWorkerNodeZ9DimensionZC4aias4cube7storage12valid_sparse261__T20ValidSparseDataStoreTdTS4aias4cube9dimension63__T9DimensionVAyaa4_46495053TS4aias4cube7testing4fips8FipsCodeZ9DimensionTS4aias4cube9dimension101__T9DimensionVAyaa13_436c6173734f66576f726b6572TS4aias4cube7storage12valid_sparse17ClassOfWorkerNodeZ9DimensionZ20ValidSparseDataStore
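Symbols like the one above can be read with `core.demangle.demangle` from druntime, or by piping a backtrace through the `ddemangle` tool from the D tools repository. The sketch below uses a hypothetical templated function to show that every template argument, including the string name, ends up in the symbol.

```d
import core.demangle : demangle;
import std.algorithm.searching : canFind;
import std.stdio : writeln;

/// Hypothetical templated type and function, shaped like the library's Dimension.
struct Dimension(string Name, CodeT) { CodeT root; }

size_t rootOf(string Name, CodeT)(Dimension!(Name, CodeT) dim)
{
    return dim.root;
}

void main()
{
    // The template arguments, including the string "Area", are encoded in the symbol.
    string mangled = rootOf!("Area", uint).mangleof;
    const(char)[] readable = demangle(mangled);

    assert(readable.canFind("rootOf"));
    writeln(mangled);
    writeln(readable);
}
```

Recent DMD releases also shorten such symbols by using back-references for repeated components.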
Conclusion
- Overall, the library and its use of D have been a success
- Significantly improved...
  - Error checking
  - Processing speed
  - Development speed
Questions?