Daniela Engert - CppCon 2021

Modules demystified and applied

A (short) Tour of C++ Modules

     About me    

  • Electrical engineer
  • Building computers and creating software for 40 years
  • Developing hardware and software in the field of applied digital signal processing for 30 years
  • Member of the C++ committee (learning novice)

     

Overview

  • Modules Foundations
    • C++20 Modules, a short recap
    • Module unit types and Module composition
    • Visibility of Identifiers vs. Reachability of Declarations
    • Relationships, linkage and linker symbols
  • Modules in practice
    • Moving towards modules (by example)
    • Imports are different!
    • Is it worth it? (a case study)
    • The state of the ecosystem

C++20 Modules

a short recap

[Figure: traditional library. source.cpp, some header.h and library.h form a translation unit that compiles to an object file; library.h is the library interface; library.cpp and library.h form the library implementation that compiles to the library object file; everything is linked into the program. Further labels: other header.h, Barrier.]

[Figure: state of a traditional translation unit. Files: library.cpp, source.cpp, library.h, other header.h. Labels: declarations, macros, compiler options; predefined, commandline; defaults, commandline; none. Output: object file. Remaining state: discarded.]

[Figure: modularized library. source.cpp, some header.h and 'import library;' form a translation unit that compiles to an object file; the library interface unit ('export module library;', 'export ...;') compiles to a BMI and the library interface object file; the library implementation ('module library;') compiles to the library implementation object file; everything is linked into the program. Further labels: Barrier, Architected, some, all.]

[Figure: state of a module interface unit. File: interface.cpp. Labels: declarations, macros, compiler options; predefined, commandline; defaults, commandline; none. Output: object file, BMI file. Remaining state: discarded.]

// module interface unit
module;

#include <vector>

export module my.first_module;

export import other.module;
#include "mod.h"
constexpr int beast = 666;

export std::vector<int> frob(S);

// module implementation unit
module;

#include <vector>

module my.first_module;

std::vector<int> frob(S s) {
  return {s.value + beast};
}

// mod.h
#pragma once

struct S {
  int value = 1;
};

Annotations:

  • 'module;': module declaration without a name, starts the global module fragment
    • no declarations
    • only preprocessor directives
  • global module
    • default name 'domain'
  • 'export module my.first_module;': (primary) module interface unit, starts the module purview
  • the name of this module can be referred to only in
    • module declaration
    • import declaration
  • the module name is
    • not a namespace
    • separate name 'domain'
  • 'module my.first_module;': module implementation unit
  • 'export': exported; "exportedness" applies to names

Module TU Types & Features

TU types (table columns):

  • Interface unit
  • Implementation unit
  • Interface partition
  • Implementation partition
  • Private module fragment
  • Header unit

Features (table rows):

  • defines interface
  • contributes to interface
  • implicitly imports interface
  • part of module purview
  • part of global module
  • exports macros
  • creates BMI
  • contributes to BMI
  • fully isolated

Legend: ✅ unconditionally, ❎ if a GMF exists in the TU / if the TU's BMI is imported into the primary module interface

Module Composition

This zoo of module TU types allows for various module structures:

  • simple module: primary module interface unit + 1 ... n module implementation units
module; // GMF

#include <vector>

export module SimpleModule;

// non-exported declarations
struct detail {
  int answer = 42;
};

export
namespace SimpleModule {

void f(const detail &);
std::vector<detail> g();

}
module SimpleModule;

namespace SimpleModule {

void f(const detail & D) {
// do something with D
}

}
module;

#include <vector>

module SimpleModule;

namespace SimpleModule {

std::vector<detail> g() {
  return {{ 42 }, { 43 }};
}

}

formerly the interface header             formerly implementation sources for f() and g()

Module Composition

  • large module: primary module interface unit + 1 ... n module partitions
export module LargeModule;

export import : iface.f;
export import : iface.g;
module LargeModule : impl.f;
import : iface.f;

namespace LargeModule {

void f(const detail & D) {
// do something with D
}

}
module;
#include <vector>

module LargeModule : impl.g;
import : iface.g;

namespace LargeModule {

std::vector<detail> g() {
  return {{ 42 }, { 43 }};
}

}
export
module LargeModule : iface.f;
import : detail;

namespace LargeModule {

export
void f(const detail & D);

}
module;
#include <vector>

export
module LargeModule : iface.g;
import : detail;

namespace LargeModule {

export
std::vector<detail> g();

}
module LargeModule : detail;

// non-exported declarations
struct detail {
  int answer = 42;
};

Module Composition

  • large module: hierarchy of primary module interface unit + 1 ... n related modules
export module HierModule;

export import HierModule.f;
export import HierModule.g;
module HierModule.f;
import HierModule.detail;

namespace HierModule {

void f(const detail & D) {
// do something with D
}

}
module;
#include <vector>

module HierModule.g;
import HierModule.detail;

namespace HierModule {

std::vector<detail> g() {
  return {{ 42 }, { 43 }};
}

}
export
module HierModule.f;
import HierModule.detail;

namespace HierModule {

export
void f(const detail & D);

}
module;
#include <vector>

export
module HierModule.g;
import HierModule.detail;

namespace HierModule {

export
std::vector<detail> g();

}
export module HierModule.detail;

export struct detail {   // exported so that the separate modules HierModule.f and HierModule.g can name it
  int answer = 42;
};

Module Composition

  • small module: only primary module interface unit
module;

#include <vector>

export module SmallModule;

struct detail {
  int answer = 42;
};

export
namespace SmallModule {

void f(const detail & D) {
  // do something with D;
}

std::vector<detail> g() {
  return {{ 42 }, { 43 }};
}

}

Module Composition

  • single file module: only primary module interface unit with private module fragment
module;
#include <vector>

export module SingleFileModule;

struct detail {   // not exported but reachable
  int answer = 42;
};

export namespace SingleFileModule {
void f(const detail & D);
std::vector<detail> g();
}

module : private; // neither exported nor reachable!

namespace SingleFileModule {
void f(const detail & D) {
  // do something with D;
}

std::vector<detail> g() {
  return {{ 42 }, { 43 }};
}
}

Module Composition

  • multiple independent header units with common imported detail header

    all three headers are compiled as header units
// header 'detail.h'

#pragma once

struct detail {
  int answer = 42;
};
// header 'header_f.h'

#pragma once

import "detail.h";

namespace IndependentHeader {

void f(const detail & D) {
// do something with D
}

}
// header 'header_g.h'

#pragma once

#include <vector>
import "detail.h";

namespace IndependentHeader {

std::vector<detail> g() {
  return {{ 42 }, { 43 }};
}

}

Module Composition

  • single precomposed header unit
// header 'detail.h'

#pragma once

struct detail {
  int answer = 42;
};
// header 'header_f.h'

#pragma once

#include "detail.h"

namespace ComposedHeader {

void f(const detail & D) {
// do something with D
}

}
// header 'header_g.h'

#pragma once

#include <vector>
#include "detail.h"

namespace ComposedHeader {

std::vector<detail> g() {
  return {{ 42 }, { 43 }};
}

}
// header 'composed.h'

#pragma once

#include "header_f.h"
#include "header_g.h"

Visibility

hide and seek

Visibility of names

  • as soon as a named entity is declared within a given scope, it may become subject to name lookup
  • name lookup can only find names that are visible
  • the visibility of a particular named entity is not a general property but the result of
    • the point and scope of its first declaration
    • the point and scope from where it is looked-up
    • the lookup rules
      • unqualified lookup
      • qualified lookup
      • argument dependent lookup
// translation unit 1

int i;	   // point of declaration (POD), introduces entity 'i'

int j = i; // POD, introduces entity 'j'
           // point of look-up (POL), names visible entity 'i'
           
int k = l; // POD, introduces entity 'k'
           // POL, names invisible entity 'l'
           // entity 'l' is not yet declared
           // (relative invisibility)

int l;	   // point of declaration (POD), introduces entity 'l'

int m = n; // POD, introduces entity 'm'
           // POL, names invisible entity 'n'
           // entity 'n' is declared in a different TU
           // (total invisibility)
// translation unit 2

int n;	   // POD, introduces entity 'n'

Starting simple

global scope

Lookup of entities at global scope

 

relative to their point of declaration

template <typename T>
int foo(T t) {
  return t.value_;  // POL ?, names not yet visible entity 'value_'
}                   // at class scope of dependent, visible entity 't'

struct S {
  int value_ = 1;   // POD, introduces entity 'value_'
};                  // at class scope of 'S'

int x = S{}.value_; // POL, names visible entity 'value_' at
                    // class scope of visible entity 'S'

int y = foo(S{});   // POL !, names visible entity 'value_' at
                    // class scope of visible entity 'S'

Less obvious

struct S {
  int foo() {
    return value;   // POL ?, names not yet declared entity 'value'
  }                 // at class scope of visible entity 'S'

  int value = 1;    // POD, introduces entity 'value'
                    // at class scope of 'S'

};                  // POL !, names visible entity 'value'
                    // at class scope of visible entity 'S'

class scope

Lookup of entities at class scope

from outside the class

Lookup of entities at class scope

from within the class

namespace N {
int n = 1;            // POD, introduces entity 'N::n'

class S {
  int j_ = 0;
  friend int f(S s) { // POD, introduces entity 'N::f', declared as friend from within
    return s.j_;      // class scope 'N::S', → so-called "hidden friend"
  }
};

namespace M {
void n();             // POD, introduces entity 'N::M::n'

int x = n;            // POL, names entity 'n', unqualified lookup (UL)
                      // FAIL: entity 'N::M::n' hides entity 'N::n' from UL
auto y = &f;          // POL, names entity 'f', UL
                      // FAIL: entity 'N::f' is invisible to UL
auto z = &S::f;       // POL, names entity 'f' in scope 'S', class member lookup (CML)
                      // FAIL: entity 'N::f' is not member of class S
auto z = &N::f;       // POL, names entity 'f' in scope 'N', qualified lookup (QL)
                      // FAIL: entity 'N::f' was not declared in scope 'N' but scope 'S'

int a = N::n;         // POL, names entity 'n' in scope 'N', QL
int b = f(S{});       // POL, names entity 'f' using argument of type S, ADL
} // namespace M
} // namespace N

Lookup Algorithms

visibility, it depends on how you look

Entities may become hidden (i.e. invisible to lookup) by

  • names introduced in scopes nearer to the point of lookup
  • hidden friends

but become visible again by selecting the appropriate lookup algorithm
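
Both hiding mechanisms, and the lookup algorithm that undoes each of them, can be reproduced in a few lines of non-modular C++. The following is only a minimal sketch; the names 'outer', 'inner', 'Meters' and 'doubled' are made up for this illustration.

```cpp
namespace outer {

constexpr int n = 1;              // introduces 'outer::n'

struct Meters {
  int value;
  // hidden friend: a member of namespace 'outer', but only findable via ADL
  friend constexpr int doubled(Meters m) { return 2 * m.value; }
};

namespace inner {

constexpr int n = 2;              // hides 'outer::n' from unqualified lookup

// unqualified lookup stops at the nearest scope and finds 'outer::inner::n'
constexpr int unqualified() { return n; }

// qualified lookup reaches the hidden 'outer::n'
constexpr int qualified() { return outer::n; }

// argument dependent lookup finds the hidden friend through the argument type
constexpr int via_adl() { return doubled(Meters{21}); }

} // namespace inner
} // namespace outer
```

Each of the three functions names an entity that one of the other lookup algorithms would miss or resolve differently.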

auto foo(int x) {
  struct S {     // POD, introduces entity 'S' in function block scope
    int s_;
  };
  return S{x};   // POL, names entity 'S', part of the function's signature
}

auto what = foo(1);
assert(what.s_ == 1);

static_assert(std::is_same_v<decltype(what), ❓::S>); // POL, names invisible entity 'S'

Even weirder

near total invisibility

Even though name 'S' is visible at function block scope and it is the function's return type, it is totally invisible from outside the function.

 

This is the best you can achieve in terms of name hiding in a TU. Alas, it prevents 'foo' from being forward-declared in another TU despite its external linkage.
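
Although the name 'S' cannot be spelled outside of 'foo', the type itself stays usable, because 'decltype' can recover it from the function's signature. A minimal sketch, in which the alias 'Hidden' and the helper 'copy_of' are assumptions made for this example:

```cpp
#include <type_traits>

constexpr auto foo(int x) {
  struct S {       // 'S' is only nameable inside this block
    int s_;
  };
  return S{x};
}

// No qualified name reaches 'S', but the type can still be deduced.
using Hidden = decltype(foo(0));

// Once deduced, it can be used like any other type.
constexpr Hidden copy_of(Hidden h) { return h; }

static_assert(std::is_class_v<Hidden>);
static_assert(std::is_same_v<Hidden, decltype(foo(1))>);
```

The members of the type are usable as well, e.g. 'foo(7).s_', even though the name of the type never becomes visible.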

Selective Visibility

  • without modules, total invisibility of entities declared within a TU is impossible
  • moving declarations from headers into modules makes them totally invisible from the outside
  • exporting names from a module and importing them controls the extent to which names become visible in the translation unit importing the module's interface.
// client TU
import M;          // introduces entity 'foo' by BMI, exported from module M

auto y = foo(1);   // POL, names entity 'foo'
                   // the name of result type of 'foo' is totally invisible!
// module interface TU
export module M;

struct S {         // POD, introduces entity 'S', not exported
  int s_ = 1;
};

export S foo(int); // POD, introduces entity 'foo' and exports name 'foo'
                   // POL, names visible entity 'S'
// module implementation TU
module M;

S foo(int x) {     // POL, names visible entities 'S' and 'foo'
  return S{x};
}

Reachability

of declarations

export module mod;  // may become "necessarily reachable" if the interface of 'mod' is imported
import stuff;       // not exported, implementation detail, not part of module interface
                    // creates "interface dependency" to 'stuff' which is "necessarily reachable"

struct S {          // not exported, not meant to be used elsewhere outside
  S(const char * msg) : sth_{ msg } {}
  auto what() const { return sth_.message(); }
  something sth_;   // there must be 'something' exported from module 'stuff'
};

export              // exports name 'foo'
S foo(const char * msg) {
  return { msg };
}

An Example

#include <type_traits>
import mod;                          // creates "interface dependency" to 'mod' and 'stuff'
                                     // makes 'mod' "necessarily reachable"

int main(int, char * argv[]) {
  const auto result = foo(argv[0]);  // so far, so good
  
  const auto huh = result.what();    // why is entity 'what' nameable? 🤔
  using mystery  = decltype(huh);    // it was never exported from 'mod'
                                     // -> it's a reachable semantic property of 'S'!
  
  static_assert(std::is_class_v<mystery>);                    // compiles
  static_assert(sizeof(mystery::value_type) == sizeof(char)); // compiles
  return huh.empty();                                         // compiles
}
module;

#include <string>

export module stuff;

namespace detail {       // not exported
struct base {            // not exported
  base(const char * msg) : msg_{ msg } {}
  std::string message() const { return msg_; }

  const char * msg_;
};
} // namespace detail

export
struct something : detail::base {
  something(const char * msg) : detail::base{ msg } {}
};

// stuff that's not used in this particular example

namespace detail {       // not exported
struct expendable {      // totally unused and never was
  void get_rid_of_me() { /* TBD */ }
};
} // namespace detail

export
struct other_stuff : detail::base, detail::expendable {
  bool doit() const noexcept;  // implemented elsewhere
};

An Example

There is a dependency chain between TUs:

  • client TU ⇒ module 'mod'
  • module 'mod' ⇒ module 'stuff' (includes <string>)

And there's a dependency chain between multiple declarations / definitions:

  • 'main' names function 'foo' exported from module 'mod'
  • 'foo' @ 'mod' returns an 'S' @ 'mod' object that's not exported
  • 'S' @ 'mod'
    - contains an instance of 'something' from module 'stuff'
    - has a member function 'what' that returns the return value of member function 'message' in that instance
  • 'something' @ 'stuff' is derived from class 'base' @ 'stuff' that's not exported
  • 'base' @ 'stuff' has a member function 'message' that returns an instance of 'std::string'
  • 'std::string' is declared in the included header <string>

Reachability of Declarations

All declarations along such dependency chains and their semantic properties are required to give meaning to the source code.

Therefore it is necessary that the compiler can reach all these declarations (and possibly their definitions) regardless of the source / header file where they textually exist.

 

These declarations are said to be "necessarily reachable".

 

This also implies that the BMIs of imported module interfaces, imported module partitions, and imported header units containing these declarations must be (recursively) available.

Other declarations and semantic properties contained in these BMIs that are not part of dependency chains established during the compilation of the current TU might be reachable, too.

Whether such non-necessary declarations are reachable is unspecified and differs between implementations!

Semantic Properties

Depending on the entity that is declared, a declaration brings a set of semantic properties with it:

  • some are mandatory
  • some are optional
  • some are defaulted

 

If a declaration is also a definition, the set of semantic properties is augmented by e.g. a function body or a class body that may contain declarations themselves.
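
That defaulted properties are part of a declaration whether they are spelled out or not can be checked with a few type traits. This is a small sketch under the assumption of a C++17 (or later) compiler; the alias and variable names are invented for the illustration.

```cpp
#include <type_traits>

// 'noexcept(false)' is the defaulted exception specification:
// spelling it out does not change the function type.
using func_implicit = int(int);
using func_explicit = int(int) noexcept(false);
constexpr bool default_is_redundant =
    std::is_same_v<func_implicit, func_explicit>;

// 'noexcept' is an optional property that does become part of the type.
using func_noexcept = int(int) noexcept;
constexpr bool optional_changes_type =
    !std::is_same_v<func_implicit, func_noexcept>;

// "C++" language linkage is the default as well:
// both lines declare the very same function.
extern "C++" int twice(int x);
int twice(int x) { return 2 * x; }
```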

Semantic Properties

// header file "some.h"

extern "C++" {                    // default
  using func = int(int)           // mandatory
                 noexcept(false); // default
  struct D                        // mandatory
  {                               // optional or mandatory
    operator int() const { return v_; }
    int v_ = 1;
  };
  
  extern "C" {                    // optional
    namespace N {                 // mandatory
      func bar;                   // mandatory
    }
    
    [[nodiscard]]                 // optional
    extern                        // default
    inline                        // optional or mandatory
    int N::bar(int                // mandatory
                 x = D{})         // optional or mandatory
    {                             // optional or mandatory
      return x + D{2};
    }
  }
}

Semantic Properties

// the function's mangled name is 'bar', not 'N::bar(int)'!

Linkage

about relationships between TUs

Linkage

linkage determines the relationship between named entities within scopes of

  • a single translation unit
  • multiple translation units

non-modular C++ knows three kinds of linkage

  • no linkage: entities at function block scope are not related to any other entities with same name. They live in solitude within this scope
  • internal: entities at namespace or class scope that are not related to any entities with the same name in other TUs. There may be multiple of them in the program
  • external: entities that are related to entities with the same name in all other TUs. They are the same thing and there is only one incarnation  in the final program

Modules add a fourth kind of linkage

  • module linkage: effectively the same as external linkage, but confined to TUs of the same module
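
The three traditional kinds of linkage can be shown side by side in ordinary C++. The sketch below uses invented names throughout; module linkage has no counterpart outside of a module unit and is therefore not shown.

```cpp
// external linkage: one entity shared by every TU that sees this declaration
inline constexpr int shared_answer = 42;

// internal linkage: every TU that contains these lines gets its own entity
static constexpr int tu_private_a = 1;
namespace {
constexpr int tu_private_b = 2;
}

constexpr int sum_of_locals() {
  // no linkage: 'Pair' exists only inside this block and is unrelated
  // to any other 'Pair' elsewhere in the program
  struct Pair { int first; int second; };
  const Pair p{tu_private_a, tu_private_b};
  return p.first + p.second;
}
```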

name isolation

export module mod1;

int foo(); // module linkage

export namespace A { // external

int bar() { // external linkage
  return foo();
}

} // namespace A
import mod1;
import mod2;

using namespace A;

int main(){
  return bar() + baz();
}

no clash

export module mod2;

int foo(); // module linkage

namespace A { // external linkage

export int baz() { // external link.
  return foo();
}

} // namespace A

same namespace ::A

name '::foo' is attached to module 'mod1', i.e. '::foo@mod1',

exported name '::A::bar' is also attached to the module

name '::foo' is attached to module 'mod2', i.e. '::foo@mod2',

exported name '::A::baz' is also attached to the module

namespace name '::A' is attached to the global module, as it is oblivious of module boundaries

Linker Symbols

export module mod3;

int foo();         // module linkage, attached to module 'mod3'

namespace A {      // external linkage, attached to global module

export int bar() { // external linkage, attached to module 'mod3'
  return foo();
}

} // namespace A

msvc 16.11 name mangling:

  ?foo@mod3@@YAHXZ::<!mod3> (🤔)
  ?bar@A@@YAHXZ::<!mod3>

clang13 & gcc(modules) name mangling:

  _ZW4mod3E3foov
      _ZN1A3barEv

The module name will be encoded into the linker symbol if an entity has module linkage, and may be encoded if it is exported and therefore has external linkage

Ownership

export module mod3;

namespace A {

export int bar() { // external linkage, attached to module 'mod3'
  return foo();
}

} // namespace A

Strong ownership model: the linker symbols of exported names contain the name of the module they are attached to

    e.g. msvc

   ?bar@A@@YAHXZ::<!mod3>

 

Benefit:

Identical names can be exported from multiple modules and used in separate TUs without causing linker errors.

Weak ownership model: the linker symbols of exported names are oblivious of module attachment

  e.g. clang & gcc

   _ZN1A3barEv

 

Benefit:

Exported names can be moved freely between modules or from headers into modules.

But Platforms...

export module awesome.v1;

// other stuff, not exported,
// must go here because reasons

export namespace libawesome {
  // implemented elsewhere
  int deep_thought(...);
} // namespace libawesome
// Poor customer's application
#include "oem1/interface.h"
#include "oem2/interface.h"

int main(){
  return OEM1::doit() +
         OEM2::makeit();
}

compatible

export module awesome.v2;

// other stuff, not exported,
// must go here because reasons

export namespace libawesome {
  // for compatibility
  int deep_thought(...);
  // implemented elsewhere
  int much_deeper_thought(...);
} // namespace libawesome
// OEM 1, traditional header
// implementation calls 'deep_thought'

namespace OEM1 {
  int doit();
} // namespace OEM1
// OEM 2, traditional header
// impl. calls 'much_deeper_thought()'

namespace OEM2 {
  int makeit();
} // namespace OEM2

used in implementations of OEM1 and OEM2, not exposed

perfectly tested™

perfectly tested,

distributed as static libraries & header

Compiles on platform A 😁
Links on platform A 😊

Profit! 🤑

Compiles on platform B 😁
Linker error on platform B 😱

but why? 🤔(weak ownership)

Detaching Names

export module mod4;

int foo();               // module linkage, attached to module 'mod4'
extern "C" int bar();    // external linkage, attached to global module

extern "C++" int baz() { // external linkage, attached to global module
  return foo() + bar();
}

msvc 16.11 name mangling:

  ?foo@A@mod4@@YAHXZ::<!mod4>(🤔)
  _bar
  ?baz@@YAHXZ

gcc(modules) name mangling:

  _ZW4mod4E3foov
            bar
  _ZW4mod4E3bazv  (wat? 🤔)

Giving explicit language linkage specifications reattaches the names to the global module and mangles them accordingly into linker symbols

Recap

Using an interface in compiled form rather than by textual inclusion

  • isolates its meaning from the compiler state at the point of use
  • doesn't taint TUs unwittingly
  • reduces the chance of ODR violations
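
What "compiler state at the point of use" means for textual inclusion can be sketched without any modules. The header 'buffer.h', the macro 'BUFFER_SIZE' and the function 'buffer_size' are hypothetical; the header text is pasted inline to keep the example in one file.

```cpp
// Something earlier in the TU (another header, a -D option) defined this:
#define BUFFER_SIZE 16

// ---- begin textual copy of the hypothetical "buffer.h" ----
#ifndef BUFFER_SIZE
#  define BUFFER_SIZE 64          // the default the header author had in mind
#endif
constexpr int buffer_size() { return BUFFER_SIZE; }
// ---- end of "buffer.h" ----

// The meaning of the header depends on the preprocessor state at the point
// of inclusion: in this TU the function returns 16, not the default 64.
// A TU without the earlier #define would see 64, and the two TUs would
// disagree about the same function.
```

An imported module interface is compiled once in its own environment, so a macro defined by the importing TU cannot change what the interface means.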

 

Exporting names selectively makes visibility an architectural decision rather than being technically unavoidable

 

The representation of declarations in a BMI and making them reachable guarantees identical interface semantics everywhere.

 

Augmenting linker symbols with their provenance makes TUs less promiscuous.

Transitioning to Modules

The road forward

Transitioning to Modules

Options available and advice on how to proceed into the modules world:

  • If the interface of a library is (mostly) separate from the implementation
    • consider a named module by turning the interface headers into a module interface unit with optionally some interface partitions (see the "simple module" and "large module" structures under Module Composition)
    • consider refactoring the interface to make this happen
    • think about macros in the interface and how to get rid of them
  • If macros are indispensable
    • consider a named module like above plus a header file which imports the module and augments it with the macros
  • Otherwise, consider using the existing headers as header units (discouraged)
  • If a library must still be usable as a non-modular, traditional library, consider a dual-mode library which can, by default, be #included as before, or alternatively be provided through a module interface unit.

Dual-mode Library

Case study: The {fmt} library

For the most part, the code is located in 12 headers defining the API

  • core.h
  • format.h
  • compile.h
  • printf.h
    ...

Plus 2 source files that can be precompiled into a static or shared library

  • format.cc ( + format-inl.h )
  • os.cc

The {fmt} library

Question: how can this traditional library become a full-fledged module of the same name, i.e. become a dual-mode library?

 

Requirements:

  • a lot of macros are used in the implementation to support as many platforms, compilers and language standards as possible. This must still work.
  • there are even two macros as API features in the interface.
  • the unrestricted usability as a traditional library as before must be maintained.
  • all implementation details must be hidden when the "Named Module" option is selected in the configuration.

The {fmt} library

Answer: this set of requirements is unattainable!

 

Neither a header unit nor a named module has all the necessary properties:

  • header units can't hide the implementation details
  • named modules can't export macros

C++20 to the rescue: screw macros and support only the modern alternatives (i.e. UDLs) provided by the latest version of {fmt} !

The {fmt} library

Question: which implementation strategy? (refer to the structures shown under Module Composition)

 

There is a lot of coupling between most headers because of

  • the vast amount of macros used internally
  • the liberal use of implementation details from other headers

 

And this applies to the compilable sources just as well.

 

This is not bad per se if the library is seen as a whole, and it is not unusual in traditional libraries. If a clean, layered module structure is the primary goal, untangling that 'mess' becomes necessary.

The {fmt} library

Answer: restructuring a dual-mode library is probably not worth the effort as long as the details are clearly separable from the API!

 

Strategy:

  • Wrap the existing headers and source files into a single-file module
  • Mark the exported API with some opt-in syntax
  • Make everything in the source files strictly invisible and unreachable


In other words:

  • Apply preprocessor gymnastics to separate the API from details
  • Move the contents of the source files into the private module fragment

The {fmt} Module Interface Unit

module;               // start of the 'global module fragment' (GMF)

// put *all* non-library headers (STL, platform headers, etc.) here
// to prevent further inclusion into the module purview!
#include <algorithm>
#include <sys/stat.h>
// ...
                      // end of external code attached to the 'global module'
export module fmt;    // start of the 'module purview'

// #define macros to differentiate between interface and details

// put *all* library headers here to become
//  * the exported interface (API)
//  * the non-exported, but reachable implementation details
#include "format.h"
#include "chrono.h"
// ...
                      // end of declarations affecting other TUs
module : private;     // start of the 'private module fragment' (PMF)

// put *all* library sources here to become part of the compiled object file
// all required macros are available, yay!
#include "format.cc"
#include "os.cc"

The {fmt} Module Interface Unit

export module fmt;          // part of the module purview that affects other TUs

// macro definitions to differentiate between exported
// and non-exported sections in the code

#define FMT_MODULE_EXPORT export         // these expand to nothing
#define FMT_MODULE_EXPORT_BEGIN export { // if used outside of module interface
#define FMT_MODULE_EXPORT_END }          // in traditional library

// expands to 'namespace detail {' outside of module interface
#define FMT_BEGIN_DETAIL_NAMESPACE \
  }                                \
  namespace detail {
// expands to '}' outside of module interface
#define FMT_END_DETAIL_NAMESPACE \
  }                              \
  export {

#include "fmt/args.h"                    // the *full* API
#include "fmt/chrono.h"
#include "fmt/color.h"
#include "fmt/compile.h"
#include "fmt/os.h"
#include "fmt/locale.h"
#include "fmt/printf.h"
#include "fmt/xchar.h"
#include "fmt/ostream.h"
#include "fmt/ranges.h"
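
Seen from inside a header, the same idea might look like the following sketch. It is a simplified illustration with hypothetical 'MYLIB_*' macros and a made-up 'mylib' namespace, not the actual {fmt} sources: when the header is #included the traditional way, nothing has defined the macros yet and they fall back to expanding to nothing.

```cpp
// hypothetical header "mylib/answer.h"

// Traditional mode: no module interface unit has defined the export
// macros before including this header, so they expand to nothing.
#ifndef MYLIB_EXPORT_BEGIN
#  define MYLIB_EXPORT_BEGIN
#  define MYLIB_EXPORT_END
#endif

namespace mylib {

namespace detail {
// implementation detail: never exported, but reachable
constexpr int twice(int x) { return 2 * x; }
} // namespace detail

MYLIB_EXPORT_BEGIN
// public API: exported if a module interface unit defined
// MYLIB_EXPORT_BEGIN as 'export {' and MYLIB_EXPORT_END as '}'
constexpr int answer() { return detail::twice(21); }
MYLIB_EXPORT_END

} // namespace mylib
```

In traditional mode a client simply calls 'mylib::answer()' after the #include; in module mode the very same header text is included into the module purview with the macros predefined.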

The {fmt} library

This single module interface TU compiles into two artefacts:

  • the compiled interface (a.k.a. BMI)
  • the compiled implementation (a single object file)

 

This is basically a unity build of the whole library that provides the full API.

The object file may then optionally be wrapped into a static  or shared library.

The {fmt} library

Benefits of a dual-mode library:

  • usable as both a traditional library and a named module from identical sources
  • has the same properties as a named module, i.e.
    • total isolation from other sources changing the compile environment
    • (hopefully) cleaner interface free of implementation details being visible
  • enables gradual transitioning into the modules world depending on the maturity of compilers
  • doesn't require changes to existing tests
  • doesn't require re-architecting the inner dependencies

Lessons learned
The potential pitfalls

On the journey to making {fmt} a full-fledged named module, I've encountered a couple of stumbling blocks that needed to be addressed.

The properties of module interfaces require special care when implementing them. Stuff that never had to be taken into consideration with headers becomes really important now!


Most of them are due to the separation of visibility of names when

  • compiling the interface TU (unrestricted visibility)
  • compiling TUs that import the module (restricted visibility)

 

This applies to both named modules and header units!

The potential pitfalls

Instantiations of templates in user code perform name lookup of dependent entities from outside of the module. Non-exported names are invisible now and may cause compile failures.

export module M;

namespace detail {

template <typename T>
void baz(T) {}

template <typename T>
void bar(T t) {
  baz(t);                     // ok while compiling the module interface
}                             // fails to find 'baz' when 'bar' is implicitly instantiated

} // namespace detail

export template <typename T>
void foo(T t) {
  detail::bar(t);             // ok, fully qualified name lookup is done at module compilation time
}

The potential pitfalls

Two potential solutions:

export module M;

namespace detail {
template <typename T> void baz(T) {}

template <typename T>
void bar(T t) {
  detail::baz(t);       // do fully-qualified name lookup if you *really* mean
}                       // to call 'detail::baz' only (i.e. disable ADL)
} // namespace detail
// ...
export module M;

namespace detail {

void baz(int) {}

template <typename T>
void bar(T t) {
  using detail::baz;    // "symbolic link" 'detail::baz' (looked up at module compilation time)
  baz(t);               // into the function body (thereby available at template instantiation time)
}                       // if you want to make the call to 'baz' a customization point (i.e. enable ADL)

} // namespace detail
// ...
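
The difference between the two solutions shows up as soon as user code provides its own 'baz'. The following non-modular sketch puts both variants next to each other; 'bar_closed', 'bar_open', 'user' and 'Widget' are names invented for this comparison.

```cpp
namespace detail {

template <typename T>
constexpr int baz(T) { return 0; }        // the library's default

template <typename T>
constexpr int bar_closed(T t) {
  return detail::baz(t);                  // qualified call: ADL is disabled
}

template <typename T>
constexpr int bar_open(T t) {
  using detail::baz;                      // fallback, looked up at definition time
  return baz(t);                          // unqualified call: ADL may add candidates
}

} // namespace detail

namespace user {
struct Widget {};
constexpr int baz(Widget) { return 1; }   // customization provided by user code
} // namespace user
```

With these definitions, 'bar_closed' always ends up in 'detail::baz', while 'bar_open' prefers 'user::baz' for a 'Widget' argument and falls back to 'detail::baz' for types without a customization.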

The potential pitfalls

Beware of entities with internal linkage at namespace scope when importing headers as header units.
This is quite common when defining constants.

// header file 'some.h'

static const int the_answer = 42;  // internal linkage  ->  not exported from header unit

namespace {
  constexpr int the_beast = 666;   // internal linkage  ->  not exported from header unit
}
import "some.h";                   // import rather than #include!

int main() {
  return the_answer;               // name 'the_answer' is unknown because
}                                  // it wasn't eligible to be exported from 'some.h'

The potential pitfalls

Solution:

  • make them entities with external linkage
  • or wrap them into other entities
// header file 'some.h'

inline const int the_answer = 42; // define variable with external linkage

enum int_constants : int {        // enum definition has external linkage
  the_beast = 666
};

struct constants {                // struct definition has external linkage
  static constexpr int      no_answer = 0;
  static constexpr unsigned pi = 4;
};
import "some.h";                   // import rather than #include!

int main() {
  return the_answer;               // name 'the_answer' is known now
}

The potential pitfalls

Within the purview of a module, the 'inline' specifier gets its original meaning back!
Member functions defined at class scope are no longer implicitly 'inline'. You may add 'inline' as a hint if you mean it.

export module M;

struct S {
  int foo() { return bar(); }         // no longer implicitly 'inline',
                                      // the function call might be
                                      // invalid in module context!
  
  inline int bar() { return 42; }     // safe to inline
};

The potential pitfalls

Beware of entities that are local to the TU.
Do not expose them e.g. by naming them in non-TU-local inline functions!

 

Learn more at [basic.link]#14 in the standard

export module M;

static void foo();                    // TU-local

static inline void bar() { foo(); }   // ok, TU-local

inline void baz() { bar(); }          // error, 'baz()' has module linkage
                                      // must not "expose" TU-local 'bar()'!

From header to module

A reality check

Usage scenarios

  1. Use {fmt} in traditional way by #including the required {fmt} headers
  2. As before, but with a modularized standard library and #include translation for all standard library headers included by {fmt}
  3. Use existing {fmt} headers as header units and import them
  4. Use {fmt} as named module

 

Let's examine them in detail.

Usage scenarios

Test scenario

// #include <...>, import <...>, import fmt; go here
// The fictitious code requires at least the API from fmt/format.h and fmt/xchar.h

int main() {
  /* empty main to zoom in on making the API available to TU
  fictitious call:
  
  fmt::print(L"The answer is {}", 42);
  */
}

Baseline, no #include or import:

 

compile time    31 ms
 

All measurements taken on an AMD Ryzen 9 5900X, compiled with msvc 16.11.3, release mode

Usage scenarios

#include the headers

#include <fmt/format.h>
#include <fmt/xchar.h>

int main() {
  /* empty main to zoom in on making the API available to TU
  fictitious call:
  
  fmt::print(L"The answer is {}", 42);
  */
}

Two configurations

header-only:  compile time   944 ms     (baseline + 913 ms), 6896 non-blank {fmt} code lines, 59'430 lines after preprocessing

static library: compile time   562 ms     (baseline + 531 ms), 4685 non-blank {fmt} code lines, 42'735 lines after preprocessing

Usage scenarios

modularized standard library + #include translation
[cpp.include]#7

#include <fmt/format.h>
#include <fmt/xchar.h>

int main() {
  /* empty main to zoom in on making the API available to TU
  fictitious call:
  
  fmt::print(L"The answer is {}", 42);
  */
}

Two configurations

header-only:  compile time   511 ms     (baseline + 480 ms)

static library: compile time   304 ms     (baseline + 273 ms)

Total std lib BMI size 41 MB (461 MB if std lib user-compiled from original headers)

Usage scenarios

import the headers

import <fmt/format.h>;
import <fmt/xchar.h>;

int main() {
  /* empty main to zoom in on making the API available to TU
  fictitious call:
  
  fmt::print(L"The answer is {}", 42);
  */
}

Two configurations

header-only:  compile time   64 ms     (baseline + 33 ms), BMI size 22 MB

static library: compile time   64 ms     (baseline + 33 ms), BMI size 16 MB

Usage scenarios

Make the comparison fair!

#include <fmt/args.h>                    // provide the *full* API
// #include 8 more {fmt} headers here       just as the named module does!
#include <fmt/xchar.h>

int main() {
}

#include (header-only):      1599 ms     (baseline + 1568 ms), 90'431 lines preprocessed

#include (static library):     1422 ms     (baseline + 1391 ms), 88'576 lines preprocessed

Mod. STL (header-only):       658 ms     (baseline +    627 ms), 10'249 code lines

Mod. STL (static library):      430 ms     (baseline +    399 ms),    8'038 code lines

import (header-only):             160 ms     (baseline +    129 ms), BMI size 117 MB

import (static library):            155 ms     (baseline +    124 ms), BMI size    91 MB

Usage scenarios

import named module

import fmt;

int main() {
  /* empty main to zoom in on making the API available to TU
  fictitious call:
  
  fmt::print(L"The answer is {}", 42);
  */
}

Sorry, only one configuration with everything that {fmt} can provide!

Module interface unit: 10'672 non-blank code lines from {fmt}, 128'431 lines after preprocessing

 

compile time about   31 ms
There is no measurable difference to baseline!

Usage scenarios

The final comparison result

#include <fmt/*.h>                    // provide the *full* API
import <fmt/*.h>                      // provide the *full* API
import fmt;                           // provide the *full* API

int main() {
}

#include (header-only):      1599 ms     (baseline + 1568 ms), 90'431 lines preprocessed

#include (static library):     1422 ms     (baseline + 1391 ms), 88'576 lines preprocessed

Mod. STL (header-only):       658 ms     (baseline +    627 ms), 10'249 code lines

Mod. STL (static library):      430 ms     (baseline +    399 ms),    8'038 code lines

import (header-only):             160 ms     (baseline +    129 ms), BMI size  117 MB

import (static library):            155 ms     (baseline +    124 ms), BMI size     91 MB

named module:                              31 ms     (baseline +       <1 ms), BMI size        8 MB, 128'431 lines preprocessed

This is the way!
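The "baseline + n ms" figures are simply the total compile time minus the 31 ms measured for the empty translation unit. As a rough sketch of that arithmetic, using only the totals quoted in the comparison (the type and function names are made up for illustration):

```cpp
#include <cassert>

// Compile time of the empty translation unit, in milliseconds (from the slides).
constexpr int baseline_ms = 31;

struct Measurement {
  const char* scenario;
  int total_ms;
};

// Totals copied from the final comparison.
constexpr Measurement measurements[] = {
  {"#include (header-only)", 1599},
  {"#include (static library)", 1422},
  {"Mod. STL (header-only)", 658},
  {"Mod. STL (static library)", 430},
  {"import (header-only)", 160},
  {"import (static library)", 155},
  {"named module", 31},
};

// Cost of making the API available to the TU: total minus baseline.
constexpr int overhead_ms(const Measurement& m) {
  return m.total_ms - baseline_ms;
}

// Ratio of two total compile times.
constexpr double ratio(const Measurement& slow, const Measurement& fast) {
  return static_cast<double>(slow.total_ms) / fast.total_ms;
}

static_assert(overhead_ms(measurements[0]) == 1568, "matches 'baseline + 1568 ms'");
static_assert(overhead_ms(measurements[5]) == 124, "matches 'baseline + 124 ms'");

inline void check_ratios() {
  // #include (header-only) takes roughly 10x as long as import (header-only) ...
  assert(ratio(measurements[0], measurements[4]) > 9.9);
  // ... and roughly 50x as long as the named module.
  assert(ratio(measurements[0], measurements[6]) > 51.0);
}
```

With totals rounded to whole milliseconds, the computed overhead of the named module comes out as 0, which corresponds to the "<1 ms" entry above.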

#include

import

Implementation Status

bumpy roads ahead ...

Language / Library Features

            gcc (trunk)                    clang                    msvc
Syntax specification C++20: clang <= 8.0: Modules TS, >= 9.0: TS and C++20; msvc <= 19.22: Modules TS, >= 19.23: C++20
Named modules
Module partitions
Header modules 🟩 (undocumented) 🟩 (undocumented)
Private mod. fragment
Name attachment ✅ weak model ✅ weak model ✅ strong model
#include → import
__cpp_modules  ⚠  201810L ✅ 201907L
Modularized std library 🟩 (experimental)
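Since the compilers report different `__cpp_modules` values, code that must build across toolchains can test the macro before relying on modules. A minimal sketch, assuming that 201907L (the value listed in the table) is the threshold of interest; the names `advertised_modules_level`, `has_cpp20_modules` and `check_feature_test` are hypothetical:

```cpp
#include <cassert>

// Value of the feature-test macro, or 0 if the compiler does not define it.
#if defined(__cpp_modules)
constexpr long advertised_modules_level = __cpp_modules;
#else
constexpr long advertised_modules_level = 0;
#endif

// True if the compiler advertises at least the 201907L level.
constexpr bool has_cpp20_modules() {
  return advertised_modules_level >= 201907L;
}

inline void check_feature_test() {
  // The helper must agree with the raw macro value on every compiler.
  assert(has_cpp20_modules() == (advertised_modules_level >= 201907L));
}
```

A compiler that reports 201810L, or that does not define the macro at all, yields `false` here, so a build can fall back to `#include`.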

Build Systems

Build systems with support for C++ modules are rare

  • build2 (by Boris Kolpackov, build2.org)
    • supports clang, gcc, and msvc
  • Evoke (by Peter Bindels, GitHub)
    • clang only
  • MSBuild (by Microsoft, since msvc 16.8, Visual Studio)
    • msvc only
  • make
    • bring your own build rules, e.g. like Bloomberg's P2473
       
  • more ?

Resources

Contact

  • dani@ngrt.de
  • @DanielaKEngert on Twitter

Images: Bayeux Tapestry, 11th century, world heritage

source: WikiMedia Commons, public domain

Questions?

Ceterum censeo ABI esse frangendam

A (short) Tour of C++ Modules ©2021 Daniela Engert, distribution allowed under the terms of CC BY SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)