Máté Cserép
December 2022, Budapest
Multithreading is just one damn thing after, before, or simultaneous with another.
Andrei Alexandrescu
Beyond the errors which can occur in single-threaded programs,
multithreaded environments are subject to additional errors, such as data races and deadlocks.
Moreover, testing and debugging multithreaded programs is harder. Multithreaded programs are non-deterministic. Failures are often non-repeatable. Debugged code can produce very different results than non-debugged code. Testing on single-processor hardware may produce different results than testing on multiprocessor hardware.
int x, y;
// thread 1
x = 1;
y = 2;
// thread 2
std::cout << y << ", ";
std::cout << x << std::endl;
Since C++11 this is a data race and therefore undefined behaviour; C++98/03 did not define the behaviour of multithreaded programs at all.
Workaround with mutexes:
int x, y;
std::mutex x_mutex, y_mutex;
// thread 1
x_mutex.lock();
x = 1;
x_mutex.unlock();
y_mutex.lock();
y = 2;
y_mutex.unlock();
// thread 2
y_mutex.lock();
std::cout << y << ", ";
y_mutex.unlock();
x_mutex.lock();
std::cout << x << std::endl;
x_mutex.unlock();
Workaround with atomic:
std::atomic<int> x, y;
// thread 1
x.store(1);
y.store(2);
// thread 2
std::cout << y.load() << ", ";
std::cout << x.load() << std::endl;
namespace std
{
class thread
{
public:
    class id; // unique identifier of a thread
    typedef /* ... */ native_handle_type;
    thread() noexcept; // does not represent a thread
    thread(thread&& other) noexcept; // move constructor
    ~thread(); // if joinable(), calls std::terminate()
    template <typename Function, typename... Args> // copies args into storage of the new thread,
    explicit thread(Function&& f, Args&&... args); // then executes f with args
    thread(const thread&) = delete; // no copy
    thread& operator=(thread&& other) noexcept; // move
    void swap(thread& other) noexcept; // swap
    bool joinable() const noexcept; // thread object owns a physical thread
    void join(); // blocks current thread until *this finishes
    void detach(); // separates physical thread from the thread object
    id get_id() const noexcept; // compare with std::this_thread::get_id()
    static unsigned int hardware_concurrency() noexcept; // supported concurrent threads
    native_handle_type native_handle(); // e.g. thread id
};
}
void f(int i, const std::string& s);
std::thread t(f, 3, "Hello");
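Creates a new thread of execution with t, which calls f(3, "Hello"), where the arguments are copied (as is) into internal storage (even if the function takes them by reference).
If an exception occurs while the thread is being created (e.g. while copying the arguments), it is thrown in the hosting thread. An exception that escapes f itself results in a call to std::terminate().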
f can be any callable object, e.g. a class with operator():
class f
{
public:
    f(int i = 0, std::string s = "") : _i(i), _s(s) { }
    void operator()() const
    {
        // background activity
    }
    int _i;
    std::string _s;
};
std::thread t(f()); // Most vexing parse (Scott Meyers: Effective STL): declares a function named t
std::thread t((f())); // OK
std::thread t((f(3, "Hello"))); // OK
std::thread t{f()}; // OK since C++11 (uniform initialization)
void f( int i, const std::string&)
{
std::cout << "Hello Concurrent World!" << std::endl;
}
int main()
{
int i = 3;
std::string s("Hello");
// Will copy both i and s
//std::thread t(f, i, s);
// We can prevent the copy by using reference wrapper
std::thread t(f, std::ref(i), std::ref(s));
// If the thread destructor runs while the thread is still joinable,
// std::terminate() is called.
// Use join() or detach() to avoid that.
t.join();
return 0;
}
By default all arguments are copied by value, even if the function takes them as reference.
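A minimal sketch of this copy behaviour, assuming two hypothetical helpers (add_one and value_after_thread) that are not part of the original example. A reference parameter only reaches the caller's variable when the argument is wrapped in std::ref:

```cpp
#include <cassert>
#include <functional>
#include <thread>

// Hypothetical helper: increments the integer it receives by reference.
void add_one(int& n) { ++n; }

// Starts a thread on add_one and returns the caller's variable afterwards.
int value_after_thread()
{
    int counter = 0;
    // std::ref wraps counter in a std::reference_wrapper, so the thread
    // works on the caller's variable instead of an internal copy.
    // Without std::ref this line does not compile, because the internal
    // copy is passed as an rvalue and cannot bind to a non-const int&.
    std::thread t(add_one, std::ref(counter));
    t.join();  // counter must outlive the thread
    return counter;
}
```

Calling value_after_thread() should return 1, since the thread incremented the original variable and was joined before the value was read.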
struct func
{
int& i;
func(int& i_) : i (i_) { }
void operator()()
{
for(unsigned int j = 0; j < 1000000; ++j)
{
do_something(i); // i may refer to a destroyed variable
}
}
};
int main()
{
int some_local_state = 0;
func my_func(some_local_state);
std::thread my_thread(my_func);
my_thread.detach(); // don't wait for the thread to finish
return 0;
} // some_local_state is destroyed, but the thread is likely still running.
Still, it is possible to write wrong code (of course, this is C++).
It is better to avoid passing pointers and references to local data, or to join() the thread before that data goes out of scope.
class scoped_thread
{
std::thread t;
public:
explicit scoped_thread(std::thread t_): t(std::move(t_))
{
if(!t.joinable())
throw std::logic_error("No thread");
}
~scoped_thread()
{
t.join();
}
scoped_thread(scoped_thread const&)=delete;
scoped_thread& operator=(scoped_thread const&)=delete;
};
struct func;
void f()
{
int some_local_state;
scoped_thread t(std::thread(func(some_local_state)));
do_something_in_current_thread();
}
Source: C++ Concurrency In Action, by Anthony Williams
Implementation can also be found in the Boost Library: scoped_thread
And also in the standard since C++20: std::jthread
void f(int i, const std::string& s);
std::thread t(f, 3, "Hello");
"Hello" is passed to f as const char* and converted to std::string only in the new thread. This can lead to problems, e.g.:
void bad(int some_param)
{
    char buffer[1024];
    sprintf(buffer, "%i", some_param);
    std::thread t(f, 3, buffer); // only the pointer is copied; buffer may be gone before the conversion
    t.detach();
}
void good(int some_param)
{
    char buffer[1024];
    sprintf(buffer, "%i", some_param);
    std::thread t(f, 3, std::string(buffer)); // converted before the thread is started
    t.detach();
}
void do_work(unsigned id);
int main()
{
std::vector<std::thread> threads;
for(unsigned i=0;i<20;++i)
{
threads.push_back(std::thread(do_work,i));
}
std::for_each(threads.begin(), threads.end(),
std::mem_fn(&std::thread::join)); // generates functor for function
return 0;
}
std::thread is compatible with the STL containers.
Mutex:
std::mutex m;
int sh; // shared data
void f()
{
    /* ... */
    m.lock();
    // manipulate shared data:
    sh += 1;
    m.unlock();
    /* ... */
}
Recursive mutex:
std::recursive_mutex m;
int sh; // shared data
void f(int i)
{
/* ... */
m.lock();
// manipulate shared data:
sh += 1;
if (--i > 0) f(i);
m.unlock();
/* ... */
}
Timed mutex:
std::timed_mutex m;
int sh; // shared data
void f()
{
    /* ... */
    if (m.try_lock_for(std::chrono::seconds(10))) {
        // we got the mutex, manipulate shared data:
        sh += 1;
        m.unlock();
    }
    else {
        // we didn't get the mutex; do something else
    }
}
void g()
{
    /* ... */
    if (m.try_lock_until(midnight)) {
        // we got the mutex, manipulate shared data:
        sh += 1;
        m.unlock();
    }
    else {
        // we didn't get the mutex; do something else
    }
}
std::list<int> l;
std::mutex m;
void add_to_list(int value)
{
// lock acquired - with RAII style lock management
std::lock_guard<std::mutex> guard(m);
l.push_back(value);
} // lock released
Locks support the Resource Acquisition Is Initialization (RAII) idiom.
Pointers or references pointing out from the guarded area can be an issue!
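To illustrate the warning about references leaving the guarded area, here is a small sketch. The class guarded_counter and the helper count_with_threads are hypothetical names introduced only for this example. The member leak_reference() shows the problem: the lock is gone by the time the caller uses the reference.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical class for illustration: an int protected by a mutex.
class guarded_counter
{
    std::mutex m;
    int value = 0;
public:
    // Dangerous: the lock is released when the function returns, but the
    // caller keeps a reference and can modify value without any lock.
    int& leak_reference()
    {
        std::lock_guard<std::mutex> guard(m);
        return value;
    }
    // Safe: the modification happens while the lock is held.
    void increment()
    {
        std::lock_guard<std::mutex> guard(m);
        ++value;
    }
    // Safe: returns a copy made while the lock is held.
    int get()
    {
        std::lock_guard<std::mutex> guard(m);
        return value;
    }
};

// Runs several threads that use only the safe interface.
int count_with_threads(int thread_count, int increments_per_thread)
{
    guarded_counter counter;
    std::vector<std::thread> threads;
    for (int t = 0; t < thread_count; ++t)
        threads.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i)
                counter.increment();
        });
    for (auto& th : threads)
        th.join();
    return counter.get();
}
```

As long as only increment() and get() are used, no update is lost. Code that writes through leak_reference() from several threads would bypass the mutex entirely.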
Since C++14 we have synchronization objects that distinguish between readers and writers when protecting shared data from being simultaneously accessed by multiple threads. In contrast to the basic mutex and lock types, which provide only exclusive access, shared mutexes and locks have two levels of access: shared and exclusive.
If one thread has acquired the exclusive lock, no other thread can acquire the lock (not even the shared one).
If one thread has acquired the shared lock, no other thread can acquire the exclusive lock, but other threads can still acquire the shared lock.
New types are: shared_timed_mutex and shared_lock (C++14), shared_mutex (C++17)
std::shared_mutex m;
my_data d;
void reader()
{
std::shared_lock<std::shared_mutex> rl(m);
read_only(d);
}
void writer()
{
std::lock_guard<std::shared_mutex> wl(m);
write(d);
}
std::shared_mutex m;
my_data d;
void reader()
{
m.lock_shared();
read_only(d);
m.unlock_shared();
}
void writer()
{
m.lock();
write(d);
m.unlock();
}
The code below can result in a deadlock when a < b and b < a are simultaneously evaluated on 2 threads.
template <class T>
bool operator<(const T& lhs, const T& rhs)
{
    if (&lhs == &rhs)
        return false;
    lhs.m.lock(); rhs.m.lock();
    bool result = lhs.data < rhs.data;
    lhs.m.unlock(); rhs.m.unlock();
    return result;
}
Avoid deadlocks
A correct solution to avoid deadlock:
template <class T>
bool operator<(T const& lhs, T const& rhs)
{
    if (&lhs == &rhs)
        return false;
    // std::lock - locks two or more mutexes
    std::lock(lhs.m, rhs.m);
    // std::adopt_lock - assume the calling thread already has ownership
    std::lock_guard<std::mutex> lock_lhs(lhs.m, std::adopt_lock);
    std::lock_guard<std::mutex> lock_rhs(rhs.m, std::adopt_lock);
    return lhs.data < rhs.data;
}
With the lock guards, mutexes are released with RAII.
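The same std::lock plus std::adopt_lock combination can be tried out on a small, self-contained case. The sketch below is only an illustration: the account struct and the functions transfer and total_after_transfers are hypothetical. Two threads move money in opposite directions, which would risk a deadlock if each simply locked "its own" mutex first.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical type for illustration: a balance guarded by its own mutex.
struct account
{
    std::mutex m;
    int balance;
    explicit account(int b) : balance(b) { }
};

void transfer(account& from, account& to, int amount)
{
    if (&from == &to)
        return;
    // std::lock acquires both mutexes without risking a deadlock,
    // regardless of the order in which other threads ask for them.
    std::lock(from.m, to.m);
    // adopt_lock: the guards take over the already locked mutexes
    // and release them when the function returns.
    std::lock_guard<std::mutex> lock_from(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> lock_to(to.m, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

// Two threads transfer in opposite directions; the sum must not change.
int total_after_transfers(int rounds)
{
    account a(1000), b(1000);
    std::thread t1([&] { for (int i = 0; i < rounds; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < rounds; ++i) transfer(b, a, 2); });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

Every transfer keeps the sum of the two balances unchanged, so the function returns 2000 for any number of rounds, and the two threads cannot block each other forever.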
Another correct solution with a different approach:
template <class T>
bool operator<(T const& lhs, T const& rhs)
{
    if (&lhs == &rhs)
        return false;
    // std::unique_locks constructed with defer_lock can be locked
    // manually, by using lock() on the lock object ...
    std::unique_lock<std::mutex> lock_lhs(lhs.m, std::defer_lock);
    std::unique_lock<std::mutex> lock_rhs(rhs.m, std::defer_lock);
    // lock_lhs.owns_lock() now false
    // ... or passing to std::lock
    std::lock(lock_lhs, lock_rhs); // designed to avoid dead-lock
    // also there is an unlock() member function
    // lock_lhs.owns_lock() now true
    return lhs.data < rhs.data;
}
std::unique_lock can be locked and unlocked manually, in contrast to std::lock_guard.
(It is also movable but not copyable, although that is not a factor here.)
C++17 simplifies the problem with the introduction of scoped_lock, specifically designed for locking (and releasing) multiple mutexes at the same time in RAII style:
template <class T>
bool operator<(T const& lhs, T const& rhs)
{
    if (&lhs == &rhs)
        return false;
    // designed to avoid dead-lock; the lock object must be named,
    // otherwise a temporary is created and destroyed immediately
    std::scoped_lock lock(lhs.m, rhs.m);
    return lhs.data < rhs.data;
}
template <typename T>
class MySingleton
{
public:
static std::shared_ptr<T> instance()
{
if (!resource_ptr)
{
resource_ptr.reset(new T(/* ... */)); // lazy initialization
}
return resource_ptr;
}
private:
static std::shared_ptr<T> resource_ptr;
static std::mutex resource_mutex; // static members cannot be declared mutable
};
Problem: in case of multiple threads, a race condition can occur, where the resource_ptr gets initialized more than once!
template <typename T>
class MySingleton
{
public:
static std::shared_ptr<T> instance()
{
std::unique_lock<std::mutex> lock(resource_mutex);
if (!resource_ptr)
resource_ptr.reset(new T(/* ... */));
lock.unlock();
return resource_ptr;
}
private:
static std::shared_ptr<T> resource_ptr;
static std::mutex resource_mutex;
};
Problem: while the race condition affects only the initialization of the Singleton instance, the critical section is executed on every call of the instance() method. Such excessive locking may cause serious overhead, which may not be acceptable.
template <typename T>
class MySingleton
{
public:
static std::shared_ptr<T> instance()
{
if (!resource_ptr)
{
std::lock_guard<std::mutex> guard(resource_mutex);
resource_ptr.reset(new T(/* ... */));
}
return resource_ptr;
}
private:
static std::shared_ptr<T> resource_ptr;
static std::mutex resource_mutex;
};
Problem: to address the previous problem, the critical section is narrowed down to the construction of the object. This means that the check of resource_ptr is not guarded, so several threads can pass it and the object may be created multiple times.
Double-Checked Locking Pattern
template <typename T>
class MySingleton
{
public:
    static std::shared_ptr<T> instance()
    {
        if (!resource_ptr) // 1
        {
            std::unique_lock<std::mutex> lock(resource_mutex);
            if (!resource_ptr)
                resource_ptr.reset(new T(/* ... */)); // 2
            lock.unlock();
        }
        return resource_ptr;
    }
private:
    static std::shared_ptr<T> resource_ptr;
    static std::mutex resource_mutex;
};
Problem: the load in (1) and the store in (2) are not synchronized, which is a data race:
An overly aggressive compiler can optimize by caching the pointer in some way (e.g. storing it in a register) or by removing the second check of the pointer. volatile is sometimes suggested as a fix, but in C++ it does not synchronize threads; the unguarded read has to be an atomic operation.
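One way to make the double-checked pattern well defined since C++11 is to store the pointer in a std::atomic and pair an acquire load with a release store. The following is a sketch under that assumption, not the code from the slides: Widget, widget_instance and threads_agree are hypothetical names, and a raw pointer replaces the shared_ptr to keep the atomic operations simple.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

struct Widget { int value = 42; };  // hypothetical resource type

Widget* widget_instance()
{
    static std::atomic<Widget*> ptr{nullptr};
    static std::mutex m;
    // (1) acquire load: if the pointer is non-null, everything written
    // before the release store in (2) is visible to this thread.
    Widget* p = ptr.load(std::memory_order_acquire);
    if (!p)
    {
        std::lock_guard<std::mutex> guard(m);
        // Second check under the lock; the mutex already orders this load.
        p = ptr.load(std::memory_order_relaxed);
        if (!p)
        {
            p = new Widget();  // intentionally never deleted in this sketch
            // (2) release store: publishes the fully constructed object.
            ptr.store(p, std::memory_order_release);
        }
    }
    return p;
}

// Checks that all threads receive the same, non-null instance.
bool threads_agree(int thread_count)
{
    std::vector<Widget*> seen(thread_count, nullptr);
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i)
        threads.emplace_back([&seen, i] { seen[i] = widget_instance(); });
    for (auto& t : threads)
        t.join();
    for (Widget* p : seen)
        if (p == nullptr || p != seen[0])
            return false;
    return true;
}
```

The fast path costs one atomic load, and the mutex is only taken while the instance does not exist yet.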
template <typename T>
class MySingleton
{
public:
static std::shared_ptr<T> instance()
{
std::call_once(resource_init_flag, init_resource);
return resource_ptr;
}
private:
static void init_resource()
{
resource_ptr.reset(new T(/* ... */));
}
static std::shared_ptr<T> resource_ptr;
static std::once_flag resource_init_flag; // can't be moved or copied
};
std::call_once is guaranteed to execute its callable parameter exactly once, even if called from several threads.
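The guarantee can be observed with a small experiment. In this sketch the function count_initializations is a hypothetical helper: several threads race to run the same initializer through one std::once_flag, and the function reports how often the initializer actually ran.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: returns how many times the initializer ran.
int count_initializations(int thread_count)
{
    std::once_flag flag;
    int init_count = 0;  // modified only inside the call_once callable
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i)
        threads.emplace_back([&flag, &init_count] {
            // Every thread calls call_once, but only one executes the
            // lambda; the others wait until it has finished.
            std::call_once(flag, [&init_count] { ++init_count; });
        });
    for (auto& t : threads)
        t.join();
    return init_count;
}
```

Regardless of the number of threads, the returned count is 1.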
Source: Effective C++, by Scott Meyers
class MySingleton;
MySingleton& MySingletonInstance()
{
static MySingleton _instance;
return _instance;
}
C++11 guarantees that a local static variable is initialized in a thread-safe way!
Classical producer-consumer example:
std::mutex my_mutex;
std::queue<data_t> my_queue;
std::condition_variable data_cond; // condition variable
void producer() {
    while (more_data_to_produce())
    {
        const data_t data = produce_data();
        std::lock_guard<std::mutex> prod_lock(my_mutex); // guard the push
        my_queue.push(data);
        data_cond.notify_one(); // notify the waiting thread to evaluate cond.
    }
}
void consumer() {
    while (true)
    {
        std::unique_lock<std::mutex> cons_lock(my_mutex); // not lock_guard
        data_cond.wait(cons_lock, // returns if lambda returns true
            []{return !my_queue.empty();}); // else unlocks and waits
        data_t data = my_queue.front(); // lock is held here to protect pop...
        my_queue.pop();
        cons_lock.unlock(); // ... until here
        consume_data(data);
    }
}
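The consumer in the example never terminates. A self-contained variant that can be run to completion might look like the sketch below. It assumes a hypothetical done flag (protected by the same mutex) to tell the consumer that no more data will arrive, and produce_and_sum is a name made up for this illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Produces the numbers 1..n on the calling thread and sums them up
// on a consumer thread.
int produce_and_sum(int n)
{
    std::mutex m;
    std::condition_variable cv;
    std::queue<int> q;
    bool done = false;  // hypothetical shutdown flag, guarded by m
    int sum = 0;        // touched only by the consumer until join()

    std::thread consumer([&] {
        for (;;)
        {
            std::unique_lock<std::mutex> lock(m);
            // Wake up when there is data or when production has ended.
            cv.wait(lock, [&] { return !q.empty() || done; });
            if (q.empty())
                return;  // done is set and the queue is drained
            int value = q.front();
            q.pop();
            lock.unlock();  // process the item without holding the lock
            sum += value;
        }
    });

    for (int i = 1; i <= n; ++i)
    {
        {
            std::lock_guard<std::mutex> lock(m);
            q.push(i);
        }
        cv.notify_one();
    }
    {
        std::lock_guard<std::mutex> lock(m);
        done = true;
    }
    cv.notify_one();
    consumer.join();
    return sum;
}
```

The predicate passed to wait() also protects against spurious wakeups, because the consumer re-checks the queue and the flag each time it wakes up.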
int f(int);
void do_other_stuff();
int main()
{
std::future<int> the_answer = std::async(f, 1);
do_other_stuff();
std::cout<< "The answer is " << the_answer.get() << std::endl;
return 0;
}
std::async() executes the task either in a new thread or lazily, on the first call to wait() or get().
// starts in a new thread
auto fut1 = std::async(std::launch::async, f, 1);
// run in the same thread on wait() or get()
auto fut2 = std::async(std::launch::deferred, f, 2);
// same as the default: implementation chooses
auto fut3 = std::async(std::launch::deferred | std::launch::async, f, 3);
// default: implementation chooses
auto fut4 = std::async(f, 4);
If the task is deferred and neither wait() nor get() is called, then the task may not be executed at all.
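The effect of the deferred policy can be checked directly. In this sketch, deferred_runs is a hypothetical helper that counts how often a deferred task runs, depending on whether get() is called on its future.

```cpp
#include <cassert>
#include <future>

// Returns how many times the deferred task was executed.
int deferred_runs(bool call_get)
{
    int runs = 0;
    {
        std::future<int> fut =
            std::async(std::launch::deferred, [&runs] { ++runs; return 42; });
        if (call_get)
            fut.get();  // the task runs here, on the calling thread
    }  // destroying an unused deferred future does not run the task
    return runs;
}
```

Without get() the counter stays at 0; with get() the task runs exactly once.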
double square_root(double x)
{
if (x < 0)
{
throw std::out_of_range("x is negative");
}
return std::sqrt(x);
}
int main()
{
std::future<double> fut = std::async(square_root, -1);
double res = fut.get(); // fut becomes ready on exception and get() rethrows it
return 0;
}
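A runnable sketch of the same idea, with the rethrown exception caught on the calling side. The names checked_sqrt and exception_reaches_caller are hypothetical and used only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <future>
#include <stdexcept>

double checked_sqrt(double x)
{
    if (x < 0)
        throw std::out_of_range("x is negative");
    return std::sqrt(x);
}

// Returns true if the exception thrown by the task reaches the caller.
bool exception_reaches_caller()
{
    std::future<double> fut = std::async(checked_sqrt, -1.0);
    try
    {
        fut.get();  // rethrows the exception stored in the shared state
    }
    catch (const std::out_of_range&)
    {
        return true;
    }
    return false;
}
```

The exception is stored in the shared state when the task throws and surfaces only at get(), on the thread that asks for the result.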
Keep in mind that a future returned by std::async has a special shared state, which demands that future::~future blocks until the task has finished.
double long_calculation(int n)
{
    /* ... */
}
int main()
{
    std::async(std::launch::async, long_calculation, 42); // ~future blocks
    std::async(std::launch::async, long_calculation, 100); // ~future blocks
}
For real asynchronous execution you need to keep the returned future:
int main()
{
    std::future<double> fut1 = std::async(std::launch::async, long_calculation, 42);
    // no blocking
    std::future<double> fut2 = std::async(std::launch::async, long_calculation, 100);
    // no blocking
}
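The blocking destructor can be made visible without measuring time. The sketch below uses a hypothetical helper, destroyed_future_waits: a future returned by std::async with the async policy is destroyed at the end of an inner scope, and a flag set at the end of the task is checked right afterwards. A discarded temporary future behaves the same way at the end of its statement.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Returns true if the task had finished by the time the future
// was destroyed.
bool destroyed_future_waits()
{
    std::atomic<bool> finished{false};
    {
        std::future<void> fut = std::async(std::launch::async, [&finished] {
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
            finished = true;
        });
    }  // ~future blocks here until the task has completed
    return finished.load();
}
```

Because the destructor waits for the task, the flag is always set when the function returns.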
A promise is a tool for passing the return value (or exception) from a thread executing a function to the thread that consumes the result using future.
void asyncFun(std::promise<int> myPromise)
{
int result;
try
{
// calculate the result
myPromise.set_value(result);
}
catch (...)
{
myPromise.set_exception(std::current_exception());
}
}
int main()
{
std::promise<int> intPromise;
std::future<int> intFuture = intPromise.get_future();
std::thread t(asyncFun, std::move(intPromise));
// do other stuff here, while asyncFun is working
int result = intFuture.get(); // may throw exception
t.join(); // a joinable thread must be joined or detached before it is destroyed
return 0;
}
double square_root(double x)
{
if ( x < 0 )
{
throw std::out_of_range("x<0");
}
return std::sqrt(x);
}
int main()
{
double x = 4.0;
std::packaged_task<double(double)> tsk(square_root);
std::future<double> fut = tsk.get_future(); // future will be ready when task completes
std::thread t(std::move(tsk), x); // make sure the task starts immediately
// on a different thread
double res = fut.get(); // using the future
t.join(); // the thread must be joined or detached before it is destroyed
return 0;
}
std::packaged_task is a higher-level tool than promises.
template <typename> class my_task;
template <typename R, typename ...Args>
class my_task<R(Args...)>
{
std::function<R(Args...)> fn;
std::promise<R> pr;
public:
template <typename ...Ts>
explicit my_task(Ts&&... ts) : fn(std::forward<Ts>(ts)...) { }
template <typename ...Ts>
void operator()(Ts&&... ts)
{
pr.set_value(fn(std::forward<Ts>(ts)...));
}
std::future<R> get_future() { return pr.get_future(); }
// disable copy, default move
};
In the end, a std::packaged_task is just a lower-level feature for implementing std::async (which is why it can do more than std::async when used together with other lower-level tools, like std::thread).
Simply put, a std::packaged_task is a std::function linked to a std::future, and std::async wraps and calls a std::packaged_task (possibly in a different thread).
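One thing std::async cannot express directly is running several prepared tasks on a single, explicitly managed thread. A possible sketch with std::packaged_task follows; the function run_on_one_worker and the squaring task are hypothetical and serve only as an illustration.

```cpp
#include <cassert>
#include <future>
#include <thread>
#include <vector>

// Squares every input on one worker thread and collects the results
// through the futures of the packaged tasks.
std::vector<int> run_on_one_worker(const std::vector<int>& inputs)
{
    std::vector<std::packaged_task<int()>> tasks;
    std::vector<std::future<int>> futures;
    for (int v : inputs)
    {
        tasks.emplace_back([v] { return v * v; });
        futures.push_back(tasks.back().get_future());
    }
    // A single thread executes the tasks one after another; each call
    // makes the corresponding future ready.
    std::thread worker([&tasks] {
        for (auto& task : tasks)
            task();
    });
    std::vector<int> results;
    for (auto& fut : futures)
        results.push_back(fut.get());  // waits for the matching task
    worker.join();
    return results;
}
```

The caller decides where and when each task runs, while the futures still deliver the results (or exceptions) in the usual way.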
C++17 brings us parallel algorithms, so the well-known STL algorithms (std::find_if, std::for_each, std::sort, etc.) gain support for parallel (or vectorized) execution.
#include <execution> // execution policies live in namespace std::execution
std::vector<int> v = { /* ... */ };
// standard sequential sort
std::sort(v.begin(), v.end());
// sequential execution
std::sort(std::execution::seq, v.begin(), v.end());
// permitting parallel execution
std::sort(std::execution::par, v.begin(), v.end());
// permitting vectorized execution (only since C++20)
std::sort(std::execution::unseq, v.begin(), v.end());
// permitting parallel and vectorized execution
std::sort(std::execution::par_unseq, v.begin(), v.end());
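What is vectorized (or unsequenced) execution? The operations of the algorithm may be interleaved within a single thread, e.g. by using SIMD instructions that process several elements at once.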
Parallel STL is only implemented in modern compilers, so keep in mind where you can use this new feature.
Support is not necessarily complete immediately, e.g. MSVC only implements the vectorized execution policy since MSVC 19.28 (VS 2019 16.8+)
Follow current state at:
https://en.cppreference.com/w/cpp/compiler_support
Look for "Parallel algorithms and execution policies".