Networked Games

Problem Statement


We'd like to have more than one computer,
connected together via some kind of network,
and use this to provide a shared, interactive,
real-time simulation...

...to an arbitrary set of users, who are
possibly dispersed world-wide.

Easy Peasy, Lemon Squeezy?




NO! It's Difficult Difficult, Lemon Difficult.

Why So Difficult?


  • Networks are intrinsically unreliable.
    (i.e. your messages may not arrive)
  • They have limited capacity.
    (i.e. yr msgs wll hv 2b cmprsd)
  • They are slow.
    (...+)
  • They are hard to test.
    (i.e. you never step into the same river twice)

*...or even out of order!
+...i.e. sometimes the messages arrive quite late*.

Why Are Networks Unreliable?



So, in short, "lots of reasons".

Protocols


Networks are so complicated, unreliable and diverse that, nowadays at least, we rarely program to them "directly".

Instead, we access them via system-level libraries which implement various somewhat abstract "protocols"
that are intended to make the whole process
slightly less awful and maddening.

...they partially succeed.

Which is to say, it's even worse without them!

The UDP Protocol


UDP (User Datagram Protocol) is part of the
Internet Protocol Suite (aka TCP/IP), and it
provides a relatively "raw" form of access
between an Application and The Internet.

As such, it exposes the App to all the craziness
of the underlying network, and requires the App
to be written to handle unreliability/congestion/etc.

On the plus side, it is simple, low-overhead and "fast".
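
For a sense of how little UDP gives you, here's a minimal sketch (my illustration, not part of the original slides) using Python's standard socket module; the port number and payload are arbitrary. One datagram goes out, and one may (or may not) come back:

```python
# Minimal UDP sketch: no connection, no delivery guarantee, no ordering.
# Each sendto() is an independent "postcard".
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 9999))          # listen for datagrams on an arbitrary port

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"player_input:UP", ("127.0.0.1", 9999))   # fire and forget

data, addr = receiver.recvfrom(2048)        # blocks until *some* datagram arrives (if ever)
print(data, "from", addr)                   # b'player_input:UP' (on a good day)
```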

The TCP Protocol


Unlike UDP, TCP (Transmission Control Protocol) is a
non-trivial protocol which aims to insulate Applications
from the unreliability of the network, by having various
built-in error checking and correction mechanisms.

This allows it to provide a reliable, ordered and
error-corrected communication stream between Apps.

...well, up to a point...

It still can't work miracles!
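
For comparison, here's the same sort of minimal sketch over TCP (again just my illustration, using Python's socket module with an arbitrary port). Notice that we have to connect before we can send, and that all the sequencing and acknowledgement machinery is hidden inside the socket:

```python
# Minimal TCP sketch: the OS now gives us a reliable, ordered byte *stream*;
# sequence numbers, ACKs and retransmission happen behind our backs.
import socket, threading

ready = threading.Event()

def server():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 9998))
    srv.listen(1)
    ready.set()                              # tell the client it's safe to connect
    conn, _addr = srv.accept()
    print("server got:", conn.recv(2048))
    conn.close()
    srv.close()

t = threading.Thread(target=server)
t.start()
ready.wait()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", 9998))          # the three-way handshake happens here
client.sendall(b"hello, reliably")           # retransmitted automatically if packets are lost
client.close()
t.join()
```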

IP: Internet Postcards


The Internet (and many other networks today) is
what's known as a "packet-switched network"

In essence, this approach means that all messages on the network are split up into smallish units (e.g. "packets", "datagrams", "frames" or whatever) which are then sent individually by whatever route is available at the time.

It's like putting your "electronically mailed letter" onto
a series of postcards, and sending those individually.

Your Postcards In The Tubes

...and, with UDP, that is all you get!

UDP is 100% "fire and forget" --- your postcard may arrive,
or it may not. If it arrives, it might be smudged to illegibility when it eventually gets there. Even if it was sent as part of
an ordered batch, it may not arrive in any particular order,
or by any particular route.

TCP, on the other hand, puts sequence numbers on each postcard and sends replies to acknowledge their receipt
(or to request the resending of missing ones)

Which Protocol To Use?


You might (naively) assume that, because TCP
does a lot of useful "corrective" work for you,
you should always adopt it.

Unfortunately, the downside of all that "fake reliability" is
that it has to do a lot of buffering and, where required,
re-sending of the data... which uses memory and,
perhaps more importantly, takes time.

i.e. it makes the "latency" even worse.
And latency is frequently The Enemy in games.

Simultaneity?


Recall that, in a networked game, we're usually trying
to create the sense of a shared, real-time simulation ---
this implies that the actions of all participants should,
ideally, be treated as happening simultaneously, and
be made visible to other participants instantaneously.

The reality of network latency gets in the way of this,
and it's usually up to us to minimise its effect.

Typical internet latencies might be around
100 milliseconds (for international round-trips).

Let's Look At Latency


The speed of light presents an interesting theoretical limit on latency. If you assume no packet switching or routing delays, you could calculate the theoretical minimum signalling time from Berkeley (CA) to Boston (MA) as follows:

Distance from Berkeley to Boston: 4983 km
Speed of light in a vacuum: 3 * 10^8 m/s
Speed of light in fibre: 0.66 * 3 * 10^8 m/s ≈ 2 * 10^8 m/s
Time to go from Berkeley to Boston: 4983 km / (2 * 10^8 m/s) ≈ 24.9 ms
Round-trip time: 2 * 24.9 ms ≈ 49.8 ms
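
The same back-of-the-envelope sum, as a couple of lines of code (using the rounded fibre speed from above):

```python
# Back-of-the-envelope check of the numbers above.
distance_m = 4983e3            # Berkeley to Boston, in metres
c_fibre    = 2e8               # ~0.66 * 3e8 m/s, rounded as above

one_way_ms = distance_m / c_fibre * 1000
print(f"one-way:    {one_way_ms:.1f} ms")      # 24.9 ms
print(f"round trip: {2 * one_way_ms:.1f} ms")  # 49.8 ms
```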

Plus Delays...


Realistically, you can expect to add a fudge factor of
about 20 ms for switching delays, imperfect transmission media, cosmic rays, etc., so you'd typically see round-trip
times of maybe 70 ms or so across the U.S.A.,
between the West Coast and the East Coast.

I've found some online sources which suggest that these numbers are in the right ball-park, at least.

A Diagram


"Simple" Cases


Some combinations of game and network are simpler
to deal with than others. For example, a non-"twitch"
turn-based game (e.g. Chess) is relatively insensitive
to latency, and requires very little bandwidth.

Also, a game being played over a LAN or, better yet,
between two computers directly connected by a cable
(e.g. a "null modem" link) is much less of a challenge than
one which has to deal with the full chaos of the Internet.

Real-time, twitchy, internet games are the Hard Cases.

What To Send


One of the central questions when designing the network
architecture for a game is "what do we send?". There are
several possible answers. You could potentially send:

  • raw user-inputs (e.g. key-presses)
  • logical "actions" (e.g. each "move")
  • states (e.g. where things are)
  • state-deltas (e.g. how things change)
  • pictures (e.g. actual rendered images!)

How To Send


Some of the practicalities are also dictated by the expected network infrastructure that your game will be running on. You don't always have much choice over this, but options include:

  • Direct cabling
  • Direct wireless
  • A Local Area Network
  • A Campus Area Network
  • A Wide Area Network

Topology


There is also the matter of "Network Topology", i.e. the "geometry" of how the nodes in a network are actually connected.

THE INTERNET

A Global Network Of Networks!

Case Study: Stunt Car Racer


A fun little game from 1989.
Supported 2-player "networked" play,
via a "null modem cable".

It probably just sent control inputs along the wire.

This requires fully deterministic game logic, to ensure that both sides agree on the consequences of the known actions.

It also requires agreement on the "delta time"
values for each frame. 
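
As a toy illustration of how such an input-only scheme can hang together (this is obviously not the actual 1989 code), assuming fully deterministic game logic and an agreed fixed timestep:

```python
# Toy input-exchange sketch: each machine sends only its control inputs, and both
# run identical, deterministic logic with the same fixed timestep, so their
# states never diverge; no game state is ever sent over the wire.
FIXED_DT = 1.0 / 30.0                          # both sides must agree on this

def simulate(state, inputs, dt):
    # Deterministic update: same inputs + same dt => same result on both machines.
    return [pos + vel * dt for pos, vel in zip(state, inputs)]

state_a = state_b = [0.0, 0.0]                 # car positions, one per player
for frame in range(3):
    input_a, input_b = 1.0, -1.0               # each player's "throttle" this frame
    # Each machine sends its local input, receives the remote one, and then
    # steps the simulation with the *same* pair of inputs.
    state_a = simulate(state_a, [input_a, input_b], FIXED_DT)
    state_b = simulate(state_b, [input_a, input_b], FIXED_DT)
    assert state_a == state_b                  # lock-step agreement
```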

Case Study: Big Red Racing


This game from 1996 supported up to 6 players over a
LAN-type connection, using a peer-to-peer topology.

Each "peer" computes and sends local state information
(e.g. position, velocity etc.) about the entities over which
it has authority. Remote entities are interpolated/
extrapolated from the received network updates.

i.e. loosely-coupled "Distributed Simulation"!

Lots of problems: big corrections, even divergent outcomes!
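
Here's a tiny, made-up sketch of the extrapolation ("dead reckoning") side of that idea: each peer sends (position, velocity, timestamp) for the entities it owns, and everyone else guesses those entities forward from the last update received:

```python
# Toy dead-reckoning sketch: remote entities are extrapolated from the most
# recent (position, velocity, timestamp) update their owning peer sent us.
def extrapolate(last_update, now):
    pos, vel, sent_at = last_update
    return pos + vel * (now - sent_at)         # guess where the remote car is *now*

last_update = (100.0, 20.0, 1.00)              # pos=100 m, vel=20 m/s, sent at t=1.00 s
print(extrapolate(last_update, now=1.15))      # 103.0: shown until the next update
# When a fresh update arrives and disagrees with this guess, we get the
# "big correction" (or outright divergence) problem mentioned above.
```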

Lock Step


The standard approach to limiting peer-to-peer divergence
is to use a "lock-step" system to keep everyone in sync.

i.e. each machine waits for the others to take their turn, and send their results, before any further computation is done.

This guarantees sync... but at the expense of imposing
"weakest link" latency on all participants. The resulting
system is also very susceptible to connection problems.

It doesn't scale well (maybe up to 4 or 6 players)
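
A rough single-process sketch of the lock-step idea (the Link class just stands in for a real network connection); the blocking receive is exactly where the "weakest link" latency and the sensitivity to connection problems come from:

```python
import queue

# Toy lock-step sketch: each "link" is a pair of queues standing in for the
# network; in the real thing, get() would block on a socket.
class Link:
    def __init__(self):
        self.to_remote = queue.Queue()     # moves we send to the other machine
        self.from_remote = queue.Queue()   # moves the other machine sends to us

def lockstep_frame(frame, my_move, links):
    for link in links:
        link.to_remote.put((frame, my_move))          # broadcast our move
    moves = [my_move]
    for link in links:
        f, move = link.from_remote.get()              # BLOCKS until it arrives:
        assert f == frame                             # weakest-link latency, and a
        moves.append(move)                            # stall if any connection dies
    return moves                                      # now everyone steps frame N

link = Link()
link.from_remote.put((0, "remote: brake"))            # pretend the peer has answered
print(lockstep_frame(0, "local: accelerate", [link]))
```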

Local Buffered Input


One of the problems with lock-stepping is the need to wait
for the messages from all the other machines to be received before you can proceed.

It's possible to work around this by delaying the local inputs (by approx. the network latency) and having everyone agree
to run the simulation based on "old data" in the meantime.

This might sound a bit mad, but it can be made to work,
and provides a nice, simple, low-bandwidth solution
in some cases e.g. Crackdown (2007)
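
A small, made-up sketch of that buffered-input idea: local inputs are scheduled a few frames into the future and broadcast, so every machine ends up applying the same inputs on the same frame:

```python
import collections

INPUT_DELAY_FRAMES = 6                         # e.g. roughly 100 ms of delay at 60 fps

pending = collections.defaultdict(list)        # frame number -> inputs to apply then
outgoing = []                                  # stand-in for "send to the other peers"

def on_local_input(current_frame, user_input):
    apply_on = current_frame + INPUT_DELAY_FRAMES
    pending[apply_on].append(("local", user_input))    # schedule it for the future...
    outgoing.append((apply_on, user_input))            # ...and tell everyone else

def on_remote_input(apply_on, user_input):
    pending[apply_on].append(("remote", user_input))   # they delayed theirs too

on_local_input(current_frame=100, user_input="jump")
on_remote_input(apply_on=106, user_input="shoot")
print(pending[106])   # both inputs, applied together (6 frames "late") on every machine
```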

(End of Part One)

Authoritative Server


Peer-to-peer networking definitely has its limits though
(e.g. security, scalability). So, in many cases, it's
more practical to use a client-server model instead.

The idea here is to put one machine "in charge" of the
overall true state of the simulation, and have it act as
a comms hub which tells all the clients what to do.

In its simplest variant, the clients are entirely "dumb" and simply send control inputs up to the server, and then
let it inform them of what to render, and where.
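
A deliberately dumbed-down, in-process sketch of that arrangement (the class names are mine, and the network round trip is reduced to a function call):

```python
# "Dumb client" sketch: only the server ever mutates the true game state; the
# client just forwards raw inputs and draws whatever it is told.
class AuthoritativeServer:
    def __init__(self):
        self.positions = {"player1": 0.0}       # the one true game state

    def handle_input(self, player, user_input):
        if user_input == "move_right":
            self.positions[player] += 1.0       # only the server changes anything
        return dict(self.positions)             # snapshot sent back to the client

class DumbClient:
    def __init__(self, server, name):
        self.server, self.name = server, name

    def frame(self, user_input):
        snapshot = self.server.handle_input(self.name, user_input)  # one round trip
        print("render:", snapshot)              # draw exactly what we were told

server = AuthoritativeServer()
DumbClient(server, "player1").frame("move_right")   # render: {'player1': 1.0}
```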

Partial Information


The C-S model is often used in simulations where the
players are somewhat dispersed "in game", and do not
generally have perfect information about each other.

It is also used in scenarios where the individual clients
may not be powerful enough to compute the entire
simulation state themselves.

As such, they rely on the server to tell them what is going on, rather than calculating it locally from all the user-inputs...

Sending State


The lack of "perfect information" on the part of clients
means that they need to be explicitly informed about
the state of relevant entities, which is a big shift from
our early ideas of "broadcasting inputs".

However, although sending states is, in principle, more bandwidth-intensive than sending inputs, if the subset
of relevant entities is small enough compared to the total,
this approach can actually turn out to be a saving overall.

(especially when combined with the careful use of deltas)
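
For instance, a minimal delta-encoding sketch against the last snapshot the client acknowledged (the field names are invented):

```python
# Send only the fields that changed since the last acknowledged snapshot.
def make_delta(old_state, new_state):
    return {k: v for k, v in new_state.items() if old_state.get(k) != v}

def apply_delta(state, delta):
    return {**state, **delta}

last_acked = {"x": 10.0, "y": 5.0, "health": 100}
current    = {"x": 10.5, "y": 5.0, "health": 100}

delta = make_delta(last_acked, current)
print(delta)                                   # {'x': 10.5}: much smaller to send
assert apply_delta(last_acked, delta) == current
```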

Client-Server Downsides

One downside of the C-S model is that we now have to deal with round-trip latency (from client to server and back) instead of the direct one-way latency of Peer-to-Peer.

This can also introduce a degree of "unfairness", where
players with faster links to the server enjoy a significant advantage (especially the player on the server).

Also, by having the server do "all" of the work,
the simulation is no longer distributed, and the
server itself becomes a significant bottleneck
(and a Single Point Of Failure).

Client-Side Prediction


One approach to the C-S latency problem is to allow the
clients to make local predictions about the future state
of the simulation, thereby "hiding" the network delay
for any locally predictable actions.

Of course, the clients have incomplete/imperfect data about the simulation and (usually) no "authority" over it, so these predictions are merely "educated guesses", which can potentially be over-ruled by the server.

Among other things, it's hard to predict the other players.

Prediction is Difficult

...difficult, lemon difficult
...especially about the future

One of the consequences of client-side prediction is the creation of a weird time-warp effect, in which the state of locally predicted entities runs ahead of the true situation on the server, while the state of unpredicted remote/foreign entities lags behind the truth by a similar amount.

This means that interactions between local and foreign entities tend to be unreliable, e.g. collisions with other players can be problematic (as can line-of-sight, aiming, and shooting). :-(

When Predictions Go Wrong


If a client-side prediction turns out to be wrong,
it must somehow be corrected.

The corrections often involve a significant change in the apparent state of some entities and, if these happen
abruptly, the result is noticeably ugly.

To address this, clients will typically blend towards the corrected state... in fact, to avoid compounding the latency problem, they must actually blend towards an extrapolated future state! This can be very "lemon difficult" indeed...
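
One plausible shape for that blending, sketched in a few lines (the constants are invented, and real games tune this sort of thing endlessly):

```python
# Blend the displayed position towards an *extrapolated* version of the server's
# correction, rather than snapping straight to the (already old) corrected value.
BLEND_RATE = 10.0     # per second; higher = snappier, lower = smoother

def blend_towards_correction(displayed_pos, corrected_pos, corrected_vel, latency, dt):
    target = corrected_pos + corrected_vel * latency   # where the correction is "now"
    alpha = min(1.0, BLEND_RATE * dt)                  # fraction to move this frame
    return displayed_pos + (target - displayed_pos) * alpha

pos = 50.0                                  # where we are currently drawing the entity
for _ in range(30):                         # half a second of frames at 60 fps
    pos = blend_towards_correction(pos, corrected_pos=47.0, corrected_vel=20.0,
                                   latency=0.1, dt=1/60)
print(round(pos, 2))                        # eases towards 49.0 instead of snapping
```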

Prediction Buffers


A particular difficulty with implementing prediction-correction is handling the knock-on consequences of everything that has happened since the incorrect
prediction was made!

Nowadays, a common approach to this is to have the client retain a buffer of the recent history of local player inputs, such that it can "replay" these inputs on top of any corrections that
it receives, to provide a kind of semi-reliable extrapolation.
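
A toy sketch of that rewind-and-replay idea, with a one-dimensional "state" and a trivially deterministic apply_input step standing in for the real simulation:

```python
import collections

input_history = collections.deque(maxlen=120)   # (frame, input) for the last ~2 seconds

def apply_input(state, user_input):
    return state + (1.0 if user_input == "right" else -1.0)   # toy deterministic step

def predict(state, frame, user_input):
    input_history.append((frame, user_input))   # remember what we predicted with
    return apply_input(state, user_input)

def on_server_correction(server_state, server_frame):
    state = server_state                        # rewind to the authoritative state...
    for frame, user_input in input_history:
        if frame > server_frame:                # ...and replay everything newer than it
            state = apply_input(state, user_input)
    return state

pos = 0.0
for frame in range(1, 6):
    pos = predict(pos, frame, "right")
print(pos)                                                       # locally predicted: 5.0
print(on_server_correction(server_state=0.5, server_frame=2))    # 0.5 + 3 replayed = 3.5
```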

Have I mentioned that this is difficult?

The Limits Of Prediction


Because a prediction can potentially be wrong,
and therefore be corrected or revoked at a later stage,
it's not possible (or, at least, not advisable) to predict
things which cannot be cleanly undone again.

For example, it is usually unwise to "predict"
something such as a death or a pick-up event.

MMOGS!

OMG!

Massively-Multiplayer Online Games take most of
the previous "network gaming" difficulties...
and extend them to ludicrous extremes.

Generally speaking, the architecture used is a client-server one, typically with strong server-side authority (for multiple reasons, including in-game "anti-cheating" and, quite frankly,
real-world "anti-piracy"). The resulting centralisation creates significant server-side burdens, which can be ameliorated by some client-side trickery and clever optimisation techniques.

MMO Scaling


When you have 100s or 1000s of players on a single "server", the per-player costs (for CPU, RAM and Bandwidth usage) start to become very significant, even if they are
relatively small on an individual basis.

Also, any kind of worse-than-linear scaling becomes a real problem. A big part of designing the engine-tech for an MMO involves tackling all those O(N^2) logic hotspots, which can be found in "naive" collision-detection and object-relevance computations: smart spatial data-structures, selective
object activation and clever caching can all help here.
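
As an example of the "smart spatial data-structures" point, here's a minimal spatial-hash sketch (cell size and coordinates invented) that turns "test everything against everything" into a lookup of nearby grid cells:

```python
import collections

CELL = 10.0                                    # cell size, roughly the interaction radius

def build_grid(positions):
    grid = collections.defaultdict(list)
    for obj_id, (x, y) in positions.items():
        grid[(int(x // CELL), int(y // CELL))].append(obj_id)
    return grid

def nearby(obj_id, positions, grid):
    cx, cy = int(positions[obj_id][0] // CELL), int(positions[obj_id][1] // CELL)
    for dx in (-1, 0, 1):                      # only look at the 3x3 block of cells
        for dy in (-1, 0, 1):                  # around us, not at every object in the world
            for other in grid.get((cx + dx, cy + dy), []):
                if other != obj_id:
                    yield other                # candidates only; precise tests come next

positions = {"a": (1.0, 1.0), "b": (4.0, 2.0), "c": (95.0, 80.0)}
grid = build_grid(positions)
print(list(nearby("a", positions, grid)))      # ['b']; 'c' is never even considered
```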

Sharding


Another standard approach in MMOs is to split up
the (potentially very large) player-base into a number of
"shards"/"parallel universes", each of a limited capacity.

This allows most of the "scaling" to be done in simple linear terms... just by adding more shards --- which, in practice, means buying (or, these days, renting) more servers.

This is, for the most part, a nice, simple, economical way to handle the large-scale scaling issues... albeit at the expense of splitting-up the player base, in a somewhat limiting manner.

Case Study: APB


We had to support 100 players (each with a uniquely customised character mesh and a unique vehicle) in a
single shared shard, using the Unreal 3 Engine ---
which, in its native form, struggles to deal with
around 16 (generally non-unique) players.

We killed all the O(N^2) logic, overhauled the "octree", rewrote the "relevance" system, and added prediction, interpolation, compression and a bunch of other stuff.

...it's a pity that no-one actually bought it, really. :-(

Case Study: EVE Online


EVE has somewhere around half a million active accounts world-wide, with a PCU (peak concurrent users)
count of approximately 65 thousand.

And they are all in a single logical shard!

The "EVE Way" is very unusual: The CCP system operates as
a Distributed Computing Cluster, comprised of a "mesh" of many high-powered physical and logical "server" machines, among which the overall load is distributed at runtime.

A Picture of EVE


What data does EVE send?

(from client to server)

  • Not user-inputs
  • Not states
  • Not state-deltas
  • It sends function calls!

i.e. EVE is actually an RPC (Remote Procedure Call) system.
Such RPCs have significant latency though, so what should the client do while it's waiting for the reply from the server?

The answer is, "it should do something else". ;-)

Something Else


The "something else" is achieved by switching over to some other independent task -- a bit like what your operating system does when a process is "blocked" waiting on I/O.

In the EVE code-base, this is done via something called "Tasklets", which are a kind of lightweight thread-like facility (sometimes called "green threads") that is managed by the application itself (instead of the operating system).

Only one Tasklet is actually running at any given time, but they allow us to keep active during our "async" RPC delays.
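
Here's the same idea sketched with plain Python generators (EVE itself uses Stackless Python's tasklets, so treat this as an approximation of the concept rather than their code): each task yields whenever it would block on an RPC, and a tiny scheduler keeps everything else running in the meantime:

```python
import collections

def buy_item(item):
    print(f"asking server to buy {item}...")
    reply = yield ("rpc", f"buy:{item}")      # "block" here until the reply arrives
    print(f"server replied: {reply}")

def update_ui():
    for frame in range(3):
        yield ("tick", None)                  # plenty of other work to keep us busy
        print(f"UI frame {frame} drawn")

def run(tasklets):
    # Only one tasklet runs at a time; the scheduler just round-robins whatever
    # is runnable, resuming RPC callers when their (pretend) replies come back.
    pending = collections.deque((t, None) for t in tasklets)
    while pending:
        task, send_value = pending.popleft()
        try:
            kind, payload = task.send(send_value)
        except StopIteration:
            continue
        reply = "ok:" + payload if kind == "rpc" else None
        pending.append((task, reply))

run([buy_item("tritanium"), update_ui()])
```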

EVE Server-to-Client


In addition to responses to RPCs, the other main traffic
from the EVE server to the client is a description of what's happening in the physics simulation of nearby entities
(a region known internally as the "ball park").

This takes the form of a relatively low-bandwidth
"command stream", containing info such as
"Ship <x> is now orbiting Object <y>"

This info is sent reliably, and is used to implement a deterministic simulation on both server and client(s).

"Cloud Gaming"


Back at the start, one of the possible answers on the
"What To Send" slide was "actual rendered images"...
and we treated this proposal with the derision
it so clearly deserves.

BUT!

This may be the seed of a plan so crazy that it
Just Might Work!

The Cloud In Your Pocket?


Today, in 2021, we have lots of cute little network-equipped mobile devices with pretty decent graphics -- but, in certain respects, limited overall power. Certainly, they aren't really able to run "full quality" console-like experiences.

Also, even if they were, the practical difficulties of producing games for all of these different (and fragmented) platforms, including the large memory requirements of many games,
and the hassles associated with installation and upgrading, make the prospect unattractive in many ways.

A Universal Client


...but what if you only needed a single, rather simple, app
to be installed on your device (essentially a kind of custom video-streaming thing) to let you play any game, by having
all the real work take place on servers in "The Cloud"?

i.e. You'd have a "dumb client" which sent control inputs up
to the server, and received rendered images as a result!

It sounds sort-of-cool, but also seems very impractical...

Surely this sucks, for both bandwidth and latency?

Not Necessarily!


Most devices (and networks) can already handle
pretty decent real-time video streaming, and modern
super-powered server hardware can perform the
necessary encoding and compression in real-time too!

So, the real problem is latency.

...which can sometimes be hidden, at least partially.

Latency Hiding


One of the really cool ideas about latency hiding comes from the realisation that much of what happens in a simulation doesn't depend directly or immediately on the user's input,
so a little bit of "cause and effect lag" is often invisible.

What you really notice is the lag in your own movements...

And, clearly, your own movements do have an immediate effect, but it's actually quite a narrow one... it just changes your "point of view" (aka "view matrix").

Time Warping!


The details of this are complicated, but the basic idea is that you can draw an image based on an old position and then,
at the last minute, "warp" it to match a new position.

So, the server can render, and send, an image based on your "lagged" position, but your client can then tweak it a little to match your instantaneous local position -- lag begone! (ish)

(This requires sending the scene's depth information in addition to the standard colour components, and it isn't trivial, but "smart people" are looking into it).
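
To make the idea a bit more concrete, here's a heavily simplified, made-up reprojection sketch (translation-only pinhole cameras; real implementations deal with full view/projection matrices and whole depth buffers). A pixel rendered from the old, lagged camera position is un-projected using its depth, then re-projected from the player's current position:

```python
F = 500.0                                   # focal length in pixels (arbitrary)

def project(world, cam):
    # Camera looks along +z; view space is just world space minus the camera position.
    vx, vy, vz = (world[i] - cam[i] for i in range(3))
    return (F * vx / vz, F * vy / vz, vz)   # screen x, screen y, depth

def warp(pixel_xy, depth, old_cam, new_cam):
    sx, sy = pixel_xy
    # Un-project: recover the world-space point the server saw at this pixel...
    view = (sx * depth / F, sy * depth / F, depth)
    world = tuple(view[i] + old_cam[i] for i in range(3))
    # ...then project it from the camera position the player has *now*.
    return project(world, new_cam)[:2]

old_cam = (0.0, 0.0, 0.0)                   # lagged position the server rendered from
new_cam = (0.25, 0.0, 0.0)                  # where the local player actually is now
point   = (1.0, 0.5, 10.0)                  # some bit of scenery

sx, sy, depth = project(point, old_cam)     # what the server sent: colour plus depth
print(warp((sx, sy), depth, old_cam, new_cam))       # (37.5, 25.0)
print(project(point, new_cam)[:2])                   # (37.5, 25.0): the warp matches the truth
```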

The Time War


Notably, John Carmack worked on techniques of this kind
as part of the War On Latency that he's been waging, especially in the context of 3D "Virtual Reality" gaming
(which is particularly sensitive to latency problems),
during his work at Oculus (now "Meta").

Will any of this pan out? Is "Cloud Gaming"
(with or without crazy 3D stuff) The Future?

I dunno! -- it could go either way, but it's not as completely insane as it first appears, and I thought it might interest you.

Networked Games

By Pat Kerr
