Finding a Needle in a Haystack

JVM logging guide

Avishai Ish-Shalom (@nukemberg)

Software Fairy @ Wix.com

Logging is easy, right?

Only two log levels

Too much bullshit
I'm f*cking blind

Information theory protip:

Different log events do not have the same value

Hard to use

Hard to parse
Hard to correlate
No context

Performance & stability issues

Too many logs can kill performance
Clogged logger can block application
Resource consumption

Robots need love too

What do we want from a good logging library?

Structured logs
Integrate with pipeline
Be fast
Graceful degradation
Add context
Elaborate event filtering

Agenda

Logging libraries overview
Structure & context
Separating signal from noise
Production grade logging

But before we start

Define "log"

JVM Logging

JUL
Log4j 1.x
Logback
Log4j 2
tinylog
slf4j
commons-logging

JUL

Please tell me you're not using it*

Log4j 1.x

Performance	Meh
Extensions	Porn grade
Features	Essentials + a little extra
Degradation	what, u don't like blocking?

Mature, battle tested

Logback

Performance	Good
Extensions	Porn grade
Features	Carnaval
Degradation	Good

Mature, battle tested, featureful

Log4j 2.x

Performance	Insane
Extensions	HBO
Features	Carnaval
Degradation	Very good

Complete redesign, featureful

Most awesome logger yet

TinyLog

Performance	batshit crazy
Extensions	What extensions?
Features	This is Sparta
Degradation	Very good

The logger no one knows about

Fast, minimal

SLF4j, commons-logging

Abstraction
Lowest common denominator
Designed for use in libraries

TLDR: use SLF4J for libraries

Structure & Context

MDC (logback)

ThreadContext (log4j2)

Thread local
Set once at transaction start, clear at end
Strings only
Useful for user ID, transaction ID, etc
Not passed to thread pools
Hard to use with async frameworks (e.g. Akka)

// log4j 2.x
ThreadContext.put("transactionID", UUID.RandomUUID().toString());

// log4j 1.x, logback
MDC.put("userID", userID);

Structured messages

Native support in Log4j 2.x
Can "fake" in logback and log4j 1.x
SLF4J extensions EventLogger

Map m = HashMap();
m.put("extra_data", "extra_value");
logger.info(new StructuredDataMessage("id", "message", "type", m));

m.put("more_data", "more_value");
logger.info(new MapMessage(m));

Logs may still need secondary parsing

Log event identifiers

Local timestamp (UTC, standard format)
Host ID
container ID
App ID
Transaction ID
Request ID
User ID
Tenant/environment ID
App version/build

Separating Signal from Noise

Why?

Logs are the primary telemetry channel from production

There are 2 types of logs

Stats telemetry
Debug telemetry

Ideally we would have close to zero debug telemetry

But we don't know in advance what we'll need

Log levels

TRACE, DEBUG, INFO, WARN, ERROR
+ FATAL on Log4j
Custom levels on Log4j
TRACE, FATAL rarely used

Fastest filtering mechanism

A normal error is not

Logger hierarchy

Configure only top loggers
Inherits minimal level
Custom names

// logback, log4j 1.x
Logger logger = LoggerFactory.getLogger(this.class);
Logger logger = LoggerFactory.getLogger("audit." + this.class);
Logger rootLogger = LoggerFactory.getRootLogger();

// log4j 2.x
Logger logger = LogManager.getLogger(this.class);

Tags, Markers

import org.apache.logging.log4j.Logger;
import org.apache.logging.log4j.LogManager;
import java.util.Map;
 
public class MyApp {
 
    private Logger logger = LogManager.getLogger(MyApp.class.getName());
    private static final Marker SQL_MARKER = MarkerManager.getMarker("SQL");
    private static final Marker UPDATE_MARKER = 
        MarkerManager.getMarker("SQL_UPDATE").setParents(SQL_MARKER);
    private static final Marker QUERY_MARKER = 
        MarkerManager.getMarker("SQL_QUERY").setParents(SQL_MARKER);
 
    public String doQuery(String table) {
        logger.traceEntry(param);
        logger.debug(QUERY_MARKER, "SELECT * FROM {}", table);
        ...
        return logger.traceExit(ret);
    }
 
    public String doUpdate(String table, Map<String, String> params) {
        logger.traceEntry(param);
        logger.debug(UPDATE_MARKER, "UPDATE {} SET {}", table, formatCols());
        ...
        return logger.traceExit(ret);
    }

}

Flow tracing

.exit, .enter, .catching, .throwing
TRACE log level with markers
markers inherit from FLOW

public int exampleException(String arg) {
    logger.traceEntry(arg);
    int n;
    try {
        String msg = messages[messages.length];
        n = someFunction(msg); 
        logger.error("An exception should have been thrown");
    } catch (Exception ex) {
        logger.catching(ex);
    }
    return logger.traceExit(n);
}

Probably a bad idea for high performance servers

Log sampling

Log 1/N transactions
Get same transaction for all events

ThreadContext.put("userFraction", Random.nextInt(100));

<ThreadContextMapFilter onMatch="ACCEPT" onMismatch="NEUTRAL" operator="or">
   <KeyValuePair key="userFraction" value="1"/>
</ThreadContextMapFilter>

Alternatively, log only userID XXX

Log sampling

Finer logs of specific transactions
Change log level dynamically

ThreadContext.put("userId", requestContext.getUserID());
// trace suspicious requests
if (requestContext.isTrace)
  ThreadContext.put("trace", "true");

<DynamicThresholdFilter key="UserId" defaultThreshold="ERROR"
       onMatch="ACCEPT" onMismatch="NEUTRAL">
   <KeyValuePair key="avishai" value="DEBUG"/>
</DynamicThresholdFilter>

<DynamicThresholdFilter key="trace" defaultThreshold="ERROR" 
       onMatch="ACCEPT" onMismatch="NEUTRAL">
   <KeyValuePair key="true" value="TRACE"/>
</DynamicThresholdFilter>

Retroactive logging

Log debug to ringbuffer
Drop if success
Flush to logger if fail

void Constructor() {
    CircularFifoBuffer logBuffer = new CircularFifoBuffer(1000);
}

void someMethod() {
    logBuffer.add("some message");
    logBuffer.add(() -> "lazy debug message");
    
    try {
        ...
    } catch (Exception ex) {
        flushBuffer(logBuffer);
    }
    logBuffer.clear();
}

void flushBuffer(CircularFifoBuffer buffer) {
    for (Object o: buffer) {
        logger.debug(o);
    }
}

Production grade logging

Log rotation

On disk logs are a buffer
Use sized based rotation
Prefer a separate partition

Serialized Logs

Faster than message formatters
Structured data, no parsing
Easier to integrate with pipelines

Common: GELF, json, msgpack, thrift

Async appenders

huge performance boost, fault tolerance

IMPORTANT: properly shutdown logging subsystem

Handling load

AsyncAppender

Wraps another appender
Events placed in queue and consumed by worker thread
Logback - Drop TRACE, DEBUG and INFO when 80% full
Log4j2 - Fail over to secondary error appender

BurstFilter (log4j 2.x)

Drop events when rate exceeded
Allow bursts

<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="warn" name="MyApp" packages="">
  <Appenders>
    <RollingFile name="RollingFile" fileName="logs/debug.log"
                 filePattern="logs/debug-%d{MM-dd-yyyy}.log.gz">
      <BurstFilter level="INFO" rate="100" maxBurst="1000"/>
      <PatternLayout>
        <pattern>%d %p %c{1.} [%t] %m%n</pattern>
      </PatternLayout>
      <TimeBasedTriggeringPolicy />
    </RollingFile>
  </Appenders>
  <Loggers>
    <Root level="debug">
      <AppenderRef ref="RollingFile"/>
    </Root>
  </Loggers>
</Configuration>

IsDebugEnabled

if (logger.isDebugEnabled()) {
    logger.debug("some debug message, value: " + object.toString());
}

What's wrong with this picture?

logger.debug("some debug message, value: {}", object);

// or with log4j 2.x and java 8
logger.debug("some debug message, value: {}", () -> object.expensiveMethod());

When level>=DEBUG: 2 functions calls per log message

Better*:

Runtime reconfig

Auto config file reload
JMX (log4j 2.x)
Programatic

Questions?

jobs@wix.com

Happy logging!

jobs@wix.com

Finding a needle in a haystack - JVM logging guide

By Avishai Ish-Shalom

Finding a needle in a haystack - JVM logging guide

5,818

Avishai Ish-Shalom

nukemberg

Finding a Needle in a Haystack

JVM logging guide

Logging is easy, right?

Only two log levels

Hard to use

Performance & stability issues

Robots need love too

What do we want from a good logging library?

Agenda

But before we start

Define "log"

JVM Logging

JUL

Please tell me you're not using it*

Log4j 1.x

Logback

Log4j 2.x

TinyLog

SLF4j, commons-logging

Structure & Context

MDC (logback)

ThreadContext (log4j2)

Structured messages

Log event identifiers

Separating Signal from Noise

There are 2 types of logs

Log levels

A normal error is not

Logger hierarchy

Tags, Markers

Flow tracing

Log sampling

Log sampling

Retroactive logging

Production grade logging

Log rotation

Serialized Logs

Async appenders

Handling load

AsyncAppender

BurstFilter (log4j 2.x)

IsDebugEnabled

Runtime reconfig

Questions?

Happy logging!

Finding a needle in a haystack - JVM logging guide

More from Avishai Ish-Shalom