Finding a Needle in a Haystack

JVM logging guide

Avishai Ish-Shalom (@nukemberg)

Software Fairy @ Wix.com

Logging is easy, right?

Only two log levels

  • Too much bullshit
  • I'm f*cking blind

Information theory protip:

Different log events do not have the same value

Hard to use

  • Hard to parse
  • Hard to correlate
  • No context

Performance & stability issues

  • Too many logs can kill performance
  • Clogged logger can block application
  • Resource consumption

Robots need love too

What do we want from a good logging library?

  • Structured logs
  • Integrate with pipeline
  • Be fast
  • Graceful degradation
  • Add context
  • Elaborate event filtering

Agenda

  • Logging libraries overview
  • Structure & context
  • Separating signal from noise
  • Production grade logging

But before we start

Define "log"

JVM Logging

  • JUL
  • Log4j 1.x
  • Logback
  • Log4j 2
  • tinylog
  • slf4j
  • commons-logging

JUL

Please tell me you're not using it*

Log4j 1.x

Performance: Meh
Extensions: Porn grade
Features: Essentials + a little extra
Degradation: what, u don't like blocking?

Mature, battle tested

Logback

Performance: Good
Extensions: Porn grade
Features: Carnival
Degradation: Good

Mature, battle tested, featureful

Log4j 2.x

Performance: Insane
Extensions: HBO
Features: Carnival
Degradation: Very good

Complete redesign, featureful

 

Most awesome logger yet

TinyLog

Performance: batshit crazy
Extensions: What extensions?
Features: This is Sparta
Degradation: Very good

The logger no one knows about

 

Fast, minimal

SLF4J, commons-logging

  • Abstraction
  • Lowest common denominator
  • Designed for use in libraries

 

TLDR: use SLF4J for libraries

Structure & Context

MDC (logback)

ThreadContext (log4j2)

  • Thread local
  • Set once at transaction start, clear at end
  • Strings only
  • Useful for user ID, transaction ID, etc
  • Not passed to thread pools
  • Hard to use with async frameworks (e.g. Akka)
// log4j 2.x
ThreadContext.put("transactionID", UUID.randomUUID().toString());

// log4j 1.x, logback
MDC.put("userID", userID);
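
The "not passed to thread pools" limitation can be worked around by copying the context when a task is submitted. The sketch below uses a plain ThreadLocal map as a stand-in for MDC / ThreadContext, so it runs on the JDK alone; LogContext, propagate and lookupOnPool are hypothetical names, not part of any logging library.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Stand-in for MDC / ThreadContext: a per-thread map of string keys and values. */
final class LogContext {
    private static final ThreadLocal<Map<String, String>> CTX =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { CTX.get().put(key, value); }
    static String get(String key) { return CTX.get().get(key); }
    static void clear() { CTX.get().clear(); }

    /** Wraps a task so it runs with a snapshot of the submitting thread's context. */
    static <T> Callable<T> propagate(Callable<T> task) {
        Map<String, String> snapshot = new HashMap<>(CTX.get()); // taken on the caller's thread
        return () -> {
            Map<String, String> previous = CTX.get();
            CTX.set(new HashMap<>(snapshot));
            try {
                return task.call();
            } finally {
                CTX.set(previous); // pool threads are reused, so restore what was there
            }
        };
    }

    /** Reads a key on a pool thread, with or without propagation (demo helper). */
    static String lookupOnPool(String key, boolean withContext) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<String> task = () -> get(key);
            return pool.submit(withContext ? propagate(task) : task).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With SLF4J the same snapshot and restore steps would likely use MDC.getCopyOfContextMap() and MDC.setContextMap(...) instead of the hand-rolled map.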

Structured messages

  • Native support in Log4j 2.x
  • Can "fake" in logback and log4j 1.x
  • SLF4J extensions EventLogger
Map<String, String> m = new HashMap<>();
m.put("extra_data", "extra_value");
logger.info(new StructuredDataMessage("id", "message", "type", m));

m.put("more_data", "more_value");
logger.info(new MapMessage(m));

Logs may still need secondary parsing

Log event identifiers

  • Local timestamp (UTC, standard format)
  • Host ID
  • Container ID
  • App ID
  • Transaction ID
  • Request ID
  • User ID
  • Tenant/environment ID
  • App version/build
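
A rough sketch of attaching these identifiers to every event: process-level fields are built once, per-request fields are added per event, and the result is rendered as key="value" pairs. The field names (host_id, app_id, ...) and the class EventIdentifiers are illustrative assumptions, not a standard.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class EventIdentifiers {
    /** Identifiers that stay constant for the life of the process. */
    static Map<String, String> processFields(String hostId, String appId, String appVersion) {
        Map<String, String> f = new LinkedHashMap<>();
        f.put("host_id", hostId);
        f.put("app_id", appId);
        f.put("app_version", appVersion);
        return f;
    }

    /** Combines process-level and per-request identifiers into one event. */
    static Map<String, String> event(Map<String, String> processFields, Instant when,
                                     String transactionId, String userId, String message) {
        Map<String, String> e = new LinkedHashMap<>();
        e.put("timestamp", when.toString()); // Instant.toString() is ISO-8601 in UTC
        e.putAll(processFields);
        e.put("transaction_id", transactionId);
        e.put("user_id", userId);
        e.put("message", message);
        return e;
    }

    /** Renders key="value" pairs; embedded quotes are backslash-escaped. */
    static String render(Map<String, String> event) {
        return event.entrySet().stream()
                .map(en -> en.getKey() + "=\"" + en.getValue().replace("\"", "\\\"") + "\"")
                .collect(Collectors.joining(" "));
    }
}
```

In a real application the per-request fields would more likely come from MDC / ThreadContext than from method arguments.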

Separating Signal from Noise

Why?

Logs are the primary telemetry channel from production

There are 2 types of logs

 

  • Stats telemetry
  • Debug telemetry

 

Ideally we would have close to zero debug telemetry

But we don't know in advance what we'll need

Log levels

  • TRACE, DEBUG, INFO, WARN, ERROR

  • + FATAL on Log4j

  • Custom levels on Log4j
  • TRACE, FATAL rarely used

 

Fastest filtering mechanism

A normal (expected) error is not an ERROR

Logger hierarchy

  • Configure only top loggers
  • Inherits minimal level
  • Custom names
// logback, log4j 1.x (through the SLF4J API)
Logger logger = LoggerFactory.getLogger(getClass());
Logger auditLogger = LoggerFactory.getLogger("audit." + getClass().getName());
Logger rootLogger = LoggerFactory.getLogger(Logger.ROOT_LOGGER_NAME);

// log4j 2.x
Logger logger = LogManager.getLogger(getClass());
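
How level inheritance works, roughly: a logger without its own level takes the level of its nearest configured ancestor in the dotted name, falling back to root. This is a simplified model, not the code of any of the libraries; LevelResolver and the package names in the test are made up.

```java
import java.util.HashMap;
import java.util.Map;

final class LevelResolver {
    enum Level { TRACE, DEBUG, INFO, WARN, ERROR }

    private final Map<String, Level> configured = new HashMap<>();
    private final Level rootLevel;

    LevelResolver(Level rootLevel) { this.rootLevel = rootLevel; }

    void configure(String loggerName, Level level) { configured.put(loggerName, level); }

    /** Walks "a.b.c" -> "a.b" -> "a" -> root until a configured level is found. */
    Level effectiveLevel(String loggerName) {
        String name = loggerName;
        while (true) {
            Level level = configured.get(name);
            if (level != null) return level;
            int dot = name.lastIndexOf('.');
            if (dot < 0) return rootLevel;
            name = name.substring(0, dot);
        }
    }

    /** An event passes if its level is at or above the effective level. */
    boolean isEnabled(String loggerName, Level eventLevel) {
        return eventLevel.compareTo(effectiveLevel(loggerName)) >= 0;
    }
}
```

This is why configuring only the top loggers is usually enough: everything below them picks up the setting.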

Tags, Markers

import org.apache.logging.log4j.Logger;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Marker;
import org.apache.logging.log4j.MarkerManager;
import java.util.Map;
 
public class MyApp {
 
    private Logger logger = LogManager.getLogger(MyApp.class.getName());
    private static final Marker SQL_MARKER = MarkerManager.getMarker("SQL");
    private static final Marker UPDATE_MARKER = 
        MarkerManager.getMarker("SQL_UPDATE").setParents(SQL_MARKER);
    private static final Marker QUERY_MARKER = 
        MarkerManager.getMarker("SQL_QUERY").setParents(SQL_MARKER);
 
    public String doQuery(String table) {
        logger.traceEntry("{}", table);
        logger.debug(QUERY_MARKER, "SELECT * FROM {}", table);
        ...
        return logger.traceExit(ret);
    }
 
    public String doUpdate(String table, Map<String, String> params) {
        logger.traceEntry("{} {}", table, params);
        logger.debug(UPDATE_MARKER, "UPDATE {} SET {}", table, formatCols());
        ...
        return logger.traceExit(ret);
    }

}

Flow tracing

  • .traceEntry, .traceExit, .catching, .throwing (.entry and .exit in older 2.x releases)
  • TRACE log level with markers
  • markers inherit from FLOW
public int exampleException(String arg) {
    logger.traceEntry(arg);
    int n = 0;
    try {
        String msg = messages[messages.length];
        n = someFunction(msg); 
        logger.error("An exception should have been thrown");
    } catch (Exception ex) {
        logger.catching(ex);
    }
    return logger.traceExit(n);
}

Probably a bad idea for high performance servers

Log sampling

  • Log 1/N transactions
  • Get same transaction for all events
ThreadContext.put("userFraction", String.valueOf(ThreadLocalRandom.current().nextInt(100)));
<ThreadContextMapFilter onMatch="ACCEPT" onMismatch="NEUTRAL" operator="or">
   <KeyValuePair key="userFraction" value="1"/>
</ThreadContextMapFilter>

Alternatively, log only userID XXX
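
A variation on the random draw above: hash the transaction ID instead. The decision is then reproducible, so every event of a transaction (and every service that sees the same ID) lands on the same side of the sample. This is a sketch of that alternative, not what the slide's config does; TransactionSampler is a hypothetical helper and CRC32 is just a convenient JDK hash.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class TransactionSampler {
    private final int oneIn;

    /** @param oneIn sample roughly one transaction out of this many */
    TransactionSampler(int oneIn) {
        if (oneIn < 1) throw new IllegalArgumentException("oneIn must be >= 1");
        this.oneIn = oneIn;
    }

    /** The same ID always maps to the same bucket, so the decision is stable. */
    boolean isSampled(String transactionId) {
        CRC32 crc = new CRC32();
        crc.update(transactionId.getBytes(StandardCharsets.UTF_8));
        return crc.getValue() % oneIn == 0; // getValue() is a non-negative long
    }
}
```

The result could be stored once in ThreadContext at transaction start (for example as "sampled" = "true") and matched by a ThreadContextMapFilter like the one above.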

Log sampling

  • Finer logs of specific transactions
  • Change log level dynamically
ThreadContext.put("userId", requestContext.getUserID());
// trace suspicious requests
if (requestContext.isTrace)
  ThreadContext.put("trace", "true");
<DynamicThresholdFilter key="userId" defaultThreshold="ERROR"
       onMatch="ACCEPT" onMismatch="NEUTRAL">
   <KeyValuePair key="avishai" value="DEBUG"/>
</DynamicThresholdFilter>

<DynamicThresholdFilter key="trace" defaultThreshold="ERROR" 
       onMatch="ACCEPT" onMismatch="NEUTRAL">
   <KeyValuePair key="true" value="TRACE"/>
</DynamicThresholdFilter>
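
The decision these filters make can be modeled in a few lines: look up the context value for the key, pick the threshold mapped to that value (or the default), and accept the event if its level reaches the threshold. This is my reading of the behavior with onMatch="ACCEPT" and onMismatch="NEUTRAL", written as a standalone model; DynamicThreshold is not the Log4j class.

```java
import java.util.Map;

final class DynamicThreshold {
    enum Level { TRACE, DEBUG, INFO, WARN, ERROR }
    enum Result { ACCEPT, NEUTRAL }

    private final String key;
    private final Level defaultThreshold;
    private final Map<String, Level> thresholdByValue;

    DynamicThreshold(String key, Level defaultThreshold, Map<String, Level> thresholdByValue) {
        this.key = key;
        this.defaultThreshold = defaultThreshold;
        this.thresholdByValue = thresholdByValue;
    }

    /** context plays the role of the ThreadContext map for the current thread. */
    Result filter(Map<String, String> context, Level eventLevel) {
        String value = context.get(key);
        Level threshold = value == null
                ? defaultThreshold
                : thresholdByValue.getOrDefault(value, defaultThreshold);
        // NEUTRAL hands the event on to the normal logger-level check
        return eventLevel.compareTo(threshold) >= 0 ? Result.ACCEPT : Result.NEUTRAL;
    }
}
```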

Retroactive logging

  • Log debug to ringbuffer
  • Drop if success
  • Flush to logger if fail
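// Commons Collections 3.x ring buffer: keeps only the newest 1000 entries
private final CircularFifoBuffer logBuffer = new CircularFifoBuffer(1000);

void someMethod() {
    logBuffer.add("some message");
    // The cast lets a lambda be stored as an Object; it is evaluated only on flush
    logBuffer.add((Supplier<String>) () -> "lazy debug message");
    
    try {
        ...
    } catch (Exception ex) {
        flushBuffer(logBuffer);
    }
    logBuffer.clear();
}

void flushBuffer(CircularFifoBuffer buffer) {
    for (Object o : buffer) {
        // Resolve lazy messages (java.util.function.Supplier) at flush time
        logger.debug(o instanceof Supplier ? ((Supplier<?>) o).get() : o);
    }
}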
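
A JDK-only take on the same idea, for readers without Commons Collections: debug messages go into a bounded deque as suppliers and are emitted only when the wrapped transaction throws. RetroactiveLog and its sink parameter are hypothetical; a real version would hand messages to an actual logger and needs one instance per transaction, since it is not thread safe.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;
import java.util.function.Supplier;

final class RetroactiveLog {
    private final int capacity;
    private final Deque<Supplier<String>> buffer = new ArrayDeque<>();
    private final Consumer<String> sink;

    RetroactiveLog(int capacity, Consumer<String> sink) {
        if (capacity < 1) throw new IllegalArgumentException("capacity must be >= 1");
        this.capacity = capacity;
        this.sink = sink;
    }

    /** Buffers a lazy message; when full, the oldest entry is overwritten. */
    void debug(Supplier<String> message) {
        if (buffer.size() == capacity) buffer.removeFirst();
        buffer.addLast(message);
    }

    /** Runs the transaction; buffered messages are emitted only if it throws. */
    void run(Runnable transaction) {
        try {
            transaction.run();
        } catch (RuntimeException ex) {
            for (Supplier<String> m : buffer) sink.accept(m.get());
            sink.accept("failed: " + ex);
            throw ex;
        } finally {
            buffer.clear(); // success or failure, start the next transaction empty
        }
    }
}
```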

Production grade logging

Log rotation

  • On disk logs are a buffer
  • Use size-based rotation
  • Prefer a separate partition

Serialized Logs

  • Faster than message formatters
  • Structured data, no parsing
  • Easier to integrate with pipelines

 

Common: GELF, json, msgpack, thrift
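
To make "structured data, no parsing" concrete, here is a minimal sketch that writes one event as a single JSON line. It is hand-rolled only to stay dependency free; in practice a JSON layout or encoder from the logging library would do this. JsonLine is a hypothetical name.

```java
import java.util.Map;

final class JsonLine {
    /** Escapes the characters JSON does not allow raw inside a string. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Serializes the event as one JSON object, keeping the map's iteration order. */
    static String toJson(Map<String, String> event) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : event.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append('}').toString();
    }
}
```

Because newlines are escaped, a multi-line message such as a stack trace still occupies exactly one line in the log file.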

Async appenders

Huge performance boost, fault tolerance

IMPORTANT: properly shut down the logging subsystem

Handling load

AsyncAppender

  • Wraps another appender

  • Events placed in queue and consumed by worker thread

  • Logback - Drop TRACE, DEBUG and INFO when 80% full

  • Log4j2 - Fail over to secondary error appender
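
The queue-and-worker idea, with Logback-style shedding at 80%, can be sketched on the JDK alone. AsyncSink is a toy model, not either library's appender: here a full queue drops even WARN and ERROR rather than blocking, which is a choice made for the sketch. close() shows why shutdown matters: without it, queued events are lost when the JVM exits.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

final class AsyncSink implements AutoCloseable {
    enum Level { TRACE, DEBUG, INFO, WARN, ERROR }

    private static final String STOP = new String("stop"); // sentinel, compared by identity
    private final BlockingQueue<String> queue;
    private final int capacity;
    private final Thread worker;

    AsyncSink(int capacity, Consumer<String> delegate) {
        this.capacity = capacity;
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.worker = new Thread(() -> {
            try {
                for (String m = queue.take(); m != STOP; m = queue.take()) {
                    delegate.accept(m); // the wrapped, possibly slow, appender
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "async-log-worker");
        worker.setDaemon(true);
        worker.start();
    }

    /** TRACE, DEBUG and INFO are shed once the queue is at least 80% full. */
    static boolean shouldDrop(Level level, int queued, int capacity) {
        return level.compareTo(Level.WARN) < 0 && queued >= capacity * 8 / 10;
    }

    /** Never blocks the caller; returns false if the event was dropped. */
    boolean append(Level level, String message) {
        if (shouldDrop(level, queue.size(), capacity)) return false;
        return queue.offer(message);
    }

    /** Lets the worker drain everything queued so far, then stops it. */
    @Override
    public void close() {
        try {
            queue.put(STOP);
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```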

BurstFilter (log4j 2.x)

  • Drop events when rate exceeded
  • Allow bursts
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="warn" name="MyApp" packages="">
  <Appenders>
    <RollingFile name="RollingFile" fileName="logs/debug.log"
                 filePattern="logs/debug-%d{MM-dd-yyyy}.log.gz">
      <BurstFilter level="INFO" rate="100" maxBurst="1000"/>
      <PatternLayout>
        <pattern>%d %p %c{1.} [%t] %m%n</pattern>
      </PatternLayout>
      <TimeBasedTriggeringPolicy />
    </RollingFile>
  </Appenders>
  <Loggers>
    <Root level="debug">
      <AppenderRef ref="RollingFile"/>
    </Root>
  </Loggers>
</Configuration>
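
"Drop events when rate exceeded, allow bursts" is essentially a token bucket. The sketch below models it with the same numbers as the config (rate 100 per second, burst of 1000); the actual BurstFilter implementation may differ in detail, and as far as I know it only throttles events at or below the configured level. BurstLimiter is a hypothetical class, and time is passed in explicitly to keep it testable.

```java
final class BurstLimiter {
    private final double ratePerSecond;
    private final double maxBurst;
    private double tokens;
    private long lastNanos;

    BurstLimiter(double ratePerSecond, double maxBurst, long nowNanos) {
        this.ratePerSecond = ratePerSecond;
        this.maxBurst = maxBurst;
        this.tokens = maxBurst; // start full so an initial burst is allowed
        this.lastNanos = nowNanos;
    }

    /** Returns true if an event may be logged at time nowNanos (e.g. System.nanoTime()). */
    synchronized boolean tryAcquire(long nowNanos) {
        double elapsedSeconds = (nowNanos - lastNanos) / 1_000_000_000.0;
        // Refill at the steady rate, but never beyond the burst size
        tokens = Math.min(maxBurst, tokens + elapsedSeconds * ratePerSecond);
        lastNanos = nowNanos;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false; // rate exceeded: drop the event
    }
}
```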

IsDebugEnabled

if (logger.isDebugEnabled()) {
    logger.debug("some debug message, value: " + object.toString());
} 

What's wrong with this picture?

When DEBUG is enabled: 2 function calls per log message (isDebugEnabled(), then debug(), which checks the level again)

Better*:

logger.debug("some debug message, value: {}", object);

// or with log4j 2.x and java 8
logger.debug("some debug message, value: {}", () -> object.expensiveMethod());
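
Why passing a lambda helps: the level check happens once, inside the logger, and the expensive call runs only if the event will actually be written. A toy logger that shows the effect; LazyLogger is illustrative and not the Log4j 2 API.

```java
import java.util.function.Supplier;

final class LazyLogger {
    private final boolean debugEnabled;
    private final StringBuilder out = new StringBuilder();

    LazyLogger(boolean debugEnabled) { this.debugEnabled = debugEnabled; }

    /** The supplier is invoked only when DEBUG is enabled. */
    void debug(String prefix, Supplier<?> value) {
        if (debugEnabled) {
            out.append(prefix).append(value.get()).append('\n');
        }
    }

    String output() { return out.toString(); }
}
```

With the string-concatenation form, the argument is built before debug() is even called, which is the cost the isDebugEnabled() guard was trying to avoid.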

Runtime reconfig

  • Auto config file reload
  • JMX (log4j 2.x)
  • Programmatic

Questions?

jobs@wix.com

Happy logging!
