Finding a Needle in a Haystack
JVM logging guide
Avishai Ish-Shalom (@nukemberg)
Software Fairy @ Wix.com
Logging is easy, right?
Only two log levels
- Too much bullshit
- I'm f*cking blind
Information theory protip:
Different log events do not have the same value
Hard to use
- Hard to parse
- Hard to correlate
- No context
Performance & stability issues
- Too many logs can kill performance
- Clogged logger can block application
- Resource consumption
Robots need love too
What do we want from a good logging library?
- Structured logs
- Integrate with pipeline
- Be fast
- Graceful degradation
- Add context
- Elaborate event filtering
Agenda
- Logging libraries overview
- Structure & context
- Separating signal from noise
- Production grade logging
But before we start
Define "log"
JVM Logging
- JUL
- Log4j 1.x
- Logback
- Log4j 2
- tinylog
- slf4j
- commons-logging
JUL
Please tell me you're not using it*
Log4j 1.x
Performance | Meh |
Extensions | Porn grade |
Features | Essentials + a little extra |
Degradation | what, u don't like blocking? |
Mature, battle tested
Logback
Performance | Good |
Extensions | Porn grade |
Features | Carnival |
Degradation | Good |
Mature, battle tested, featureful
Log4j 2.x
Performance | Insane |
Extensions | HBO |
Features | Carnival |
Degradation | Very good |
Complete redesign, featureful
Most awesome logger yet
TinyLog
Performance | batshit crazy |
Extensions | What extensions? |
Features | This is Sparta |
Degradation | Very good |
The logger no one knows about
Fast, minimal
SLF4J, commons-logging
- Abstraction
- Lowest common denominator
- Designed for use in libraries
TLDR: use SLF4J for libraries
Structure & Context
MDC (logback)
ThreadContext (log4j2)
- Thread local
- Set once at transaction start, clear at end
- Strings only
- Useful for user ID, transaction ID, etc
- Not passed to thread pools
- Hard to use with async frameworks (e.g. Akka)
// log4j 2.x
ThreadContext.put("transactionID", UUID.randomUUID().toString());
// log4j 1.x, logback
MDC.put("userID", userID);
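"Not passed to thread pools" is usually worked around by capturing the context when a task is submitted and restoring it inside the worker. A minimal sketch, assuming a hypothetical `Ctx` class (a plain `ThreadLocal` map standing in for MDC / ThreadContext so it runs without a logging library). With SLF4J the capture and restore steps would likely map to `MDC.getCopyOfContextMap()` and `MDC.setContextMap(...)`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for MDC / ThreadContext: a per-thread map of strings.
class Ctx {
    private static final ThreadLocal<Map<String, String>> MAP =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { MAP.get().put(key, value); }
    static String get(String key) { return MAP.get().get(key); }
    static Map<String, String> copy() { return new HashMap<>(MAP.get()); }
    static void restore(Map<String, String> saved) { MAP.set(new HashMap<>(saved)); }
    static void clear() { MAP.remove(); }

    // Capture the submitting thread's context now, install it in the worker later.
    static <T> Callable<T> wrap(Callable<T> task) {
        Map<String, String> captured = copy();
        return () -> {
            Map<String, String> previous = copy();
            restore(captured);
            try {
                return task.call();
            } finally {
                restore(previous); // do not leak context into the next pooled task
            }
        };
    }
}

class ContextPropagationDemo {
    // Returns what a pooled worker sees for "transactionID", with or without wrapping.
    static String seenByWorker(boolean wrapped) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Ctx.put("transactionID", "tx-42");
            Callable<String> task = () -> Ctx.get("transactionID");
            Callable<String> toRun = wrapped ? Ctx.wrap(task) : task;
            return pool.submit(toRun).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            Ctx.clear();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("plain task:   " + seenByWorker(false));
        System.out.println("wrapped task: " + seenByWorker(true));
    }
}
```

The plain task sees no transaction ID because the worker thread has its own empty map; the wrapped task sees the submitter's value.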
Structured messages
- Native support in Log4j 2.x
- Can "fake" in logback and log4j 1.x
- SLF4J extensions: EventLogger
Map<String, String> m = new HashMap<>();
m.put("extra_data", "extra_value");
logger.info(new StructuredDataMessage("id", "message", "type", m));
m.put("more_data", "more_value");
logger.info(new MapMessage(m));
Logs may still need secondary parsing
Log event identifiers
- Local timestamp (UTC, standard format)
- Host ID
- container ID
- App ID
- Transaction ID
- Request ID
- User ID
- Tenant/environment ID
- App version/build
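One way to attach these identifiers to every event, as a rough sketch: keep the per-process IDs in one map, the per-request IDs in another, and merge both into each event. The class name, keys and values here are made up for illustration, and the `key="value"` rendering does no escaping.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class EventIds {
    // Identifiers that never change during the process lifetime (values are made up).
    static Map<String, String> processIds() {
        Map<String, String> ids = new LinkedHashMap<>();
        ids.put("host", "web-17");
        ids.put("app", "checkout");
        ids.put("version", "1.4.2");
        return ids;
    }

    // Merge timestamp, process IDs and request IDs into one key="value" line.
    static String render(Instant ts, Map<String, String> processIds,
                         Map<String, String> requestIds, String message) {
        Map<String, String> event = new LinkedHashMap<>();
        event.put("ts", ts.toString()); // Instant.toString() is ISO-8601 in UTC
        event.putAll(processIds);
        event.putAll(requestIds);
        event.put("msg", message);
        return event.entrySet().stream()
                .map(e -> e.getKey() + "=\"" + e.getValue() + "\"")
                .collect(Collectors.joining(" "));
    }
}
```

In a real setup the request-scoped IDs would normally come from MDC / ThreadContext and the rendering from the layout, not from application code.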
Separating Signal from Noise
Why?
Logs are the primary telemetry channel from production
There are 2 types of logs
- Stats telemetry
- Debug telemetry
Ideally we would have close to zero debug telemetry
But we don't know in advance what we'll need
Log levels
- TRACE, DEBUG, INFO, WARN, ERROR
- + FATAL on Log4j
- Custom levels on Log4j
- TRACE, FATAL rarely used
Fastest filtering mechanism
A normal error is not
Logger hierarchy
- Configure only top loggers
- Inherits minimal level
- Custom names
// logback, log4j 1.x
Logger logger = LoggerFactory.getLogger(getClass());
Logger auditLogger = LoggerFactory.getLogger("audit." + getClass().getName());
Logger rootLogger = LoggerFactory.getLogger(Logger.ROOT_LOGGER_NAME);
// log4j 2.x
Logger logger = LogManager.getLogger(getClass());
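Level inheritance can be pictured as a walk up the dotted logger name until a configured ancestor is found. A simplified sketch (the `LoggerTree` class is hypothetical, not any library's API, and levels are plain strings):

```java
import java.util.HashMap;
import java.util.Map;

class LoggerTree {
    // Explicitly configured levels; "" is the root logger.
    private final Map<String, String> configured = new HashMap<>();

    LoggerTree(String rootLevel) { configured.put("", rootLevel); }

    void setLevel(String loggerName, String level) { configured.put(loggerName, level); }

    // Walk from the full name towards the root until a configured level is found.
    String effectiveLevel(String loggerName) {
        String name = loggerName;
        while (!configured.containsKey(name)) {
            int dot = name.lastIndexOf('.');
            name = dot < 0 ? "" : name.substring(0, dot);
        }
        return configured.get(name);
    }
}
```

Configuring `com.example.db` once is enough for every logger below it, which is why only the top loggers need explicit configuration.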
Tags, Markers
import org.apache.logging.log4j.Logger;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Marker;
import org.apache.logging.log4j.MarkerManager;
import java.util.Map;

public class MyApp {
    private Logger logger = LogManager.getLogger(MyApp.class.getName());
    // SQL is the parent marker: a filter on SQL also matches SQL_UPDATE and SQL_QUERY
    private static final Marker SQL_MARKER = MarkerManager.getMarker("SQL");
    private static final Marker UPDATE_MARKER =
        MarkerManager.getMarker("SQL_UPDATE").setParents(SQL_MARKER);
    private static final Marker QUERY_MARKER =
        MarkerManager.getMarker("SQL_QUERY").setParents(SQL_MARKER);

    public String doQuery(String table) {
        logger.traceEntry("table={}", table);
        logger.debug(QUERY_MARKER, "SELECT * FROM {}", table);
        ...
        return logger.traceExit(ret);
    }

    public String doUpdate(String table, Map<String, String> params) {
        logger.traceEntry("table={}, params={}", table, params);
        logger.debug(UPDATE_MARKER, "UPDATE {} SET {}", table, formatCols());
        ...
        return logger.traceExit(ret);
    }
}
Flow tracing
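- .traceEntry, .traceExit, .catching, .throwing (.entry, .exit in older 2.x releases)
- Entry and exit are logged at TRACE level with markers
- ENTER and EXIT markers inherit from FLOW; THROWING and CATCHING from EXCEPTION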
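public int exampleException(String arg) {
    logger.traceEntry("arg={}", arg);
    int n = 0; // returned unchanged if the exception below is thrown
    try {
        // deliberate bug: the index is one past the end, so this line throws
        String msg = messages[messages.length];
        n = someFunction(msg);
        logger.error("An exception should have been thrown");
    } catch (Exception ex) {
        logger.catching(ex);
    }
    return logger.traceExit(n);
}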
Probably a bad idea for high performance servers
Log sampling
- Log 1/N transactions
- Get same transaction for all events
ThreadContext.put("userFraction", String.valueOf(ThreadLocalRandom.current().nextInt(100)));
<ThreadContextMapFilter onMatch="ACCEPT" onMismatch="NEUTRAL" operator="or">
<KeyValuePair key="userFraction" value="1"/>
</ThreadContextMapFilter>
Alternatively, log only userID XXX
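To "get same transaction for all events" without storing a random number, the sampling decision can be derived from the transaction ID itself. A minimal sketch, assuming a hypothetical `TransactionSampler` class and using `String.hashCode()` as the hash (good enough for illustration, not a carefully chosen hash):

```java
class TransactionSampler {
    private final int oneIn;

    TransactionSampler(int oneIn) {
        if (oneIn <= 0) throw new IllegalArgumentException("oneIn must be positive");
        this.oneIn = oneIn;
    }

    // The same transaction ID always lands in the same bucket, so a sampled
    // transaction keeps all of its events and an unsampled one keeps none.
    boolean isSampled(String transactionId) {
        return Math.floorMod(transactionId.hashCode(), oneIn) == 0;
    }
}
```

Because the decision is deterministic, every service that sees the same transaction ID makes the same choice, with no coordination needed.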
Log sampling
- Finer logs of specific transactions
- Change log level dynamically
ThreadContext.put("userId", requestContext.getUserID());
// trace suspicious requests
if (requestContext.isTrace)
ThreadContext.put("trace", "true");
<DynamicThresholdFilter key="userId" defaultThreshold="ERROR"
onMatch="ACCEPT" onMismatch="NEUTRAL">
<KeyValuePair key="avishai" value="DEBUG"/>
</DynamicThresholdFilter>
<DynamicThresholdFilter key="trace" defaultThreshold="ERROR"
onMatch="ACCEPT" onMismatch="NEUTRAL">
<KeyValuePair key="true" value="TRACE"/>
</DynamicThresholdFilter>
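The decision such a filter makes can be sketched in a few lines: look up the context value, pick the threshold configured for it (or the default), and compare. This is a simplified model with a hypothetical `DynamicThreshold` class; the real filter returns onMatch / onMismatch results instead of a boolean.

```java
import java.util.HashMap;
import java.util.Map;

class DynamicThreshold {
    enum Level { TRACE, DEBUG, INFO, WARN, ERROR } // ordered least to most severe

    private final String key;
    private final Level defaultThreshold;
    private final Map<String, Level> perValue;

    DynamicThreshold(String key, Level defaultThreshold, Map<String, Level> perValue) {
        this.key = key;
        this.defaultThreshold = defaultThreshold;
        this.perValue = new HashMap<>(perValue); // HashMap tolerates a null lookup key
    }

    // Accept the event if it is at least as severe as the threshold for this context.
    boolean accepts(Map<String, String> context, Level eventLevel) {
        Level threshold = perValue.getOrDefault(context.get(key), defaultThreshold);
        return eventLevel.compareTo(threshold) >= 0;
    }
}
```

With key `userId`, default `ERROR` and `avishai -> DEBUG`, only that user's requests produce DEBUG output; everyone else stays at ERROR.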
Retroactive logging
- Log debug to ringbuffer
- Drop if success
- Flush to logger if fail
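// ring buffer from commons-collections: keeps only the newest 1000 entries
private final CircularFifoBuffer logBuffer = new CircularFifoBuffer(1000);

void someMethod() {
    logBuffer.add("some message");
    // a bare lambda has no target type here, so cast it to java.util.function.Supplier
    logBuffer.add((Supplier<String>) () -> "lazy debug message");
    try {
        ...
    } catch (Exception ex) {
        flushBuffer(logBuffer); // failure: write out the buffered debug events
    } finally {
        logBuffer.clear(); // success or failure: start the next call clean
    }
}

void flushBuffer(CircularFifoBuffer buffer) {
    for (Object o : buffer) {
        // lazy messages are evaluated only when they are actually written
        logger.debug(o instanceof Supplier ? ((Supplier<?>) o).get() : o);
    }
}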
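A dependency-free variant of the same idea, as a sketch: the `RetroactiveLog` class is hypothetical, a `List<String>` stands in for the real logger, and the exception is swallowed only to keep the example short.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.Supplier;

class RetroactiveLog {
    private final int capacity;
    private final Deque<Supplier<String>> buffer = new ArrayDeque<>();
    private final List<String> sink; // stand-in for the real logger

    RetroactiveLog(int capacity, List<String> sink) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
        this.sink = sink;
    }

    // Buffer a lazy debug message; the oldest entry is overwritten when full.
    void debug(Supplier<String> message) {
        if (buffer.size() == capacity) buffer.removeFirst();
        buffer.addLast(message);
    }

    // Run one unit of work; buffered messages reach the sink only on failure.
    void run(Runnable work) {
        try {
            work.run();
        } catch (RuntimeException ex) {
            for (Supplier<String> m : buffer) sink.add(m.get());
            sink.add("failed: " + ex.getMessage());
        } finally {
            buffer.clear(); // success or failure: drop what is left
        }
    }
}
```

On success nothing is formatted or written at all, so the cost of debug logging is roughly one lambda allocation per message.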
Production grade logging
Log rotation
- On disk logs are a buffer
- Use size-based rotation
- Prefer a separate partition
Serialized Logs
- Faster than message formatters
- Structured data, no parsing
- Easier to integrate with pipelines
Common: GELF, json, msgpack, thrift
Async appenders
huge performance boost, fault tolerance
IMPORTANT: shut down the logging subsystem properly, or queued events are lost
Handling load
AsyncAppender
- Wraps another appender
- Events placed in queue and consumed by worker thread
- Logback - Drop TRACE, DEBUG and INFO when 80% full
- Log4j2 - Fail over to secondary error appender
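The "drop low levels when nearly full" policy is easy to model with a bounded queue. A rough sketch with a hypothetical `LossyAsyncQueue` class; unlike a real async appender it has no worker thread and it never blocks the caller, even for WARN and ERROR.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class LossyAsyncQueue {
    enum Level { TRACE, DEBUG, INFO, WARN, ERROR } // ordered least to most severe

    private final BlockingQueue<String> queue;
    private final int discardBelowRemaining;

    LossyAsyncQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.discardBelowRemaining = capacity / 5; // less than 20% free: roughly 80% full
    }

    // Never blocks the caller: returns false when the event was dropped.
    boolean offer(Level level, String message) {
        boolean nearlyFull = queue.remainingCapacity() < discardBelowRemaining;
        if (nearlyFull && level.compareTo(Level.INFO) <= 0) {
            return false; // shed low-value events first
        }
        return queue.offer(message); // false if the queue is completely full
    }

    String poll() { return queue.poll(); } // what the worker thread would consume

    int size() { return queue.size(); }
}
```

The point of the design is that the application thread pays at most a queue insert, and the events lost under load are the ones least likely to matter.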
BurstFilter (log4j 2.x)
- Drop events when rate exceeded
- Allow bursts
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="warn" name="MyApp" packages="">
<Appenders>
<RollingFile name="RollingFile" fileName="logs/debug.log"
filePattern="logs/debug-%d{MM-dd-yyyy}.log.gz">
<BurstFilter level="INFO" rate="100" maxBurst="1000"/>
<PatternLayout>
<pattern>%d %p %c{1.} [%t] %m%n</pattern>
</PatternLayout>
<TimeBasedTriggeringPolicy />
</RollingFile>
</Appenders>
<Loggers>
<Root level="debug">
<AppenderRef ref="RollingFile"/>
</Root>
</Loggers>
</Configuration>
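"Drop events when rate exceeded, allow bursts" is the behaviour of a token bucket. A minimal sketch of that technique, not the BurstFilter source: the `TokenBucket` class is hypothetical, levels are ignored, and the clock is passed in so the behaviour is easy to test.

```java
class TokenBucket {
    private final double ratePerSecond;
    private final double maxBurst;
    private double tokens;
    private long lastNanos;

    TokenBucket(double ratePerSecond, double maxBurst, long nowNanos) {
        this.ratePerSecond = ratePerSecond;
        this.maxBurst = maxBurst;
        this.tokens = maxBurst; // start full so an initial burst is allowed
        this.lastNanos = nowNanos;
    }

    // The caller passes the clock (e.g. System.nanoTime()).
    synchronized boolean tryAcquire(long nowNanos) {
        double elapsedSeconds = (nowNanos - lastNanos) / 1e9;
        tokens = Math.min(maxBurst, tokens + elapsedSeconds * ratePerSecond);
        lastNanos = nowNanos;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true; // within budget: let the event through
        }
        return false; // over the rate: drop the event
    }
}
```

With `rate=100` and `maxBurst=1000`, up to 1000 events pass immediately, after which roughly 100 per second get through until the bucket refills.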
IsDebugEnabled
if (logger.isDebugEnabled()) {
logger.debug("some debug message, value: " + object.toString());
}
What's wrong with this picture?
When DEBUG is enabled: 2 function calls per log message
Better*:
logger.debug("some debug message, value: {}", object);
// or with log4j 2.x and java 8
logger.debug("some debug message, value: {}", () -> object.expensiveMethod());
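Why the lambda form is cheap when DEBUG is off, shown with a stand-in: the supplier is only invoked after the level check passes. `LazyLogDemo` and its `debug` method are hypothetical and only mimic the shape of a supplier-accepting logger call.

```java
import java.util.function.Supplier;

class LazyLogDemo {
    static int expensiveCalls = 0;

    // Pretend this walks a large object graph to build its result.
    static String expensive() {
        expensiveCalls++;
        return "big dump";
    }

    // Stand-in for a supplier-based debug call: the supplier runs only if enabled.
    static String debug(boolean debugEnabled, String prefix, Supplier<String> value) {
        return debugEnabled ? prefix + value.get() : null;
    }
}
```

With string concatenation the expensive call happens before the logger is even entered, which is what the explicit isDebugEnabled guard was protecting against.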
Runtime reconfig
- Auto config file reload
- JMX (log4j 2.x)
- Programmatic
Questions?
jobs@wix.com
Happy logging!
jobs@wix.com