Log Level Per Request

For applications in production, setting breakpoints is usually not a viable option. Logging is the preferred approach to troubleshoot problems in production. For runtime efficiency, storage and analysis we often want to log only the most important messages. However, when an error occurs, a more detailed log is needed to find the root cause. But turning on full debug or trace level logging can result in huge logs and slow down analysis.

The idea of log level per request is to increase the level of logging only for that request that's causing the error. Executions triggered by other requests will continue to be logged in the default manner. This approach is particularly useful for microservices architecture where a single request can involve multiple microservices, all of which are possibly handling hundreds of requests per second.

Discussion

  • Why is log level per request important for microservices?
    A single request may involve many microservices. Source: Arsov 2017.
    A single request may involve many microservices. Source: Arsov 2017.

    Complex large-scale applications are moving away from monolithic architecture to microservices architecture. But failures are hard to troubleshoot in microservices. A single HTTP request can trigger a sequence of calls and responses across many microservices. This distributed nature of microservices makes it hard to debug problems such as malformed data. It's harder to track down performance issues.

    In monolithic applications, it's easy to put breakpoints and follow the sequence of function calls to understand the problem, at least in a development environment. This isn't possible for microservices, particularly in production. Log level per request therefore helps us obtain detailed logs pertaining to a particular problem. In general, logging, monitoring and measuring comes under instrumentation.

    For example, a microservice may be called in dozens of user scenarios only one of which is causing an error. Detailed debug logging can be turned on for this single scenario. We avoid overwhelming the system with debug level logging for all other high-level user requests.

  • What should I expect from a good implementation of log level per request?

    To change log level on a per request basis, you shouldn't be asked to recompile the application. It should be possible to set the log level easily for different environments: development, production, user acceptance testing, non-functional testing, etc. Log level should be configurable for each application or component, and independent of event-raising code.

    It's recommended that you log entry and exit of functions. Log enough to set the context, including name of function, user, or even IP address. Logs should be complete in the sense that the information should help troubleshoot a problem. Past incident reports can help you decide what, where and how much to log. In time, you can even develop logging patterns. Your development pipeline should catch code where logs are missing. Deployment may flag functions that are not logging.

    In general, for microservices logging, centralize the logging from different hosts. Log structured data, such as in JSON format. Log asynchronously. Make logs searchable, for instance, by using ELK Stack.

  • What are some possible techniques for implementing log level per request?
    Use of correlation IDs in AWS. Source: Cui 2017.
    Use of correlation IDs in AWS. Source: Cui 2017.

    One suggested approach is to use a Correlation ID, passed from service to service. This ID is unique to a particular transaction or a high-level request. With this ID, we can follow the execution journey of a transaction in the logs. Moreover, we can set the desired log level for this transaction.

    To change log levels without compiling code, we could have a definitive list of all events/requests and a configurable mapping between these and the log levels. Such a mapping could be in an XML file.

    A correlation ID can be used whenever we wish to group together errors due to a request. Some examples of grouping are scheduled or background tasks; synchronized operations; all activities pertaining to a single request.

    A correlation ID can be generated by an "edge" microservice that then triggers internal microservices. In addition, any microservice that doesn't see an ID can generate one for subsequent tracing. In fact, an API Gateway in AWS generates these IDs that are passed along in HTTP headers. Preferably, the ID is assigned as early as possible.

  • Isn't log level per request same as event tracing?

    They are related but not the same. Event tracing is about tracing an event and its effect across systems and services. It's supported by tools such as Sentry. Not just events or requests, even messages in a queue can carry the trace ID. However, traditionally event tracing was not really about dynamically setting the log level on a per request basis. It was about correlating log entries based on a trace ID.

  • Could you name some tools that support log level per request?

    Tracing frameworks might support this feature. Some of them may be based on the OpenTracing standard. Zipkin, Datadog, and Dynatrace are tools that accept OpenTracing format.

    Logging frameworks are tied to languages: Log4j (Java), NLog (.NET), Node-Loggly (Node), etc. Select a framework based on code cleanliness, performance and familiarity.

    If using Spring Cloud for implementing microservices, use Spring Sleuth along with Zipkin for using correlation IDs. You can use Spring Boot admin to change log levels dynamically.

    In .NET framework, namespace System.Diagnostics has the property CorrelationManager.ActivityId. This is set per request. Thus, all errors thrown during one request will have the same ID.

    ModSecurity, a web application firewall, supports log level per request and also allows us to change the level at runtime.

Milestones

2010
Debug and trace levels incur exponentially more processing time. Source: Pardal 2010, fig. 16.
Debug and trace levels incur exponentially more processing time. Source: Pardal 2010, fig. 16.

At the University of Lisbon, researchers extend a framework called STEP for monitoring and performance analysis. STEP is a Java-based enterprise application framework created for educational purpose. They use Apache Log4j for logging and note that debug and trace levels are impractical for production environments. They propose activating such logging for a subset of request such as requests from a specific user.

2012

Matthew Skelton writes about tuning logging levels in production without recompiling code. He notes that often logging API suffer from leaky abstraction because application selects the log level at the point of calling. Thus, we would need to recompile to effect new logging levels. Apache log4net allows change of log levels without recompiling but it can't do this for specific event types.

Jul
2014

Apache Log4j 2 (version 2.0) is released. Using ThreadContext and DynamicThresholdFilter, we can control logging level on a per-request basis. Therefore, we can "debug batch jobs which have a certain flag, WebSocket connections or any other thread-based processing". This is in addition to debugging HTTP requests by setting a flag such as x-debug-enabled: true in the HTTP header.

Dec
2016
Correlation ID is passed across multiple services. Source: Rapid7 2016.
Correlation ID is passed across multiple services. Source: Rapid7 2016.

A Rapid7 blog post notes that the use of correlation IDs is a well-known enterprise integration pattern called Correlation Pattern. Such an ID is a non-standard HTTP header and it's part of Java Messaging Service (JMS). An example is a travel reservation system. Once an ID is assigned to a reservation request, the same ID is passed to hotel, car rental and airline systems/services. Timestamps are not enough in logs and a correlation ID "provides the glue that is needed to create a coherent, understandable audit of events".

Jan
2017

One developer named Reik shows how to use Twitter's Finagle framework to dynamically set per-request log level. The idea is to use Finagle's "broadcast context to transport a log level override through an RPC graph". HTTP headers are used and Finagle filters can read them.

May
2018

ThoughtWords includes log level per request in its quarterly Technology Radar. It notes its relevance towards debugging microservices.

References

  1. Apache Logging Services. 2019. "Changes." Apache Log4j 2. Accessed 2019-10-03.
  2. Arsov, Kristijan. 2017. "What Are Microservices, Actually?" DZone, May 17. Accessed 2019-10-02.
  3. Cui, Yan. 2017. "Capture and forward correlation IDs through different Lambda event sources." The Burning Monk, September 04. Accessed 2019-10-02.
  4. Gómez, Maria. 2019. "From Monolith to Observable Microservices Using DDD." QCon, Sao Paulo, August 15. Accessed 2019-10-24.
  5. Heusingfeld, Alexander. 2015. "Per request debugging with Log4j 2 filters." INNOQ, May 08. Accessed 2019-10-02.
  6. Holtermann, Markus. 2019. "Logging Rethought." April 12. Accessed 2019-10-02.
  7. Janapati, Siva Prasad Rao. 2017. "Distributed Logging Architecture for Microservices." DZone, August 01. Accessed 2019-10-02.
  8. Pardal, Miguel L. 2010. "Performance assessment of the STEP Framework." Technical University of Lisbon, via SemanticScholar. Accessed 2019-10-02.
  9. Progress Sitefinity Docs. 2019. "Enable debugging and tracing of applications." Sitefinity CMS Documentation, Version 12.1. Accessed 2019-10-02.
  10. Rapid7. 2016. "The Value of Correlation IDs." Blog, Rapid7, December 23. Accessed 2019-10-02.
  11. Reik. 2017. "Dynamic Per-Request Log Level." Java Code Splitter, January. Accessed 2019-10-02.
  12. Ristić, Ivan. 2018. "Chapter 4: Logging." ModSecurity Handbook: Getting Started, May 07. Accessed 2019-10-02.
  13. Skelton, Matthew. 2012. "Tune logging levels in Production without recompiling code." Blog, December 05. Accessed 2019-10-02.
  14. Swersky, David. 2018. "Best practices for tracing and debugging microservices [2019]." Blog, Raygun, October 11. Updated 2019-01-01. Accessed 2019-10-02.
  15. ThoughtWorks. 2018. "Log level per request." Techniques, Technology Radar, ThoughtWorks, May. Accessed 2019-10-02.

Further Reading

  1. Swersky, David. 2018. "Best practices for tracing and debugging microservices [2019]." Blog, Raygun, October 11. Updated 2019-01-01. Accessed 2019-10-02.
  2. Cui, Yan. 2017. "Capture and forward correlation IDs through different Lambda event sources." The Burning Monk, September 04. Accessed 2019-10-02.
  3. Reik. 2017. "Dynamic Per-Request Log Level." Java Code Splitter, January. Accessed 2019-10-02.
  4. Elliot, Steve. 2018. "Dynamic Logging - Log Level Per Request." Tegud.net, May 16. Accessed 2019-10-02.
  5. Yakimov, Ivan. 2019. "Log Level Per Request." CodeProject, February 01. Accessed 2019-10-02.

Article Stats

Author-wise Stats for Article Edits

Author
No. of Edits
No. of Chats
DevCoins
2
0
1123
2
0
20
1270
Words
3
Likes
5389
Hits

Cite As

Devopedia. 2019. "Log Level Per Request." Version 4, October 24. Accessed 2024-06-25. https://devopedia.org/log-level-per-request
Contributed by
2 authors


Last updated on
2019-10-24 13:36:08
  • Distributed Tracing
  • OpenTracing
  • Event Correlation
  • Event Streaming
  • Microservices
  • Log Analytics