Improving stdlib integration #395

Tinche · 2022-01-23T17:03:41Z

This is more of a discussion than an issue to be fixed.

I have recently refactored how our services log. Other folks might choose to do so too (and we could do some marketing to nudge them), so it might be useful to polish up this use case.

All our modules use structlog to perform logging calls. Before, the logs would pass through structlog and then enter the standard library logging system, where they would be printed out by a handler.

The flow was: structlog -> logging -> stdout.

Now, we still use structlog, but the messages never touch the standard library - they get emitted by the PrintLogger. As you yourself noted, it's faster and there's 0 reason for us to go through the stdlib.

So now the flow is: structlog -> stdout.

While doing this, I encountered two issues. One is that the standard library loggers support name, and the PrintLogger doesn't. I solved that one by subclassing and sticking a name attribute on it. I think logger names are very useful and maybe we could improve the situation somehow. Not necessarily adding a name attribute to PrintLogger, maybe we can be more clever somehow (maybe inject the 'name' key into a context somewhere when creating the logger? Not sure). But this required a little boilerplate and it got done.

The second issue is: other libraries still use the standard library to log. This was actually a problem before as well, but I just realized it as I was doing the transition. There's nothing we can really do about that. But I would like those logs to honor my structlog configuration.

So I was thinking we should write a StructlogHandler class that users can register with the stdlib logging system. It'd essentially just forward the logs into structlog. I want my exceptions to be a nice little JSON line instead of a multiline stack trace like it is by default.

The flow for 3rd party logs would be: logging -> structlog -> stdout.

The main benefit would be configuring the logging output only once.

The text was updated successfully, but these errors were encountered:

hynek · 2022-01-24T07:48:08Z

name should be easy, but isn't the second problem almost there with https://www.structlog.org/en/stable/standard-library.html#rendering-using-structlog-based-formatters-within-logging? Might be just a matter of documentation? 🤔

Tinche · 2022-01-24T23:06:28Z

Hm, close. All of these terminate in a stdlib handler. It's also making the flow complex - the last diagram is user -> structlog -> stdlib -> structlog -> stdlib.

My thought was that if we write a class StructlogHandler(logging.Handler) we can have a better and faster flow.

So my code would be user -> structlog (the StructlogHandler isn't even used here), and 3rd party code would be 3rd party -> stdlib -> structlog (StructlogHandler is used here).

hynek · 2022-01-29T17:12:56Z

Yeah I guess it could be kept simpler. Just processors and whatever is printing.

sscherfke · 2024-12-01T20:16:12Z

We’ve been using structlog for a few projects since several years now and I’ve been wanting to write an article about this for some time.

Our requirements (wich are quite similar to @Tinche’s:

Use structlog alongside stdlib logging (which is still used by many other internal and 3rd party libs.
Have the same formatting for structlog and stdlib based loggers
Allow multiple named loggers (like stdlib) to have different log levels (current app logs on info, everything else on warning by default. Allow to set individual loggers to info or debug for debugging purposes)
Configure everything from external settings, exactly once on startup (with option to re-configure) (not shown in the example below)
Add additional fields
Add additional log levels (for b/w compat with legacy code, now shown below)

Here is a (nearly) minimal example for how we setup logging:

import logging
import functools
import sys
import logging.config
from datetime import UTC, datetime

import orjson
import structlog
import structlog._log_levels
from structlog.processors import _json_fallback_handler as orjson_fallback_handler
from structlog.tracebacks import ExceptionDictTransformer
from structlog.types import EventDict, Processor, WrappedLogger

import structlog
import structlog._log_levels


def render_orjson(
    dict_tracebacks: structlog.processors.ExceptionRenderer,
    /,
    logger: WrappedLogger,
    name: str,
    event_dict: EventDict,
) -> str:
    """
    Render log records as JSON with ``orjson`` and use dict tracebacks.

    This function must be wrapped with :func:`functools.partial()` and the
    *dict_tracebacks* function has to be passed to it::

        "processor": functools.partial(render_orjson, dict_tracebacks),

    We cannot use :class:`structlog.processors.JSONRenderer` because it returns bytes
    and the :class:`structlog.stdlib.LoggerFactory` needs strings. There is also the
    :class:`structlog.BytesLoggerFactory` but we cannot combine them.

    We also do the dict traceback formatting here, which allows us to simplify the
    stdlib log hander config::

        "()": structlog.stdlib.ProcessorFormatter,
        "processor": functools.partial(render_orjson, dict_tracebacks),

    vs::

        "()": structlog.stdlib.ProcessorFormatter,
        "processors": [
            structlog.stdlib.ProcessorFormatter.remove_processors_meta,
            structlog.processors.dict_tracebacks,
            render_orjson,
        ],
    """

    event_dict = dict_tracebacks(logger, name, event_dict)
    if event_dict.get("exception"):
        e = event_dict["exception"][0]
        event_dict["exc_type"] = e.get("exc_type", "?")
        event_dict["exc_value"] = e.get("exc_value", "?")

    try:
        # Create a new event dict with improved key sorting
        # because even JSON logs will also be viewed by humans. ;-)
        sorted_dict: EventDict = {
            "timestamp": event_dict["timestamp"],
            "level": event_dict["level"],
            "event": event_dict["event"],
            **event_dict,
        }
    except KeyError:
        # pytest-structlog removes the "timestamp" key.
        # In that case, just use the original event dict.
        sorted_dict = event_dict
    return orjson.dumps(sorted_dict, default=orjson_fallback_handler).decode()


def add_extra_fields(
    logger: logging.Logger, method_name: str, event_dict: EventDict
) -> EventDict:
    """
    A structlog processor that adds extra info fields to the event dict.

    This stems largely from structlog but this way, it is more customizable and maybe
    a bit faster, too.
    """
    # ISO UTC tiemstamp
    # structlog.processors.TimeStamper(fmt="iso", utc=True),
    event_dict["timestamp"] = datetime.now(tz=UTC).isoformat().replace("+00:00", "Z")

    # Logger name
    # structlog.stdlib.add_logger_name,
    record = event_dict.get("_record")
    if record is None:
        event_dict["logger"] = logger.name
    else:
        event_dict["logger"] = record.name

    # Log level in UPPER CASE
    # structlog.stdlib.add_log_level,
    if method_name == "warn":
        # The stdlib has an alias
        method_name = "warning"
    elif method_name == "exception":
        # exception("") method is the same as error("", exc_info=True)
        method_name = "error"
    event_dict["level"] = method_name.upper()

    # Log level number
    # structlog.stdlib.add_log_level_number,  # not needed right now
    # event_dict["level_number"] = structlog._log_levels._NAME_TO_LEVEL[method_name]

    return event_dict


def setup_logging() -> None:
    shared_processors: list[Processor] = [
        add_extra_fields,
        structlog.stdlib.ExtraAdder(),
    ]
    # Setup structlog
    #
    # The performance guide (https://www.structlog.org/en/stable/performance.html)
    # suggest to not pump structlog log entries through stdlib and use
    # a BytesLoggerFactory instead.
    #
    # However, that would not work for us, because structlog would than only use single
    # logger instance in all libs and apps.  But we want to be able to configure
    # different log levels for our app and the various libs.  This only works if we
    # use the stdlib logger factory that can create different loggers with different
    # names.
    pre_chain: list[Processor] = [
        structlog.stdlib.filter_by_level,
        structlog.contextvars.merge_contextvars,
    ]
    post_chain: list[Processor] = [
        structlog.stdlib.PositionalArgumentsFormatter(),
        structlog.processors.StackInfoRenderer(),
        structlog.stdlib.ProcessorFormatter.wrap_for_formatter,
    ]
    structlog.configure(
        cache_logger_on_first_use=True,
        # The filtering_bound_logger is fast, but it does not allow us to change the
        # log level at a later stage since the filtering is hard code in the logger
        # instances themselves.
        # In addition, we'd have to compute the minimum log level for all loggers and
        # use this, which might drastically reduce the positive performance impact.
        # wrapper_class=structlog.make_filtering_bound_logger(logging.INFO),
        wrapper_class=structlog.stdlib.BoundLogger,
        processors=pre_chain + shared_processors + post_chain,
        context_class=dict,
        logger_factory=structlog.stdlib.LoggerFactory(),
    )

    dict_tracebacks = structlog.processors.ExceptionRenderer(
        ExceptionDictTransformer(
            # ...  # Config from settings
        )
    )
    rich_tracebacks = structlog.dev.RichTracebackFormatter(
        # ...  # Config from settings
    )

    # Setup stdlib
    cfg = {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "json": {
                "()": structlog.stdlib.ProcessorFormatter,
                "processor": functools.partial(render_orjson, dict_tracebacks),
                "foreign_pre_chain": shared_processors,
            },
            "colored": {
                "()": structlog.stdlib.ProcessorFormatter,
                "processor": structlog.dev.ConsoleRenderer(
                    colors=True, exception_formatter=rich_tracebacks
                ),
                "foreign_pre_chain": shared_processors,
            },
        },
        "handlers": {
            "stream": {
                "formatter": "colored" if sys.stdout.isatty() else "json",
                "class": "logging.StreamHandler",
                "stream": "ext://sys.stderr",
            },
        },
        "root": {
            "handlers": ["stream"],
            "level": logging.INFO,  # Acutally loaded from settings
        },
        # Configure log levels for individual loggers from settings
        # "loggers": {name: {"level": level.value} for name, level in levels.items()},
    }
    logging.config.dictConfig(cfg)


def main() -> None:
    setup_logging()
    logging.getLogger("app").info("logging")
    structlog.get_logger("app").info("structlog")


if __name__ == "__main__":
    main()

cjw296 · 2025-02-13T18:38:20Z

So I was thinking we should write a StructlogHandler class that users can register with the stdlib logging system. It'd essentially just forward the logs into structlog. I want my exceptions to be a nice little JSON line instead of a multiline stack trace like it is by default.

I came looking for StructlogHandler too, now that I finally have some time to devote to "actually learning structlog" rather than just telling folks how good it is an not using it myself 😅

I can't see if I'm just missing this in @sscherfke's comment above, but if not, would a contribution of a StructlogHandler that basically took a stdlib logging event and munged it into a structlog event, and that could be plugged into the root stdlib logger, or additionally to a specific logger, if you knew that got heavy traffic that you care about?

(of course, there's no reason this couldn't actually just be a third-party library, would that be better / preferred?)

hynek · 2025-02-15T07:25:57Z

I'm open to anything that will make the number of confused people about standard library integration go down, since that's been a bane of my existence since structlog was born.

I would suggest to offer some kind of implementation plan / design document that we can discuss first!

sscherfke · 2025-02-15T23:07:54Z

My constraints:

some code uses structlog
older internal and most 3rd party packages use stdlib logging

My requirements:

Output from structlog and stdlib looks the same
Switch between JSON and Colored depending on the environment
Make use of stdlib's tree structore for loggers and dynamic log levels depending on the logger's position in the tree

My code

Pipes all struclog events through various processors and finally prepares them for the stdlib ProcessorFormatter.
Uses the ProcessorFormatter as stdlib handler and applies some shared processors to stdlib log records as well (as I am writing this, I am wondering whether these processors are applied to the already processed structlog events, too 🤔 )
Uses structlog to render all records

I'm not sure if this is the same (or at least a similar) use cases as @cjw296 's? I can imagine that many "structlog + stdlib = ❤ " use cases slightly vary which would make it difficult to find a proper abstraction. I will think about this.

cjw296 · 2025-02-16T18:58:28Z

My requirement is basically to get stdlib logging events out of stdlib-land as soon as possible and into structlog land, and then just do everything I can in structlog land.

Back in 2016, I bounced off structlog as "too big and too hard to configure" and started writing my own thing called shoehorn, I might see if I can re-purpose that now to solve this problem, and if it feels good, @hynek can merge whatever I end up with into structlog itself.

hynek added the stdlib label Apr 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improving stdlib integration #395

Improving stdlib integration #395

Tinche commented Jan 23, 2022

hynek commented Jan 24, 2022

Tinche commented Jan 24, 2022

hynek commented Jan 29, 2022

sscherfke commented Dec 1, 2024

cjw296 commented Feb 13, 2025 •

edited

Loading

hynek commented Feb 15, 2025

sscherfke commented Feb 15, 2025

cjw296 commented Feb 16, 2025

Improving stdlib integration #395

Improving stdlib integration #395

Comments

Tinche commented Jan 23, 2022

hynek commented Jan 24, 2022

Tinche commented Jan 24, 2022

hynek commented Jan 29, 2022

sscherfke commented Dec 1, 2024

cjw296 commented Feb 13, 2025 • edited Loading

hynek commented Feb 15, 2025

sscherfke commented Feb 15, 2025

cjw296 commented Feb 16, 2025

cjw296 commented Feb 13, 2025 •

edited

Loading