Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving stdlib integration #395

Open
Tinche opened this issue Jan 23, 2022 · 8 comments
Open

Improving stdlib integration #395

Tinche opened this issue Jan 23, 2022 · 8 comments
Labels

Comments

@Tinche
Copy link
Contributor

Tinche commented Jan 23, 2022

This is more of a discussion than an issue to be fixed.

I have recently refactored how our services log. Other folks might choose to do so too (and we could do some marketing to nudge them), so it might be useful to polish up this use case.

All our modules use structlog to perform logging calls. Before, the logs would pass through structlog and then enter the standard library logging system, where they would be printed out by a handler.

The flow was: structlog -> logging -> stdout.

Now, we still use structlog, but the messages never touch the standard library - they get emitted by the PrintLogger. As you yourself noted, it's faster and there's 0 reason for us to go through the stdlib.

So now the flow is: structlog -> stdout.

While doing this, I encountered two issues. One is that the standard library loggers support name, and the PrintLogger doesn't. I solved that one by subclassing and sticking a name attribute on it. I think logger names are very useful and maybe we could improve the situation somehow. Not necessarily adding a name attribute to PrintLogger, maybe we can be more clever somehow (maybe inject the 'name' key into a context somewhere when creating the logger? Not sure). But this required a little boilerplate and it got done.

The second issue is: other libraries still use the standard library to log. This was actually a problem before as well, but I just realized it as I was doing the transition. There's nothing we can really do about that. But I would like those logs to honor my structlog configuration.

So I was thinking we should write a StructlogHandler class that users can register with the stdlib logging system. It'd essentially just forward the logs into structlog. I want my exceptions to be a nice little JSON line instead of a multiline stack trace like it is by default.

The flow for 3rd party logs would be: logging -> structlog -> stdout.

The main benefit would be configuring the logging output only once.

@hynek
Copy link
Owner

hynek commented Jan 24, 2022

name should be easy, but isn't the second problem almost there with https://www.structlog.org/en/stable/standard-library.html#rendering-using-structlog-based-formatters-within-logging? Might be just a matter of documentation? 🤔

@Tinche
Copy link
Contributor Author

Tinche commented Jan 24, 2022

Hm, close. All of these terminate in a stdlib handler. It's also making the flow complex - the last diagram is user -> structlog -> stdlib -> structlog -> stdlib.

My thought was that if we write a class StructlogHandler(logging.Handler) we can have a better and faster flow.

So my code would be user -> structlog (the StructlogHandler isn't even used here), and 3rd party code would be 3rd party -> stdlib -> structlog (StructlogHandler is used here).

@hynek
Copy link
Owner

hynek commented Jan 29, 2022

Yeah I guess it could be kept simpler. Just processors and whatever is printing.

@hynek hynek added the stdlib label Apr 2, 2024
@sscherfke
Copy link
Contributor

We’ve been using structlog for a few projects since several years now and I’ve been wanting to write an article about this for some time.

Our requirements (wich are quite similar to @Tinche’s:

  • Use structlog alongside stdlib logging (which is still used by many other internal and 3rd party libs.
  • Have the same formatting for structlog and stdlib based loggers
  • Allow multiple named loggers (like stdlib) to have different log levels (current app logs on info, everything else on warning by default. Allow to set individual loggers to info or debug for debugging purposes)
  • Configure everything from external settings, exactly once on startup (with option to re-configure) (not shown in the example below)
  • Add additional fields
  • Add additional log levels (for b/w compat with legacy code, now shown below)

Here is a (nearly) minimal example for how we setup logging:

import logging
import functools
import sys
import logging.config
from datetime import UTC, datetime

import orjson
import structlog
import structlog._log_levels
from structlog.processors import _json_fallback_handler as orjson_fallback_handler
from structlog.tracebacks import ExceptionDictTransformer
from structlog.types import EventDict, Processor, WrappedLogger

import structlog
import structlog._log_levels


def render_orjson(
    dict_tracebacks: structlog.processors.ExceptionRenderer,
    /,
    logger: WrappedLogger,
    name: str,
    event_dict: EventDict,
) -> str:
    """
    Render log records as JSON with ``orjson`` and use dict tracebacks.

    This function must be wrapped with :func:`functools.partial()` and the
    *dict_tracebacks* function has to be passed to it::

        "processor": functools.partial(render_orjson, dict_tracebacks),

    We cannot use :class:`structlog.processors.JSONRenderer` because it returns bytes
    and the :class:`structlog.stdlib.LoggerFactory` needs strings. There is also the
    :class:`structlog.BytesLoggerFactory` but we cannot combine them.

    We also do the dict traceback formatting here, which allows us to simplify the
    stdlib log hander config::

        "()": structlog.stdlib.ProcessorFormatter,
        "processor": functools.partial(render_orjson, dict_tracebacks),

    vs::

        "()": structlog.stdlib.ProcessorFormatter,
        "processors": [
            structlog.stdlib.ProcessorFormatter.remove_processors_meta,
            structlog.processors.dict_tracebacks,
            render_orjson,
        ],
    """

    event_dict = dict_tracebacks(logger, name, event_dict)
    if event_dict.get("exception"):
        e = event_dict["exception"][0]
        event_dict["exc_type"] = e.get("exc_type", "?")
        event_dict["exc_value"] = e.get("exc_value", "?")

    try:
        # Create a new event dict with improved key sorting
        # because even JSON logs will also be viewed by humans. ;-)
        sorted_dict: EventDict = {
            "timestamp": event_dict["timestamp"],
            "level": event_dict["level"],
            "event": event_dict["event"],
            **event_dict,
        }
    except KeyError:
        # pytest-structlog removes the "timestamp" key.
        # In that case, just use the original event dict.
        sorted_dict = event_dict
    return orjson.dumps(sorted_dict, default=orjson_fallback_handler).decode()


def add_extra_fields(
    logger: logging.Logger, method_name: str, event_dict: EventDict
) -> EventDict:
    """
    A structlog processor that adds extra info fields to the event dict.

    This stems largely from structlog but this way, it is more customizable and maybe
    a bit faster, too.
    """
    # ISO UTC tiemstamp
    # structlog.processors.TimeStamper(fmt="iso", utc=True),
    event_dict["timestamp"] = datetime.now(tz=UTC).isoformat().replace("+00:00", "Z")

    # Logger name
    # structlog.stdlib.add_logger_name,
    record = event_dict.get("_record")
    if record is None:
        event_dict["logger"] = logger.name
    else:
        event_dict["logger"] = record.name

    # Log level in UPPER CASE
    # structlog.stdlib.add_log_level,
    if method_name == "warn":
        # The stdlib has an alias
        method_name = "warning"
    elif method_name == "exception":
        # exception("") method is the same as error("", exc_info=True)
        method_name = "error"
    event_dict["level"] = method_name.upper()

    # Log level number
    # structlog.stdlib.add_log_level_number,  # not needed right now
    # event_dict["level_number"] = structlog._log_levels._NAME_TO_LEVEL[method_name]

    return event_dict


def setup_logging() -> None:
    shared_processors: list[Processor] = [
        add_extra_fields,
        structlog.stdlib.ExtraAdder(),
    ]
    # Setup structlog
    #
    # The performance guide (https://www.structlog.org/en/stable/performance.html)
    # suggest to not pump structlog log entries through stdlib and use
    # a BytesLoggerFactory instead.
    #
    # However, that would not work for us, because structlog would than only use single
    # logger instance in all libs and apps.  But we want to be able to configure
    # different log levels for our app and the various libs.  This only works if we
    # use the stdlib logger factory that can create different loggers with different
    # names.
    pre_chain: list[Processor] = [
        structlog.stdlib.filter_by_level,
        structlog.contextvars.merge_contextvars,
    ]
    post_chain: list[Processor] = [
        structlog.stdlib.PositionalArgumentsFormatter(),
        structlog.processors.StackInfoRenderer(),
        structlog.stdlib.ProcessorFormatter.wrap_for_formatter,
    ]
    structlog.configure(
        cache_logger_on_first_use=True,
        # The filtering_bound_logger is fast, but it does not allow us to change the
        # log level at a later stage since the filtering is hard code in the logger
        # instances themselves.
        # In addition, we'd have to compute the minimum log level for all loggers and
        # use this, which might drastically reduce the positive performance impact.
        # wrapper_class=structlog.make_filtering_bound_logger(logging.INFO),
        wrapper_class=structlog.stdlib.BoundLogger,
        processors=pre_chain + shared_processors + post_chain,
        context_class=dict,
        logger_factory=structlog.stdlib.LoggerFactory(),
    )

    dict_tracebacks = structlog.processors.ExceptionRenderer(
        ExceptionDictTransformer(
            # ...  # Config from settings
        )
    )
    rich_tracebacks = structlog.dev.RichTracebackFormatter(
        # ...  # Config from settings
    )

    # Setup stdlib
    cfg = {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "json": {
                "()": structlog.stdlib.ProcessorFormatter,
                "processor": functools.partial(render_orjson, dict_tracebacks),
                "foreign_pre_chain": shared_processors,
            },
            "colored": {
                "()": structlog.stdlib.ProcessorFormatter,
                "processor": structlog.dev.ConsoleRenderer(
                    colors=True, exception_formatter=rich_tracebacks
                ),
                "foreign_pre_chain": shared_processors,
            },
        },
        "handlers": {
            "stream": {
                "formatter": "colored" if sys.stdout.isatty() else "json",
                "class": "logging.StreamHandler",
                "stream": "ext://sys.stderr",
            },
        },
        "root": {
            "handlers": ["stream"],
            "level": logging.INFO,  # Acutally loaded from settings
        },
        # Configure log levels for individual loggers from settings
        # "loggers": {name: {"level": level.value} for name, level in levels.items()},
    }
    logging.config.dictConfig(cfg)


def main() -> None:
    setup_logging()
    logging.getLogger("app").info("logging")
    structlog.get_logger("app").info("structlog")


if __name__ == "__main__":
    main()

@cjw296
Copy link

cjw296 commented Feb 13, 2025

So I was thinking we should write a StructlogHandler class that users can register with the stdlib logging system. It'd essentially just forward the logs into structlog. I want my exceptions to be a nice little JSON line instead of a multiline stack trace like it is by default.

I came looking for StructlogHandler too, now that I finally have some time to devote to "actually learning structlog" rather than just telling folks how good it is an not using it myself 😅

I can't see if I'm just missing this in @sscherfke's comment above, but if not, would a contribution of a StructlogHandler that basically took a stdlib logging event and munged it into a structlog event, and that could be plugged into the root stdlib logger, or additionally to a specific logger, if you knew that got heavy traffic that you care about?

(of course, there's no reason this couldn't actually just be a third-party library, would that be better / preferred?)

@hynek
Copy link
Owner

hynek commented Feb 15, 2025

I'm open to anything that will make the number of confused people about standard library integration go down, since that's been a bane of my existence since structlog was born.

I would suggest to offer some kind of implementation plan / design document that we can discuss first!

@sscherfke
Copy link
Contributor

My constraints:

  • some code uses structlog
  • older internal and most 3rd party packages use stdlib logging

My requirements:

  • Output from structlog and stdlib looks the same
  • Switch between JSON and Colored depending on the environment
  • Make use of stdlib's tree structore for loggers and dynamic log levels depending on the logger's position in the tree

My code

  • Pipes all struclog events through various processors and finally prepares them for the stdlib ProcessorFormatter.
  • Uses the ProcessorFormatter as stdlib handler and applies some shared processors to stdlib log records as well (as I am writing this, I am wondering whether these processors are applied to the already processed structlog events, too 🤔 )
  • Uses structlog to render all records

I'm not sure if this is the same (or at least a similar) use cases as @cjw296 's? I can imagine that many "structlog + stdlib = ❤ " use cases slightly vary which would make it difficult to find a proper abstraction. I will think about this.

@cjw296
Copy link

cjw296 commented Feb 16, 2025

My requirement is basically to get stdlib logging events out of stdlib-land as soon as possible and into structlog land, and then just do everything I can in structlog land.

Back in 2016, I bounced off structlog as "too big and too hard to configure" and started writing my own thing called shoehorn, I might see if I can re-purpose that now to solve this problem, and if it feels good, @hynek can merge whatever I end up with into structlog itself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants