
Standardise (and publish?) cache handling in standard library #53642

Closed
ncoghlan opened this issue Jul 28, 2010 · 12 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@ncoghlan
Contributor

BPO 9396
Nosy @rhettinger, @ncoghlan, @merwok, @bitdancer, @florentx
Dependencies
  • bpo-9567: Add attribute pointing to wrapped function in functools.update_wrapper
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2010-08-22.08:06:14.022>
    created_at = <Date 2010-07-28.11:27:10.412>
    labels = ['type-feature', 'library']
    title = 'Standardise (and publish?) cache handling in standard library'
    updated_at = <Date 2010-08-22.08:51:32.047>
    user = 'https://github.com/ncoghlan'

    bugs.python.org fields:

    activity = <Date 2010-08-22.08:51:32.047>
    actor = 'eric.araujo'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2010-08-22.08:06:14.022>
    closer = 'rhettinger'
    components = ['Library (Lib)']
    creation = <Date 2010-07-28.11:27:10.412>
    creator = 'ncoghlan'
    dependencies = ['9567']
    files = []
    hgrepos = []
    issue_num = 9396
    keywords = []
    message_count = 12.0
    messages = ['111790', '113273', '113300', '113374', '113378', '113382', '113383', '113385', '113570', '113580', '113791', '114647']
    nosy_count = 5.0
    nosy_names = ['rhettinger', 'ncoghlan', 'eric.araujo', 'r.david.murray', 'flox']
    pr_nums = []
    priority = 'low'
    resolution = 'fixed'
    stage = 'needs patch'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue9396'
    versions = ['Python 3.2']

    @ncoghlan
    Contributor Author

    The standard library has several cache implementations (e.g. in re, fnmatch and ElementTree) with different cache size limiting strategies. These should be standardised and possibly even exposed for general use.

    Refer to python-dev discussion:
    http://mail.python.org/pipermail/python-dev/2010-July/102473.html

    @ncoghlan ncoghlan added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Jul 28, 2010
    @ncoghlan
    Contributor Author

    ncoghlan commented Aug 8, 2010

    With Raymond adding functools.lru_cache and functools.lfu_cache, it should be possible to use those for the various caches in the standard library.

    My only point of concern is that the standard lib caches tend to allow dynamic modification of the max cache size, while the new cache decorators appear to be fixed at the size specified when defining the decorator.
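    To illustrate the concern (a hedged sketch; the function and size below are made up for illustration, and lfu_cache was later dropped before release), lru_cache binds its maximum size when the decorator is applied:

    ```python
    import functools

    # maxsize is fixed at decoration time; the wrapper exposes no public
    # way to change it afterwards, which is the limitation discussed above.
    @functools.lru_cache(maxsize=128)
    def normalize(pattern):
        # stand-in for an expensive computation worth caching
        return pattern.lower()

    normalize("FOO")               # computed (miss)
    normalize("FOO")               # served from the cache (hit)
    print(normalize.cache_info())  # hits=1, misses=1
    ```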

    @rhettinger
    Contributor

    I will take a look at the various caches and see if some of their features can be consolidated. It is okay if some need to have their own strategies or situation specific approaches, but you're right that common features should be looked at to see if they make sense in the functools caches.

    Am not sure that dynamically changing the maxsize is a generally useful feature (most are set to the largest size that makes sense and not revisited later), but I will look at whether that complicates the code (if it's simple and fast, it may be worth doing).

    @rhettinger rhettinger self-assigned this Aug 8, 2010
    @ncoghlan
    Contributor Author

    ncoghlan commented Aug 9, 2010

    Yeah, I'm not sure the dynamic resizing makes sense in general. I was just pointing it out as something supported by the existing caches that could complicate a consolidation effort.

    @rhettinger
    Contributor

    Applied the lru_cache() to fnmatch and re.
    See r83874 and r83871.

    I did find a simple way to dynamically resize the maxcache, but did not apply it yet. Will look at more applications to see if it is really needed.

    Nick, thanks for the great ideas. These changes simplified the code where they were applied and resulted in a smarter caching strategy.
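    A hedged sketch of the pattern applied in those checkins (simplified; the names and cache size here are illustrative, not the actual stdlib code): the expensive step is factored into a helper, and the helper is decorated.

    ```python
    import functools
    import re

    # The costly pattern-compilation step is factored into a helper that
    # lru_cache memoizes; callers transparently reuse the compiled object.
    @functools.lru_cache(maxsize=256)
    def _compile(pattern, flags=0):
        return re.compile(pattern, flags)

    def search(pattern, string, flags=0):
        # repeated calls with the same pattern hit the cache
        return _compile(pattern, flags).search(string)

    print(search(r"\d+", "abc123").group())  # 123
    ```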

    @ncoghlan
    Contributor Author

    ncoghlan commented Aug 9, 2010

    On Mon, Aug 9, 2010 at 2:29 PM, Raymond Hettinger
    <[email protected]> wrote:

    I did find a simple way to dynamically resize the maxcache, but did not apply it yet.   Will look at more applications to see if it is really needed.

    Nick, thanks for the great ideas.  These changes simplified the code where they were applied and resulted in a smarter caching strategy.

    The reason I mentioned the dynamic sizing specifically was that the
    discussions where we realised we had all these different caches
    floating around had to do with tuning the caching strategy to a
    particular application based on speed vs memory trade-offs. While we
    can pick a number that is a reasonable default, a server deployment
    may want to turn the dial towards faster response times with higher
    memory consumption, while an embedded device may want to push the dial
    the other way. The smarter caching strategies you added are likely to
    help far more than the blunt hammer approach of increasing the cache
    size, but the speed/memory trade-off in choosing that size is still a
    question that has no universally correct answer.

    @rhettinger
    Contributor

    I was thinking about the problem of developers wanting a different cache size than that provided in standard lib modules.

    ISTM that now we've offered caching abilities in functools, a developer can easily add another layer of cache around any API they are interested in. For example, if someone is using thousands of recurring fnmatch patterns, they can write something like:

    @functools.lfu_cache(maxsize=10000)  # custom fat cache
    def fnmatchcase(*args):
        return fnmatch.fnmatchcase(*args)

    IMO, this beats adding caching controls to lots of APIs that should be focused only on their problem domain. IOW, it is probably not a good idea to add purge() and cache_resize() functions to multiple modules throughout the standard lib.

    ISTM, we should just provide basic caching with reasonable space consumption (i.e. not huge) that gives improvements to common use cases (like I've done with the fnmatch and re modules) and let programmers with unusual cases add their own caching options rather than be tied into our choice of lru vs lfu or whatnot.
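    For reference, lfu_cache did not ultimately ship (it was removed before the 3.2 release, as noted later in this thread), but the same wrapper idea works with lru_cache; a runnable sketch, with the cache size chosen arbitrarily:

    ```python
    import fnmatch
    import functools

    # A custom fat cache layered around the stdlib API, without the
    # stdlib module needing to expose any cache-tuning knobs itself.
    @functools.lru_cache(maxsize=10000)
    def fnmatchcase(name, pat):
        return fnmatch.fnmatchcase(name, pat)

    print(fnmatchcase("spam.txt", "*.txt"))  # True
    ```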

    @ncoghlan
    Contributor Author

    ncoghlan commented Aug 9, 2010

    On Mon, Aug 9, 2010 at 3:11 PM, Raymond Hettinger
    <[email protected]> wrote:

    ISTM, we should just provide basic caching with reasonable space consumption (i.e. not huge) that gives improvements to common use cases (like I've done with the fnmatch and re modules) and let programmers with unusual cases add their own caching options rather than be tied into our choice of lru vs lfu or whatnot.

    A very good point! Perhaps we should note that somewhere? I'm not sure
    where though, unless we just mention it in the docs for the relevant
    modules.

    Going the other way (using a smaller, or no, cache), perhaps in
    addition to the new hit/miss attributes, the cache decorators should
    expose the original function to allow the cache to be bypassed?

    @rhettinger
    Contributor

    Great minds think alike. I was just about to propose that functools.wraps add a standard attribute to point at the underlying function (on the theory that objects should be introspectable). This would allow a standard way to get to the underlying unwrapped functions.
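    The attribute described here landed as __wrapped__ (via the bpo-9567 dependency listed above), and lru_cache applies functools.update_wrapper, which records it. A hedged sketch of bypassing the cache through it:

    ```python
    import functools

    @functools.lru_cache(maxsize=32)
    def square(x):
        return x * x

    # update_wrapper (applied inside lru_cache) stores the original
    # function on the wrapper, so the cache can be bypassed when needed.
    print(square(3))              # 9, via the cache
    print(square.__wrapped__(3))  # 9, calling the plain function directly
    ```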

    @rhettinger rhettinger assigned bitdancer and unassigned rhettinger Aug 10, 2010
    @rhettinger rhettinger reopened this Aug 10, 2010
    @merwok
    Member

    merwok commented Aug 11, 2010

    After discussion with RDM on IRC, I’m opening a new report to track this feature request separately. (It’s also a dependency of this bug.)

    @merwok merwok assigned rhettinger and unassigned bitdancer Aug 11, 2010
    @ncoghlan
    Contributor Author

    Have we had any luck getting this to play nicely with the buildbots yet? (I lost track of where the last checkin got to). The necessary Modules/Setup change to adjust when _collections is built should have propagated through by now.

    @merwok
    Member

    merwok commented Aug 22, 2010

    Raymond, out of curiosity, can you tell why you removed lfu_cache?

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022