This repository has been archived by the owner on Feb 8, 2018. It is now read-only.

optimize db reads on homepage #659

Closed
chadwhitacre opened this issue Feb 15, 2013 · 19 comments

Comments

@chadwhitacre
Contributor

This is starting to get hot. I upped the pool size to 50 (#632) as a short-term fix. If you run heroku logs -p postgres -t (cf. #626 (comment)) you see that the queries populating the homepage are relatively expensive, in the 100-200 ms range, and we do three of them.

@chadwhitacre
Contributor Author

Revisiting this after #1098.

chadwhitacre added a commit that referenced this issue Jul 1, 2013
I'm expecting/hoping these show up in the logs we get from heroku for
hot queries.
@chadwhitacre
Contributor Author

Top givers: http://explain.depesz.com/s/5bs

@chadwhitacre
Contributor Author

P.S. These queries are up from 100-200ms to 500-700ms.

@mw44118

mw44118 commented Jul 1, 2013

I also said this on twitter.

Maybe set log_min_duration_statement to 500 in postgresql.conf, and then also load the auto_explain module and set it up to automatically EXPLAIN any query that takes more than 500 ms.

Relevant auto explain docs: http://www.postgresql.org/docs/9.2/interactive/auto-explain.html
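The settings described above would look roughly like this in postgresql.conf (a sketch based on the Postgres 9.2 docs linked above; thresholds in milliseconds):

```ini
# Log any statement slower than 500 ms
log_min_duration_statement = 500

# Load auto_explain at server start and have it EXPLAIN
# any query that runs longer than 500 ms
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = 500
```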

@mw44118

mw44118 commented Jul 1, 2013

Will examine that explain paste as well.

@chadwhitacre
Contributor Author

Here's the query that's explained above:

  SELECT tipper AS username, anonymous, sum(amount) AS amount
    FROM (    SELECT DISTINCT ON (tipper, tippee)
                     amount
                   , tipper
                FROM tips
                JOIN participants p ON p.username = tipper
                JOIN participants p2 ON p2.username = tippee
                JOIN elsewhere ON elsewhere.participant = tippee
               WHERE p.last_bill_result = ''
                 AND p.is_suspicious IS NOT true
                 AND elsewhere.is_locked = false
            ORDER BY tipper, tippee, mtime DESC
            ) AS foo
    JOIN participants p ON p.username = tipper
   WHERE is_suspicious IS NOT true
GROUP BY tipper, anonymous
ORDER BY amount DESC
LIMIT 12
OFFSET 0
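For reference, the inner SELECT DISTINCT ON (tipper, tippee) ... ORDER BY tipper, tippee, mtime DESC keeps only each tipper's most recent tip to each tippee. A small sketch of that "latest row per group" idea, using SQLite (which lacks DISTINCT ON) and made-up data, not gittip's real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tips (tipper TEXT, tippee TEXT, amount REAL, mtime INTEGER)")
conn.executemany("INSERT INTO tips VALUES (?,?,?,?)", [
    ("alice", "bob",  1.00, 1),   # superseded by the mtime=2 row below
    ("alice", "bob",  3.00, 2),   # alice's current tip to bob
    ("alice", "carl", 0.25, 1),
    ("dana",  "bob",  5.00, 1),
])

# SQLite equivalent of the DISTINCT ON subquery: keep, for each
# (tipper, tippee) pair, only the row with the greatest mtime.
rows = conn.execute("""
    SELECT t.tipper, t.tippee, t.amount
      FROM tips t
      JOIN (SELECT tipper, tippee, max(mtime) AS mtime
              FROM tips GROUP BY tipper, tippee) latest
        ON latest.tipper = t.tipper
       AND latest.tippee = t.tippee
       AND latest.mtime  = t.mtime
     ORDER BY t.tipper, t.tippee
""").fetchall()
print(rows)  # alice->bob contributes the 3.00 row, not the stale 1.00
```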

@mw44118

mw44118 commented Jul 1, 2013

How often does this underlying data change? If you only run the tipping thing once a week, then can you run this big nasty query, save the result in a table, and just read that table?

Sorting 28k rows is just inherently costly. If you can delay that sort until the outermost query, where you also apply the limit and offset, then maybe PostgreSQL can go faster.

It doesn't (have to) cost the same to grab the 12 highest values from a list as it does to sort the whole list and take the first 12 rows. I took a big ol' class on this topic and can't remember how to do it, but I bet the brainiacs in PG did it right.
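The point about grabbing the 12 highest values without a full sort is classic top-N (heap) selection: O(n log 12) instead of O(n log n). A quick Python illustration of the idea (not gittip code):

```python
import heapq
import random

random.seed(659)
rows = [("user%d" % i, random.random() * 100) for i in range(28000)]

# Full sort, then slice: O(n log n)
top_by_sort = sorted(rows, key=lambda r: r[1], reverse=True)[:12]

# Heap selection: only ever tracks the 12 best rows seen so far
top_by_heap = heapq.nlargest(12, rows, key=lambda r: r[1])

assert top_by_sort == top_by_heap  # same answer, far less sorting work
```

This is the same trick a database's top-N plan approximates when it can stop early.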

@mw44118

mw44118 commented Jul 2, 2013

Coincidence: this was on @sqlperftips this morning on twitter: http://use-the-index-luke.com/sql/partial-results/top-n-queries

Maybe throw in a clause like

fetch first 12 rows only

and see if that helps

@chadwhitacre
Contributor Author

Relevant:

Using the correct syntax is only half the story because efficiently terminating the execution requires the underlying operations to be executed in a pipelined manner. That means the order by clause must be covered by an index.

@chadwhitacre
Contributor Author

Looks like a pipelined order by is what we should aim for:

http://use-the-index-luke.com/sql/sorting-grouping/indexed-order-by
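The pipelined idea in miniature: if an index already yields rows in the requested order, the database can stream out the first 12 and stop, instead of sorting everything. SQLite shows this plainly in its query plan (a sort appears as "USE TEMP B-TREE FOR ORDER BY"); the table and index here are illustrative, not gittip's schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tips (tipper TEXT, amount REAL)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail)
    return " | ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

q = "SELECT tipper, amount FROM tips ORDER BY amount DESC LIMIT 12"

before = plan(q)   # no index: plan includes a temp-b-tree sort step
conn.execute("CREATE INDEX tips_amount_idx ON tips (amount DESC)")
after = plan(q)    # index satisfies the ORDER BY: no sort step

print(before)
print(after)
```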

@chadwhitacre
Contributor Author

Actually, though, I think I like the idea of an additional table better. I want to make some big changes to the homepage (#1074), so these specific queries are going to change. If we have an intermediate table then that could give us more flexibility to make changes to the homepage without spending tons of time on individual queries.

@mw44118

mw44118 commented Jul 2, 2013

So does the underlying data change only rarely?

@chadwhitacre
Contributor Author

We should also note that the limit and offset are configurable:

https://www.gittip.com/?limit=50&offset=50

@chadwhitacre
Contributor Author

@mw44118 It changes rarely enough that running this once a minute would be more than adequate.

@mw44118

mw44118 commented Jul 2, 2013

Yeah, then while you're in frenzied design mode, I would just memoize the results every minute or so rather than wrassle with optimizing that mother.
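A minimal sketch of that memoize-every-minute idea (decorator and names are hypothetical, not gittip code; gittip's eventual fix used intermediate tables instead):

```python
import time

def memoize_for(seconds):
    """Cache a zero-argument function's result for `seconds`."""
    def decorator(fn):
        cache = {"value": None, "expires": 0.0}
        def wrapper():
            now = time.monotonic()
            if now >= cache["expires"]:
                cache["value"] = fn()
                cache["expires"] = now + seconds
            return cache["value"]
        return wrapper
    return decorator

calls = 0

@memoize_for(60)
def top_givers():  # stand-in for the expensive homepage query
    global calls
    calls += 1
    return [("alice", 3.00), ("dana", 5.00)]

top_givers()
top_givers()
print(calls)  # 1 -- the second call hit the cache
```

The expensive query then runs at most once per interval, no matter how many homepage hits arrive.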


chadwhitacre added a commit that referenced this issue Jul 17, 2013
This addresses #659 by adding three new tables:

 - homepage_new_participants
 - homepage_top_givers
 - homepage_top_receivers

These are created as empty tables in schema.sql, and then in
configure-aspen.py, a daemon thread is started on server startup that
periodically runs the heavy queries needed to populate those tables. It
does so atomically in such a way that reads from the table are not
blocked while the table is being written to. The periodicity is
configurable with the UPDATE_HOMEPAGE_EVERY envvar (int seconds). The
queries in index.html are updated to use the new tables.
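The atomic-swap part of that commit can be sketched as: build the fresh results in a scratch table, then drop-and-rename inside one transaction so readers never see a half-built table. A SQLite-flavored illustration (the real version targets Postgres in schema.sql and configure-aspen.py):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # autocommit; we issue BEGIN/COMMIT ourselves
conn.execute("CREATE TABLE homepage_top_givers (username TEXT, amount REAL)")

def refresh_top_givers(rows):
    """Rebuild homepage_top_givers without blocking readers:
    populate a scratch table, then swap it in atomically."""
    conn.execute("CREATE TABLE homepage_top_givers_new (username TEXT, amount REAL)")
    conn.executemany("INSERT INTO homepage_top_givers_new VALUES (?, ?)", rows)
    conn.execute("BEGIN")
    conn.execute("DROP TABLE homepage_top_givers")
    conn.execute("ALTER TABLE homepage_top_givers_new RENAME TO homepage_top_givers")
    conn.execute("COMMIT")

refresh_top_givers([("alice", 3.00), ("dana", 5.00)])
rows = conn.execute(
    "SELECT * FROM homepage_top_givers ORDER BY amount DESC").fetchall()
print(rows)  # [('dana', 5.0), ('alice', 3.0)]
```

In the commit, a daemon thread runs the heavy queries and performs a swap like this every UPDATE_HOMEPAGE_EVERY seconds.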
@clone1018
Contributor

We'll also probably need to port over the community pages; not sure how we want that done.

https://www.gittip.com/for/python/

@chadwhitacre
Contributor Author

@clone1018 Ideally we'd be able to reuse the intermediate tables. Haven't looked enough yet though.

@chadwhitacre
Contributor Author

Community pages reticketed as #1159.
