optimize db reads on homepage #659
Revisiting this after #1098.
I'm expecting/hoping these show up in the logs we get from Heroku for hot queries.
Top givers: http://explain.depesz.com/s/5bs (delete).
P.S. These queries are up from 100-200ms to 500-700ms.
I also said this on Twitter: maybe set log_min_duration_statement to 500 in postgresql.conf, and then also load the auto_explain module and set it up to automatically explain any query that takes more than 500 ms. Relevant auto_explain docs: http://www.postgresql.org/docs/9.2/interactive/auto-explain.html
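A minimal sketch of that setup at the session level (superuser required); the same settings can also go in postgresql.conf. The 500 ms threshold is the one suggested above; the rest is standard auto_explain configuration:

SET log_min_duration_statement = 500;      -- log any statement slower than 500 ms
LOAD 'auto_explain';                       -- load the module for this session
SET auto_explain.log_min_duration = 500;   -- attach an EXPLAIN to anything over 500 ms
SET auto_explain.log_analyze = true;       -- include actual times and row counts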
Will examine that explain paste as well.
Here's the query that's explained above:

SELECT tipper AS username, anonymous, sum(amount) AS amount
  FROM ( SELECT DISTINCT ON (tipper, tippee)
                amount
              , tipper
           FROM tips
           JOIN participants p ON p.username = tipper
           JOIN participants p2 ON p2.username = tippee
           JOIN elsewhere ON elsewhere.participant = tippee
          WHERE p.last_bill_result = ''
            AND p.is_suspicious IS NOT true
            AND elsewhere.is_locked = false
       ORDER BY tipper, tippee, mtime DESC
       ) AS foo
  JOIN participants p ON p.username = tipper
 WHERE is_suspicious IS NOT true
 GROUP BY tipper, anonymous
 ORDER BY amount DESC
 LIMIT 12
OFFSET 0
How often does this underlying data change? If you only run the tipping thing once a week, can you run this big nasty query once, save the result in a table, and then just read that table? Sorting 28k rows is just inherently costly. If you can delay that sort until the outermost query, where you also apply the limit and offset, then maybe PostgreSQL can go faster: grabbing the 12 highest values from a list doesn't have to cost the same as sorting the whole list and then taking the first 12 rows. I took a big ol' class on this topic and can't remember how to do it, but I bet the brainiacs in PG did it right.
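A sketch of the "run it once and save it in a table" idea, reusing the query above; top_givers_cache is a made-up name here, and how often you rebuild it depends on how often the data actually changes:

-- Materialize the expensive aggregate once (hypothetical table name).
CREATE TABLE top_givers_cache AS
SELECT tipper AS username, anonymous, sum(amount) AS amount
  FROM ( SELECT DISTINCT ON (tipper, tippee)
                amount
              , tipper
           FROM tips
           JOIN participants p ON p.username = tipper
           JOIN participants p2 ON p2.username = tippee
           JOIN elsewhere ON elsewhere.participant = tippee
          WHERE p.last_bill_result = ''
            AND p.is_suspicious IS NOT true
            AND elsewhere.is_locked = false
       ORDER BY tipper, tippee, mtime DESC
       ) AS foo
  JOIN participants p ON p.username = tipper
 WHERE is_suspicious IS NOT true
 GROUP BY tipper, anonymous;

-- Page reads then become a cheap sort of an already-aggregated table.
SELECT username, anonymous, amount
  FROM top_givers_cache
 ORDER BY amount DESC
 LIMIT 12 OFFSET 0;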
Coincidence: this was on @SQLPerfTips on Twitter this morning: http://use-the-index-luke.com/sql/partial-results/top-n-queries. Maybe throw in a clause like FETCH FIRST 12 ROWS ONLY and see if that helps.
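For reference, FETCH FIRST is the standard-SQL spelling of a top-n limit; PostgreSQL treats it the same as the LIMIT 12 already in the query, so the gains the article describes come from letting an index supply the sort order rather than from the clause itself. A sketch against the hypothetical cache table above:

SELECT username, anonymous, amount
  FROM top_givers_cache          -- hypothetical cache table from the sketch above
 ORDER BY amount DESC
 FETCH FIRST 12 ROWS ONLY;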
Relevant:
Looks like a pipelined ORDER BY: http://use-the-index-luke.com/sql/sorting-grouping/indexed-order-by
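A sketch of the kind of index that page describes, matching the ORDER BY tipper, tippee, mtime DESC in the inner query; the index name is made up, and whether the planner would actually pipeline the DISTINCT ON through it here is an assumption to verify with EXPLAIN:

CREATE INDEX tips_tipper_tippee_mtime_idx
    ON tips (tipper, tippee, mtime DESC);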
Heh: https://twitter.com/SQLPerfTips/status/351801458091491328 :-)
Points to: http://use-the-index-luke.com/sql/sorting-grouping
See also: http://blog.fatalmind.com/2010/09/29/finding-the-best-match-with-a-top-n-query/

Great guy to know about, thanks for the pointer!
Actually, though, I think I like the idea of an additional table better. I want to make some big changes to the homepage (#1074), so these specific queries are going to change. If we have an intermediate table, that could give us more flexibility to change the homepage without spending tons of time on individual queries.
So does the underlying data change only rarely?
We should also note that the limit and offset are configurable:
@mw44118 It changes rarely enough that running this once a minute would be more than adequate.
Yeah, then while you're in frenzied design mode, I would just memoize the results.
This addresses #659 by adding three new tables:

- homepage_new_participants
- homepage_top_givers
- homepage_top_receivers

These are created as empty tables in schema.sql, and then in configure-aspen.py a daemon thread is started on server startup that periodically runs the heavy queries needed to populate those tables. It does so atomically, in such a way that reads from the table are not blocked while the table is being written to. The periodicity is configurable with the UPDATE_HOMEPAGE_EVERY envvar (int seconds). The queries in index.html are updated to use the new tables.
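A sketch of what one of those tables and its refresh could look like. The table name comes from the PR description above, but the column definitions and the delete-then-insert refresh are assumptions based on the top-givers query, not the actual schema.sql:

-- Hypothetical DDL for one of the three cache tables; columns are assumed
-- from the top-givers query (the other two tables follow the same pattern).
CREATE TABLE homepage_top_givers
( username    text            NOT NULL
, anonymous   boolean         NOT NULL DEFAULT false
, amount      numeric(35,2)   NOT NULL DEFAULT 0
);

-- Refresh inside a single transaction so concurrent readers see either the
-- old rows or the new rows, never an empty or half-filled table.
BEGIN;
DELETE FROM homepage_top_givers;
INSERT INTO homepage_top_givers (username, anonymous, amount)
     SELECT username, anonymous, amount
       FROM top_givers_cache;   -- stand-in source to keep the sketch short;
                                -- in the PR the daemon thread runs the heavy
                                -- query from earlier in the thread directly
COMMIT;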
We'll also probably need to port over the community pages; not sure how we want that to be done.
@clone1018 Ideally we'd be able to reuse the intermediate tables. I haven't looked into it enough yet, though.
Community pages reticketed as #1159.
This is starting to get hot. I upped the pool size to 50 (#632) as a short-term fix. If you run

heroku logs -p postgres -t

(cf. #626 (comment)) you see that the queries populating the homepage are relatively expensive, in the 100-200ms range, and we do three of them.