This repository has been archived by the owner on Feb 8, 2018. It is now read-only.

optimize db reads on homepage #659

Closed
chadwhitacre opened this issue Feb 15, 2013 · 19 comments

Comments

@chadwhitacre
Contributor

This is starting to get hot. I upped the pool size to 50 (#632) as a short-term fix. If you run heroku logs -p postgres -t (cf. #626 (comment)) you see that the queries populating the homepage are relatively expensive, in the 100-200 ms range, and we do three of them.

@chadwhitacre
Contributor Author

Revisiting this after #1098.

chadwhitacre added a commit that referenced this issue Jul 1, 2013
I'm expecting/hoping these show up in the logs we get from heroku for
hot queries.
@chadwhitacre
Contributor Author

Top givers: http://explain.depesz.com/s/5bs

@chadwhitacre
Contributor Author

P.S. These queries are up from 100-200ms to 500-700ms.

@mw44118

mw44118 commented Jul 1, 2013

I also said this on twitter.

Maybe set log_min_duration_statement to 500 in postgresql.conf, and then also load the auto_explain module and set it up to automatically EXPLAIN any query that takes more than 500 ms.

Relevant auto explain docs: http://www.postgresql.org/docs/9.2/interactive/auto-explain.html
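The settings described above would look roughly like this in postgresql.conf (a sketch based on the Postgres 9.2 docs linked above; thresholds in milliseconds):

```ini
# Log any statement slower than 500 ms
log_min_duration_statement = 500

# Load auto_explain at server start and have it EXPLAIN
# any query that runs longer than 500 ms
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = 500
```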

@mw44118

mw44118 commented Jul 1, 2013

Will examine that explain paste as well.

@chadwhitacre
Contributor Author

Here's the query that's explained above:

  SELECT tipper AS username, anonymous, sum(amount) AS amount
    FROM (    SELECT DISTINCT ON (tipper, tippee)
                     amount
                   , tipper
                FROM tips
                JOIN participants p ON p.username = tipper
                JOIN participants p2 ON p2.username = tippee
                JOIN elsewhere ON elsewhere.participant = tippee
               WHERE p.last_bill_result = ''
                 AND p.is_suspicious IS NOT true
                 AND elsewhere.is_locked = false
            ORDER BY tipper, tippee, mtime DESC
            ) AS foo
    JOIN participants p ON p.username = tipper
   WHERE is_suspicious IS NOT true
GROUP BY tipper, anonymous
ORDER BY amount DESC
LIMIT 12
OFFSET 0
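For reference, the inner SELECT DISTINCT ON (tipper, tippee) ... ORDER BY tipper, tippee, mtime DESC keeps only each tipper's most recent tip to each tippee. A small sketch of that "latest row per group" idea, using SQLite (which lacks DISTINCT ON) and made-up data, not gittip's real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tips (tipper TEXT, tippee TEXT, amount REAL, mtime INTEGER)")
conn.executemany("INSERT INTO tips VALUES (?,?,?,?)", [
    ("alice", "bob",  1.00, 1),   # superseded by the mtime=2 row below
    ("alice", "bob",  3.00, 2),   # alice's current tip to bob
    ("alice", "carl", 0.25, 1),
    ("dana",  "bob",  5.00, 1),
])

# SQLite equivalent of the DISTINCT ON subquery: keep, for each
# (tipper, tippee) pair, only the row with the greatest mtime.
rows = conn.execute("""
    SELECT t.tipper, t.tippee, t.amount
      FROM tips t
      JOIN (SELECT tipper, tippee, max(mtime) AS mtime
              FROM tips GROUP BY tipper, tippee) latest
        ON latest.tipper = t.tipper
       AND latest.tippee = t.tippee
       AND latest.mtime  = t.mtime
     ORDER BY t.tipper, t.tippee
""").fetchall()
print(rows)  # alice->bob contributes the 3.00 row, not the stale 1.00
```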

@mw44118

mw44118 commented Jul 1, 2013

How often does this underlying data change? If you only run the tipping thing once a week, then can you run this big nasty query, save the result in a table, and just read that table?

Sorting 28k rows is just inherently costly. If you can delay that sort until the outermost query, where you also apply the limit and offset, then maybe PostgreSQL can go faster.

It doesn't (have to) cost the same to grab the 12 highest values from a list as it does to sort the whole list and take the first 12 rows. I took a big ol' class on this topic and can't remember how to do it, but I bet the brainiacs in PG did it right.
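The point about grabbing the 12 highest values without a full sort is classic top-N (heap) selection: O(n log 12) instead of O(n log n). A quick Python illustration of the idea (not gittip code):

```python
import heapq
import random

random.seed(659)
rows = [("user%d" % i, random.random() * 100) for i in range(28000)]

# Full sort, then slice: O(n log n)
top_by_sort = sorted(rows, key=lambda r: r[1], reverse=True)[:12]

# Heap selection: only ever tracks the 12 best rows seen so far
top_by_heap = heapq.nlargest(12, rows, key=lambda r: r[1])

assert top_by_sort == top_by_heap  # same answer, far less sorting work
```

This is the same trick a database's top-N plan approximates when it can stop early.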

@mw44118

mw44118 commented Jul 2, 2013

Coincidence: this was on @sqlperftips this morning on twitter: http://use-the-index-luke.com/sql/partial-results/top-n-queries

Maybe throw in a clause like

fetch first 12 rows only

and see if that helps

@chadwhitacre
Contributor Author

Relevant:

Using the correct syntax is only half the story because efficiently terminating the execution requires the underlying operations to be executed in a pipelined manner. That means the order by clause must be covered by an index.

@chadwhitacre
Contributor Author

Looks like a pipelined order by is what we should aim for:

http://use-the-index-luke.com/sql/sorting-grouping/indexed-order-by
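The pipelined idea in miniature: if an index already yields rows in the requested order, the database can stream out the first 12 and stop, instead of sorting everything. SQLite shows this plainly in its query plan (a sort appears as "USE TEMP B-TREE FOR ORDER BY"); the table and index here are illustrative, not gittip's schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tips (tipper TEXT, amount REAL)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail)
    return " | ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

q = "SELECT tipper, amount FROM tips ORDER BY amount DESC LIMIT 12"

before = plan(q)   # no index: plan includes a temp-b-tree sort step
conn.execute("CREATE INDEX tips_amount_idx ON tips (amount DESC)")
after = plan(q)    # index satisfies the ORDER BY: no sort step

print(before)
print(after)
```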

@chadwhitacre
Contributor Author

Actually, though, I think I like the idea of an additional table better. I want to make some big changes to the homepage (#1074), so these specific queries are going to change. If we have an intermediate table then that could give us more flexibility to make changes to the homepage without spending tons of time on individual queries.

@mw44118

mw44118 commented Jul 2, 2013

So does the underlying data change only rarely?

@chadwhitacre
Contributor Author

We should also note that the limit and offset are configurable:

https://www.gittip.com/?limit=50&offset=50

@chadwhitacre
Contributor Author

@mw44118 It changes rarely enough that running this once a minute would be more than adequate.

@mw44118

mw44118 commented Jul 2, 2013

Yeah, then while you're in frenzied design mode, I would just memoize the results every minute or so rather than wrassle with optimizing that mother.
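A minimal sketch of that memoize-every-minute idea (decorator and names are hypothetical, not gittip code; gittip's eventual fix used intermediate tables instead):

```python
import time

def memoize_for(seconds):
    """Cache a zero-argument function's result for `seconds`."""
    def decorator(fn):
        cache = {"value": None, "expires": 0.0}
        def wrapper():
            now = time.monotonic()
            if now >= cache["expires"]:
                cache["value"] = fn()
                cache["expires"] = now + seconds
            return cache["value"]
        return wrapper
    return decorator

calls = 0

@memoize_for(60)
def top_givers():  # stand-in for the expensive homepage query
    global calls
    calls += 1
    return [("alice", 3.00), ("dana", 5.00)]

top_givers()
top_givers()
print(calls)  # 1 -- the second call hit the cache
```

The expensive query then runs at most once per interval, no matter how many homepage hits arrive.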


chadwhitacre added a commit that referenced this issue Jul 17, 2013
This addresses #659 by adding three new tables:

 - homepage_new_participants
 - homepage_top_givers
 - homepage_top_receivers

These are created as empty tables in schema.sql, and then in
configure-aspen.py, a daemon thread is started on server startup that
periodically runs the heavy queries needed to populate those tables. It
does so atomically in such a way that reads from the table are not
blocked while the table is being written to. The periodicity is
configurable with the UPDATE_HOMEPAGE_EVERY envvar (int seconds). The
queries in index.html are updated to use the new tables.
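The atomic-swap part of that commit can be sketched as: build the fresh results in a scratch table, then drop-and-rename inside one transaction so readers never see a half-built table. A SQLite-flavored illustration (the real version targets Postgres in schema.sql and configure-aspen.py):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # autocommit; we issue BEGIN/COMMIT ourselves
conn.execute("CREATE TABLE homepage_top_givers (username TEXT, amount REAL)")

def refresh_top_givers(rows):
    """Rebuild homepage_top_givers without blocking readers:
    populate a scratch table, then swap it in atomically."""
    conn.execute("CREATE TABLE homepage_top_givers_new (username TEXT, amount REAL)")
    conn.executemany("INSERT INTO homepage_top_givers_new VALUES (?, ?)", rows)
    conn.execute("BEGIN")
    conn.execute("DROP TABLE homepage_top_givers")
    conn.execute("ALTER TABLE homepage_top_givers_new RENAME TO homepage_top_givers")
    conn.execute("COMMIT")

refresh_top_givers([("alice", 3.00), ("dana", 5.00)])
rows = conn.execute(
    "SELECT * FROM homepage_top_givers ORDER BY amount DESC").fetchall()
print(rows)  # [('dana', 5.0), ('alice', 3.0)]
```

In the commit, a daemon thread runs the heavy queries and performs a swap like this every UPDATE_HOMEPAGE_EVERY seconds.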
@clone1018
Contributor

We'll also probably need to port over the community pages; not sure how we want that done.

https://www.gittip.com/for/python/

@chadwhitacre
Contributor Author

@clone1018 Ideally we'd be able to reuse the intermediate tables. Haven't looked enough yet though.

@chadwhitacre
Contributor Author

Community pages reticketed as #1159.
