Skip to content
This repository has been archived by the owner on Feb 8, 2018. It is now read-only.

Commit

Permalink
Eager-load participant when querying elsewhere
Browse files Browse the repository at this point in the history
  • Loading branch information
chadwhitacre committed Oct 5, 2013
1 parent ca9c7f4 commit 1ec5522
Show file tree
Hide file tree
Showing 4 changed files with 74 additions and 16 deletions.
44 changes: 44 additions & 0 deletions branch.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
-------------------------------------------------------------------------------
-- https://github.com/gittip/www.gittip.com/pull/1369


-- The following lets us cast queries to elsewhere_with_participant to get the
-- participant data dereferenced and returned in a composite type along with
-- the elsewhere data. Then we can register orm.Models in the application for
-- both participant and elsewhere_with_participant, and when we cast queries
-- elsewhere.*::elsewhere_with_participant, we'll get a hydrated Participant
-- object at .participant. Woo-hoo!


BEGIN;

CREATE TYPE elsewhere_with_participant AS
( id integer
, platform text
, user_id text
, user_info hstore
, is_locked boolean
, participant participants
);

CREATE OR REPLACE FUNCTION load_participant_for_elsewhere (elsewhere)
RETURNS elsewhere_with_participant
AS $$

SELECT $1.id
, $1.platform
, $1.user_id
, $1.user_info
, $1.is_locked
, participants.*::participants
FROM participants
WHERE participants.username = $1.participant
;

$$ LANGUAGE SQL;


CREATE CAST (elsewhere AS elsewhere_with_participant)
WITH FUNCTION load_participant_for_elsewhere(elsewhere);

END;

This comment has been minimized.

Copy link
@zbynekwinkler

zbynekwinkler Oct 5, 2013

Contributor

@whit537 However this may seem very cool at first sight I think it violates the principle of least surprise. To get people less afraid about touching the code base we should try to make it simpler, more obvious, more readable. Getting people less afraid should be our priority IMHO. I'd would pick straight SQL over all the cool hacks only the pros know anytime.

29 changes: 21 additions & 8 deletions gittip/elsewhere/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,7 @@ def __init__(self, db):


def load(self, username):
"""Given a unicode, return an AccountElsewhere object.
"""Given a username on the other platform, return an AccountElsewhere object.
"""
typecheck(username, UnicodeWithParams)
try:
Expand All @@ -61,16 +61,29 @@ def load(self, username):


def load_from_db(self, username):
return self.db.one( "SELECT elsewhere.*::elsewhere "
"FROM elsewhere "
"WHERE platform=%s "
"AND user_info->%s = %s"
, (self.name, self.username_key, username)
, default=UnknownAccountElsewhere
)
"""Given a username on the other platform, return an AccountElsewhere object.
If the account elsewhere is unknown to us, we raise UnknownAccountElsewhere.
"""
return self.db.one("""
SELECT elsewhere.*::elsewhere_with_participant
FROM elsewhere
WHERE platform=%s
AND user_info->%s = %s
""", (self.name, self.username_key, username), default=UnknownAccountElsewhere)


def load_from_api(self, username):
"""Given a username on the other platform, return an AccountElsewhere object.
The first thing we do is hit the API of the other platform, then we use
that to upsert our own elsewhere table, before handing back off to
load_from_db.
"""

# Hit the platform's API to get user info.
# ========================================
Expand Down
2 changes: 1 addition & 1 deletion gittip/models/account_elsewhere.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@

class AccountElsewhere(Model):

typname = "elsewhere"
typname = "elsewhere_with_participant"


def get_html_url(self):
Expand Down
15 changes: 8 additions & 7 deletions tests/test_elsewhere_twitter.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,16 +26,17 @@ def test_twitter_resolve_resolves(self):
assert actual == expected


def test_get_user_info_gets_user_info(self):
twitter.TwitterAccount("1", {'screen_name': 'alice'}).opt_in('alice')
expected = {"screen_name": "alice"}
actual = twitter.get_user_info('alice')
assert actual == expected


@mock.patch('gittip.elsewhere.twitter.Twitter.hit_api')
def test_can_load_account_elsewhere_from_twitter(self, hit_api):
hit_api.return_value = {"id": "123", "screen_name": "alice"}

alice_on_twitter = self.elsewhere.twitter.load(UnicodeWithParams('alice', {}))
assert alice_on_twitter.user_id == "123"


@mock.patch('gittip.elsewhere.twitter.Twitter.hit_api')
def test_account_elsewhere_has_participant_object_on_it(self, hit_api):
hit_api.return_value = {"id": "123", "screen_name": "alice"}
alice_on_twitter = self.elsewhere.twitter.load(UnicodeWithParams('alice', {}))
import pdb; pdb.set_trace()
assert alice_on_twitter.participant.username == 'alice'

5 comments on commit 1ec5522

@mw44118
Copy link

@mw44118 mw44118 commented on 1ec5522 Oct 6, 2013

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you create a view rather than a function and a type?

You can register composite types that correspond to views just like tables (or types).

@zbynekwinkler
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still rather let everyone see the all the joins in the code. It is far easier to see what is going on.

@mw44118
Copy link

@mw44118 mw44118 commented on 1ec5522 Oct 7, 2013

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I agree with your point of view -- simple is best. I would rather have redundant code that is intuitive than some brain-bendy gibberish that's super terse but inscrutable.

The tricky thing about normalized databases is that one "object" may have its guts scattered out across a dozen tables.

Views are a nice way to link all those components back together. The app code can just say

select *
from view_of_a_dozen_joined_tables
where object_id = 99

and then the app code gets something back that looks like the "real" object.

Just my two cents.

@zbynekwinkler
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added my point of view to #1549 (care to join the more general discussion there?). The thing with nonmaterialized views is that you can never tell what the performance will be, since the joins are hidden. Everyone then stops to think about the schema, but the creator of the view :(. I don't know enough about materialized views to properly evaluate them for the performance but the hiding point holds. It's like RPC - the easier you make it, the less people know about them, the less they care about how to use them properly.

@chadwhitacre
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call on using a view @mw44118, that's cleaner for sure.

I'd still rather let everyone see all the joins in the code. It is far easier to see what is going on.
The thing with nonmaterialized views is that you can never tell what the performance will be, since the joins are hidden.

@zwn As you've mentioned elsewhere (#1492) , we probably want to do more denormalization at write-time anyway. The question here is how to best do read-time denormalization using the postgres.py orm. The principle of the postgres.py orm is that we compose our queries in SQL (not via a Python representation of SQL), and use the return type of the query to trigger hydration of this or that Python object.

In this case, for example, I'm fine to see about moving the denormalization up from the load_participant_for_elsewhere function into the query at line 71. What I don't know off-hand is what the return type of such a nested query would be, but assuming it's something we can register an orm.Model for we should be fine.

Of course, we still have a hidden join: anyone can call load_from_db without thought for its performance characteristics. One way to mitigate this is to add documentation to load_from_db about the complexity of the call. What I'm hearing you saying is that the line between Python and Postgres is a good one to go by. Part of me wants to say, "Meh. We can't be afraid of Postgres." Otoh it's probably a good design principle to keep the nesting of queries to a minimum.

Again, the deeper mitigation is to denormalize on write.

Please sign in to comment.