
Add Capabilities queries #2322

Merged: 11 commits merged Aug 30, 2021


@tomayac (Member) commented Aug 30, 2021

Adds queries for mobile and desktop for all the APIs we have detection for.

Related to #2152.

@tomayac mentioned this pull request Aug 30, 2021
@tunetheweb (Member)

@tomayac, looks like you’ve got lots of linting errors. If you have your Python env set up in your src directory, you can fix most of these automatically by running:

sqlfluff fix sql/2021/capabilities

Gimme a shout if you need a hand.

@tunetheweb (Member)

Oh, and similarly you can run:

sqlfluff lint sql/2021/capabilities

to make sure you’re all clean before committing, if you’re bored of waiting for the GitHub Action to complete.

@tunetheweb added the analysis (Querying the dataset) label Aug 30, 2021
@tunetheweb added this to the 2021 Analysis milestone Aug 30, 2021
@tomayac (Member Author) commented Aug 30, 2021

> Oh and similarly you can run:
>
> sqlfluff lint sql/2021/capabilities
>
> to make sure you’re all clean before committing if bored of waiting for GitHub Action to complete.

All checks are green now. I formatted via the BigQuery front-end, but it looks like we have other preferences here, which is fine, too. I like the lint rules here better than the front-end's…

@tunetheweb (Member) commented Aug 30, 2021

> All checks green now. I formatted via the BigQuery front-end, but looks like we have other preferences here, which is fine, too. I like the lint rules here better than the front-end's…

I agree with most of ours, but I'm not a big fan of AND at the end of a line (it makes commenting out more difficult!) or of COUNT(0) over COUNT(1) (seems counterintuitive to me!). But it's the consistency that's more important, so I'll live with it!

@tomayac (Member Author) commented Aug 30, 2021

@tunetheweb (Member) left a comment


Looks great now. A couple of minor comments/nits.

Also, is this the only query for the Capabilities chapter, or will you be adding more? I noticed you had some Blink use counter queries last year. If you're adding more, it would be good to add a checklist to the initial comment (see the other PRs as examples) so we can track progress for this chapter and see how far through writing the queries we are.

tomayac and others added 2 commits August 30, 2021 13:35
@tomayac (Member Author) commented Aug 30, 2021

Thanks for the nits :-)

This will be the only query then, since the queries based on use counters already exist and are available in an evergreen report. We can reference specific APIs if we need to (example). This year the idea was to focus less on quantitative analysis and more on qualitative aspects, like how apps use these APIs.

@tomayac (Member Author) commented Aug 30, 2021

Actually, just as I was hitting the Comment button, I recalled that we wanted to do a fun analysis that would be quantitative: we wanted to determine the most Fugu page on the Internet by ordering pages by the number of Fugu APIs they use. Could you help with this? My SQL-fu is a bit rusty (as you have no doubt noticed).

@tunetheweb (Member)

> Actually, just as I was hitting the Comment button, I recalled that we wanted to do a fun analysis that would be quantitative: we wanted to determine the most Fugu page on the Internet by ordering pages by the number of Fugu APIs they use. Could you help with this? My SQL-fu is a bit rusty (as you have no doubt noticed).

OK, then I think it's good to merge. Can you copy the results into the official sheet for this chapter: https://docs.google.com/spreadsheets/d/1b4moteB9EiLYkH1Ln9qfi1tnU-E4N2UQ87uayWytDKw/edit?usp=sharing

@tunetheweb (Member) commented Aug 30, 2021

> Actually, just as I was hitting the Comment button, I recalled that we wanted to do a fun analysis that would be quantitative: we wanted to determine the most Fugu page on the Internet by ordering pages by the number of Fugu APIs they use. Could you help with this? My SQL-fu is a bit rusty (as you have no doubt noticed).

What about something like this:

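-- Returns the names of the Fugu APIs detected on a page, i.e. the keys of the
-- parsed `_fugu-apis` JSON; missing or invalid JSON yields an empty array.
CREATE TEMP FUNCTION getFuguAPIs(data STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS '''
try {
  const $ = JSON.parse(data);
  return $ ? Object.keys($) : [];
} catch (e) {
  return [];
}
''';

-- One row per page and detected API, then count distinct APIs per client and page.
SELECT
  _TABLE_SUFFIX AS client,
  url,
  COUNT(DISTINCT fuguAPI) AS fuguAPIs
FROM
  `httparchive.pages.2021_07_01_*`,
  UNNEST(getFuguAPIs(JSON_QUERY(payload, '$."_fugu-apis"'))) AS fuguAPI
WHERE
  JSON_QUERY(payload, '$."_fugu-apis"') != "[]"
GROUP BY
  client,
  url
HAVING
  COUNT(DISTINCT fuguAPI) >= 1
ORDER BY
  fuguAPIs DESC,
  url,
  client
LIMIT 100;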

First 15 results are:

| client | url | fuguAPIs |
| --- | --- | --- |
| desktop | https://whatwebcando.today/ | 28 |
| mobile | https://whatwebcando.today/ | 28 |
| mobile | https://polisnotis.se/ | 10 |
| desktop | https://system-scanner.net/ | 9 |
| desktop | https://community.emlid.com/ | 8 |
| mobile | https://community.emlid.com/ | 8 |
| desktop | https://excalidraw.com/ | 8 |
| mobile | https://excalidraw.com/ | 8 |
| desktop | https://fstdesk.com/ | 8 |
| mobile | https://fstdesk.com/ | 8 |
| desktop | https://mirea.ninja/ | 8 |
| mobile | https://mirea.ninja/ | 8 |
| desktop | https://permission.site/ | 8 |
| mobile | https://permission.site/ | 8 |
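For anyone who wants to sanity-check the query's logic outside BigQuery, here is a minimal JavaScript sketch (JavaScript being the language the UDF is already written in) that mirrors it: parse each page's `_fugu-apis` JSON, take the object keys as API names, count distinct APIs per client and URL, then sort by count, URL and client. The sample rows, the `rankFuguPages` and `cmp` helpers, and the guard against missing or invalid JSON are hypothetical; the object shape of `_fugu-apis` is only assumed from the UDF's use of `Object.keys()`.

```javascript
// Hypothetical sample input: one row per page and client, holding the raw JSON
// string that JSON_QUERY(payload, '$."_fugu-apis"') would return. The shape
// (an object keyed by API name) is an assumption based on the UDF's Object.keys().
const rows = [
  { client: 'desktop', url: 'https://a.example/', fuguApis: '{"Web Share":[],"Badging":[]}' },
  { client: 'mobile', url: 'https://a.example/', fuguApis: '{"Web Share":[]}' },
  { client: 'desktop', url: 'https://b.example/', fuguApis: '{}' },
  { client: 'mobile', url: 'https://c.example/', fuguApis: null },
];

// Same idea as the getFuguAPIs UDF: the API names are the keys of the parsed
// object. Missing or invalid JSON yields an empty list instead of throwing.
function getFuguAPIs(data) {
  try {
    const parsed = JSON.parse(data);
    return parsed ? Object.keys(parsed) : [];
  } catch (e) {
    return [];
  }
}

// Plain string comparison, used for the url and client tie-breakers.
const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// Mirrors GROUP BY client, url / COUNT(DISTINCT fuguAPI) / HAVING >= 1 /
// ORDER BY fuguAPIs DESC, url, client / LIMIT.
function rankFuguPages(rows, limit = 100) {
  const groups = new Map();
  for (const { client, url, fuguApis } of rows) {
    const key = JSON.stringify([client, url]);
    if (!groups.has(key)) groups.set(key, { client, url, apis: new Set() });
    for (const api of getFuguAPIs(fuguApis)) groups.get(key).apis.add(api);
  }
  return [...groups.values()]
    .filter(({ apis }) => apis.size >= 1)
    .map(({ client, url, apis }) => ({ client, url, fuguAPIs: apis.size }))
    .sort((a, b) => b.fuguAPIs - a.fuguAPIs || cmp(a.url, b.url) || cmp(a.client, b.client))
    .slice(0, limit);
}

console.log(rankFuguPages(rows));
```

Run under Node.js, this prints only the two `https://a.example/` rows (2 APIs on desktop, 1 on mobile). The `b.example` and `c.example` pages drop out because no APIs were detected, just as the comma cross join with `UNNEST` drops pages with an empty API array in BigQuery.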

@tomayac (Member Author) commented Aug 30, 2021

Amazing, thanks for that! Just committed this query to the repo.

@tunetheweb (Member)

> This will be the only query then, since the queries based on use counters already exist and are available in an evergreen report. We can reference specific APIs if we need to (example)

BTW, on an only slightly related point, I recently updated those reports to add the rank lenses and also got them working with these Blink usage queries. So you can see whether the top 1,000 websites use Fugu APIs more than the web at large (they don't), or whatever.

I never bothered fixing it for the CMS lenses (WordPress, Drupal and Magento), as that's a bit trickier and, particularly for these APIs, they're unlikely to be used on those sites anyway.

@tunetheweb (Member)

> Amazing, thanks for that! Just committed this query to the repo.

Don't see it. Did you push?

@tomayac (Member Author) commented Aug 30, 2021

Oops, sorry. I pushed, but didn't notice that I had to pull first for it to go through.

@tunetheweb merged commit 77556e7 into main Aug 30, 2021
@tunetheweb deleted the fugu-queries branch August 30, 2021 12:14
@tomayac (Member Author) commented Aug 30, 2021

Results for both queries added to the official results sheet: https://docs.google.com/spreadsheets/d/1b4moteB9EiLYkH1Ln9qfi1tnU-E4N2UQ87uayWytDKw/edit?usp=sharing.

@tunetheweb (Member)

Cheers. I added the rest of the 100 since I had that tab open still.

Also added a pivot table. Why is GamePad used sooo much? Does some popular embed (YouTube?) use it?

Anyway, I think you can tick off chapter item 3 - Validate Results! 🎉

@tomayac (Member Author) commented Aug 30, 2021

> Cheers. I added the rest of the 100 since I had that tab open still.

Oh, thanks!

> Also added a pivot table. Why is GamePad used sooo much? Does some popular embed (YouTube?) use it?

Looking at ChromeStatus, it doesn't look as popular there (but ChromeStatus counts concrete events, not just the presence of the API per se).

> Anyway, I think you can tick off chapter item 3 - Validate Results! 🎉

Woohoo!

@tunetheweb (Member)

> Also added a pivot table. Why is GamePad used sooo much? Does some popular embed (YouTube?) use it?

> Looking at ChromeStatus, it doesn't look as popular there (but ChromeStatus counts concrete events, not just the presence of the API per se).

It may be worth investigating these further. For the PWA queries, we specifically excluded YouTube embeds for some queries (see #2272), as we felt they were incorrectly clouding the stats with events that were never used. @demianrenzulli has more details.

@tomayac (Member Author) commented Aug 31, 2021

I found a better ChromeStatus stats entry (this was hidden in plain sight). It looks like the navigator.getGamepads() API is used in tracking contexts, for example, by Yandex. Mozilla disabling the API in its resist fingerprinting mode also suggests that this is a real practice.

Labels: analysis (Querying the dataset)
2 participants