Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP - Pulling a JSON object from a given Starred repo URL #42

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Mordax
Copy link
Contributor

@Mordax Mordax commented Dec 19, 2018

This PR is a work in progress - I'd love to get some feedback on it. Didn't expect Rust's syntax and style to trip me up so much as well as the compiler pointing out all the stuff that doesn't work. Will eventually close #37 .

Next steps -

  • Figuring out how to cleanly populate the Languages struct with the JSON object found at languages_url in the Repository struct in a way that doesn't make rustc hate me
  • Formatting the JSON object (which is just a non defined key-value pair) into a hash structure that makes logical sense
  • Sorting via most popular language, transforming the bytes data the GH API returns into something logical like pages
  • Displaying the information properly

Copy link
Owner

@0xazure 0xazure left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @Mordax, I've left some comments for you.

I think what's tripping you up the most is you're trying to follow the existing code as a model too closely; the work you're implementing interacts with the existing code, but there's very little similarity between it and what the new functionality should look like.

A thought for you as well, since you were interested in the CLI design for supernova: the current implementation will always grab the language data, what about adding a flag so the user can control if they are interested in language data or not?

@@ -68,6 +68,7 @@ impl ClientBuilder {
struct Star {
starred_at: DateTime<Utc>,
repo: Repository,
//lang: Languages,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need this field on struct Star, but we don't want serde to look at the JSON response for a key of "lang"; we can tell serde to ignore a field using the skip directive so it won't be seen during (de)serialization.

I think this field should actually go on the Repository type since it's data about the repository and not the star itself.

Note that repositories may also have a(n optionally null) "language" key, which denotes the primary language for the repository.

@@ -83,6 +84,7 @@ struct Repository {
full_name: String,
description: Option<String>,
stargazers_count: i32,
languages_url: Option<String>,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we need to hand on to this URL for the duration of the program and/or output it at some point? Or is it only a temporary variable so we can make a request to the GitHub API? (see more comments below)

value: Option<String>,
}

impl fmt::Display for Languages {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably don't need to implement your own fmt::Display for Languages unless you're going to be pretty-printing out instances of struct Languages, and #[derive(Debug)] is usually sufficient for debugging, which you can access with the "{:?}" format string like so:

println!("{:?}", lang);

The only reason fmt::Dsiplay is implemented for Repository is because we wanted full control of the desired output format, namely only printing the description if it exists. The derived Debug implementation is pretty good, it's just that it'll print e.g. description: None if Option<String> is None which isn't what we wanted for Repository.

let ref mut next = star.repo.languages_url;
if let Some(ref link) = next{
let mut res = client.get(link).send()?;
next = &mut extract_link_next(res.headers());
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense to try and extract Next here because there is no Next header in the response:

Access-Control-Allow-Origin: *
Access-Control-Expose-Headers: ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type
Cache-Control: public, max-age=60, s-maxage=60
Content-Encoding: gzip
Content-Security-Policy: default-src 'none'
Content-Type: application/json; charset=utf-8
Date: Wed, 19 Dec 2018 03:21:55 GMT
ETag: W/"d6483279641fa22081ee4bc2044d6739"
Last-Modified: Tue, 18 Dec 2018 22:37:04 GMT
Referrer-Policy: origin-when-cross-origin, strict-origin-when-cross-origin
Server: GitHub.com
Status: 200 OK
Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
Transfer-Encoding: chunked
Vary: Accept
X-Content-Type-Options: nosniff
X-Frame-Options: deny
X-GitHub-Media-Type: unknown, github.v3
X-GitHub-Request-Id: FC2C:6CCE:19C4CD3:4200EB7:5C19B93D
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 52
X-RateLimit-Reset: 1545191215
X-XSS-Protection: 1; mode=block

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.5
Connection: keep-alive
Host: api.github.com
Referer: https://api.github.com/
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:65.0) Gecko/20100101 Firefox/65.0

The extract_link_next function is used to handle the pagination on the starred API endpoint:

Status: 200 OK
Link: <https://api.github.com/resource?page=2>; rel="next",
      <https://api.github.com/resource?page=5>; rel="last"

which the languages API endpoint does not have.

@@ -83,6 +84,7 @@ struct Repository {
full_name: String,
description: Option<String>,
stargazers_count: i32,
languages_url: Option<String>,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this needs to be an Option, the API always generates this URL. However, we will have to handle the case where there are no languages associated with a given repository (i.e. the json response is an empty object).

@@ -93,11 +95,33 @@ impl fmt::Display for Repository {
write!(f, " - {}", description)?;
}

if let Some(ref languages_url) = self.languages_url {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fmt::Display controls the output format of the Repository, and how these objects are written to stdout or a file. Is the languages_url a piece of data that should be written out along with the repo name, url, and project description?

Ok(())
}
}

pub fn collect_stars(config: Config) -> Result<(), Box<dyn error::Error>> {
#[derive(Debug, Deserialize)]
struct Languages {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get what you're going for here, but unfortunately this isn't how serde works. With this struct definition, serde expects the json response to contain one or more objects of the following structure:

{
    "key": "string data",
    "value": "string data"
}

whereas the data we actually get back from the languages API endpoint is:

{
  "Python": 597727,
  "JavaScript": 38865,
  "CSS": 1229,
  "Batchfile": 838,
  "Shell": 713,
  "HTML": 478
}

Now, because we don't know what languages a given repo will contain, and it would be a lot of work and not very flexible to write a struct that has keys for every language GitHub understands, I would really recommend looking into how serde works with untyped json values; the Value::Object type in particular should be of some use because it acts like a Map and we can iterate over its contents without needing to know anything about its keys. See the serde examples for an explanation of how it works exactly.

@@ -120,6 +144,20 @@ pub fn collect_stars(config: Config) -> Result<(), Box<dyn error::Error>> {
}
}

let mut langs: Vec<Languages> = Vec::new();

for star in stars.iter(){
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you've overcomplicated this part. To get the languages for all the starred repositories, we want to:

  1. loop over the stars vector
  2. for each star:
    1. get its languages_url
    2. send a request to that url using client.get(url)
    3. parse the json result (see above about serde and untyped json)
    4. store the parsed data into the star's Repository object

The stars vector is already mutable, so the Repository data can be updated (maybe add a function or two to Repository to handle this?), you'll just have to decide how you want to display this data and any calculations you do with the data to the user.

}
}

pub fn collect_stars(config: Config) -> Result<(), Box<dyn error::Error>> { //Where the json extraction happens
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub fn collect_stars(config: Config) -> Result<(), Box<dyn error::Error>> { //Where the json extraction happens
pub fn collect_stars(config: Config) -> Result<(), Box<dyn error::Error>> {

nit: extraneous comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Using Github API to grab programming language information on starred repos
2 participants