[MVP Huggingface] Define Request Type model and GET data from HuggingFace repos. #1891

Open
12 of 14 tasks
Tracked by #1890 ...
ryanfchase opened this issue Jan 11, 2025 · 23 comments
Assignees
Labels
Complexity: Small p-feature: data P-feature: Map ready for dev lead ready for developer lead to review the issue Role: Frontend React front end work size: 2pt Can be done in 7-12 hours
Milestone

Comments

@ryanfchase
Member

ryanfchase commented Jan 11, 2025

Overview

We need to create a version of our app that initializes the map using a single requests table, since we currently load a separate table for each year of data.

Developer Info

We are NOT merging this into develop. This is meant to be a staging branch for the Blank Map feature ONLY

  • I understand

Action Items

Developer Setup

  • designate a feature branch, e.g. staging-feature-blank-map
    • feature branch created as feature-blank-map
  • PM and DEV collaborate to define Request Type model @traycn @ryanfchase

Developer Actions

  • Download DBeaver
    • Setup connection with Huggingface
  • Create a file / obj for the request type model - /db/DBRequest.jsx
    • Add the createRequestTable function
    • Update the logic to insert ONLY 2023 - 2024 data into a single table (to prevent crashing your local environment)
    • create the requests table on Map initialization (use the data model provided), but do not load any initial data
    • create a method, fetchData in db/DBRequest.jsx that will be the accessor method for fetching data from external sources (e.g. Socrata, Huggingface)
      • leave a comment describing how we intend to use it, what parameters should be considered for this method
      • to demonstrate that it has access to duckdb, the method should console log a DESCRIBE call as proof of concept
  • Release the dependency on DEV - Update Search/Filter Modal functionality and design #1868 and if there are no other dependencies, move that issue into the new issue approval and add the label ready for dev lead
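
The table creation and `fetchData` proof of concept described above might look roughly like the following. This is a minimal sketch, not the project's implementation: it assumes a duckdb-wasm style connection whose `query(sql)` returns a promise, and the column list and the `buildCreateRequestsSql` helper are hypothetical placeholders for the data model that PM and DEV define.

```javascript
// Hypothetical column list; the real data model is defined by PM and DEV.
const REQUEST_COLUMNS = [
  ["SRNumber", "VARCHAR"],
  ["CreatedDate", "TIMESTAMP"],
  ["RequestType", "VARCHAR"],
];

// Build the inline CREATE TABLE statement (no parquet file name involved).
function buildCreateRequestsSql(columns = REQUEST_COLUMNS) {
  const defs = columns.map(([name, type]) => `"${name}" ${type}`).join(", ");
  return `CREATE OR REPLACE TABLE requests (${defs});`;
}

// Create an empty requests table; no initial data is loaded.
async function createRequestTable(conn) {
  await conn.query(buildCreateRequestsSql());
}

// Proof of concept: show that fetchData can reach DuckDB by describing the table.
async function fetchData(conn) {
  const description = await conn.query("DESCRIBE requests;");
  console.log(description);
  return description;
}
```

Keeping the SQL builder separate from the connection call makes the statement easy to inspect without a running database.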

Resources/Instructions

Notes

  • currently we rely on parquet files (e.g. 2024.parquet) from HuggingFace to define the requests table.
    • obtain the table columns and field types and use them to define the requests table

Resources

Code References

  • createRequestsTable
    • path: components > Map > index.jsx::MapContainer, L69
    • note: SQL on L77 will need to be modified. We will not be using datasetFileName and instead we'll simply define the table inline
  • updateHfDataset.py
    • path: scripts > updateHfDataset.py
    • note: use this as a reference for how we are handling timestampformat
@ryanfchase ryanfchase added this to the 04 - Map Page milestone Jan 11, 2025
@ryanfchase ryanfchase self-assigned this Jan 11, 2025
@github-project-automation github-project-automation bot moved this to New Issue Approval in P: 311: Project Board Jan 11, 2025
@ryanfchase ryanfchase added Time sensitive This ticket should be completed ASAP ready for prioritization ready for PMs to consider for prioritized backlog and removed draft labels Jan 12, 2025
@ryanfchase ryanfchase removed their assignment Jan 12, 2025
@ryanfchase ryanfchase moved this from New Issue Approval to Prioritized Backlog in P: 311: Project Board Jan 12, 2025
@ryanfchase ryanfchase removed the ready for prioritization ready for PMs to consider for prioritized backlog label Jan 12, 2025
@DrAcula27 DrAcula27 self-assigned this Jan 12, 2025
@ryanfchase ryanfchase moved this from Prioritized Backlog to In progress in P: 311: Project Board Jan 14, 2025
@traycn traycn changed the title Initialize Duckdb Without Huggingface [MVP Socrata API] Initialize Duckdb Without Huggingface Jan 15, 2025
@traycn traycn changed the title [MVP Socrata API] Initialize Duckdb Without Huggingface [MVP Socrata API] Define Request Type model and GET data from HuggingFace repos. Jan 16, 2025
@ryanfchase ryanfchase moved this from In progress to Questions in P: 311: Project Board Jan 20, 2025
@ryanfchase ryanfchase added Question Further information is requested Discussion Needs to be discussed as a team labels Jan 20, 2025
@ryanfchase ryanfchase moved this from Questions to In progress in P: 311: Project Board Feb 5, 2025
@ryanfchase ryanfchase removed the ready for dev lead ready for developer lead to review the issue label Feb 5, 2025
@ryanfchase
Member Author

ryanfchase commented Feb 6, 2025

2025-02-05 ticket check in

4:30p - 5:30p PT

Notes

  • We should be defining the data model (e.g. CREATE OR REPLACE TABLE ... ) in DbRequests.jsx
    • We want the react app to be the source of truth for how data should look when it goes to DuckDb
    • JS says "I don't care what year the data is from, so long as it matches the DuckDb spec"
  • The cron job is responsible for cleaning each year's data SO THAT it conforms to the DuckDb spec
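
The idea of the React app owning the spec can be illustrated with a small check that compares a dataset's columns against the expected definition. This is only a sketch: `findSpecMismatches` and the sample columns are hypothetical, and both inputs are assumed to be arrays of `[name, type]` pairs.

```javascript
// Compare a dataset's columns against the spec and list every mismatch.
function findSpecMismatches(spec, actual) {
  const actualTypes = new Map(actual);
  const problems = [];
  for (const [name, type] of spec) {
    if (!actualTypes.has(name)) {
      problems.push(`missing column ${name}`);
    } else if (actualTypes.get(name) !== type) {
      problems.push(`${name}: expected ${type}, got ${actualTypes.get(name)}`);
    }
  }
  return problems;
}

// Made-up example: one year's file stores CreatedDate as text.
const spec = [["SRNumber", "VARCHAR"], ["CreatedDate", "TIMESTAMP"]];
const yearColumns = [["SRNumber", "VARCHAR"], ["CreatedDate", "VARCHAR"]];
console.log(findSpecMismatches(spec, yearColumns));
```

An empty result means the year's data already matches the spec, regardless of which year it came from.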

Action Items

  • Create a ticket to modify updateHfDataset.py to clean EVERY year, not just 2024

@ryanfchase
Member Author

Hi @DrAcula27 ,

Please leave a comment with the following items:

  • updated ETA
  • blockers, or progress from the last week (if applicable)
  • availability for communications during the week

@ryanfchase
Member Author

Hi all, I have provided links to the new repos I created to help start the section labelled Updating Huggingface Repositories - separate into 3 month chunks

@DrAcula27
Member

Update

  • ETA: 21 Feb 2025
  • Blockers: learning Saga, Redux; working through errors with different variables not being defined; figuring out where startDate actually is stored so I can pass it properly to DbRequests.jsx
  • Availability: generally weekday afternoons/evenings

@ryanfchase
Member Author

Based on discussion with dev lead, and instruction from Bonnie, we will be splitting this ticket up into smaller parts so that you may get your work reviewed, and other team members may pick up the other resulting tickets if needed.

@DrAcula27, I will be splitting this ticket off right up to "Updating Huggingface Repositories - separate into 3 month chunks". Let me know if you have any concerns about this. If you had begun any Action Items from the sections beyond that^ point, please let me know in a comment below. I am beginning to split up the tickets now.

@DrAcula27
Member

@ryanfchase That works for me! I think splitting this into multiple tickets will be helpful.

@ryanfchase ryanfchase added size: 2pt Can be done in 7-12 hours Complexity: Small and removed Complexity: Large size: 5pt Can be done in 19-30 hours labels Feb 13, 2025
@ryanfchase
Member Author

I've moved the remaining work into this ticket. @DrAcula27 please assess if the complexity and size match the work you have contributed, and work remaining.

@DrAcula27
Member

@ryanfchase I agree with small complexity and 2pt in size.

@ExperimentsInHonesty
Member

@DrAcula27

Hi Danielle, this is the template I want teams to use for their weekly updates. You can always add more info to the items below 1-4, which are the required minimum.

Instructions
  1. Progress: "What is the current status of your project? What have you completed and what is left to do?"
  2. Blockers: "Difficulties or errors encountered."
  3. Availability: "How much time will you have this week to work on this issue?"
  4. ETA: "When do you expect this issue to be completed?"
  5. Pictures (if necessary): "Add any pictures that will help illustrate what you are working on."
  1. Progress:
  2. Blockers:
  3. Availability:
  4. ETA:
  5. Pictures (if necessary):

@ryanfchase
Member Author

@DrAcula27 I've added this section to Action Items in order to tee off the next ticket in the blank map feature release:

  - [ ] create a method, `fetchData` in `db/DBRequest.jsx` that will be the accessor method for fetching data from external sources (e.g. Socrata, Huggingface)
    - [ ] leave a comment describing how we intend to use it, what parameters should be considered for this method
    - [ ] to demonstrate that it has access to duckdb, the method should console log a `DESCRIBE` call as proof of concept

With this action item added, this ticket is getting locked in. When you complete all action items, please do the following:

  • add ready for dev lead
  • move to "In Review" column

@DrAcula27
Member

Intended use of the fetchData function

  • fetchData lives in components/db/DbRequests.jsx and is a placeholder function that requests data for a given set of filter parameters, e.g. date range(s), SR type, SR status, and/or NC.
  • Once fleshed out, it will replace the setData function in components/Map/index.jsx and will be the accessor method for fetching data from external sources (e.g. Socrata, Huggingface).
/**
 * Fetches data from DuckDB using provided filters.
 *
 * @param {Object} conn - The DuckDB connection instance.
 * @param {Object} filters - An object containing any or none of the following filter parameters:
 *    - startDate: (string) Start of date range (YYYY-MM-DD)
 *    - endDate: (string) End of date range (YYYY-MM-DD)
 *    - requestType: (string) Filter by request type
 *    - status: (string) Filter by status
 *    - ncName: (string) Filter by neighborhood council (NC)
 *
 * Example Usage:
 * fetchData({ conn }, { startDate: "2023-01-01", endDate: "2023-12-31", requestType: "Graffiti" });
 */
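
Once fleshed out, the documented filters could be turned into a query along these lines. This is a hedged sketch rather than the planned implementation: the column names (`CreatedDate`, `RequestType`, `Status`, `NCName`) and the `buildFetchSql` helper are assumptions, `conn.query(sql)` is assumed to return a promise as in duckdb-wasm, and the connection is passed directly as the `@param` description suggests.

```javascript
// Assumed column names for each documented filter; adjust to the real data model.
const FILTER_COLUMNS = {
  requestType: "RequestType",
  status: "Status",
  ncName: "NCName",
};

// Escape a value for use as a SQL string literal.
const quote = (value) => `'${String(value).replace(/'/g, "''")}'`;

// Build a SELECT over the requests table from whichever filters are present.
function buildFetchSql(filters = {}) {
  const clauses = [];
  // Compare on the date part so that endDate is inclusive.
  if (filters.startDate) {
    clauses.push(`CAST(CreatedDate AS DATE) >= ${quote(filters.startDate)}`);
  }
  if (filters.endDate) {
    clauses.push(`CAST(CreatedDate AS DATE) <= ${quote(filters.endDate)}`);
  }
  for (const [key, column] of Object.entries(FILTER_COLUMNS)) {
    if (filters[key]) clauses.push(`${column} = ${quote(filters[key])}`);
  }
  const where = clauses.length ? ` WHERE ${clauses.join(" AND ")}` : "";
  return `SELECT * FROM requests${where};`;
}

// Run the filtered query on the given DuckDB connection.
async function fetchData(conn, filters = {}) {
  return conn.query(buildFetchSql(filters));
}
```

Omitting every filter yields an unfiltered `SELECT`, which matches the "any or none" wording in the comment above.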

@DrAcula27
Member

DrAcula27 commented Feb 15, 2025

Performance Comparison

Load Time Criteria - blank map on start

  • Platform: local
  • Request Status: Open
  • Commit Hash: de1005a
  • Operating System: Windows
  • Browser: Firefox

Results

  • blank map, no data on initial startup:
    • Data loading time: 107 ms
    • Map loading time: 0 ms

VS

Load Time Criteria - current setup on start

  • Platform: local
  • Request Status: Open
  • Commit Hash: 8c43ecc
  • Operating System: Windows
  • Browser: Firefox

Results

  • 1 week, all request types (what happens on initial startup in current app):
    • Data loading time: 607 ms
    • Map loading time: 440 ms

Data loading on the blank map is roughly 5.67x faster (107 ms vs 607 ms)
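
The 5.67x figure corresponds to the data loading times alone. As a quick check of the arithmetic using only the timings reported above (the variable names are illustrative), the combined data plus map time can be compared the same way:

```javascript
// Timings in milliseconds, taken from the two result sets above.
const blank = { data: 107, map: 0 };
const current = { data: 607, map: 440 };

// Ratio of two durations, rounded to two decimal places.
const ratio = (slow, fast) => (slow / fast).toFixed(2);

const dataSpeedup = ratio(current.data, blank.data);
const totalSpeedup = ratio(current.data + current.map, blank.data + blank.map);

console.log(dataSpeedup);  // "5.67"
console.log(totalSpeedup); // "9.79"
```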

@DrAcula27 DrAcula27 mentioned this issue Feb 15, 2025
4 tasks
@DrAcula27 DrAcula27 moved this from In progress to In Review in P: 311: Project Board Feb 15, 2025
@DrAcula27 DrAcula27 added the ready for dev lead ready for developer lead to review the issue label Feb 15, 2025
@ryanfchase
Member Author

I'm seeing some merge artifacts appearing in this PR that I'd like to address in breakouts tonight.

@ryanfchase ryanfchase added on agenda: general this ticket will be discussed at the upcoming general meeting or its breakout rooms and removed on agenda: general this ticket will be discussed at the upcoming general meeting or its breakout rooms labels Feb 19, 2025
@ryanfchase ryanfchase moved this from In Review to In progress in P: 311: Project Board Feb 22, 2025
Projects
Status: In progress
Development

No branches or pull requests

4 participants