Research incorporating population data into 311-data reports/dashboards #1229

Closed
1 task done
Tracked by #1291
chelseybeck opened this issue Oct 15, 2021 · 19 comments · Fixed by #1336
Labels: Complexity: Missing (this ticket needs a complexity: good first issue, small, medium, or large); Epic; feature: guide; Milestone: Missing; project: 311-data-dashboards; Role: Data Science (data management, loading, or analysis); size: Missing; size: 2pt (can be done in 7-12 hours)

Comments

@chelseybeck
Member

chelseybeck commented Oct 15, 2021

Overview

Comparing neighborhoods by raw request counts can be misleading, because neighborhood councils differ widely in population size.

Action Items

  • Research ways to join 311-data requests with population data

Resources/Instructions

311-Data Project Onboarding
Tech Stack

@ExperimentsInHonesty
Member

@chelseybeck each NC serves about 40,000 people. They have subdivided in the past when they grew larger than that. Is this still needed?

@piotrsan
Member

@ExperimentsInHonesty, I found this census data from 2010 by neighborhood council. According to this data, population size varies widely across the neighborhood councils. You mentioned that they are supposed to serve ~40k people; do you know if this changed after 2010? It is not clear to me which neighborhood council boundaries were used in the analysis, but it seems that it was updated fairly recently, in 2020.

Here is the link to the data:
https://data.lacity.org/Community-Economic-Development/Census-Data-by-Neighborhood-Council/nwj3-ufba

Here is the population size histogram:
[image: population size histogram]

If the population spread is indeed this large, it would be worth normalizing the data by population size.
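As a minimal sketch of what normalizing by population could look like, the snippet below converts raw request counts into requests per 1,000 residents. The council names and numbers are made up for illustration and do not come from the census table linked above; the function name is also hypothetical.

```python
def requests_per_1000(request_counts, populations):
    """Return requests per 1,000 residents for each neighborhood council.

    Councils with a missing or zero population are skipped rather than guessed.
    """
    rates = {}
    for nc, count in request_counts.items():
        pop = populations.get(nc)
        if not pop:
            continue  # no usable population figure for this council
        rates[nc] = 1000 * count / pop
    return rates


# Hypothetical numbers, for illustration only.
request_counts = {"NC A": 1200, "NC B": 1200, "NC C": 50}
populations = {"NC A": 20000, "NC B": 80000}

rates = requests_per_1000(request_counts, populations)
for nc, rate in sorted(rates.items()):
    print(f"{nc}: {rate:.1f} requests per 1,000 residents")
```

Here "NC A" and "NC B" file the same number of requests, but the per-capita rate differs by a factor of four, which is the effect the raw counts hide.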

@piotrsan
Member

To adjust the data by population, I intersected the 2020 census block data from LA County ArcGIS with the neighborhood council boundaries to obtain population estimates for each neighborhood council. The analysis yielded the table NC_pop_2020.csv, which was used to adjust the number of 311 requests by population in the neighborhood.py dashboard.
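For readers unfamiliar with this kind of intersection, here is a rough sketch of one common approach, area-weighted apportionment: each census block's population is split across councils in proportion to the share of the block's area that falls inside each council. This is an assumption about the general technique, not a description of what ArcGIS does internally. To stay dependency-free, the sketch uses axis-aligned rectangles as stand-ins for real polygons; an actual analysis would use proper geometries from a GIS library. All names and numbers are hypothetical.

```python
def rect_area(r):
    """Area of an axis-aligned rectangle given as (xmin, ymin, xmax, ymax)."""
    return (r[2] - r[0]) * (r[3] - r[1])


def overlap_area(a, b):
    """Area of the intersection of two axis-aligned rectangles (0 if disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def estimate_nc_population(blocks, councils):
    """Apportion block populations to councils by overlapping area.

    blocks:   list of (rectangle, population) pairs
    councils: dict mapping council name -> rectangle
    """
    totals = {name: 0.0 for name in councils}
    for rect, pop in blocks:
        block_area = rect_area(rect)
        if block_area == 0:
            continue  # degenerate block, nothing to apportion
        for name, nc_rect in councils.items():
            share = overlap_area(rect, nc_rect) / block_area
            totals[name] += pop * share
    return totals


# Hypothetical toy geometry: the second block straddles the council boundary.
blocks = [((0, 0, 2, 2), 100), ((2, 0, 4, 2), 200)]
councils = {"NC A": (0, 0, 3, 2), "NC B": (3, 0, 4, 2)}

totals = estimate_nc_population(blocks, councils)
print(totals)
```

Area weighting assumes people are spread evenly within a block, which is only an approximation; census blocks are small, so the error is usually limited to blocks that straddle a boundary.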

@nichhk
Member

nichhk commented Jul 19, 2022

To recap, Anupriya and Piero have both been working on ways to get more up-to-date population data per-NC.

Action items for @priyakalyan and @piotrsan:

  • Explain why getting population counts is a difficult problem (new NCs? other things? I don't quite remember).
  • Create write-ups of your methodologies for getting the population data. You can post them on this issue for now.
  • Post the results of your methodologies. Please create a single Google Sheet that includes methodologies A, B, and C. Also include percentage differences per NC between different methodologies (e.g., we need columns like "Percentage difference between A and B").
  • Based on the comparisons, pick a winner. Obviously, we have no ground truth here, so we'll likely need to make a judgement call. If all methodologies give similar results, I would lean towards choosing the simplest one. The simplest one will be easiest to maintain and explain.
  • If necessary, merge the code to produce our chosen methodology.
  • Find a centralized location to publish our population data and share it with the team so that other data scientists can control for population in their analyses.
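The per-NC comparison columns requested above could be computed along these lines. This is only a sketch: the function names are hypothetical, the figures are invented, and taking methodology A as the baseline for the percentage is an assumption (the issue does not say which methodology should be the denominator).

```python
def percent_difference(a, b):
    """Percentage difference of b relative to a, or None if a is zero."""
    if a == 0:
        return None
    return 100 * (b - a) / a


def compare_methods(method_a, method_b):
    """Build (nc, a, b, pct_diff) rows for councils present in both methods."""
    rows = []
    for nc in sorted(set(method_a) & set(method_b)):
        a, b = method_a[nc], method_b[nc]
        rows.append((nc, a, b, percent_difference(a, b)))
    return rows


# Hypothetical population estimates from two methodologies.
method_a = {"NC A": 40000, "NC B": 50000}
method_b = {"NC A": 42000, "NC B": 49000, "NC C": 30000}

rows = compare_methods(method_a, method_b)
for nc, a, b, pct in rows:
    print(f"{nc}: A={a}, B={b}, difference={pct:+.1f}%")
```

Councils that appear in only one methodology (such as "NC C" here) are left out of the comparison, so they would need to be reviewed separately.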

@piotrsan
Member

There is no available table of population broken down by neighborhood council. The only data available is derived from 2010 census data using old NC boundaries (when there were only 97 NCs instead of the current 99).

We have preliminary data from ArcGIS derived by intersecting the current 99 NC boundaries with 2020 census block population data. It is not clear how ArcGIS does this in the background.

Anupriya is doing the analysis from start to end. The results of that analysis should be used as the final population estimates table for any normalization needs.

@priyakalyan
Member

priyakalyan commented Jul 28, 2022

Here is a notebook detailing the calculation to determine the updated LA neighborhood council population using geospatial analysis.

This notebook compares the two methods (ArcGIS and geopandas) that were used to estimate the current NC populations.

Check pop_compare to access the Google Sheet with all the updated information.

I am attaching the CSV file too, in case there is an issue opening the Google Sheet.
pop_compare.csv

@nichhk
Member

nichhk commented Jul 28, 2022

@priyakalyan have you seen this? It's an approximation of population by neighborhood for LA, but the neighborhoods that they are using are different (and more granular) than actual Neighborhood Councils. At the bottom, they explain their methodology. They are taking a winner-take-all approach, meaning that they are not attempting to split census tracts across neighborhood boundaries.
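To make the winner-take-all idea concrete, here is a small sketch in which each census tract's whole population goes to a single neighborhood instead of being split. It assumes the "winner" is the neighborhood with the largest overlapping area; the linked methodology may use a different criterion. Tract IDs, neighborhood names, and numbers are hypothetical.

```python
def winner_take_all(tract_populations, overlaps):
    """Assign each tract's full population to one neighborhood.

    tract_populations: dict mapping tract id -> population
    overlaps:          dict mapping tract id -> {neighborhood: overlap area}
    """
    totals = {}
    for tract, pop in tract_populations.items():
        areas = overlaps.get(tract)
        if not areas:
            continue  # tract does not touch any neighborhood
        # Assumed criterion: the neighborhood with the largest overlap wins.
        winner = max(areas, key=areas.get)
        totals[winner] = totals.get(winner, 0) + pop
    return totals


# Hypothetical tracts, each straddling two neighborhoods.
tract_populations = {"T1": 3000, "T2": 5000}
overlaps = {
    "T1": {"North": 0.8, "South": 0.2},
    "T2": {"North": 0.4, "South": 0.6},
}

neighborhood_totals = winner_take_all(tract_populations, overlaps)
print(neighborhood_totals)
```

This is simpler to explain than splitting tracts by area, at the cost of misassigning everyone who lives in the losing part of a straddling tract.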

The LA Times also publishes population density stats, but again, the neighborhoods that they use are different from actual NCs. I couldn't find their exact methodology, but I bet they'd be willing to help us if we emailed them ([email protected]).

@priyakalyan
Member

@nichhk I did see that a long time ago, when I was searching extensively for population details, but I did not explore it further back then. I can take a look at it. Thanks for the feedback!

@piotrsan
Member

Here are the 2020 estimates for Los Angeles from the Census Bureau.

https://www.census.gov/quickfacts/fact/table/losangelescitycalifornia/PST045221

@priyakalyan
Member

@nichhk I emailed the LA Times last week asking for their methodology for calculating the LA city population. I am still waiting for their response.

Here is an updated notebook to calculate the population of the LA city NCs after adding area and population filters. In this recent version, the total population of all the NCs is very close to the Census Bureau value.
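A sanity check of this kind can be sketched as follows: sum the per-NC estimates and compare the total with a citywide reference figure. The numbers below are placeholders rather than the notebook's results or the actual Census Bureau value, and the function name is hypothetical.

```python
def relative_gap(nc_populations, reference_total):
    """Relative difference between the summed NC estimates and a reference total."""
    estimated_total = sum(nc_populations.values())
    return (estimated_total - reference_total) / reference_total


# Placeholder figures, for illustration only.
nc_populations = {"NC A": 40000, "NC B": 59000}
reference_total = 100000

gap = relative_gap(nc_populations, reference_total)
print(f"Estimated total differs from the reference by {gap:+.1%}")
```

A small gap does not prove that every individual NC estimate is right, since errors in neighboring councils can cancel out, but a large gap is a clear sign that population is being dropped or double counted.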

@akhaleghi

@priyakalyan were you able to hear back from the LA Times?

@priyakalyan
Member

@akhaleghi I have not heard back from them yet. It has been more than 10 days. I wrote a follow-up email today.

@akhaleghi

Hi @priyakalyan, are there any updates on this issue?

@priyakalyan
Member

priyakalyan commented Aug 31, 2022

I have incorporated changes into the updated_NC_pop repo based on the feedback given by @salice. I will create a PR and add it to the 311 repo this week.

@akhaleghi

Hey @priyakalyan could you provide a brief update on this issue? (I know you're waiting for a review but we just want to keep the status up-to-date here)

@priyakalyan
Member

Hi @akhaleghi, sure. I have created a PR to address the NC population issue. It is waiting for review.

@priyakalyan priyakalyan mentioned this issue Sep 18, 2022
4 tasks
@priyakalyan
Member

Here is an update: the second round of review is done. The review process is still ongoing.

@priyakalyan
Member

priyakalyan commented Nov 15, 2022

This issue can be closed now! Here is a link to the CSV file with the updated population, population density, and area (in square miles) of all 99 NCs.

@nichhk
Member

nichhk commented Nov 15, 2022

Anupriya, thank you for your persistence on this very challenging issue!

Projects
Status: Done (without merge)
7 participants