The indexing of (very) large packages never completes #362

Closed
2 tasks done
keshav-space opened this issue Apr 1, 2024 · 0 comments · Fixed by #375
keshav-space commented Apr 1, 2024

/api/scan_queue/update_status takes too long to return for large scans

It appears that update_status does too much work before returning, so for large scans it can take over an hour to respond. Here is a proposed design that splits the scan upload process into two parts:

  • First, simply collect the scan results (possibly storing them in a local temp file) and return immediately, possibly updating the status to an intermediate value. This step should accept compressed data.
  • Push the index_package step onto a Redis queue (we already use a Redis queue for the PURL watch feature) so that it is processed asynchronously, and update the status when it completes.
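
The two-part design above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's implementation: a standard-library `queue.Queue` stands in for the Redis queue, and `receive_scan_results`, `process_next_job`, the `statuses` table, the status strings, and the body of `index_package` are all hypothetical.

```python
import gzip
import json
import os
import queue
import tempfile

# Stand-in for the Redis-backed queue mentioned in the issue (assumption:
# any FIFO is enough to show the control flow).
index_queue = queue.Queue()

# Hypothetical in-memory status table keyed by scannable URI id.
statuses = {}


def receive_scan_results(scannable_uri_id, compressed_payload, tmp_dir):
    """Part 1: store the upload as-is and return right away."""
    fd, path = tempfile.mkstemp(suffix=".json.gz", dir=tmp_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(compressed_payload)  # kept compressed on disk
    statuses[scannable_uri_id] = "results-received"  # intermediate status
    # Enqueue only small arguments: the id and the temp file path.
    index_queue.put((scannable_uri_id, path))
    return statuses[scannable_uri_id]


def index_package(scan_data):
    """Placeholder for the expensive indexing step (hypothetical body)."""
    return len(scan_data.get("files", []))


def process_next_job():
    """Part 2: run by a worker, outside the request/response cycle."""
    scannable_uri_id, path = index_queue.get()
    try:
        with gzip.open(path, "rt", encoding="utf-8") as f:
            scan_data = json.load(f)
        indexed = index_package(scan_data)
        statuses[scannable_uri_id] = "indexed"
    except Exception:
        statuses[scannable_uri_id] = "index-failed"
        raise
    finally:
        os.remove(path)  # the temp file is no longer needed either way
    return indexed
```

In this sketch the request handler only writes a file and enqueues a job, so its response time no longer depends on how long indexing takes; the final status is set by the worker.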

Also see:

JonoYang added a commit that referenced this issue Apr 3, 2024
JonoYang added a commit that referenced this issue Apr 3, 2024
@JonoYang JonoYang self-assigned this Apr 3, 2024
@pombredanne pombredanne changed the title /api/scan_queue/update_status takes too long to return for large scans The indexing of (very) large packages never completes Apr 3, 2024
JonoYang added a commit that referenced this issue Apr 3, 2024
    * Create new volume to store temporary files to share between Docker services

Signed-off-by: Jono Yang <[email protected]>
JonoYang added a commit that referenced this issue Apr 4, 2024
    * Add new envvar PURLDB_ASYNC

Signed-off-by: Jono Yang <[email protected]>
@keshav-space keshav-space linked a pull request Apr 5, 2024 that will close this issue
JonoYang added a commit that referenced this issue Apr 8, 2024
    * This is to avoid serializing the entire ScannableURI object when placing the task onto the queue

Signed-off-by: Jono Yang <[email protected]>
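
To illustrate why a job would carry an identifier rather than the whole object, here is a minimal sketch. The `ScannableURIStandIn` class and both helper functions are hypothetical, and JSON is used only as a stand-in for whatever serialization the queue actually applies.

```python
import json


class ScannableURIStandIn:
    """Hypothetical stand-in; the real ScannableURI model has other fields."""

    def __init__(self, id, uri, scan_results):
        self.id = id
        self.uri = uri
        self.scan_results = scan_results


def job_args_full(obj):
    # Serializes every field, including the potentially large scan results.
    return json.dumps(
        {"id": obj.id, "uri": obj.uri, "scan_results": obj.scan_results}
    )


def job_args_by_id(obj):
    # Serializes only the identifier; the worker would look the record up.
    return json.dumps({"scannable_uri_id": obj.id})


big = ScannableURIStandIn(
    id=42,
    uri="https://example.com/pkg.tar.gz",
    scan_results={"files": [{"path": f"f{i}"} for i in range(10_000)]},
)
full_size = len(job_args_full(big))
id_size = len(job_args_by_id(big))
```

With an id-only payload the queued message stays small regardless of how large the scan is, at the cost of one extra lookup in the worker.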
JonoYang added a commit that referenced this issue Apr 8, 2024
JonoYang added a commit that referenced this issue Apr 8, 2024
@pombredanne pombredanne moved this to Done in 04-purl2sym May 8, 2024