Schedule time with existing users to upgrade local testing #294

Closed
tylerpotts opened this issue Feb 2, 2021 · 8 comments

Comments

@tylerpotts
Contributor

Local testing in #135 will break current clusters. This change will prevent future breakages from happening.

@rsignell-usgs @jkellndorfer Giving you the heads up that this is coming in the next week or two. I'll create a PR on your QHub that will manage this transition when it's time to merge the changes.

@rsignell-usgs
Contributor

rsignell-usgs commented Feb 2, 2021

Sounds good @tylerpotts! I'm wondering whether we are already feeling the changes: I tried adding a package to my env this morning (several hours ago) and the GitHub Actions run succeeded, but the new package doesn't appear in my environment. Is it possible that this package pushed my container size over some limit and something failed downstream of GitHub Actions? (My "pangeo" environment must be huge: 377 packages.)

Where should I raise this?

@tylerpotts
Contributor Author

tylerpotts commented Feb 3, 2021

@rsignell-usgs sounds like either A) the conda environment solve is failing, or B) your conda store is out of space. Solutions for those potential problems:

A) check the latest log file from a user instance in the /home/conda/store/.logs directory. Note that it can take a while (>30 mins) to build complex environments. These logs also don't seem to populate until the environment has either built or failed.
B) Increase your storage space for conda store in qhub-config.yaml.
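
For option A), a minimal Python sketch of finding the newest file in the log directory and printing its tail. The only path taken from this thread is `/home/conda/store/.logs`; the helper names `latest_log` and `tail` are hypothetical, and the sketch assumes the logs are plain text files sitting directly in that directory.

```python
from pathlib import Path
from typing import Optional


def latest_log(log_dir: str) -> Optional[Path]:
    """Return the most recently modified regular file in log_dir, or None."""
    directory = Path(log_dir)
    if not directory.is_dir():
        return None
    files = [p for p in directory.iterdir() if p.is_file()]
    if not files:
        return None
    return max(files, key=lambda p: p.stat().st_mtime)


def tail(path: Path, n: int = 40) -> str:
    """Return the last n lines of a text file."""
    lines = path.read_text(errors="replace").splitlines()
    return "\n".join(lines[-n:])


if __name__ == "__main__":
    # Log directory as given in the comment above.
    newest = latest_log("/home/conda/store/.logs")
    if newest is None:
        print("no log files found (the build may still be running)")
    else:
        print(f"== {newest} ==")
        print(tail(newest))
```

An empty result does not necessarily mean a failure, since the logs reportedly do not populate until the build has finished or failed.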

Also, I believe that conda store doesn't currently garbage collect (will double check with @costrouc). If that's still the case, the shared NFS volume can fill up quickly with large environments. If you ssh into the pod, you can manually delete all the old environment builds to get more space. That will be a valid step if B) turns out to be the problem, and I'll write some docs for it if needed.
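
Before deleting anything, it may help to confirm how full the volume is and which directories take up the most space. The following is a rough sketch only: it assumes the conda store is mounted at `/home/conda/store` (inferred from the log path in A), not confirmed in this thread), and the helper names `dir_size` and `largest_subdirs` are hypothetical. It only reports sizes and deletes nothing.

```python
import os
import shutil
from typing import List, Tuple


def dir_size(path: str) -> int:
    """Total size in bytes of regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def largest_subdirs(parent: str, top: int = 10) -> List[Tuple[str, int]]:
    """Return up to `top` immediate subdirectories of parent, largest first."""
    sizes = [
        (entry.name, dir_size(entry.path))
        for entry in os.scandir(parent)
        if entry.is_dir(follow_symlinks=False)
    ]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    store = "/home/conda/store"  # assumed mount point, not confirmed in the thread
    usage = shutil.disk_usage(store)
    print(f"used {usage.used / 2**30:.1f} GiB of {usage.total / 2**30:.1f} GiB")
    for name, size in largest_subdirs(store):
        print(f"{size / 2**30:8.2f} GiB  {name}")
```

Conda commonly hard-links package files between environments, so the per-directory totals can overstate how much space removing a single build would actually free.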

@rsignell-usgs
Contributor

rsignell-usgs commented Feb 3, 2021

Pretty sure it's B).

@tylerpotts , does this log from the conda store pod give any clues?
2021-02-03_15-43-26

@rsignell-usgs
Contributor

rsignell-usgs commented Feb 3, 2021

> If you ssh into the pod, you can manually delete all the old environment builds to get more space.

Is there an example or doc on how to do that?

I'm hesitant to increase conda_store in the config because I think that's how @jkellndorfer ended up with a non-functioning qhub. Is that right Josef?

storage:
  conda_store: 40Gi
  shared_filesystem: 100Gi

@jkellndorfer
Contributor

That's right, @rsignell-usgs. I essentially started over.

@github-actions

This issue has been automatically marked as stale because there was no recent activity in 60 days. Remove the stale label or add a comment, otherwise, this issue will automatically be closed in 7 days if no further activity occurs.

@github-actions github-actions bot added the status: stale 🥖 Not up to date with the default branch - needs update label May 24, 2021
@viniciusdc
Contributor

viniciusdc commented May 28, 2021

I'm removing the stale tag, as this issue raises a good FAQ topic: how to handle common conda-store environment errors (as well as how to overcome the permission restrictions on deleting conda-store resources).

@viniciusdc viniciusdc removed the status: stale 🥖 Not up to date with the default branch - needs update label May 28, 2021
@tylerpotts
Contributor Author

tylerpotts commented Jun 16, 2021

@viniciusdc The original purpose of this issue is now complete. We were able to help get the USGS cluster destroyed. If you create a new issue detailing what you mean about conda-store environment errors and permission restrictions, we can take it from there.

Closing this one out.


6 participants