Out of memory during plot_phase #192

Closed
avivajpeyi opened this issue Mar 15, 2022 · 7 comments · Fixed by #194

Comments

@avivajpeyi
Collaborator

Gosh, these incessant memory errors! This one isn't even during sampling, it's during plotting!

nbconvert.preprocessors.execute.DeadKernelError: Kernel died
slurmstepd: error: Detected 1 oom-kill event(s) in step 26320284.0 cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
srun: error: john16: task 0: Out Of Memory

Occurs at

plot_phase(tic_entry, inference_data, planet_transit_model)
@dfm
Owner

dfm commented Mar 15, 2022

Dang - this is probably related to this line:

target=[model.lightcurve_models, model.gp_mu],

and the computation of the GP mean. I wouldn't expect it to be a huge memory cost, but I guess it is. Do you really need the GP prediction for all those samples?
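A back-of-the-envelope sketch of why this could add up: if the GP mean is evaluated on the full time grid for every posterior sample, the result is a dense (n_samples, n_times) array. The sizes below are hypothetical, not taken from this run, and `dense_array_gib` is an illustrative helper, not something in the codebase.

```python
import numpy as np

def dense_array_gib(n_samples, n_times, dtype=np.float64):
    """Size in GiB of a dense (n_samples, n_times) array of the given dtype."""
    n_bytes = n_samples * n_times * np.dtype(dtype).itemsize
    return n_bytes / 1024**3

# Hypothetical sizes (assumptions): 8000 posterior draws, 100k cadences.
print(f"{dense_array_gib(8000, 100_000):.1f} GiB per predicted quantity")
```

Any intermediate copies made while evaluating the prediction would come on top of this figure.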

@avivajpeyi
Collaborator Author

my_planet_transit_model.gp_mu = gp.predict(

Maybe just remove t and this might just work™

@avivajpeyi
Collaborator Author

do we need to compute GP so many times?

@avivajpeyi
Collaborator Author

probably not

@avivajpeyi
Collaborator Author

probably just the median(GP)
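One way the "median(GP)" idea could look, as a minimal sketch rather than the fix that was actually merged: evaluate the GP mean for only a thinned random subset of posterior draws, then take the pointwise median. Here `predict_fn` is a hypothetical stand-in for whatever computes the GP mean for one draw, and `max_draws` is an assumed knob.

```python
import numpy as np

def median_over_thinned_draws(predict_fn, samples, max_draws=100, seed=0):
    """Pointwise median of predict_fn over at most max_draws random draws.

    predict_fn (hypothetical) maps one posterior draw to a 1-D prediction.
    Only (max_draws, n_times) values are held in memory at once, instead
    of one row per posterior sample.
    """
    rng = np.random.default_rng(seed)
    n = len(samples)
    # Pick a random subset of draw indices without repetition.
    idx = rng.choice(n, size=min(max_draws, n), replace=False)
    preds = np.stack([predict_fn(samples[i]) for i in idx])
    return np.median(preds, axis=0)

# Toy usage: each "draw" is a scalar and the "prediction" is a constant curve.
toy_samples = np.arange(1000.0)
curve = median_over_thinned_draws(lambda s: np.full(5, s), toy_samples, max_draws=50)
print(curve.shape)
```

The trade-off is that the median is estimated from fewer draws, which is usually fine for a plot.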

@avivajpeyi
Collaborator Author

Oof, not fixed -- this is still occurring with 1.5k jobs!

Rather frustrating...

@avivajpeyi
Collaborator Author

seems to be fixed now
