Errors from bulk analysis #166
Full log file for all jobs: http://catalog.tess-atlas.cloud.edu.au/content/toi_notebooks/tess_atlas_runner.log
Grepped error logs
Attached is a file of grepped results from the logs of the jobs.
Error summary
Here are some details on the errors (2K+ errors):
|
Summary
Over the weekend I ran the following batches independently
Focusing on errors from normal TOIs
"Normal" TOI errors: 2563/4878 failed (~50%)
/fred/oz200/avajpeyi/projects/tess-atlas/src/tess_atlas/data/lightcurve_data.py in from_database(cls, tic, outdir)
40
41 logger.info("Downloading LightCurveData from MAST")
---> 42 search = lk.search_lightcurve(
43 target=f"TIC {tic}", mission="TESS", author="SPOC"
44 )
...
ConnectionError: HTTPSConnectionPool(host='mast.stsci.edu', port=443): Max retries exceeded with url: /portal/Mashup/Mashup.asmx/columnsconfig (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x2b47dea1d9a0>: Failed to establish a new connection: [Errno -2] Name or service not known'))
src/tess_atlas/plotting/extra_plotting/ci.py in plot_xy_binned(x, y, yerr, ax, bins)
81
82 def plot_xy_binned(x, y, yerr, ax, bins):
---> 83 bins = np.linspace(min(x), max(x), bins)
84 denom, _ = np.histogram(x, bins)
85 num, _ = np.histogram(x, bins, weights=y)
ValueError: min() arg is an empty sequence
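The `ValueError` above is raised when `x` is empty, so `min(x)` has nothing to work on. A minimal sketch of a guard, assuming it is acceptable to skip the binned overlay when there is no data in the plotted range. `binned_mean` is a hypothetical helper, not a function in tess_atlas; it only mirrors the `np.linspace` / `np.histogram` calls shown in the traceback:

```python
import numpy as np


def binned_mean(x, y, bins):
    """Return (bin_centres, mean_y_per_bin), or None if there is nothing to bin.

    Hypothetical helper: it mirrors the binning in plot_xy_binned, but checks
    for empty input before calling min()/max().
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.size == 0:
        # min(x) / max(x) would raise "ValueError: min() arg is an empty sequence"
        return None
    edges = np.linspace(x.min(), x.max(), bins)
    denom, _ = np.histogram(x, edges)
    num, _ = np.histogram(x, edges, weights=y)
    # Bins with no points give 0/0; let those become NaN without a warning.
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = num / denom
    centres = 0.5 * (edges[1:] + edges[:-1])
    return centres, mean
```

The caller can then skip plotting when the result is `None`. Whether an empty `x` is itself a symptom of an upstream problem (for example, no data points near the transit) would still need checking per TOI.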
RuntimeError: Chain 0 failed.
/tmp/ipykernel_124356/3349958539.py in run_inference(model)
4 sampling_kwargs = dict(tune=2000, draws=2000, chains=2, cores=2)
5 logger.info(f"Run sampler with kwargs: {sampling_kwargs}")
----> 6 inference_data = pmx.sample(
7 **sampling_kwargs, start=init_params, return_inferencedata=True
8 )
...
fred/oz200/avajpeyi/envs/tess/lib/python3.8/site-packages/pymc3/parallel_sampling.py in recv_draw(processes, timeout)
357 else:
358 error = RuntimeError("Chain %s failed." % proc.chain)
--> 359 raise error from old_error
360 elif msg[0] == "writing_done":
361 proc._readable = True
LinAlgError: failed to factorize or solve matrix
src/tess_atlas/data/inference_data_tools.py in get_optimized_init_params(model, planet_params, noise_params, stellar_params, period_params, theta, verbose)
140 theta = pmx.optimize(theta, [noise_params[0]], **kwargs)
141 theta = pmx.optimize(theta, planet_params, **kwargs)
--> 142 theta = pmx.optimize(theta, noise_params, **kwargs)
143 theta = pmx.optimize(theta, stellar_params, **kwargs)
144 theta = pmx.optimize(theta, period_params, **kwargs)
...
|
Maybe I can try to resub the ones that had |
Great, and then I'd try manually running one or two from each other category to see where the problem is coming from and if you can fix it by tweaking things. |
On re-running the LK download for the TOIs, I got See #157 |
Yeah, you're definitely getting throttled. We probably need to put in some more friendly timeouts and backoff strategies so that we don't piss off the archives. You could try catching that exception, adding a sleep for a minute and then trying again. |
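A minimal sketch of the catch, sleep and retry idea suggested above. `retry_with_backoff` and its parameters are hypothetical names, not part of tess_atlas or lightkurve. The one-minute initial wait follows the suggestion in this comment; the doubling factor and the five-attempt limit are assumptions:

```python
import time


def retry_with_backoff(func, exceptions=(ConnectionError,), max_attempts=5,
                       initial_wait=60.0, factor=2.0, sleep=time.sleep):
    """Call func(); on one of `exceptions`, sleep and retry with a growing wait.

    Hypothetical helper. The last failure is re-raised once max_attempts
    attempts have been made.
    """
    wait = initial_wait
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == max_attempts:
                raise
            sleep(wait)
            wait *= factor  # back off: 60 s, 120 s, 240 s, ...


# Possible usage with the download call from the traceback (not run here).
# The error in the log comes from requests, and
# requests.exceptions.ConnectionError is not a subclass of the built-in
# ConnectionError, so it has to be passed explicitly:
#
#   import requests
#   search = retry_with_backoff(
#       lambda: lk.search_lightcurve(
#           target=f"TIC {tic}", mission="TESS", author="SPOC"
#       ),
#       exceptions=(requests.exceptions.ConnectionError,),
#   )
```

Passing `sleep` as a parameter keeps the helper testable without actually waiting. With many jobs hitting the archive at once, adding some random jitter to the wait may also help, but that is not shown here.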
LinAlgError: see #80 |
Reran the TOIs with the various fixes:
Some things to look into:
2. From 1939 analysed TOIs, why did ~586 fail?
3. From the 1353 successful runs, are the fits sensible? Most look good! Some look a bit weird (uncategorised).
Some initial fits still looking off:
Looking at the generation logs (job to download TOI data): there are 2833 generation logs (so there should be at least 2833 notebooks)?
More questions:
>>> from tess_atlas.data.exofop import get_toi_list
>>> len(get_toi_list(remove_toi_without_lk=True))
2833
>>> len(get_toi_list(remove_toi_without_lk=False))
5525
Ok so
5. If there are 2833 generation jobs, why aren't there 2833 TOI notebooks that were run (rather than 1939)? |
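One way to chase the 2833 vs 1939 discrepancy is to diff the expected TOI list against the notebooks that exist on disk. A rough sketch, assuming the notebooks are named like `toi_<number>.ipynb` and that the TOI list contains integers. Both are assumptions, and `missing_tois` is a hypothetical helper:

```python
from pathlib import Path


def missing_tois(expected_tois, notebook_dir, pattern="toi_*.ipynb"):
    """Return the sorted TOI numbers that have no notebook in notebook_dir.

    Hypothetical helper: the file-name pattern is an assumption about how
    the notebooks are named and may need adjusting.
    """
    found = set()
    for path in Path(notebook_dir).glob(pattern):
        suffix = path.stem.split("_")[-1]  # "toi_104" -> "104"
        if suffix.isdigit():
            found.add(int(suffix))
    return sorted(set(expected_tois) - found)


# Possible usage (not run here; the directory name is a placeholder):
#   from tess_atlas.data.exofop import get_toi_list
#   missing = missing_tois(get_toi_list(remove_toi_without_lk=True), "toi_notebooks")
#   print(len(missing), missing[:10])
```

The resulting list could then be cross-checked against the generation logs to see whether those jobs failed, timed out, or were never submitted.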
Summary from sleuthing:
|
The from the 219 TOI analyses that did not have netcdfs appear to have not completed their ~28 of these have a Example TOI 104: See #214 |
2712/2833 analyses finished!
Summary of errors
121 errors:
Execution errors
Need to check lens of the jobs that had time/mem errors |
From a total of 4.5K TOIs,