Matthew Greenberg ([email protected])
MW 12:30-13:30 in MS 434, or by appointment
Sorif Hossain ([email protected])
MWF 14:00-14:50 in KNB 126
I'll be out of town on Friday, January 17 and Monday, January 20. I'll post video lectures for those two classes on the D2L site for you to watch asynchronously.
Tu 14:00-15:50 in MS 521
There is no tutorial during the first week. Tutorials start on Tuesday, January 21.
Four assignments worth 25% each, due February 3, March 3, March 24, and April 14. Submit both a runnable Jupyter notebook .ipynb file and a static .pdf file to the appropriate D2L dropbox.
Roughly half of each assignment will be drawn from lab problems, with the other half coming from the textbook and other sources.
We'll collate the nicest student solutions to assignment problems and post them on the course D2L site (not on GitHub!). If you don't want your work included in solution sets, please let us know.
An introduction to statistical computing and Bayesian modelling. Topics covered include random number generation, system/process simulation and evaluation, numerical integration, constrained and unconstrained optimization, the Bayesian inference framework, single and multi-parameter models, regression models, Bayesian hierarchical modelling, and Markov chain Monte Carlo. source
GHV: Andrew Gelman, Jennifer Hill, and Aki Vehtari, Regression and Other Stories. web, pdf, github
Andrew Gelman and Aki Vehtari, Active Statistics. web, pdf
Andrew Gelman, Statistical Modeling, Causal Inference, and Social Science. blog
- Fundamentals
- GHV Chapters 4-5 (read Chapters 1-3 on your own, as review)
- Linear regression
- GHV Chapters 6-12
- Generalized linear models
- GHV Chapters 13-15
- Before and after fitting a regression
- GHV Chapters 16-17
- Causal inference
- GHV Chapters 18-21
- What comes next?
- GHV Chapter 22
And maybe some other topics as time permits (it never does).
- Statistical modeling workflow, Bayesian workflow
- Visualization for exploration, summarization, diagnostics; Matplotlib, ArviZ
- Python, NumPy, pandas, PyStan or PyMC (preferences?)
- Simulation and sampling
- Array (tensor) programming -- slicing, vectorization, einsum
- Numerical methods, algorithms
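As a taste of the array programming item above, here is a minimal NumPy sketch of slicing, vectorization, and einsum. The array names (`X`, `beta`) and the simulated data are illustrative assumptions, not taken from the course materials.

```python
import numpy as np

# Simulated design matrix and coefficient vector (hypothetical values).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
beta = np.array([1.0, -2.0, 0.5])

# Slicing: select sub-arrays without writing loops.
first_two_rows = X[:2]    # shape (2, 3)
last_column = X[:, -1]    # shape (5,)

# Vectorization: the explicit double loop and the matrix-vector
# product compute the same linear predictor.
yhat_loop = np.array(
    [sum(X[i, j] * beta[j] for j in range(3)) for i in range(5)]
)
yhat_vec = X @ beta

# einsum: spell out which indices are multiplied and summed.
yhat_einsum = np.einsum("ij,j->i", X, beta)  # same as X @ beta
gram = np.einsum("ni,nj->ij", X, X)          # same as X.T @ X
```

The einsum subscripts make the contraction explicit: a repeated index that is absent from the output (here `j` in the first call, `n` in the second) is summed over.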
- Clone this repo
git clone https://github.com/mgreenbe/DATA335W2025.git
and move into its directory:
cd DATA335W2025
- Make a virtual environment, activate it, and install packages:
python3.13 -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt
- To update your local clone of the repo, run
git pull
from the repo root. To be notified about updates, watch the repo by clicking the Watch button at the top of the repo's GitHub page.
If you find an error in any of the material in this repo, please help me fix it. If you can fix it yourself, please do so and submit a PR. If you only suspect an error, open an issue.
I'm very happy to take suggestions or have discussions in the issues tab. Feel free to open them and to engage with any open issues. Please keep in mind, though, that all of these interactions are public.