Cerebro is a data system for optimized deep learning model selection. It uses a novel parallel execution strategy
called Model Hopper Parallelism (MOP) to execute end-to-end deep learning model selection workloads in a more
resource-efficient manner. Detailed technical information about Cerebro can be found in our Technical Report.
Prerequisites: You must be running Python >= 3.6 with TensorFlow >= 2.3 and Apache Spark >= 2.4. TensorFlow 2.3 and versions >= 2.9.1 are known to work; versions 2.4 through 2.5 do not work. You need to install these separately, along with a pyspark release that matches your Spark version. Spark itself must be installed by following its own instructions; for most users, the rest can be installed with:
pip install tensorflow==2.3
pip install pyspark==<your spark version>
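The version constraints above can be checked before installing Cerebro. The following is a minimal sketch, not part of Cerebro: the helper names `parse_version`, `tensorflow_status`, and `python_ok` are hypothetical, and the "untested" label for TensorFlow 2.6 through 2.9.0 is an assumption, since the note above does not say whether those versions work.

```python
import re
import sys


def parse_version(version):
    """Turn a version string such as '2.9.1' or '2.10.0rc1' into a tuple of three ints."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits, e.g. '0rc1' -> 0
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)  # pad '2.3' to (2, 3, 0)
    return tuple(parts)


def tensorflow_status(version):
    """Classify a TensorFlow version string against the compatibility note above."""
    v = parse_version(version)
    if v < (2, 3, 0):
        return "unsupported"  # below the stated minimum of 2.3
    if v[:2] == (2, 3) or v >= (2, 9, 1):
        return "known to work"
    if (2, 4) <= v[:2] <= (2, 5):
        return "known not to work"
    return "untested"  # assumption: 2.6 through 2.9.0 are not mentioned above


def python_ok(version_info=sys.version_info):
    """Return True if the interpreter meets the Python >= 3.6 requirement."""
    return tuple(version_info[:2]) >= (3, 6)
```

For example, `tensorflow_status("2.4.1")` returns `"known not to work"`, while `tensorflow_status("2.9.1")` returns `"known to work"`. With TensorFlow installed, the string to pass in is `tensorflow.__version__`.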
Note that pyspark can run in local (single-node) mode without a separate Spark installation. If you are only trying Cerebro out and not using a cluster, you can run:
sudo apt-get update
sudo apt-get install -y openjdk-8-jdk
pip install pyspark==3.2.0
This alone should be sufficient for running the examples. To use a cluster with multiple machines, however, you will eventually need a full Spark installation.
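After running the local-mode commands, it can help to confirm that both pieces are visible to Python before trying the examples. This is a rough sketch using only the standard library; the function names are hypothetical, and it assumes that a `java` executable on `PATH` is how the JDK will be found (a setup that relies only on `JAVA_HOME` would be reported as missing here).

```python
import importlib.util
import shutil


def local_mode_prerequisites():
    """Report whether a Java runtime and the pyspark package can be located."""
    return {
        # Assumption: the JDK exposes a `java` executable on PATH.
        "java": shutil.which("java") is not None,
        # find_spec returns None when the package is not installed.
        "pyspark": importlib.util.find_spec("pyspark") is not None,
    }


def missing_prerequisites(status):
    """Return the sorted names of prerequisites that were not found."""
    return sorted(name for name, found in status.items() if not found)


if __name__ == "__main__":
    missing = missing_prerequisites(local_mode_prerequisites())
    if missing:
        print("Missing for local mode:", ", ".join(missing))
    else:
        print("Java and pyspark were both found.")
```

The check only locates the two components; it does not start a Spark session or verify that their versions match.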
Cerebro: The best way to install Cerebro is via pip (the pip release may not contain the latest changes).
pip install -U cerebro-dl
Alternatively, you can clone the repository and run the provided Makefile:
git clone https://github.com/ADALabUCSD/cerebro-system.git && cd cerebro-system && make
There are three examples, in order of increasing complexity.
- Run the unit tests:
python -m pytest
- Run a bare minimum model selection example:
cd examples
python dummy_model_selection.py
- Run an end-to-end example:
cd examples
wget http://files.fast.ai/part2/lesson14/rossmann.tgz
tar zxvf rossmann.tgz
python rossmann_model_selection.py
Detailed documentation about the system can be found here.
This project was/is supported in part by a Hellman Fellowship, the NIDDK of the NIH under award number R01DK114945, and an NSF CAREER Award.
We used the following projects when building Cerebro.
- Horovod: Cerebro's Apache Spark implementation uses code from Horovod's implementation for Apache Spark.
- Petastorm: We use Petastorm to read Apache Parquet data from remote storage (e.g., HDFS).
If you use this software for research, please cite the following papers:
title={Cerebro: Efficient and Reproducible Model Selection on Deep Learning Systems},
author={Nakandala, Supun and Zhang, Yuhao and Kumar, Arun},
booktitle={Proceedings of the 3rd International Workshop on Data Management for End-to-End Machine Learning},