In this project I'm building an MLOps pipeline based on MLflow, alongside CI/CD and AWS deployment.
- What is logging and why is it important?
- Why do we need scripts instead of notebooks?
- Double underscore methods (also called dunder methods), and why do we have all these __init__.py files inside each folder?
- Making a venv (virtual environment) and installing a requirements.txt file
- As a first step, I'm testing every step in the research folder, which is full of notebooks
- building logging files (see the logging sketch after this list)
- building config YAML variables
- building data ingestion
- building data validation
- building data transformation
- building the model trainer
- building model evaluation
- prediction
- web application
- deployment
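Since logging comes up in the notes above, here is a minimal sketch of what "building logging files" might look like; the file path, logger name, and log format are assumptions for illustration, not necessarily what the repo uses:

```python
# src/<project>/__init__.py (hypothetical path) -- a simple project-wide logger
import os
import sys
import logging

log_dir = "logs"
log_filepath = os.path.join(log_dir, "running_logs.log")
os.makedirs(log_dir, exist_ok=True)

logging.basicConfig(
    level=logging.INFO,
    format="[%(asctime)s: %(levelname)s: %(module)s: %(message)s]",
    handlers=[
        logging.FileHandler(log_filepath),  # persist logs to logs/running_logs.log
        logging.StreamHandler(sys.stdout),  # also echo them to the console
    ],
)

logger = logging.getLogger("mlProjectLogger")
```

Every component can then `from <project> import logger` and call `logger.info(...)`, so each pipeline stage leaves a trace in the same log file.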
- Update config.yaml
- Update schema.yaml
- Update params.yaml
- Update the entity (see the entity sketch after this list)
- Update the configuration manager in src config
- Update the components
- Update the pipeline
- Update the main.py # ui related
- Update the app.py # ui related
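As a rough illustration of the "Update the entity" step, a config entity is typically a frozen dataclass whose fields mirror one block of config.yaml; the exact class name and fields below are assumptions for illustration:

```python
# entity/config_entity.py (hypothetical) -- one entity per pipeline stage
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DataIngestionConfig:
    root_dir: Path          # where this stage stores its artifacts
    source_URL: str         # remote location of the raw dataset
    local_data_file: Path   # where the downloaded archive is saved
    unzip_dir: Path         # where the archive gets extracted
```

Keeping the entity frozen means a stage cannot silently mutate its configuration at runtime; any change has to go through config.yaml and the configuration manager.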
I use VSCode for this, so Ctrl+P is really handy for jumping between files.
Clone the repository
https://github.com/Amr-Abdellatif/End-to-End-Mlops-with-MLFlow.git
python -m venv venv
./venv/Scripts/activate
python -m pip install -r requirements.txt
# Finally run the following command
python app.py
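app.py is the entry point started above. A minimal sketch of what such an entry point could look like, assuming a Flask app with training and prediction routes (the route names, payload shape, and port are assumptions, not the repo's actual code):

```python
# app.py -- hypothetical minimal web entry point
from flask import Flask, request, jsonify

app = Flask(__name__)


@app.route("/train", methods=["GET"])
def train():
    # In the real project this would kick off main.py / the training pipeline.
    return "Training completed successfully"


@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON payload of feature values; a real app would load the trained
    # model artifact and return its prediction instead of this placeholder.
    features = request.get_json()
    return jsonify({"received": features, "prediction": None})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)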
Now, open up your localhost and port.
- Run mlflow ui to view the MLflow user interface in the browser
# With specific access
1. EC2 access: EC2 is a virtual machine
2. ECR: Elastic Container Registry, to store your Docker image in AWS
# Description: About the deployment
1. Build the Docker image of the source code
2. Push your Docker image to ECR
3. Launch your EC2
4. Pull your image from ECR in EC2
5. Launch your Docker image in EC2
# Policy:
1. AmazonEC2ContainerRegistryFullAccess
2. AmazonEC2FullAccess
- Save the URI: 566373416292.dkr.ecr.ap-south-1.amazonaws.com/mlproj
# optional
sudo apt-get update -y
sudo apt-get upgrade
# required
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker ubuntu
newgrp docker
Settings > Actions > Runners > New self-hosted runner > choose OS > then run the commands one by one
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION = us-east-1
AWS_ECR_LOGIN_URI = 566373416292.dkr.ecr.ap-south-1.amazonaws.com  (demo value)
ECR_REPOSITORY_NAME = simple-app
MLflow
- It's production grade
- Tracks all of your experiments
- Logging & tagging your models
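A small sketch of what experiment tracking with MLflow looks like; the model, parameters, and metric below are placeholders, not the project's actual values:

```python
# Minimal MLflow tracking example -- values are illustrative only
import mlflow
import mlflow.sklearn
from sklearn.linear_model import ElasticNet
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=5, random_state=42)

with mlflow.start_run():
    params = {"alpha": 0.5, "l1_ratio": 0.3}
    model = ElasticNet(**params).fit(X, y)

    mlflow.log_params(params)                          # trace the run's hyperparameters
    mlflow.log_metric("train_r2", model.score(X, y))   # log a metric for comparison in the UI
    mlflow.set_tag("stage", "research")                # tag the run
    mlflow.sklearn.log_model(model, "model")           # log the fitted model artifact
```

Running mlflow ui afterwards shows all runs with their parameters, metrics, tags, and model artifacts side by side.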
-
DataIngestionTrainingPipeline() -> stage_01_data_ingestion -> configuration.py (class ConfigurationManager) -> constants folder (__init__.py) -> the __init__.py points to the 3 YAML files -> config.yaml contains the configuration required for data ingestion
-
write me later ...
-
config.yaml insertion -> define class DataTransformationConfig -> class ConfigurationManager (def get_data_transformation_config(self) -> DataTransformationConfig) -> class DataTransformation (def train_test_spliting(self)) -> copy and paste the workflow parts -> create the stage 3 pipeline, which checks whether the previous stage returned True or not before initiating stage_03.py
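A rough sketch of that stage, assuming the transformation component only needs the data path and an output directory, and that the previous (validation) stage writes its result to a status file; all file names and paths here are assumptions:

```python
# pipeline/stage_03_data_transformation.py (hypothetical)
from pathlib import Path

import pandas as pd
from sklearn.model_selection import train_test_split


class DataTransformation:
    """Splits the validated dataset into train/test CSV files."""

    def __init__(self, data_path: str, root_dir: str = "artifacts/data_transformation"):
        self.data_path = data_path
        self.root_dir = Path(root_dir)

    def train_test_spliting(self):
        data = pd.read_csv(self.data_path)

        # 75/25 split; settings like test_size would normally come from params.yaml
        train, test = train_test_split(data, test_size=0.25, random_state=42)

        self.root_dir.mkdir(parents=True, exist_ok=True)
        train.to_csv(self.root_dir / "train.csv", index=False)
        test.to_csv(self.root_dir / "test.csv", index=False)


def main():
    # Stage 3 only runs if the previous (validation) stage reported success.
    status = Path("artifacts/data_validation/status.txt").read_text()
    if "True" in status:
        DataTransformation(data_path="artifacts/data_ingestion/data.csv").train_test_spliting()
    else:
        raise Exception("Your data schema is not valid")
```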