
Recommendations Accelerator #5

Open · wants to merge 3 commits into `main`
83 changes: 83 additions & 0 deletions content/snowplow-recommendations/_index.md
**Review comment:** Add something here to note the dbt package is under SPAL licensing

@@ -0,0 +1,83 @@
+++
title = "Introduction"
menuTitle = "Introduction"
pre = "<i class='fas fa-rocket'></i> "
chapter = false
weight = 1
+++

### Recommendations Accelerator

#### Introduction

This accelerator will show you how to use the [AWS Personalize service](https://docs.aws.amazon.com/personalize/latest/dg/what-is-personalize.html) to create recommenders based on your Snowplow data, enabling you to provide personalized recommendations to your users. Two domain use cases are covered in this accelerator: E-commerce and Video On Demand (VOD). Although the instructions generally reference both use cases, you can choose to follow one or the other, or both.



In this accelerator, we will enable you to:

* Run dbt to transform your Snowplow data into a format that can be used by AWS Personalize
* Set up the infrastructure required to run AWS Personalize, including the necessary IAM roles and policies
* Create ecommerce and/or media (video on demand) recommenders in AWS Personalize
* Run a local Flask application to interact with AWS Personalize
***
**Review comment (lines +17 to +23):** Can you add an image of the app here, having a picture on the front page is always good


#### Who is this guide for?

- Data practitioners who are already familiar with dbt and Python, and comfortable running commands on the command line.
- Data practitioners who want to use AWS Personalize to create recommenders for either ecommerce or media (VOD) based on their Snowplow data.
- Data practitioners who would like a hands-on introduction to AWS Personalize and how it can be used to create recommenders, using sample data.
***

#### What will be covered

In an estimated minimum of 5.5 hours, you can achieve the following:

- **Upload -** Upload some sample data (optional) <!-- TODO add sample data -->
- **Create Supporting Infrastructure -** Create the supporting infrastructure needed to move your data from your warehouse to S3, including the necessary IAM roles and policies
- **Model -** Run dbt to transform your Snowplow data into a format that can be used by AWS Personalize
- **Create Recommenders -** Create recommenders in AWS Personalize
- **Interact with Recommenders -** Run a local Flask application to interact with AWS Personalize
- **Next Steps -** Learn about the next steps you can take to continue your journey with AWS Personalize
***

#### What will not be covered

- We will not be covering how to implement ecommerce or media tracking in this accelerator, nor how to run the ecommerce or media dbt packages. If you would like to learn more about these, the ecommerce [quickstart](https://docs.snowplow.io/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-quickstart/ecommerce/) or [accelerator](https://docs.snowplow.io/accelerators/ecommerce/), or the media [quickstart](https://docs.snowplow.io/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-quickstart/media-player/), could be a good place to start (media accelerator coming soon).

<!-- TODO fix gantt chart styling - Section names overlap chart -->
{{<mermaid>}}
gantt
dateFormat HH:mm
axisFormat %H:%M
section 1. Upload
30min :s1, 00:00, 00:30
section 2. Create Supporting Infrastructure
2h :s2, after s1, 02:30
section 3. Model
30min :s3, after s2, 03:00
section 4. Create Recommenders
1h :s4, after s3, 04:00
section 5. Interact with Recommenders
1h :s5, after s4, 05:00
section 6. Next steps
30min :s6, after s5, 05:30
{{</mermaid>}}
**Review comment:** unrelated to this PR, but Jack do you have any idea how to make this chart look better, so that the text doesn't go into the chart area? I tried when I was writing it but couldn't figure it out, and it looks kinda janky


{{% notice info %}}
The time guides are only a rough estimation. For example, if you use Terraform for Step 2, it will take significantly less time than if you follow the manual instructions.
{{% /notice %}}
***

#### Prerequisites

- If using Snowflake, you have ACCOUNTADMIN or the global CREATE STORAGE INTEGRATION privilege in your Snowflake account.
- If using Databricks, you need the Account Admin, Workspace Admin, and Metastore Admin roles (or the CREATE EXTERNAL LOCATION privilege, which the Metastore Admin has by default), as well as the CREATE MANAGED STORAGE permission, and Unity Catalog must be enabled.
- You have the necessary permissions in AWS to create/update IAM roles and policies, create an S3 bucket, and create AWS Personalize resources.

If you are not using the sample data:
- An implementation of the dbt-snowplow-ecommerce package. This will have created the `snowplow_ecommerce_product_interactions` table, which is used by this dbt package to create the ecommerce dataset for AWS Personalize. If you don't have this table, you can create it by following the instructions [here](https://docs.snowplow.io/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-quickstart/ecommerce/).

And/or

- An implementation of the dbt-snowplow-media-player package (>= 0.6.0). This will have created the `snowplow_media_player_base` table, which is used by this dbt package to create the video_on_demand dataset for AWS Personalize. If you don't have this table, you can create it by following the instructions [here](https://docs.snowplow.io/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-quickstart/media-player/).
**Review comment (lines +77 to +83):** Is there a better way to format/write this?

23 changes: 23 additions & 0 deletions content/snowplow-recommendations/create-recommenders/_index.md
@@ -0,0 +1,23 @@
+++
title = "Create Recommenders"
date = 2023-09-26T17:24:05+01:00
weight = 4
chapter = true
pre = "4. "
+++

# Create recommenders in AWS Personalize

{{<mermaid>}}
flowchart LR
id1(Upload)-->id2(Create Supporting Infrastructure)-->id3(Model)-->id4(Create Recommenders)-->id5(Interact with Recommenders)-->id6(Next steps)
style id1 fill:#f5f5f5,stroke:#333,stroke-width:1px
style id2 fill:#f5f5f5,stroke:#333,stroke-width:1px
style id3 fill:#f5f5f5,stroke:#333,stroke-width:1px
style id4 fill:#f5f5f5,stroke:#6638B8,stroke-width:3px
style id5 fill:#f5f5f5,stroke:#333,stroke-width:1px
style id6 fill:#f5f5f5,stroke:#333,stroke-width:1px
{{</mermaid>}}


In this chapter you will run a Python script that creates the recommenders in AWS Personalize.
41 changes: 41 additions & 0 deletions content/snowplow-recommendations/create-recommenders/create.md
@@ -0,0 +1,41 @@
+++
title = "Create recommenders"
weight = 1
post = ""
+++

#### **Step 1:** Update the config.yaml file
Set up the configuration file (`config.yaml`) that will be used when setting up the AWS Personalize service and jobs.

```yaml
dataset_group_base_name:
schema_base_name:
dataset_base_name:
import_job_base_name:
s3_bucket_name:
role_arn:
domains_and_datasets:
  VIDEO_ON_DEMAND:
    enable: true
    datasets: [interactions]
    recommenders: [most_popular]
  ECOMMERCE:
    enable: true
    datasets: [interactions, items, users]
    recommenders: [most_viewed, customers_who_viewed_x_also_viewed]
```

The dataset group, schema, dataset, and import job base names can be configured to your preference. The `s3_bucket_name` should be just the name of the S3 bucket (not the full S3 path) that you created either manually or via Terraform. Similarly, the `role_arn` is the name of the Personalize IAM role you created; it corresponds to the Terraform variable `personalize_role_name`, which defaults to `PersonalizeIAMRole`.

You may choose which of the domains (or both) Personalize should create by setting the `enable` flag in the config file to `true` or `false`. Additionally, you can choose which datasets are created for training by Personalize by adding them to the `datasets` list. Ensure that whichever datasets you add here are also exported by your dbt package, by adding the corresponding table names to the `tables_to_export` config value in your `dbt_project.yml`. Finally, you can select which recommenders you'd like Personalize to generate for your datasets by adding their config names to the `recommenders` list (refer to the table below for the list of available recommenders).
**Review comment (lines +28 to +30):** Is this meant to be formatted as code?
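
For illustration, a filled-in `config.yaml` might look like the following. All values here are hypothetical examples, apart from `PersonalizeIAMRole`, which is the Terraform default mentioned above:

```yaml
# Example values only: replace with your own resource names
dataset_group_base_name: snowplow_recs
schema_base_name: snowplow_recs_schema
dataset_base_name: snowplow_recs_dataset
import_job_base_name: snowplow_recs_import
s3_bucket_name: my-snowplow-personalize-bucket # bucket name only, not the full s3:// path
role_arn: PersonalizeIAMRole # IAM role name, per the note above
domains_and_datasets:
  ECOMMERCE:
    enable: true
    datasets: [interactions, items, users]
    recommenders: [most_viewed, customers_who_viewed_x_also_viewed]
  VIDEO_ON_DEMAND:
    enable: false
    datasets: [interactions]
    recommenders: [most_popular]
```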


#### **Step 2:** Create virtual environment
Before running the script to create the recommenders with AWS Personalize, use the `requirements.txt` file to set up the necessary packages in a virtual environment.

**Review comment (lines +31 to +34):** Specify python venv, maybe link to instructions to set up a venv?
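
For example, assuming Python 3 and that you are in the `/aws_personalize_utilities` directory, a typical venv setup would be:

```
python -m venv venv
source venv/bin/activate # on Windows: venv\Scripts\activate
pip install -r requirements.txt
```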

#### **Step 3:** Run script to create recommenders
Run the script in the virtual environment with `python create_personalize_service.py` (assuming you are in the /aws_personalize_utilities directory). This will create the dataset group(s), schema(s), dataset(s), import job(s) and recommenders in Personalize, as well as a file of the recommender ARNs that will be used by the Flask app. If you are creating both the ecommerce and video_on_demand recommenders at the same time, they will be created as two separate dataset groups.

The script is designed to be rerun: it will reuse resources that have already been created rather than re-creating them (e.g. if you have already created the dataset group, it will not create it again). If you want to re-create any resources, you will need to delete them in the AWS console or via the AWS CLI. This way, if a recommender fails to create, e.g. due to not enough interactions data, you can fix the error and rerun the script without having to delete all the resources and start again.
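
The script's internals aren't shown here, but as a rough sketch, the reuse-don't-recreate pattern looks something like this with boto3 (the function and dataset group names are illustrative):

```
import boto3

personalize = boto3.client("personalize")

def get_or_create_dataset_group(name: str, domain: str) -> str:
    """Return the ARN of an existing dataset group, creating it only if absent."""
    for dg in personalize.list_dataset_groups()["datasetGroups"]:
        if dg["name"] == name:
            return dg["datasetGroupArn"]  # already exists, so reuse it
    return personalize.create_dataset_group(name=name, domain=domain)["datasetGroupArn"]

arn = get_or_create_dataset_group("snowplow_recs_ecommerce", "ECOMMERCE")
```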

#### **Step 4:** Monitor status of recommenders
Once the script has run, navigate to the recommenders section of the AWS Personalize console and monitor the status of the recommenders being created. This can take upwards of 15 minutes to complete. Once the status of your recommenders is `Active`, you can move on to the next step. If any of the recommenders fail to be created, check their status for the reason for the failure; it may be due to not having enough interactions in the dataset(s). If this happens, you will need to either not use the failed recommenders, or increase the data in your dataset. If you do the latter, re-run dbt with more data, and ensure you delete the failed recommender(s) in the console before re-running the `create_personalize_service.py` script.
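
If you prefer to watch from the command line rather than the console, a simple polling loop against the Personalize API might look like this (the ARN below is a placeholder; use one from the recommender ARNs file written by the create script):

```
import time
import boto3

personalize = boto3.client("personalize")
recommender_arn = "arn:aws:personalize:eu-west-1:123456789012:recommender/most_viewed"  # placeholder

while True:
    status = personalize.describe_recommender(recommenderArn=recommender_arn)["recommender"]["status"]
    print(f"Recommender status: {status}")
    if status in ("ACTIVE", "CREATE FAILED"):
        break
    time.sleep(60)  # creation can take 15+ minutes
```
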
23 changes: 23 additions & 0 deletions content/snowplow-recommendations/interact/_index.md
@@ -0,0 +1,23 @@
+++
title = "Interact with Recommenders"
date = 2023-09-26T17:24:05+01:00
weight = 5
chapter = true
pre = "5. "
+++

# Interact with recommenders

{{<mermaid>}}
flowchart LR
id1(Upload)-->id2(Create Supporting Infrastructure)-->id3(Model)-->id4(Create Recommenders)-->id5(Interact with Recommenders)-->id6(Next steps)
style id1 fill:#f5f5f5,stroke:#333,stroke-width:1px
style id2 fill:#f5f5f5,stroke:#333,stroke-width:1px
style id3 fill:#f5f5f5,stroke:#333,stroke-width:1px
style id4 fill:#f5f5f5,stroke:#333,stroke-width:1px
style id5 fill:#f5f5f5,stroke:#6638B8,stroke-width:3px
style id6 fill:#f5f5f5,stroke:#333,stroke-width:1px
{{</mermaid>}}


In this chapter you will run a local Flask app that demonstrates how to interact with the recommenders created in AWS Personalize. You can send requests to the app via curl or Python, or alternatively use the provided HTML form.
@@ -0,0 +1,60 @@
+++
title = "Run Flask app"
weight = 1
post = ""
+++

Run the Flask app with `python flask_app.py` (assuming you are in the /aws_personalize_utilities directory). This will start the app on port 5000.

There are multiple ways to interact with the app. Depending on which recommender you wish to use, you will need to provide the user ID and/or item ID; refer to the table below for the full list of recommenders and their required inputs.

#### **Option 1:**
Make a GET request to the app with Python:
```
import requests

url = "http://localhost:5000/get_recommendations"
user_id = "0000"
recommender = "customers_who_viewed_x_also_viewed"
item_id = "168f54efd6529951853f254e72f4d47b"  # Optional, depending on the recommender you are using

data = {
    "user_id": user_id,
    "recommender": recommender,
    "item_id": item_id,
}

response = requests.get(url, params=data)
if response.status_code == 200:
    print(response.json())
else:
    print(response.text)
```

#### **Option 2:**
Make a GET request to the app with curl:
```
curl -X GET "http://localhost:5000/get_recommendations?user_id=0000&recommender=customers_who_viewed_x_also_viewed&item_id=168f54efd6529951853f254e72f4d47b"
```

#### **Option 3:**
Navigate to `http://localhost:5000/` in your browser to use the provided form to send a request.

| ![Request Form](../images/recommendation_flask_ui.png) |
|:--:|
| Request Form |

**Review comment (lines +43 to +45):** Can you make this not a full-width image? Seems a bit big

Which recommenders are available to you will depend on the domain(s) you chose and the dataset types you created. See the below table for more information about each recommender:

**Review comment (lines +47 to +50):** Maybe put this under a new header to make clearer it applies to all options?

| Domain | Recommender | Required dataset types | Optional dataset types | Required input parameters | Optional input parameters |
| ------ | ------ | ------ | ------ | ------ | ------ |
| ECOMMERCE | most_viewed | interactions | | user_id | |
| ECOMMERCE | best_sellers | interactions | | user_id | |
| ECOMMERCE | frequently_bought_together | interactions | | item_id | |
| ECOMMERCE | customers_who_viewed_x_also_viewed | interactions | | user_id, item_id | |
| ECOMMERCE | recommended_for_you | interactions | items, users | user_id | |
| VIDEO_ON_DEMAND | because_you_watched_x | interactions | | user_id, item_id | |
| VIDEO_ON_DEMAND | most_popular | interactions | | user_id | |
| VIDEO_ON_DEMAND | trending_now | interactions | | | |
| VIDEO_ON_DEMAND | top_picks_for_you | interactions | items, users | user_id | |
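
For instance, based on the table above, the `because_you_watched_x` recommender requires both a user ID and an item ID, so a request (reusing the placeholder IDs from the examples above) would look like:

```
curl -X GET "http://localhost:5000/get_recommendations?user_id=0000&recommender=because_you_watched_x&item_id=168f54efd6529951853f254e72f4d47b"
```
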
23 changes: 23 additions & 0 deletions content/snowplow-recommendations/model/_index.md
@@ -0,0 +1,23 @@
+++
title = "Model"
date = 2023-09-26T17:24:05+01:00
weight = 3
chapter = true
pre = "3. "
+++

# Model the data for AWS Personalize

{{<mermaid>}}
flowchart LR
id1(Upload)-->id2(Create Supporting Infrastructure)-->id3(Model)-->id4(Create Recommenders)-->id5(Interact with Recommenders)-->id6(Next steps)
style id1 fill:#f5f5f5,stroke:#333,stroke-width:1px
style id2 fill:#f5f5f5,stroke:#333,stroke-width:1px
style id3 fill:#f5f5f5,stroke:#6638B8,stroke-width:3px
style id4 fill:#f5f5f5,stroke:#333,stroke-width:1px
style id5 fill:#f5f5f5,stroke:#333,stroke-width:1px
style id6 fill:#f5f5f5,stroke:#333,stroke-width:1px
{{</mermaid>}}


In this chapter you will create the data needed for AWS Personalize using the dbt-snowplow-recommendations package. In addition to creating the data, the package also stages it in S3 using an on-run-end hook, meaning you don't have to load the data into S3 manually.
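
As a sketch of where the export configuration lives (the exact variable nesting and table names here are hypothetical; check the package docs), the `tables_to_export` value mentioned in the Create recommenders step is set in your `dbt_project.yml`:

```yml
# Hypothetical sketch: confirm the exact variable name and nesting in the package docs
vars:
  snowplow_recommendations:
    tables_to_export: ['snowplow_recommendations_ecommerce_interactions']
```
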
27 changes: 27 additions & 0 deletions content/snowplow-recommendations/model/install.md
@@ -0,0 +1,27 @@
+++
title = "Install Recommendations dbt package"
weight = 1
post = ""
+++

> Ensure you have set up a new dbt project using [`dbt init`](https://docs.getdbt.com/reference/commands/init) and validate your project's connection using [`dbt debug`](https://docs.getdbt.com/reference/commands/debug) before adding our package. All commands should be run in the directory of this project.
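
For example (the project name is arbitrary):

```
dbt init my_snowplow_project
cd my_snowplow_project
dbt debug
```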

In this section you are going to add our `snowplow_recommendations` package to your fresh project. This means your project will be able to run all our models, while keeping our package in the `dbt_packages` folder so your project stays clean and organized.

#### **Step 1:** Add the snowplow_recommendations package
Add the latest snowplow_recommendations package to your `packages.yml` file, which you may have to create at the same level as your `dbt_project.yml` file. The latest version of our package can be found [here](https://hub.getdbt.com/snowplow/snowplow_recommendations/latest/).

```yml
packages:
  - package: snowplow/snowplow_recommendations
    version: 0.0.1
```

#### **Step 2:** Install the package
Install the package by running:

```
dbt deps
```

Once this is done, you can find our package in the `dbt_packages` folder.
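
Once the package is installed, its models can be run with dbt's package selector, for example (shown as a sketch; the required configuration is covered in the following steps):

```
dbt run --select package:snowplow_recommendations
```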