
Recommendations Accelerator #5

Open · wants to merge 3 commits into `main`
83 changes: 83 additions & 0 deletions content/snowplow-recommendations/_index.md
**Review comment:** Add something here to note the dbt package is under SPAL licensing

@@ -0,0 +1,83 @@
+++
title = "Introduction"
menuTitle = "Introduction"
pre = "<i class='fas fa-rocket'></i> "
chapter = false
weight = 1
+++

### Recommendations Accelerator

#### Introduction

This accelerator will show you how to use the [AWS Personalize service](https://docs.aws.amazon.com/personalize/latest/dg/what-is-personalize.html) to create recommenders based on your Snowplow data, enabling you to provide personalized recommendations to your users. Two domain use cases are covered in this accelerator: E-commerce and Video On Demand (VOD). Although the instructions generally reference both use cases, you can choose to follow one or the other, or both.



In this accelerator, we will enable you to:

* Run dbt to transform your Snowplow data into a format that can be used by AWS Personalize
* Set up the infrastructure required to run AWS Personalize, including the necessary IAM roles and policies
* Create ecommerce and/or media (video on demand) recommenders in AWS Personalize
* Run a local Flask application to interact with AWS Personalize
***
**Review comment (lines +17 to +23):** Can you add an image of the app here, having a picture on the front page is always good


#### Who is this guide for?

- Data practitioners who are already familiar with dbt and Python, and comfortable running commands on the command line.
- Data practitioners who want to use AWS Personalize to create recommenders for either ecommerce or media (VOD) based on their Snowplow data.
- Data practitioners who would like a hands-on introduction to AWS Personalize and how it can be used to create recommenders, using sample data.
***

#### What will be covered

In an estimated minimum of 5.5 hours, you can achieve the following:

- **Upload -** Upload some sample data (optional) <!-- TODO add sample data -->
- **Create Supporting Infrastructure -** Create the supporting infrastructure needed to move your data from your warehouse to S3, including the necessary IAM roles and policies
- **Model -** Run dbt to transform your Snowplow data into a format that can be used by AWS Personalize
- **Create Recommenders -** Create recommenders in AWS Personalize
- **Interact with Recommenders -** Run a local Flask application to interact with AWS Personalize
- **Next Steps -** Learn about the next steps you can take to continue your journey with AWS Personalize
***

#### What will not be covered

- We will not be covering how to implement ecommerce or media tracking in this accelerator, nor how to run the ecommerce or media dbt packages. If you would like to learn more about these, the ecommerce [quickstart](https://docs.snowplow.io/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-quickstart/ecommerce/) or [accelerator](https://docs.snowplow.io/accelerators/ecommerce/), or the media [quickstart](https://docs.snowplow.io/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-quickstart/media-player/), could be a good place to start (media accelerator coming soon).

<!-- TODO fix gantt chart styling - Section names overlap chart -->
{{<mermaid>}}
gantt
dateFormat HH:mm
axisFormat %H:%M
section 1. Upload
30min :s1, 00:00, 00:30
section 2. Create Supporting Infrastructure
2h :s2, after s1, 02:30
section 3. Model
30min :s3, after s2, 03:00
section 4. Create Recommenders
1h :s4, after s3, 04:00
section 5. Interact with Recommenders
1h :s5, after s4, 05:00
section 6. Next steps
30min :s6, after s5, 05:30
{{</mermaid>}}
**Review comment:** unrelated to this PR, but Jack do you have any idea how to make this chart look better, so that the text doesn't go into the chart area? I tried when I was writing it but couldn't figure it out, and it looks kinda janky


{{% notice info %}}
The time guides are only a rough estimation. For example, if you use Terraform for Step 2, it will take significantly less time than if you follow the manual instructions.
{{% /notice %}}
***

#### Prerequisites

- If using Snowflake, you have ACCOUNTADMIN or the global CREATE STORAGE INTEGRATION privilege in your Snowflake account.
- If using Databricks, you need the Account Admin, Workspace Admin, and Metastore Admin roles (or the CREATE EXTERNAL LOCATION privilege, which the Metastore Admin has by default), as well as the CREATE MANAGED STORAGE permission, and Unity Catalog must be enabled.
- You have the necessary permissions in AWS to create/update IAM roles and policies, create an S3 bucket, and create AWS Personalize resources.

If you are not using the sample data:
- An implementation of the dbt-snowplow-ecommerce package. This will have created the `snowplow_ecommerce_product_interactions` table, which is used by this dbt package to create the ecommerce dataset for AWS Personalize. If you don't have this table, you can create it by following the instructions [here](https://docs.snowplow.io/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-quickstart/ecommerce/).

And/or

- An implementation of the dbt-snowplow-media-player package (>= 0.6.0). This will have created the `snowplow_media_player_base` table, which is used by this dbt package to create the video_on_demand dataset for AWS Personalize. If you don't have this table, you can create it by following the instructions [here](https://docs.snowplow.io/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-quickstart/media-player/).
**Review comment (lines +77 to +83):** Is there a better way to format/write this?

23 changes: 23 additions & 0 deletions content/snowplow-recommendations/create-recommenders/_index.md
@@ -0,0 +1,23 @@
+++
title = "Create Recommenders"
date = 2023-09-26T17:24:05+01:00
weight = 4
chapter = true
pre = "4. "
+++

# Create recommenders in AWS Personalize

{{<mermaid>}}
flowchart LR
id1(Upload)-->id2(Create Supporting Infrastructure)-->id3(Model)-->id4(Create Recommenders)-->id5(Interact with Recommenders)-->id6(Next steps)
style id1 fill:#f5f5f5,stroke:#333,stroke-width:1px
style id2 fill:#f5f5f5,stroke:#333,stroke-width:1px
style id3 fill:#f5f5f5,stroke:#333,stroke-width:1px
style id4 fill:#f5f5f5,stroke:#6638B8,stroke-width:3px
style id5 fill:#f5f5f5,stroke:#333,stroke-width:1px
style id6 fill:#f5f5f5,stroke:#333,stroke-width:1px
{{</mermaid>}}


In this chapter you will run a Python script that creates the recommenders in AWS Personalize.
41 changes: 41 additions & 0 deletions content/snowplow-recommendations/create-recommenders/create.md
@@ -0,0 +1,41 @@
+++
title = "Create recommenders"
weight = 1
post = ""
+++

#### **Step 1:** Update the config.yaml file
Set up the configuration file (`config.yaml`) that will be used when setting up the AWS Personalize service and jobs.

```yaml
dataset_group_base_name:
schema_base_name:
dataset_base_name:
import_job_base_name:
s3_bucket_name:
role_arn:
domains_and_datasets:
  VIDEO_ON_DEMAND:
    enable: true
    datasets: [interactions]
    recommenders: [most_popular]
  ECOMMERCE:
    enable: true
    datasets: [interactions, items, users]
    recommenders: [most_viewed, customers_who_viewed_x_also_viewed]
```

The dataset group, schema, dataset, and import job base names can be configured to your preference. The `s3_bucket_name` should be just the name of the S3 bucket (not the full S3 path) that you created either manually or via Terraform. Similarly, the `role_arn` is the name of the Personalize IAM role you created; it corresponds to the Terraform variable `personalize_role_name`, which defaults to `PersonalizeIAMRole`.

You may choose which of the domains (or both) Personalize should create by setting the `enable` flag in the config file to `true` or `false`. Additionally, you can choose which datasets are created for training by Personalize by adding them to the `datasets` list. Ensure that whichever datasets you add here are also exported by your dbt package, by adding the corresponding table names to the `tables_to_export` config value in your `dbt_project.yml`. Finally, you can select which recommenders you'd like Personalize to generate for your datasets by adding their config names to the `recommenders` list (refer to the table below for the list of available recommenders).
**Review comment (lines +28 to +30):** Is this meant to be formatted as code?
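
For illustration, a filled-in `config.yaml` might look like the following. All values here are hypothetical examples, apart from `PersonalizeIAMRole`, which is the Terraform default mentioned above:

```yaml
# Example values only: replace with your own resource names
dataset_group_base_name: snowplow_recs
schema_base_name: snowplow_recs_schema
dataset_base_name: snowplow_recs_dataset
import_job_base_name: snowplow_recs_import
s3_bucket_name: my-snowplow-personalize-bucket # bucket name only, not the full s3:// path
role_arn: PersonalizeIAMRole # IAM role name, per the note above
domains_and_datasets:
  ECOMMERCE:
    enable: true
    datasets: [interactions, items, users]
    recommenders: [most_viewed, customers_who_viewed_x_also_viewed]
  VIDEO_ON_DEMAND:
    enable: false
    datasets: [interactions]
    recommenders: [most_popular]
```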


#### **Step 2:** Create virtual environment
Before running the script to create the recommenders with AWS Personalize, use the `requirements.txt` file to set up the necessary packages in a virtual environment.

**Review comment (lines +31 to +34):** Specify python venv, maybe link to instructions to set up a venv?
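
For example, assuming Python 3 and that you are in the `/aws_personalize_utilities` directory, a typical venv setup would be:

```
python -m venv venv
source venv/bin/activate # on Windows: venv\Scripts\activate
pip install -r requirements.txt
```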

#### **Step 3:** Run script to create recommenders
Run the script in the virtual environment with `python create_personalize_service.py` (assuming you are in the /aws_personalize_utilities directory). This will create the dataset group(s), schema(s), dataset(s), import job(s) and recommenders in Personalize, as well as a file of the recommender ARNs that will be used by the Flask app. If you are creating both the ecommerce and video_on_demand recommenders at the same time, they will be created as two separate dataset groups.

The script is designed to be rerun: it will reuse resources that have already been created rather than re-creating them (e.g. if you have already created the dataset group, it will not create it again). If you want to re-create any resources, you will need to delete them in the AWS console or via the AWS CLI. This way, if a recommender fails to create, e.g. due to not enough interactions data, you can fix the error and rerun the script without having to delete all the resources and start again.
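
The script's internals aren't shown here, but as a rough sketch, the reuse-don't-recreate pattern looks something like this with boto3 (the function and dataset group names are illustrative):

```
import boto3

personalize = boto3.client("personalize")

def get_or_create_dataset_group(name: str, domain: str) -> str:
    """Return the ARN of an existing dataset group, creating it only if absent."""
    for dg in personalize.list_dataset_groups()["datasetGroups"]:
        if dg["name"] == name:
            return dg["datasetGroupArn"]  # already exists, so reuse it
    return personalize.create_dataset_group(name=name, domain=domain)["datasetGroupArn"]

arn = get_or_create_dataset_group("snowplow_recs_ecommerce", "ECOMMERCE")
```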

#### **Step 4:** Monitor status of recommenders
Once the script has run, navigate to the recommenders section of the AWS Personalize console and monitor the status of the recommenders being created. This can take upwards of 15 minutes to complete. Once the status of your recommenders is `Active`, you can move on to the next step. If any of the recommenders fail to be created, check their status for the reason for the failure; it may be due to not having enough interactions in the dataset(s). If this happens, you will need to either not use the failed recommenders, or increase the data in your dataset. If you do the latter, re-run dbt with more data, and ensure you delete the failed recommender(s) in the console before re-running the `create_personalize_service.py` script.
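
If you prefer to watch from the command line rather than the console, a simple polling loop against the Personalize API might look like this (the ARN below is a placeholder; use one from the recommender ARNs file written by the create script):

```
import time
import boto3

personalize = boto3.client("personalize")
recommender_arn = "arn:aws:personalize:eu-west-1:123456789012:recommender/most_viewed"  # placeholder

while True:
    status = personalize.describe_recommender(recommenderArn=recommender_arn)["recommender"]["status"]
    print(f"Recommender status: {status}")
    if status in ("ACTIVE", "CREATE FAILED"):
        break
    time.sleep(60)  # creation can take 15+ minutes
```
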
23 changes: 23 additions & 0 deletions content/snowplow-recommendations/interact/_index.md
@@ -0,0 +1,23 @@
+++
title = "Interact with Recommenders"
date = 2023-09-26T17:24:05+01:00
weight = 5
chapter = true
pre = "5. "
+++

# Interact with recommenders

{{<mermaid>}}
flowchart LR
id1(Upload)-->id2(Create Supporting Infrastructure)-->id3(Model)-->id4(Create Recommenders)-->id5(Interact with Recommenders)-->id6(Next steps)
style id1 fill:#f5f5f5,stroke:#333,stroke-width:1px
style id2 fill:#f5f5f5,stroke:#333,stroke-width:1px
style id3 fill:#f5f5f5,stroke:#333,stroke-width:1px
style id4 fill:#f5f5f5,stroke:#333,stroke-width:1px
style id5 fill:#f5f5f5,stroke:#6638B8,stroke-width:3px
style id6 fill:#f5f5f5,stroke:#333,stroke-width:1px
{{</mermaid>}}


In this chapter you will run a local Flask app that demonstrates how to interact with the recommenders created in AWS Personalize. You can send requests to the app via curl or Python, or alternatively use the provided HTML form.
@@ -0,0 +1,60 @@
+++
title = "Run Flask app"
weight = 1
post = ""
+++

Run the Flask app with `python flask_app.py` (assuming you are in the /aws_personalize_utilities directory). This will start the app on port 5000.

There are multiple ways to interact with the app. Depending on which recommender you wish to use, you will need to provide the user ID and/or item ID; refer to the table below for the full list of recommenders and their required inputs.

#### **Option 1:**
Make a GET request to the app with Python:
```
import requests

url = "http://localhost:5000/get_recommendations"
user_id = "0000"
recommender = "customers_who_viewed_x_also_viewed"
item_id = "168f54efd6529951853f254e72f4d47b"  # Optional, depending on the recommender you are using

data = {
    "user_id": user_id,
    "recommender": recommender,
    "item_id": item_id,
}

response = requests.get(url, params=data)
if response.status_code == 200:
    print(response.json())
else:
    print(response.text)
```

#### **Option 2:**
Make a GET request to the app with curl:
```
curl -X GET "http://localhost:5000/get_recommendations?user_id=0000&recommender=customers_who_viewed_x_also_viewed&item_id=168f54efd6529951853f254e72f4d47b"
```

#### **Option 3:**
Navigate to `http://localhost:5000/` in your browser to use the provided form to send a request.

| ![Request Form](../images/recommendation_flask_ui.png) |
|:--:|
| Request Form |

**Review comment (lines +43 to +45):** Can you make this not a full-width image? Seems a bit big

Which recommenders are available to you will depend on the domain(s) you chose and the dataset types you created. See the below table for more information about each recommender:

**Review comment (lines +47 to +50):** Maybe put this under a new header to make clearer it applies to all options?

| Domain | Recommender | Required dataset types | Optional dataset types | Required input parameters | Optional input parameters |
| ------ | ------ | ------ | ------ | ------ | ------ |
| ECOMMERCE | most_viewed | interactions | | user_id | |
| ECOMMERCE | best_sellers | interactions | | user_id | |
| ECOMMERCE | frequently_bought_together | interactions | | item_id | |
| ECOMMERCE | customers_who_viewed_x_also_viewed | interactions | | user_id, item_id | |
| ECOMMERCE | recommended_for_you | interactions | items, users | user_id | |
| VIDEO_ON_DEMAND | because_you_watched_x | interactions | | user_id, item_id | |
| VIDEO_ON_DEMAND | most_popular | interactions | | user_id | |
| VIDEO_ON_DEMAND | trending_now | interactions | | | |
| VIDEO_ON_DEMAND | top_picks_for_you | interactions | items, users | user_id | |
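
For instance, based on the table above, the `because_you_watched_x` recommender requires both a user ID and an item ID, so a request (reusing the placeholder IDs from the examples above) would look like:

```
curl -X GET "http://localhost:5000/get_recommendations?user_id=0000&recommender=because_you_watched_x&item_id=168f54efd6529951853f254e72f4d47b"
```
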
23 changes: 23 additions & 0 deletions content/snowplow-recommendations/model/_index.md
@@ -0,0 +1,23 @@
+++
title = "Model"
date = 2023-09-26T17:24:05+01:00
weight = 3
chapter = true
pre = "3. "
+++

# Model the data for AWS Personalize

{{<mermaid>}}
flowchart LR
id1(Upload)-->id2(Create Supporting Infrastructure)-->id3(Model)-->id4(Create Recommenders)-->id5(Interact with Recommenders)-->id6(Next steps)
style id1 fill:#f5f5f5,stroke:#333,stroke-width:1px
style id2 fill:#f5f5f5,stroke:#333,stroke-width:1px
style id3 fill:#f5f5f5,stroke:#6638B8,stroke-width:3px
style id4 fill:#f5f5f5,stroke:#333,stroke-width:1px
style id5 fill:#f5f5f5,stroke:#333,stroke-width:1px
style id6 fill:#f5f5f5,stroke:#333,stroke-width:1px
{{</mermaid>}}


In this chapter you will create the data needed for AWS Personalize using the dbt-snowplow-recommendations package. In addition to creating the data, the package also stages it in S3 using an on-run-end hook, meaning you don't have to load the data into S3 manually.
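
As a sketch of where the export configuration lives (the exact variable nesting and table names here are hypothetical; check the package docs), the `tables_to_export` value mentioned in the Create recommenders step is set in your `dbt_project.yml`:

```yml
# Hypothetical sketch: confirm the exact variable name and nesting in the package docs
vars:
  snowplow_recommendations:
    tables_to_export: ['snowplow_recommendations_ecommerce_interactions']
```
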
27 changes: 27 additions & 0 deletions content/snowplow-recommendations/model/install.md
@@ -0,0 +1,27 @@
+++
title = "Install Recommendations dbt package"
weight = 1
post = ""
+++

> Ensure you have set up a new dbt project using [`dbt init`](https://docs.getdbt.com/reference/commands/init) and validate your project's connection using [`dbt debug`](https://docs.getdbt.com/reference/commands/debug) before adding our package. All commands should be run in the directory of this project.
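
For example (the project name is arbitrary):

```
dbt init my_snowplow_project
cd my_snowplow_project
dbt debug
```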

In this section you are going to add our `snowplow_recommendations` package to your fresh project. This means your project will be able to run all our models, while keeping our package in the `dbt_packages` folder so your project stays clean and organized.

#### **Step 1:** Add the snowplow_recommendations package
Add the latest snowplow_recommendations package to your `packages.yml` file, which you may have to create at the same level as your `dbt_project.yml` file. The latest version of our package can be found [here](https://hub.getdbt.com/snowplow/snowplow_recommendations/latest/).

```yml
packages:
  - package: snowplow/snowplow_recommendations
    version: 0.0.1
```

#### **Step 2:** Install the package
Install the package by running:

```
dbt deps
```

Once this is done, you can find our package in the `dbt_packages` folder.
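
Once the package is installed, its models can be run with dbt's package selector, for example (shown as a sketch; the required configuration is covered in the following steps):

```
dbt run --select package:snowplow_recommendations
```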