7. Create Observation Tables
Create Observation Tables¶
An Observation Set is a collection of historical points in time (timestamps) paired with entity key values. Feature values are computed for each of those points in time. Think of it as the backbone of a training dataset.
An Observation Table is its representation in the feature store.
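As a rough illustration of that structure (plain Python, not the FeatureByte API), each row of an observation set pairs a point in time with entity key values. The `Observation` class, its field names, and the sample values below are hypothetical and only meant to show the shape of the data.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    """One row of an observation set (hypothetical helper, not part of featurebyte)."""

    point_in_time: datetime  # historical moment for which feature values are wanted
    customer_id: str  # entity key value for the customer entity
    product_group: str  # entity key value for the product group entity


# Illustrative values only: the same customer may appear at several points in time.
observation_set = [
    Observation(datetime(2023, 3, 20, 13, 34, 55), "customer-a", "Céréales"),
    Observation(datetime(2023, 9, 8, 15, 0, 7), "customer-a", "Pains"),
    Observation(datetime(2023, 5, 16, 9, 0, 11), "customer-b", "Laits"),
]

# Distinct entity key combinations covered by the set.
entity_keys = {(o.customer_id, o.product_group) for o in observation_set}
```

The point in time matters as much as the keys: the same customer and product group observed at two different moments count as two separate observations.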
To establish an observation table for our specific Use Case, we follow two steps:
- Create an Observation Table for a Defined Context: First, we upload a table for the context, named "In_Store_Customer_x_ProductGroup_2023_1K". It holds a representative sample of customers observed during store visits, at moments shortly before they make a purchase. Each observation is paired with a few randomly chosen Product Groups to cover a broader range of customer interactions and preferences.
- Compute Target Values to Form a New Table: Once the initial observation table is in place, we compute the target values for its observations and store the result as a new table named "In_Store_Customer_x_ProductGroup_Spending_next_2_weeks_2023_1K". The target is each customer's spending on a given product group over the following two weeks, as outlined in our Use Case.
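To make the two-week spending target concrete, here is a minimal sketch in plain Python (not the FeatureByte API). The purchase records, the helper name `spending_next_2_weeks`, and the window boundaries (start excluded, end included) are assumptions chosen for illustration; the actual target definition is the one created in the previous step of this tutorial.

```python
from datetime import datetime, timedelta

# Hypothetical purchase records: (customer, product group, timestamp, total cost).
purchases = [
    ("customer-a", "Pains", datetime(2023, 1, 4, 10, 0), 2.50),
    ("customer-a", "Pains", datetime(2023, 1, 10, 18, 30), 5.00),
    ("customer-a", "Pains", datetime(2023, 1, 20, 9, 0), 4.00),  # after the window
    ("customer-a", "Laits", datetime(2023, 1, 5, 12, 0), 1.20),  # other product group
]


def spending_next_2_weeks(customer, product_group, point_in_time, records):
    """Sum total cost over (point_in_time, point_in_time + 14 days]."""
    window_end = point_in_time + timedelta(days=14)
    return sum(
        cost
        for cust, group, ts, cost in records
        if cust == customer
        and group == product_group
        and point_in_time < ts <= window_end
    )


# Target value for one observation row.
target_value = spending_next_2_weeks(
    "customer-a", "Pains", datetime(2023, 1, 3, 15, 4, 52), purchases
)
```

Only purchases made after the point in time are counted, which is what keeps the target free of information that was already known at observation time.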
For preview purposes, we will also create a small observation table at the item level, which can be used to preview features.
import featurebyte as fb
import pandas as pd
# Set your profile to the tutorial environment
fb.use_profile("tutorial")
catalog_name = "Grocery Dataset Tutorial"
catalog = fb.Catalog.activate(catalog_name)
16:06:21 | WARNING | Service endpoint is inaccessible: http://featurebyte-server:8088
16:06:21 | INFO | Using profile: tutorial
16:06:21 | INFO | Using configuration file at: /Users/gxav/.featurebyte/config.yaml
16:06:21 | INFO | Active profile: tutorial (https://tutorials.featurebyte.com/api/v1)
16:06:21 | WARNING | Remote SDK version (1.1.0.dev7) is different from local (1.1.0.dev1). Update local SDK to avoid unexpected behavior.
16:06:21 | INFO | No catalog activated.
16:06:21 | INFO | Catalog activated: Grocery Dataset Tutorial
To create an ObservationTable object, there are several methods available: 1) Upload a CSV or Parquet file, 2) Use a SourceTable object, or 3) Employ a View object.
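For the file-based option, it can help to check a file's header before uploading it. The sketch below uses only the Python standard library and assumes the file is expected to carry the same columns as the observation table previewed in this tutorial; `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names, not part of the FeatureByte SDK.

```python
import csv
import io

# Assumption: column names follow the observation table previewed in this tutorial.
REQUIRED_COLUMNS = ["POINT_IN_TIME", "GROCERYCUSTOMERGUID", "PRODUCTGROUP"]


def missing_columns(csv_text, required=REQUIRED_COLUMNS):
    """Return the required columns absent from the CSV header (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return [col for col in required if col not in header]


# Build a small in-memory CSV with one illustrative row.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=REQUIRED_COLUMNS)
writer.writeheader()
writer.writerow(
    {
        "POINT_IN_TIME": "2023-05-16 09:00:11",
        "GROCERYCUSTOMERGUID": "customer-b",
        "PRODUCTGROUP": "Laits",
    }
)

complete = missing_columns(buffer.getvalue())
incomplete = missing_columns("GROCERYCUSTOMERGUID,PRODUCTGROUP\ncustomer-b,Laits\n")
```

Here `complete` is an empty list, while `incomplete` reports the missing point-in-time column.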
Creating an observation table by uploading a file¶
Now, let's create the observation table by uploading "In-Store_Customer_x_ProductGroup_2023_1K_sample.parquet", which is included in the same zip folder as this notebook.
table_name = "In_Store_Customer_x_ProductGroup_2023_1K"
context_observations = fb.ObservationTable.upload(
    file_path="In-Store_Customer_x_ProductGroup_2023_1K_sample.parquet",
    name=table_name,
    purpose=fb.Purpose.TRAINING,
    primary_entities=["customer", "productgroup"],
)
Done! |████████████████████████████████████████| 100% in 15.2s (0.07%/s)
context_observations.preview()
| | GROCERYCUSTOMERGUID | POINT_IN_TIME | PRODUCTGROUP |
---|---|---|---|
0 | 699efd7f-aba2-4515-9335-2c8040a94f9f | 2023-12-11 08:51:22 | Fromages |
1 | 125dfe7d-eac0-4eab-94d8-1cd008e1641c | 2023-05-16 09:00:11 | Laits |
2 | 326b6ccb-0891-49fe-acbf-31d06c6d9e67 | 2023-03-20 13:34:55 | Céréales |
3 | e42fa5f3-7737-4c6a-9ef4-856f113e60bd | 2023-12-18 19:04:45 | Fromages |
4 | dde029d7-ceca-4e44-aad0-38e22ba11b74 | 2023-09-08 15:00:07 | Pains |
5 | 3396195c-5379-4b2a-809f-247546f3440f | 2023-09-29 13:04:33 | Pains |
6 | fda229da-0c9e-4555-ab2e-67f9082bd9c2 | 2023-01-21 10:10:19 | Céréales |
7 | 4eb4ee84-ee13-4eec-9c26-61b6eb4ba35b | 2023-02-27 08:57:16 | Pains |
8 | 5ad54123-027b-4c3c-b0ed-27e2ef9adc48 | 2023-11-04 16:51:42 | Fromages |
9 | 0401635c-e6ab-4525-bb5d-00aba7f6d0c4 | 2023-11-03 15:24:08 | Pains |
We will now associate the table with its relevant context.
context = catalog.get_context(
    "In-Store Customer Engagement with ProductGroup"
)
context.add_observation_table(table_name)
Adding a target to observation tables¶
Let's compute the target we created in the previous step for these observations. The result is a new observation table for our use case.
target = catalog.get_target("CUSTOMER_x_PRODUCTGROUP_Sum_of_TotalCost_next_2_weeks")
new_table_name = "In_Store_Customer_x_ProductGroup_Spending_next_2_weeks_2023_1K"
usecase_observations = target.compute_target_table(
    context_observations,
    observation_table_name=new_table_name,
)
Done! |████████████████████████████████████████| 100% in 21.3s (0.05%/s)
usecase_observations.preview()
| | GROCERYCUSTOMERGUID | POINT_IN_TIME | PRODUCTGROUP | CUSTOMER_x_PRODUCTGROUP_Sum_of_TotalCost_next_2_weeks |
---|---|---|---|---|
0 | df3dc0a5-5f13-4818-acdb-027083662eba | 2023-01-01 10:41:26 | Céréales | 3.19 |
1 | dde029d7-ceca-4e44-aad0-38e22ba11b74 | 2023-01-03 15:04:52 | Pains | 7.50 |
2 | 153b4e21-a8e3-470d-89b3-f039b7794e3d | 2023-01-03 17:57:01 | Fromages | 7.34 |
3 | 8c4818d5-9c52-4ba7-80cc-80fd800e3b20 | 2023-01-06 12:41:00 | Laits | 9.45 |
4 | 496a6c87-95a0-4ef0-9e70-1e3d1ed4b7cc | 2023-01-07 17:56:01 | Laits | 11.36 |
5 | 2dc8b0da-9f01-417d-907c-2f7eb38b403b | 2023-01-07 19:29:44 | Laits | 1.00 |
6 | a0651429-2861-40ce-a31c-623bdb11550d | 2023-01-03 18:23:07 | Fromages | 9.35 |
7 | 4eb4ee84-ee13-4eec-9c26-61b6eb4ba35b | 2023-01-09 09:40:43 | Pains | 8.58 |
8 | a0f05477-24a1-4a71-ae7c-9684f3bc2918 | 2023-01-09 11:46:00 | Céréales | 12.29 |
9 | 7065a837-e51c-40e5-b629-0f3f87a8c1c3 | 2023-01-10 10:00:21 | Céréales | 10.58 |
The new table inherits its purpose from the context observation table it was computed from.
usecase_observations.purpose
'training'
We will now associate the table with its relevant use case.
usecase = catalog.get_use_case(
    "In-Store Prediction of Customer Spending on a given Product Group next 2 Weeks"
)
usecase.add_observation_table(new_table_name)
Creating an observation table from a view¶
Next, we create an observation table from a view by sampling 10 items for preview purposes. This table is designed to materialize features associated with any entities that have a parent relationship with an item.
# Get view from INVOICEITEMS item table.
invoiceitems_view = catalog.get_view("INVOICEITEMS")
# Keep one year of items: 2022-07-01 (inclusive) to 2023-07-01 (exclusive)
cond = (invoiceitems_view["Timestamp"] >= pd.to_datetime("2022-07-01")) & (
    invoiceitems_view["Timestamp"] < pd.to_datetime("2023-07-01")
)
invoiceitems_view_1y_view = invoiceitems_view[cond].copy()
# Create an observation table by sampling 10 rows
preview_table = invoiceitems_view_1y_view.create_observation_table(
    name="Preview Table with 10 items",
    sample_rows=10,
    columns=["Timestamp", "GroceryInvoiceItemGuid"],
    columns_rename_mapping={
        "Timestamp": "POINT_IN_TIME",
        "GroceryInvoiceItemGuid": "GROCERYINVOICEITEMGUID",
    },
)
preview_table.update_description(
    "10 items between 01-Jul-2022 and 30-Jun-2023"
)
preview_table.update_purpose(fb.Purpose.PREVIEW)
Done! |████████████████████████████████████████| 100% in 12.2s (0.08%/s)
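The view-based creation above combines three operations: filtering rows to a time window, sampling a fixed number of rows, and renaming the selected columns. The following plain-Python sketch mirrors that logic on hypothetical in-memory rows. It does not use the FeatureByte API, and the row contents, the fixed random seed, and the variable names are assumptions for illustration.

```python
import random
from datetime import datetime

# Hypothetical item rows standing in for the INVOICEITEMS view (one per month).
rows = [
    {
        "Timestamp": datetime(2022 + i // 12, i % 12 + 1, 15),
        "GroceryInvoiceItemGuid": f"item-{i}",
        "Quantity": 1,
    }
    for i in range(24)
]

# 1) Filter to the one-year window: start inclusive, end exclusive.
start, end = datetime(2022, 7, 1), datetime(2023, 7, 1)
in_window = [r for r in rows if start <= r["Timestamp"] < end]

# 2) Sample 10 rows without replacement (seeded so the sketch is repeatable).
rng = random.Random(0)
sampled = rng.sample(in_window, k=10)

# 3) Keep only the selected columns and rename them.
rename = {
    "Timestamp": "POINT_IN_TIME",
    "GroceryInvoiceItemGuid": "GROCERYINVOICEITEMGUID",
}
observation_rows = [{new: r[old] for old, new in rename.items()} for r in sampled]
```

Each resulting row carries only a point in time and an item key, which is the minimum an item-level observation table needs.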
List observation tables in catalog¶
catalog.list_observation_tables()
| | id | name | type | shape | feature_store_name | created_at |
---|---|---|---|---|---|---|
0 | 66695726850bd33441fdc242 | Preview Table with 10 items | view | [10, 2] | playground | 2024-06-12T08:07:09.337000 |
1 | 6669570fddd5be620a410f7f | In_Store_Customer_x_ProductGroup_Spending_next... | observation_table | [1000, 4] | playground | 2024-06-12T08:06:54.934000 |
2 | 666956fdddd5be620a410f7c | In_Store_Customer_x_ProductGroup_2023_1K | uploaded_file | [1000, 3] | playground | 2024-06-12T08:06:33.543000 |