7. Create Observation Tables

An Observation Set is a collection that combines specific moments in history (timestamps) with related entity key values, used to determine feature values for those moments. Think of it as the backbone of a training dataset.

An Observation Table is its representation in the feature store.
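To make the idea concrete, here is a minimal sketch of an observation set as plain Python rows. The `Observation` class and the `cust-…` identifiers are hypothetical and only illustrate the shape of the data (entity keys plus a point in time); they are not part of the FeatureByte SDK.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    """One row of an observation set: entity key values plus a point in time."""

    customer_id: str  # hypothetical entity key
    product_group: str  # hypothetical entity key
    point_in_time: datetime


observation_set = [
    Observation("cust-001", "Fromages", datetime(2023, 12, 11, 8, 51, 22)),
    Observation("cust-002", "Laits", datetime(2023, 5, 16, 9, 0, 11)),
    Observation("cust-001", "Pains", datetime(2023, 9, 8, 15, 0, 7)),
]

# The same entity may appear at several points in time; each row is a
# separate moment for which feature values would be determined.
rows_for_cust_001 = [o for o in observation_set if o.customer_id == "cust-001"]
print(len(rows_for_cust_001))
```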

To establish an observation table for our specific Use Case, we'll undertake the following process:

1. Create an Observation Table for a Defined Context: The first step is to upload a table, named "In_Store_Customer_x_ProductGroup_2023_1K", designed for our context. This table is a representative sample of customers captured during their store visits, at moments shortly before they make a purchase. A random selection of a few Product Groups is added to the data to give a broader picture of customer interactions and preferences.
2. Compute Target Values and Create a New Table for Future Spending Predictions: Once the initial observation table is in place, we compute the target values and store them in a new table titled "In_Store_Customer_x_ProductGroup_Spending_next_2_weeks_2023_1K". For each observation, this table records the customer's spending on the product group over the subsequent two weeks, which is the quantity our Use Case aims to predict.
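The target computation described in the second step can be pictured with a small, self-contained sketch. This is not how FeatureByte computes targets internally. The purchase records, the `spending_next_2_weeks` helper, and the choice of a window that excludes the point in time and includes the end boundary are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical purchase records: (customer, product group, timestamp, total cost).
purchases = [
    ("cust-001", "Pains", datetime(2023, 9, 9, 10, 0), 2.00),
    ("cust-001", "Pains", datetime(2023, 9, 20, 18, 30), 1.49),
    ("cust-001", "Pains", datetime(2023, 9, 25, 9, 0), 5.00),  # after the window
    ("cust-001", "Laits", datetime(2023, 9, 10, 12, 0), 0.99),  # other product group
    ("cust-001", "Pains", datetime(2023, 9, 1, 8, 0), 3.00),  # before the observation
]


def spending_next_2_weeks(customer, product_group, point_in_time, records):
    """Sum total cost over the assumed window (point_in_time, point_in_time + 14 days]."""
    window_end = point_in_time + timedelta(days=14)
    return sum(
        cost
        for cust, group, ts, cost in records
        if cust == customer
        and group == product_group
        and point_in_time < ts <= window_end
    )


observed_at = datetime(2023, 9, 8, 15, 0, 7)
target_value = spending_next_2_weeks("cust-001", "Pains", observed_at, purchases)
print(round(target_value, 2))
```

Only purchases made after the observation's point in time contribute, so the target never leaks information from before or at the moment of prediction into the label.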

We will also create an additional item-level observation table that we can use to preview features.

In [1]:

            
import featurebyte as fb
import pandas as pd

# Set your profile to the tutorial environment
fb.use_profile("tutorial")

catalog_name = "Grocery Dataset Tutorial"
catalog = fb.Catalog.activate(catalog_name)

10:43:53 | WARNING  | Service endpoint is inaccessible: http://featurebyte-server:8088/
10:43:53 | INFO     | Using profile: tutorial
10:43:53 | INFO     | Using configuration file at: /Users/gxav/.featurebyte/config.yaml
10:43:53 | INFO     | Active profile: tutorial (https://tutorials.featurebyte.com/api/v1)
10:43:53 | INFO     | SDK version: 2.0.1.dev67
10:43:53 | INFO     | No catalog activated.
10:43:53 | INFO     | Catalog activated: Grocery Dataset Tutorial

To create an ObservationTable object, there are several methods available: 1) Upload a CSV or Parquet file, 2) Use a SourceTable object, or 3) Employ a View object.

Creating an observation table by uploading a file

Now, let's create the observation table by uploading "In-Store_Customer_x_ProductGroup_2023_1K_sample.parquet", which is included in the same zip archive as this notebook.

In [2]:

            
table_name = "In_Store_Customer_x_ProductGroup_2023_1K"
context_observations = fb.ObservationTable.upload(
    file_path="In-Store_Customer_x_ProductGroup_2023_1K_sample.parquet",
    name=table_name,
    purpose=fb.Purpose.TRAINING,
    primary_entities=["customer", "productgroup"],
)

Done! |████████████████████████████████████████| 100% in 15.2s (0.07%/s)        

In [3]:

            
context_observations.preview()

Out[3]:

	GROCERYCUSTOMERGUID	POINT_IN_TIME	PRODUCTGROUP
0	699efd7f-aba2-4515-9335-2c8040a94f9f	2023-12-11 08:51:22	Fromages
1	125dfe7d-eac0-4eab-94d8-1cd008e1641c	2023-05-16 09:00:11	Laits
2	326b6ccb-0891-49fe-acbf-31d06c6d9e67	2023-03-20 13:34:55	Céréales
3	e42fa5f3-7737-4c6a-9ef4-856f113e60bd	2023-12-18 19:04:45	Fromages
4	dde029d7-ceca-4e44-aad0-38e22ba11b74	2023-09-08 15:00:07	Pains
5	3396195c-5379-4b2a-809f-247546f3440f	2023-09-29 13:04:33	Pains
6	fda229da-0c9e-4555-ab2e-67f9082bd9c2	2023-01-21 10:10:19	Céréales
7	4eb4ee84-ee13-4eec-9c26-61b6eb4ba35b	2023-02-27 08:57:16	Pains
8	5ad54123-027b-4c3c-b0ed-27e2ef9adc48	2023-11-04 16:51:42	Fromages
9	0401635c-e6ab-4525-bb5d-00aba7f6d0c4	2023-11-03 15:24:08	Pains
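Before uploading a file of your own, it can help to check that every row has the columns seen in the preview above. The sketch below works on plain dictionaries rather than a Parquet file. The `find_problems` helper is hypothetical, and treating these three columns as required is an assumption based on the preview, not a statement of what the SDK validates.

```python
from datetime import datetime

# Columns seen in the preview; treating them as required is an assumption.
REQUIRED_COLUMNS = {"GROCERYCUSTOMERGUID", "POINT_IN_TIME", "PRODUCTGROUP"}
ENTITY_COLUMNS = REQUIRED_COLUMNS - {"POINT_IN_TIME"}


def find_problems(rows):
    """Return a list of human-readable problems found in the observation rows."""
    problems = []
    for i, row in enumerate(rows):
        missing = REQUIRED_COLUMNS - set(row)
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
            continue
        if not isinstance(row["POINT_IN_TIME"], datetime):
            problems.append(f"row {i}: POINT_IN_TIME is not a datetime")
        if any(row[col] in (None, "") for col in ENTITY_COLUMNS):
            problems.append(f"row {i}: empty entity key")
    return problems


rows = [
    {
        "GROCERYCUSTOMERGUID": "cust-001",
        "POINT_IN_TIME": datetime(2023, 12, 11, 8, 51, 22),
        "PRODUCTGROUP": "Fromages",
    },
    # Timestamp left as a string instead of a datetime.
    {
        "GROCERYCUSTOMERGUID": "cust-002",
        "POINT_IN_TIME": "2023-05-16 09:00:11",
        "PRODUCTGROUP": "Laits",
    },
    # POINT_IN_TIME column missing entirely.
    {"GROCERYCUSTOMERGUID": "cust-003", "PRODUCTGROUP": "Pains"},
]

problems = find_problems(rows)
for problem in problems:
    print(problem)
```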

We will now associate the table with its relevant context.

In [4]:

            
context = catalog.get_context(
    "In-Store Customer Engagement with ProductGroup"
)
context.add_observation_table(table_name)

Adding a target to observation tables

Let's compute the target we created in the previous step for each row of the context observation table. The result is a new observation table for our use case.

In [5]:

            
target = catalog.get_target("CUSTOMER_x_PRODUCTGROUP_Sum_of_TotalCost_next_2_weeks")

new_table_name = "In_Store_Customer_x_ProductGroup_Spending_next_2_weeks_2023_1K"
usecase_observations = target.compute_target_table(
    context_observations,
    observation_table_name=new_table_name,
)

Done! |████████████████████████████████████████| 100% in 21.3s (0.05%/s)        

In [6]:

            
usecase_observations.preview()

Out[6]:

	GROCERYCUSTOMERGUID	POINT_IN_TIME	PRODUCTGROUP	CUSTOMER_x_PRODUCTGROUP_Sum_of_TotalCost_next_2_weeks
0	699efd7f-aba2-4515-9335-2c8040a94f9f	2023-12-11 08:51:22	Fromages	14.18
1	125dfe7d-eac0-4eab-94d8-1cd008e1641c	2023-05-16 09:00:11	Laits	1.85
2	326b6ccb-0891-49fe-acbf-31d06c6d9e67	2023-03-20 13:34:55	Céréales	0.00
3	e42fa5f3-7737-4c6a-9ef4-856f113e60bd	2023-12-18 19:04:45	Fromages	9.00
4	dde029d7-ceca-4e44-aad0-38e22ba11b74	2023-09-08 15:00:07	Pains	3.49
5	3396195c-5379-4b2a-809f-247546f3440f	2023-09-29 13:04:33	Pains	4.44
6	fda229da-0c9e-4555-ab2e-67f9082bd9c2	2023-01-21 10:10:19	Céréales	0.00
7	4eb4ee84-ee13-4eec-9c26-61b6eb4ba35b	2023-02-27 08:57:16	Pains	7.84
8	5ad54123-027b-4c3c-b0ed-27e2ef9adc48	2023-11-04 16:51:42	Fromages	15.00
9	0401635c-e6ab-4525-bb5d-00aba7f6d0c4	2023-11-03 15:24:08	Pains	2.00

The new table inherits its purpose from the context observation table it was computed from.

In [7]:

            
usecase_observations.purpose

Out[7]:

'training'

We will now associate the table with its relevant use case.

In [8]:

            
usecase = catalog.get_use_case(
    "In-Store Prediction of Customer Spending on a given Product Group next 2 Weeks"
)
usecase.add_observation_table(new_table_name)

Creating an observation table from a view

Next, we create an observation table from a view by sampling 10 items for preview purposes. Because this table is at the item level, it can be used to materialize features for any entity that is a parent of an item.

In [9]:

            
# Get view from INVOICEITEMS item table.
invoiceitems_view = catalog.get_view("INVOICEITEMS")

In [10]:

            
# Keep only items with a timestamp from 2022-07-01 (inclusive) to 2023-07-01 (exclusive)
cond = (invoiceitems_view["Timestamp"] >= pd.to_datetime("2022-07-01")) & (
    invoiceitems_view["Timestamp"] < pd.to_datetime("2023-07-01")
)
invoiceitems_view_1y_view = invoiceitems_view[cond].copy()
# Create an observation table by sampling 10 rows
preview_table = invoiceitems_view_1y_view.create_observation_table(
    name="Preview Table with 10 items",
    sample_rows=10,
    columns=["Timestamp", "GroceryInvoiceItemGuid"],
    columns_rename_mapping={
        "Timestamp": "POINT_IN_TIME",
        "GroceryInvoiceItemGuid": "GROCERYINVOICEITEMGUID",
    },
)
preview_table.update_description(
    "10 items between 01-Jul-2022 and 30-Jun-2023"
)
preview_table.update_purpose(fb.Purpose.PREVIEW)

Done! |████████████████████████████████████████| 100% in 12.2s (0.08%/s)        
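The cell above does three things: it filters to a one-year window, samples 10 rows, and renames the columns. The sketch below mirrors those steps on plain Python data. The `items` list is hypothetical, and the sampling here is ordinary in-memory sampling with a fixed seed, which is an assumption for illustration and not a description of how FeatureByte samples rows in the data warehouse.

```python
import random
from datetime import datetime, timedelta

# Hypothetical item rows standing in for the INVOICEITEMS view (one every 10 days).
items = [
    {
        "GroceryInvoiceItemGuid": f"item-{i:03d}",
        "Timestamp": datetime(2022, 1, 15) + timedelta(days=10 * i),
    }
    for i in range(80)
]

# Step 1: keep rows in the half-open window [2022-07-01, 2023-07-01).
start, end = datetime(2022, 7, 1), datetime(2023, 7, 1)
in_window = [row for row in items if start <= row["Timestamp"] < end]

# Step 2: sample 10 rows without replacement (fixed seed for reproducibility).
rng = random.Random(0)
sampled = rng.sample(in_window, k=10)

# Step 3: keep only the selected columns and rename them.
rename = {
    "Timestamp": "POINT_IN_TIME",
    "GroceryInvoiceItemGuid": "GROCERYINVOICEITEMGUID",
}
preview_rows = [{rename[col]: row[col] for col in rename} for row in sampled]

print(len(preview_rows), sorted(preview_rows[0]))
```

The half-open window matches the description "between 01-Jul-2022 and 30-Jun-2023": a timestamp of exactly 2023-07-01 00:00:00 is excluded.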

List observation tables in catalog

In [11]:

            
catalog.list_observation_tables()

Out[11]:

	id	name	type	shape	feature_store_name	created_at
0	6762371262b797d7f0c7d7c2	Preview Table with 10 items	view	[10, 2]	playground	2024-12-18T02:44:41.884000
1	676236fb46e7f3187677d398	In_Store_Customer_x_ProductGroup_Spending_next...	observation_table	[1000, 4]	playground	2024-12-18T02:44:26.776000
2	676236ea46e7f3187677d395	In_Store_Customer_x_ProductGroup_2023_1K	uploaded_file	[1000, 3]	playground	2024-12-18T02:44:04.417000