Store Sales Forecast: End-to-End SDK + API Tutorial¶
This tutorial replicates the Store Sales Forecast UI Tutorials using Python code. It demonstrates time series forecasting: predicting daily store sales 28 days ahead using the M5 dataset.
This tutorial focuses on forecast-specific steps — forecast automation, naive prediction, and forecast comparison plots. Steps common to all use cases (source table analysis, table EDA, cleaning operations, semantic detection, development dataset, feature EDA, model refit, and deployment) are covered in the Credit Default tutorial and work the same way for forecast use cases.
Prerequisites:
- FeatureByte instance with the playground feature store connected to DEMO_DATASETS.M5_STORE_SALES_AMOUNT
- Python environment with the featurebyte SDK installed
- Profile tutorial configured (see SDK Setup)
What you'll build:
- Register 3 source tables and tag entities (SDK)
- Formulate a forecast use case (SDK)
- Generate observation tables via forecast automation (API)
- Run an automated ideation pipeline (API)
- Train a standalone model and evaluate with forecast comparison plots (API)
Setup¶
import time
import featurebyte as fb
fb.use_profile("tutorial")
DATABASE_NAME = "DEMO_DATASETS"
SCHEMA_NAME = "M5_STORE_SALES_AMOUNT"
CATALOG_NAME = "Store Sales Forecast API Tutorial"
06:46:43 | INFO | Using profile: tutorial
06:46:43 | INFO | Using configuration file at: /Users/gxav/.featurebyte/config.yaml
06:46:43 | INFO | Active profile: tutorial (https://tutorials.featurebyte.com/api/v1)
06:46:43 | INFO | SDK version: 3.4.1.dev7
06:46:43 | INFO | No catalog activated.
def wait_for_task(client, task_id, poll_interval=30):
"""Poll a task until completion. Returns the full task response."""
while True:
task = client.get(f"/task/{task_id}").json()
if task["status"] in ("SUCCESS", "FAILURE"):
if task["status"] == "FAILURE":
print(f"Task FAILED: {task.get('traceback', 'no traceback')}")
return task
print(f" status: {task['status']}...")
time.sleep(poll_interval)
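The helper above polls forever if a task never reaches a terminal state. For long-running pipelines you may want a deadline; the variant below is a convenience sketch we add here, not part of the featurebyte SDK:

```python
import time

def wait_for_task_with_timeout(client, task_id, poll_interval=30, timeout=3600):
    """Like wait_for_task above, but raises TimeoutError after `timeout` seconds
    instead of polling forever."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        task = client.get(f"/task/{task_id}").json()
        if task["status"] in ("SUCCESS", "FAILURE"):
            if task["status"] == "FAILURE":
                print(f"Task FAILED: {task.get('traceback', 'no traceback')}")
            return task
        print(f"  status: {task['status']}...")
        time.sleep(poll_interval)
    raise TimeoutError(f"Task {task_id} still running after {timeout}s")
```

The timeout uses time.monotonic so wall-clock adjustments cannot shorten or extend the deadline.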
Step 1: Create Catalog & Register Tables¶
Corresponds to UI Tutorial: Create Catalog and Register Tables
We register 3 tables:
- SALES (Time Series) — daily store sales with timezone support
- CALENDAR (Calendar) — calendar features by state
- STORE_STATE (Dimension) — store-to-state mapping
catalog = fb.Catalog.create(CATALOG_NAME, "playground")
catalog.activate(CATALOG_NAME)
ds = catalog.get_data_source()
def get_source(table_name):
return ds.get_source_table(
database_name=DATABASE_NAME,
schema_name=SCHEMA_NAME,
table_name=table_name,
)
print(f"Catalog '{catalog.name}' created")
06:46:44 | INFO | Catalog activated: Store Sales Forecast API Tutorial
Catalog 'Store Sales Forecast API Tutorial' created
# Time series table: daily sales per store
sales = get_source("SALES").create_time_series_table(
name="SALES",
reference_datetime_column="date",
reference_datetime_schema=fb.TimestampSchema(
is_utc_time=False,
timezone=fb.TimeZoneColumn(column_name="timezone", type="timezone"),
),
time_interval=fb.TimeInterval(value=1, unit="DAY"),
series_id_column="store_id",
record_creation_timestamp_column="record_creation_date",
)
sales.update_default_feature_job_setting(
feature_job_setting=fb.CronFeatureJobSetting(
crontab="30 2 * * *",
timezone="America/Los_Angeles",
)
)
print("Registered SALES (Time Series)")
Registered SALES (Time Series)
# Dimension table: store-to-state mapping
store_state = get_source("STORE_STATE").create_dimension_table(
name="STORE_STATE",
dimension_id_column="store_id",
)
print("Registered STORE_STATE (Dimension)")
Registered STORE_STATE (Dimension)
# Calendar table: calendar features by state
calendar = get_source("CALENDAR").create_calendar_table(
name="CALENDAR",
calendar_datetime_column="date",
calendar_datetime_schema=fb.TimestampSchema(),
series_id_column="state_id",
)
print("Registered CALENDAR (Calendar)")
Registered CALENDAR (Calendar)
Step 2: Register Entities¶
Corresponds to UI Tutorial: Register Entities
entity_store = fb.Entity.create(name="Store", serving_names=["store_id"])
entity_state = fb.Entity.create(name="State", serving_names=["state_id"])
# Tag entity columns
sales["store_id"].as_entity("Store")
store_state["store_id"].as_entity("Store")
store_state["state_id"].as_entity("State")
calendar["state_id"].as_entity("State")
print("Entities created and tagged")
Entities created and tagged
SDK Reference: Entity | TableColumn.as_entity()
Step 3: Formulate Forecast Use Case¶
Corresponds to UI Tutorial: Formulate Use Case
Create a forecast context with daily granularity and a regression target from the SALES table.
context = fb.Context.create(
name="Store Daily Forecast",
primary_entity=["Store"],
description="Daily forecasting per store across Walmart locations.",
forecast_point_schema=fb.ForecastPointSchema(
granularity=fb.TimeIntervalUnit.DAY,
dtype=fb.DBVarType.TIMESTAMP,
is_utc_time=False,
timezone=fb.TimeZoneColumn(column_name="timezone", type="timezone"),
),
)
print(f"Context: {context.name}")
Context: Store Daily Forecast
# Create target from the SALES table
sales_table = catalog.get_table("SALES")
sales_view = sales_table.get_view()
target = sales_view["sales_amount"].as_target(
"sales_amount",
target_type=fb.TargetType.REGRESSION,
fill_value=0,
)
target.save()
print(f"Target: {target.name}")
Target: sales_amount
use_case = fb.UseCase.create(
name="Store Daily Sales Amount Forecast for 28 Days",
target_name="sales_amount",
context_name="Store Daily Forecast",
description="Predict daily sales amount (revenue) per store up to 28 days ahead.",
)
use_case_id = str(use_case.id)
print(f"Use case: {use_case.name} (id: {use_case_id})")
Use case: Store Daily Sales Amount Forecast for 28 Days (id: 69d97de58790ab65aa5483fb)
Switch to API¶
From here, we use the REST API for forecast automation, ideation, training, and deployment.
Note: This tutorial focuses on forecast-specific steps. The following steps are covered in the Credit Default tutorial and work the same way for forecast use cases: source table analysis, table EDA, cleaning operations, semantic detection, development dataset, feature EDA, model refit, Parquet download, and deployment. The ideation pipeline handles table EDA and semantic detection automatically when run end-to-end.
client = fb.Configurations().get_client()
Step 4: Forecast Automation — Create Observation Tables¶
Corresponds to UI Tutorial: Create Observation Tables. API docs: Observation Table Automation
Generate training, validation, and holdout observation tables automatically using forecast automation. This endpoint samples from the time series data based on the prediction schedule and horizon.
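To make the sampling concrete, the stdlib sketch below shows how a weekly prediction schedule and a 28-day horizon expand into (forecast point, horizon date) pairs — in ONE_ROW_PER_ENTITY_FORECAST_POINT mode each entity/point/date combination is a candidate observation row. This is our illustration of the concept, not the endpoint's actual sampling code, and the exact offset convention is an assumption:

```python
from datetime import date, timedelta

def weekly_forecast_points(start, end, weekday=0):
    """Forecast points for a weekly schedule (weekday 0 = Monday), matching
    the date part of the '30 3 * * 1' cron above. `end` is exclusive."""
    point = start + timedelta(days=(weekday - start.weekday()) % 7)
    while point < end:
        yield point
        point += timedelta(days=7)

def horizon_dates(point, start_offset=0, horizon=28):
    """Dates one forecast point covers, one per horizon step (assumed
    convention: day start_offset through start_offset + horizon - 1)."""
    return [point + timedelta(days=start_offset + h) for h in range(horizon)]

# Holdout period above: every Monday between 2016-04-01 and 2016-05-23
points = list(weekly_forecast_points(date(2016, 4, 1), date(2016, 5, 23)))
```

With 7 Mondays in the holdout window and a 28-day horizon, each store contributes up to 196 candidate rows before target_observation_count sampling.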
response = client.post(
"/observation_table/forecast_automation",
json={
"use_case_id": use_case_id,
"prediction_schedule_cron": "30 3 * * 1",
"prediction_schedule_timezone": "America/Los_Angeles",
"forecast_start_offset": 0,
"forecast_horizon": 28,
"periods": [
{
"start": "2012-03-01",
"end": "2016-01-01",
"name": "Training",
"target_observation_count": 50000,
"purpose": "training",
"mode": "ONE_ROW_PER_ENTITY_FORECAST_POINT",
},
{
"start": "2016-01-01",
"end": "2016-04-01",
"name": "Validation_eval",
"target_observation_count": 50000,
"purpose": "validation_test",
"mode": "ONE_ROW_PER_ENTITY_FORECAST_POINT",
},
{
"start": "2016-04-01",
"end": "2016-05-23",
"name": "Holdout_eval",
"target_observation_count": 50000,
"purpose": "validation_test",
"mode": "ONE_ROW_PER_ENTITY_FORECAST_POINT",
},
{
"start": "2016-01-01",
"end": "2016-05-23",
"name": "Forecast_series",
"target_observation_count": 50000,
"purpose": "other",
"mode": "FORECAST_SERIES",
},
],
},
)
task_id = response.json()["id"]
print(f"Forecast automation started (task: {task_id})")
task = wait_for_task(client, task_id)
print(f"Forecast automation: {task['status']}")
Forecast automation started (task: 7bbd62bb-b519-4393-ad7b-ef5a89277c5b)
status: STARTED...
status: STARTED...
Forecast automation: SUCCESS
# Set EDA table and get observation table IDs
use_case.update_default_eda_table("Training")
training_table_id = str(catalog.get_observation_table("Training").id)
validation_table_id = str(catalog.get_observation_table("Validation_eval").id)
holdout_table_id = str(catalog.get_observation_table("Holdout_eval").id)
forecast_series_table_id = str(catalog.get_observation_table("Forecast_series").id)
print(f"Training: {training_table_id}")
print(f"Validation: {validation_table_id}")
print(f"Holdout: {holdout_table_id}")
print(f"Forecast series: {forecast_series_table_id}")
Training: 69d97de5885d3d27f3fc964d
Validation: 69d97de5885d3d27f3fc964e
Holdout: 69d97de5885d3d27f3fc964f
Forecast series: 69d97de5885d3d27f3fc9650
Step 5: Run Ideation Pipeline¶
Corresponds to UI Tutorial: Ideate Features and Models. API docs: Ideation
# Create pipeline
response = client.post(
"/pipeline",
json={
"action": "create",
"use_case_id": use_case_id,
"pipeline_type": "FEATURE_IDEATION",
},
)
pipeline_id = response.json()["_id"]
print(f"Pipeline created: {pipeline_id}")
Pipeline created: 69d97e2349c7d923c28449c7
# Configure training and validation tables
client.patch(
f"/pipeline/{pipeline_id}/step_configs",
json={
"step_type": "model-train-setup-v2",
"data_source": {
"type": "train_valid_observation_tables",
"training_table_id": training_table_id,
"validation_table_id": validation_table_id,
},
},
)
print("Configured training/validation tables")
Configured training/validation tables
# Run to completion
response = client.patch(
f"/pipeline/{pipeline_id}",
json={"action": "advance", "step_type": "end"},
)
pipeline_task = response.json()["pipeline_runner_task"]
if pipeline_task:
task_id = pipeline_task["task_id"]
print(f"Pipeline running (task: {task_id})")
print("This will take a while...")
task = wait_for_task(client, task_id, poll_interval=180)
print(f"Pipeline: {task['status']}")
Pipeline running (task: 3ef3876d-20b5-4326-ae87-a56397da1103)
This will take a while...
status: STARTED...
status: STARTED...
status: STARTED...
status: STARTED...
status: STARTED...
status: STARTED...
status: STARTED...
status: STARTED...
Pipeline: SUCCESS
# Check pipeline status and get model ID
response = client.get(f"/pipeline/{pipeline_id}")
data = response.json()
for group in data["groups"]:
for step in group["steps"]:
marker = "+" if step["step_status"] == "completed" else " "
print(f" [{marker}] {step['step_type']}: {step['step_status']}")
ideation_model_id = None
for group in data["groups"]:
for step in group["steps"]:
if step["step_type"] == "model-train" and step.get("ml_model_ids"):
ideation_model_id = step["ml_model_ids"][0]
print(f"\nIdeation model: {ideation_model_id}")
[+] start: completed
[+] table-selection: completed
[+] semantic-detection: completed
[+] transform: completed
[+] filter: completed
[+] ideation-metadata: completed
[+] feature-ideation: completed
[+] eda: completed
[+] feature-selection: completed
[+] model-train-setup-v2: completed
[+] model-train: completed

Ideation model: 69d9817a885d3d27f3fc9817
Step 5b: Explore Ideated Features¶
API docs: Ideated Features
Browse the features generated by ideation and inspect their SDK code.
# Get the feature ideation ID from the pipeline
response = client.get(f"/pipeline/{pipeline_id}/feature_ideation")
feature_ideation_id = response.json().get("feature_ideation_id")
# List top features by Predictive Score (the Incremental score is used instead when ideation uses a naive prediction)
response = client.get(
f"/catalog/feature_ideation/{feature_ideation_id}/suggested_features",
params={"page_size": 10, "sort_by": "predictive_power_score", "sort_dir": "desc"},
)
suggested_features = response.json()["data"]
for f in suggested_features[:10]:
print(f"{f['feature_name']}")
print(f" Predictive Score: {f.get('predictive_power_score', 'N/A')}")
print(f" Type: {f.get('signal_type', 'N/A')}, Table: {f.get('primary_table', [])}")
print()
STORE_Avg_of_Sales_records_sales_amounts_28cD_same_Forecast_Weekday_To_Naive_STORE_Avg_of_Sales_records_sales_amounts_182cD
  Predictive Score: 0.12092299721799404
  Type: timing, Table: ['SALES']

STORE_Avg_of_Sales_records_sales_amounts_91cD_same_Forecast_Weekday_To_Naive_STORE_Avg_of_Sales_records_sales_amounts_182cD
  Predictive Score: 0.11453466036210935
  Type: timing, Table: ['SALES']

STORE_Avg_of_Sales_records_sales_amounts_182cD_same_Forecast_Weekday_To_Naive_STORE_Avg_of_Sales_records_sales_amounts_182cD
  Predictive Score: 0.10880036850031649
  Type: timing, Table: ['SALES']

TIME_UNTIL_FORECAST
  Predictive Score: 0.10161245952239872
  Type: forecast_point, Table: []

STORE_Time_To_Forecast_from_Latest_Sales_record_date_182cD
  Predictive Score: 0.10156346860542143
  Type: recency, Table: ['SALES']

FORECAST_Day_Of_Week
  Predictive Score: 0.10121223782742017
  Type: forecast_point, Table: []

STORE_Avg_of_Sales_records_sales_amounts_7cD_same_Forecast_Weekday_To_Naive_STORE_Avg_of_Sales_records_sales_amounts_182cD
  Predictive Score: 0.09508037793032476
  Type: timing, Table: ['SALES']

STORE_Avg_of_Sales_records_sales_amounts_182cD_same_Forecast_Minus_3_Weekday_To_Naive_STORE_Avg_of_Sales_records_sales_amounts_182cD
  Predictive Score: 0.0715875138155494
  Type: timing, Table: ['SALES']

STORE_Avg_of_Sales_records_sales_amounts_182cD_same_Forecast_Minus_4_Weekday_To_Naive_STORE_Avg_of_Sales_records_sales_amounts_182cD
  Predictive Score: 0.07134586913240382
  Type: timing, Table: ['SALES']

STORE_Avg_of_Sales_records_sales_amounts_91cD_same_Forecast_Minus_3_Weekday_To_Naive_STORE_Avg_of_Sales_records_sales_amounts_182cD
  Predictive Score: 0.05568852977294858
  Type: timing, Table: ['SALES']
# View SDK code for the top feature
best = response.json()["data"][0]
print(f"Feature: {best['feature_name']}\n")
print(best["code"])
Feature: STORE_Avg_of_Sales_records_sales_amounts_28cD_same_Forecast_Weekday_To_Naive_STORE_Avg_of_Sales_records_sales_amounts_182cD
"""
SDK code to create STORE_Avg_of_Sales_records_sales_amounts_28cD_same_Forecast_Weekday_To_Naive_STORE_Avg_of_Sales_records_sales_amounts_182cD
Feature description:
Ratio of STORE_Avg_of_Sales_records_sales_amounts_28cD_same_Forecast_Weekday To
STORE_Avg_of_Sales_records_sales_amounts_182cD
"""
import featurebyte as fb
#==================================================================================================
# Activate catalog
#==================================================================================================
catalog = fb.Catalog.activate("Store Sales Forecast API Tutorial")
#==================================================================================================
# Get view from table
#==================================================================================================
# Get view from SALES time series table.
sales_view = catalog.get_view("SALES")
#==================================================================================================
# Extract Forecast Point Date Parts
#==================================================================================================
# Get context by id.
context = fb.Context.get_by_id("69d97de38790ab65aa5483f8")
#--------------------------------------------------------------------------------------------------
# Extract day_of_week from the forecast point.
forecast_day_of_week = context.get_forecast_point_feature().dt.day_of_week
# Name feature
forecast_day_of_week.name = "FORECAST_Day_Of_Week"
#==================================================================================================
# Extract Date parts
#==================================================================================================
sales_view["Weekday of Sales_record"] = sales_view["date"].dt.day_of_week
#==================================================================================================
# Do window aggregation from SALES
#==================================================================================================
# Group SALES view by Store entity (store_id).
sales_view_by_store =\
sales_view.groupby(['store_id'])
#--------------------------------------------------------------------------------------------------
# Group SALES view by Store entity (store_id) across different Weekdays of Sales_record.
sales_view_by_store_across_weekdays_of_sales_record =\
sales_view.groupby(
['store_id'], category="Weekday of Sales_record"
)
#--------------------------------------------------------------------------------------------------
# Distribution of the Avg of sales_amounts of Sales_records, segmented by Sales_record's Weekday
# for the Store over time.
store_avg_of_sales_records_sales_amounts_by_sales_record_weekday_28cd =\
sales_view_by_store_across_weekdays_of_sales_record.aggregate_over(
"sales_amount", method="avg",
feature_names=["STORE_Avg_of_Sales_records_sales_amounts_by_Sales_record_Weekday_28cD"],
windows=[fb.CalendarWindow(unit="DAY", size=28)],
)["STORE_Avg_of_Sales_records_sales_amounts_by_Sales_record_Weekday_28cD"]
#==================================================================================================
# Extract Value from a dictionary-based feature and a key feature
#==================================================================================================
# Get the Value of the Store's Avg of Sales_record sales_amounts, which corresponds to
# Sales_records matching the forecast day of week over a 28 calendar days period.
store_avg_of_sales_records_sales_amounts_28cd_same_forecast_weekday =\
store_avg_of_sales_records_sales_amounts_by_sales_record_weekday_28cd.cd.get_value(
forecast_day_of_week
)
# Give a name to new feature
store_avg_of_sales_records_sales_amounts_28cd_same_forecast_weekday.name = \
"STORE_Avg_of_Sales_records_sales_amounts_28cD_same_Forecast_Weekday"
#--------------------------------------------------------------------------------------------------
# Get Avg of sales_amount for the Store over time.
store_avg_of_sales_records_sales_amounts_182cd =\
sales_view_by_store.aggregate_over(
"sales_amount", method="avg",
feature_names=["STORE_Avg_of_Sales_records_sales_amounts_182cD"],
windows=[fb.CalendarWindow(unit="DAY", size=182)],
)["STORE_Avg_of_Sales_records_sales_amounts_182cD"]
#--------------------------------------------------------------------------------------------------
# Get the Ratio of STORE_Avg_of_Sales_records_sales_amounts_28cD_same_Forecast_Weekday To
# STORE_Avg_of_Sales_records_sales_amounts_182cD
store_avg_of_sales_records_sales_amounts_28cd_same_forecast_weekday_to_naive_store_avg_of_sales_records_sales_amounts_182cd = (
store_avg_of_sales_records_sales_amounts_28cd_same_forecast_weekday
/ store_avg_of_sales_records_sales_amounts_182cd
)
# Give a name to new feature
store_avg_of_sales_records_sales_amounts_28cd_same_forecast_weekday_to_naive_store_avg_of_sales_records_sales_amounts_182cd.name = \
"STORE_Avg_of_Sales_records_sales_amounts_28cD_same_Forecast_Weekday_To_Naive_STORE_Avg_of_Sales_records_sales_amounts_182cD"
#==================================================================================================
# Save feature
#==================================================================================================
# Save feature
store_avg_of_sales_records_sales_amounts_28cd_same_forecast_weekday_to_naive_store_avg_of_sales_records_sales_amounts_182cd.save()
#==================================================================================================
# Update feature type
#==================================================================================================
# Update feature type
store_avg_of_sales_records_sales_amounts_28cd_same_forecast_weekday_to_naive_store_avg_of_sales_records_sales_amounts_182cd.update_feature_type(
"numeric"
)
#==================================================================================================
# Add description
#==================================================================================================
# Add description
store_avg_of_sales_records_sales_amounts_28cd_same_forecast_weekday_to_naive_store_avg_of_sales_records_sales_amounts_182cd.update_description(
"Ratio of "
"STORE_Avg_of_Sales_records_sales_amounts_28cD_same_Forecast_Weekday To"
" STORE_Avg_of_Sales_records_sales_amounts_182cD"
)
Step 6: Refine Features & Train Standalone Model¶
API docs: Feature Refinement | Model Training
# Refine features using key importance
response = client.post(
"/feature_list_from_model",
json={
"mode": "Feature key importance based",
"ml_model_id": ideation_model_id,
"top_n": 200,
"importance_threshold_percentage": 0.90,
},
)
task_id = response.json()["id"]
task = wait_for_task(client, task_id)
feature_list_from_model_id = task.get("payload", {}).get("output_document_id")
response = client.get(f"/feature_list_from_model/{feature_list_from_model_id}")
result = response.json()
feature_list_id = result["feature_list_id"]
print(f"Feature keys selected: {result['feature_keys_created_count']}")
print(f"Total features: {result['features_selected_count']}")
# Get feature list details
response = client.get(f"/feature_list/{feature_list_id}")
feature_list = response.json()
print(f"Feature list: {feature_list['name']} ({len(feature_list['feature_ids'])} features)")
status: STARTED...
Feature keys selected: 0
Total features: 20
Feature list: 20 Features from LightGBM [50 features: Walmart Store Sales Forecasting Features] by cumulative Feature Key Importance (0.9) (20 features)
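The "Feature key importance based" mode keeps the most important features until their cumulative share of total importance reaches importance_threshold_percentage, never keeping more than top_n. The sketch below is our reading of those parameters, not the service's actual selection code:

```python
def select_by_cumulative_importance(importances, threshold=0.90, top_n=200):
    """importances: mapping of feature name -> importance score.
    Returns feature names, most important first, until their cumulative
    share of total importance first reaches `threshold`, capped at `top_n`."""
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, score in ranked[:top_n]:
        selected.append(name)
        cumulative += score / total
        if cumulative >= threshold:
            break
    return selected
```

For example, with importances {a: 0.50, b: 0.30, c: 0.15, d: 0.05} and a 0.90 threshold, the selection stops after a, b, and c (cumulative 0.95) and d is dropped.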
# Get model settings and template
response = client.get(
f"/use_case/{use_case_id}/ml_model_template_setting",
params={
"training_table_id": training_table_id,
"validation_table_id": validation_table_id,
"feature_list_id": feature_list_id,
"machine_learning_role": "OUTCOME",
},
)
settings = response.json()
response = client.get(
f"/use_case/{use_case_id}/ml_model_template",
params={
"feature_list_id": feature_list_id,
"training_table_id": training_table_id,
"objective": settings["objective"],
"metric": settings["metric"],
"machine_learning_role": "OUTCOME",
},
)
template = response.json()["data"][0]
print(f"Objective: {settings['objective']}, Metric: {settings['metric']}, Template: {template['type']}")
Objective: reg_poisson, Metric: poisson, Template: NCTsDE_XGB
# Extract default parameters from template
node_name_to_parameters = {}
for preprocessor in template.get("preprocessors", []):
params = {
p["name"]: p["default_value"]
for p in preprocessor.get("parameters_metadata", [])
if p.get("default_value") is not None
}
if params:
node_name_to_parameters[preprocessor["node_name"]] = params
model_info = template.get("model", {})
if model_info:
params = {
p["name"]: p["default_value"]
for p in model_info.get("parameters_metadata", [])
if p.get("default_value") is not None
}
if params:
node_name_to_parameters[model_info["node_name"]] = params
print(f"Nodes configured: {list(node_name_to_parameters.keys())}")
Nodes configured: ['transformer_2', 'estimator_1']
# Train model
payload = {
"use_case_id": use_case_id,
"model_name": "Store Sales - Refined Features",
"data_source": {
"type": "train_valid_observation_tables",
"training_table_id": training_table_id,
"validation_table_id": validation_table_id,
},
"feature_list_id": feature_list_id,
"model_template_type": template["type"],
"objective": settings["objective"],
"metric": settings["metric"],
"node_name_to_parameters": node_name_to_parameters,
"role": "OUTCOME",
}
if settings.get("calibration_method"):
payload["calibration_method"] = settings["calibration_method"]
if settings.get("use_naive_as_offset") is not None:
payload["use_naive_as_offset"] = settings["use_naive_as_offset"]
response = client.post("/ml_model", json=payload)
task_id = response.json()["id"]
print(f"Training started (task: {task_id})")
task = wait_for_task(client, task_id)
ml_model_id = task.get("payload", {}).get("output_document_id")
print(f"Model trained: {ml_model_id}")
Training started (task: efee0b1e-8f8f-4028-94ac-99a2ea3b4646)
status: STARTED...
status: STARTED...
status: STARTED...
status: STARTED...
status: STARTED...
status: STARTED...
status: STARTED...
status: STARTED...
status: STARTED...
status: STARTED...
status: STARTED...
Model trained: 69d983e5c4e0e91875fd959a
# View model details
response = client.get(f"/catalog/ml_model/{ml_model_id}")
model = response.json()
print(f"Model: {model['name']}")
print(f"Template: {model['model_template_type']}")
print(f"Features: {len(model.get('feature_importance', []))}")
# Show top 5 features by importance
for fi in sorted(model.get("feature_importance", []), key=lambda x: x["importance"], reverse=True)[:5]:
print(f" {fi['feature']}: {fi['importance_percent'] * 100:.1f}%")
Model: Store Sales - Refined Features
Template: NCTsDE_XGB
Features: 20
  STORE_Avg_of_Sales_records_sales_amounts_182cD: 46.3%
  STORE_Avg_of_Sales_records_sales_amounts_28cD_same_Forecast_Weekday_To_Naive_STORE_Avg_of_Sales_records_sales_amounts_182cD: 14.1%
  STATE_calendar_snap: 9.0%
  STORE_Avg_of_Sales_records_sales_amounts_91cD_same_Forecast_Weekday_To_Naive_STORE_Avg_of_Sales_records_sales_amounts_182cD: 6.5%
  STORE_Avg_of_Sales_records_sales_amounts_182cD_same_Forecast_Weekday_To_Naive_STORE_Avg_of_Sales_records_sales_amounts_182cD: 5.7%
Step 7: Predict & Evaluate¶
Corresponds to UI Tutorial: Predict and Evaluate. API docs: Batch Predictions | Evaluation
Generate predictions on the holdout table for evaluation, and on the FORECAST_SERIES table for visualization.
# Generate predictions on holdout
response = client.post(
f"/ml_model/{ml_model_id}/prediction_table",
json={
"request_input": {
"request_type": "observation_table",
"table_id": holdout_table_id,
},
"include_input_features": False,
},
)
task_id = response.json()["id"]
print(f"Prediction started (task: {task_id})")
task = wait_for_task(client, task_id)
prediction_table_id = task.get("payload", {}).get("output_document_id")
print(f"Prediction table: {prediction_table_id}")
Prediction started (task: 88172d75-f8de-4880-970c-a232a03214a5)
status: PENDING...
status: PENDING...
status: PENDING...
status: STARTED...
status: STARTED...
status: STARTED...
Prediction table: 69d9853149c7d923c28449fc
# Get available evaluation plot options
response = client.request("OPTIONS", f"/ml_model/{ml_model_id}/evaluate")
options = response.json()
print(f"Available plots: {options.get('options', [])}")
# Evaluation plot: predicted vs actual
from IPython.display import HTML, display
response = client.post(
f"/ml_model/{ml_model_id}/evaluate",
json={
"option": "predicted_vs_actual",
"plot_params": {"height": 500, "width": 800, "font_size": 14},
"holdout_table": {"table_type": "observation_table", "table_id": holdout_table_id},
},
)
display(HTML(response.json()["content"]))
Available plots: ['distribution', 'predicted_vs_actual', 'predicted_vs_actual_per_bin']
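The evaluate endpoint returns a standalone HTML document in the content field. Outside a notebook (where display is unavailable), you can write it to disk and open it in a browser — a small sketch using a placeholder string where the tutorial would use response.json()["content"]:

```python
from pathlib import Path

# Placeholder payload; in the tutorial this would be response.json()["content"]
plot_html = "<html><body><div>predicted_vs_actual plot</div></body></html>"

out_path = Path("predicted_vs_actual.html")
out_path.write_text(plot_html, encoding="utf-8")
print(f"Saved {out_path} ({out_path.stat().st_size} bytes)")
```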
# Generate predictions on FORECAST_SERIES table for visualization
response = client.post(
f"/ml_model/{ml_model_id}/prediction_table",
json={
"request_input": {
"request_type": "observation_table",
"table_id": forecast_series_table_id,
},
"include_input_features": False,
},
)
task_id = response.json()["id"]
print(f"FORECAST_SERIES prediction started (task: {task_id})")
task = wait_for_task(client, task_id)
fc_prediction_table_id = task.get("payload", {}).get("output_document_id")
print(f"FORECAST_SERIES prediction table: {fc_prediction_table_id}")
FORECAST_SERIES prediction started (task: 4f54a7ed-1645-49a3-a9ed-946f63de3970)
status: STARTED...
status: STARTED...
status: STARTED...
status: STARTED...
FORECAST_SERIES prediction table: 69d985e649c7d923c28449fd
# Extract prediction entities
response = client.post(f"/prediction_table/{fc_prediction_table_id}/prediction_entities")
task_id = response.json()["id"]
wait_for_task(client, task_id, poll_interval=10)
response = client.get(f"/prediction_table/{fc_prediction_table_id}/prediction_entities")
entities = response.json()
print(f"Entity columns: {entities['entity_data']['columns']}")
for row in entities["entity_data"]["data"][:5]:
print(f" {row}")
# Create forecast comparison plot for a specific store
response = client.post(
f"/prediction_table/{fc_prediction_table_id}/forecast_comparison",
json={
"entity_filter": {"store_id": "CA_1"},
"plot_params": {"height": 500, "width": 1000, "font_size": 14},
},
)
task_id = response.json()["id"]
wait_for_task(client, task_id, poll_interval=10)
# Retrieve and display the plot
response = client.get(f"/prediction_table/{fc_prediction_table_id}/forecast_comparison")
comparisons = response.json()["data"]
if comparisons:
fc_id = comparisons[0]["id"]
response = client.get(f"/prediction_table/{fc_prediction_table_id}/forecast_comparison/{fc_id}")
display(HTML(response.json()["content"]))
status: STARTED...
Entity columns: ['store_id']
  ['CA_1']
  ['CA_2']
  ['CA_3']
  ['CA_4']
  ['TX_1']
status: STARTED...
Summary¶
| Step | Method | What we did |
|---|---|---|
| 1-2 | SDK | Created catalog, registered 3 tables (time series, calendar, dimension), tagged 2 entities |
| 3 | SDK | Formulated forecast use case with daily granularity, 28-day horizon |
| 4 | API | Generated training/validation/holdout observation tables via forecast automation |
| 5 | API | Ran automated ideation pipeline |
| 5b | API | Explored ideated features — SDK code and relevance scores |
| 6 | API | Refined features and trained standalone model |
| 7 | API | Predicted on holdout, generated evaluation and forecast comparison plots |
Steps covered in the Credit Default tutorial but skipped here: source table analysis, table EDA, cleaning operations, semantic detection, development dataset, feature EDA, model refit, Parquet download, and deployment. These work the same way for forecast use cases.
Key differences from the Credit Default tutorial:
- Forecast automation replaces manual observation table creation
- use_naive_as_offset is used for time series modeling
- Forecast comparison plots show prediction lines vs actual target per entity