Store Sales Forecast: End-to-End SDK + API Tutorial¶
This tutorial replicates the Store Sales Forecast UI Tutorials using Python code. It demonstrates time series forecasting: predicting daily store sales 28 days ahead using the M5 dataset.
This tutorial focuses on forecast-specific steps — forecast automation, naive prediction, and forecast comparison plots. Steps common to all use cases (source table analysis, table EDA, cleaning operations, semantic detection, development dataset, feature EDA, model refit, and deployment) are covered in the Credit Default tutorial and work the same way for forecast use cases.
Prerequisites:
- FeatureByte instance with the playground feature store connected to DEMO_DATASETS.M5_STORE_SALES_AMOUNT
- Python environment with the featurebyte SDK installed
- Profile tutorial configured (see SDK Setup)
What you'll build:
- Register 3 source tables and tag entities (SDK)
- Formulate a forecast use case (SDK)
- Generate observation tables via forecast automation (API)
- Run an automated ideation pipeline (API)
- Train a standalone model and evaluate with forecast comparison plots (API)
Setup¶
import time
import featurebyte as fb
fb.use_profile("tutorial")
DATABASE_NAME = "DEMO_DATASETS"
SCHEMA_NAME = "M5_STORE_SALES_AMOUNT"
CATALOG_NAME = "Store Sales Forecast API Tutorial"
06:46:43 | INFO | Using profile: tutorial
06:46:43 | INFO | Using configuration file at: /Users/gxav/.featurebyte/config.yaml
06:46:43 | INFO | Active profile: tutorial (https://tutorials.featurebyte.com/api/v1)
06:46:43 | INFO | SDK version: 3.4.1.dev7
06:46:43 | INFO | No catalog activated.
def wait_for_task(client, task_id, poll_interval=30):
"""Poll a task until completion. Returns the full task response."""
while True:
task = client.get(f"/task/{task_id}").json()
if task["status"] in ("SUCCESS", "FAILURE"):
if task["status"] == "FAILURE":
print(f"Task FAILED: {task.get('traceback', 'no traceback')}")
return task
print(f" status: {task['status']}...")
time.sleep(poll_interval)
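The helper above polls forever if a task never reaches a terminal state. For long-running pipelines you may want a deadline; the variant below is a convenience sketch we add here, not part of the featurebyte SDK:

```python
import time

def wait_for_task_with_timeout(client, task_id, poll_interval=30, timeout=3600):
    """Like wait_for_task above, but raises TimeoutError after `timeout` seconds
    instead of polling forever."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        task = client.get(f"/task/{task_id}").json()
        if task["status"] in ("SUCCESS", "FAILURE"):
            if task["status"] == "FAILURE":
                print(f"Task FAILED: {task.get('traceback', 'no traceback')}")
            return task
        print(f"  status: {task['status']}...")
        time.sleep(poll_interval)
    raise TimeoutError(f"Task {task_id} still running after {timeout}s")
```

The timeout uses time.monotonic so wall-clock adjustments cannot shorten or extend the deadline.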
Step 1: Create Catalog & Register Tables¶
Corresponds to UI Tutorial: Create Catalog and Register Tables
We register 3 tables:
- SALES (Time Series) — daily store sales with timezone support
- CALENDAR (Calendar) — calendar features by state
- STORE_STATE (Dimension) — store-to-state mapping
catalog = fb.Catalog.create(CATALOG_NAME, "playground")
catalog.activate(CATALOG_NAME)
ds = catalog.get_data_source()
def get_source(table_name):
return ds.get_source_table(
database_name=DATABASE_NAME,
schema_name=SCHEMA_NAME,
table_name=table_name,
)
print(f"Catalog '{catalog.name}' created")
06:46:44 | INFO | Catalog activated: Store Sales Forecast API Tutorial
Catalog 'Store Sales Forecast API Tutorial' created
# Time series table: daily sales per store
sales = get_source("SALES").create_time_series_table(
name="SALES",
reference_datetime_column="date",
reference_datetime_schema=fb.TimestampSchema(
is_utc_time=False,
timezone=fb.TimeZoneColumn(column_name="timezone", type="timezone"),
),
time_interval=fb.TimeInterval(value=1, unit="DAY"),
series_id_column="store_id",
record_creation_timestamp_column="record_creation_date",
)
sales.update_default_feature_job_setting(
feature_job_setting=fb.CronFeatureJobSetting(
crontab="30 2 * * *",
timezone="America/Los_Angeles",
)
)
print("Registered SALES (Time Series)")
Registered SALES (Time Series)
# Dimension table: store-to-state mapping
store_state = get_source("STORE_STATE").create_dimension_table(
name="STORE_STATE",
dimension_id_column="store_id",
)
print("Registered STORE_STATE (Dimension)")
Registered STORE_STATE (Dimension)
# Calendar table: calendar features by state
calendar = get_source("CALENDAR").create_calendar_table(
name="CALENDAR",
calendar_datetime_column="date",
calendar_datetime_schema=fb.TimestampSchema(),
series_id_column="state_id",
)
print("Registered CALENDAR (Calendar)")
Registered CALENDAR (Calendar)
Step 2: Register Entities¶
Corresponds to UI Tutorial: Register Entities
entity_store = fb.Entity.create(name="Store", serving_names=["store_id"])
entity_state = fb.Entity.create(name="State", serving_names=["state_id"])
# Tag entity columns
sales["store_id"].as_entity("Store")
store_state["store_id"].as_entity("Store")
store_state["state_id"].as_entity("State")
calendar["state_id"].as_entity("State")
print("Entities created and tagged")
Entities created and tagged
SDK Reference: Entity | TableColumn.as_entity()
Step 3: Formulate Forecast Use Case¶
Corresponds to UI Tutorial: Formulate Use Case
Create a forecast context with daily granularity and a regression target from the SALES table.
context = fb.Context.create(
name="Store Daily Forecast",
primary_entity=["Store"],
description="Daily forecasting per store across Walmart locations.",
forecast_point_schema=fb.ForecastPointSchema(
granularity=fb.TimeIntervalUnit.DAY,
dtype=fb.DBVarType.TIMESTAMP,
is_utc_time=False,
timezone=fb.TimeZoneColumn(column_name="timezone", type="timezone"),
),
)
print(f"Context: {context.name}")
Context: Store Daily Forecast
# Create target from the SALES table
sales_table = catalog.get_table("SALES")
sales_view = sales_table.get_view()
target = sales_view["sales_amount"].as_target(
"sales_amount",
target_type=fb.TargetType.REGRESSION,
fill_value=0,
)
target.save()
print(f"Target: {target.name}")
Target: sales_amount
use_case = fb.UseCase.create(
name="Store Daily Sales Amount Forecast for 28 Days",
target_name="sales_amount",
context_name="Store Daily Forecast",
description="Predict daily sales amount (revenue) per store up to 28 days ahead.",
)
use_case_id = str(use_case.id)
print(f"Use case: {use_case.name} (id: {use_case_id})")
Use case: Store Daily Sales Amount Forecast for 28 Days (id: 69d97de58790ab65aa5483fb)
Switch to API¶
From here, we use the REST API for forecast automation, ideation, training, and deployment.
Note: This tutorial focuses on forecast-specific steps. The following steps are covered in the Credit Default tutorial and work the same way for forecast use cases: source table analysis, table EDA, cleaning operations, semantic detection, development dataset, feature EDA, model refit, Parquet download, and deployment. The ideation pipeline handles table EDA and semantic detection automatically when run end-to-end.
client = fb.Configurations().get_client()
Step 4: Forecast Automation — Create Observation Tables¶
Corresponds to UI Tutorial: Create Observation Tables. API docs: Observation Table Automation
Generate training, validation, and holdout observation tables automatically using forecast automation. This endpoint samples from the time series data based on the prediction schedule and horizon.
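To make the sampling concrete, the stdlib sketch below shows how a weekly prediction schedule and a 28-day horizon expand into (forecast point, horizon date) pairs — in ONE_ROW_PER_ENTITY_FORECAST_POINT mode each entity/point/date combination is a candidate observation row. This is our illustration of the concept, not the endpoint's actual sampling code, and the exact offset convention is an assumption:

```python
from datetime import date, timedelta

def weekly_forecast_points(start, end, weekday=0):
    """Forecast points for a weekly schedule (weekday 0 = Monday), matching
    the date part of the '30 3 * * 1' cron above. `end` is exclusive."""
    point = start + timedelta(days=(weekday - start.weekday()) % 7)
    while point < end:
        yield point
        point += timedelta(days=7)

def horizon_dates(point, start_offset=0, horizon=28):
    """Dates one forecast point covers, one per horizon step (assumed
    convention: day start_offset through start_offset + horizon - 1)."""
    return [point + timedelta(days=start_offset + h) for h in range(horizon)]

# Holdout period above: every Monday between 2016-04-01 and 2016-05-23
points = list(weekly_forecast_points(date(2016, 4, 1), date(2016, 5, 23)))
```

With 7 Mondays in the holdout window and a 28-day horizon, each store contributes up to 196 candidate rows before target_observation_count sampling.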
response = client.post(
"/observation_table/forecast_automation",
json={
"use_case_id": use_case_id,
"prediction_schedule_cron": "30 3 * * 1",
"prediction_schedule_timezone": "America/Los_Angeles",
"forecast_start_offset": 0,
"forecast_horizon": 28,
"periods": [
{
"start": "2012-03-01",
"end": "2016-01-01",
"name": "Training",
"target_observation_count": 50000,
"purpose": "training",
"mode": "ONE_ROW_PER_ENTITY_FORECAST_POINT",
},
{
"start": "2016-01-01",
"end": "2016-04-01",
"name": "Validation_eval",
"target_observation_count": 50000,
"purpose": "validation_test",
"mode": "ONE_ROW_PER_ENTITY_FORECAST_POINT",
},
{
"start": "2016-04-01",
"end": "2016-05-23",
"name": "Holdout_eval",
"target_observation_count": 50000,
"purpose": "validation_test",
"mode": "ONE_ROW_PER_ENTITY_FORECAST_POINT",
},
{
"start": "2016-01-01",
"end": "2016-05-23",
"name": "Forecast_series",
"target_observation_count": 50000,
"purpose": "other",
"mode": "FORECAST_SERIES",
},
],
},
)
task_id = response.json()["id"]
print(f"Forecast automation started (task: {task_id})")
task = wait_for_task(client, task_id)
print(f"Forecast automation: {task['status']}")
Forecast automation started (task: 7bbd62bb-b519-4393-ad7b-ef5a89277c5b)
status: STARTED...
status: STARTED...
Forecast automation: SUCCESS
# Set EDA table and get observation table IDs
use_case.update_default_eda_table("Training")
training_table_id = str(catalog.get_observation_table("Training").id)
validation_table_id = str(catalog.get_observation_table("Validation_eval").id)
holdout_table_id = str(catalog.get_observation_table("Holdout_eval").id)
forecast_series_table_id = str(catalog.get_observation_table("Forecast_series").id)
print(f"Training: {training_table_id}")
print(f"Validation: {validation_table_id}")
print(f"Holdout: {holdout_table_id}")
print(f"Forecast series: {forecast_series_table_id}")
Training: 69d97de5885d3d27f3fc964d
Validation: 69d97de5885d3d27f3fc964e
Holdout: 69d97de5885d3d27f3fc964f
Forecast series: 69d97de5885d3d27f3fc9650
Step 5: Run Ideation Pipeline¶
Corresponds to UI Tutorial: Ideate Features and Models. API docs: Ideation
# Create pipeline
response = client.post(
"/pipeline",
json={
"action": "create",
"use_case_id": use_case_id,
"pipeline_type": "FEATURE_IDEATION",
},
)
pipeline_id = response.json()["_id"]
print(f"Pipeline created: {pipeline_id}")
Pipeline created: 69d97e2349c7d923c28449c7
# Configure training and validation tables
client.patch(
f"/pipeline/{pipeline_id}/step_configs",
json={
"step_type": "model-train-setup-v2",
"data_source": {
"type": "train_valid_observation_tables",
"training_table_id": training_table_id,
"validation_table_id": validation_table_id,
},
},
)
print("Configured training/validation tables")
Configured training/validation tables
# Run to completion
response = client.patch(
f"/pipeline/{pipeline_id}",
json={"action": "advance", "step_type": "end"},
)
pipeline_task = response.json()["pipeline_runner_task"]
if pipeline_task:
task_id = pipeline_task["task_id"]
print(f"Pipeline running (task: {task_id})")
print("This will take a while...")
task = wait_for_task(client, task_id, poll_interval=180)
print(f"Pipeline: {task['status']}")
Pipeline running (task: 3ef3876d-20b5-4326-ae87-a56397da1103)
This will take a while...
status: STARTED...
status: STARTED...
status: STARTED...
status: STARTED...
status: STARTED...
status: STARTED...
status: STARTED...
status: STARTED...
Pipeline: SUCCESS
# Check pipeline status and get model ID
response = client.get(f"/pipeline/{pipeline_id}")
data = response.json()
for group in data["groups"]:
for step in group["steps"]:
marker = "+" if step["step_status"] == "completed" else " "
print(f" [{marker}] {step['step_type']}: {step['step_status']}")
ideation_model_id = None
for group in data["groups"]:
for step in group["steps"]:
if step["step_type"] == "model-train" and step.get("ml_model_ids"):
ideation_model_id = step["ml_model_ids"][0]
print(f"\nIdeation model: {ideation_model_id}")
[+] start: completed
[+] table-selection: completed
[+] semantic-detection: completed
[+] transform: completed
[+] filter: completed
[+] ideation-metadata: completed
[+] feature-ideation: completed
[+] eda: completed
[+] feature-selection: completed
[+] model-train-setup-v2: completed
[+] model-train: completed

Ideation model: 69d9817a885d3d27f3fc9817
Step 5b: Explore Ideated Features¶
API docs: Ideated Features
Browse the features generated by ideation and inspect their SDK code.
# Get the feature ideation ID from the pipeline
response = client.get(f"/pipeline/{pipeline_id}/feature_ideation")
feature_ideation_id = response.json().get("feature_ideation_id")
# List top features by Predictive Score (the Incremental score is used instead when ideation uses a naive prediction)
response = client.get(
f"/catalog/feature_ideation/{feature_ideation_id}/suggested_features",
params={"page_size": 10, "sort_by": "predictive_power_score", "sort_dir": "desc"},
)
suggested_features = response.json()["data"]
for f in suggested_features[:10]:
print(f"{f['feature_name']}")
print(f" Predictive Score: {f.get('predictive_power_score', 'N/A')}")
print(f" Type: {f.get('signal_type', 'N/A')}, Table: {f.get('primary_table', [])}")
print()
STORE_Avg_of_Sales_records_sales_amounts_28cD_same_Forecast_Weekday_To_Naive_STORE_Avg_of_Sales_records_sales_amounts_182cD
  Predictive Score: 0.12092299721799404
  Type: timing, Table: ['SALES']

STORE_Avg_of_Sales_records_sales_amounts_91cD_same_Forecast_Weekday_To_Naive_STORE_Avg_of_Sales_records_sales_amounts_182cD
  Predictive Score: 0.11453466036210935
  Type: timing, Table: ['SALES']

STORE_Avg_of_Sales_records_sales_amounts_182cD_same_Forecast_Weekday_To_Naive_STORE_Avg_of_Sales_records_sales_amounts_182cD
  Predictive Score: 0.10880036850031649
  Type: timing, Table: ['SALES']

TIME_UNTIL_FORECAST
  Predictive Score: 0.10161245952239872
  Type: forecast_point, Table: []

STORE_Time_To_Forecast_from_Latest_Sales_record_date_182cD
  Predictive Score: 0.10156346860542143
  Type: recency, Table: ['SALES']

FORECAST_Day_Of_Week
  Predictive Score: 0.10121223782742017
  Type: forecast_point, Table: []

STORE_Avg_of_Sales_records_sales_amounts_7cD_same_Forecast_Weekday_To_Naive_STORE_Avg_of_Sales_records_sales_amounts_182cD
  Predictive Score: 0.09508037793032476
  Type: timing, Table: ['SALES']

STORE_Avg_of_Sales_records_sales_amounts_182cD_same_Forecast_Minus_3_Weekday_To_Naive_STORE_Avg_of_Sales_records_sales_amounts_182cD
  Predictive Score: 0.0715875138155494
  Type: timing, Table: ['SALES']

STORE_Avg_of_Sales_records_sales_amounts_182cD_same_Forecast_Minus_4_Weekday_To_Naive_STORE_Avg_of_Sales_records_sales_amounts_182cD
  Predictive Score: 0.07134586913240382
  Type: timing, Table: ['SALES']

STORE_Avg_of_Sales_records_sales_amounts_91cD_same_Forecast_Minus_3_Weekday_To_Naive_STORE_Avg_of_Sales_records_sales_amounts_182cD
  Predictive Score: 0.05568852977294858
  Type: timing, Table: ['SALES']
# View SDK code for the top feature
best = response.json()["data"][0]
print(f"Feature: {best['feature_name']}\n")
print(best["code"])
Feature: STORE_Avg_of_Sales_records_sales_amounts_28cD_same_Forecast_Weekday_To_Naive_STORE_Avg_of_Sales_records_sales_amounts_182cD
"""
SDK code to create STORE_Avg_of_Sales_records_sales_amounts_28cD_same_Forecast_Weekday_To_Naive_STORE_Avg_of_Sales_records_sales_amounts_182cD
Feature description:
Ratio of STORE_Avg_of_Sales_records_sales_amounts_28cD_same_Forecast_Weekday To
STORE_Avg_of_Sales_records_sales_amounts_182cD
"""
import featurebyte as fb
#==================================================================================================
# Activate catalog
#==================================================================================================
catalog = fb.Catalog.activate("Store Sales Forecast API Tutorial")
#==================================================================================================
# Get view from table
#==================================================================================================
# Get view from SALES time series table.
sales_view = catalog.get_view("SALES")
#==================================================================================================
# Extract Forecast Point Date Parts
#==================================================================================================
# Get context by id.
context = fb.Context.get_by_id("69d97de38790ab65aa5483f8")
#--------------------------------------------------------------------------------------------------
# Extract day_of_week from the forecast point.
forecast_day_of_week = context.get_forecast_point_feature().dt.day_of_week
# Name feature
forecast_day_of_week.name = "FORECAST_Day_Of_Week"
#==================================================================================================
# Extract Date parts
#==================================================================================================
sales_view["Weekday of Sales_record"] = sales_view["date"].dt.day_of_week
#==================================================================================================
# Do window aggregation from SALES
#==================================================================================================
# Group SALES view by Store entity (store_id).
sales_view_by_store =\
sales_view.groupby(['store_id'])
#--------------------------------------------------------------------------------------------------
# Group SALES view by Store entity (store_id) across different Weekdays of Sales_record.
sales_view_by_store_across_weekdays_of_sales_record =\
sales_view.groupby(
['store_id'], category="Weekday of Sales_record"
)
#--------------------------------------------------------------------------------------------------
# Distribution of the Avg of sales_amounts of Sales_records, segmented by Sales_record's Weekday
# for the Store over time.
store_avg_of_sales_records_sales_amounts_by_sales_record_weekday_28cd =\
sales_view_by_store_across_weekdays_of_sales_record.aggregate_over(
"sales_amount", method="avg",
feature_names=["STORE_Avg_of_Sales_records_sales_amounts_by_Sales_record_Weekday_28cD"],
windows=[fb.CalendarWindow(unit="DAY", size=28)],
)["STORE_Avg_of_Sales_records_sales_amounts_by_Sales_record_Weekday_28cD"]
#==================================================================================================
# Extract Value from a dictionary-based feature and a key feature
#==================================================================================================
# Get the Value of the Store's Avg of Sales_record sales_amounts, which corresponds to
# Sales_records matching the forecast day of week over a 28 calendar days period.
store_avg_of_sales_records_sales_amounts_28cd_same_forecast_weekday =\
store_avg_of_sales_records_sales_amounts_by_sales_record_weekday_28cd.cd.get_value(
forecast_day_of_week
)
# Give a name to new feature
store_avg_of_sales_records_sales_amounts_28cd_same_forecast_weekday.name = \
"STORE_Avg_of_Sales_records_sales_amounts_28cD_same_Forecast_Weekday"
#--------------------------------------------------------------------------------------------------
# Get Avg of sales_amount for the Store over time.
store_avg_of_sales_records_sales_amounts_182cd =\
sales_view_by_store.aggregate_over(
"sales_amount", method="avg",
feature_names=["STORE_Avg_of_Sales_records_sales_amounts_182cD"],
windows=[fb.CalendarWindow(unit="DAY", size=182)],
)["STORE_Avg_of_Sales_records_sales_amounts_182cD"]
#--------------------------------------------------------------------------------------------------
# Get the Ratio of STORE_Avg_of_Sales_records_sales_amounts_28cD_same_Forecast_Weekday To
# STORE_Avg_of_Sales_records_sales_amounts_182cD
store_avg_of_sales_records_sales_amounts_28cd_same_forecast_weekday_to_naive_store_avg_of_sales_records_sales_amounts_182cd = (
store_avg_of_sales_records_sales_amounts_28cd_same_forecast_weekday
/ store_avg_of_sales_records_sales_amounts_182cd
)
# Give a name to new feature
store_avg_of_sales_records_sales_amounts_28cd_same_forecast_weekday_to_naive_store_avg_of_sales_records_sales_amounts_182cd.name = \
"STORE_Avg_of_Sales_records_sales_amounts_28cD_same_Forecast_Weekday_To_Naive_STORE_Avg_of_Sales_records_sales_amounts_182cD"
#==================================================================================================
# Save feature
#==================================================================================================
# Save feature
store_avg_of_sales_records_sales_amounts_28cd_same_forecast_weekday_to_naive_store_avg_of_sales_records_sales_amounts_182cd.save()
#==================================================================================================
# Update feature type
#==================================================================================================
# Update feature type
store_avg_of_sales_records_sales_amounts_28cd_same_forecast_weekday_to_naive_store_avg_of_sales_records_sales_amounts_182cd.update_feature_type(
"numeric"
)
#==================================================================================================
# Add description
#==================================================================================================
# Add description
store_avg_of_sales_records_sales_amounts_28cd_same_forecast_weekday_to_naive_store_avg_of_sales_records_sales_amounts_182cd.update_description(
"Ratio of "
"STORE_Avg_of_Sales_records_sales_amounts_28cD_same_Forecast_Weekday To"
" STORE_Avg_of_Sales_records_sales_amounts_182cD"
)
Step 6: Refine Features & Train Standalone Model¶
API docs: Feature Refinement | Model Training
# Refine features using key importance
response = client.post(
"/feature_list_from_model",
json={
"mode": "Feature key importance based",
"ml_model_id": ideation_model_id,
"top_n": 200,
"importance_threshold_percentage": 0.90,
},
)
task_id = response.json()["id"]
task = wait_for_task(client, task_id)
feature_list_from_model_id = task.get("payload", {}).get("output_document_id")
response = client.get(f"/feature_list_from_model/{feature_list_from_model_id}")
result = response.json()
feature_list_id = result["feature_list_id"]
print(f"Feature keys selected: {result['feature_keys_created_count']}")
print(f"Total features: {result['features_selected_count']}")
# Get feature list details
response = client.get(f"/feature_list/{feature_list_id}")
feature_list = response.json()
print(f"Feature list: {feature_list['name']} ({len(feature_list['feature_ids'])} features)")
status: STARTED...
Feature keys selected: 0
Total features: 20
Feature list: 20 Features from LightGBM [50 features: Walmart Store Sales Forecasting Features] by cumulative Feature Key Importance (0.9) (20 features)
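The "Feature key importance based" mode keeps the most important features until their cumulative share of total importance reaches importance_threshold_percentage, never keeping more than top_n. The sketch below is our reading of those parameters, not the service's actual selection code:

```python
def select_by_cumulative_importance(importances, threshold=0.90, top_n=200):
    """importances: mapping of feature name -> importance score.
    Returns feature names, most important first, until their cumulative
    share of total importance first reaches `threshold`, capped at `top_n`."""
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, score in ranked[:top_n]:
        selected.append(name)
        cumulative += score / total
        if cumulative >= threshold:
            break
    return selected
```

For example, with importances {a: 0.50, b: 0.30, c: 0.15, d: 0.05} and a 0.90 threshold, the selection stops after a, b, and c (cumulative 0.95) and d is dropped.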
# Get model settings and template
response = client.get(
f"/use_case/{use_case_id}/ml_model_template_setting",
params={
"training_table_id": training_table_id,
"validation_table_id": validation_table_id,
"feature_list_id": feature_list_id,
"machine_learning_role": "OUTCOME",
},
)
settings = response.json()
response = client.get(
f"/use_case/{use_case_id}/ml_model_template",
params={
"feature_list_id": feature_list_id,
"training_table_id": training_table_id,
"objective": settings["objective"],
"metric": settings["metric"],
"machine_learning_role": "OUTCOME",
},
)
template = response.json()["data"][0]
print(f"Objective: {settings['objective']}, Metric: {settings['metric']}, Template: {template['type']}")
Objective: reg_poisson, Metric: poisson, Template: NCTsDE_XGB
# Extract default parameters from template
node_name_to_parameters = {}
for preprocessor in template.get("preprocessors", []):
params = {
p["name"]: p["default_value"]
for p in preprocessor.get("parameters_metadata", [])
if p.get("default_value") is not None
}
if params:
node_name_to_parameters[preprocessor["node_name"]] = params
model_info = template.get("model", {})
if model_info:
params = {
p["name"]: p["default_value"]
for p in model_info.get("parameters_metadata", [])
if p.get("default_value") is not None
}
if params:
node_name_to_parameters[model_info["node_name"]] = params
print(f"Nodes configured: {list(node_name_to_parameters.keys())}")
Nodes configured: ['transformer_2', 'estimator_1']
# Train model
payload = {
"use_case_id": use_case_id,
"model_name": "Store Sales - Refined Features",
"data_source": {
"type": "train_valid_observation_tables",
"training_table_id": training_table_id,
"validation_table_id": validation_table_id,
},
"feature_list_id": feature_list_id,
"model_template_type": template["type"],
"objective": settings["objective"],
"metric": settings["metric"],
"node_name_to_parameters": node_name_to_parameters,
"role": "OUTCOME",
}
if settings.get("calibration_method"):
payload["calibration_method"] = settings["calibration_method"]
if settings.get("use_naive_as_offset") is not None:
payload["use_naive_as_offset"] = settings["use_naive_as_offset"]
response = client.post("/ml_model", json=payload)
task_id = response.json()["id"]
print(f"Training started (task: {task_id})")
task = wait_for_task(client, task_id)
ml_model_id = task.get("payload", {}).get("output_document_id")
print(f"Model trained: {ml_model_id}")
Training started (task: efee0b1e-8f8f-4028-94ac-99a2ea3b4646)
status: STARTED...
status: STARTED...
status: STARTED...
status: STARTED...
status: STARTED...
status: STARTED...
status: STARTED...
status: STARTED...
status: STARTED...
status: STARTED...
status: STARTED...
Model trained: 69d983e5c4e0e91875fd959a
# View model details
response = client.get(f"/catalog/ml_model/{ml_model_id}")
model = response.json()
print(f"Model: {model['name']}")
print(f"Template: {model['model_template_type']}")
print(f"Features: {len(model.get('feature_importance', []))}")
# Show top 5 features by importance
for fi in sorted(model.get("feature_importance", []), key=lambda x: x["importance"], reverse=True)[:5]:
print(f" {fi['feature']}: {fi['importance_percent'] * 100:.1f}%")
Model: Store Sales - Refined Features
Template: NCTsDE_XGB
Features: 20
  STORE_Avg_of_Sales_records_sales_amounts_182cD: 46.3%
  STORE_Avg_of_Sales_records_sales_amounts_28cD_same_Forecast_Weekday_To_Naive_STORE_Avg_of_Sales_records_sales_amounts_182cD: 14.1%
  STATE_calendar_snap: 9.0%
  STORE_Avg_of_Sales_records_sales_amounts_91cD_same_Forecast_Weekday_To_Naive_STORE_Avg_of_Sales_records_sales_amounts_182cD: 6.5%
  STORE_Avg_of_Sales_records_sales_amounts_182cD_same_Forecast_Weekday_To_Naive_STORE_Avg_of_Sales_records_sales_amounts_182cD: 5.7%
Step 7: Predict & Evaluate¶
Corresponds to UI Tutorial: Predict and Evaluate. API docs: Batch Predictions | Evaluation
Generate predictions on the holdout table for evaluation, and on the FORECAST_SERIES table for visualization.
# Generate predictions on holdout
response = client.post(
f"/ml_model/{ml_model_id}/prediction_table",
json={
"request_input": {
"request_type": "observation_table",
"table_id": holdout_table_id,
},
"include_input_features": False,
},
)
task_id = response.json()["id"]
print(f"Prediction started (task: {task_id})")
task = wait_for_task(client, task_id)
prediction_table_id = task.get("payload", {}).get("output_document_id")
print(f"Prediction table: {prediction_table_id}")
Prediction started (task: 88172d75-f8de-4880-970c-a232a03214a5)
status: PENDING...
status: PENDING...
status: PENDING...
status: STARTED...
status: STARTED...
status: STARTED...
Prediction table: 69d9853149c7d923c28449fc
# Get available evaluation plot options
response = client.request("OPTIONS", f"/ml_model/{ml_model_id}/evaluate")
options = response.json()
print(f"Available plots: {options.get('options', [])}")
# Evaluation plot: predicted vs actual
from IPython.display import HTML, display
response = client.post(
f"/ml_model/{ml_model_id}/evaluate",
json={
"option": "predicted_vs_actual",
"plot_params": {"height": 500, "width": 800, "font_size": 14},
"holdout_table": {"table_type": "observation_table", "table_id": holdout_table_id},
},
)
display(HTML(response.json()["content"]))
Available plots: ['distribution', 'predicted_vs_actual', 'predicted_vs_actual_per_bin']
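The evaluate endpoint returns a standalone HTML document in the content field. Outside a notebook (where display is unavailable), you can write it to disk and open it in a browser — a small sketch using a placeholder string where the tutorial would use response.json()["content"]:

```python
from pathlib import Path

# Placeholder payload; in the tutorial this would be response.json()["content"]
plot_html = "<html><body><div>predicted_vs_actual plot</div></body></html>"

out_path = Path("predicted_vs_actual.html")
out_path.write_text(plot_html, encoding="utf-8")
print(f"Saved {out_path} ({out_path.stat().st_size} bytes)")
```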
# Generate predictions on FORECAST_SERIES table for visualization
response = client.post(
f"/ml_model/{ml_model_id}/prediction_table",
json={
"request_input": {
"request_type": "observation_table",
"table_id": forecast_series_table_id,
},
"include_input_features": False,
},
)
task_id = response.json()["id"]
print(f"FORECAST_SERIES prediction started (task: {task_id})")
task = wait_for_task(client, task_id)
fc_prediction_table_id = task.get("payload", {}).get("output_document_id")
print(f"FORECAST_SERIES prediction table: {fc_prediction_table_id}")
FORECAST_SERIES prediction started (task: 4f54a7ed-1645-49a3-a9ed-946f63de3970)
status: STARTED...
status: STARTED...
status: STARTED...
status: STARTED...
FORECAST_SERIES prediction table: 69d985e649c7d923c28449fd
# Extract prediction entities
response = client.post(f"/prediction_table/{fc_prediction_table_id}/prediction_entities")
task_id = response.json()["id"]
wait_for_task(client, task_id, poll_interval=10)
response = client.get(f"/prediction_table/{fc_prediction_table_id}/prediction_entities")
entities = response.json()
print(f"Entity columns: {entities['entity_data']['columns']}")
for row in entities["entity_data"]["data"][:5]:
print(f" {row}")
# Create forecast comparison plot for a specific store
response = client.post(
f"/prediction_table/{fc_prediction_table_id}/forecast_comparison",
json={
"entity_filter": {"store_id": "CA_1"},
"plot_params": {"height": 500, "width": 1000, "font_size": 14},
},
)
task_id = response.json()["id"]
wait_for_task(client, task_id, poll_interval=10)
# Retrieve and display the plot
response = client.get(f"/prediction_table/{fc_prediction_table_id}/forecast_comparison")
comparisons = response.json()["data"]
if comparisons:
fc_id = comparisons[0]["id"]
response = client.get(f"/prediction_table/{fc_prediction_table_id}/forecast_comparison/{fc_id}")
display(HTML(response.json()["content"]))
status: STARTED...
Entity columns: ['store_id']
  ['CA_1']
  ['CA_2']
  ['CA_3']
  ['CA_4']
  ['TX_1']
status: STARTED...
Summary¶
| Step | Method | What we did |
|---|---|---|
| 1-2 | SDK | Created catalog, registered 3 tables (time series, calendar, dimension), tagged 2 entities |
| 3 | SDK | Formulated forecast use case with daily granularity, 28-day horizon |
| 4 | API | Generated training/validation/holdout observation tables via forecast automation |
| 5 | API | Ran automated ideation pipeline |
| 5b | API | Explored ideated features — SDK code and relevance scores |
| 6 | API | Refined features and trained standalone model |
| 7 | API | Predicted on holdout, generated evaluation and forecast comparison plots |
Steps covered in the Credit Default tutorial but skipped here: source table analysis, table EDA, cleaning operations, semantic detection, development dataset, feature EDA, model refit, Parquet download, and deployment. These work the same way for forecast use cases.
Key differences from the Credit Default tutorial:
- Forecast automation replaces manual observation table creation
- use_naive_as_offset is used for time series modeling
- Forecast comparison plots show prediction lines vs actual target per entity