# Observation Table Automation

**See also:** UI Tutorial: Create Observation Tables | Forecast UI Tutorial: Create Observation Tables | Concepts: Use Case Formulation | API Tutorial: Store Sales Forecast — Step 4

**Prerequisites:** This page uses the `wait_for_task` helper defined in API Overview.

The observation table automation endpoint generates observation tables for forecast use cases by sampling from your data within a specified date range. It is available only for forecast use cases whose target is defined with `as_target` or `forward_aggregate_asat` operations on a time series or snapshots table. The endpoint is not exposed through the SDK.
## Forecast Automation
```python
import featurebyte as fb

# Assumes an existing forecast use case object, e.g. obtained earlier
# from the active catalog, and the wait_for_task helper from API Overview.
client = fb.Configurations().get_client()

payload = {
    "use_case_id": str(use_case.id),
    "prediction_schedule_cron": "30 3 * * 1",  # Monday 3:30 AM
    "prediction_schedule_timezone": "Etc/UTC",
    "forecast_start_offset": 0,
    "forecast_horizon": 28,
    "periods": [
        {
            "start": "2012-02-18",
            "end": "2016-04-25",
            "name": "eda",
            "target_observation_count": 50000,
            "purpose": "eda",
            "mode": "ONE_ROW_PER_ENTITY_FORECAST_POINT",
        },
        {
            "start": "2012-02-18",
            "end": "2016-04-25",
            "name": "training",
            "target_observation_count": 1000000,
            "purpose": "training",
            "mode": "ONE_ROW_PER_ENTITY_FORECAST_POINT",
        },
    ],
}
response = client.post("/observation_table/forecast_automation", json=payload)
task_id = response.json()["id"]
wait_for_task(client, task_id)
```
Then set the default EDA table via the SDK:
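A minimal sketch of that step, assuming the SDK's use case object exposes an `update_default_eda_table` method and that the catalog and use case names shown here are placeholders (check your SDK version's `UseCase` reference for the exact method):

```python
import featurebyte as fb

# Hypothetical catalog and use case names for illustration.
catalog = fb.Catalog.activate("my_catalog")
use_case = catalog.get_use_case("store_sales_forecast")

# "eda" is the observation table name generated by the request above.
use_case.update_default_eda_table("eda")
```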
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `use_case_id` | string | Yes | ID of the use case (must be a forecast use case) |
| `prediction_schedule_cron` | string | Yes | Cron expression defining the prediction schedule (e.g., `"30 3 * * 1"` for Monday 3:30 AM) |
| `prediction_schedule_timezone` | string | Yes | IANA timezone for the cron schedule (e.g., `"Etc/UTC"`, `"America/New_York"`) |
| `forecast_start_offset` | integer | Yes | Number of periods after prediction time for the first forecast point; 0 = same period, 1 = next period |
| `forecast_start_anchor` | string | No | How the first forecast point is anchored: `"CRON_TIME"` (default) or `"NEXT_DAY_START"` (sub-daily only) |
| `forecast_horizon` | integer | Yes | Number of periods to forecast from the first forecast point |
| `periods` | array | Yes | List of observation table periods to generate (see below) |
| `dimension_filter` | object | No | Filter entities by a dimension table condition (see Filtered Observation Tables) |
Period parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `start` | string | Yes | Start date of the sampling range (ISO format, e.g., `"2012-02-18"`) |
| `end` | string | Yes | End date of the sampling range |
| `name` | string | Yes | Name for the generated observation table (must be unique within the request) |
| `target_observation_count` | integer | Yes | Target number of rows to sample |
| `purpose` | string | Yes | See purpose values below |
| `mode` | string | Yes | `"ONE_ROW_PER_ENTITY_FORECAST_POINT"` or `"FORECAST_SERIES"` |
Purpose values:
| Value | Description |
|---|---|
| `"preview"` | Quick preview of the data |
| `"eda"` | Typically a smaller table used as the default EDA table for ideation (see EDA Table Sizing) |
| `"training"` | Model training |
| `"validation_test"` | Model validation and testing |
| `"other"` | Other purposes (e.g., visualization) |
Mode and purpose combinations:
| Mode | Description |
|---|---|
| `ONE_ROW_PER_ENTITY_FORECAST_POINT` | One row per entity per forecast point. Required for EDA and training. |
| `FORECAST_SERIES` | Full forecast series per entity. Recommended for visualization only. |
**Mode restrictions:** `"eda"` and `"training"` tables must use `ONE_ROW_PER_ENTITY_FORECAST_POINT`. `FORECAST_SERIES` cannot be used with the `"eda"` or `"training"` purposes and is recommended only for visualization with the `"other"` purpose.
## EDA Table Sizing
The EDA observation table is used by the ideation pipeline to filter out noisy features, run EDA analysis, sort and pre-screen feature candidates, and run feature selection — before training models on larger data. It should be large enough to produce reliable signal estimates but small enough for fast iteration.
Recommended sizes:
- 50,000 rows — good default for most use cases. Fast iteration with reliable signal.
- 100,000 rows — for use cases with many entities or sparse signals where more data improves feature ranking.
The final feature lists are typically refined from the SHAP importance of a model trained on the larger training table. The EDA table is for efficient exploration, not final model quality.
## Filtered Observation Tables

Create observation tables filtered by a dimension (e.g., per-shop training tables). This uses the same endpoint with an additional `dimension_filter` parameter:
```python
# Assumes an active catalog, a use_case_id from an existing forecast use case,
# an authenticated client, and the wait_for_task helper from API Overview.
item_store_table = catalog.get_table("ITEM_STORE")

payload = {
    "use_case_id": use_case_id,
    "prediction_schedule_cron": "30 3 * * 1",
    "prediction_schedule_timezone": "Etc/UTC",
    "forecast_start_offset": 0,
    "forecast_horizon": 28,
    "dimension_filter": {
        "dimension_table_id": str(item_store_table.id),
        "conditions": [
            {
                "column_name": "store_id",
                "operation": "is",
                "value": "CA_1",
            }
        ],
    },
    "periods": [
        {
            "start": "2012-02-18",
            "end": "2016-04-25",
            "name": "training_CA_1",
            "target_observation_count": 1000000,
            "purpose": "training",
            "mode": "ONE_ROW_PER_ENTITY_FORECAST_POINT",
        },
    ],
}
response = client.post("/observation_table/forecast_automation", json=payload)
task_id = response.json()["id"]
wait_for_task(client, task_id)
```
Dimension filter parameters:
| Parameter | Type | Description |
|---|---|---|
| `dimension_filter.dimension_table_id` | string | ID of the dimension table to filter on |
| `dimension_filter.conditions` | array | List of filter conditions |
| `dimension_filter.conditions[].column_name` | string | Column to filter on |
| `dimension_filter.conditions[].operation` | string | Filter operation (see below) |
| `dimension_filter.conditions[].value` | any | Value to compare against |
Condition operations:
| Operation | Value type | Description |
|---|---|---|
| `"is"` | scalar | Exact match |
| `"is in"` | array | Match any value in the list |
| `"less than"` | number | Strictly less than |
| `"less than or equal"` | number | Less than or equal to |
| `"greater than"` | number | Strictly greater than |
| `"greater than or equal"` | number | Greater than or equal to |
| `"contains"` | string | Substring match |
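For example, a filter matching several stores at once can use `"is in"`. A small helper sketch for building such a `dimension_filter` payload (the table ID and column name here are placeholders):

```python
def store_filter(dimension_table_id, store_ids):
    """Build a dimension_filter payload matching any of the given store IDs."""
    return {
        "dimension_table_id": dimension_table_id,
        "conditions": [
            {
                "column_name": "store_id",
                "operation": "is in",   # array-valued match
                "value": list(store_ids),
            }
        ],
    }

# "dimension-table-id" is a placeholder for str(item_store_table.id).
flt = store_filter("dimension-table-id", ["CA_1", "CA_2"])
```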