Observation Table Automation

Prerequisites

This page uses the wait_for_task helper defined in API Overview.

The observation table automation endpoint generates observation tables for forecast use cases by sampling from your data within a specified date range. This is available for forecast use cases that define a target using as_target or forward_aggregate_asat operations from a time series or snapshots table. This endpoint is not available through the SDK.

Forecast Automation

import featurebyte as fb

client = fb.Configurations().get_client()

payload = {
    "use_case_id": str(use_case.id),
    "prediction_schedule_cron": "30 3 * * 1",
    "prediction_schedule_timezone": "Etc/UTC",
    "forecast_start_offset": 0,
    "forecast_horizon": 28,
    "periods": [
        {
            "start": "2012-02-18",
            "end": "2016-04-25",
            "name": "eda",
            "target_observation_count": 50000,
            "purpose": "eda",
            "mode": "ONE_ROW_PER_ENTITY_FORECAST_POINT",
        },
        {
            "start": "2012-02-18",
            "end": "2016-04-25",
            "name": "training",
            "target_observation_count": 1000000,
            "purpose": "training",
            "mode": "ONE_ROW_PER_ENTITY_FORECAST_POINT",
        },
    ],
}

response = client.post("/observation_table/forecast_automation", json=payload)
task_id = response.json()["id"]
wait_for_task(client, task_id)

Then set the default EDA table via the SDK:

use_case.update_default_eda_table("eda")

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| use_case_id | string | Yes | ID of the use case (must be a forecast use case) |
| prediction_schedule_cron | string | Yes | Cron expression defining the prediction schedule (e.g., "30 3 * * 1" for Monday 3:30 AM) |
| prediction_schedule_timezone | string | Yes | IANA timezone for the cron schedule (e.g., "Etc/UTC", "America/New_York") |
| forecast_start_offset | integer | Yes | Number of periods after prediction time for the first forecast point. 0 = same period, 1 = next period |
| forecast_start_anchor | string | No | How the first forecast point is anchored: "CRON_TIME" (default) or "NEXT_DAY_START" (sub-daily only) |
| forecast_horizon | integer | Yes | Number of periods to forecast from the first forecast point |
| periods | array | Yes | List of observation table periods to generate (see below) |
| dimension_filter | object | No | Filter entities by a dimension table condition (see Filtered Observation Tables) |
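
To make the schedule and forecast-window parameters concrete, the sketch below enumerates the prediction times produced by the weekly cron "30 3 * * 1" and the forecast dates covered by one of them. It is an illustration under stated assumptions, not FeatureByte's implementation: one forecast period is taken to be one day, the anchor is taken to be "CRON_TIME", and datetimes are naive wall-clock values in prediction_schedule_timezone. The helper names weekly_prediction_times and forecast_dates are hypothetical.

```python
from datetime import date, datetime, time, timedelta


def weekly_prediction_times(start, end, cron_dow, hour, minute):
    """Yield wall-clock prediction times for a weekly cron between two dates.

    cron_dow uses cron numbering (0 or 7 = Sunday, 1 = Monday).
    """
    python_dow = (cron_dow - 1) % 7  # datetime.weekday() uses Monday = 0
    days_ahead = (python_dow - start.weekday()) % 7
    current = datetime.combine(start + timedelta(days=days_ahead), time(hour, minute))
    last = datetime.combine(end, time(23, 59, 59))
    while current <= last:
        yield current
        current += timedelta(weeks=1)


def forecast_dates(prediction_time, forecast_start_offset, forecast_horizon):
    """Return forecast dates for one prediction time, ASSUMING daily periods."""
    first = prediction_time.date() + timedelta(days=forecast_start_offset)
    return [first + timedelta(days=i) for i in range(forecast_horizon)]


# "30 3 * * 1" means every Monday at 03:30.
times = list(weekly_prediction_times(date(2012, 2, 18), date(2012, 3, 10), 1, 3, 30))
points = forecast_dates(times[0], forecast_start_offset=0, forecast_horizon=28)
print(times[0], points[0], points[-1])
```

Under these assumptions, forecast_start_offset=0 places the first forecast point on the prediction date itself, and forecast_horizon=28 yields 28 consecutive dates from there.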

Period parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| start | string | Yes | Start date of the sampling range (ISO format, e.g., "2012-02-18") |
| end | string | Yes | End date of the sampling range |
| name | string | Yes | Name for the generated observation table (must be unique within the request) |
| target_observation_count | integer | Yes | Target number of rows to sample |
| purpose | string | Yes | See purpose values below |
| mode | string | Yes | "ONE_ROW_PER_ENTITY_FORECAST_POINT" or "FORECAST_SERIES" |

Purpose values:

| Value | Description |
| --- | --- |
| "preview" | Quick preview of the data |
| "eda" | Typically a smaller table used as the default EDA table for ideation (see EDA table sizing) |
| "training" | Model training |
| "validation_test" | Model validation and testing |
| "other" | Other purposes (e.g., visualization) |

Mode and purpose combinations:

| Mode | Description |
| --- | --- |
| ONE_ROW_PER_ENTITY_FORECAST_POINT | One row per entity per forecast point. Required for EDA and training. |
| FORECAST_SERIES | Full forecast series per entity. Recommended for visualization only. |

Mode restrictions

Tables with purpose "eda" or "training" must use ONE_ROW_PER_ENTITY_FORECAST_POINT mode. FORECAST_SERIES cannot be combined with either of those purposes; it is recommended for visualization with the "other" purpose.
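
Because a rejected request costs a round trip, it can help to check the periods list locally before posting. The following is a minimal client-side sketch of the rules stated on this page (allowed purposes and modes, unique names within a request, and the mode restriction for "eda" and "training"). The validate_periods function is hypothetical and not part of FeatureByte; the server remains the authority on what is accepted.

```python
VALID_PURPOSES = {"preview", "eda", "training", "validation_test", "other"}
VALID_MODES = {"ONE_ROW_PER_ENTITY_FORECAST_POINT", "FORECAST_SERIES"}
POINT_MODE_ONLY = {"eda", "training"}  # purposes that cannot use FORECAST_SERIES


def validate_periods(periods):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    seen_names = set()
    for index, period in enumerate(periods):
        name = period.get("name")
        purpose = period.get("purpose")
        mode = period.get("mode")
        label = f"periods[{index}] ({name!r})"
        if name in seen_names:
            problems.append(f"{label}: name is not unique within the request")
        seen_names.add(name)
        if purpose not in VALID_PURPOSES:
            problems.append(f"{label}: unknown purpose {purpose!r}")
        if mode not in VALID_MODES:
            problems.append(f"{label}: unknown mode {mode!r}")
        elif purpose in POINT_MODE_ONLY and mode != "ONE_ROW_PER_ENTITY_FORECAST_POINT":
            problems.append(f"{label}: purpose {purpose!r} cannot use mode {mode!r}")
    return problems


periods = [
    {"name": "eda", "purpose": "eda", "mode": "FORECAST_SERIES"},
    {"name": "viz", "purpose": "other", "mode": "FORECAST_SERIES"},
]
for problem in validate_periods(periods):
    print(problem)
```

Returning a list of messages instead of raising on the first failure lets a caller report every problem in one pass.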

EDA Table Sizing

The EDA observation table is used by the ideation pipeline to filter out noisy features, run EDA analysis, sort and pre-screen feature candidates, and run feature selection, all before models are trained on the larger training table. It should be large enough to produce reliable signal estimates but small enough for fast iteration.

Recommended sizes:

  • 50,000 rows — good default for most use cases. Fast iteration with reliable signal.
  • 100,000 rows — for use cases with many entities or sparse signals where more data improves feature ranking.

The final feature lists are typically refined from the SHAP importance of a model trained on the larger training table. The EDA table is for efficient exploration, not final model quality.

Filtered Observation Tables

Create observation tables filtered by a dimension (e.g., per-store training tables). This uses the same endpoint with an additional dimension_filter parameter:

item_store_table = catalog.get_table("ITEM_STORE")

payload = {
    "use_case_id": str(use_case.id),
    "prediction_schedule_cron": "30 3 * * 1",
    "prediction_schedule_timezone": "Etc/UTC",
    "forecast_start_offset": 0,
    "forecast_horizon": 28,
    "dimension_filter": {
        "dimension_table_id": str(item_store_table.id),
        "conditions": [
            {
                "column_name": "store_id",
                "operation": "is",
                "value": "CA_1",
            }
        ],
    },
    "periods": [
        {
            "start": "2012-02-18",
            "end": "2016-04-25",
            "name": "training_CA_1",
            "target_observation_count": 1000000,
            "purpose": "training",
            "mode": "ONE_ROW_PER_ENTITY_FORECAST_POINT",
        },
    ],
}

response = client.post("/observation_table/forecast_automation", json=payload)
task_id = response.json()["id"]
wait_for_task(client, task_id)
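
When one filtered table is needed per store, the payload can be generated in a loop. The sketch below derives one payload per store ID from a shared base payload and suffixes each period name with the store ID so the resulting tables are easy to tell apart. The build_store_payload helper, the placeholder IDs, and the second store ID "CA_2" are hypothetical and only illustrate the pattern.

```python
import copy


def build_store_payload(base_payload, dimension_table_id, store_id):
    """Return a copy of base_payload filtered to one store, with suffixed names."""
    payload = copy.deepcopy(base_payload)  # leave the shared base untouched
    payload["dimension_filter"] = {
        "dimension_table_id": dimension_table_id,
        "conditions": [
            {"column_name": "store_id", "operation": "is", "value": store_id}
        ],
    }
    for period in payload["periods"]:
        period["name"] = f"{period['name']}_{store_id}"
    return payload


base_payload = {
    "use_case_id": "<use-case-id>",  # placeholder; use str(use_case.id)
    "prediction_schedule_cron": "30 3 * * 1",
    "prediction_schedule_timezone": "Etc/UTC",
    "forecast_start_offset": 0,
    "forecast_horizon": 28,
    "periods": [
        {
            "start": "2012-02-18",
            "end": "2016-04-25",
            "name": "training",
            "target_observation_count": 1000000,
            "purpose": "training",
            "mode": "ONE_ROW_PER_ENTITY_FORECAST_POINT",
        },
    ],
}

# "<dimension-table-id>" stands in for str(item_store_table.id).
payloads = [
    build_store_payload(base_payload, "<dimension-table-id>", store_id)
    for store_id in ["CA_1", "CA_2"]
]
for payload in payloads:
    print(payload["periods"][0]["name"])
    # Submit each payload exactly as in the example above:
    # response = client.post("/observation_table/forecast_automation", json=payload)
    # wait_for_task(client, response.json()["id"])
```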

Dimension filter parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| dimension_filter.dimension_table_id | string | ID of the dimension table to filter on |
| dimension_filter.conditions | array | List of filter conditions |
| dimension_filter.conditions[].column_name | string | Column to filter on |
| dimension_filter.conditions[].operation | string | Filter operation (see below) |
| dimension_filter.conditions[].value | any | Value to compare against |

Condition operations:

| Operation | Value type | Description |
| --- | --- | --- |
| "is" | scalar | Exact match |
| "is in" | array | Match any value in the list |
| "less than" | number | Strictly less than |
| "less than or equal" | number | Less than or equal to |
| "greater than" | number | Strictly greater than |
| "greater than or equal" | number | Greater than or equal to |
| "contains" | string | Substring match |
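
The operations above can be mirrored locally to preview which dimension rows a filter would keep before submitting a request. This is a rough sketch of the documented semantics, not the server's implementation. Two points are assumptions: multiple conditions are combined with AND, and "contains" is case-sensitive. The row_matches function and the sample rows are hypothetical.

```python
import operator

# Each operation maps to a function of (actual column value, condition value).
COMPARATORS = {
    "is": operator.eq,
    "is in": lambda actual, expected: actual in expected,
    "less than": operator.lt,
    "less than or equal": operator.le,
    "greater than": operator.gt,
    "greater than or equal": operator.ge,
    "contains": lambda actual, expected: expected in actual,
}


def row_matches(row, conditions):
    """Return True when the row satisfies every condition (AND is an assumption)."""
    for condition in conditions:
        compare = COMPARATORS[condition["operation"]]
        if not compare(row[condition["column_name"]], condition["value"]):
            return False
    return True


# Hypothetical dimension rows, for illustration only.
rows = [
    {"store_id": "CA_1", "state": "CA"},
    {"store_id": "TX_1", "state": "TX"},
]
conditions = [{"column_name": "store_id", "operation": "is in", "value": ["CA_1", "CA_2"]}]
kept = [row["store_id"] for row in rows if row_matches(row, conditions)]
print(kept)
```

An unknown operation raises KeyError here, which surfaces typos such as "is_in" before the request is sent.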