Observation Table Automation

Forecast Automation

import featurebyte as fb

client = fb.Configurations().get_client()

# `use_case` is assumed to be an existing forecast use case object loaded earlier.
payload = {
    "use_case_id": str(use_case.id),
    "prediction_schedule_cron": "30 3 * * 1",
    "prediction_schedule_timezone": "Etc/UTC",
    "forecast_start_offset": 0,
    "forecast_horizon": 28,
    "periods": [
        {
            "start": "2012-02-18",
            "end": "2016-04-25",
            "name": "eda",
            "target_observation_count": 50000,
            "purpose": "eda",
            "mode": "ONE_ROW_PER_ENTITY_FORECAST_POINT",
        },
        {
            "start": "2012-02-18",
            "end": "2016-04-25",
            "name": "training",
            "target_observation_count": 1000000,
            "purpose": "training",
            "mode": "ONE_ROW_PER_ENTITY_FORECAST_POINT",
        },
    ],
}

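# Submit the request; the response body carries the ID of the background task.
response = client.post("/observation_table/forecast_automation", json=payload)
task_id = response.json()["id"]
# `wait_for_task` is a helper assumed to be defined elsewhere; it is not shown in this section.
wait_for_task(client, task_id)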

Then set the default EDA table via the SDK:

use_case.update_default_eda_table("eda")

Parameters:

Parameter	Type	Required	Description
`use_case_id`	string	Yes	ID of the use case (must be a forecast use case)
`prediction_schedule_cron`	string	Yes	Cron expression defining the prediction schedule (e.g., `"30 3 * * 1"` for Monday 3:30 AM)
`prediction_schedule_timezone`	string	Yes	IANA timezone for the cron schedule (e.g., `"Etc/UTC"`, `"America/New_York"`)
`forecast_start_offset`	integer	Yes	Number of periods after prediction time for the first forecast point. `0` = same period, `1` = next period
`forecast_start_anchor`	string	No	How the first forecast point is anchored: `"CRON_TIME"` (default) or `"NEXT_DAY_START"` (sub-daily only)
`forecast_horizon`	integer	Yes	Number of periods to forecast from the first forecast point
`periods`	array	Yes	List of observation table periods to generate (see below)
`dimension_filter`	object	No	Filter entities by a dimension table condition (see Filtered Observation Tables)

Period parameters:

Parameter	Type	Required	Description
`start`	string	Yes	Start date of the sampling range (ISO format, e.g., `"2012-02-18"`)
`end`	string	Yes	End date of the sampling range
`name`	string	Yes	Name for the generated observation table (must be unique within the request)
`target_observation_count`	integer	Yes	Target number of rows to sample
`purpose`	string	Yes	See purpose values below
`mode`	string	Yes	`"ONE_ROW_PER_ENTITY_FORECAST_POINT"` or `"FORECAST_SERIES"`

Purpose values:

Value	Description
`"preview"`	Quick preview of the data
`"eda"`	Typically a smaller table used as the default EDA table for ideation (see EDA table sizing)
`"training"`	Model training
`"validation_test"`	Model validation and testing
`"other"`	Other purposes (e.g., visualization)

Mode and purpose combinations:

Mode	Description
`ONE_ROW_PER_ENTITY_FORECAST_POINT`	One row per entity per forecast point. Required for EDA and training.
`FORECAST_SERIES`	Full forecast series per entity. Recommended for visualization only.
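
The period rules in the tables above can be checked locally before a request is sent. The following is a minimal sketch, not part of the SDK: `check_periods` is a hypothetical helper, and it assumes that `start` and `end` use the date-only ISO form shown in the examples (the service may accept other forms).

```python
from datetime import date

# Values copied from the purpose and mode tables in this section.
PURPOSES = {"preview", "eda", "training", "validation_test", "other"}
MODES = {"ONE_ROW_PER_ENTITY_FORECAST_POINT", "FORECAST_SERIES"}
ROW_MODE = "ONE_ROW_PER_ENTITY_FORECAST_POINT"
# Purposes that must use ONE_ROW_PER_ENTITY_FORECAST_POINT.
ROW_MODE_ONLY_PURPOSES = {"eda", "training"}


def check_periods(periods):
    """Hypothetical helper: return a list of problems found in `periods`."""
    problems = []
    seen_names = set()
    for index, period in enumerate(periods):
        name = period.get("name")
        purpose = period.get("purpose")
        mode = period.get("mode")
        label = f"periods[{index}]"

        # Names must be unique within the request.
        if name in seen_names:
            problems.append(f"{label}: duplicate name {name!r}")
        seen_names.add(name)

        if purpose not in PURPOSES:
            problems.append(f"{label}: unknown purpose {purpose!r}")
        if mode not in MODES:
            problems.append(f"{label}: unknown mode {mode!r}")
        if purpose in ROW_MODE_ONLY_PURPOSES and mode != ROW_MODE:
            problems.append(f"{label}: purpose {purpose!r} requires {ROW_MODE}")

        # Assumption: date-only ISO strings, as in the examples above.
        for key in ("start", "end"):
            try:
                date.fromisoformat(period[key])
            except (KeyError, TypeError, ValueError):
                problems.append(f"{label}: {key} is missing or not an ISO date")
    return problems
```

Calling `check_periods(payload["periods"])` on the example payload above returns an empty list; any returned strings describe entries to fix before posting.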

Mode restrictions

Tables with the `"eda"` or `"training"` purpose must use the `ONE_ROW_PER_ENTITY_FORECAST_POINT` mode. The `FORECAST_SERIES` mode cannot be combined with either of these purposes; it is recommended for visualization with the `"other"` purpose.

EDA Table Sizing

The ideation pipeline uses the EDA observation table to filter out noisy features, run EDA analysis, sort and pre-screen feature candidates, and run feature selection, all before models are trained on larger data. The table should be large enough to produce reliable signal estimates but small enough for fast iteration.

Recommended sizes:

50,000 rows — good default for most use cases. Fast iteration with reliable signal.
100,000 rows — for use cases with many entities or sparse signals where more data improves feature ranking.

The final feature lists are typically refined from the SHAP importance of a model trained on the larger training table. The EDA table is for efficient exploration, not final model quality.

Filtered Observation Tables

Observation tables can be restricted to a subset of entities by filtering on a dimension table (e.g., per-shop training tables). The request uses the same endpoint with an additional `dimension_filter` parameter:

item_store_table = catalog.get_table("ITEM_STORE")

payload = {
    "use_case_id": str(use_case.id),
    "prediction_schedule_cron": "30 3 * * 1",
    "prediction_schedule_timezone": "Etc/UTC",
    "forecast_start_offset": 0,
    "forecast_horizon": 28,
    "dimension_filter": {
        "dimension_table_id": str(item_store_table.id),
        "conditions": [
            {
                "column_name": "store_id",
                "operation": "is",
                "value": "CA_1",
            }
        ],
    },
    "periods": [
        {
            "start": "2012-02-18",
            "end": "2016-04-25",
            "name": "training_CA_1",
            "target_observation_count": 1000000,
            "purpose": "training",
            "mode": "ONE_ROW_PER_ENTITY_FORECAST_POINT",
        },
    ],
}

response = client.post("/observation_table/forecast_automation", json=payload)
task_id = response.json()["id"]
wait_for_task(client, task_id)
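
For per-shop tables, the same payload is typically repeated with only the filter value and table name changing. The sketch below shows one way to generate such payloads; `per_store_payloads` is a hypothetical helper, not an SDK function, and the store IDs passed to it are illustrative. It only builds dictionaries and sends no requests.

```python
import copy


def per_store_payloads(base_payload, dimension_table_id, store_ids,
                       column_name="store_id"):
    """Hypothetical helper: build one filtered payload per store ID."""
    payloads = []
    for store_id in store_ids:
        # Deep copy so the base payload and its periods stay untouched.
        payload = copy.deepcopy(base_payload)
        payload["dimension_filter"] = {
            "dimension_table_id": dimension_table_id,
            "conditions": [
                {
                    "column_name": column_name,
                    "operation": "is",
                    "value": store_id,
                }
            ],
        }
        # Suffix each table name with the store ID, e.g. "training_CA_1".
        for period in payload["periods"]:
            period["name"] = f"{period['name']}_{store_id}"
        payloads.append(payload)
    return payloads
```

Each resulting payload can then be posted to `/observation_table/forecast_automation` in the same way as the single-store request above.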

Dimension filter parameters:

Parameter	Type	Description
`dimension_filter.dimension_table_id`	string	ID of the dimension table to filter on
`dimension_filter.conditions`	array	List of filter conditions
`dimension_filter.conditions[].column_name`	string	Column to filter on
`dimension_filter.conditions[].operation`	string	Filter operation (see below)
`dimension_filter.conditions[].value`	any	Value to compare against

Condition operations:

Operation	Value type	Description
`"is"`	scalar	Exact match
`"is in"`	array	Match any value in the list
`"less than"`	number	Strictly less than
`"less than or equal"`	number	Less than or equal to
`"greater than"`	number	Strictly greater than
`"greater than or equal"`	number	Greater than or equal to
`"contains"`	string	Substring match
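
The value types in the table above can be enforced when conditions are built, so that a mismatched value is caught before the request is sent. This is a minimal sketch under stated assumptions: `make_condition` is a hypothetical helper, the checks mirror only the table above, and treating booleans as non-numeric is an assumption of this example rather than documented behavior.

```python
from numbers import Number


def _is_number(value):
    # Assumption: booleans are not treated as numbers here.
    return isinstance(value, Number) and not isinstance(value, bool)


def _is_scalar(value):
    return not isinstance(value, (list, tuple, set, dict))


# Operation name -> check for the value type listed in the table above.
VALUE_CHECKS = {
    "is": _is_scalar,
    "is in": lambda value: isinstance(value, list),
    "less than": _is_number,
    "less than or equal": _is_number,
    "greater than": _is_number,
    "greater than or equal": _is_number,
    "contains": lambda value: isinstance(value, str),
}


def make_condition(column_name, operation, value):
    """Hypothetical helper: build one entry for `dimension_filter.conditions`."""
    if operation not in VALUE_CHECKS:
        raise ValueError(f"unknown operation: {operation!r}")
    if not VALUE_CHECKS[operation](value):
        raise TypeError(
            f"value {value!r} does not match the type expected by {operation!r}"
        )
    return {"column_name": column_name, "operation": operation, "value": value}
```

For example, `make_condition("store_id", "is in", ["CA_1", "CA_2"])` returns a condition dictionary, while `make_condition("store_id", "is in", "CA_1")` raises a `TypeError` because the table lists an array for `"is in"`.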