Ideation Configuration

Ideation Metadata

step_type: "ideation-metadata"

Configures entity selection, aggregation windows, and naive prediction. See Entity Selection for a detailed guide.

Development dataset constraint

If the pipeline uses a development dataset, the eligible and suggested entity candidates are constrained by the entity selection used when the development dataset was created. You can only select entities that are present in the sampled development tables. See Entity Selection — Development Datasets for details.

client.patch(
    f"/pipeline/{pipeline_id}/step_configs",
    json={
        "step_type": "ideation-metadata",
        "entity_selection": {
            "final": entity_selection,
        },
    },
)

Parameters:

Parameter	Type	Default	Description
`entity_selection`	object	None	Entity selection override. Set `entity_selection.final` to a list of per-table entity selections. See Entity Selection.
`windows`	array	None	Override aggregation time windows. Each item has `size` (integer) and `unit` (`"w"`, `"d"`, or `"h"`).
`naive_prediction_schema`	object	None	Set `naive_prediction_schema.final` to `null` to disable the naive prediction baseline (forecast use cases only).
`excluded_entity_sets`	array	None	Lists of entity ID groups to exclude from feature generation. Cannot be set together with `entity_selection`.
`udf_mapping`	array	None	User-defined function mappings for custom transformations.
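The table states that `excluded_entity_sets` cannot be set together with `entity_selection`. The following is a minimal sketch of a client-side check that builds the request body and fails early on that conflict. The helper name `build_ideation_metadata_config` is hypothetical and not part of the API, the argument values are passed through as opaque objects, and the server remains the authority on validation.

```python
from typing import Any, Dict, Optional


def build_ideation_metadata_config(
    entity_selection: Optional[list] = None,
    excluded_entity_sets: Optional[list] = None,
    windows: Optional[list] = None,
) -> Dict[str, Any]:
    """Build a body for the "ideation-metadata" step (hypothetical helper)."""
    # Documented constraint: these two options are mutually exclusive.
    if entity_selection is not None and excluded_entity_sets is not None:
        raise ValueError(
            "excluded_entity_sets cannot be set together with entity_selection"
        )

    config: Dict[str, Any] = {"step_type": "ideation-metadata"}
    # Only include the parameters that were actually provided.
    if entity_selection is not None:
        config["entity_selection"] = {"final": entity_selection}
    if excluded_entity_sets is not None:
        config["excluded_entity_sets"] = excluded_entity_sets
    if windows is not None:
        config["windows"] = windows
    return config
```

The returned dictionary can be passed as the `json=` argument of the `client.patch(...)` call shown above.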

Disabling Naive Prediction (Forecast Only)

client.patch(
    f"/pipeline/{pipeline_id}/step_configs",
    json={
        "step_type": "ideation-metadata",
        "entity_selection": {"final": entity_selection},
        "naive_prediction_schema": {"final": None},
    },
)

This only affects forecast use cases. Classification and regression use cases ignore this setting.
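The parameter table refers to `null`, while the Python example passes `None`. These are the same value on the wire, assuming `client` encodes the `json=` argument with a standard JSON encoder. A quick check using only the standard library:

```python
import json

payload = {
    "step_type": "ideation-metadata",
    "naive_prediction_schema": {"final": None},
}

# Python's None is encoded as JSON null.
body = json.dumps(payload)
print(body)
# {"step_type": "ideation-metadata", "naive_prediction_schema": {"final": null}}
```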

Custom Aggregation Windows

client.patch(
    f"/pipeline/{pipeline_id}/step_configs",
    json={
        "step_type": "ideation-metadata",
        "windows": [
            {"size": 4, "unit": "w"},
            {"size": 12, "unit": "w"},
            {"size": 26, "unit": "w"},
            {"size": 52, "unit": "w"},
        ],
    },
)
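When the window list is generated rather than typed by hand, a small helper can catch an invalid unit before the request is sent. This is a sketch only: `make_windows` is a hypothetical name, and the requirement that `size` be a positive integer is an assumption (the table only says "integer").

```python
VALID_UNITS = {"w", "d", "h"}  # units listed in the parameter table


def make_windows(sizes, unit="w"):
    """Return a list of {"size", "unit"} items for the `windows` parameter."""
    if unit not in VALID_UNITS:
        raise ValueError(f"unit must be one of {sorted(VALID_UNITS)}, got {unit!r}")
    windows = []
    for size in sizes:
        # Assumption: sizes must be positive. bool is a subclass of int,
        # so it is rejected explicitly.
        if isinstance(size, bool) or not isinstance(size, int) or size <= 0:
            raise ValueError(f"size must be a positive integer, got {size!r}")
        windows.append({"size": size, "unit": unit})
    return windows


# Same four weekly windows as in the example above.
windows = make_windows([4, 12, 26, 52], unit="w")
```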

Semantic Detection

step_type: "semantic-detection"

Controls which columns are analyzed for semantic types. This is particularly important for tables with many columns — restricting the column selection can dramatically reduce run time.

Columns not included in the `column_selection_request` are automatically marked as `non_informative`, which excludes them from feature generation. In effect, the column selection prunes the input columns before ideation begins.
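To preview which columns a given selection would leave out, and which would therefore be marked as `non_informative`, a local set difference is enough. This is a sketch that makes no API calls: the helper name is hypothetical, `all_columns` is made-up sample data, and the `{table_id, column_name}` shape follows the parameter table in this section.

```python
def columns_left_out(all_columns, column_inputs):
    """Return sorted (table_id, column_name) pairs absent from column_inputs."""
    selected = {(c["table_id"], c["column_name"]) for c in column_inputs}
    every = {
        (table_id, name)
        for table_id, names in all_columns.items()
        for name in names
    }
    return sorted(every - selected)


# Made-up sample data: two tables, four columns in total.
all_columns = {"t1": ["amount", "currency", "notes"], "t2": ["status"]}
column_inputs = [
    {"table_id": "t1", "column_name": "amount"},
    {"table_id": "t2", "column_name": "status"},
]
left_out = columns_left_out(all_columns, column_inputs)
print(left_out)
# [('t1', 'currency'), ('t1', 'notes')]
```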

Parameters:

Parameter	Type	Default	Description
`column_selection_request`	object	None	Restrict detection to specific columns (see below)
`column_selection_request.column_inputs`	array	—	List of `{table_id, column_name}` objects to include
`column_selection_request.reference_metadata`	object	None	Tracking metadata (optional)
`column_selection_request.reference_metadata.feature_selection_id`	string	None	ID of the feature selection used to derive column inputs
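A `column_selection_request` can also be assembled by hand from a mapping of table IDs to column names. The sketch below only builds the object described in the table; `build_column_selection_request` is a hypothetical helper, and the table IDs and column names are placeholders.

```python
def build_column_selection_request(columns_by_table, feature_selection_id=None):
    """Build a column_selection_request from {table_id: [column_name, ...]}."""
    column_inputs = [
        {"table_id": table_id, "column_name": name}
        for table_id, names in columns_by_table.items()
        for name in names
    ]
    request = {"column_inputs": column_inputs}
    # reference_metadata is optional tracking information.
    if feature_selection_id is not None:
        request["reference_metadata"] = {
            "feature_selection_id": feature_selection_id
        }
    return request


# Placeholder IDs and names, for illustration only.
selection = build_column_selection_request(
    {"table_a": ["amount", "timestamp"], "table_b": ["status"]}
)
semantic_detection_config = {
    "step_type": "semantic-detection",
    "column_selection_request": selection,
}
```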

Using a Prior Feature Selection to Prune Columns

A common pattern is to run an initial ideation, then use its feature selection results to restrict subsequent ideations to only the columns that mattered. This two-pass approach can significantly reduce run time on wide tables.

Step 1: Get feature selections from a prior pipeline:

response = client.get(
    "/feature_selection",
    params={
        "pipeline_id": prior_pipeline_id,
        "page_size": 100,
        "sort_dir": "desc",
    },
)
feature_selections = response.json()["data"]
feature_selection_id = feature_selections[0]["id"]

Step 2: Get the column inputs used by that feature selection:

response = client.get(
    "/pipeline/semantic_detection/column_inputs",
    params={"feature_selection_id": feature_selection_id},
)
column_inputs_response = response.json()
column_inputs = column_inputs_response["column_inputs"]

print(f"Columns to use: {len(column_inputs)}")

Step 3: Configure the new pipeline's semantic detection with those columns:

client.patch(
    f"/pipeline/{pipeline_id}/step_configs",
    json={
        "step_type": "semantic-detection",
        "column_selection_request": {
            "column_inputs": column_inputs,
            "reference_metadata": {
                "feature_selection_id": feature_selection_id,
            },
        },
    },
)
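The three steps can be wrapped in one function so the pattern is reusable across pipelines. This is a sketch under a few assumptions: `client` exposes `get` and `patch` methods whose responses have a `.json()` method, as in the snippets above; the first item returned in Step 1 is the feature selection you want; and the function name is hypothetical. The guard against an empty result is an addition for safety.

```python
def restrict_columns_from_prior_pipeline(client, prior_pipeline_id, pipeline_id):
    """Reuse the column inputs of a prior feature selection in a new pipeline."""
    # Step 1: list the feature selections of the prior pipeline.
    response = client.get(
        "/feature_selection",
        params={
            "pipeline_id": prior_pipeline_id,
            "page_size": 100,
            "sort_dir": "desc",
        },
    )
    feature_selections = response.json()["data"]
    if not feature_selections:
        raise LookupError(
            f"no feature selection found for pipeline {prior_pipeline_id}"
        )
    feature_selection_id = feature_selections[0]["id"]

    # Step 2: fetch the column inputs behind that feature selection.
    response = client.get(
        "/pipeline/semantic_detection/column_inputs",
        params={"feature_selection_id": feature_selection_id},
    )
    column_inputs = response.json()["column_inputs"]

    # Step 3: restrict semantic detection in the new pipeline.
    client.patch(
        f"/pipeline/{pipeline_id}/step_configs",
        json={
            "step_type": "semantic-detection",
            "column_selection_request": {
                "column_inputs": column_inputs,
                "reference_metadata": {
                    "feature_selection_id": feature_selection_id,
                },
            },
        },
    )
    return feature_selection_id, column_inputs
```

Returning the ID and the column inputs makes it easy to log how many columns were kept.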

Filters

step_type: "filter"

Controls event filters applied before feature generation.

# Enable filter detection (disabled by default)
client.patch(
    f"/pipeline/{pipeline_id}/step_configs",
    json={
        "step_type": "filter",
        "skip_filters": False,
    },
)

Parameters:

Parameter	Type	Default	Description
`skip_filters`	boolean	`true`	Set to `false` to enable filter detection. When enabled, the pipeline suggests event filters.

EDA

step_type: "eda"

Controls which features are included in the exploratory data analysis (EDA) step.

Parameters:

Parameter	Type	Default	Description
`apply_eda_to`	string	`"suggested"`	Which features to analyze: `"suggested"` (ideated features only) or `"all"` (ideated + existing catalog features)
`observation_table_id`	string	None	Override the observation table used for EDA
`min_relevance_score`	float	None	Minimum relevance score for features to include (0–9)
`excluded_entity_sets`	array	None	Entity ID groups to exclude from EDA
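This section has no request example, so here is a hedged sketch of how an EDA configuration might be built. It assumes the `eda` step is configured through the same `/pipeline/{pipeline_id}/step_configs` endpoint as the other steps, and that the documented 0 to 9 range for `min_relevance_score` is inclusive. `build_eda_config` is a hypothetical helper.

```python
def build_eda_config(
    apply_eda_to="suggested",
    observation_table_id=None,
    min_relevance_score=None,
    excluded_entity_sets=None,
):
    """Build a body for the "eda" step (hypothetical helper)."""
    if apply_eda_to not in ("suggested", "all"):
        raise ValueError(
            f'apply_eda_to must be "suggested" or "all", got {apply_eda_to!r}'
        )
    # Assumption: the documented 0 to 9 range is inclusive at both ends.
    if min_relevance_score is not None and not 0 <= min_relevance_score <= 9:
        raise ValueError("min_relevance_score must be between 0 and 9")

    config = {"step_type": "eda", "apply_eda_to": apply_eda_to}
    optional = {
        "observation_table_id": observation_table_id,
        "min_relevance_score": min_relevance_score,
        "excluded_entity_sets": excluded_entity_sets,
    }
    # Leave unset parameters out so the server defaults apply.
    config.update({k: v for k, v in optional.items() if v is not None})
    return config


# Analyze ideated and existing catalog features with a relevance floor.
eda_config = build_eda_config(apply_eda_to="all", min_relevance_score=5.0)
```

Under the endpoint assumption above, `eda_config` would be sent with `client.patch(f"/pipeline/{pipeline_id}/step_configs", json=eda_config)`.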

Feature Selection

step_type: "feature-selection"

Controls how features are selected from the ideated set. Supports multiple selection requests in a single step.

client.patch(
    f"/pipeline/{pipeline_id}/step_configs",
    json={
        "step_type": "feature-selection",
        "feature_selection_requests": [
            {
                "name": "Top 1 per theme",
                "request_params": {
                    "mode": "Rule_based",
                    "rule": {
                        "top_n_overall": 500,
                        "top_m_per_theme": 1,
                        "logic_operator": "AND",
                    },
                },
            },
        ],
    },
)

Parameters:

Parameter	Type	Default	Description
`feature_selection_requests`	array	`[]`	List of selection requests (see below)

Each selection request contains:

Field	Type	Default	Description
`name`	string	None	Display name for this selection
`description`	string	None	Description
`request_params`	object	—	Selection parameters (see below)
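Because the step accepts several selection requests at once, a small factory keeps them consistent. The sketch below uses only the field names, modes, and limits documented in this section; `make_selection_request` is a hypothetical helper, and whether every parameter is honoured in every mode is not stated here, so treat the second request as illustrative.

```python
VALID_MODES = {"GenAi_based", "Rule_based", "Shap_based"}
MAX_TARGET_FEATURE_COUNT = 500  # documented maximum


def make_selection_request(name, mode="GenAi_based", description=None, **params):
    """Build one item of feature_selection_requests (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if "rule" in params and mode != "Rule_based":
        raise ValueError('"rule" is only used when mode is "Rule_based"')
    if params.get("target_feature_count", 0) > MAX_TARGET_FEATURE_COUNT:
        raise ValueError("target_feature_count exceeds the documented maximum")

    request = {"name": name, "request_params": {"mode": mode, **params}}
    if description is not None:
        request["description"] = description
    return request


# Two selections configured in a single step.
feature_selection_config = {
    "step_type": "feature-selection",
    "feature_selection_requests": [
        make_selection_request(
            "Top 1 per theme",
            mode="Rule_based",
            rule={
                "top_n_overall": 500,
                "top_m_per_theme": 1,
                "logic_operator": "AND",
            },
        ),
        make_selection_request("GenAI top 50", target_feature_count=50),
    ],
}
```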

Selection parameters (`request_params`):

Parameter	Type	Default	Description
`mode`	string	`"GenAi_based"`	Selection mode: `"GenAi_based"`, `"Rule_based"`, or `"Shap_based"`
`target_feature_count`	integer	50	Target number of features (max: 500)
`rule`	object	None	Rule-based parameters (only when `mode` is `"Rule_based"`)
`rule.top_n_overall`	integer	100	Maximum features overall
`rule.top_m_per_theme`	integer	5	Maximum features per theme
`rule.logic_operator`	string	`"OR"`	`"OR"` (either criterion) or `"AND"` (both criteria must be met)
`use_relevance_score`	boolean	`true`	Use semantic relevance scores
`use_predictive_power_score`	boolean	`true`	Use predictive power scores
`remove_redundant_features`	boolean	`true`	Remove highly correlated features
`remove_dictionary_and_vector`	boolean	`true`	Exclude dictionary/vector features
`remove_low_added_value_features`	boolean	`true`	Remove features with low marginal value
`keep_always_observation_features`	boolean	`true`	Always keep observation table features
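The difference between `"OR"` and `"AND"` is easiest to see on a toy example. The sketch below is one plausible reading of the table, not the service's implementation: it assumes features are ranked by a single score, that the overall criterion means "in the top N overall", and that the theme criterion means "in the top M of its theme". The function name and the sample features are made up.

```python
from collections import defaultdict


def apply_rule(features, top_n_overall=100, top_m_per_theme=5, logic_operator="OR"):
    """features: list of (name, theme, score); a higher score ranks first."""
    if logic_operator not in ("OR", "AND"):
        raise ValueError(f"unknown logic_operator: {logic_operator!r}")

    ranked = sorted(features, key=lambda f: f[2], reverse=True)

    # Criterion 1: among the top N features overall.
    in_top_n = {name for name, _, _ in ranked[:top_n_overall]}

    # Criterion 2: among the top M features of its own theme.
    in_top_m = set()
    seen_per_theme = defaultdict(int)
    for name, theme, _ in ranked:
        if seen_per_theme[theme] < top_m_per_theme:
            in_top_m.add(name)
        seen_per_theme[theme] += 1

    keep = in_top_n & in_top_m if logic_operator == "AND" else in_top_n | in_top_m
    return [name for name, _, _ in ranked if name in keep]


features = [
    ("a", "spend", 0.9),
    ("b", "spend", 0.8),
    ("c", "recency", 0.7),
    ("d", "recency", 0.2),
]
print(apply_rule(features, top_n_overall=3, top_m_per_theme=1, logic_operator="AND"))
# ['a', 'c']
print(apply_rule(features, top_n_overall=3, top_m_per_theme=1, logic_operator="OR"))
# ['a', 'b', 'c']
```

Under this reading, `"AND"` with `top_m_per_theme` set to 1 keeps at most one feature per theme, which matches the "Top 1 per theme" example above.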

Table Selection

step_type: "table-selection"

Controls the sampling range for source tables during ideation.

Parameters:

Parameter	Type	Default	Description
`sampling_range`	object	None	Time range to sample from source tables

Tip

Restricting the sampling range to a shorter time period speeds up ideation. The choice of range typically has little effect on the final feature ideation results, unless it covers less than one year, in which case the sampled data may be incomplete.
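A simple local check can flag a range shorter than one year before it is submitted. The sketch below only compares two dates; it does not build the `sampling_range` object, whose field names are not documented in this section, and `spans_at_least_one_year` is a hypothetical helper.

```python
from datetime import date


def spans_at_least_one_year(start: date, end: date) -> bool:
    """Return True if the range from start to end covers one calendar year."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    try:
        one_year_later = start.replace(year=start.year + 1)
    except ValueError:
        # start is Feb 29 and the following year is not a leap year.
        one_year_later = start.replace(year=start.year + 1, day=28)
    return end >= one_year_later


long_enough = spans_at_least_one_year(date(2023, 1, 1), date(2024, 1, 1))
too_short = spans_at_least_one_year(date(2023, 1, 1), date(2023, 12, 31))
```

Here `long_enough` is `True` and `too_short` is `False`.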