Ideation Configuration

Prerequisites

This page uses the client and wait_for_task helpers defined in API Overview.

This page documents the optional step configurations for an ideation pipeline. All steps can be left at their defaults for most use cases. The required model training setup is documented on the Ideation page.

All configurations use the same endpoint:

response = client.patch(
    f"/pipeline/{pipeline_id}/step_configs",
    json={
        "step_type": "...",  # discriminator — determines which config is applied
        # ... step-specific parameters
    },
)
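Because every step shares this endpoint and differs only in the body, the body can be assembled by one small helper. The following is a minimal sketch, not part of the API: `step_config_payload` and `KNOWN_STEP_TYPES` are hypothetical names, the set lists only the step types documented on this page, and dropping `None` values assumes that an omitted key keeps its default.

```python
# Step types documented on this page; the API may accept others.
KNOWN_STEP_TYPES = {
    "ideation-metadata",
    "semantic-detection",
    "filter",
    "eda",
    "feature-selection",
    "table-selection",
}


def step_config_payload(step_type, **params):
    """Build the JSON body for PATCH /pipeline/{pipeline_id}/step_configs."""
    if step_type not in KNOWN_STEP_TYPES:
        raise ValueError(f"unknown step_type: {step_type!r}")
    # Assumption: a key that is left out keeps its server-side default,
    # so top-level parameters set to None are dropped. False is kept.
    body = {key: value for key, value in params.items() if value is not None}
    return {"step_type": step_type, **body}


payload = step_config_payload("filter", skip_filters=False)
# With the client from API Overview:
# client.patch(f"/pipeline/{pipeline_id}/step_configs", json=payload)
```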

Ideation Metadata

step_type: "ideation-metadata"

Configures entity selection, aggregation windows, and naive prediction. See Entity Selection for a detailed guide.

Development dataset constraint

If the pipeline uses a development dataset, the eligible and suggested entity candidates are constrained by the entity selection used when the development dataset was created. You can only select entities that are present in the sampled development tables. See Entity Selection — Development Datasets for details.

client.patch(
    f"/pipeline/{pipeline_id}/step_configs",
    json={
        "step_type": "ideation-metadata",
        "entity_selection": {
            "final": entity_selection,
        },
    },
)

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| entity_selection | object | None | Entity selection override. Set entity_selection.final to a list of per-table entity selections. See Entity Selection. |
| windows | array | None | Override aggregation time windows. Each item has size (integer) and unit ("w", "d", or "h"). |
| naive_prediction_schema | object | None | Set naive_prediction_schema.final to null to disable naive prediction baseline (forecast only). |
| excluded_entity_sets | array | None | Lists of entity ID groups to exclude from feature generation. Cannot be set together with entity_selection. |
| udf_mapping | array | None | User-defined function mappings for custom transformations. |
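Since excluded_entity_sets cannot be combined with entity_selection, it may help to catch that conflict before sending the request. The sketch below is one way to do so on the client side. `ideation_metadata_config` is a hypothetical helper, and the entity IDs in the example are placeholders.

```python
def ideation_metadata_config(
    entity_selection=None,
    excluded_entity_sets=None,
    disable_naive_prediction=False,
):
    """Build an "ideation-metadata" step config, rejecting conflicting options."""
    if entity_selection is not None and excluded_entity_sets is not None:
        raise ValueError(
            "excluded_entity_sets cannot be set together with entity_selection"
        )
    config = {"step_type": "ideation-metadata"}
    if entity_selection is not None:
        config["entity_selection"] = {"final": entity_selection}
    if excluded_entity_sets is not None:
        config["excluded_entity_sets"] = excluded_entity_sets
    if disable_naive_prediction:
        # None is serialized as JSON null, which disables the baseline.
        config["naive_prediction_schema"] = {"final": None}
    return config


# Placeholder entity IDs, for illustration only.
metadata_config = ideation_metadata_config(
    excluded_entity_sets=[["entity_id_1", "entity_id_2"]],
    disable_naive_prediction=True,
)
```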

Disabling Naive Prediction (Forecast Only)

client.patch(
    f"/pipeline/{pipeline_id}/step_configs",
    json={
        "step_type": "ideation-metadata",
        "entity_selection": {"final": entity_selection},
        "naive_prediction_schema": {"final": None},
    },
)

This only affects forecast use cases. Classification and regression use cases ignore this setting.

Custom Aggregation Windows

client.patch(
    f"/pipeline/{pipeline_id}/step_configs",
    json={
        "step_type": "ideation-metadata",
        "windows": [
            {"size": 4, "unit": "w"},
            {"size": 12, "unit": "w"},
            {"size": 26, "unit": "w"},
            {"size": 52, "unit": "w"},
        ],
    },
)
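Window lists like the one above are easy to mistype, so a small validating helper can be useful. In this sketch the shorthand strings such as "4w" are a client-side convenience only; the API itself expects the size/unit objects shown above. `parse_windows` is a hypothetical name, and the positive-size check is an assumption.

```python
import re

VALID_UNITS = {"w", "d", "h"}  # weeks, days, hours


def parse_windows(specs):
    """Turn shorthand such as "12w" into {"size": 12, "unit": "w"} items."""
    windows = []
    for spec in specs:
        match = re.fullmatch(r"(\d+)([a-z])", spec)
        if match is None:
            raise ValueError(f"cannot parse window: {spec!r}")
        size, unit = int(match.group(1)), match.group(2)
        if unit not in VALID_UNITS:
            raise ValueError(f"unit must be one of {sorted(VALID_UNITS)}: {spec!r}")
        if size <= 0:
            # Assumption: a window size must be a positive integer.
            raise ValueError(f"size must be positive: {spec!r}")
        windows.append({"size": size, "unit": unit})
    return windows


windows = parse_windows(["4w", "12w", "26w", "52w"])
windows_config = {"step_type": "ideation-metadata", "windows": windows}
```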

Semantic Detection

step_type: "semantic-detection"

Controls which columns are analyzed for semantic types. This is particularly important for tables with many columns — restricting the column selection can dramatically reduce run time.

Columns not included in the column_selection_request are automatically marked as non_informative, which excludes them from feature generation. This effectively prunes uninformative columns before ideation begins.
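To preview the effect of a selection before applying it, the columns that would be left out can be computed locally. This is a sketch under assumptions: `split_columns` is a hypothetical helper, the table IDs and column names are invented, and the list of all columns has to be gathered separately.

```python
def split_columns(all_columns, column_inputs):
    """Return (kept, pruned) lists of {table_id, column_name} objects."""
    selected = {(col["table_id"], col["column_name"]) for col in column_inputs}
    kept, pruned = [], []
    for col in all_columns:
        key = (col["table_id"], col["column_name"])
        (kept if key in selected else pruned).append(col)
    return kept, pruned


# Invented example data.
all_columns = [
    {"table_id": "table_a", "column_name": "amount"},
    {"table_id": "table_a", "column_name": "internal_note"},
    {"table_id": "table_b", "column_name": "status"},
]
column_inputs = [
    {"table_id": "table_a", "column_name": "amount"},
    {"table_id": "table_b", "column_name": "status"},
]

kept, pruned = split_columns(all_columns, column_inputs)
# The columns in `pruned` are the ones that would be marked non_informative.
```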

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| column_selection_request | object | None | Restrict detection to specific columns (see below) |
| column_selection_request.column_inputs | array | | List of {table_id, column_name} objects to include |
| column_selection_request.reference_metadata | object | None | Tracking metadata (optional) |
| column_selection_request.reference_metadata.feature_selection_id | string | None | ID of the feature selection used to derive column inputs |

Using a Prior Feature Selection to Prune Columns

A common pattern is to run an initial ideation, then use its feature selection results to restrict subsequent ideations to only the columns that mattered. This two-step approach significantly reduces run time on wide tables.

Step 1: Get feature selections from a prior pipeline:

response = client.get(
    "/feature_selection",
    params={
        "pipeline_id": prior_pipeline_id,
        "page_size": 100,
        "sort_dir": "desc",
    },
)
feature_selections = response.json()["data"]
feature_selection_id = feature_selections[0]["id"]

Step 2: Get the column inputs used by that feature selection:

response = client.get(
    "/pipeline/semantic_detection/column_inputs",
    params={"feature_selection_id": feature_selection_id},
)
column_inputs_response = response.json()
column_inputs = column_inputs_response["column_inputs"]

print(f"Columns to use: {len(column_inputs)}")

Step 3: Configure the new pipeline's semantic detection with those columns:

client.patch(
    f"/pipeline/{pipeline_id}/step_configs",
    json={
        "step_type": "semantic-detection",
        "column_selection_request": {
            "column_inputs": column_inputs,
            "reference_metadata": {
                "feature_selection_id": feature_selection_id,
            },
        },
    },
)
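The three steps can also be wrapped in a single function. The version below is a sketch: `prune_columns_from_prior_selection` is a hypothetical name, the guard for an empty result is an addition, and the stub classes only stand in for the real client so the flow can be dry-run without a server.

```python
def prune_columns_from_prior_selection(client, prior_pipeline_id, pipeline_id):
    """Restrict semantic detection to the columns of a prior feature selection."""
    # Step 1: feature selections of the prior pipeline.
    response = client.get(
        "/feature_selection",
        params={"pipeline_id": prior_pipeline_id, "page_size": 100, "sort_dir": "desc"},
    )
    feature_selections = response.json()["data"]
    if not feature_selections:
        raise LookupError(f"no feature selection for pipeline {prior_pipeline_id}")
    feature_selection_id = feature_selections[0]["id"]

    # Step 2: column inputs used by that feature selection.
    response = client.get(
        "/pipeline/semantic_detection/column_inputs",
        params={"feature_selection_id": feature_selection_id},
    )
    column_inputs = response.json()["column_inputs"]

    # Step 3: configure semantic detection on the new pipeline.
    client.patch(
        f"/pipeline/{pipeline_id}/step_configs",
        json={
            "step_type": "semantic-detection",
            "column_selection_request": {
                "column_inputs": column_inputs,
                "reference_metadata": {"feature_selection_id": feature_selection_id},
            },
        },
    )
    return feature_selection_id, column_inputs


class StubResponse:
    """Stand-in response exposing only .json()."""

    def __init__(self, body):
        self._body = body

    def json(self):
        return self._body


class StubClient:
    """Stand-in for the real client; returns canned data and records PATCH calls."""

    def __init__(self):
        self.patched = []

    def get(self, path, params=None):
        if path == "/feature_selection":
            return StubResponse({"data": [{"id": "fs_1"}]})
        return StubResponse(
            {"column_inputs": [{"table_id": "t1", "column_name": "amount"}]}
        )

    def patch(self, path, json=None):
        self.patched.append((path, json))
        return StubResponse({})


stub = StubClient()
fs_id, cols = prune_columns_from_prior_selection(stub, "prior_id", "new_id")
```

With the real client from API Overview, the call would take `client`, `prior_pipeline_id`, and `pipeline_id` in place of the stub values.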

Filters

step_type: "filter"

Controls event filters applied before feature generation.

# Enable filter detection (disabled by default)
client.patch(
    f"/pipeline/{pipeline_id}/step_configs",
    json={
        "step_type": "filter",
        "skip_filters": False,
    },
)

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| skip_filters | boolean | true | Set to false to enable filter detection. When enabled, the pipeline suggests event filters. |

EDA

step_type: "eda"

Controls which features are included in the EDA analysis step.

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| apply_eda_to | string | "suggested" | Which features to analyze: "suggested" (ideated features only) or "all" (ideated + existing catalog features) |
| observation_table_id | string | None | Override the observation table used for EDA |
| min_relevance_score | float | None | Minimum relevance score for features to include (0–9) |
| excluded_entity_sets | array | None | Entity ID groups to exclude from EDA |
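As an illustration of how these parameters fit together, the sketch below builds an "eda" step config and checks the documented value ranges first. `eda_config` is a hypothetical helper, and treating the 0 to 9 range as inclusive at both ends is an assumption.

```python
def eda_config(
    apply_eda_to="suggested",
    observation_table_id=None,
    min_relevance_score=None,
    excluded_entity_sets=None,
):
    """Build an "eda" step config from the parameters listed above."""
    if apply_eda_to not in ("suggested", "all"):
        raise ValueError('apply_eda_to must be "suggested" or "all"')
    if min_relevance_score is not None and not 0 <= min_relevance_score <= 9:
        # Assumption: both ends of the 0 to 9 range are allowed.
        raise ValueError("min_relevance_score must be between 0 and 9")
    config = {"step_type": "eda", "apply_eda_to": apply_eda_to}
    optional = {
        "observation_table_id": observation_table_id,
        "min_relevance_score": min_relevance_score,
        "excluded_entity_sets": excluded_entity_sets,
    }
    # Leave out anything not set so the defaults apply.
    config.update({k: v for k, v in optional.items() if v is not None})
    return config


eda = eda_config(apply_eda_to="all", min_relevance_score=5.0)
# client.patch(f"/pipeline/{pipeline_id}/step_configs", json=eda)
```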

Feature Selection

step_type: "feature-selection"

Controls how features are selected from the ideated set. Supports multiple selection requests in a single step.

client.patch(
    f"/pipeline/{pipeline_id}/step_configs",
    json={
        "step_type": "feature-selection",
        "feature_selection_requests": [
            {
                "name": "Top 1 per theme",
                "request_params": {
                    "mode": "Rule_based",
                    "rule": {
                        "top_n_overall": 500,
                        "top_m_per_theme": 1,
                        "logic_operator": "AND",
                    },
                },
            },
        ],
    },
)
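To build intuition for how the two rule criteria combine, the toy function below applies both cut-offs to a ranked list. It is one plausible reading of the rule, not the service's documented algorithm: ranking by a single `score` field, the feature names, and the themes are all invented for illustration.

```python
from collections import defaultdict


def rule_based_select(features, top_n_overall, top_m_per_theme, logic_operator="OR"):
    """Toy model: keep features that pass the overall and/or per-theme cut-off."""
    if logic_operator not in ("AND", "OR"):
        raise ValueError('logic_operator must be "AND" or "OR"')
    ranked = sorted(features, key=lambda f: f["score"], reverse=True)
    seen_in_theme = defaultdict(int)
    selected = []
    for overall_rank, feature in enumerate(ranked):
        theme_rank = seen_in_theme[feature["theme"]]
        seen_in_theme[feature["theme"]] += 1
        in_top_n = overall_rank < top_n_overall
        in_top_m = theme_rank < top_m_per_theme
        keep = (in_top_n and in_top_m) if logic_operator == "AND" else (in_top_n or in_top_m)
        if keep:
            selected.append(feature["name"])
    return selected


# Invented features: two themes with two features each, one with a single feature.
features = [
    {"name": "a", "theme": "x", "score": 0.9},
    {"name": "b", "theme": "x", "score": 0.8},
    {"name": "c", "theme": "y", "score": 0.7},
    {"name": "d", "theme": "y", "score": 0.6},
    {"name": "e", "theme": "z", "score": 0.1},
]

both = rule_based_select(features, top_n_overall=2, top_m_per_theme=1, logic_operator="AND")
either = rule_based_select(features, top_n_overall=2, top_m_per_theme=1, logic_operator="OR")
print(both)    # ['a']
print(either)  # ['a', 'b', 'c', 'e']
```

Under this reading, "AND" with a large top_n_overall and top_m_per_theme set to 1 keeps at most one feature per theme, which matches the name of the request above.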

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| feature_selection_requests | array | [] | List of selection requests (see below) |

Each selection request contains:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | None | Display name for this selection |
| description | string | None | Description |
| request_params | object | | Selection parameters (see below) |

Selection parameters (request_params):

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| mode | string | "GenAi_based" | Selection mode: "GenAi_based", "Rule_based", or "Shap_based" |
| target_feature_count | integer | 50 | Target number of features (max: 500) |
| rule | object | None | Rule-based parameters (only when mode is "Rule_based") |
| rule.top_n_overall | integer | 100 | Maximum features overall |
| rule.top_m_per_theme | integer | 5 | Maximum features per theme |
| rule.logic_operator | string | "OR" | "OR" (either criterion) or "AND" (both criteria must be met) |
| use_relevance_score | boolean | true | Use semantic relevance scores |
| use_predictive_power_score | boolean | true | Use predictive power scores |
| remove_redundant_features | boolean | true | Remove highly correlated features |
| remove_dictionary_and_vector | boolean | true | Exclude dictionary/vector features |
| remove_low_added_value_features | boolean | true | Remove features with low marginal value |
| keep_always_observation_features | boolean | true | Always keep observation table features |
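Because a single step accepts several selection requests, a small builder can keep them consistent with the constraints in the tables above. This is a sketch: `selection_request` is a hypothetical helper, the request names are examples, and the checks mirror only what this page states (the allowed modes, the maximum of 500, and rule being tied to "Rule_based").

```python
VALID_MODES = {"GenAi_based", "Rule_based", "Shap_based"}
MAX_TARGET_FEATURE_COUNT = 500


def selection_request(name, mode="GenAi_based", target_feature_count=None, rule=None, **flags):
    """Build one entry for feature_selection_requests."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if rule is not None and mode != "Rule_based":
        raise ValueError('rule is only used when mode is "Rule_based"')
    if target_feature_count is not None and target_feature_count > MAX_TARGET_FEATURE_COUNT:
        raise ValueError(f"target_feature_count cannot exceed {MAX_TARGET_FEATURE_COUNT}")
    # Boolean switches such as remove_redundant_features pass through as flags.
    params = {"mode": mode, **flags}
    if target_feature_count is not None:
        params["target_feature_count"] = target_feature_count
    if rule is not None:
        params["rule"] = rule
    return {"name": name, "request_params": params}


selection_config = {
    "step_type": "feature-selection",
    "feature_selection_requests": [
        selection_request("GenAI defaults"),
        selection_request(
            "Top 1 per theme",
            mode="Rule_based",
            rule={"top_n_overall": 500, "top_m_per_theme": 1, "logic_operator": "AND"},
        ),
        selection_request("SHAP, keep correlated", mode="Shap_based",
                          target_feature_count=100, remove_redundant_features=False),
    ],
}
# client.patch(f"/pipeline/{pipeline_id}/step_configs", json=selection_config)
```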

Table Selection

step_type: "table-selection"

Controls the sampling range for source tables during ideation.

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| sampling_range | object | None | Time range to sample from source tables |

Tip

Restricting the sampling range produces results faster and typically does not significantly affect the final feature ideation results. The exception is a range shorter than one year, which may lead to incomplete data.
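A quick client-side check can flag a range that falls under the one-year mark before it is submitted. The sketch below works on plain datetimes only, because this page does not document the fields inside sampling_range. `covers_at_least_one_year` is a hypothetical helper, and counting a year as 365 days is a simplification.

```python
from datetime import datetime, timedelta

ONE_YEAR = timedelta(days=365)  # Simplification: ignores leap days.


def covers_at_least_one_year(start, end):
    """Return True when the range from start to end spans 365 days or more."""
    if end <= start:
        raise ValueError("end must be after start")
    return end - start >= ONE_YEAR


full_year = covers_at_least_one_year(datetime(2023, 1, 1), datetime(2024, 1, 1))
half_year = covers_at_least_one_year(datetime(2023, 1, 1), datetime(2023, 7, 1))
print(full_year)  # True
print(half_year)  # False
```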