Ideation Configuration¶
Prerequisites
This page uses the client and wait_for_task helpers defined in API Overview.
This page documents the optional step configurations for an ideation pipeline. All steps can be left at their defaults for most use cases. The required model training setup is documented on the Ideation page.
All configurations use the same endpoint:
response = client.patch(
    f"/pipeline/{pipeline_id}/step_configs",
    json={
        "step_type": "...",  # discriminator — determines which config is applied
        # ... step-specific parameters
    },
)
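Because the endpoint dispatches on the step_type discriminator, a mistyped value is easy to catch locally before sending the request. The helper below is a local convenience sketch, not part of the API; it simply collects the step types documented on this page:

```python
# Step types documented on this page. step_type is the discriminator that
# selects which config schema the endpoint applies.
STEP_TYPES = {
    "ideation-metadata",
    "semantic-detection",
    "filter",
    "eda",
    "feature-selection",
    "table-selection",
}

def make_step_config(step_type: str, **params) -> dict:
    """Build a step_configs payload, guarding against a mistyped step_type."""
    if step_type not in STEP_TYPES:
        raise ValueError(f"unknown step_type: {step_type}")
    return {"step_type": step_type, **params}
```

It can then be used as `client.patch(f"/pipeline/{pipeline_id}/step_configs", json=make_step_config("filter", skip_filters=False))`.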
Ideation Metadata¶
step_type: "ideation-metadata"
Configures entity selection, aggregation windows, and naive prediction. See Entity Selection for a detailed guide.
Development dataset constraint
If the pipeline uses a development dataset, the eligible and suggested entity candidates are constrained by the entity selection used when the development dataset was created. You can only select entities that are present in the sampled development tables. See Entity Selection — Development Datasets for details.
client.patch(
    f"/pipeline/{pipeline_id}/step_configs",
    json={
        "step_type": "ideation-metadata",
        "entity_selection": {
            "final": entity_selection,
        },
    },
)
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| entity_selection | object | None | Entity selection override. Set entity_selection.final to a list of per-table entity selections. See Entity Selection. |
| windows | array | None | Override aggregation time windows. Each item has size (integer) and unit ("w", "d", or "h"). |
| naive_prediction_schema | object | None | Set naive_prediction_schema.final to null to disable the naive prediction baseline (forecast only). |
| excluded_entity_sets | array | None | Lists of entity ID groups to exclude from feature generation. Cannot be set together with entity_selection. |
| udf_mapping | array | None | User-defined function mappings for custom transformations. |
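Since excluded_entity_sets cannot be combined with entity_selection, a payload builder can enforce that constraint before the request is sent. A hypothetical local helper (not part of the API):

```python
def metadata_config(entity_selection=None, excluded_entity_sets=None, **params) -> dict:
    """Build an ideation-metadata payload.

    entity_selection and excluded_entity_sets are mutually exclusive,
    so reject payloads that set both.
    """
    if entity_selection is not None and excluded_entity_sets is not None:
        raise ValueError("entity_selection and excluded_entity_sets are mutually exclusive")
    payload = {"step_type": "ideation-metadata", **params}
    if entity_selection is not None:
        payload["entity_selection"] = {"final": entity_selection}
    if excluded_entity_sets is not None:
        payload["excluded_entity_sets"] = excluded_entity_sets
    return payload
```

The result is passed as `json=` to the PATCH endpoint shown at the top of this page.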
Disabling Naive Prediction (Forecast Only)¶
client.patch(
    f"/pipeline/{pipeline_id}/step_configs",
    json={
        "step_type": "ideation-metadata",
        "entity_selection": {"final": entity_selection},
        "naive_prediction_schema": {"final": None},
    },
)
This only affects forecast use cases. Classification and regression use cases ignore this setting.
Custom Aggregation Windows¶
client.patch(
    f"/pipeline/{pipeline_id}/step_configs",
    json={
        "step_type": "ideation-metadata",
        "windows": [
            {"size": 4, "unit": "w"},
            {"size": 12, "unit": "w"},
            {"size": 26, "unit": "w"},
            {"size": 52, "unit": "w"},
        ],
    },
)
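The windows list above can also be generated rather than written out by hand. A minimal local helper (not part of the API), using the "w" unit from the parameter table:

```python
def weekly_windows(*sizes: int) -> list:
    """Build the windows payload from week counts (unit 'w')."""
    return [{"size": s, "unit": "w"} for s in sizes]

# Equivalent to the hand-written list in the example above.
windows = weekly_windows(4, 12, 26, 52)
```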
Semantic Detection¶
step_type: "semantic-detection"
Controls which columns are analyzed for semantic types. This is particularly important for tables with many columns — restricting the column selection can dramatically reduce run time.
Columns not included in the column_selection_request are automatically marked as non_informative, which excludes them from feature generation. This effectively prunes uninformative columns before ideation begins.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| column_selection_request | object | None | Restrict detection to specific columns (see below) |
| column_selection_request.column_inputs | array | — | List of {table_id, column_name} objects to include |
| column_selection_request.reference_metadata | object | None | Tracking metadata (optional) |
| column_selection_request.reference_metadata.feature_selection_id | string | None | ID of the feature selection used to derive column inputs |
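Columns can also be listed by hand instead of being derived from a prior feature selection. A sketch of the payload, where the table_id and column_name values are illustrative placeholders:

```python
# Payload for PATCH /pipeline/{pipeline_id}/step_configs.
# "orders", "order_amount", and "order_date" are placeholder names.
semantic_config = {
    "step_type": "semantic-detection",
    "column_selection_request": {
        "column_inputs": [
            {"table_id": "orders", "column_name": "order_amount"},
            {"table_id": "orders", "column_name": "order_date"},
        ],
    },
}
```

Any column not listed in column_inputs is marked non_informative and excluded from feature generation.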
Using a Prior Feature Selection to Prune Columns¶
A common pattern is to run an initial ideation, then use its feature selection results to restrict subsequent ideations to only the columns that mattered. This two-step approach significantly reduces run time on wide tables.
Step 1: Get feature selections from a prior pipeline:
response = client.get(
    "/feature_selection",
    params={
        "pipeline_id": prior_pipeline_id,
        "page_size": 100,
        "sort_dir": "desc",
    },
)
feature_selections = response.json()["data"]
feature_selection_id = feature_selections[0]["id"]
Step 2: Get the column inputs used by that feature selection:
response = client.get(
    "/pipeline/semantic_detection/column_inputs",
    params={"feature_selection_id": feature_selection_id},
)
column_inputs_response = response.json()
column_inputs = column_inputs_response["column_inputs"]
print(f"Columns to use: {len(column_inputs)}")
Step 3: Configure the new pipeline's semantic detection with those columns:
client.patch(
    f"/pipeline/{pipeline_id}/step_configs",
    json={
        "step_type": "semantic-detection",
        "column_selection_request": {
            "column_inputs": column_inputs,
            "reference_metadata": {
                "feature_selection_id": feature_selection_id,
            },
        },
    },
)
Filters¶
step_type: "filter"
Controls event filters applied before feature generation.
# Enable filter detection (disabled by default)
client.patch(
    f"/pipeline/{pipeline_id}/step_configs",
    json={
        "step_type": "filter",
        "skip_filters": False,
    },
)
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| skip_filters | boolean | true | Set to false to enable filter detection. When enabled, the pipeline suggests event filters. |
EDA¶
step_type: "eda"
Controls which features are included in the EDA analysis step.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| apply_eda_to | string | "suggested" | Which features to analyze: "suggested" (ideated features only) or "all" (ideated + existing catalog features) |
| observation_table_id | string | None | Override the observation table used for EDA |
| min_relevance_score | float | None | Minimum relevance score for features to include (0–9) |
| excluded_entity_sets | array | None | Entity ID groups to exclude from EDA |
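No request example is shown for this step, so here is a sketch of a payload built from the parameters in the table above; the values are illustrative:

```python
# Payload for PATCH /pipeline/{pipeline_id}/step_configs
eda_config = {
    "step_type": "eda",
    "apply_eda_to": "all",       # ideated + existing catalog features
    "min_relevance_score": 5.0,  # keep features scoring at least 5 on the 0-9 scale
}
```

Pass it as `json=eda_config` to the PATCH endpoint shown at the top of this page.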
Feature Selection¶
step_type: "feature-selection"
Controls how features are selected from the ideated set. Supports multiple selection requests in a single step.
client.patch(
    f"/pipeline/{pipeline_id}/step_configs",
    json={
        "step_type": "feature-selection",
        "feature_selection_requests": [
            {
                "name": "Top 1 per theme",
                "request_params": {
                    "mode": "Rule_based",
                    "rule": {
                        "top_n_overall": 500,
                        "top_m_per_theme": 1,
                        "logic_operator": "AND",
                    },
                },
            },
        ],
    },
)
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| feature_selection_requests | array | [] | List of selection requests (see below) |
Each selection request contains:
| Field | Type | Default | Description |
|---|---|---|---|
| name | string | None | Display name for this selection |
| description | string | None | Description |
| request_params | object | — | Selection parameters (see below) |
Selection parameters (request_params):
| Parameter | Type | Default | Description |
|---|---|---|---|
| mode | string | "GenAi_based" | Selection mode: "GenAi_based", "Rule_based", or "Shap_based" |
| target_feature_count | integer | 50 | Target number of features (max: 500) |
| rule | object | None | Rule-based parameters (only when mode is "Rule_based") |
| rule.top_n_overall | integer | 100 | Maximum features overall |
| rule.top_m_per_theme | integer | 5 | Maximum features per theme |
| rule.logic_operator | string | "OR" | "OR" (either criterion) or "AND" (both criteria must be met) |
| use_relevance_score | boolean | true | Use semantic relevance scores |
| use_predictive_power_score | boolean | true | Use predictive power scores |
| remove_redundant_features | boolean | true | Remove highly correlated features |
| remove_dictionary_and_vector | boolean | true | Exclude dictionary/vector features |
| remove_low_added_value_features | boolean | true | Remove features with low marginal value |
| keep_always_observation_features | boolean | true | Always keep observation table features |
Table Selection¶
step_type: "table-selection"
Controls the sampling range for source tables during ideation.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| sampling_range | object | None | Time range to sample from source tables |
Tip
Restricting the sampling range speeds up results and typically does not significantly affect the final feature ideation results, unless the range is shorter than one year, which can lead to incomplete data.
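No request example is shown for this step. The sketch below assumes sampling_range takes "start" and "end" date strings; those field names and the date format are assumptions for illustration only, so check the API reference for the exact schema:

```python
# Assumed payload shape for PATCH /pipeline/{pipeline_id}/step_configs.
# The "start"/"end" field names and ISO date format are illustrative
# assumptions, not confirmed by this page.
table_selection_config = {
    "step_type": "table-selection",
    "sampling_range": {"start": "2023-01-01", "end": "2024-12-31"},
}
```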