Ideation¶
See also
UI Tutorial: Ideate Features and Models | Forecast UI Tutorial: Ideate Features and Models | Concepts: Ideation | API Tutorial: Credit Default — Step 9 | API Tutorial: Store Sales Forecast — Step 5
Prerequisites
This page uses the client and wait_for_task helpers defined in API Overview.
The ideation pipeline automates the entire process of feature engineering: from table selection and semantic detection through feature generation, EDA, feature selection, and model training. This page covers how to create, configure, run, and extract results from ideation pipelines.
Pipeline Steps¶
An ideation pipeline progresses through these steps sequentially:
start -> table-selection -> semantic-detection -> transform -> filter
-> ideation-metadata -> feature-ideation -> eda -> feature-selection
-> model-train-setup-v2 -> model-train -> end
Create a Pipeline¶
client = fb.Configurations().get_client()

response = client.post(
    "/pipeline",
    json={
        "action": "create",
        "use_case_id": use_case_id,
        "pipeline_type": "FEATURE_IDEATION",
        "development_dataset_id": development_dataset_id,  # optional
    },
)
pipeline_id = response.json()["_id"]
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| action | string | Yes | Must be "create" |
| use_case_id | string | Yes | ID of the use case to run ideation for |
| pipeline_type | string | Yes | Pipeline type: "FEATURE_IDEATION" |
| development_dataset_id | string | No | ID of a development dataset to use for faster ideation. The dataset's EDA table must match the use case's EDA table. See Entity Selection constraints. |
Configure Model Training Setup¶
Before advancing, you must configure which observation tables to use for training and validation. The pipeline will stop at the model training setup step if this is not pre-configured.
client.patch(
    f"/pipeline/{pipeline_id}/step_configs",
    json={
        "step_type": "model-train-setup-v2",
        "data_source": {
            "type": "train_valid_observation_tables",
            "training_table_id": training_table_id,
            "validation_table_id": validation_table_id,
        },
    },
)
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| data_source | object | None | Training data configuration (required) |
| data_source.type | string | — | "train_valid_observation_tables" |
| data_source.training_table_id | string | — | ID of the training observation table |
| data_source.validation_table_id | string | — | ID of the validation observation table |
| model_template_types | array | ["NCTsDE_LGB", "NCTsDE_XGB"] | Model templates to train |
| primary_metric | string | None | Override the primary evaluation metric |
| ml_objective | string | None | Override the ML objective |
| ml_eval_metric | string | None | Override the ML evaluation metric |
| calibration_method | string | None | Calibration method for model predictions |
| decision_threshold | float | None | Decision threshold for classification models |
| refinement_top_n | integer | 500 | Maximum features for automatic refinement (min: 1) |
| refinement_importance_threshold | float | 0.95 | Cumulative importance threshold for refinement (0.0–1.0) |
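The optional parameters go in the same step-config call as the data source. A minimal sketch combining a few of them (the helper name and the specific values are illustrative, not recommendations):

```python
def configure_training_setup(client, pipeline_id, training_table_id, validation_table_id):
    """Configure model training, overriding a few optional defaults."""
    return client.patch(
        f"/pipeline/{pipeline_id}/step_configs",
        json={
            "step_type": "model-train-setup-v2",
            "data_source": {
                "type": "train_valid_observation_tables",
                "training_table_id": training_table_id,
                "validation_table_id": validation_table_id,
            },
            # Train a single template and tighten automatic refinement
            "model_template_types": ["NCTsDE_LGB"],
            "refinement_top_n": 200,
            "refinement_importance_threshold": 0.9,
        },
    )
```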
Configure Other Steps (Optional)¶
You can optionally configure entity selection, filters, transforms, and other steps. If omitted, the pipeline uses recommended defaults. See Ideation Configuration for all configurable steps.
# Optionally override entity selection (see Entity Selection page)
# If omitted, the system uses the recommended default
client.patch(
    f"/pipeline/{pipeline_id}/step_configs",
    json={
        "step_type": "ideation-metadata",
        "entity_selection": {"final": entity_selection},
    },
)
See also: Entity Selection | Ideation Configuration (filters, transforms, windows, EDA, feature selection, etc.)
Advance the Pipeline¶
Step-by-step Advancement¶
Advance one step at a time (useful for table selection adjustments):
response = client.patch(
    f"/pipeline/{pipeline_id}",
    json={"action": "advance"},
)

# Wait for the step task to complete
pipeline_task = response.json()["pipeline_runner_task"]
if pipeline_task:
    wait_for_task(client, pipeline_task["task_id"])
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| action | string | Yes | Must be "advance" |
| step_type | string | No | Target step to advance to (e.g., "end" to run all remaining steps). If omitted, advances one step. |
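When you need to stop at a specific step (for example, to adjust table selection), repeated single-step advancement can be wrapped in a small helper. This is a sketch built on the endpoints above; the helper name is ours, and it assumes the `wait_for_task` helper from API Overview:

```python
def advance_until(client, pipeline_id, target_step):
    """Advance one step at a time until the pipeline reaches target_step
    or the pipeline task fails."""
    while True:
        data = client.get(f"/pipeline/{pipeline_id}").json()
        if data["current_step_type"] == target_step or data.get("task_runner_failed"):
            return data
        response = client.patch(f"/pipeline/{pipeline_id}", json={"action": "advance"})
        task = response.json()["pipeline_runner_task"]
        if task:
            wait_for_task(client, task["task_id"])
```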
Table Selection¶
After advancing to the table-selection step, you can exclude specific tables:
# Get available tables
response = client.get(f"/pipeline/{pipeline_id}/table_selection/table")
all_tables = response.json()["data"]
Each item in data contains:
| Field | Type | Description |
|---|---|---|
| id | string | Table ID |
| name | string | Table name |
| table_selected | boolean | Whether the table is currently selected for ideation |
# Select only the tables you want
excluded = ["ITEM_STORE2"]
selected_ids = [
    t["_id"] for t in all_tables
    if t.get("table_selected") and t["name"] not in excluded
]

response = client.patch(
    f"/pipeline/{pipeline_id}/table_selection",
    json={"table_selection": selected_ids},
)
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| table_selection | array | Yes | List of table IDs to include in ideation |
Run to Completion¶
Advance directly to the end (runs all remaining steps):
response = client.patch(
    f"/pipeline/{pipeline_id}",
    json={"action": "advance", "step_type": "end"},
)
pipeline_task = response.json()["pipeline_runner_task"]
if pipeline_task:
    wait_for_task(client, pipeline_task["task_id"])
Monitor Pipeline Status¶
response = client.get(f"/pipeline/{pipeline_id}")
data = response.json()

print(f"Current step: {data['current_step_type']}")
for group in data["groups"]:
    for step in group["steps"]:
        marker = "+" if step["step_status"] == "completed" else " "
        print(f"  [{marker}] {step['step_type']}: {step['step_status']}")
Pipeline response fields:
| Field | Type | Description |
|---|---|---|
| id | string | Pipeline ID |
| name | string | Pipeline name |
| use_case_id | string | Associated use case |
| current_step_type | string | Current pipeline step |
| groups | array | Step groups (EXPLORE, CREATE, EXPERIMENT), each containing steps |
| pipeline_runner_task | object | Currently running task (with task_id), or null |
| development_dataset_id | string | Associated development dataset |
| report_id | string | Generated report ID |
| task_runner_failed | boolean | Whether the pipeline task has failed |
Each step in groups[].steps contains:
| Field | Type | Description |
|---|---|---|
| step_type | string | Step name (e.g., "table-selection", "model-train") |
| step_status | string | "pending", "in_progress", "completed", "failed" |
| ml_model_ids | array | Model IDs (only on model-train step, after completion) |
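The status fields above support a simple completion poller. A sketch (the helper name is ours, and the poll interval is arbitrary):

```python
import time

def wait_for_pipeline(client, pipeline_id, poll_seconds=30):
    """Poll pipeline status until every step completes or the pipeline fails."""
    while True:
        data = client.get(f"/pipeline/{pipeline_id}").json()
        if data.get("task_runner_failed"):
            raise RuntimeError(f"Pipeline failed at step {data['current_step_type']}")
        steps = [s for g in data["groups"] for s in g["steps"]]
        if steps and all(s["step_status"] == "completed" for s in steps):
            return data
        time.sleep(poll_seconds)
```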
Extract Pipeline Results¶
After a pipeline completes, you can retrieve the generated features, models, and reports.
Get Suggested Features (Feature Ideation)¶
response = client.get(f"/pipeline/{pipeline_id}/feature_ideation")
ideation_step = response.json()
feature_ideation_id = ideation_step.get("feature_ideation_id")
# Get full ideation details via the feature ideation endpoint
response = client.get(f"/feature_ideation/{feature_ideation_id}")
ideation = response.json()
Step response fields:
| Field | Type | Description |
|---|---|---|
| step_type | string | "feature-ideation" |
| step_status | string | "pending", "in_progress", "completed", "failed" |
| feature_ideation_id | string | ID of the generated feature ideation (use with GET /feature_ideation/{id} for full details) |
| semantic_detection_id | string | ID of the semantic detection used |
Get Trained Models¶
List models from the pipeline, sorted by evaluation metric:
response = client.get(f"/pipeline/{pipeline_id}/validation_leaderboard")
leaderboard_step = response.json()
sort_metric = leaderboard_step.get("sort_metric", "auc")
sort_dir = leaderboard_step.get("sort_dir", "desc")

response = client.get(
    "/catalog/ml_model",
    params={
        "pipeline_id": pipeline_id,
        "pipeline_id_to_mark": pipeline_id,
        "use_case_id": use_case_id,
        "sort_by": sort_metric,
        "sort_dir": sort_dir,
        "sort_by_metric": True,
        "page_size": 100,
    },
)
models = response.json()["data"]
for m in models:
    pipeline_tag = " (pipeline)" if m.get("is_pipeline_generated") else ""
    print(f"  {m['name']}{pipeline_tag}")
Query parameters:
| Parameter | Type | Description |
|---|---|---|
| pipeline_id | string | Filter to models trained in this pipeline |
| pipeline_id_to_mark | string | Mark models from this pipeline (is_pipeline_generated: true in response) |
| use_case_id | string | Filter by use case |
| sort_by | string | Field or metric name to sort by (e.g., "auc", "rmse") |
| sort_dir | string | Sort direction: "asc" or "desc" |
| sort_by_metric | boolean | Set to true to sort by evaluation metric name instead of a regular field |
| show_refits | boolean | Include refit models (default: false) |
Get Feature Selection Results¶
response = client.get(f"/pipeline/{pipeline_id}/feature_selection")
selection_step = response.json()
feature_selection_ids = selection_step.get("feature_selection_ids", [])
# Get full selection details via the feature selection endpoint
response = client.get(f"/feature_selection/{feature_selection_ids[0]}")
selection = response.json()
Step response fields:
| Field | Type | Description |
|---|---|---|
| step_type | string | "feature-selection" |
| step_status | string | "pending", "in_progress", "completed", "failed" |
| feature_selection_ids | array | IDs of generated feature selections |
| selected_feature_selection_ids | array | IDs of the selected feature selections |
| feature_ideation_id | string | ID of the source feature ideation |
Feature selection detail fields (GET /feature_selection/{id}):
| Field | Type | Description |
|---|---|---|
| id | string | Feature selection ID |
| nb_candidates | integer | Number of candidate features evaluated |
| nb_selected | integer | Number of features selected |
| feature_ids | array | IDs of selected features |
| data | array | Detailed per-feature selection results with scores and rankings |
| signal_range | string | Signal range description |
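Using only the documented endpoints and fields, a helper can summarize every selection the pipeline produced. A sketch (the helper name is ours):

```python
def summarize_feature_selections(client, pipeline_id):
    """Return a short summary for each feature selection generated by the pipeline."""
    step = client.get(f"/pipeline/{pipeline_id}/feature_selection").json()
    summaries = []
    for fs_id in step.get("feature_selection_ids", []):
        sel = client.get(f"/feature_selection/{fs_id}").json()
        summaries.append({
            "id": fs_id,
            "selected": sel["nb_selected"],
            "candidates": sel["nb_candidates"],
        })
    return summaries
```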
Get the Validation Leaderboard¶
The pipeline's validation leaderboard step provides the leaderboard ID and sort settings:
response = client.get(f"/pipeline/{pipeline_id}/validation_leaderboard")
leaderboard_step = response.json()
leaderboard_id = leaderboard_step.get("leaderboard_id")
sort_metric = leaderboard_step.get("sort_metric", "auc")
sort_dir = leaderboard_step.get("sort_dir", "desc")

# List models in the leaderboard sorted by metric
response = client.get(
    "/catalog/ml_model",
    params={
        "leaderboard_id": leaderboard_id,
        "sort_by": sort_metric,
        "sort_dir": sort_dir,
        "sort_by_metric": True,
        "leaderboard_role": "OUTCOME",
        "page_size": 100,
    },
)
models = response.json()["data"]
for m in models:
    scores = {
        s["metric_name"]: round(s["score"], 4)
        for s in m.get("evaluation_scores", [])
        if s.get("score") is not None
    }
    print(f"  {m['name']}: {scores}")
Step response fields:
| Field | Type | Description |
|---|---|---|
| step_type | string | "validation-leaderboard" |
| step_status | string | "pending", "in_progress", "completed", "failed" |
| ml_model_ids | array | IDs of models evaluated |
| leaderboard_id | string | ID of the generated leaderboard |
| sort_metric | string | Metric used for sorting |
| sort_dir | string | Sort direction ("asc" or "desc") |
See Evaluation for leaderboard detail fields.
Get the Pipeline Report¶
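A sketch for fetching the report object. Note that the endpoint path below is an assumption modeled on the report_pdf endpoint in this section; confirm the exact path against your API reference (the report ID is also available as report_id on GET /pipeline/{pipeline_id}):

```python
def get_pipeline_report(client, pipeline_id):
    """Fetch the pipeline report.
    NOTE: the path below is an assumption modeled on /pipeline/{id}/report_pdf;
    verify it against your API reference."""
    response = client.get(f"/pipeline/{pipeline_id}/report")
    return response.json()
```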
Report response fields:
| Field | Type | Description |
|---|---|---|
| id | string | Report ID |
| setup_report | object | Data setup summary |
| explore_report | object | Table exploration and semantic detection summary |
| create_report | object | Feature ideation, EDA, and selection summary |
| experiment_report | object | Model training and evaluation summary |
| summary | string | Overall pipeline summary |
| user_doc | string | Consolidated documentation |
Download Report as PDF¶
response = client.get(f"/pipeline/{pipeline_id}/report_pdf")
with open("ideation_report.pdf", "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        if chunk:
            f.write(chunk)
Download Report as Markdown¶
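A sketch for the Markdown download. The endpoint name here is an assumption modeled on the report_pdf endpoint above; verify it against your API reference before relying on it:

```python
def download_report_markdown(client, pipeline_id, path="ideation_report.md"):
    """Download the pipeline report as Markdown.
    NOTE: the endpoint name is an assumption modeled on report_pdf;
    verify it against your API reference."""
    response = client.get(f"/pipeline/{pipeline_id}/report_markdown")
    with open(path, "wb") as f:
        f.write(response.content)
    return path
```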
Parallel Ideation¶
A powerful pattern is to launch multiple ideation pipelines with different entity selections simultaneously:
def launch_ideation(client, use_case_id, entity_selection_config, training_table_id, validation_table_id):
    """Create and launch a pipeline with the given entity selection."""
    # Create pipeline
    response = client.post(
        "/pipeline",
        json={"action": "create", "use_case_id": use_case_id, "pipeline_type": "FEATURE_IDEATION"},
    )
    pipeline_id = response.json()["_id"]

    # Configure training tables
    client.patch(
        f"/pipeline/{pipeline_id}/step_configs",
        json={
            "step_type": "model-train-setup-v2",
            "data_source": {
                "type": "train_valid_observation_tables",
                "training_table_id": training_table_id,
                "validation_table_id": validation_table_id,
            },
        },
    )

    # Configure entity selection
    client.patch(
        f"/pipeline/{pipeline_id}/step_configs",
        json={"step_type": "ideation-metadata", "entity_selection": {"final": entity_selection_config}},
    )

    # Run to completion (pipeline_runner_task may be null, so guard before indexing)
    response = client.patch(f"/pipeline/{pipeline_id}", json={"action": "advance", "step_type": "end"})
    task = response.json()["pipeline_runner_task"]
    task_id = task["task_id"] if task else None
    return pipeline_id, task_id


# Launch multiple pipelines with different entity configurations
entity_configs = {
    "default": [{"table_id": str(sales_table.id), "entities": [["item_store_id"]]}],
    "item": [{"table_id": str(sales_table.id), "entities": [["item_store_id"], ["item_id"]]}],
    # ... add more configurations as needed
}

pipelines = []
for name, config in entity_configs.items():
    pipeline_id, task_id = launch_ideation(client, use_case_id, config, training_table_id, validation_table_id)
    pipelines.append({"name": name, "pipeline_id": pipeline_id, "task_id": task_id})

# Wait for all to complete
for p in pipelines:
    if p["task_id"]:
        wait_for_task(client, p["task_id"])
This lets you compare which entity configuration produces the best model without waiting for each pipeline to finish before starting the next.
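Once all pipelines complete, the catalog endpoint documented in "Get Trained Models" can rank each pipeline's models. A sketch that picks the top-ranked model per configuration (the helper name is ours; the metric defaults to "auc" for illustration):

```python
def best_model_per_pipeline(client, use_case_id, pipelines, sort_metric="auc"):
    """Return the top-ranked model name for each parallel pipeline."""
    best = {}
    for p in pipelines:
        response = client.get(
            "/catalog/ml_model",
            params={
                "pipeline_id": p["pipeline_id"],
                "use_case_id": use_case_id,
                "sort_by": sort_metric,
                "sort_dir": "desc",
                "sort_by_metric": True,
                "page_size": 1,
            },
        )
        models = response.json()["data"]
        best[p["name"]] = models[0]["name"] if models else None
    return best
```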