Ideation

Prerequisites

This page uses the client and wait_for_task helpers defined in API Overview.

The ideation pipeline automates the entire process of feature engineering: from table selection and semantic detection through feature generation, EDA, feature selection, and model training. This page covers how to create, configure, run, and extract results from ideation pipelines.

Pipeline Steps

An ideation pipeline progresses through these steps sequentially:

start -> table-selection -> semantic-detection -> transform -> filter
  -> ideation-metadata -> feature-ideation -> eda -> feature-selection
  -> model-train-setup-v2 -> model-train -> end
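Scripts that track progress often need this order programmatically. A minimal sketch (the list mirrors the sequence above; `remaining_steps` is an illustrative helper, not an API call):

```python
# Illustrative helper, not part of the API: the documented step order as a
# plain list, plus a function reporting which steps remain after a given step.
IDEATION_STEPS = [
    "start", "table-selection", "semantic-detection", "transform", "filter",
    "ideation-metadata", "feature-ideation", "eda", "feature-selection",
    "model-train-setup-v2", "model-train", "end",
]

def remaining_steps(current_step_type):
    """Return the steps that still have to run after current_step_type."""
    idx = IDEATION_STEPS.index(current_step_type)
    return IDEATION_STEPS[idx + 1:]
```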

Create a Pipeline

client = fb.Configurations().get_client()

response = client.post(
    "/pipeline",
    json={
        "action": "create",
        "use_case_id": use_case_id,
        "pipeline_type": "FEATURE_IDEATION",
        "development_dataset_id": development_dataset_id,  # optional
    },
)
pipeline_id = response.json()["_id"]

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| action | string | Yes | Must be "create" |
| use_case_id | string | Yes | ID of the use case to run ideation for |
| pipeline_type | string | Yes | Pipeline type: "FEATURE_IDEATION" |
| development_dataset_id | string | No | ID of a development dataset to use for faster ideation. The dataset's EDA table must match the use case's EDA table. See Entity Selection constraints. |

Configure Model Training Setup

Before advancing, you must configure which observation tables to use for training and validation; otherwise the pipeline halts at the model-train-setup-v2 step.

client.patch(
    f"/pipeline/{pipeline_id}/step_configs",
    json={
        "step_type": "model-train-setup-v2",
        "data_source": {
            "type": "train_valid_observation_tables",
            "training_table_id": training_table_id,
            "validation_table_id": validation_table_id,
        },
    },
)

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| data_source | object | None | Training data configuration (required) |
| data_source.type | string | "train_valid_observation_tables" | Data source type |
| data_source.training_table_id | string | None | ID of the training observation table |
| data_source.validation_table_id | string | None | ID of the validation observation table |
| model_template_types | array | ["NCTsDE_LGB", "NCTsDE_XGB"] | Model templates to train |
| primary_metric | string | None | Override the primary evaluation metric |
| ml_objective | string | None | Override the ML objective |
| ml_eval_metric | string | None | Override the ML evaluation metric |
| calibration_method | string | None | Calibration method for model predictions |
| decision_threshold | float | None | Decision threshold for classification models |
| refinement_top_n | integer | 500 | Maximum features for automatic refinement (min: 1) |
| refinement_importance_threshold | float | 0.95 | Cumulative importance threshold for refinement (0.0–1.0) |
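The optional parameters from the table go in the same step_configs payload. A small sketch of a payload builder (`build_train_setup` is a hypothetical convenience, not part of the API) that merges overrides such as model_template_types or refinement_top_n:

```python
def build_train_setup(training_table_id, validation_table_id, **overrides):
    """Assemble a model-train-setup-v2 step config; keyword overrides map
    to the optional parameters in the table above (illustrative helper)."""
    config = {
        "step_type": "model-train-setup-v2",
        "data_source": {
            "type": "train_valid_observation_tables",
            "training_table_id": training_table_id,
            "validation_table_id": validation_table_id,
        },
    }
    config.update(overrides)
    return config

# client.patch(
#     f"/pipeline/{pipeline_id}/step_configs",
#     json=build_train_setup(
#         training_table_id,
#         validation_table_id,
#         model_template_types=["NCTsDE_LGB"],
#         refinement_top_n=200,
#     ),
# )
```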

Configure Other Steps (Optional)

You can optionally configure entity selection, filters, transforms, and other steps. If omitted, the pipeline uses recommended defaults. See Ideation Configuration for all configurable steps.

# Optionally override entity selection (see Entity Selection page)
# If omitted, the system uses the recommended default
client.patch(
    f"/pipeline/{pipeline_id}/step_configs",
    json={
        "step_type": "ideation-metadata",
        "entity_selection": {"final": entity_selection},
    },
)

See also: Entity Selection | Ideation Configuration (filters, transforms, windows, EDA, feature selection, etc.)

Advance the Pipeline

Step-by-step Advancement

Advance one step at a time (useful for table selection adjustments):

response = client.patch(
    f"/pipeline/{pipeline_id}",
    json={"action": "advance"},
)

# Wait for the step task to complete
pipeline_task = response.json()["pipeline_runner_task"]
if pipeline_task:
    wait_for_task(client, pipeline_task["task_id"])

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| action | string | Yes | Must be "advance" |
| step_type | string | No | Target step to advance to (e.g., "end" to run all remaining steps). If omitted, advances one step. |
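If you prefer a loop over repeated manual calls, here is a sketch of step-by-step advancement. It assumes the client and wait_for_task helpers from API Overview; `advance_until` itself is illustrative, not an endpoint:

```python
def advance_until(client, pipeline_id, stop_step, wait_for_task):
    """Advance one step at a time until the pipeline reaches stop_step
    (or "end"). Illustrative wrapper; wait_for_task is the helper from
    API Overview, passed in so the loop stays self-contained."""
    while True:
        response = client.patch(
            f"/pipeline/{pipeline_id}", json={"action": "advance"}
        )
        task = response.json().get("pipeline_runner_task")
        if task:
            wait_for_task(client, task["task_id"])
        current = client.get(f"/pipeline/{pipeline_id}").json()["current_step_type"]
        if current in (stop_step, "end"):
            return current
```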

Table Selection

After advancing to the table-selection step, you can exclude specific tables:

# Get available tables
response = client.get(f"/pipeline/{pipeline_id}/table_selection/table")
all_tables = response.json()["data"]

Each item in data contains:

| Field | Type | Description |
|---|---|---|
| _id | string | Table ID |
| name | string | Table name |
| table_selected | boolean | Whether the table is currently selected for ideation |

# Select only the tables you want
excluded = ["ITEM_STORE2"]
selected_ids = [
    t["_id"] for t in all_tables
    if t.get("table_selected") and t["name"] not in excluded
]

response = client.patch(
    f"/pipeline/{pipeline_id}/table_selection",
    json={"table_selection": selected_ids},
)

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| table_selection | array | Yes | List of table IDs to include in ideation |

Run to Completion

Advance directly to the end (runs all remaining steps):

response = client.patch(
    f"/pipeline/{pipeline_id}",
    json={"action": "advance", "step_type": "end"},
)

pipeline_task = response.json()["pipeline_runner_task"]
if pipeline_task:
    wait_for_task(client, pipeline_task["task_id"])

Monitor Pipeline Status

response = client.get(f"/pipeline/{pipeline_id}")
data = response.json()

print(f"Current step: {data['current_step_type']}")
for group in data["groups"]:
    for step in group["steps"]:
        marker = "+" if step["step_status"] == "completed" else " "
        print(f"  [{marker}] {step['step_type']}: {step['step_status']}")

Pipeline response fields:

| Field | Type | Description |
|---|---|---|
| id | string | Pipeline ID |
| name | string | Pipeline name |
| use_case_id | string | Associated use case |
| current_step_type | string | Current pipeline step |
| groups | array | Step groups (EXPLORE, CREATE, EXPERIMENT), each containing steps |
| pipeline_runner_task | object | Currently running task (with task_id), or null |
| development_dataset_id | string | Associated development dataset |
| report_id | string | Generated report ID |
| task_runner_failed | boolean | Whether the pipeline task has failed |

Each step in groups[].steps contains:

| Field | Type | Description |
|---|---|---|
| step_type | string | Step name (e.g., "table-selection", "model-train") |
| step_status | string | "pending", "in_progress", "completed", "failed" |
| ml_model_ids | array | Model IDs (only on the model-train step, after completion) |
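Once the model-train step completes, its ml_model_ids can be pulled straight from the status response. A sketch that uses only the fields documented above (`pipeline_model_ids` is an illustrative helper):

```python
def pipeline_model_ids(pipeline):
    """Collect ml_model_ids from a completed model-train step of a
    GET /pipeline/{id} response (groups[].steps fields as documented)."""
    for group in pipeline["groups"]:
        for step in group["steps"]:
            if step["step_type"] == "model-train" and step["step_status"] == "completed":
                return step.get("ml_model_ids", [])
    return []

# model_ids = pipeline_model_ids(client.get(f"/pipeline/{pipeline_id}").json())
```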

Extract Pipeline Results

After a pipeline completes, you can retrieve the generated features, models, and reports.

Get Suggested Features (Feature Ideation)

response = client.get(f"/pipeline/{pipeline_id}/feature_ideation")
ideation_step = response.json()
feature_ideation_id = ideation_step.get("feature_ideation_id")

# Get full ideation details via the feature ideation endpoint
response = client.get(f"/feature_ideation/{feature_ideation_id}")
ideation = response.json()

Step response fields:

| Field | Type | Description |
|---|---|---|
| step_type | string | "feature-ideation" |
| step_status | string | "pending", "in_progress", "completed", "failed" |
| feature_ideation_id | string | ID of the generated feature ideation (use with GET /feature_ideation/{id} for full details) |
| semantic_detection_id | string | ID of the semantic detection used |

Get Trained Models

List models from the pipeline, sorted by evaluation metric:

response = client.get(f"/pipeline/{pipeline_id}/validation_leaderboard")
leaderboard_step = response.json()
sort_metric = leaderboard_step.get("sort_metric", "auc")
sort_dir = leaderboard_step.get("sort_dir", "desc")

response = client.get(
    "/catalog/ml_model",
    params={
        "pipeline_id": pipeline_id,
        "pipeline_id_to_mark": pipeline_id,
        "use_case_id": use_case_id,
        "sort_by": sort_metric,
        "sort_dir": sort_dir,
        "sort_by_metric": True,
        "page_size": 100,
    },
)
models = response.json()["data"]

for m in models:
    pipeline_tag = " (pipeline)" if m.get("is_pipeline_generated") else ""
    print(f"  {m['name']}{pipeline_tag}")

Query parameters:

| Parameter | Type | Description |
|---|---|---|
| pipeline_id | string | Filter to models trained in this pipeline |
| pipeline_id_to_mark | string | Mark models from this pipeline (is_pipeline_generated: true in response) |
| use_case_id | string | Filter by use case |
| sort_by | string | Field or metric name to sort by (e.g., "auc", "rmse") |
| sort_dir | string | Sort direction: "asc" or "desc" |
| sort_by_metric | boolean | Set to true to sort by evaluation metric name instead of a regular field |
| show_refits | boolean | Include refit models (default: false) |

Get Feature Selection Results

response = client.get(f"/pipeline/{pipeline_id}/feature_selection")
selection_step = response.json()
feature_selection_ids = selection_step.get("feature_selection_ids", [])

# Get full selection details via the feature selection endpoint
response = client.get(f"/feature_selection/{feature_selection_ids[0]}")
selection = response.json()

Step response fields:

| Field | Type | Description |
|---|---|---|
| step_type | string | "feature-selection" |
| step_status | string | "pending", "in_progress", "completed", "failed" |
| feature_selection_ids | array | IDs of generated feature selections |
| selected_feature_selection_ids | array | IDs of the selected feature selections |
| feature_ideation_id | string | ID of the source feature ideation |

Feature selection detail fields (GET /feature_selection/{id}):

| Field | Type | Description |
|---|---|---|
| id | string | Feature selection ID |
| nb_candidates | integer | Number of candidate features evaluated |
| nb_selected | integer | Number of features selected |
| feature_ids | array | IDs of selected features |
| data | array | Detailed per-feature selection results with scores and rankings |
| signal_range | string | Signal range description |
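The counts above lend themselves to a quick summary line. A sketch using only the documented fields (`selection_summary` is an illustrative helper):

```python
def selection_summary(selection):
    """One-line summary of a GET /feature_selection/{id} response,
    using nb_selected, nb_candidates, and feature_ids as documented above."""
    return (
        f"{selection['nb_selected']}/{selection['nb_candidates']} features selected"
        f" ({len(selection.get('feature_ids', []))} ids)"
    )

# print(selection_summary(selection))
```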

Get the Validation Leaderboard

The pipeline's validation leaderboard step provides the leaderboard ID and sort settings:

response = client.get(f"/pipeline/{pipeline_id}/validation_leaderboard")
leaderboard_step = response.json()
leaderboard_id = leaderboard_step.get("leaderboard_id")
sort_metric = leaderboard_step.get("sort_metric", "auc")
sort_dir = leaderboard_step.get("sort_dir", "desc")

# List models in the leaderboard sorted by metric
response = client.get(
    "/catalog/ml_model",
    params={
        "leaderboard_id": leaderboard_id,
        "sort_by": sort_metric,
        "sort_dir": sort_dir,
        "sort_by_metric": True,
        "leaderboard_role": "OUTCOME",
        "page_size": 100,
    },
)
models = response.json()["data"]

for m in models:
    scores = {
        s["metric_name"]: round(s["score"], 4)
        for s in m.get("evaluation_scores", [])
        if s.get("score") is not None
    }
    print(f"  {m['name']}: {scores}")

Step response fields:

| Field | Type | Description |
|---|---|---|
| step_type | string | "validation-leaderboard" |
| step_status | string | "pending", "in_progress", "completed", "failed" |
| ml_model_ids | array | IDs of models evaluated |
| leaderboard_id | string | ID of the generated leaderboard |
| sort_metric | string | Metric used for sorting |
| sort_dir | string | Sort direction ("asc" or "desc") |

See Evaluation for leaderboard detail fields.

Get the Pipeline Report

response = client.get(f"/pipeline/{pipeline_id}/report")
report = response.json()

Report response fields:

| Field | Type | Description |
|---|---|---|
| id | string | Report ID |
| setup_report | object | Data setup summary |
| explore_report | object | Table exploration and semantic detection summary |
| create_report | object | Feature ideation, EDA, and selection summary |
| experiment_report | object | Model training and evaluation summary |
| summary | string | Overall pipeline summary |
| user_doc | string | Consolidated documentation |
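To check which per-phase summaries a report actually contains before rendering it, a sketch over the documented fields (`report_sections` is an illustrative helper):

```python
def report_sections(report):
    """Return the names of the per-phase report objects that are present,
    in pipeline order (field names as documented above)."""
    names = ["setup_report", "explore_report", "create_report", "experiment_report"]
    return [n for n in names if report.get(n)]

# present = report_sections(client.get(f"/pipeline/{pipeline_id}/report").json())
```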

Download Report as PDF

response = client.get(f"/pipeline/{pipeline_id}/report_pdf")
with open("ideation_report.pdf", "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        if chunk:
            f.write(chunk)

Download Report as Markdown

response = client.get(f"/pipeline/{pipeline_id}/report_md")
markdown = response.text

Parallel Ideation

A powerful pattern is to launch multiple ideation pipelines with different entity selections simultaneously:

def launch_ideation(client, use_case_id, entity_selection_config, training_table_id, validation_table_id):
    """Create and launch a pipeline with the given entity selection."""
    # Create pipeline
    response = client.post(
        "/pipeline",
        json={"action": "create", "use_case_id": use_case_id, "pipeline_type": "FEATURE_IDEATION"},
    )
    pipeline_id = response.json()["_id"]

    # Configure training tables
    client.patch(
        f"/pipeline/{pipeline_id}/step_configs",
        json={
            "step_type": "model-train-setup-v2",
            "data_source": {
                "type": "train_valid_observation_tables",
                "training_table_id": training_table_id,
                "validation_table_id": validation_table_id,
            },
        },
    )

    # Configure entity selection
    client.patch(
        f"/pipeline/{pipeline_id}/step_configs",
        json={"step_type": "ideation-metadata", "entity_selection": {"final": entity_selection_config}},
    )

    # Run to completion
    response = client.patch(f"/pipeline/{pipeline_id}", json={"action": "advance", "step_type": "end"})
    task = response.json()["pipeline_runner_task"]
    return pipeline_id, task["task_id"] if task else None

# Launch multiple pipelines with different entity configurations
entity_configs = {
    "default": [{"table_id": str(sales_table.id), "entities": [["item_store_id"]]}],
    "item": [{"table_id": str(sales_table.id), "entities": [["item_store_id"], ["item_id"]]}],
    # ... add more configurations as needed
}

pipelines = []
for name, config in entity_configs.items():
    pipeline_id, task_id = launch_ideation(client, use_case_id, config, training_table_id, validation_table_id)
    pipelines.append({"name": name, "pipeline_id": pipeline_id, "task_id": task_id})

# Wait for all to complete
for p in pipelines:
    if p["task_id"]:
        wait_for_task(client, p["task_id"])

This lets you compare which entity configuration produces the best model, without running them sequentially.
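To pick the winner, fetch each pipeline's leaderboard models (as in "Get the Validation Leaderboard") and compare their best scores. A sketch; `best_score` is an illustrative helper over the evaluation_scores shape shown earlier:

```python
def best_score(models, metric):
    """Best evaluation score for `metric` across leaderboard models,
    using the evaluation_scores shape shown earlier (illustrative helper)."""
    scores = [
        s["score"]
        for m in models
        for s in m.get("evaluation_scores", [])
        if s.get("metric_name") == metric and s.get("score") is not None
    ]
    return max(scores) if scores else None

# for p in pipelines:
#     step = client.get(f"/pipeline/{p['pipeline_id']}/validation_leaderboard").json()
#     models = client.get(
#         "/catalog/ml_model",
#         params={"leaderboard_id": step["leaderboard_id"], "page_size": 100},
#     ).json()["data"]
#     print(p["name"], best_score(models, step.get("sort_metric", "auc")))
```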