Ideation

Prerequisites

This page uses the client and wait_for_task helpers defined in API Overview.

The ideation pipeline automates the entire process of feature engineering: from table selection and semantic detection through feature generation, EDA, feature selection, and model training. This page covers how to create, configure, run, and extract results from ideation pipelines.

Pipeline Steps

An ideation pipeline progresses through these steps sequentially:

start -> table-selection -> semantic-detection -> transform -> filter
  -> ideation-metadata -> feature-ideation -> eda -> feature-selection
  -> model-train-setup-v2 -> model-train -> end
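Scripts that track progress often need this order programmatically. A minimal sketch (the list mirrors the sequence above; `remaining_steps` is an illustrative helper, not an API call):

```python
# Illustrative helper, not part of the API: the documented step order as a
# plain list, plus a function reporting which steps remain after a given step.
IDEATION_STEPS = [
    "start", "table-selection", "semantic-detection", "transform", "filter",
    "ideation-metadata", "feature-ideation", "eda", "feature-selection",
    "model-train-setup-v2", "model-train", "end",
]

def remaining_steps(current_step_type):
    """Return the steps that still have to run after current_step_type."""
    idx = IDEATION_STEPS.index(current_step_type)
    return IDEATION_STEPS[idx + 1:]
```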

Create a Pipeline

client = fb.Configurations().get_client()

response = client.post(
    "/pipeline",
    json={
        "action": "create",
        "use_case_id": use_case_id,
        "pipeline_type": "FEATURE_IDEATION",
        "development_dataset_id": development_dataset_id,  # optional
    },
)
pipeline_id = response.json()["_id"]

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| action | string | Yes | Must be "create" |
| use_case_id | string | Yes | ID of the use case to run ideation for |
| pipeline_type | string | Yes | Pipeline type: "FEATURE_IDEATION" |
| development_dataset_id | string | No | ID of a development dataset to use for faster ideation. The dataset's EDA table must match the use case's EDA table. See Entity Selection constraints. |

Configure Model Training Setup

Before advancing, you must configure which observation tables to use for training and validation; otherwise the pipeline halts at the model-train-setup-v2 step.

client.patch(
    f"/pipeline/{pipeline_id}/step_configs",
    json={
        "step_type": "model-train-setup-v2",
        "data_source": {
            "type": "train_valid_observation_tables",
            "training_table_id": training_table_id,
            "validation_table_id": validation_table_id,
        },
    },
)

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| data_source | object | None | Training data configuration (required) |
| data_source.type | string | "train_valid_observation_tables" | Data source type |
| data_source.training_table_id | string | None | ID of the training observation table |
| data_source.validation_table_id | string | None | ID of the validation observation table |
| model_template_types | array | ["NCTsDE_LGB", "NCTsDE_XGB"] | Model templates to train |
| primary_metric | string | None | Override the primary evaluation metric |
| ml_objective | string | None | Override the ML objective |
| ml_eval_metric | string | None | Override the ML evaluation metric |
| calibration_method | string | None | Calibration method for model predictions |
| decision_threshold | float | None | Decision threshold for classification models |
| refinement_top_n | integer | 500 | Maximum features for automatic refinement (min: 1) |
| refinement_importance_threshold | float | 0.95 | Cumulative importance threshold for refinement (0.0–1.0) |
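The optional parameters from the table go in the same step_configs payload. A small sketch of a payload builder (`build_train_setup` is a hypothetical convenience, not part of the API) that merges overrides such as model_template_types or refinement_top_n:

```python
def build_train_setup(training_table_id, validation_table_id, **overrides):
    """Assemble a model-train-setup-v2 step config; keyword overrides map
    to the optional parameters in the table above (illustrative helper)."""
    config = {
        "step_type": "model-train-setup-v2",
        "data_source": {
            "type": "train_valid_observation_tables",
            "training_table_id": training_table_id,
            "validation_table_id": validation_table_id,
        },
    }
    config.update(overrides)
    return config

# client.patch(
#     f"/pipeline/{pipeline_id}/step_configs",
#     json=build_train_setup(
#         training_table_id,
#         validation_table_id,
#         model_template_types=["NCTsDE_LGB"],
#         refinement_top_n=200,
#     ),
# )
```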

Configure Other Steps (Optional)

You can optionally configure entity selection, filters, transforms, and other steps. If omitted, the pipeline uses recommended defaults. See Ideation Configuration for all configurable steps.

# Optionally override entity selection (see Entity Selection page)
# If omitted, the system uses the recommended default
client.patch(
    f"/pipeline/{pipeline_id}/step_configs",
    json={
        "step_type": "ideation-metadata",
        "entity_selection": {"final": entity_selection},
    },
)

See also: Entity Selection | Ideation Configuration (filters, transforms, windows, EDA, feature selection, etc.)

Advance the Pipeline

Step-by-step Advancement

Advance one step at a time (useful for table selection adjustments):

response = client.patch(
    f"/pipeline/{pipeline_id}",
    json={"action": "advance"},
)

# Wait for the step task to complete
pipeline_task = response.json()["pipeline_runner_task"]
if pipeline_task:
    wait_for_task(client, pipeline_task["task_id"])

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| action | string | Yes | Must be "advance" |
| step_type | string | No | Target step to advance to (e.g., "end" to run all remaining steps). If omitted, advances one step. |
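If you prefer a loop over repeated manual calls, here is a sketch of step-by-step advancement. It assumes the client and wait_for_task helpers from API Overview; `advance_until` itself is illustrative, not an endpoint:

```python
def advance_until(client, pipeline_id, stop_step, wait_for_task):
    """Advance one step at a time until the pipeline reaches stop_step
    (or "end"). Illustrative wrapper; wait_for_task is the helper from
    API Overview, passed in so the loop stays self-contained."""
    while True:
        response = client.patch(
            f"/pipeline/{pipeline_id}", json={"action": "advance"}
        )
        task = response.json().get("pipeline_runner_task")
        if task:
            wait_for_task(client, task["task_id"])
        current = client.get(f"/pipeline/{pipeline_id}").json()["current_step_type"]
        if current in (stop_step, "end"):
            return current
```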

Table Selection

After advancing to the table-selection step, you can exclude specific tables:

# Get available tables
response = client.get(f"/pipeline/{pipeline_id}/table_selection/table")
all_tables = response.json()["data"]

Each item in data contains:

| Field | Type | Description |
|---|---|---|
| _id | string | Table ID |
| name | string | Table name |
| table_selected | boolean | Whether the table is currently selected for ideation |

# Select only the tables you want
excluded = ["ITEM_STORE2"]
selected_ids = [
    t["_id"] for t in all_tables
    if t.get("table_selected") and t["name"] not in excluded
]

response = client.patch(
    f"/pipeline/{pipeline_id}/table_selection",
    json={"table_selection": selected_ids},
)

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| table_selection | array | Yes | List of table IDs to include in ideation |

Run to Completion

Advance directly to the end (runs all remaining steps):

response = client.patch(
    f"/pipeline/{pipeline_id}",
    json={"action": "advance", "step_type": "end"},
)

pipeline_task = response.json()["pipeline_runner_task"]
if pipeline_task:
    wait_for_task(client, pipeline_task["task_id"])

Monitor Pipeline Status

response = client.get(f"/pipeline/{pipeline_id}")
data = response.json()

print(f"Current step: {data['current_step_type']}")
for group in data["groups"]:
    for step in group["steps"]:
        marker = "+" if step["step_status"] == "completed" else " "
        print(f"  [{marker}] {step['step_type']}: {step['step_status']}")

Pipeline response fields:

| Field | Type | Description |
|---|---|---|
| id | string | Pipeline ID |
| name | string | Pipeline name |
| use_case_id | string | Associated use case |
| current_step_type | string | Current pipeline step |
| groups | array | Step groups (EXPLORE, CREATE, EXPERIMENT), each containing steps |
| pipeline_runner_task | object | Currently running task (with task_id), or null |
| development_dataset_id | string | Associated development dataset |
| report_id | string | Generated report ID |
| task_runner_failed | boolean | Whether the pipeline task has failed |

Each step in groups[].steps contains:

| Field | Type | Description |
|---|---|---|
| step_type | string | Step name (e.g., "table-selection", "model-train") |
| step_status | string | "pending", "in_progress", "completed", "failed" |
| ml_model_ids | array | Model IDs (only on the model-train step, after completion) |
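Once the model-train step completes, its ml_model_ids can be pulled straight from the status response. A sketch that uses only the fields documented above (`pipeline_model_ids` is an illustrative helper):

```python
def pipeline_model_ids(pipeline):
    """Collect ml_model_ids from a completed model-train step of a
    GET /pipeline/{id} response (groups[].steps fields as documented)."""
    for group in pipeline["groups"]:
        for step in group["steps"]:
            if step["step_type"] == "model-train" and step["step_status"] == "completed":
                return step.get("ml_model_ids", [])
    return []

# model_ids = pipeline_model_ids(client.get(f"/pipeline/{pipeline_id}").json())
```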

Extract Pipeline Results

After a pipeline completes, you can retrieve the generated features, models, and reports.

Get Suggested Features (Feature Ideation)

response = client.get(f"/pipeline/{pipeline_id}/feature_ideation")
ideation_step = response.json()
feature_ideation_id = ideation_step.get("feature_ideation_id")

# Get full ideation details via the feature ideation endpoint
response = client.get(f"/feature_ideation/{feature_ideation_id}")
ideation = response.json()

Step response fields:

| Field | Type | Description |
|---|---|---|
| step_type | string | "feature-ideation" |
| step_status | string | "pending", "in_progress", "completed", "failed" |
| feature_ideation_id | string | ID of the generated feature ideation (use with GET /feature_ideation/{id} for full details) |
| semantic_detection_id | string | ID of the semantic detection used |

Get Trained Models

List models from the pipeline, sorted by evaluation metric:

response = client.get(f"/pipeline/{pipeline_id}/validation_leaderboard")
leaderboard_step = response.json()
sort_metric = leaderboard_step.get("sort_metric", "auc")
sort_dir = leaderboard_step.get("sort_dir", "desc")

response = client.get(
    "/catalog/ml_model",
    params={
        "pipeline_id": pipeline_id,
        "pipeline_id_to_mark": pipeline_id,
        "use_case_id": use_case_id,
        "sort_by": sort_metric,
        "sort_dir": sort_dir,
        "sort_by_metric": True,
        "page_size": 100,
    },
)
models = response.json()["data"]

for m in models:
    pipeline_tag = " (pipeline)" if m.get("is_pipeline_generated") else ""
    print(f"  {m['name']}{pipeline_tag}")

Query parameters:

| Parameter | Type | Description |
|---|---|---|
| pipeline_id | string | Filter to models trained in this pipeline |
| pipeline_id_to_mark | string | Mark models from this pipeline (is_pipeline_generated: true in response) |
| use_case_id | string | Filter by use case |
| sort_by | string | Field or metric name to sort by (e.g., "auc", "rmse") |
| sort_dir | string | Sort direction: "asc" or "desc" |
| sort_by_metric | boolean | Set to true to sort by evaluation metric name instead of a regular field |
| show_refits | boolean | Include refit models (default: false) |

Get Feature Selection Results

response = client.get(f"/pipeline/{pipeline_id}/feature_selection")
selection_step = response.json()
feature_selection_ids = selection_step.get("feature_selection_ids", [])

# Get full selection details via the feature selection endpoint
response = client.get(f"/feature_selection/{feature_selection_ids[0]}")
selection = response.json()

Step response fields:

| Field | Type | Description |
|---|---|---|
| step_type | string | "feature-selection" |
| step_status | string | "pending", "in_progress", "completed", "failed" |
| feature_selection_ids | array | IDs of generated feature selections |
| selected_feature_selection_ids | array | IDs of the selected feature selections |
| feature_ideation_id | string | ID of the source feature ideation |

Feature selection detail fields (GET /feature_selection/{id}):

| Field | Type | Description |
|---|---|---|
| id | string | Feature selection ID |
| nb_candidates | integer | Number of candidate features evaluated |
| nb_selected | integer | Number of features selected |
| feature_ids | array | IDs of selected features |
| data | array | Detailed per-feature selection results with scores and rankings |
| signal_range | string | Signal range description |
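The counts above lend themselves to a quick summary line. A sketch using only the documented fields (`selection_summary` is an illustrative helper):

```python
def selection_summary(selection):
    """One-line summary of a GET /feature_selection/{id} response,
    using nb_selected, nb_candidates, and feature_ids as documented above."""
    return (
        f"{selection['nb_selected']}/{selection['nb_candidates']} features selected"
        f" ({len(selection.get('feature_ids', []))} ids)"
    )

# print(selection_summary(selection))
```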

Get the Validation Leaderboard

The pipeline's validation leaderboard step provides the leaderboard ID and sort settings:

response = client.get(f"/pipeline/{pipeline_id}/validation_leaderboard")
leaderboard_step = response.json()
leaderboard_id = leaderboard_step.get("leaderboard_id")
sort_metric = leaderboard_step.get("sort_metric", "auc")
sort_dir = leaderboard_step.get("sort_dir", "desc")

# List models in the leaderboard sorted by metric
response = client.get(
    "/catalog/ml_model",
    params={
        "leaderboard_id": leaderboard_id,
        "sort_by": sort_metric,
        "sort_dir": sort_dir,
        "sort_by_metric": True,
        "leaderboard_role": "OUTCOME",
        "page_size": 100,
    },
)
models = response.json()["data"]

for m in models:
    scores = {
        s["metric_name"]: round(s["score"], 4)
        for s in m.get("evaluation_scores", [])
        if s.get("score") is not None
    }
    print(f"  {m['name']}: {scores}")

Step response fields:

| Field | Type | Description |
|---|---|---|
| step_type | string | "validation-leaderboard" |
| step_status | string | "pending", "in_progress", "completed", "failed" |
| ml_model_ids | array | IDs of models evaluated |
| leaderboard_id | string | ID of the generated leaderboard |
| sort_metric | string | Metric used for sorting |
| sort_dir | string | Sort direction ("asc" or "desc") |

See Evaluation for leaderboard detail fields.

Get the Pipeline Report

response = client.get(f"/pipeline/{pipeline_id}/report")
report = response.json()

Report response fields:

| Field | Type | Description |
|---|---|---|
| id | string | Report ID |
| setup_report | object | Data setup summary |
| explore_report | object | Table exploration and semantic detection summary |
| create_report | object | Feature ideation, EDA, and selection summary |
| experiment_report | object | Model training and evaluation summary |
| summary | string | Overall pipeline summary |
| user_doc | string | Consolidated documentation |
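To check which per-phase summaries a report actually contains before rendering it, a sketch over the documented fields (`report_sections` is an illustrative helper):

```python
def report_sections(report):
    """Return the names of the per-phase report objects that are present,
    in pipeline order (field names as documented above)."""
    names = ["setup_report", "explore_report", "create_report", "experiment_report"]
    return [n for n in names if report.get(n)]

# present = report_sections(client.get(f"/pipeline/{pipeline_id}/report").json())
```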

Download Report as PDF

response = client.get(f"/pipeline/{pipeline_id}/report_pdf")
with open("ideation_report.pdf", "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        if chunk:
            f.write(chunk)

Download Report as Markdown

response = client.get(f"/pipeline/{pipeline_id}/report_md")
markdown = response.text

Parallel Ideation

A powerful pattern is to launch multiple ideation pipelines with different entity selections simultaneously:

def launch_ideation(client, use_case_id, entity_selection_config, training_table_id, validation_table_id):
    """Create and launch a pipeline with the given entity selection."""
    # Create pipeline
    response = client.post(
        "/pipeline",
        json={"action": "create", "use_case_id": use_case_id, "pipeline_type": "FEATURE_IDEATION"},
    )
    pipeline_id = response.json()["_id"]

    # Configure training tables
    client.patch(
        f"/pipeline/{pipeline_id}/step_configs",
        json={
            "step_type": "model-train-setup-v2",
            "data_source": {
                "type": "train_valid_observation_tables",
                "training_table_id": training_table_id,
                "validation_table_id": validation_table_id,
            },
        },
    )

    # Configure entity selection
    client.patch(
        f"/pipeline/{pipeline_id}/step_configs",
        json={"step_type": "ideation-metadata", "entity_selection": {"final": entity_selection_config}},
    )

    # Run to completion
    response = client.patch(f"/pipeline/{pipeline_id}", json={"action": "advance", "step_type": "end"})
    task = response.json()["pipeline_runner_task"]
    return pipeline_id, task["task_id"] if task else None

# Launch multiple pipelines with different entity configurations
entity_configs = {
    "default": [{"table_id": str(sales_table.id), "entities": [["item_store_id"]]}],
    "item": [{"table_id": str(sales_table.id), "entities": [["item_store_id"], ["item_id"]]}],
    # ... add more configurations as needed
}

pipelines = []
for name, config in entity_configs.items():
    pipeline_id, task_id = launch_ideation(client, use_case_id, config, training_table_id, validation_table_id)
    pipelines.append({"name": name, "pipeline_id": pipeline_id, "task_id": task_id})

# Wait for all to complete
for p in pipelines:
    if p["task_id"]:
        wait_for_task(client, p["task_id"])

This lets you compare which entity configuration produces the best model, without running them sequentially.
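To pick the winner, fetch each pipeline's leaderboard models (as in "Get the Validation Leaderboard") and compare their best scores. A sketch; `best_score` is an illustrative helper over the evaluation_scores shape shown earlier:

```python
def best_score(models, metric):
    """Best evaluation score for `metric` across leaderboard models,
    using the evaluation_scores shape shown earlier (illustrative helper)."""
    scores = [
        s["score"]
        for m in models
        for s in m.get("evaluation_scores", [])
        if s.get("metric_name") == metric and s.get("score") is not None
    ]
    return max(scores) if scores else None

# for p in pipelines:
#     step = client.get(f"/pipeline/{p['pipeline_id']}/validation_leaderboard").json()
#     models = client.get(
#         "/catalog/ml_model",
#         params={"leaderboard_id": step["leaderboard_id"], "page_size": 100},
#     ).json()["data"]
#     print(p["name"], best_score(models, step.get("sort_metric", "auc")))
```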