Feature Selection

Prerequisites

This page uses the client and wait_for_task helpers defined in API Overview.

Feature selection identifies the most impactful features from a feature ideation using SHAP-based analysis. The ideation pipeline usually runs this step automatically, but you can also run it standalone to create custom selections with different parameters.

Run Feature Selection

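# Submit a standalone feature selection for an existing feature ideation.
response = client.post(
    "/feature_selection",
    json={
        "feature_ideation_id": feature_ideation_id,
        "target_feature_count": 100,
        "mode": "GENAI_BASED",
    },
)
# The request returns a task; wait for it to finish before reading the result.
task_id = response.json()["id"]
task = wait_for_task(client, task_id)

# The completed task's payload carries the ID of the new feature selection.
feature_selection_id = task.get("payload", {}).get("output_document_id")
print(f"Feature selection: {feature_selection_id}")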
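The chained `.get()` calls above evaluate to `None` when the finished task has no `output_document_id`, so a failure may only surface in a later request. A minimal sketch of a guard that fails early instead; `extract_output_document_id` is a hypothetical helper name (not part of the API), and it assumes the task is a plain dict shaped like the one returned by `wait_for_task`.

```python
from typing import Any, Mapping


def extract_output_document_id(task: Mapping[str, Any]) -> str:
    """Return payload.output_document_id, or raise if it is missing.

    Hypothetical helper: assumes `task` is the dict returned by wait_for_task.
    """
    payload = task.get("payload") or {}
    document_id = payload.get("output_document_id")
    if not document_id:
        raise ValueError("Task has no payload.output_document_id")
    return document_id
```

With this helper, the lookup becomes `feature_selection_id = extract_output_document_id(task)`.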

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| feature_ideation_id | string | Yes | ID of the feature ideation to select from |
| feature_selection_name | string | No | Custom name for the selection |
| mode | string | No | Selection mode: "GENAI_BASED" (default) or "RULE_BASED" |
| target_feature_count | integer | No | Target number of features to select (default: 50, max: 500) |
| use_relevance_score | boolean | No | Use semantic relevance scores in selection (default: true) |
| use_predictive_power_score | boolean | No | Use predictive power scores in selection (default: true) |
| remove_redundant_features | boolean | No | Remove highly correlated features (default: true) |
| remove_dictionary_and_vector | boolean | No | Exclude dictionary and vector features (default: true) |
| remove_low_added_value_features | boolean | No | Remove features with low marginal value (default: true) |
| keep_always_observation_features | boolean | No | Always keep features from the observation table (default: true) |
| rule | object | No | Rule-based selection parameters (only when mode is "RULE_BASED") |
| rule.top_n_overall | integer | No | Maximum features overall (default: 100) |
| rule.top_m_per_theme | integer | No | Maximum features per theme (default: 5) |
| rule.logic_operator | string | No | "OR" (default) or "AND"; how rule.top_n_overall and rule.top_m_per_theme are combined |
| observation_table_id | string | No | Observation table for SHAP evaluation |
| feature_ids | array | No | Restrict selection to specific feature IDs |
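The rule parameters only apply in "RULE_BASED" mode, so a request for that mode differs from the example earlier on this page. A minimal sketch of such a request body follows. It assumes the dotted `rule.*` names in the table are keys nested under a `rule` object; `build_rule_based_payload` is a hypothetical helper name, not part of the API.

```python
from typing import Any, Dict


def build_rule_based_payload(
    feature_ideation_id: str,
    top_n_overall: int = 100,
    top_m_per_theme: int = 5,
    logic_operator: str = "OR",
) -> Dict[str, Any]:
    """Build a /feature_selection request body for mode "RULE_BASED".

    Hypothetical helper: assumes the rule.* parameters nest under "rule".
    Defaults mirror the documented defaults.
    """
    if logic_operator not in ("OR", "AND"):
        raise ValueError('logic_operator must be "OR" or "AND"')
    return {
        "feature_ideation_id": feature_ideation_id,
        "mode": "RULE_BASED",
        "rule": {
            "top_n_overall": top_n_overall,
            "top_m_per_theme": top_m_per_theme,
            "logic_operator": logic_operator,
        },
    }
```

The result can be passed directly as the request body, for example `client.post("/feature_selection", json=build_rule_based_payload(feature_ideation_id, logic_operator="AND"))`.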

Get Selection Results

response = client.get(f"/feature_selection/{feature_selection_id}")
selection = response.json()

print(f"Candidates: {selection['nb_candidates']}")
print(f"Selected: {selection['nb_selected']}")

Response fields:

| Field | Type | Description |
|---|---|---|
| id | string | Feature selection ID |
| nb_candidates | integer | Number of candidate features evaluated |
| nb_selected | integer | Number of features selected |
| feature_ids | array | IDs of selected features |
| data | array | Per-feature results with feature_name, selection_rank, selection_rationale |
| signal_range | string | Signal range description |
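The `data` array is the most detailed part of the response. The sketch below lists its entries in ascending `selection_rank` order together with their rationale. It assumes each entry is a dict with the three documented keys and that `selection_rank` is numeric; `ranked_features` and `format_ranked` are hypothetical helper names, not part of the API.

```python
from typing import Any, Dict, List


def ranked_features(selection: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the entries of selection["data"] in ascending selection_rank order.

    Entries without a rank are placed last.
    """
    entries = selection.get("data") or []
    return sorted(
        entries,
        key=lambda e: (e.get("selection_rank") is None, e.get("selection_rank") or 0),
    )


def format_ranked(selection: Dict[str, Any]) -> List[str]:
    """Format one line per feature: "<rank>. <name>: <rationale>"."""
    lines = []
    for entry in ranked_features(selection):
        rank = entry.get("selection_rank", "?")
        name = entry.get("feature_name", "<unnamed>")
        rationale = entry.get("selection_rationale", "")
        lines.append(f"{rank}. {name}: {rationale}")
    return lines
```

Given the `selection` dict from the request above, `print("\n".join(format_ranked(selection)))` prints the ranked list.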

Create Feature List from Selection

Create a feature list containing the selected features:

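# Start a task that builds a feature list from the selected features.
response = client.post(f"/feature_selection/{feature_selection_id}/feature_list")
task_id = response.json()["id"]
task = wait_for_task(client, task_id)

# The completed task's payload carries the ID of the new feature list.
feature_list_id = task.get("payload", {}).get("output_document_id")
print(f"Feature list: {feature_list_id}")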