Feature Selection
Prerequisites
This page uses the client and wait_for_task helpers defined in API Overview.
Feature selection identifies the most impactful features from a feature ideation using SHAP-based analysis. This is typically done automatically by the ideation pipeline, but you can also run it standalone to create custom selections with different parameters.
Run Feature Selection
response = client.post(
    "/feature_selection",
    json={
        "feature_ideation_id": feature_ideation_id,
        "target_feature_count": 100,
        "mode": "GENAI_BASED",
    },
)
task_id = response.json()["id"]
task = wait_for_task(client, task_id)
feature_selection_id = task.get("payload", {}).get("output_document_id")
print(f"Feature selection: {feature_selection_id}")
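The snippet above reads `output_document_id` with chained `.get()` calls, so a task without that field silently yields `None`. As a minimal sketch, a stricter lookup could raise instead. The helper name `extract_output_document_id` is hypothetical and not part of the API, and the sample task dictionaries are made up to mirror the `payload` shape used above.

```python
from typing import Any


def extract_output_document_id(task: dict[str, Any]) -> str:
    """Return payload.output_document_id from a task dict, or raise if it is missing."""
    payload = task.get("payload") or {}
    document_id = payload.get("output_document_id")
    if not document_id:
        raise ValueError("task has no payload.output_document_id")
    return document_id


# Made-up task dictionaries for illustration only; not real API output.
finished_task = {"payload": {"output_document_id": "example-selection-id"}}
incomplete_task = {"payload": {}}

feature_selection_id = extract_output_document_id(finished_task)
print(f"Feature selection: {feature_selection_id}")

try:
    extract_output_document_id(incomplete_task)
except ValueError as exc:
    print(f"Lookup failed: {exc}")
```

In the flow above, this would be called on the dictionary returned by `wait_for_task(client, task_id)`.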
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| feature_ideation_id | string | Yes | ID of the feature ideation to select from |
| feature_selection_name | string | No | Custom name for the selection |
| mode | string | No | Selection mode: "GENAI_BASED" (default) or "RULE_BASED" |
| target_feature_count | integer | No | Target number of features to select (default: 50, max: 500) |
| use_relevance_score | boolean | No | Use semantic relevance scores in selection (default: true) |
| use_predictive_power_score | boolean | No | Use predictive power scores in selection (default: true) |
| remove_redundant_features | boolean | No | Remove highly correlated features (default: true) |
| remove_dictionary_and_vector | boolean | No | Exclude dictionary and vector features (default: true) |
| remove_low_added_value_features | boolean | No | Remove features with low marginal value (default: true) |
| keep_always_observation_features | boolean | No | Always keep features from the observation table (default: true) |
| rule | object | No | Rule-based selection parameters (only when mode is "RULE_BASED") |
| rule.top_n_overall | integer | No | Maximum features overall (default: 100) |
| rule.top_m_per_theme | integer | No | Maximum features per theme (default: 5) |
| rule.logic_operator | string | No | "OR" (default) or "AND": how to combine top_n and top_m |
| observation_table_id | string | No | Observation table for SHAP evaluation |
| feature_ids | array | No | Restrict selection to specific feature IDs |
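To illustrate how the `rule` parameters might fit together, here is a minimal sketch that assembles a request body for `"RULE_BASED"` mode. The helper name `build_rule_based_payload` is hypothetical and not part of the API. The field names and defaults are taken from the table above, and the client-side check on `logic_operator` assumes the API accepts only the two documented values.

```python
from typing import Any

ALLOWED_OPERATORS = {"OR", "AND"}


def build_rule_based_payload(
    feature_ideation_id: str,
    top_n_overall: int = 100,
    top_m_per_theme: int = 5,
    logic_operator: str = "OR",
) -> dict[str, Any]:
    """Assemble a JSON body for a rule-based feature selection request."""
    if logic_operator not in ALLOWED_OPERATORS:
        raise ValueError(f"logic_operator must be one of {sorted(ALLOWED_OPERATORS)}")
    return {
        "feature_ideation_id": feature_ideation_id,
        "mode": "RULE_BASED",
        # The rule object is only meaningful when mode is "RULE_BASED".
        "rule": {
            "top_n_overall": top_n_overall,
            "top_m_per_theme": top_m_per_theme,
            "logic_operator": logic_operator,
        },
    }


# "example-ideation-id" is a placeholder, not a real ID.
payload = build_rule_based_payload(
    "example-ideation-id", top_n_overall=50, logic_operator="AND"
)
print(payload)
```

The resulting dictionary would be passed as the `json` argument of `client.post("/feature_selection", ...)`, in the same way as the request shown earlier.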
Get Selection Results
response = client.get(f"/feature_selection/{feature_selection_id}")
selection = response.json()
print(f"Candidates: {selection['nb_candidates']}")
print(f"Selected: {selection['nb_selected']}")
Response fields:
| Field | Type | Description |
|---|---|---|
| id | string | Feature selection ID |
| nb_candidates | integer | Number of candidate features evaluated |
| nb_selected | integer | Number of features selected |
| feature_ids | array | IDs of selected features |
| data | array | Per-feature results with feature_name, selection_rank, selection_rationale |
| signal_range | string | Signal range description |
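The per-feature entries in `data` can be listed in rank order. The sketch below assumes that `selection_rank` is a number where lower values mean higher priority and that every entry carries it; neither is confirmed by the field table. The helper name `ranked_features` is hypothetical, and the sample dictionary is made up to match the documented field names.

```python
from typing import Any


def ranked_features(selection: dict[str, Any]) -> list[tuple[int, str, str]]:
    """Return (rank, name, rationale) tuples sorted by ascending selection_rank."""
    rows = [
        (
            item["selection_rank"],
            item["feature_name"],
            item.get("selection_rationale", ""),
        )
        for item in selection.get("data", [])
    ]
    return sorted(rows, key=lambda row: row[0])


# Made-up sample shaped like the response fields above; not real API output.
sample_selection = {
    "nb_candidates": 3,
    "nb_selected": 2,
    "data": [
        {"feature_name": "feature_b", "selection_rank": 2, "selection_rationale": "illustrative text"},
        {"feature_name": "feature_a", "selection_rank": 1, "selection_rationale": "illustrative text"},
    ],
}

for rank, name, rationale in ranked_features(sample_selection):
    print(f"{rank:>3}  {name}: {rationale}")
```

With a real response, `sample_selection` would be replaced by the `selection` dictionary returned from `client.get(f"/feature_selection/{feature_selection_id}")`.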
Create Feature List from Selection
Create a feature list containing the selected features: