Feature Refinement
See also
UI Tutorial: Refine Ideation | UI Tutorial: Create New Feature Lists and Models | Concepts: Feature Selection | API Tutorial: Credit Default — Step 10
Prerequisites
This page uses the client and wait_for_task helpers defined in API Overview.
After an ideation pipeline completes, you can extract the most important features from the trained model and create a refined feature list. This feature selection step reduces dimensionality and often improves model performance by removing noisy, low-importance features.
Get the Best Model from a Pipeline
Retrieve the model trained during ideation:
import featurebyte as fb

client = fb.Configurations().get_client()

# Find pipelines for your use case (use_case_id must already be defined)
response = client.get("/pipeline", params={"use_case_id": use_case_id})
pipeline = response.json()["data"][0]  # most recent
pipeline_id = pipeline["_id"]

# Get model IDs from the model-train step
response = client.get(f"/pipeline/{pipeline_id}")
pipeline_data = response.json()

ml_model_id = None
for group in pipeline_data["groups"]:
    for step in group["steps"]:
        if step["step_type"] == "model-train" and step.get("ml_model_ids"):
            ml_model_id = step["ml_model_ids"][0]
            break
    if ml_model_id is not None:
        break  # stop at the first model-train step that has a model
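If a pipeline's model-train steps report more than one model, it can help to collect every ID rather than only the first. The following is a minimal sketch that assumes the pipeline document has the `groups` / `steps` / `ml_model_ids` shape used above. `collect_model_ids` is a hypothetical helper name, not part of the API, and the sample document (including the `other-step` step type) is made up for illustration.

```python
from typing import Any


def collect_model_ids(pipeline_data: dict[str, Any]) -> list[str]:
    """Return all model IDs found on model-train steps, in document order."""
    model_ids: list[str] = []
    for group in pipeline_data.get("groups", []):
        for step in group.get("steps", []):
            if step.get("step_type") == "model-train":
                # ml_model_ids may be missing or None before training finishes
                model_ids.extend(step.get("ml_model_ids") or [])
    return model_ids


# Made-up pipeline document, shaped like the response used above.
sample_pipeline = {
    "groups": [
        {"steps": [{"step_type": "other-step"}]},
        {"steps": [{"step_type": "model-train", "ml_model_ids": ["m1", "m2"]}]},
        {"steps": [{"step_type": "model-train", "ml_model_ids": None}]},
    ]
}
print(collect_model_ids(sample_pipeline))
```

An empty result means no model-train step has produced a model yet, which is worth checking before calling the refinement endpoint.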
Create a Feature List from Key Importance
Extract the top features by importance from the ideation model:
response = client.post(
    "/feature_list_from_model",
    json={
        "mode": "Feature key importance based",
        "ml_model_id": ml_model_id,
        "top_n": 200,
        "importance_threshold_percentage": 0.90,
    },
)
task_id = response.json()["id"]
task = wait_for_task(client, task_id)
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `mode` | string | Yes | Selection mode: "Feature key importance based" or "Feature importance based" |
| `ml_model_id` | string | Yes | ID of the trained model to extract features from |
| `top_n` | integer | No | Maximum number of feature keys to select (default: 200, max: 500) |
| `importance_threshold_percentage` | float | No | Cumulative importance threshold between 0 and 1 (default: 0.90) |
| `feature_list_name` | string | No | Custom name for the generated feature list |
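To catch out-of-range values before sending the request, the documented limits can be checked client-side. This is a sketch only: `build_refinement_payload` is a hypothetical helper, the lower bound of 1 for `top_n` is an assumption (the table states only the maximum), and the server may apply additional validation of its own.

```python
from typing import Any, Optional

ALLOWED_MODES = ("Feature key importance based", "Feature importance based")


def build_refinement_payload(
    ml_model_id: str,
    mode: str = "Feature key importance based",
    top_n: int = 200,
    importance_threshold_percentage: float = 0.90,
    feature_list_name: Optional[str] = None,
) -> dict[str, Any]:
    """Check arguments against the documented limits and build the JSON body."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {ALLOWED_MODES}")
    if not 1 <= top_n <= 500:  # lower bound of 1 is an assumption
        raise ValueError("top_n must be between 1 and 500")
    if not 0 <= importance_threshold_percentage <= 1:
        raise ValueError("importance_threshold_percentage must be between 0 and 1")

    payload: dict[str, Any] = {
        "mode": mode,
        "ml_model_id": ml_model_id,
        "top_n": top_n,
        "importance_threshold_percentage": importance_threshold_percentage,
    }
    # Only send the optional name when the caller provided one
    if feature_list_name is not None:
        payload["feature_list_name"] = feature_list_name
    return payload
```

The returned dictionary can be passed as the `json` argument of the `client.post` call shown above.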
The endpoint selects features until either top_n is reached or the cumulative importance exceeds the threshold, whichever comes first. The "Feature key importance based" mode unbundles dictionary features and extracts the most important keys, creating a single feature for each selected key. The "Feature importance based" mode keeps dictionary features as-is and selects at the individual feature level.
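The stopping rule can be illustrated locally. The sketch below is not the server's implementation: it assumes importances are normalized to shares of the total, that the item which pushes the running share past the threshold is still included, and that reaching the threshold exactly also stops selection. `select_by_cumulative_importance` and the sample scores are hypothetical.

```python
def select_by_cumulative_importance(
    importances: dict[str, float],
    top_n: int = 200,
    threshold: float = 0.90,
) -> list[str]:
    """Pick items in descending importance until top_n or the threshold is hit."""
    total = sum(importances.values())
    if total <= 0:
        return []
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)

    selected: list[str] = []
    cumulative = 0.0
    for name, score in ranked:
        # Stop as soon as either limit has been reached
        if len(selected) >= top_n or cumulative >= threshold:
            break
        selected.append(name)
        cumulative += score / total
    return selected


# Made-up importance scores for four feature keys
scores = {"a": 50, "b": 30, "c": 15, "d": 5}
print(select_by_cumulative_importance(scores, top_n=200, threshold=0.90))
print(select_by_cumulative_importance(scores, top_n=2, threshold=0.90))
```

In the first call the threshold is the binding limit (three keys cover 95% of the importance); in the second, `top_n` stops the selection after two keys.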
Inspect the Refined Feature List
# The completed task reports the ID of the document it produced
feature_list_from_model_id = task.get("payload", {}).get("output_document_id")

# Fetch the selection summary
response = client.get(f"/feature_list_from_model/{feature_list_from_model_id}")
result = response.json()
feature_list_id = result["feature_list_id"]
feature_keys_count = result["feature_keys_created_count"]
print(f"Feature keys: {feature_keys_count}")

# Get full feature list details
response = client.get(f"/feature_list/{feature_list_id}")
feature_list = response.json()
print(f"Features: {len(feature_list['feature_ids'])}")
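If the task did not finish successfully, `output_document_id` may be absent, and the follow-up request would be sent with `None` in the URL. A small guard makes that failure explicit. This is a sketch assuming the task dictionary has the `payload.output_document_id` shape used above; `get_output_document_id` is a hypothetical helper.

```python
from typing import Any


def get_output_document_id(task: dict[str, Any]) -> str:
    """Return payload.output_document_id, or raise if the task did not report one."""
    output_id = (task.get("payload") or {}).get("output_document_id")
    if not output_id:
        raise ValueError(f"Task has no output_document_id: {task!r}")
    return output_id


# Made-up task document for illustration
print(get_output_document_id({"payload": {"output_document_id": "abc123"}}))
```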
Feature list from model response fields (GET /feature_list_from_model/{id}):
| Field | Type | Description |
|---|---|---|
| `id` | string | Feature list from model ID |
| `mode` | string | Selection mode used |
| `ml_model_id` | string | Source model ID |
| `top_n` | integer | Maximum features requested |
| `importance_threshold_percentage` | float | Cumulative importance threshold used |
| `feature_keys_created_count` | integer | Number of feature keys selected |
| `features_selected_count` | integer | Number of individual features created |
| `feature_list_id` | string | ID of the generated feature list |
Feature list response fields (GET /feature_list/{id}):
| Field | Type | Description |
|---|---|---|
| `id` | string | Feature list ID |
| `name` | string | Feature list name |
| `feature_ids` | array | List of feature IDs in this list |
Adding custom features
To augment a feature list with additional features (e.g., SDK-created features), use the SDK's FeatureList API. See the SDK Reference for details.