
Feature Refinement

Prerequisites

This page uses the client and wait_for_task helpers defined in API Overview.

After an ideation pipeline completes, you can extract the most important features from the trained model and create a refined feature list. This dimensionality reduction step often improves model performance by removing noisy, low-importance features.

Get the Best Model from a Pipeline

Retrieve the model trained during ideation:

import featurebyte as fb

# Obtain the API client (see API Overview)
client = fb.Configurations().get_client()

# Find pipelines for your use case
response = client.get("/pipeline", params={"use_case_id": use_case_id})
pipeline = response.json()["data"][0]  # most recent
pipeline_id = pipeline["_id"]

# Get model IDs from the model-train step
response = client.get(f"/pipeline/{pipeline_id}")
pipeline_data = response.json()

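# Take the first model ID from the first model-train step that has one
ml_model_id = None
for group in pipeline_data["groups"]:
    for step in group["steps"]:
        if step["step_type"] == "model-train" and step.get("ml_model_ids"):
            ml_model_id = step["ml_model_ids"][0]
            break
    if ml_model_id is not None:
        break  # stop scanning later groups once a model is found

if ml_model_id is None:
    raise ValueError(f"No trained model found in pipeline {pipeline_id}")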

Create a Feature List from Key Importance

Extract the top features by importance from the ideation model:

response = client.post(
    "/feature_list_from_model",
    json={
        "mode": "Feature key importance based",
        "ml_model_id": ml_model_id,
        "top_n": 200,
        "importance_threshold_percentage": 0.90,
    },
)

task_id = response.json()["id"]
task = wait_for_task(client, task_id)

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| mode | string | Yes | Selection mode: "Feature key importance based" or "Feature importance based" |
| ml_model_id | string | Yes | ID of the trained model to extract features from |
| top_n | integer | No | Maximum number of feature keys to select (default: 200, max: 500) |
| importance_threshold_percentage | float | No | Cumulative importance threshold between 0 and 1 (default: 0.90) |
| feature_list_name | string | No | Custom name for the generated feature list |
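
The documented constraints can be checked on the client side before the request is sent. The following is a minimal sketch, not part of the API: build_refinement_payload is a hypothetical helper, and both the lower bound of 1 for top_n and the inclusive bounds on the threshold are assumptions.

```python
from typing import Any, Dict, Optional

# Mode strings as listed in the parameter table.
VALID_MODES = ("Feature key importance based", "Feature importance based")


def build_refinement_payload(
    ml_model_id: str,
    mode: str = "Feature key importance based",
    top_n: int = 200,
    importance_threshold_percentage: float = 0.90,
    feature_list_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate arguments and build the JSON body."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    # Documented maximum is 500; the lower bound of 1 is an assumption.
    if not 1 <= top_n <= 500:
        raise ValueError("top_n must be between 1 and 500")
    # Documented range is 0 to 1; inclusive bounds are an assumption.
    if not 0.0 <= importance_threshold_percentage <= 1.0:
        raise ValueError("importance_threshold_percentage must be between 0 and 1")

    payload: Dict[str, Any] = {
        "mode": mode,
        "ml_model_id": ml_model_id,
        "top_n": top_n,
        "importance_threshold_percentage": importance_threshold_percentage,
    }
    if feature_list_name is not None:
        payload["feature_list_name"] = feature_list_name  # optional field
    return payload
```

The returned dictionary can be passed as the json argument of client.post("/feature_list_from_model", ...).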

The endpoint selects features until either top_n is reached or the cumulative importance exceeds the threshold, whichever comes first. The "Feature key importance based" mode unbundles dictionary features and extracts the most important keys, creating a single feature for each selected key. The "Feature importance based" mode keeps dictionary features as-is and selects at the individual feature level.
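
The stopping rule can be illustrated with a small, self-contained sketch. This is not the server's implementation: select_by_importance is a hypothetical function, the importance values are invented for illustration, and including the item that pushes the cumulative share past the threshold is an assumption.

```python
from typing import Dict, List


def select_by_importance(
    importances: Dict[str, float], top_n: int = 200, threshold: float = 0.90
) -> List[str]:
    """Pick names in descending importance until top_n or the threshold is hit."""
    total = sum(importances.values())
    if total <= 0:
        return []
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    selected: List[str] = []
    cumulative = 0.0
    for name, value in ranked:
        if len(selected) >= top_n:
            break  # top_n cap reached first
        selected.append(name)
        cumulative += value / total  # share of total importance
        if cumulative > threshold:
            break  # cumulative importance now exceeds the threshold
    return selected


# Invented importances for illustration only.
scores = {"a": 50.0, "b": 30.0, "c": 15.0, "d": 5.0}
print(select_by_importance(scores, top_n=200, threshold=0.90))  # ['a', 'b', 'c']
print(select_by_importance(scores, top_n=2, threshold=0.90))    # ['a', 'b']
```

In the first call the threshold stops the selection after three items (cumulative share 0.95); in the second, the top_n cap of 2 is reached before the threshold.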

Inspect the Refined Feature List

feature_list_from_model_id = task.get("payload", {}).get("output_document_id")

response = client.get(f"/feature_list_from_model/{feature_list_from_model_id}")
result = response.json()

feature_list_id = result["feature_list_id"]
feature_keys_count = result["feature_keys_created_count"]

# Get full feature list details
response = client.get(f"/feature_list/{feature_list_id}")
feature_list = response.json()
print(f"Features: {len(feature_list['feature_ids'])}")
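
The chained .get calls return None when the task has no payload or no output_document_id, which would surface later as a request to /feature_list_from_model/None. A small guard can fail earlier with a clearer message. This is a sketch that assumes the task is a dictionary with the shape used above; require_output_document_id is a hypothetical name.

```python
from typing import Any, Dict


def require_output_document_id(task: Dict[str, Any]) -> str:
    """Hypothetical helper: return payload.output_document_id or raise."""
    payload = task.get("payload") or {}
    output_id = payload.get("output_document_id")
    if not output_id:
        raise ValueError("Task has no payload.output_document_id")
    return str(output_id)
```

It can replace the first line of the snippet above: feature_list_from_model_id = require_output_document_id(task).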

Feature list from model response fields (GET /feature_list_from_model/{id}):

| Field | Type | Description |
| --- | --- | --- |
| id | string | Feature list from model ID |
| mode | string | Selection mode used |
| ml_model_id | string | Source model ID |
| top_n | integer | Maximum features requested |
| importance_threshold_percentage | float | Cumulative importance threshold used |
| feature_keys_created_count | integer | Number of feature keys selected |
| features_selected_count | integer | Number of individual features created |
| feature_list_id | string | ID of the generated feature list |

Feature list response fields (GET /feature_list/{id}):

| Field | Type | Description |
| --- | --- | --- |
| id | string | Feature list ID |
| name | string | Feature list name |
| feature_ids | array | List of feature IDs in this list |

Adding Custom Features

To augment a feature list with additional features (e.g., SDK-created features), use the SDK's FeatureList API. See the SDK Reference for details.