Ideated Features

List Suggested Features

Retrieve all features generated by a feature ideation:

# Get the feature ideation ID from the pipeline
response = client.get(f"/pipeline/{pipeline_id}/feature_ideation")
feature_ideation_id = response.json().get("feature_ideation_id")

# List suggested features (paginated)
response = client.get(
    f"/catalog/feature_ideation/{feature_ideation_id}/suggested_features",
    params={"page": 1, "page_size": 50},
)
features = response.json()["data"]

for f in features[:5]:
    print(f"{f['feature_name']} (relevance: {f.get('relevance_score', 'N/A')})")

Parameters:

Parameter	Type	Required	Description
`page`	integer	No	Page number (default: 1)
`page_size`	integer	No	Items per page (default: 10)
`sort_by`	string	No	Field to sort by
`sort_dir`	string	No	Sort direction: `"asc"` or `"desc"`
`search`	string	No	Search by feature name
`readiness`	string	No	Filter by readiness: `"NEW"`, `"DRAFT"`, `"PRODUCTION_READY"`
`signal_type`	string	No	Filter by signal type (e.g., `"aggregate"`, `"transform"`)
`dtype`	string	No	Filter by data type
`suggested`	boolean	No	`true` for ideated features only, `false` for existing catalog features
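To collect every suggested feature rather than a single page, the `page` and `page_size` parameters can drive a simple loop. The following is a minimal sketch, not part of the API: `iter_suggested_features` is a hypothetical helper, `client` is assumed to behave like the HTTP client used above (a `get(url, params=...)` method whose result has `.json()`), and the stop condition (a page shorter than `page_size`) is an assumption, since the pagination metadata in the response is not described here.

```python
from typing import Any, Dict, Iterator


def iter_suggested_features(
    client: Any,
    feature_ideation_id: str,
    page_size: int = 50,
    **filters: Any,
) -> Iterator[Dict[str, Any]]:
    """Yield suggested features one by one, fetching page after page.

    Hypothetical helper: extra keyword arguments (e.g. readiness="NEW")
    are passed through as query parameters.
    """
    url = f"/catalog/feature_ideation/{feature_ideation_id}/suggested_features"
    page = 1
    while True:
        params = {"page": page, "page_size": page_size, **filters}
        response = client.get(url, params=params)
        items = response.json().get("data", [])
        yield from items
        # Assumption: a short (or empty) page means there are no more results.
        if len(items) < page_size:
            break
        page += 1
```

For example, `list(iter_suggested_features(client, feature_ideation_id, readiness="NEW", signal_type="aggregate"))` would gather all new aggregate features under these assumptions.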

The endpoint returns a paginated response. Each item in `data` contains:

Field	Type	Description
`id`	string	Feature ID
`feature_name`	string	Feature name
`feature_description`	string	Human-readable description of what the feature computes
`code`	string	Python SDK code to reproduce this feature
`relevance_score`	float	Semantic relevance score (higher = more relevant to the use case)
`relevance_explanation`	string	Why this feature is relevant
`predictive_power_score`	float	Predictive power score from EDA
`primary_entity`	array	Entity names this feature is computed for
`primary_table`	array	Source table names used
`signal_type`	string	Feature engineering type (e.g., `"aggregate"`, `"transform"`, `"lookup"`)
`inputs_description`	string	Description of input columns used
`filter_description`	string	Description of any filters applied
`readiness`	string	`"NEW"` for ideated features not yet saved to catalog
`complexity`	float	Feature engineering complexity score
`dtype`	string	Output data type (e.g., `"FLOAT"`, `"INT"`)
`feature_type`	string	`"numerical"` or `"categorical"`
`is_naive_prediction`	boolean	Whether this is a naive prediction baseline feature
`suggested`	boolean	`true` if generated by ideation
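The fields listed here are enough to shortlist features on the client side. The sketch below is illustrative only: `shortlist_features` is a hypothetical helper, the `min_relevance` threshold is arbitrary because the score scale is not specified here, and it assumes that scores may be missing or null, in which case they are ranked last.

```python
from typing import Any, Dict, List


def shortlist_features(
    features: List[Dict[str, Any]],
    min_relevance: float = 0.0,
    top_n: int = 10,
) -> List[Dict[str, Any]]:
    """Drop naive-prediction features, then rank by relevance and predictive power."""

    def score(feature: Dict[str, Any], key: str) -> float:
        # Assumption: a missing or null score is treated as the lowest possible value.
        value = feature.get(key)
        return float(value) if value is not None else float("-inf")

    candidates = [
        f
        for f in features
        if not f.get("is_naive_prediction", False)
        and score(f, "relevance_score") >= min_relevance
    ]
    # Sort by relevance first; predictive power breaks ties.
    candidates.sort(
        key=lambda f: (score(f, "relevance_score"), score(f, "predictive_power_score")),
        reverse=True,
    )
    return candidates[:top_n]
```

Features without a `relevance_score` fall below any finite `min_relevance` and are therefore excluded.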

Get Feature SDK Code

The `code` field of each suggested feature contains the Python SDK code to reproduce the feature:

# Get a specific feature's code
feature = features[0]
print(feature["code"])

Example output for a ratio feature that compares the 28-day average sales amount on the forecast weekday to a longer-term (182-day) naive baseline:

"""
SDK code to create ITEM_STORE_Avg_of_sales_amounts_28cD_same_Forecast_Weekday_
To_Naive_ITEM_STORE_Avg_of_sales_amounts_182cD

Feature description:
Ratio of the item-store's average sales amount on the same weekday (28 calendar days)
to the overall average sales amount (182 calendar days).
"""

import featurebyte as fb

# Activate catalog
catalog = fb.Catalog.activate("M5 Sales Amount Forecasting")

# Get view from SALES_AMOUNT time series table
sales_amount_view = catalog.get_view("SALES_AMOUNT")

# Extract day_of_week from the forecast point
context = fb.Context.get_by_id("69d5cf6d7791713392c73226")
forecast_day_of_week = context.get_forecast_point_feature().dt.day_of_week
forecast_day_of_week.name = "FORECAST_Day_Of_Week"

# Extract weekday from source table
sales_amount_view["Weekday"] = sales_amount_view["date"].dt.day_of_week

# Group by item_store, segmented by weekday
sales_amount_view_by_item_store = sales_amount_view.groupby(["item_store_id"])
sales_amount_view_by_item_store_by_weekday = sales_amount_view.groupby(
    ["item_store_id"], category="Weekday"
)

# Avg sales amount per weekday over 28 calendar days (dictionary feature)
avg_by_weekday_28cd = sales_amount_view_by_item_store_by_weekday.aggregate_over(
    "sales_amount",
    method="avg",
    feature_names=["ITEM_STORE_Avg_of_sales_amounts_by_Weekday_28cD"],
    windows=[fb.CalendarWindow(unit="DAY", size=28)],
)["ITEM_STORE_Avg_of_sales_amounts_by_Weekday_28cD"]

# Extract value matching the forecast weekday
avg_same_weekday_28cd = avg_by_weekday_28cd.cd.get_value(forecast_day_of_week)
avg_same_weekday_28cd.name = (
    "ITEM_STORE_Avg_of_sales_amounts_28cD_same_Forecast_Weekday"
)

# Avg sales amount over 182 calendar days (naive baseline)
avg_182cd = sales_amount_view_by_item_store.aggregate_over(
    "sales_amount",
    method="avg",
    feature_names=["ITEM_STORE_Avg_of_sales_amounts_182cD"],
    windows=[fb.CalendarWindow(unit="DAY", size=182)],
)["ITEM_STORE_Avg_of_sales_amounts_182cD"]

# Ratio: weekday-specific average / overall average
ratio = avg_same_weekday_28cd / avg_182cd
ratio.name = "ITEM_STORE_Avg_of_sales_amounts_28cD_same_Forecast_Weekday_To_Naive_ITEM_STORE_Avg_of_sales_amounts_182cD"

# Save and describe
ratio.save()
ratio.update_feature_type("numeric")
ratio.update_description(
    "Ratio of the item-store's average sales amount on the same weekday (28 calendar days) "
    "to the overall average sales amount (182 calendar days)."
)

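This code can be run directly in a Python environment with the FeatureByte SDK installed to recreate the feature. It can also be modified to create variants (e.g., different windows, aggregation methods, or entity groupings). Note that the generated code refers to a specific catalog name and context ID, as in the example above.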
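One way to start modifying a feature is to write its `code` to a file and edit it there. The helper below is a hypothetical convenience, not part of the SDK or the API; the directory name and the file-naming scheme are assumptions made for illustration.

```python
import re
from pathlib import Path
from typing import Any, Dict


def save_feature_code(
    feature: Dict[str, Any], directory: str = "ideated_features"
) -> Path:
    """Write a suggested feature's SDK code to <directory>/<feature_name>.py."""
    # Replace anything that is not a letter, digit, or underscore so the
    # feature name is safe to use as a file name; fall back to the feature ID.
    stem = re.sub(r"\W+", "_", feature["feature_name"]).strip("_") or feature["id"]
    target = Path(directory) / f"{stem}.py"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(feature["code"], encoding="utf-8")
    return target
```

Calling `save_feature_code(features[0])` would then return the path of a script that can be edited and run separately.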

Get Full Feature Metadata and Lineage

For a deeper understanding of how a feature is constructed, use the metadata endpoint:

suggested_feature_id = features[0]["id"]

response = client.get(
    f"/catalog/feature_ideation/suggested_feature_metadata/{suggested_feature_id}",
)
metadata = response.json()

Response fields:

Field	Type	Description
`sdk_code`	string	Formatted Python SDK code to reproduce the feature
`lineage`	object	Complete construction graph (see below)

Feature Lineage

The `lineage` object shows the step-by-step construction of the feature as a directed graph:

Field	Type	Description
`lineage.nodes`	array	Ordered list of operations in the feature construction
`lineage.edges`	array	Data flow connections between nodes
`lineage.inputs`	array	Input columns from source tables

Each node in `lineage.nodes` contains:

Field	Type	Description
`node_id`	string	Unique node identifier
`is_input`	boolean	Whether this node represents a source column
`node_type`	string	Operation type (e.g., `"groupby"`, `"aggregate"`, `"filter"`, `"lookup"`)
`title`	string	Human-readable operation title
`description`	string	What this operation does
`code`	string	Python code for this specific step
`output_type`	string	Output type of this operation
`output_name`	string	Name of the output variable

The node list is useful for understanding complex features that chain multiple operations (e.g., a ratio of two aggregations with a filter applied).
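To get a quick overview of such a feature before reading each step, the node fields can be summarized. This is a minimal sketch using only the node fields documented above; `summarize_lineage` is a hypothetical helper, and using `output_name` as the label for input nodes is an assumption.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def summarize_lineage(lineage: Dict[str, Any]) -> Tuple[Counter, List[str]]:
    """Count operations by node_type and list the names of input nodes."""
    nodes = lineage.get("nodes", [])
    # Non-input nodes are the actual construction steps.
    operations = [n for n in nodes if not n.get("is_input")]
    op_counts = Counter(n.get("node_type", "unknown") for n in operations)
    # Assumption: output_name identifies the source column for input nodes.
    input_names = [
        n.get("output_name", n.get("node_id", "")) for n in nodes if n.get("is_input")
    ]
    return op_counts, input_names
```

With the metadata response from above, `summarize_lineage(metadata.get("lineage", {}))` returns the operation counts and the input names.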

Preview Feature Values

Preview computed values for a feature on a sample observation set:

response = client.get(
    f"/catalog/feature_ideation/feature_preview/{suggested_feature_id}",
)
preview = response.json()

Example: Inspect and Modify a Feature

# 1. Find top features by relevance
response = client.get(
    f"/catalog/feature_ideation/{feature_ideation_id}/suggested_features",
    params={"page_size": 10, "sort_by": "relevance_score", "sort_dir": "desc"},
)
top_features = response.json()["data"]

# 2. Get the SDK code for the best feature
best = top_features[0]
print(f"Feature: {best['feature_name']}")
print(f"Relevance: {best['relevance_score']}")
print(f"Description: {best['feature_description']}")
print(f"\nCode:\n{best['code']}")

# 3. Get the full lineage to understand construction steps
response = client.get(
    f"/catalog/feature_ideation/suggested_feature_metadata/{best['id']}",
)
metadata = response.json()

lineage = metadata.get("lineage", {})
for node in lineage.get("nodes", []):
    if not node.get("is_input"):
        print(f"\nStep: {node['title']}")
        print(f"  {node['description']}")
        print(f"  Code: {node['code']}")