Feature EDA

From Feature IDs

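```python
import featurebyte as fb

# Get an API client from the local featurebyte configuration
client = fb.Configurations().get_client()

# Start an EDA task for specific features (IDs are placeholders)
response = client.post(
    "/eda",
    json={
        "feature_ids": [feature_id_1, feature_id_2],
        "use_case_id": use_case_id,
    },
)
task_id = response.json()["id"]
task = wait_for_task(client, task_id)  # block until the task finishes

# ID of the document produced by the EDA task
eda_id = task.get("payload", {}).get("output_document_id")
```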
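The examples on this page call `wait_for_task`, which is not defined here. A minimal polling sketch that could be defined before running them follows. It assumes, without confirmation from this page, that task status is available at `GET /task/{task_id}` with Celery-style states such as `"SUCCESS"` and `"FAILURE"`. The function body and its `poll_interval` and `timeout` parameters are hypothetical.

```python
import time
from typing import Any, Dict


def wait_for_task(
    client: Any,
    task_id: str,
    poll_interval: float = 5.0,
    timeout: float = 3600.0,
) -> Dict[str, Any]:
    """Poll a task until it finishes and return the final task document.

    Hypothetical helper: assumes GET /task/{task_id} returns JSON with a
    Celery-style "status" field ("PENDING", "STARTED", "SUCCESS", ...).
    """
    terminal_states = {"SUCCESS", "FAILURE", "REVOKED"}
    deadline = time.monotonic() + timeout
    while True:
        response = client.get(f"/task/{task_id}")
        response.raise_for_status()
        task = response.json()
        status = task.get("status")
        if status in terminal_states:
            # Surface failures instead of returning a task without output.
            if status != "SUCCESS":
                raise RuntimeError(f"Task {task_id} ended with status {status}")
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Task {task_id} did not finish within {timeout} s")
        time.sleep(poll_interval)
```

Raising on a non-success state means the `output_document_id` lookup only runs for tasks that actually completed.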

From a Feature Ideation

```python
response = client.post(
    "/eda",
    json={
        "feature_ideation_id": feature_ideation_id,
        "use_case_id": use_case_id,
    },
)
task_id = response.json()["id"]
task = wait_for_task(client, task_id)

eda_id = task.get("payload", {}).get("output_document_id")
```

From a Feature List

```python
response = client.post(
    "/eda",
    json={
        "feature_list_id": feature_list_id,
        "use_case_id": use_case_id,
    },
)
task_id = response.json()["id"]
task = wait_for_task(client, task_id)

eda_id = task.get("payload", {}).get("output_document_id")
```

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `use_case_id` | string | Yes | ID of the use case providing context for EDA |
| `feature_ids` | array | One of | List of feature IDs to analyze |
| `feature_ideation_id` | string | One of | ID of a feature ideation; all of its features are analyzed |
| `feature_list_id` | string | One of | ID of a feature list; all of its features are analyzed |
| `naive_prediction` | object | No | Include to trigger Residual EDA (see Residual EDA) |
| `naive_prediction.feature_id` | string | Nested field | ID of the naive prediction feature |
| `naive_prediction.structure` | string | Nested field | `"additive"` (residuals: target - naive) or `"multiplicative"` (ratios: target / naive) |
| `overwrite` | boolean | No | Overwrite existing EDA results (default: `false`) |

One of `feature_ids`, `feature_ideation_id`, or `feature_list_id` must be provided.
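Because the feature source is selected by which key is present, a small helper can assemble the request body and catch mistakes before the call is made. This is a sketch only. `build_eda_payload` and its keyword names are hypothetical, and it assumes the three feature sources are mutually exclusive, which the "One of" entries suggest but do not state outright.

```python
from typing import Any, Dict, List, Optional


def build_eda_payload(
    use_case_id: str,
    feature_ids: Optional[List[str]] = None,
    feature_ideation_id: Optional[str] = None,
    feature_list_id: Optional[str] = None,
    naive_prediction_feature_id: Optional[str] = None,
    naive_prediction_structure: str = "additive",
    overwrite: bool = False,
) -> Dict[str, Any]:
    """Build a JSON body for POST /eda (hypothetical helper)."""
    # Keep only the feature source(s) the caller actually supplied.
    sources = {
        "feature_ids": feature_ids,
        "feature_ideation_id": feature_ideation_id,
        "feature_list_id": feature_list_id,
    }
    provided = {key: value for key, value in sources.items() if value is not None}
    # Assumption: exactly one feature source is allowed per request.
    if len(provided) != 1:
        raise ValueError(
            "Provide exactly one of feature_ids, feature_ideation_id or feature_list_id"
        )

    payload: Dict[str, Any] = {"use_case_id": use_case_id, **provided}

    # Adding naive_prediction turns the request into a Residual EDA.
    if naive_prediction_feature_id is not None:
        if naive_prediction_structure not in ("additive", "multiplicative"):
            raise ValueError("structure must be 'additive' or 'multiplicative'")
        payload["naive_prediction"] = {
            "feature_id": naive_prediction_feature_id,
            "structure": naive_prediction_structure,
        }
    # Only send overwrite when it differs from the documented default (false).
    if overwrite:
        payload["overwrite"] = True
    return payload
```

The result can be passed straight to the client, for example `client.post("/eda", json=build_eda_payload(use_case_id, feature_list_id=feature_list_id))`.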

Get Feature EDA Details

Retrieve full EDA results for a specific feature, including power scores and analysis metadata:

```python
response = client.get(f"/eda/{feature_eda_id}")
eda = response.json()

print(f"Feature: {eda.get('feature_id')}")
print(f"Predictive power score: {eda.get('predictive_power_score')}")
print(f"Feature type: {eda.get('feature_type')}")
print(f"Target type: {eda.get('target_type')}")
print(f"Target categories: {eda.get('target_categories')}")
```

Response fields:

| Field | Type | Description |
| --- | --- | --- |
| `id` | string | Feature EDA ID |
| `feature_id` | string | Associated feature ID |
| `feature_type` | string | Feature type: `"numerical"`, `"categorical"`, `"text"`, `"dict"`, or `"embedding"` |
| `target_type` | string | Target type: `"REGRESSION"`, `"BINARY_CLASSIFICATION"`, etc. |
| `predictive_power_score` | float | Predictive power score (higher = more predictive) |
| `null_power_score` | float | Power score from null values alone |
| `no_bucket_power_score` | float | Power score without bucketing |
| `target_categories` | array | Available target categories (for classification) |
| `feature_categories` | array | Available feature categories (dictionary keys or embedding dimensions to filter by) |
| `feature_source` | string | Source of the feature (e.g., `"CATALOG"`) |
| `definition_hash` | string | Hash of the feature definition |
| `version` | object | Feature version identifier |
| `error_reason` | string | Error description if the analysis failed |
| `plots` | array | Plot data objects, each containing an `info` field with summary statistics (see below) |
| `use_case_id` | string | Associated use case ID |
| `context_id` | string | Associated context ID |
| `observation_table_id` | string | Observation table used for EDA |
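When several features have been analyzed, their results can be compared directly, because a higher `predictive_power_score` means a more predictive feature. The sketch below is hypothetical: `rank_by_power` is not part of featurebyte. It works on the dictionaries returned by `GET /eda/{feature_eda_id}` and drops any result whose `error_reason` is set or whose score is missing.

```python
from typing import Any, Dict, List, Tuple


def rank_by_power(edas: List[Dict[str, Any]]) -> List[Tuple[str, float]]:
    """Return (feature_id, predictive_power_score) pairs, most predictive first."""
    ranked = []
    for eda in edas:
        score = eda.get("predictive_power_score")
        # Skip analyses that failed or produced no score.
        if eda.get("error_reason") or score is None:
            continue
        ranked.append((eda.get("feature_id"), score))
    # Higher score = more predictive, so sort in descending order.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

For example, `rank_by_power([client.get(f"/eda/{i}").json() for i in feature_eda_ids])`, where `feature_eda_ids` is a hypothetical list of feature EDA IDs.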

Summary Statistics

Each item in `plots` contains an `info` object with distribution and target statistics:

```python
eda = client.get(f"/eda/{feature_eda_id}").json()

for plot in eda.get("plots", []):
    info = plot.get("info", {})
    print(f"Count: {info.get('count')}, Unique: {info.get('unique')}")
    print(f"Mean: {info.get('mean')}, Std: {info.get('stddev')}")
    print(f"Min: {info.get('min_val')}, Max: {info.get('max_val')}")
    print(f"Missing: {info.get('num_missing')}, Zeros: {info.get('num_zeros')}")
    print(f"Target mean (non-missing): {info.get('target_mean_non_missing')}")
```

`info` fields (numerical features):

| Field | Type | Description |
| --- | --- | --- |
| `count` | integer | Total number of observations |
| `unique` | integer | Number of distinct values |
| `mean` | float | Mean value |
| `stddev` | float | Standard deviation |
| `min_val` | float | Minimum value |
| `max_val` | float | Maximum value |
| `q01`, `q05`, `q10`, `q25`, `q50`, `q75`, `q90`, `q95`, `q99` | float | Percentiles |
| `num_missing` | integer | Number of missing values |
| `num_non_missing` | integer | Number of non-missing values |
| `num_zeros` | integer | Number of zero values |
| `num_non_zeros` | integer | Number of non-zero values |
| `pct_zeros` | float | Percentage of zeros |
| `num_lower_outliers` | integer | Number of lower outliers |
| `num_upper_outliers` | integer | Number of upper outliers |
| `target_mean_non_missing` | float | Mean target value for non-missing feature values |
| `target_mean_missing` | float | Mean target value for missing feature values |
| `target_mean_zeros` | float | Mean target value for zero feature values |
| `target_mean_non_zeros` | float | Mean target value for non-zero feature values |
| `target_mean_lower_outliers` | float | Mean target value for lower outlier feature values |
| `target_mean_upper_outliers` | float | Mean target value for upper outlier feature values |
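Comparing the missing and non-missing counts and target means gives a quick view of whether missingness itself is informative. The helper below is a hypothetical sketch over a single `info` dictionary. It computes the missing rate from `num_missing` and `num_non_missing`, not from `count`, and returns `None` for any statistic it cannot compute.

```python
from typing import Any, Dict, Optional


def missing_value_summary(info: Dict[str, Any]) -> Dict[str, Optional[float]]:
    """Derive a missing rate and a missing-vs-present target gap (hypothetical helper)."""
    num_missing = info.get("num_missing") or 0
    num_non_missing = info.get("num_non_missing") or 0
    total = num_missing + num_non_missing
    missing_rate = num_missing / total if total else None

    mean_missing = info.get("target_mean_missing")
    mean_present = info.get("target_mean_non_missing")
    # A large gap suggests that missingness itself carries signal about the target.
    if mean_missing is not None and mean_present is not None:
        gap: Optional[float] = mean_missing - mean_present
    else:
        gap = None
    return {"missing_rate": missing_rate, "target_mean_gap": gap}
```

The same pattern extends to the zero and outlier fields, for example comparing `target_mean_zeros` with `target_mean_non_zeros`.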

Get Plot Options

Check what plot options are available for a feature (e.g., target categories for classification, or dictionary/embedding keys):

```python
response = client.request(
    "OPTIONS",
    f"/eda/{feature_id}/plots",
    params={"use_case_id": use_case_id},
)
options = response.json()
```

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `use_case_id` | string | No | ID of the use case |
| `context_id` | string | No | ID of the context (alternative to `use_case_id`) |
| `feature_table_id` | string | No | ID of a feature table |

Response fields:

| Field | Type | Description |
| --- | --- | --- |
| `target_categories` | array | Available target categories (classification use cases) |
| `feature_categories` | array | Available feature categories: the keys or dimensions to filter by (dictionary and embedding features) |

Get EDA Plots

Retrieve rendered plots for a specific feature. The `feature_id` must be one of the features included in the EDA.

```python
response = client.get(
    f"/eda/{feature_id}/plots",
    params={
        "use_case_id": use_case_id,
        "height": 500,
        "width": 1000,
        "font_size": 16,
        "output_format": "html",
    },
)
plots = response.json()
```

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `use_case_id` | string | No | ID of the use case |
| `context_id` | string | No | ID of the context (alternative to `use_case_id`) |
| `target_category` | any | No | Filter by target category (from OPTIONS response) |
| `feature_category` | any | No | Filter by dictionary key or embedding dimension (from OPTIONS response) |
| `height` | integer | No | Plot height in pixels (default: 500) |
| `width` | integer | No | Plot width in pixels (default: 1000) |
| `font_size` | integer | No | Font size in pixels (default: 16) |
| `output_format` | string | No | `"html"` (default) or `"json"` |
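`target_category` and `feature_category` take their values from the OPTIONS response, so one way to fetch every filtered view is to iterate over their combinations. `plot_param_grid` is a hypothetical helper. It assumes that an absent or empty category list means the filter does not apply and that both filters can be combined in a single request. It leaves `height`, `width`, `font_size`, and `output_format` at their defaults.

```python
from itertools import product
from typing import Any, Dict, List


def plot_param_grid(options: Dict[str, Any], use_case_id: str) -> List[Dict[str, Any]]:
    """Build one query-params dict per (target_category, feature_category) pair."""
    # Use [None] for a missing dimension so product() still yields one combination.
    target_categories = options.get("target_categories") or [None]
    feature_categories = options.get("feature_categories") or [None]
    grid = []
    for target_category, feature_category in product(target_categories, feature_categories):
        params: Dict[str, Any] = {"use_case_id": use_case_id}
        if target_category is not None:
            params["target_category"] = target_category
        if feature_category is not None:
            params["feature_category"] = feature_category
        grid.append(params)
    return grid
```

Each dictionary can then be passed as `params` to `client.get(f"/eda/{feature_id}/plots", params=params)`.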

Response fields (each item):

| Field | Type | Description |
| --- | --- | --- |
| `plot_type` | string | Type of plot (e.g., distribution, feature vs target) |
| `plots` | array | List of rendered plot objects with `content` (HTML string) |

The response is a list of plot items. See Displaying Plots for how to render the plots in Jupyter, save them as HTML, or embed them in a web application.
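A minimal sketch of the save-as-HTML case follows. It assumes the plots were requested with `output_format="html"`, so that each `content` field is an HTML string. `save_plots_as_html` and its file-naming scheme are hypothetical.

```python
import re
from pathlib import Path
from typing import Any, Dict, List


def save_plots_as_html(plot_list: List[Dict[str, Any]], out_dir: str) -> List[Path]:
    """Write each rendered plot's HTML content to its own file (hypothetical helper)."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    written = []
    for i, item in enumerate(plot_list):
        # Make plot_type safe to use in a file name.
        plot_type = re.sub(r"[^A-Za-z0-9_-]+", "_", str(item.get("plot_type", "plot")))
        for j, plot in enumerate(item.get("plots", [])):
            path = directory / f"{i:02d}_{plot_type}_{j}.html"
            path.write_text(plot.get("content", ""), encoding="utf-8")
            written.append(path)
    return written
```

Calling `save_plots_as_html(plots, "eda_plots")` on the list returned above writes one file per rendered plot and returns their paths.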