# Evaluation
See also
Forecast UI Tutorial: Predict and Evaluate | Concepts: Leaderboard | Concepts: Regression Evaluation | Concepts: Binary Classification Evaluation | API Tutorial: Credit Default — Step 12 | API Tutorial: Store Sales Forecast — Step 7
Prerequisites
This page uses the client and wait_for_task helpers defined in API Overview.
This page covers how to evaluate models using the validation leaderboard, holdout leaderboard, and evaluation plots.
## Validation Leaderboard
The validation leaderboard ranks all models trained with a validation set. Models are automatically added to the leaderboard when they are trained with a validation observation table.
### Find the Leaderboard

```python
client = fb.Configurations().get_client()
response = client.get(
    "/catalog/leaderboard",
    params={
        "observation_table_id": validation_table_id,
        "observation_table_purpose": "validation",
        "role": "OUTCOME",
    },
)
leaderboard = response.json()["data"][0]
leaderboard_id = leaderboard["_id"]
primary_metric = leaderboard["primary_metric"]
sort_dir = leaderboard.get("sort_order", "desc")
print(f"Leaderboard: {leaderboard['name']}, metric: {primary_metric}")
```
Leaderboard query parameters:
| Parameter | Type | Description |
|---|---|---|
| `observation_table_id` | string | Filter by associated observation table |
| `observation_table_purpose` | string | Filter by purpose: `"validation"`, `"training"`, or `"holdout"` |
| `role` | string | Leaderboard role: `"OUTCOME"` |
Leaderboard response fields:
| Field | Type | Description |
|---|---|---|
| `id` | string | Leaderboard ID |
| `name` | string | Leaderboard name |
| `primary_metric` | string | Metric used for ranking |
| `sort_order` | string | `"asc"` (lower is better) or `"desc"` (higher is better) |
| `evaluation_metrics` | array | List of metrics computed |
### List Models in the Leaderboard
Use `GET /catalog/ml_model` with the `leaderboard_id` to list models sorted by metric:

```python
response = client.get(
    "/catalog/ml_model",
    params={
        "leaderboard_id": leaderboard_id,
        "sort_by": primary_metric,
        "sort_dir": sort_dir,
        "sort_by_metric": True,
        "show_refits": True,
        "leaderboard_role": "OUTCOME",
        "page_size": 100,
    },
)
models = response.json()["data"]
for m in models:
    scores = {
        s["metric_name"]: round(s["score"], 4)
        for s in m.get("evaluation_scores", [])
        if s.get("score") is not None
    }
    print(f"  {m['name']}: {scores}")

# Best model is first (already sorted by metric)
best_model_id = models[0]["_id"]
```
Query parameters:
| Parameter | Type | Description |
|---|---|---|
| `leaderboard_id` | string | Filter to models in this leaderboard |
| `sort_by` | string | Metric name to sort by (e.g., `"auc"`, `"rmse"`, `"gini_norm"`) |
| `sort_dir` | string | `"asc"` or `"desc"` |
| `sort_by_metric` | boolean | Must be `true` to sort by evaluation metric name |
| `show_refits` | boolean | Include refit models (default: `false`) |
| `leaderboard_role` | string | `"OUTCOME"` |
Model response fields (each item in `data`):

| Field | Type | Description |
|---|---|---|
| `id` | string | Model ID |
| `name` | string | Model name |
| `evaluation_scores` | array | List of `{metric_name, score}` pairs |
| `feature_list_id` | string | Feature list used |
| `model_template_type` | string | Template type (e.g., `"LIGHTGBM"`) |
| `is_pipeline_generated` | boolean | Whether the model was created by a pipeline |
## Holdout Leaderboard
A holdout leaderboard is created automatically when predictions are generated on a holdout observation table that has a target. Generate predictions to trigger it, then retrieve the leaderboard.

```python
response = client.post(
    f"/ml_model/{ml_model_id}/prediction_table",
    json={
        "request_input": {
            "request_type": "observation_table",
            "table_id": holdout_table_id,
        },
        "include_input_features": False,
    },
)
task_id = response.json()["id"]
wait_for_task(client, task_id)
```
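Once the prediction task completes, the holdout leaderboard can be found with the same `GET /catalog/leaderboard` query shown earlier, filtered to `observation_table_purpose: "holdout"`. A minimal sketch (the helper name is illustrative, not part of the API):

```python
def get_leaderboard_id(client, table_id: str, purpose: str) -> str:
    """Return the ID of the first leaderboard matching an observation table.

    Illustrative helper wrapping the documented GET /catalog/leaderboard query.
    """
    response = client.get(
        "/catalog/leaderboard",
        params={
            "observation_table_id": table_id,
            "observation_table_purpose": purpose,
            "role": "OUTCOME",
        },
    )
    return response.json()["data"][0]["_id"]

# leaderboard_id = get_leaderboard_id(client, holdout_table_id, "holdout")
```

The same helper works for the validation leaderboard by passing `purpose="validation"`.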
### View Holdout Leaderboard Results
List models in the holdout leaderboard sorted by its primary metric:

```python
response = client.get(
    "/catalog/ml_model",
    params={
        "leaderboard_id": leaderboard_id,  # ID of the holdout leaderboard
        "sort_by": "rmse",  # the primary metric, e.g. rmse for a regression use case
        "sort_dir": "asc",
        "sort_by_metric": True,
        "show_refits": True,
        "leaderboard_role": "OUTCOME",
        "page_size": 100,
    },
)
models = response.json()["data"]
for m in models:
    scores = {
        s["metric_name"]: round(s["score"], 4)
        for s in m.get("evaluation_scores", [])
        if s.get("score") is not None
    }
    print(f"  {m['name']}: {scores}")
```
### Preview Leaderboard Predictions
Inspect the predictions a specific model made on the holdout set:

```python
response = client.post(
    f"/leaderboard/{leaderboard_id}/ml_model/{ml_model_id}/preview",
)
preview = response.json()
```
Response fields:
| Field | Type | Description |
|---|---|---|
| `columns` | array | Column names in the preview |
| `data` | array | Rows of prediction data (entity IDs, point in time, predicted values, actuals) |
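Because `columns` and `data` are parallel, each row can be zipped into a dict for easier inspection. A small sketch (the helper name is illustrative):

```python
def preview_rows(preview: dict) -> list[dict]:
    """Pair each data row with the column names from a preview payload."""
    return [dict(zip(preview["columns"], row)) for row in preview["data"]]

# for row in preview_rows(preview):
#     print(row)
```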
## Evaluation Plots
The API generates interactive Bokeh plots for model evaluation. The endpoint returns self-contained HTML with embedded JavaScript — no external dependencies needed.
### Get Available Plot Options
The available plot types depend on the model type:
Response fields:
| Field | Type | Description |
|---|---|---|
| `options` | array | Available plot types for this model (e.g., `"distribution"`, `"roc_curve"`) |
| `holdout_tables` | array | Observation tables available for evaluation, each with `table_type`, `table_id`, and `table_name` |
Regression models:

- `distribution` — predicted vs actual distributions
- `predicted_vs_actual` — scatter plot of predicted vs actual values
- `predicted_vs_actual_per_bin` — binned predicted vs actual

Binary classification models:

- `roc_curve` — ROC curve with AUC
- `precision_recall_curve` — precision-recall tradeoff
- `ks_and_gain_curve` — KS statistic and gain curve
- `lift_curve` — lift chart
- `gain_report` — cumulative gain report
- `predicted_vs_actual_per_bin` — calibration plot
- `distribution` — score distributions
- `confusion_matrix` — confusion matrix

Uplift models:

- `incremental_uplift` — incremental uplift curve
- `qini_curve` — Qini coefficient curve
- `gain_report` — uplift gain report
- `predicted_vs_actual_per_bin` — binned uplift
- `distribution` — uplift score distributions
The response also includes `holdout_tables`, a list of observation tables available for evaluation.
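As a sketch of how this payload might be used, the helper below (illustrative, not part of the API) validates a requested plot type against `options` and builds the `holdout_table` argument for the evaluate call. It assumes `options_payload` already holds the parsed JSON response; fetching it is omitted here.

```python
def choose_plot(options_payload: dict, option: str) -> dict:
    """Validate a plot choice and assemble request arguments for evaluation.

    Illustrative helper: raises if the plot type is unavailable for this
    model, and selects the first table from holdout_tables.
    """
    if option not in options_payload["options"]:
        raise ValueError(
            f"Plot {option!r} not available; choose from {options_payload['options']}"
        )
    table = options_payload["holdout_tables"][0]  # or select by table_name
    return {
        "option": option,
        "holdout_table": {
            "table_type": table["table_type"],
            "table_id": table["table_id"],
        },
    }
```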
### Create an Evaluation Plot

```python
response = client.post(
    f"/ml_model/{ml_model_id}/evaluate",
    json={
        "option": "predicted_vs_actual",
        "plot_params": {
            "height": 500,
            "width": 1000,
            "font_size": 16,
        },
        "holdout_table": {
            "table_type": "observation_table",
            "table_id": validation_table_id,
        },
    },
)
html_content = response.json()["content"]
```
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `option` | string | Yes | Plot type (see options list above) |
| `plot_params` | object | No | Plot sizing configuration |
| `plot_params.height` | integer | No | Plot height in pixels (default: 500) |
| `plot_params.width` | integer | No | Plot width in pixels (default: 1000) |
| `plot_params.font_size` | integer | No | Font size in pixels (default: 16) |
| `holdout_table` | object | No | Observation table to evaluate against |
| `holdout_table.table_type` | string | Yes | Must be `"observation_table"` |
| `holdout_table.table_id` | string | Yes | ID of the observation table |
The `content` field contains a self-contained Bokeh HTML document. See Displaying Plots for how to render it in Jupyter, save it as HTML, or embed it in a web application.
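For instance, a minimal way to save the document for viewing in a browser (the helper and file name are illustrative):

```python
from pathlib import Path

def save_plot(html_content: str, path: str) -> None:
    """Write the self-contained Bokeh HTML document to a file."""
    Path(path).write_text(html_content, encoding="utf-8")

# save_plot(html_content, "predicted_vs_actual.html")
```

Because the HTML embeds its own JavaScript, the saved file opens directly in any browser with no server required.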
## Forecast Comparison
For forecast use cases, the API can generate interactive plots comparing predictions vs actual target values across forecast points. The plot shows one prediction line per point-in-time, with an optional target (actual) line overlay.
### List Available Entities
Before creating a forecast comparison, retrieve the distinct entity values in the prediction table:

```python
# Submit the entity extraction task (first time only)
response = client.post(f"/prediction_table/{prediction_table_id}/prediction_entities")
task_id = response.json()["id"]
wait_for_task(client, task_id)

# Get the available entity values
response = client.get(f"/prediction_table/{prediction_table_id}/prediction_entities")
entities = response.json()

# entity_data.columns: entity column names (serving names)
# entity_data.data: distinct entity value combinations
print(f"Entity columns: {entities['entity_data']['columns']}")
for row in entities["entity_data"]["data"][:5]:
    print(f"  {row}")
```
Response fields:
| Field | Type | Description |
|---|---|---|
| `prediction_table_id` | string | Prediction table ID |
| `entity_data.columns` | array | Entity column names (serving names) |
| `entity_data.data` | array | Distinct entity value combinations (each row is a list of values matching the columns) |
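Because each row of `entity_data.data` lines up with `entity_data.columns`, a row can be zipped directly into the `entity_filter` object used in the next step. A small sketch (the helper name is illustrative):

```python
def build_entity_filter(entity_data: dict, row_index: int = 0) -> dict:
    """Map entity column names to the values of one distinct entity row."""
    return dict(zip(entity_data["columns"], entity_data["data"][row_index]))

# entity_filter = build_entity_filter(entities["entity_data"])
```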
### Create a Forecast Comparison Plot

```python
response = client.post(
    f"/prediction_table/{prediction_table_id}/forecast_comparison",
    json={
        "entity_filter": {
            "item_store_id": "FOODS_3_001_CA_1",
        },
        "plot_params": {
            "height": 500,
            "width": 1000,
            "font_size": 16,
        },
    },
)
task_id = response.json()["id"]
wait_for_task(client, task_id)
```
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `entity_filter` | object | Yes | Key-value pairs mapping entity column names to values (e.g., `{"item_store_id": "FOODS_3_001_CA_1"}`) |
| `plot_params` | object | No | Plot sizing configuration |
| `plot_params.height` | integer | No | Plot height in pixels (default: 500) |
| `plot_params.width` | integer | No | Plot width in pixels (default: 1000) |
| `plot_params.font_size` | integer | No | Font size in pixels (default: 16) |
The `entity_filter` specifies which entity to plot (e.g., a specific item-store combination). The plot is generated asynchronously.
### Retrieve the Plot

```python
# List forecast comparisons for a prediction table
response = client.get(
    f"/prediction_table/{prediction_table_id}/forecast_comparison",
)
comparisons = response.json()["data"]

# Get a specific forecast comparison result
forecast_comparison_id = comparisons[0]["id"]
response = client.get(
    f"/prediction_table/{prediction_table_id}/forecast_comparison/{forecast_comparison_id}",
)
html_content = response.json()["content"]
```
The `content` field contains a self-contained Bokeh HTML plot, just like evaluation plots. It includes:
- Target line (grey) — actual values, when the observation table has a target
- Prediction lines (colored) — one line per point-in-time, showing how the model predicted each forecast point
- Interactive widgets — "From" and "To" point-in-time selectors to filter which prediction lines are visible
See Displaying Plots for how to render the plot in Jupyter, save as HTML, or embed in a web application.