
Evaluation

Prerequisites

This page uses the client and wait_for_task helpers defined in API Overview.

This page covers how to evaluate models using the validation leaderboard, holdout leaderboard, and evaluation plots.

Validation Leaderboard

The validation leaderboard ranks all models trained with a validation set. Models are automatically added to the leaderboard when they are trained with a validation observation table.

Find the Leaderboard

client = fb.Configurations().get_client()

response = client.get(
    "/catalog/leaderboard",
    params={
        "observation_table_id": validation_table_id,
        "observation_table_purpose": "validation",
        "role": "OUTCOME",
    },
)
leaderboard = response.json()["data"][0]
leaderboard_id = leaderboard["_id"]
primary_metric = leaderboard["primary_metric"]
sort_dir = leaderboard.get("sort_order", "desc")

print(f"Leaderboard: {leaderboard['name']}, metric: {primary_metric}")

Leaderboard query parameters:

Parameter Type Description
observation_table_id string Filter by associated observation table
observation_table_purpose string Filter by purpose: "validation", "training", "holdout"
role string Leaderboard role: "OUTCOME"

Leaderboard response fields:

Field Type Description
_id string Leaderboard ID
name string Leaderboard name
primary_metric string Metric used for ranking
sort_order string "asc" (lower is better) or "desc" (higher is better)
evaluation_metrics array List of metrics computed
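
The sort_order field determines which end of the ranking is best. A minimal sketch of interpreting it, using made-up model scores rather than real API output:

```python
# Illustrative values shaped like the leaderboard fields documented above;
# the scores are made up for demonstration.
leaderboard = {"primary_metric": "auc", "sort_order": "desc"}
scores = {"model_a": 0.81, "model_b": 0.86, "model_c": 0.79}

def best_model(scores, sort_order):
    """Pick the best model given the leaderboard's sort direction:
    "desc" means higher is better, "asc" means lower is better."""
    pick = max if sort_order == "desc" else min
    return pick(scores, key=scores.get)

print(best_model(scores, leaderboard["sort_order"]))  # model_b
```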

List Models in the Leaderboard

Use GET /catalog/ml_model with the leaderboard_id to list models sorted by metric:

response = client.get(
    "/catalog/ml_model",
    params={
        "leaderboard_id": leaderboard_id,
        "sort_by": primary_metric,
        "sort_dir": sort_dir,
        "sort_by_metric": True,
        "show_refits": True,
        "leaderboard_role": "OUTCOME",
        "page_size": 100,
    },
)
models = response.json()["data"]

for m in models:
    scores = {s["metric_name"]: round(s["score"], 4) for s in m.get("evaluation_scores", []) if s.get("score") is not None}
    print(f"  {m['name']}: {scores}")

# Best model is first (already sorted by metric)
best_model_id = models[0]["_id"]

Query parameters:

Parameter Type Description
leaderboard_id string Filter to models in this leaderboard
sort_by string Metric name to sort by (e.g., "auc", "rmse", "gini_norm")
sort_dir string "asc" or "desc"
sort_by_metric boolean Must be true to sort by evaluation metric name
show_refits boolean Include refit models (default: false)
leaderboard_role string "OUTCOME"

Model response fields (each item in data):

Field Type Description
_id string Model ID
name string Model name
evaluation_scores array List of {metric_name, score} pairs
feature_list_id string Feature list used
model_template_type string Template type (e.g., "LIGHTGBM")
is_pipeline_generated boolean Whether the model was created by a pipeline

Holdout Leaderboard

A holdout leaderboard is created automatically when predictions are generated on a holdout observation table that has a target. Generate predictions to trigger it, then retrieve the leaderboard.

response = client.post(
    f"/ml_model/{ml_model_id}/prediction_table",
    json={
        "request_input": {
            "request_type": "observation_table",
            "table_id": holdout_table_id,
        },
        "include_input_features": False,
    },
)
task_id = response.json()["id"]
wait_for_task(client, task_id)

View Holdout Leaderboard Results

First retrieve the holdout leaderboard (the same GET /catalog/leaderboard call as before, with observation_table_purpose set to "holdout"), then list its models sorted by its primary metric:

response = client.get(
    "/catalog/leaderboard",
    params={
        "observation_table_id": holdout_table_id,
        "observation_table_purpose": "holdout",
        "role": "OUTCOME",
    },
)
holdout_leaderboard = response.json()["data"][0]

response = client.get(
    "/catalog/ml_model",
    params={
        "leaderboard_id": holdout_leaderboard["_id"],
        "sort_by": holdout_leaderboard["primary_metric"],
        "sort_dir": holdout_leaderboard.get("sort_order", "desc"),
        "sort_by_metric": True,
        "show_refits": True,
        "leaderboard_role": "OUTCOME",
        "page_size": 100,
    },
)
models = response.json()["data"]

for m in models:
    scores = {s["metric_name"]: round(s["score"], 4) for s in m.get("evaluation_scores", []) if s.get("score") is not None}
    print(f"  {m['name']}: {scores}")

Preview Leaderboard Predictions

Inspect the predictions a specific model made on the holdout set:

response = client.post(
    f"/leaderboard/{leaderboard_id}/ml_model/{ml_model_id}/preview",
)
preview = response.json()

Response fields:

Field Type Description
columns array Column names in the preview
data array Rows of prediction data (entity IDs, point in time, predicted values, actuals)
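
The preview's columns/data pair can be stitched into row dictionaries with a plain zip. A minimal sketch using made-up values in the documented shape:

```python
# Illustrative preview payload shaped like the fields documented above;
# the values are made up for demonstration.
preview = {
    "columns": ["entity_id", "point_in_time", "prediction", "actual"],
    "data": [
        ["cust_1", "2024-01-01", 0.72, 1],
        ["cust_2", "2024-01-01", 0.31, 0],
    ],
}

# Pair each row with the column names to get a list of dicts.
rows = [dict(zip(preview["columns"], row)) for row in preview["data"]]
print(rows[0]["prediction"])  # 0.72
```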

Evaluation Plots

The API generates interactive Bokeh plots for model evaluation. The endpoint returns self-contained HTML with embedded JavaScript — no external dependencies needed.

Get Available Plot Options

The available plot types depend on the model type:

response = client.request("OPTIONS", f"/ml_model/{ml_model_id}/evaluate")
options = response.json()

Response fields:

Field Type Description
options array Available plot types for this model (e.g., "distribution", "roc_curve")
holdout_tables array Observation tables available for evaluation, each with table_type, table_id, and table_name

Regression models:

  • distribution — predicted vs actual distributions
  • predicted_vs_actual — scatter plot of predicted vs actual values
  • predicted_vs_actual_per_bin — binned predicted vs actual

Binary classification models:

  • roc_curve — ROC curve with AUC
  • precision_recall_curve — precision-recall tradeoff
  • ks_and_gain_curve — KS statistic and gain curve
  • lift_curve — lift chart
  • gain_report — cumulative gain report
  • predicted_vs_actual_per_bin — calibration plot
  • distribution — score distributions
  • confusion_matrix — confusion matrix

Uplift models:

  • incremental_uplift — incremental uplift curve
  • qini_curve — Qini coefficient curve
  • gain_report — uplift gain report
  • predicted_vs_actual_per_bin — binned uplift
  • distribution — uplift score distributions

The response also includes holdout_tables, a list of observation tables available for evaluation.
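
Before requesting a plot, it can be worth guarding on the options list and picking a table from holdout_tables. A sketch using an illustrative OPTIONS response (the IDs and names below are made up):

```python
# Illustrative OPTIONS response shaped like the fields documented above.
options = {
    "options": ["distribution", "roc_curve", "precision_recall_curve"],
    "holdout_tables": [
        {"table_type": "observation_table", "table_id": "abc123", "table_name": "holdout_2024"},
    ],
}

desired = "roc_curve"
if desired not in options["options"]:
    raise ValueError(f"{desired!r} not available; choose from {options['options']}")

# Build the holdout_table payload for the evaluate request from the first entry.
holdout_table = {
    "table_type": options["holdout_tables"][0]["table_type"],
    "table_id": options["holdout_tables"][0]["table_id"],
}
```

The resulting holdout_table dict matches the shape the evaluate request expects.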

Create an Evaluation Plot

response = client.post(
    f"/ml_model/{ml_model_id}/evaluate",
    json={
        "option": "predicted_vs_actual",
        "plot_params": {
            "height": 500,
            "width": 1000,
            "font_size": 16,
        },
        "holdout_table": {
            "table_type": "observation_table",
            "table_id": validation_table_id,
        },
    },
)
html_content = response.json()["content"]

Parameters:

Parameter Type Required Description
option string Yes Plot type (see options list above)
plot_params object No Plot sizing configuration
plot_params.height integer No Plot height in pixels (default: 500)
plot_params.width integer No Plot width in pixels (default: 1000)
plot_params.font_size integer No Font size in pixels (default: 16)
holdout_table object No Observation table to evaluate against
holdout_table.table_type string Yes Must be "observation_table"
holdout_table.table_id string Yes ID of the observation table

The content field contains a self-contained Bokeh HTML document. See Displaying Plots for how to render it in Jupyter, save as HTML, or embed in a web application.

Forecast Comparison

For forecast use cases, the API can generate interactive plots comparing predictions vs actual target values across forecast points. The plot shows one prediction line per point-in-time, with an optional target (actual) line overlay.

List Available Entities

Before creating a forecast comparison, retrieve the distinct entity values in the prediction table:

# Submit entity extraction task (first time only)
response = client.post(f"/prediction_table/{prediction_table_id}/prediction_entities")
task_id = response.json()["id"]
wait_for_task(client, task_id)

# Get available entity values
response = client.get(f"/prediction_table/{prediction_table_id}/prediction_entities")
entities = response.json()

# entity_data.columns: entity column names (serving names)
# entity_data.data: distinct entity value combinations
print(f"Entity columns: {entities['entity_data']['columns']}")
for row in entities["entity_data"]["data"][:5]:
    print(f"  {row}")

Response fields:

Field Type Description
prediction_table_id string Prediction table ID
entity_data.columns array Entity column names (serving names)
entity_data.data array Distinct entity value combinations (each row is a list of values matching the columns)
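
Each row of entity_data.data lines up positionally with entity_data.columns, so an entity_filter for the forecast comparison request can be built with zip. A sketch with illustrative values:

```python
# Illustrative entity payload shaped like the fields documented above;
# the entity values are made up for demonstration.
entities = {
    "entity_data": {
        "columns": ["item_store_id"],
        "data": [["FOODS_3_001_CA_1"], ["FOODS_3_002_CA_1"]],
    },
}

# Build an entity_filter dict for the first entity combination.
entity_filter = dict(zip(entities["entity_data"]["columns"],
                         entities["entity_data"]["data"][0]))
print(entity_filter)  # {'item_store_id': 'FOODS_3_001_CA_1'}
```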

Create a Forecast Comparison Plot

response = client.post(
    f"/prediction_table/{prediction_table_id}/forecast_comparison",
    json={
        "entity_filter": {
            "item_store_id": "FOODS_3_001_CA_1",
        },
        "plot_params": {
            "height": 500,
            "width": 1000,
            "font_size": 16,
        },
    },
)
task_id = response.json()["id"]
wait_for_task(client, task_id)

Parameters:

Parameter Type Required Description
entity_filter object Yes Key-value pairs mapping entity column names to values (e.g., {"item_store_id": "FOODS_3_001_CA_1"})
plot_params object No Plot sizing configuration
plot_params.height integer No Plot height in pixels (default: 500)
plot_params.width integer No Plot width in pixels (default: 1000)
plot_params.font_size integer No Font size in pixels (default: 16)

The entity_filter specifies which entity to plot (e.g., a specific item-store combination). The plot is generated asynchronously.

Retrieve the Plot

# List forecast comparisons for a prediction table
response = client.get(
    f"/prediction_table/{prediction_table_id}/forecast_comparison",
)
comparisons = response.json()["data"]

# Get a specific forecast comparison result
forecast_comparison_id = comparisons[0]["id"]
response = client.get(
    f"/prediction_table/{prediction_table_id}/forecast_comparison/{forecast_comparison_id}",
)
html_content = response.json()["content"]

The content field contains a self-contained Bokeh HTML plot, just like evaluation plots. It includes:

  • Target line (grey) — actual values, when the observation table has a target
  • Prediction lines (colored) — one line per point-in-time, showing how the model predicted each forecast point
  • Interactive widgets — "From" and "To" point-in-time selectors to filter which prediction lines are visible

See Displaying Plots for how to render the plot in Jupyter, save as HTML, or embed in a web application.