Model Training

Prerequisites

This page uses the client and wait_for_task helpers defined in API Overview.

This page covers how to train models using the FeatureByte API, including resolving model templates, configuring parameters, and training on different observation tables.

Resolve Model Template and Settings

Before training, you need to determine the appropriate objective, metric, and model template. The API provides endpoints to suggest these automatically.

Step 1: Get Suggested Settings

import featurebyte as fb

client = fb.Configurations().get_client()

response = client.get(
    f"/use_case/{use_case_id}/ml_model_template_setting",
    params={
        "training_table_id": training_table_id,
        "validation_table_id": validation_table_id,
        "feature_list_id": feature_list_id,  # or use feature_selection_id instead
        "machine_learning_role": "OUTCOME",
    },
)
settings = response.json()
# Returns: objective, metric, calibration_method, use_naive_as_offset

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| training_table_id | string | Yes | ID of the training observation table |
| validation_table_id | string | No | ID of the validation observation table |
| feature_list_id | string | One of | ID of the feature list to train with |
| feature_selection_id | string | One of | ID of a feature selection (alternative to feature_list_id) |
| machine_learning_role | string | Yes | Role of the model: "OUTCOME" |

Response fields:

| Field | Type | Description |
| --- | --- | --- |
| objective | string | Suggested optimization objective (e.g., "regression", "binary") |
| metric | string | Suggested evaluation metric (e.g., "rmse", "auc") |
| calibration_method | string | Suggested calibration method, if any |
| use_naive_as_offset | boolean | Whether to use naive prediction as offset (for forecast use cases) |

Step 2: Get Available Templates

response = client.get(
    f"/use_case/{use_case_id}/ml_model_template",
    params={
        "feature_list_id": feature_list_id,  # or use feature_selection_id instead
        "training_table_id": training_table_id,
        "objective": settings["objective"],
        "metric": settings["metric"],
        "machine_learning_role": "OUTCOME",
    },
)
templates = response.json()["data"]

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| feature_list_id | string | One of | ID of the feature list |
| feature_selection_id | string | One of | ID of a feature selection (alternative to feature_list_id) |
| training_table_id | string | No | ID of the training observation table |
| objective | string | Yes | Optimization objective (from Step 1) |
| metric | string | Yes | Evaluation metric (from Step 1) |
| machine_learning_role | string | Yes | Role of the model: "OUTCOME" |

Each template in the response includes:

| Field | Type | Description |
| --- | --- | --- |
| type | string | Template type (e.g., "LIGHTGBM") |
| id | string | Template ID |
| preprocessors | array | Preprocessor nodes with node_name and parameters_metadata |
| model | object | Model node with node_name and parameters_metadata |

Each entry in parameters_metadata contains name, default_value, type, and description.
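Step 3 below takes the first template, but picking one by its type field is often safer. The following is a minimal sketch of such a lookup; select_template is a hypothetical helper (not part of the FeatureByte API), and the sample data, including the "OTHER_TYPE" entry and the placeholder IDs, is made up for illustration.

```python
from typing import Any, Dict, List


def select_template(
    templates: List[Dict[str, Any]], template_type: str
) -> Dict[str, Any]:
    """Return the first template whose "type" matches template_type."""
    for template in templates:
        if template.get("type") == template_type:
            return template
    available = sorted(str(t.get("type")) for t in templates)
    raise LookupError(
        f"No template of type {template_type!r}; available: {available}"
    )


# Hypothetical, trimmed-down response data (assumption, not real API output).
sample_templates = [
    {"type": "OTHER_TYPE", "id": "TEMPLATE_ID_1", "preprocessors": [], "model": {}},
    {"type": "LIGHTGBM", "id": "TEMPLATE_ID_2", "preprocessors": [], "model": {}},
]
selected = select_template(sample_templates, "LIGHTGBM")
```

Raising an error when no template matches surfaces a mismatch before the training request is sent, rather than training with an unintended template.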

Step 3: Extract Default Parameters

Each template contains preprocessor and model parameters with default values:

template = templates[0]  # or select a specific template by type

node_name_to_parameters = {}
for preprocessor in template.get("preprocessors", []):
    params = {
        p["name"]: p["default_value"]
        for p in preprocessor.get("parameters_metadata", [])
        if p.get("default_value") is not None
    }
    if params:
        node_name_to_parameters[preprocessor["node_name"]] = params

model_info = template.get("model", {})
if model_info:
    params = {
        p["name"]: p["default_value"]
        for p in model_info.get("parameters_metadata", [])
        if p.get("default_value") is not None
    }
    if params:
        node_name_to_parameters[model_info["node_name"]] = params
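The extraction logic above is the same for preprocessor nodes and the model node, so it can be factored into one function, with a second function to layer custom values on top of the defaults. This is a sketch only: collect_defaults and apply_overrides are hypothetical helpers, and the node name "model_node" and the parameter names are invented for the example. Whether a given override is accepted depends on the template's parameters_metadata.

```python
from typing import Any, Dict


def collect_defaults(node: Dict[str, Any]) -> Dict[str, Any]:
    """Map parameter name to default value, skipping parameters with no default."""
    return {
        p["name"]: p["default_value"]
        for p in node.get("parameters_metadata", [])
        if p.get("default_value") is not None
    }


def apply_overrides(
    defaults: Dict[str, Dict[str, Any]],
    overrides: Dict[str, Dict[str, Any]],
) -> Dict[str, Dict[str, Any]]:
    """Return a new mapping with overrides merged per node; inputs are not mutated."""
    merged = {name: dict(params) for name, params in defaults.items()}
    for node_name, params in overrides.items():
        merged.setdefault(node_name, {}).update(params)
    return merged


# Hypothetical template fragment (node and parameter names are assumptions).
sample_template = {
    "preprocessors": [],
    "model": {
        "node_name": "model_node",
        "parameters_metadata": [
            {"name": "learning_rate", "default_value": 0.1,
             "type": "float", "description": "example"},
            {"name": "max_depth", "default_value": None,
             "type": "int", "description": "example"},
        ],
    },
}
model_node = sample_template["model"]
defaults = {model_node["node_name"]: collect_defaults(model_node)}
tuned = apply_overrides(defaults, {"model_node": {"learning_rate": 0.05}})
```

Here tuned can be passed as node_name_to_parameters in the training payload, while defaults stays unchanged for comparison.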

Train a Model

You can train from either a feature list or a feature selection produced by ideation. When training from a feature selection, the system creates the feature list automatically during training.

From a Feature List

payload = {
    "use_case_id": use_case_id,
    "model_name": "My Model",
    "data_source": {
        "type": "train_valid_observation_tables",
        "training_table_id": training_table_id,
        "validation_table_id": validation_table_id,
    },
    "feature_list_id": feature_list_id,
    "model_template_type": template["type"],
    "objective": settings["objective"],
    "metric": settings["metric"],
    "node_name_to_parameters": node_name_to_parameters,
    "role": "OUTCOME",
}

# Optional parameters
if settings.get("calibration_method"):
    payload["calibration_method"] = settings["calibration_method"]
if settings.get("use_naive_as_offset") is not None:
    payload["use_naive_as_offset"] = settings["use_naive_as_offset"]

response = client.post("/ml_model", json=payload)
task_id = response.json()["id"]
wait_for_task(client, task_id)

# Get the model ID from the task result
task_result = client.get(f"/task/{task_id}").json()
ml_model_id = task_result.get("payload", {}).get("output_document_id")

From a Feature Selection (Ideation)

Train directly from a feature selection produced by an ideation pipeline, without creating a feature list first:

payload = {
    "use_case_id": use_case_id,
    "model_name": "My Model - from Selection",
    "data_source": {
        "type": "train_valid_observation_tables",
        "training_table_id": training_table_id,
        "validation_table_id": validation_table_id,
    },
    "feature_source": {
        "type": "feature_selection",
        "document_id": feature_selection_id,
    },
    "model_template_type": template["type"],
    "objective": settings["objective"],
    "metric": settings["metric"],
    "node_name_to_parameters": node_name_to_parameters,
    "role": "OUTCOME",
}

response = client.post("/ml_model", json=payload)
task_id = response.json()["id"]
task = wait_for_task(client, task_id)

ml_model_id = task.get("payload", {}).get("output_document_id")

The feature list is automatically created from the selection during training and cached for reuse.
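Both training examples read the model ID with chained .get() calls, which silently yield None if the task carries no output document. A stricter variant might look like the sketch below. extract_output_document_id is a hypothetical helper; it assumes only the payload.output_document_id shape used in the examples above, and the sample task dictionary is invented.

```python
from typing import Any, Dict


def extract_output_document_id(task: Dict[str, Any]) -> str:
    """Return payload.output_document_id, raising if it is missing or empty."""
    document_id = (task.get("payload") or {}).get("output_document_id")
    if not document_id:
        raise RuntimeError(
            "Task has no payload.output_document_id; check the task status"
        )
    return document_id


# Hypothetical completed task (placeholder ID, not real API output).
sample_task = {"payload": {"output_document_id": "ML_MODEL_ID"}}
checked_model_id = extract_output_document_id(sample_task)
```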

The trained model can be retrieved with GET /catalog/ml_model/{ml_model_id}.

Model response fields:

| Field | Type | Description |
| --- | --- | --- |
| id | string | Model ID |
| name | string | Model name |
| use_case_id | string | Associated use case |
| feature_list_id | string | Feature list used for training |
| feature_list_name | string | Feature list name |
| model_template_type | string | Template type (e.g., "LIGHTGBM") |
| target_type | string | Target type ("REGRESSION", "BINARY_CLASSIFICATION", etc.) |
| feature_importance | array | Per-feature importance with feature, importance, importance_percent, cumulative_importance_percent |
| feature_key_importance | array | Per-feature-key importance (groups feature variants) |
| data_source | object | Training data configuration |
| is_pipeline_generated | boolean | Whether the model was created by an ideation pipeline |
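As an illustration of working with the feature_importance array, the sketch below ranks entries by importance and returns the top few. top_features is a hypothetical helper, and the feature names and numbers are invented; the scale of importance_percent in real responses is not specified here, so the values are only placeholders.

```python
from typing import Any, Dict, List, Tuple


def top_features(
    feature_importance: List[Dict[str, Any]], n: int = 3
) -> List[Tuple[str, float]]:
    """Return (feature, importance_percent) for the n most important features."""
    ranked = sorted(
        feature_importance, key=lambda row: row["importance"], reverse=True
    )
    return [(row["feature"], row["importance_percent"]) for row in ranked[:n]]


# Hypothetical entries using the field names listed above.
sample_importance = [
    {"feature": "feature_b", "importance": 30.0,
     "importance_percent": 30.0, "cumulative_importance_percent": 80.0},
    {"feature": "feature_a", "importance": 50.0,
     "importance_percent": 50.0, "cumulative_importance_percent": 50.0},
    {"feature": "feature_c", "importance": 20.0,
     "importance_percent": 20.0, "cumulative_importance_percent": 100.0},
]
top_two = top_features(sample_importance, n=2)
```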

Request parameters for POST /ml_model:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| use_case_id | string | Yes | ID of the use case |
| model_name | string | Yes | Display name for the model |
| data_source | object | Yes | Training data configuration |
| data_source.type | string | Yes | Data source type: "train_valid_observation_tables" |
| data_source.training_table_id | string | Yes | ID of the training observation table |
| data_source.validation_table_id | string | Yes | ID of the validation observation table |
| feature_list_id | string | One of | ID of the feature list to use (shorthand for feature_source with type: "feature_list") |
| feature_source | object | One of | Feature source; use instead of feature_list_id to train from a feature selection |
| feature_source.type | string | | "feature_list" or "feature_selection" |
| feature_source.document_id | string | | ID of the feature list or feature selection |
| model_template_type | string | Yes | Template type from Step 2 (e.g., "LIGHTGBM") |
| objective | string | Yes | Optimization objective from Step 1 |
| metric | string | Yes | Evaluation metric from Step 1 |
| node_name_to_parameters | object | No | Hyperparameter overrides per node (from Step 3) |
| role | string | Yes | Machine learning role: "OUTCOME" |
| calibration_method | string | No | Calibration method (from Step 1, if applicable) |
| use_naive_as_offset | boolean | No | Use naive prediction as offset (forecast use cases) |
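The two training payloads differ only in how the feature source is given, so one builder can cover both. The sketch below is one possible way to assemble the request body; build_training_payload is a hypothetical helper, all IDs are placeholders, and treating "One of" as "exactly one" is an assumption about the API rather than documented behavior.

```python
from typing import Any, Dict, Optional


def build_training_payload(
    use_case_id: str,
    model_name: str,
    training_table_id: str,
    validation_table_id: str,
    model_template_type: str,
    objective: str,
    metric: str,
    feature_list_id: Optional[str] = None,
    feature_selection_id: Optional[str] = None,
    node_name_to_parameters: Optional[Dict[str, Dict[str, Any]]] = None,
    calibration_method: Optional[str] = None,
    use_naive_as_offset: Optional[bool] = None,
) -> Dict[str, Any]:
    """Assemble a POST /ml_model body from the documented parameters."""
    # Assumption: exactly one feature source must be supplied.
    if (feature_list_id is None) == (feature_selection_id is None):
        raise ValueError(
            "Provide exactly one of feature_list_id or feature_selection_id"
        )
    payload: Dict[str, Any] = {
        "use_case_id": use_case_id,
        "model_name": model_name,
        "data_source": {
            "type": "train_valid_observation_tables",
            "training_table_id": training_table_id,
            "validation_table_id": validation_table_id,
        },
        "model_template_type": model_template_type,
        "objective": objective,
        "metric": metric,
        "role": "OUTCOME",
    }
    if feature_list_id is not None:
        payload["feature_list_id"] = feature_list_id
    else:
        payload["feature_source"] = {
            "type": "feature_selection",
            "document_id": feature_selection_id,
        }
    # Optional fields are added only when a value is available.
    if node_name_to_parameters:
        payload["node_name_to_parameters"] = node_name_to_parameters
    if calibration_method:
        payload["calibration_method"] = calibration_method
    if use_naive_as_offset is not None:
        payload["use_naive_as_offset"] = use_naive_as_offset
    return payload


example_payload = build_training_payload(
    use_case_id="USE_CASE_ID",
    model_name="My Model",
    training_table_id="TRAINING_TABLE_ID",
    validation_table_id="VALIDATION_TABLE_ID",
    model_template_type="LIGHTGBM",
    objective="binary",
    metric="auc",
    feature_list_id="FEATURE_LIST_ID",
)
```

The result can be sent as in the examples above, with client.post("/ml_model", json=example_payload).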

Refit a Model

Retrain an existing model on new data while keeping the same feature list and template:

response = client.post(
    f"/ml_model/{ml_model_id}/refit",
    json={
        "data_source": {
            "type": "train_valid_observation_tables",
            "training_table_id": new_training_table_id,
            "validation_table_id": None,
        },
        "model_name": "Daily Sales Forecast - Refit Q2",
    },
)
task_id = response.json()["id"]
task = wait_for_task(client, task_id)

# Get the new model ID from the completed task
new_model_id = task.get("payload", {}).get("output_document_id")

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| data_source | object | Yes | Training data configuration |
| data_source.type | string | Yes | Data source type: "train_valid_observation_tables" |
| data_source.training_table_id | string | Yes | ID of the new training observation table |
| data_source.validation_table_id | string or null | No | Set to None for refit: the tuned parameters (including number of trees from early stopping) are reused from the original model, so no validation set is needed |
| model_name | string | No | Name for the refit model (defaults to original name with suffix) |
| decision_threshold | float | No | Decision threshold for classification models (between 0 and 1) |
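The refit example above omits decision_threshold. The sketch below shows one way to build the request body with that optional field and a local range check before sending. build_refit_payload is a hypothetical helper; treating the bounds 0 and 1 as inclusive is an assumption, and the table ID is a placeholder.

```python
from typing import Any, Dict, Optional


def build_refit_payload(
    new_training_table_id: str,
    model_name: Optional[str] = None,
    decision_threshold: Optional[float] = None,
) -> Dict[str, Any]:
    """Assemble a POST /ml_model/{ml_model_id}/refit body."""
    payload: Dict[str, Any] = {
        "data_source": {
            "type": "train_valid_observation_tables",
            "training_table_id": new_training_table_id,
            # No validation table: tuned parameters come from the original model.
            "validation_table_id": None,
        }
    }
    if model_name is not None:
        payload["model_name"] = model_name
    if decision_threshold is not None:
        # Assumption: the bounds 0 and 1 are inclusive.
        if not 0.0 <= decision_threshold <= 1.0:
            raise ValueError("decision_threshold must be between 0 and 1")
        payload["decision_threshold"] = decision_threshold
    return payload


refit_payload = build_refit_payload(
    "NEW_TRAINING_TABLE_ID", decision_threshold=0.35
)
```

Omitting model_name from the body leaves the default naming (original name with suffix) to the server.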