Model Training¶
See also
UI Tutorial: Create New Feature Lists and Models | UI Tutorial: Refit Model | Concepts: ML Model | API Tutorial: Credit Default — Steps 11-12b
Prerequisites
This page uses the client and wait_for_task helpers defined in API Overview.
This page covers how to train models using the FeatureByte API, including resolving model templates, configuring parameters, and training on different observation tables.
Resolve Model Template and Settings¶
Before training, you need to determine the appropriate objective, metric, and model template. The API provides endpoints to suggest these automatically.
Step 1: Get Suggested Settings¶
```python
client = fb.Configurations().get_client()
response = client.get(
    f"/use_case/{use_case_id}/ml_model_template_setting",
    params={
        "training_table_id": training_table_id,
        "validation_table_id": validation_table_id,
        "feature_list_id": feature_list_id,  # or use feature_selection_id instead
        "machine_learning_role": "OUTCOME",
    },
)
settings = response.json()
# Returns: objective, metric, calibration_method, use_naive_as_offset
```
Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| training_table_id | string | Yes | ID of the training observation table |
| validation_table_id | string | No | ID of the validation observation table |
| feature_list_id | string | One of | ID of the feature list to train with |
| feature_selection_id | string | One of | ID of a feature selection (alternative to feature_list_id) |
| machine_learning_role | string | Yes | Role of the model: "OUTCOME" |
Response fields:

| Field | Type | Description |
|---|---|---|
| objective | string | Suggested optimization objective (e.g., "regression", "binary") |
| metric | string | Suggested evaluation metric (e.g., "rmse", "auc") |
| calibration_method | string | Suggested calibration method, if any |
| use_naive_as_offset | boolean | Whether to use naive prediction as offset (for forecast use cases) |
Step 2: Get Available Templates¶
```python
response = client.get(
    f"/use_case/{use_case_id}/ml_model_template",
    params={
        "feature_list_id": feature_list_id,  # or use feature_selection_id instead
        "training_table_id": training_table_id,
        "objective": settings["objective"],
        "metric": settings["metric"],
        "machine_learning_role": "OUTCOME",
    },
)
templates = response.json()["data"]
```
Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| feature_list_id | string | One of | ID of the feature list |
| feature_selection_id | string | One of | ID of a feature selection (alternative to feature_list_id) |
| training_table_id | string | No | ID of the training observation table |
| objective | string | Yes | Optimization objective (from Step 1) |
| metric | string | Yes | Evaluation metric (from Step 1) |
| machine_learning_role | string | Yes | Role of the model: "OUTCOME" |
Each template in the response includes:

| Field | Type | Description |
|---|---|---|
| type | string | Template type (e.g., "LIGHTGBM") |
| id | string | Template ID |
| preprocessors | array | Preprocessor nodes with node_name and parameters_metadata |
| model | object | Model node with node_name and parameters_metadata |
Each entry in parameters_metadata contains name, default_value, type, and description.
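Rather than always taking the first template, a specific one can be picked by its type field. A minimal sketch (the sample template data and the select_template helper are illustrative, not part of the API):

```python
# Illustrative stand-in for the Step 2 response; real entries also
# carry "preprocessors" and "model" nodes.
templates = [
    {"type": "LIGHTGBM", "id": "tmpl-1"},
    {"type": "LINEAR", "id": "tmpl-2"},  # hypothetical second type
]

def select_template(templates, template_type):
    """Return the first template matching the given type, or None."""
    return next((t for t in templates if t["type"] == template_type), None)

template = select_template(templates, "LIGHTGBM")
```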
Step 3: Extract Default Parameters¶
Each template contains preprocessor and model parameters with default values:
```python
template = templates[0]  # or select a specific template by type

node_name_to_parameters = {}
for preprocessor in template.get("preprocessors", []):
    params = {
        p["name"]: p["default_value"]
        for p in preprocessor.get("parameters_metadata", [])
        if p.get("default_value") is not None
    }
    if params:
        node_name_to_parameters[preprocessor["node_name"]] = params

model_info = template.get("model", {})
if model_info:
    params = {
        p["name"]: p["default_value"]
        for p in model_info.get("parameters_metadata", [])
        if p.get("default_value") is not None
    }
    if params:
        node_name_to_parameters[model_info["node_name"]] = params
```
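The extracted defaults can be overridden before training by editing the resulting dictionary. A minimal sketch, assuming a hypothetical node name "model_node" and illustrative LightGBM-style parameter names (the real names come from the template's parameters_metadata):

```python
# Illustrative defaults as extracted in Step 3; node and parameter
# names are hypothetical and depend on the chosen template.
node_name_to_parameters = {
    "model_node": {"learning_rate": 0.1, "num_leaves": 31},
}

# Override a single hyperparameter while keeping the other defaults.
node_name_to_parameters["model_node"]["learning_rate"] = 0.05
```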
Train a Model¶
You can train from either a feature list or a feature selection (from an ideation). When training from a feature selection, the system automatically creates a feature list during training.
From a Feature List¶
```python
payload = {
    "use_case_id": use_case_id,
    "model_name": "My Model",
    "data_source": {
        "type": "train_valid_observation_tables",
        "training_table_id": training_table_id,
        "validation_table_id": validation_table_id,
    },
    "feature_list_id": feature_list_id,
    "model_template_type": template["type"],
    "objective": settings["objective"],
    "metric": settings["metric"],
    "node_name_to_parameters": node_name_to_parameters,
    "role": "OUTCOME",
}

# Optional parameters
if settings.get("calibration_method"):
    payload["calibration_method"] = settings["calibration_method"]
if settings.get("use_naive_as_offset") is not None:
    payload["use_naive_as_offset"] = settings["use_naive_as_offset"]

response = client.post("/ml_model", json=payload)
task_id = response.json()["id"]
wait_for_task(client, task_id)

# Get the model ID from the task result
task_result = client.get(f"/task/{task_id}").json()
ml_model_id = task_result.get("payload", {}).get("output_document_id")
```
From a Feature Selection (Ideation)¶
Train directly from a feature selection produced by an ideation pipeline, without creating a feature list first:
```python
payload = {
    "use_case_id": use_case_id,
    "model_name": "My Model - from Selection",
    "data_source": {
        "type": "train_valid_observation_tables",
        "training_table_id": training_table_id,
        "validation_table_id": validation_table_id,
    },
    "feature_source": {
        "type": "feature_selection",
        "document_id": feature_selection_id,
    },
    "model_template_type": template["type"],
    "objective": settings["objective"],
    "metric": settings["metric"],
    "node_name_to_parameters": node_name_to_parameters,
    "role": "OUTCOME",
}

response = client.post("/ml_model", json=payload)
task_id = response.json()["id"]
task = wait_for_task(client, task_id)
ml_model_id = task.get("payload", {}).get("output_document_id")
```
The feature list is automatically created from the selection during training and cached for reuse.
The trained model can be retrieved with `GET /catalog/ml_model/{ml_model_id}`.
Model response fields:

| Field | Type | Description |
|---|---|---|
| id | string | Model ID |
| name | string | Model name |
| use_case_id | string | Associated use case |
| feature_list_id | string | Feature list used for training |
| feature_list_name | string | Feature list name |
| model_template_type | string | Template type (e.g., "LIGHTGBM") |
| target_type | string | Target type ("REGRESSION", "BINARY_CLASSIFICATION", etc.) |
| feature_importance | array | Per-feature importance with feature, importance, importance_percent, cumulative_importance_percent |
| feature_key_importance | array | Per-feature-key importance (groups feature variants) |
| data_source | object | Training data configuration |
| is_pipeline_generated | boolean | Whether the model was created by an ideation pipeline |
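As an example of working with the response, the feature_importance array can be filtered down to the features that account for most of the importance. This is a sketch with illustrative data; the commented-out GET call assumes the client from API Overview, and the top_features helper is not part of the API:

```python
# Retrieve the trained model document (assumes `client` and
# `ml_model_id` from the training steps above):
# model = client.get(f"/catalog/ml_model/{ml_model_id}").json()

# Illustrative feature_importance payload shaped like the field above.
model = {
    "feature_importance": [
        {"feature": "f_a", "importance": 60.0, "importance_percent": 60.0},
        {"feature": "f_b", "importance": 30.0, "importance_percent": 30.0},
        {"feature": "f_c", "importance": 10.0, "importance_percent": 10.0},
    ]
}

def top_features(model, coverage=0.8):
    """Return features whose cumulative importance reaches `coverage`."""
    rows = sorted(
        model["feature_importance"],
        key=lambda r: r["importance_percent"],
        reverse=True,
    )
    selected, total = [], 0.0
    for row in rows:
        selected.append(row["feature"])
        total += row["importance_percent"] / 100.0
        if total >= coverage:
            break
    return selected

print(top_features(model))  # features covering at least 80% of importance
```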
Training request parameters (POST /ml_model):

| Parameter | Type | Required | Description |
|---|---|---|---|
| use_case_id | string | Yes | ID of the use case |
| model_name | string | Yes | Display name for the model |
| data_source | object | Yes | Training data configuration |
| data_source.type | string | Yes | Data source type: "train_valid_observation_tables" |
| data_source.training_table_id | string | Yes | ID of the training observation table |
| data_source.validation_table_id | string | Yes | ID of the validation observation table |
| feature_list_id | string | One of | ID of the feature list to use (shorthand for feature_source with type: "feature_list") |
| feature_source | object | One of | Feature source; use instead of feature_list_id to train from a feature selection |
| feature_source.type | string | — | "feature_list" or "feature_selection" |
| feature_source.document_id | string | — | ID of the feature list or feature selection |
| model_template_type | string | Yes | Template type from Step 2 (e.g., "LIGHTGBM") |
| objective | string | Yes | Optimization objective from Step 1 |
| metric | string | Yes | Evaluation metric from Step 1 |
| node_name_to_parameters | object | No | Hyperparameter overrides per node (from Step 3) |
| role | string | Yes | Machine learning role: "OUTCOME" |
| calibration_method | string | No | Calibration method (from Step 1, if applicable) |
| use_naive_as_offset | boolean | No | Use naive prediction as offset (forecast use cases) |
Refit a Model¶
Retrain an existing model on new data while keeping the same feature list and template:
```python
response = client.post(
    f"/ml_model/{ml_model_id}/refit",
    json={
        "data_source": {
            "type": "train_valid_observation_tables",
            "training_table_id": new_training_table_id,
            "validation_table_id": None,
        },
        "model_name": "Daily Sales Forecast - Refit Q2",
    },
)
task_id = response.json()["id"]
task = wait_for_task(client, task_id)

# Get the new model ID from the completed task
new_model_id = task.get("payload", {}).get("output_document_id")
```
Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| data_source | object | Yes | Training data configuration |
| data_source.type | string | Yes | Data source type: "train_valid_observation_tables" |
| data_source.training_table_id | string | Yes | ID of the new training observation table |
| data_source.validation_table_id | string or null | No | Set to None for refit: the tuned parameters (including the number of trees from early stopping) are reused from the original model, so no validation set is needed |
| model_name | string | No | Name for the refit model (defaults to the original name with a suffix) |
| decision_threshold | float | No | Decision threshold for classification models (between 0 and 1) |
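For classification models, the refit payload can also set decision_threshold. A minimal sketch of such a payload (the threshold value, model name, and table ID are illustrative):

```python
# Refit payload for a binary classification model, overriding the
# decision threshold (0.3 is illustrative; must be between 0 and 1).
refit_payload = {
    "data_source": {
        "type": "train_valid_observation_tables",
        "training_table_id": "new-training-table-id",  # placeholder ID
        "validation_table_id": None,
    },
    "model_name": "Credit Default - Refit",
    "decision_threshold": 0.3,
}

assert 0 < refit_payload["decision_threshold"] < 1
# response = client.post(f"/ml_model/{ml_model_id}/refit", json=refit_payload)
```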