Model Training

Resolve Model Template and Settings

Before training, you need to determine the appropriate objective, metric, and model template. The API provides endpoints to suggest these automatically.

Step 1: Get Suggested Settings

import featurebyte as fb

# API client built from the local FeatureByte configuration
client = fb.Configurations().get_client()

response = client.get(
    f"/use_case/{use_case_id}/ml_model_template_setting",
    params={
        "training_table_id": training_table_id,
        "validation_table_id": validation_table_id,
        "feature_list_id": feature_list_id,  # or use feature_selection_id instead
        "machine_learning_role": "OUTCOME",
    },
)
settings = response.json()
# Returns: objective, metric, calibration_method, use_naive_as_offset

Parameters:

Parameter	Type	Required	Description
`training_table_id`	string	Yes	ID of the training observation table
`validation_table_id`	string	No	ID of the validation observation table
`feature_list_id`	string	One of	ID of the feature list to train with
`feature_selection_id`	string	One of	ID of a feature selection (alternative to `feature_list_id`)
`machine_learning_role`	string	Yes	Role of the model: `"OUTCOME"`

Response fields:

Field	Type	Description
`objective`	string	Suggested optimization objective (e.g., `"regression"`, `"binary"`)
`metric`	string	Suggested evaluation metric (e.g., `"rmse"`, `"auc"`)
`calibration_method`	string	Suggested calibration method, if any
`use_naive_as_offset`	boolean	Whether to use naive prediction as offset (for forecast use cases)
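The parameter table marks `feature_list_id` and `feature_selection_id` as "One of". A minimal sketch of enforcing that rule before sending the request is shown here. `build_feature_params` is a hypothetical helper, not part of the API, and reading "One of" as "exactly one" is an assumption. The IDs are placeholders.

```python
from typing import Optional


def build_feature_params(
    feature_list_id: Optional[str] = None,
    feature_selection_id: Optional[str] = None,
) -> dict:
    """Return the feature-source query parameter, requiring exactly one ID.

    Hypothetical helper: the "exactly one" rule is an assumption based on
    the "One of" marking in the parameter table.
    """
    if (feature_list_id is None) == (feature_selection_id is None):
        raise ValueError(
            "Provide exactly one of feature_list_id or feature_selection_id"
        )
    if feature_list_id is not None:
        return {"feature_list_id": feature_list_id}
    return {"feature_selection_id": feature_selection_id}


# Query parameters for the settings endpoint, using placeholder IDs
params = {
    "training_table_id": "training-table-id",
    "machine_learning_role": "OUTCOME",
    **build_feature_params(feature_selection_id="feature-selection-id"),
}
```

The resulting `params` dictionary can be passed as `params=` to the `client.get(...)` call above.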

Step 2: Get Available Templates

response = client.get(
    f"/use_case/{use_case_id}/ml_model_template",
    params={
        "feature_list_id": feature_list_id,  # or use feature_selection_id instead
        "training_table_id": training_table_id,
        "objective": settings["objective"],
        "metric": settings["metric"],
        "machine_learning_role": "OUTCOME",
    },
)
templates = response.json()["data"]

Parameters:

Parameter	Type	Required	Description
`feature_list_id`	string	One of	ID of the feature list
`feature_selection_id`	string	One of	ID of a feature selection (alternative to `feature_list_id`)
`training_table_id`	string	No	ID of the training observation table
`objective`	string	Yes	Optimization objective (from Step 1)
`metric`	string	Yes	Evaluation metric (from Step 1)
`machine_learning_role`	string	Yes	Role of the model: `"OUTCOME"`

Each template in the response includes:

Field	Type	Description
`type`	string	Template type (e.g., `"LIGHTGBM"`)
`id`	string	Template ID
`preprocessors`	array	Preprocessor nodes with `node_name` and `parameters_metadata`
`model`	object	Model node with `node_name` and `parameters_metadata`

Each entry in `parameters_metadata` contains `name`, `default_value`, `type`, and `description`.
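When more than one template is returned, it may be convenient to pick one by its `type` field rather than by position. The sketch below assumes only the fields listed above. `select_template` is a hypothetical helper and the sample data is made up for illustration.

```python
def select_template(templates: list, template_type: str) -> dict:
    """Return the first template whose "type" matches template_type."""
    for template in templates:
        if template.get("type") == template_type:
            return template
    available = sorted({t.get("type", "?") for t in templates})
    raise LookupError(
        f"No template of type {template_type!r}; available: {available}"
    )


# Illustrative data shaped like the fields listed above (values are made up)
sample_templates = [
    {"type": "LIGHTGBM", "id": "template-1", "preprocessors": [], "model": {}},
]
chosen = select_template(sample_templates, "LIGHTGBM")
```

Raising an error that lists the available types makes a mismatch easier to diagnose than an `IndexError` from `templates[0]` on an empty list.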

Step 3: Extract Default Parameters

Each template contains preprocessor and model parameters with default values:

template = templates[0]  # or select a specific template by type

node_name_to_parameters = {}
for preprocessor in template.get("preprocessors", []):
    params = {
        p["name"]: p["default_value"]
        for p in preprocessor.get("parameters_metadata", [])
        if p.get("default_value") is not None
    }
    if params:
        node_name_to_parameters[preprocessor["node_name"]] = params

model_info = template.get("model", {})
if model_info:
    params = {
        p["name"]: p["default_value"]
        for p in model_info.get("parameters_metadata", [])
        if p.get("default_value") is not None
    }
    if params:
        node_name_to_parameters[model_info["node_name"]] = params
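The same extraction can be written as a pair of small functions, which also makes it easy to override a single default before training. This is a sketch under the field names documented above. The function names, node names, and parameter names are all hypothetical, and the sample template is invented for illustration.

```python
def default_parameters(node: dict) -> dict:
    """Collect non-null defaults from a node's parameters_metadata."""
    return {
        p["name"]: p["default_value"]
        for p in node.get("parameters_metadata", [])
        if p.get("default_value") is not None
    }


def extract_node_parameters(template: dict) -> dict:
    """Map each node_name to its default parameters, skipping empty nodes."""
    nodes = list(template.get("preprocessors", []))
    if template.get("model"):
        nodes.append(template["model"])
    result = {}
    for node in nodes:
        params = default_parameters(node)
        if params:
            result[node["node_name"]] = params
    return result


# Made-up template; real node and parameter names come from the API response
sample_template = {
    "type": "LIGHTGBM",
    "preprocessors": [
        {"node_name": "example_preprocessor", "parameters_metadata": []},
    ],
    "model": {
        "node_name": "example_model",
        "parameters_metadata": [
            {"name": "example_rate", "default_value": 0.1,
             "type": "float", "description": ""},
            {"name": "example_optional", "default_value": None,
             "type": "int", "description": ""},
        ],
    },
}
node_name_to_parameters = extract_node_parameters(sample_template)

# Override one default before training
node_name_to_parameters["example_model"]["example_rate"] = 0.05
```

Nodes with no non-null defaults are left out of the mapping, matching the behavior of the loop above.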

Train a Model

You can train from either a feature list or a feature selection (from an ideation). When training from a feature selection, the system automatically creates a feature list during training.

From a Feature List

payload = {
    "use_case_id": use_case_id,
    "model_name": "My Model",
    "data_source": {
        "type": "train_valid_observation_tables",
        "training_table_id": training_table_id,
        "validation_table_id": validation_table_id,
    },
    "feature_list_id": feature_list_id,
    "model_template_type": template["type"],
    "objective": settings["objective"],
    "metric": settings["metric"],
    "node_name_to_parameters": node_name_to_parameters,
    "role": "OUTCOME",
}

# Optional parameters
if settings.get("calibration_method"):
    payload["calibration_method"] = settings["calibration_method"]
if settings.get("use_naive_as_offset") is not None:
    payload["use_naive_as_offset"] = settings["use_naive_as_offset"]

response = client.post("/ml_model", json=payload)
task_id = response.json()["id"]
wait_for_task(client, task_id)

# Get the model ID from the task result
task_result = client.get(f"/task/{task_id}").json()
ml_model_id = task_result.get("payload", {}).get("output_document_id")
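The examples on this page call `wait_for_task`, which is not defined here. One possible implementation is sketched below: it polls `GET /task/{task_id}` until the task reaches a terminal state and returns the final task document. The `status` field name and the state names `"SUCCESS"` and `"FAILURE"` are assumptions not confirmed by this page, so both are exposed as arguments.

```python
import time


def wait_for_task(
    client,
    task_id,
    poll_interval=2.0,
    timeout=3600.0,
    success_states=("SUCCESS",),
    failure_states=("FAILURE",),
):
    """Poll GET /task/{task_id} until the task reaches a terminal state.

    Assumptions (not confirmed by this page): the task document has a
    "status" field, and the state names passed in match the server's.
    """
    deadline = time.monotonic() + timeout
    while True:
        task = client.get(f"/task/{task_id}").json()
        status = task.get("status")
        if status in success_states:
            return task
        if status in failure_states:
            raise RuntimeError(f"Task {task_id} ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Task {task_id} did not finish within {timeout} seconds"
            )
        time.sleep(poll_interval)
```

Because the function returns the final task document, the model ID can be read directly from its `payload.output_document_id` without a second request.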

From a Feature Selection (Ideation)

Train directly from a feature selection produced by an ideation pipeline, without creating a feature list first:

payload = {
    "use_case_id": use_case_id,
    "model_name": "My Model - from Selection",
    "data_source": {
        "type": "train_valid_observation_tables",
        "training_table_id": training_table_id,
        "validation_table_id": validation_table_id,
    },
    "feature_source": {
        "type": "feature_selection",
        "document_id": feature_selection_id,
    },
    "model_template_type": template["type"],
    "objective": settings["objective"],
    "metric": settings["metric"],
    "node_name_to_parameters": node_name_to_parameters,
    "role": "OUTCOME",
}

response = client.post("/ml_model", json=payload)
task_id = response.json()["id"]
task = wait_for_task(client, task_id)

ml_model_id = task.get("payload", {}).get("output_document_id")

The feature list is automatically created from the selection during training and cached for reuse.

The trained model can be retrieved with `GET /catalog/ml_model/{ml_model_id}`.

Model response fields:

Field	Type	Description
`id`	string	Model ID
`name`	string	Model name
`use_case_id`	string	Associated use case
`feature_list_id`	string	Feature list used for training
`feature_list_name`	string	Feature list name
`model_template_type`	string	Template type (e.g., `"LIGHTGBM"`)
`target_type`	string	Target type (`"REGRESSION"`, `"BINARY_CLASSIFICATION"`, etc.)
`feature_importance`	array	Per-feature importance with `feature`, `importance`, `importance_percent`, `cumulative_importance_percent`
`feature_key_importance`	array	Per-feature-key importance (groups feature variants)
`data_source`	object	Training data configuration
`is_pipeline_generated`	boolean	Whether the model was created by an ideation pipeline
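As an illustration of working with the `feature_importance` array, the sketch below lists the most important features until a cumulative share is reached. It relies only on the `feature`, `importance`, and `importance_percent` fields named above. The 0 to 100 scale for percentages is an assumption, `top_features` is a hypothetical helper, and the sample rows are invented.

```python
def top_features(feature_importance: list, cumulative_cutoff: float = 90.0) -> list:
    """Return feature names, most important first, up to the cumulative cutoff.

    Assumes importance_percent values are on a 0-100 scale.
    """
    ranked = sorted(
        feature_importance, key=lambda row: row["importance"], reverse=True
    )
    selected, running_total = [], 0.0
    for row in ranked:
        if running_total >= cumulative_cutoff:
            break
        selected.append(row["feature"])
        running_total += row["importance_percent"]
    return selected


# Invented rows shaped like the documented fields
sample_importance = [
    {"feature": "feature_c", "importance": 15.0, "importance_percent": 15.0},
    {"feature": "feature_a", "importance": 50.0, "importance_percent": 50.0},
    {"feature": "feature_d", "importance": 5.0, "importance_percent": 5.0},
    {"feature": "feature_b", "importance": 30.0, "importance_percent": 30.0},
]
top = top_features(sample_importance, cumulative_cutoff=90.0)
```

With real data, `feature_importance` would come from the model document, for example `client.get(f"/catalog/ml_model/{ml_model_id}").json()["feature_importance"]`.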

Request body parameters for POST /ml_model:

Parameter	Type	Required	Description
`use_case_id`	string	Yes	ID of the use case
`model_name`	string	Yes	Display name for the model
`data_source`	object	Yes	Training data configuration
`data_source.type`	string	Yes	Data source type: `"train_valid_observation_tables"`
`data_source.training_table_id`	string	Yes	ID of the training observation table
`data_source.validation_table_id`	string	Yes	ID of the validation observation table
`feature_list_id`	string	One of	ID of the feature list to use (shorthand for `feature_source` with `type: "feature_list"`)
`feature_source`	object	One of	Feature source — use instead of `feature_list_id` to train from a feature selection
`feature_source.type`	string	—	`"feature_list"` or `"feature_selection"`
`feature_source.document_id`	string	—	ID of the feature list or feature selection
`model_template_type`	string	Yes	Template type from Step 2 (e.g., `"LIGHTGBM"`)
`objective`	string	Yes	Optimization objective from Step 1
`metric`	string	Yes	Evaluation metric from Step 1
`node_name_to_parameters`	object	No	Hyperparameter overrides per node (from Step 3)
`role`	string	Yes	Machine learning role: `"OUTCOME"`
`calibration_method`	string	No	Calibration method (from Step 1, if applicable)
`use_naive_as_offset`	boolean	No	Use naive prediction as offset (forecast use cases)
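A payload can be checked against the table above before it is posted. The following is a client-side sketch only: `validate_training_payload` is a hypothetical helper, the server may apply different or additional rules, and treating "One of" as "exactly one" is an assumption. The example payload uses placeholder values.

```python
REQUIRED_KEYS = (
    "use_case_id", "model_name", "data_source",
    "model_template_type", "objective", "metric", "role",
)
REQUIRED_DATA_SOURCE_KEYS = ("type", "training_table_id", "validation_table_id")


def validate_training_payload(payload: dict) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if key not in payload]
    data_source = payload.get("data_source") or {}
    problems += [
        f"missing data_source.{key}"
        for key in REQUIRED_DATA_SOURCE_KEYS
        if key not in data_source
    ]
    # Assumption: exactly one of the two feature inputs should be present
    if ("feature_list_id" in payload) == ("feature_source" in payload):
        problems.append("provide exactly one of feature_list_id or feature_source")
    return problems


# Placeholder values, not real document IDs
example_payload = {
    "use_case_id": "use-case-id",
    "model_name": "My Model",
    "data_source": {
        "type": "train_valid_observation_tables",
        "training_table_id": "training-table-id",
        "validation_table_id": "validation-table-id",
    },
    "feature_list_id": "feature-list-id",
    "model_template_type": "LIGHTGBM",
    "objective": "regression",
    "metric": "rmse",
    "role": "OUTCOME",
}
problems = validate_training_payload(example_payload)
```

Returning a list of problems instead of raising on the first one lets a caller report every issue at once.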

Refit a Model

Retrain an existing model on new data while keeping the same feature list and template:

response = client.post(
    f"/ml_model/{ml_model_id}/refit",
    json={
        "data_source": {
            "type": "train_valid_observation_tables",
            "training_table_id": new_training_table_id,
            "validation_table_id": None,
        },
        "model_name": "Daily Sales Forecast - Refit Q2",
    },
)
task_id = response.json()["id"]
task = wait_for_task(client, task_id)

# Get the new model ID from the completed task
new_model_id = task.get("payload", {}).get("output_document_id")

Parameters:

Parameter	Type	Required	Description
`data_source`	object	Yes	Training data configuration
`data_source.type`	string	Yes	Data source type: `"train_valid_observation_tables"`
`data_source.training_table_id`	string	Yes	ID of the new training observation table
`data_source.validation_table_id`	string or null	No	Set to `None` (serialized as JSON `null`) for refit. The tuned parameters, including the number of trees from early stopping, are reused from the original model, so no validation set is needed
`model_name`	string	No	Name for the refit model (defaults to original name with suffix)
`decision_threshold`	float	No	Decision threshold for classification models (between 0 and 1)
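The refit body can be assembled with a small helper that adds the optional fields only when they are set and checks the threshold range from the table. This is a sketch: `build_refit_payload` is a hypothetical name, and treating the bounds 0 and 1 as inclusive is an assumption.

```python
from typing import Optional


def build_refit_payload(
    training_table_id: str,
    model_name: Optional[str] = None,
    decision_threshold: Optional[float] = None,
) -> dict:
    """Assemble the JSON body for POST /ml_model/{ml_model_id}/refit."""
    payload = {
        "data_source": {
            "type": "train_valid_observation_tables",
            "training_table_id": training_table_id,
            "validation_table_id": None,  # serialized as JSON null
        },
    }
    if model_name is not None:
        payload["model_name"] = model_name
    if decision_threshold is not None:
        # Assumption: the documented range 0 to 1 includes both endpoints
        if not 0.0 <= decision_threshold <= 1.0:
            raise ValueError("decision_threshold must be between 0 and 1")
        payload["decision_threshold"] = decision_threshold
    return payload


# Placeholder ID and an arbitrary threshold for illustration
refit_payload = build_refit_payload(
    "new-training-table-id",
    model_name="Daily Sales Forecast - Refit Q2",
    decision_threshold=0.35,
)
```

The result can be passed as `json=refit_payload` to the `client.post(...)` call shown above.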