
Semantic Detection

Prerequisites

This page uses the client and wait_for_task helpers defined in API Overview.

Semantic detection runs automatically as part of the ideation pipeline. However, you can run it beforehand to review the recommendations and set the ground truth at the table level. Once semantics are set on a table, the ideation pipeline will use them directly instead of guessing.

The typical workflow is:

  1. Run semantic detection to get recommendations
  2. Review the results and apply the correct semantics to your table columns
  3. Run ideation — it will use the established semantics without re-detecting

Run Semantic Detection

import featurebyte as fb

# Authenticated API client (same helper as in API Overview)
client = fb.Configurations().get_client()

response = client.post(
    "/semantic_detection/column_semantic_detection",
    json={
        "use_case_id": use_case_id,
        "table_id": table_id,
        "sample_enabled": True,
    },
)
task_id = response.json()["id"]
task = wait_for_task(client, task_id)

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| use_case_id | string | One of use_case_id or table_id | ID of the use case; detection runs across all tables in the use case |
| table_id | string | One of use_case_id or table_id | ID of a specific registered table to analyze |
| sample_enabled | boolean | No | Whether to sample the table for faster detection (default: true) |
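
The request body can also be assembled by a small helper that checks an ID was supplied before the call is made. This is a minimal sketch: `build_detection_payload` is a hypothetical name, not part of the API. It only requires at least one of the two IDs, because the table above lists them as "one of" while the example request sends both.

```python
from typing import Any, Dict, Optional


def build_detection_payload(
    use_case_id: Optional[str] = None,
    table_id: Optional[str] = None,
    sample_enabled: bool = True,
) -> Dict[str, Any]:
    """Build the JSON body for POST /semantic_detection/column_semantic_detection.

    Hypothetical helper: it only enforces that at least one ID is present.
    """
    if use_case_id is None and table_id is None:
        raise ValueError("Provide use_case_id or table_id")

    payload: Dict[str, Any] = {"sample_enabled": sample_enabled}
    # Only include the IDs that were actually supplied.
    if use_case_id is not None:
        payload["use_case_id"] = use_case_id
    if table_id is not None:
        payload["table_id"] = table_id
    return payload
```

The result can be passed as the `json=` argument of the `client.post` call shown above.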

Get Detection Results

semantic_detection_id = task.get("payload", {}).get("output_document_id")

response = client.get(f"/semantic_detection/{semantic_detection_id}")
detection = response.json()

Response fields:

| Field | Type | Description |
| --- | --- | --- |
| id | string | Semantic detection ID |
| suggested_semantics | array | Per-column semantic tag recommendations (see below) |
| suggested_alias | array | Column alias suggestions |
| suggested_transforms | object | Suggested data transforms |
| entity_selection | object | Entity selection recommendations |

Each item in suggested_semantics contains:

| Field | Type | Description |
| --- | --- | --- |
| table_id | string | ID of the table |
| table_name | string | Name of the table |
| column_name | string | Column being analyzed |
| column_description | string | Column description |
| existing_semantic_tag | string | Current semantic tag (if any) |
| proposed_semantic_tag | string | AI-recommended semantic tag |
| final_semantic_tag | string | Final semantic tag to apply |
| final_semantic_tag_description | string | Description of the final tag |
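
When reviewing results, it can help to look only at the columns whose tag would actually change. The sketch below filters `suggested_semantics` down to items whose `final_semantic_tag` is set and differs from `existing_semantic_tag`. `changed_suggestions` and `print_review` are hypothetical helper names, and the code assumes missing or empty tags come back as `None` or an empty string.

```python
from typing import Any, Dict, List


def changed_suggestions(detection: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return suggestions whose final tag differs from the existing tag."""
    changed = []
    for item in detection.get("suggested_semantics", []):
        final_tag = item.get("final_semantic_tag")
        # Skip columns with no final tag or with an unchanged tag.
        if final_tag and final_tag != item.get("existing_semantic_tag"):
            changed.append(item)
    return changed


def print_review(detection: Dict[str, Any]) -> None:
    """Print one line per column whose semantic tag would change."""
    for item in changed_suggestions(detection):
        existing = item.get("existing_semantic_tag")
        final_tag = item["final_semantic_tag"]
        print(f'{item["table_name"]}.{item["column_name"]}: {existing!r} -> {final_tag!r}')
```

Calling `print_review(detection)` on the response from the previous step lists only the columns that need a decision.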

Apply Semantics to Table Columns

After reviewing the detection results, set the ground truth on your table columns so ideation uses them directly. Each table type has its own endpoint:

| Table type | Endpoint |
| --- | --- |
| Event table | PATCH /event_table/{id}/column_semantic |
| Item table | PATCH /item_table/{id}/column_semantic |
| Dimension table | PATCH /dimension_table/{id}/column_semantic |
| SCD table | PATCH /scd_table/{id}/column_semantic |
| Time series table | PATCH /time_series_table/{id}/column_semantic |
| Calendar table | PATCH /calendar_table/{id}/column_semantic |
| Snapshots table | PATCH /snapshots_table/{id}/column_semantic |
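
Because every endpoint follows the same pattern, the path can be derived from the table type. The sketch below does that and rejects any type not listed in the table above. `column_semantic_path` and `COLUMN_SEMANTIC_TABLE_TYPES` are hypothetical names introduced for illustration.

```python
# Path prefixes taken from the endpoint table above.
COLUMN_SEMANTIC_TABLE_TYPES = {
    "event_table",
    "item_table",
    "dimension_table",
    "scd_table",
    "time_series_table",
    "calendar_table",
    "snapshots_table",
}


def column_semantic_path(table_type: str, table_id: str) -> str:
    """Return the PATCH path for setting column semantics on a table."""
    normalized = table_type.lower()
    if normalized not in COLUMN_SEMANTIC_TABLE_TYPES:
        raise ValueError(f"No column_semantic endpoint listed for {table_type!r}")
    return f"/{normalized}/{table_id}/column_semantic"
```

Failing early on an unknown type avoids sending a PATCH to a path that is not documented.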

Example:

response = client.patch(
    f"/event_table/{table_id}/column_semantic",
    json={
        "column_semantic_updates": [
            {"column_name": "amount", "semantic": "currency"},
            {"column_name": "user_id", "semantic": "user_id"},
        ],
    },
)

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| column_semantic_updates | array | Yes | List of column semantic updates |
| column_semantic_updates[].column_name | string | Yes | Name of the column to update |
| column_semantic_updates[].semantic | string | Yes | Semantic tag to apply. See Semantic Types Reference for valid values. |

Applying all suggestions from detection:

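# Assumption: a catalog has already been activated in this session.
catalog = fb.Catalog.get_active()

for item in detection.get("suggested_semantics", []):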
    final_tag = item.get("final_semantic_tag")
    if final_tag:
        table_obj = catalog.get_table(item["table_name"])
        table_type = table_obj.type.lower()  # e.g., "event_table"
        tid = str(table_obj.id)
        client.patch(
            f"/{table_type}/{tid}/column_semantic",
            json={
                "column_semantic_updates": [
                    {"column_name": item["column_name"], "semantic": final_tag}
                ],
            },
        )

print("Semantic tags applied")
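
The loop above sends one PATCH per column. Since `column_semantic_updates` accepts a list, the updates could instead be grouped so that each table needs only one request. The following is a sketch of the grouping step only. `group_updates_by_table` is a hypothetical helper, and it keys on `table_name` on the assumption that table names are unique within the detection result.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_updates_by_table(detection: Dict[str, Any]) -> Dict[str, List[Dict[str, str]]]:
    """Group final semantic tags into one update list per table name."""
    grouped: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for item in detection.get("suggested_semantics", []):
        final_tag = item.get("final_semantic_tag")
        if final_tag:  # skip columns without a final tag
            grouped[item["table_name"]].append(
                {"column_name": item["column_name"], "semantic": final_tag}
            )
    return dict(grouped)
```

Each value in the returned dictionary has the shape expected for `column_semantic_updates`, so it can be sent in a single PATCH for the corresponding table.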

List Semantic Detections

response = client.get(
    "/semantic_detection",
    params={
        "use_case_id": use_case_id,
        "page": 1,
        "page_size": 20,
    },
)
detections = response.json()["data"]

Returns a paginated response. Each item in data contains:

| Field | Type | Description |
| --- | --- | --- |
| id | string | Semantic detection ID |
| use_case_id | string | Associated use case ID |
| table_id | string | Analyzed table ID |
| created_at | datetime | When the detection was created |
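
To read every detection rather than a single page, the request can be repeated with an increasing `page` value. The sketch below assumes that an empty page, or a page shorter than `page_size`, marks the end of the results. The response may carry other pagination fields, but only `data` is documented here. `iter_detections` is a hypothetical helper name.

```python
from typing import Any, Dict, Iterator


def iter_detections(client: Any, use_case_id: str, page_size: int = 20) -> Iterator[Dict[str, Any]]:
    """Yield semantic detections for a use case, one page at a time."""
    page = 1
    while True:
        response = client.get(
            "/semantic_detection",
            params={"use_case_id": use_case_id, "page": page, "page_size": page_size},
        )
        items = response.json()["data"]
        if not items:
            return
        yield from items
        # Assumption: a short page means there are no further pages.
        if len(items) < page_size:
            return
        page += 1
```

For example, `[d["id"] for d in iter_detections(client, use_case_id)]` collects the IDs of all detections for the use case.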