
Source Data Exploration

Prerequisites

This page uses the wait_for_task helper defined in API Overview.
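For reference, a minimal sketch of such a helper. It assumes the API exposes a `GET /task/{task_id}` endpoint returning a JSON document with a `status` field that ends in `"SUCCESS"` or `"FAILURE"`; the exact endpoint path and status values are assumptions, so see API Overview for the authoritative definition.

```python
import time


def wait_for_task(client, task_id, poll_interval=2.0, timeout=600.0):
    """Poll the task endpoint until the task finishes, then return the task document.

    Assumes GET /task/{task_id} returns JSON with a "status" field that
    settles on "SUCCESS" or "FAILURE" -- check API Overview for the
    actual endpoint and status values.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        task = client.get(f"/task/{task_id}").json()
        status = task.get("status")
        if status == "SUCCESS":
            return task
        if status == "FAILURE":
            raise RuntimeError(f"Task {task_id} failed: {task}")
        time.sleep(poll_interval)
    raise TimeoutError(f"Task {task_id} did not finish within {timeout}s")
```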

Before you register tables, the API provides tools to explore your warehouse, generate AI-powered table descriptions, and analyze source tables to detect their type. These operations help you understand your data ahead of registration.

Generate Table Summaries

Use AI to generate descriptions for your warehouse tables. This is a two-step process: first generate the summaries, then list the tables to see them.

Step 1: Generate Summaries

```python
import featurebyte as fb

client = fb.Configurations().get_client()

# Get the feature store ID
feature_store = fb.FeatureStore.get("MY_FEATURE_STORE")
feature_store_id = str(feature_store.id)

response = client.post(
    "/table/source_table_summary",
    json={
        "feature_store_id": feature_store_id,
        "database_name": "MY_DB",
        "schema_name": "MY_SCHEMA",
        "table_names": ["SALES", "CALENDAR", "STORE_STATE"],
    },
)
task_id = response.json()["id"]
wait_for_task(client, task_id)
```

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| feature_store_id | string | Yes | ID of the feature store |
| database_name | string | Yes | Database name in the warehouse |
| schema_name | string | Yes | Schema name in the warehouse |
| table_names | array | Yes | List of table names to generate summaries for |

Step 2: List Tables with Summaries

Once summaries are generated, they are included in the table listing response:

```python
response = client.get(
    f"/feature_store/{feature_store_id}/table",
    params={
        "database_name": "MY_DB",
        "schema_name": "MY_SCHEMA",
    },
)
tables = response.json()

for t in tables:
    print(f"{t['table_name']}: {t.get('summary', '(no summary)')}")
```

Response fields (each item in the array):

| Field | Type | Description |
|---|---|---|
| table_name | string | Table name |
| summary | string | AI-generated description of the table (may be null if not yet generated) |
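The two steps can also be combined: list the tables first, then request summaries only for the ones that do not have one yet. A small sketch of the selection logic, using the field names from the listing response above:

```python
def tables_missing_summaries(tables):
    """Return the names of tables whose summary has not been generated yet."""
    return [t["table_name"] for t in tables if not t.get("summary")]


# Example with the listing response shape shown above
tables = [
    {"table_name": "SALES", "summary": "Daily sales transactions."},
    {"table_name": "CALENDAR", "summary": None},
    {"table_name": "STORE_STATE"},
]
print(tables_missing_summaries(tables))  # ['CALENDAR', 'STORE_STATE']
```

The result can be passed as `table_names` in the summary-generation request, avoiding regeneration for tables that already have descriptions.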

Analyze Source Tables

Analyze a source table to detect its type and validate its columns before registration.

```python
response = client.post(
    "/table/source_table_analysis",
    json={
        "feature_store_id": feature_store_id,
        "database_name": "MY_DB",
        "schema_name": "MY_SCHEMA",
        "table_name": "SALES",
    },
)
task_id = response.json()["id"]
task = wait_for_task(client, task_id)

# Get the analysis results from the completed task
analysis_id = task.get("payload", {}).get("output_document_id")
response = client.get(f"/table/source_table_analysis/{analysis_id}")
analysis = response.json()

print("Table: SALES\n")
print("-" * 8)
print(f"Suggested type:\n{analysis['table_type']}\n")
print("-" * 8)
print(f"Type explanation:\n{analysis['type_explanation']}\n")
print("-" * 8)
print(f"Setting explanation:\n{analysis['setting_explanation']}\n")
print("-" * 8)
print(f"Warnings: {analysis['warnings']}")
```

The analysis detects the likely table type and suggests which columns to use for timestamps, keys, and series IDs. Use these suggestions when registering the table via the SDK.
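As an illustration, the suggestions can be translated into keyword arguments for table registration. The sketch below handles the event-table case only; the analysis field names follow the response documented below, while the keyword names are assumed to match the SDK's `create_event_table`, so verify against the SDK reference before use:

```python
def event_table_kwargs(analysis):
    """Build registration keyword arguments from an analysis result.

    Only meaningful when analysis["table_type"] == "event_table". The
    keyword names are assumed to match the SDK's create_event_table.
    """
    if analysis.get("table_type") != "event_table":
        raise ValueError(f"Not an event table: {analysis.get('table_type')}")
    kwargs = {
        "event_id_column": analysis.get("event_id_column"),
        "event_timestamp_column": analysis.get("event_timestamp_column"),
        "record_creation_timestamp_column": analysis.get(
            "record_creation_timestamp_column"
        ),
    }
    # Drop suggestions the analysis did not populate
    return {k: v for k, v in kwargs.items() if v is not None}


analysis = {
    "table_type": "event_table",
    "event_id_column": "SALES_ID",
    "event_timestamp_column": "SOLD_AT",
    "record_creation_timestamp_column": None,
}
print(event_table_kwargs(analysis))
# {'event_id_column': 'SALES_ID', 'event_timestamp_column': 'SOLD_AT'}
```

These kwargs would then be passed to the SDK when registering the source table, e.g. `source_table.create_event_table(name="SALES", **event_table_kwargs(analysis))` (method name per the SDK; check its reference for the exact signature).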

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| feature_store_id | string | Yes | ID of the feature store |
| database_name | string | Yes | Database name in the warehouse |
| schema_name | string | Yes | Schema name in the warehouse |
| table_name | string | Yes | Table name to analyze |

Response fields (GET /table/source_table_analysis/{id}):

| Field | Type | Description |
|---|---|---|
| id | string | Analysis ID |
| fully_qualified_name | string | Fully qualified table name (DB.SCHEMA.TABLE) |
| table_name | string | Analyzed table name |
| table_type | string | Detected table type (see values below) |
| type_explanation | string | Why this table type was detected |
| setting_explanation | string | Explanation of suggested settings |
| warnings | string | Any warnings about the analysis |
| errors | string | Any errors encountered |
| event_id_column | string | Suggested event ID column (event tables) |
| dimension_id_column | string | Suggested dimension ID column (dimension tables) |
| item_id_column | string | Suggested item ID column (item tables) |
| series_id_column | string | Suggested series ID column (time series tables) |
| natural_key_column | string | Suggested natural key column (SCD tables) |
| event_timestamp_column | string | Suggested event timestamp column (event tables) |
| effective_timestamp_column | string | Suggested effective timestamp column (SCD tables) |
| end_timestamp_column | string | Suggested end timestamp column (SCD tables) |
| snapshot_datetime_column | string | Suggested snapshot datetime column (snapshots tables) |
| calendar_datetime_column | string | Suggested calendar datetime column (calendar tables) |
| reference_datetime_column | string | Suggested reference datetime column (time series tables) |
| record_creation_timestamp_column | string | Suggested record creation timestamp column |
| event_timestamp_schema | object | Timestamp schema for the event timestamp (format, timezone) |
| time_interval_unit | string | Suggested time interval unit (time series tables) |

Only fields relevant to the detected table_type will be populated; others will be null.
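A quick way to see which suggestions were populated is to filter the suggestion fields for non-null values. A sketch, with the field names taken from the response table above:

```python
SUGGESTION_FIELDS = [
    "event_id_column", "dimension_id_column", "item_id_column",
    "series_id_column", "natural_key_column", "event_timestamp_column",
    "effective_timestamp_column", "end_timestamp_column",
    "snapshot_datetime_column", "calendar_datetime_column",
    "reference_datetime_column", "record_creation_timestamp_column",
    "time_interval_unit",
]


def populated_suggestions(analysis):
    """Return only the suggestion fields the analysis actually populated."""
    return {f: analysis[f] for f in SUGGESTION_FIELDS if analysis.get(f) is not None}


analysis = {
    "table_type": "scd_table",
    "natural_key_column": "STORE_ID",
    "effective_timestamp_column": "VALID_FROM",
    "end_timestamp_column": None,
}
print(populated_suggestions(analysis))
# {'natural_key_column': 'STORE_ID', 'effective_timestamp_column': 'VALID_FROM'}
```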

Table type values:

| Value | Description |
|---|---|
| "event_table" | Event log with timestamps |
| "item_table" | Item-level data linked to events |
| "scd_table" | Slowly changing dimension |
| "dimension_table" | Static reference data |
| "snapshots_table" | Periodic snapshots of state |
| "calendar_table" | Calendar/date features |
| "time_series_table" | Regular time series data |