
Source Data Exploration

Prerequisites

This page uses the wait_for_task helper defined in API Overview.
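For reference, a minimal sketch of such a helper. It assumes the API exposes a `GET /task/{task_id}` endpoint returning a JSON document with a `status` field that ends in `"SUCCESS"` or `"FAILURE"`; the exact endpoint path and status values are assumptions, so see API Overview for the authoritative definition.

```python
import time


def wait_for_task(client, task_id, poll_interval=2.0, timeout=600.0):
    """Poll the task endpoint until the task finishes, then return the task document.

    Assumes GET /task/{task_id} returns JSON with a "status" field that
    settles on "SUCCESS" or "FAILURE" -- check API Overview for the
    actual endpoint and status values.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        task = client.get(f"/task/{task_id}").json()
        status = task.get("status")
        if status == "SUCCESS":
            return task
        if status == "FAILURE":
            raise RuntimeError(f"Task {task_id} failed: {task}")
        time.sleep(poll_interval)
    raise TimeoutError(f"Task {task_id} did not finish within {timeout}s")
```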

Before you register tables, the API provides tools to explore your warehouse, generate AI-powered table descriptions, and analyze source tables to detect their type. These operations help you understand your data ahead of registration.

Generate Table Summaries

Use AI to generate descriptions for your warehouse tables. This is a two-step process: first generate the summaries, then list the tables to see them.

Step 1: Generate Summaries

```python
import featurebyte as fb

client = fb.Configurations().get_client()

# Get the feature store ID
feature_store = fb.FeatureStore.get("MY_FEATURE_STORE")
feature_store_id = str(feature_store.id)

response = client.post(
    "/table/source_table_summary",
    json={
        "feature_store_id": feature_store_id,
        "database_name": "MY_DB",
        "schema_name": "MY_SCHEMA",
        "table_names": ["SALES", "CALENDAR", "STORE_STATE"],
    },
)
task_id = response.json()["id"]
wait_for_task(client, task_id)
```

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| feature_store_id | string | Yes | ID of the feature store |
| database_name | string | Yes | Database name in the warehouse |
| schema_name | string | Yes | Schema name in the warehouse |
| table_names | array | Yes | List of table names to generate summaries for |

Step 2: List Tables with Summaries

Once summaries are generated, they are included in the table listing response:

```python
response = client.get(
    f"/feature_store/{feature_store_id}/table",
    params={
        "database_name": "MY_DB",
        "schema_name": "MY_SCHEMA",
    },
)
tables = response.json()

for t in tables:
    print(f"{t['table_name']}: {t.get('summary', '(no summary)')}")
```

Response fields (each item in the array):

| Field | Type | Description |
|---|---|---|
| table_name | string | Table name |
| summary | string | AI-generated description of the table (may be null if not yet generated) |
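The two steps can also be combined: list the tables first, then request summaries only for the ones that do not have one yet. A small sketch of the selection logic, using the field names from the listing response above:

```python
def tables_missing_summaries(tables):
    """Return the names of tables whose summary has not been generated yet."""
    return [t["table_name"] for t in tables if not t.get("summary")]


# Example with the listing response shape shown above
tables = [
    {"table_name": "SALES", "summary": "Daily sales transactions."},
    {"table_name": "CALENDAR", "summary": None},
    {"table_name": "STORE_STATE"},
]
print(tables_missing_summaries(tables))  # ['CALENDAR', 'STORE_STATE']
```

The result can be passed as `table_names` in the summary-generation request, avoiding regeneration for tables that already have descriptions.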

Analyze Source Tables

Analyze a source table to detect its type and validate its columns before registration.

```python
response = client.post(
    "/table/source_table_analysis",
    json={
        "feature_store_id": feature_store_id,
        "database_name": "MY_DB",
        "schema_name": "MY_SCHEMA",
        "table_name": "SALES",
    },
)
task_id = response.json()["id"]
task = wait_for_task(client, task_id)

# Get the analysis results from the completed task
analysis_id = task.get("payload", {}).get("output_document_id")
response = client.get(f"/table/source_table_analysis/{analysis_id}")
analysis = response.json()

print("Table: SALES\n")
print("-" * 8)
print(f"Suggested type:\n{analysis['table_type']}\n")
print("-" * 8)
print(f"Type explanation:\n{analysis['type_explanation']}\n")
print("-" * 8)
print(f"Setting explanation:\n{analysis['setting_explanation']}\n")
print("-" * 8)
print(f"Warnings: {analysis['warnings']}")
```

The analysis detects the likely table type and suggests which columns to use for timestamps, keys, and series IDs. Use these suggestions when registering the table via the SDK.
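As an illustration, the suggestions can be translated into keyword arguments for table registration. The sketch below handles the event-table case only; the analysis field names follow the response documented below, while the keyword names are assumed to match the SDK's `create_event_table`, so verify against the SDK reference before use:

```python
def event_table_kwargs(analysis):
    """Build registration keyword arguments from an analysis result.

    Only meaningful when analysis["table_type"] == "event_table". The
    keyword names are assumed to match the SDK's create_event_table.
    """
    if analysis.get("table_type") != "event_table":
        raise ValueError(f"Not an event table: {analysis.get('table_type')}")
    kwargs = {
        "event_id_column": analysis.get("event_id_column"),
        "event_timestamp_column": analysis.get("event_timestamp_column"),
        "record_creation_timestamp_column": analysis.get(
            "record_creation_timestamp_column"
        ),
    }
    # Drop suggestions the analysis did not populate
    return {k: v for k, v in kwargs.items() if v is not None}


analysis = {
    "table_type": "event_table",
    "event_id_column": "SALES_ID",
    "event_timestamp_column": "SOLD_AT",
    "record_creation_timestamp_column": None,
}
print(event_table_kwargs(analysis))
# {'event_id_column': 'SALES_ID', 'event_timestamp_column': 'SOLD_AT'}
```

These kwargs would then be passed to the SDK when registering the source table, e.g. `source_table.create_event_table(name="SALES", **event_table_kwargs(analysis))` (method name per the SDK; check its reference for the exact signature).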

Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| feature_store_id | string | Yes | ID of the feature store |
| database_name | string | Yes | Database name in the warehouse |
| schema_name | string | Yes | Schema name in the warehouse |
| table_name | string | Yes | Table name to analyze |

Response fields (GET /table/source_table_analysis/{id}):

| Field | Type | Description |
|---|---|---|
| id | string | Analysis ID |
| fully_qualified_name | string | Fully qualified table name (DB.SCHEMA.TABLE) |
| table_name | string | Analyzed table name |
| table_type | string | Detected table type (see values below) |
| type_explanation | string | Why this table type was detected |
| setting_explanation | string | Explanation of suggested settings |
| warnings | string | Any warnings about the analysis |
| errors | string | Any errors encountered |
| event_id_column | string | Suggested event ID column (event tables) |
| dimension_id_column | string | Suggested dimension ID column (dimension tables) |
| item_id_column | string | Suggested item ID column (item tables) |
| series_id_column | string | Suggested series ID column (time series tables) |
| natural_key_column | string | Suggested natural key column (SCD tables) |
| event_timestamp_column | string | Suggested event timestamp column (event tables) |
| effective_timestamp_column | string | Suggested effective timestamp column (SCD tables) |
| end_timestamp_column | string | Suggested end timestamp column (SCD tables) |
| snapshot_datetime_column | string | Suggested snapshot datetime column (snapshots tables) |
| calendar_datetime_column | string | Suggested calendar datetime column (calendar tables) |
| reference_datetime_column | string | Suggested reference datetime column (time series tables) |
| record_creation_timestamp_column | string | Suggested record creation timestamp column |
| event_timestamp_schema | object | Timestamp schema for the event timestamp (format, timezone) |
| time_interval_unit | string | Suggested time interval unit (time series tables) |

Only fields relevant to the detected table_type will be populated; others will be null.
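A quick way to see which suggestions were populated is to filter the suggestion fields for non-null values. A sketch, with the field names taken from the response table above:

```python
SUGGESTION_FIELDS = [
    "event_id_column", "dimension_id_column", "item_id_column",
    "series_id_column", "natural_key_column", "event_timestamp_column",
    "effective_timestamp_column", "end_timestamp_column",
    "snapshot_datetime_column", "calendar_datetime_column",
    "reference_datetime_column", "record_creation_timestamp_column",
    "time_interval_unit",
]


def populated_suggestions(analysis):
    """Return only the suggestion fields the analysis actually populated."""
    return {f: analysis[f] for f in SUGGESTION_FIELDS if analysis.get(f) is not None}


analysis = {
    "table_type": "scd_table",
    "natural_key_column": "STORE_ID",
    "effective_timestamp_column": "VALID_FROM",
    "end_timestamp_column": None,
}
print(populated_suggestions(analysis))
# {'natural_key_column': 'STORE_ID', 'effective_timestamp_column': 'VALID_FROM'}
```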

Table type values:

| Value | Description |
|---|---|
| "event_table" | Event log with timestamps |
| "item_table" | Item-level data linked to events |
| "scd_table" | Slowly changing dimension |
| "dimension_table" | Static reference data |
| "snapshots_table" | Periodic snapshots of state |
| "calendar_table" | Calendar/date features |
| "time_series_table" | Regular time series data |