# Source Data Exploration
> **See also:** UI Tutorial: Register Tables | Concepts: Source Tables | API Tutorial: Credit Default — Step 1b
> **Prerequisites:** This page uses the `wait_for_task` helper defined in API Overview.
Before you register tables, the API provides tools to explore your warehouse, generate AI-powered descriptions of its tables, and analyze source tables to detect their type. These operations help you understand your data before registration.
## Generate Table Summaries
Use AI to generate descriptions for your warehouse tables. This is a two-step process: first generate the summaries, then list the tables to see them.
### Step 1: Generate Summaries
```python
import featurebyte as fb

client = fb.Configurations().get_client()

# Get the feature store ID
feature_store = fb.FeatureStore.get("MY_FEATURE_STORE")
feature_store_id = str(feature_store.id)

response = client.post(
    "/table/source_table_summary",
    json={
        "feature_store_id": feature_store_id,
        "database_name": "MY_DB",
        "schema_name": "MY_SCHEMA",
        "table_names": ["SALES", "CALENDAR", "STORE_STATE"],
    },
)
task_id = response.json()["id"]
wait_for_task(client, task_id)
```
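If you have not yet defined `wait_for_task`, a minimal polling sketch looks like the following. This is illustrative only: the `/task/{id}` endpoint path and the terminal status values are assumptions here, so defer to the helper defined in API Overview.

```python
import time

def wait_for_task(client, task_id, poll_interval=2.0, timeout=300.0):
    """Poll the task endpoint until the task reaches a terminal state.

    Sketch only: the endpoint path and status values are assumptions
    and may differ from the real API.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        task = client.get(f"/task/{task_id}").json()
        if task.get("status") in ("SUCCESS", "FAILURE"):
            return task
        time.sleep(poll_interval)
    raise TimeoutError(f"Task {task_id} did not finish within {timeout} seconds")
```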
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| feature_store_id | string | Yes | ID of the feature store |
| database_name | string | Yes | Database name in the warehouse |
| schema_name | string | Yes | Schema name in the warehouse |
| table_names | array | Yes | List of table names to generate summaries for |
### Step 2: List Tables with Summaries
Once summaries are generated, they are included in the table listing response:
```python
response = client.get(
    f"/feature_store/{feature_store_id}/table",
    params={
        "database_name": "MY_DB",
        "schema_name": "MY_SCHEMA",
    },
)
tables = response.json()
for t in tables:
    print(f"{t['table_name']}: {t.get('summary', '(no summary)')}")
```
Response fields (each item in the array):
| Field | Type | Description |
|---|---|---|
| table_name | string | Table name |
| summary | string | AI-generated description of the table (may be null if not yet generated) |
## Analyze Source Tables
Analyze a source table to detect its type and validate its columns before registration.
```python
response = client.post(
    "/table/source_table_analysis",
    json={
        "feature_store_id": feature_store_id,
        "database_name": "MY_DB",
        "schema_name": "MY_SCHEMA",
        "table_name": "SALES",
    },
)
task_id = response.json()["id"]
task = wait_for_task(client, task_id)

# Get the analysis results from the completed task
analysis_id = task.get("payload", {}).get("output_document_id")
response = client.get(f"/table/source_table_analysis/{analysis_id}")
analysis = response.json()

print("Table: SALES\n")
print("-" * 8)
print(f'Suggested type:\n{analysis["table_type"]}\n')
print("-" * 8)
print(f'Type explanation:\n{analysis["type_explanation"]}\n')
print("-" * 8)
print(f'Setting explanation:\n{analysis["setting_explanation"]}\n')
print("-" * 8)
print(f'Warnings: {analysis["warnings"]}')
```
The analysis detects the likely table type and suggests which columns to use for timestamps, keys, and series IDs. Use these suggestions when registering the table via the SDK.
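For an event table, for instance, the analysis fields map naturally onto the arguments of the SDK's registration call. The sketch below assumes the analysis response shape documented on this page; the commented-out SDK calls at the end (`get_data_source`, `get_source_table`, `create_event_table`) should be checked against the SDK reference before use.

```python
def event_table_kwargs(analysis, name):
    """Map source table analysis suggestions to event table registration
    arguments. Field names follow the analysis response documented above."""
    kwargs = {
        "name": name,
        "event_id_column": analysis["event_id_column"],
        "event_timestamp_column": analysis["event_timestamp_column"],
    }
    # Only pass the record creation timestamp when the analysis suggested one
    if analysis.get("record_creation_timestamp_column"):
        kwargs["record_creation_timestamp_column"] = analysis[
            "record_creation_timestamp_column"
        ]
    return kwargs

# With a live connection, registration would then look roughly like:
# data_source = fb.FeatureStore.get("MY_FEATURE_STORE").get_data_source()
# source_table = data_source.get_source_table(
#     database_name="MY_DB", schema_name="MY_SCHEMA", table_name="SALES"
# )
# event_table = source_table.create_event_table(**event_table_kwargs(analysis, "SALES"))
```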
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| feature_store_id | string | Yes | ID of the feature store |
| database_name | string | Yes | Database name in the warehouse |
| schema_name | string | Yes | Schema name in the warehouse |
| table_name | string | Yes | Table name to analyze |
Response fields (GET /table/source_table_analysis/{id}):
| Field | Type | Description |
|---|---|---|
| id | string | Analysis ID |
| fully_qualified_name | string | Fully qualified table name (DB.SCHEMA.TABLE) |
| table_name | string | Analyzed table name |
| table_type | string | Detected table type (see values below) |
| type_explanation | string | Why this table type was detected |
| setting_explanation | string | Explanation of suggested settings |
| warnings | string | Any warnings about the analysis |
| errors | string | Any errors encountered |
| event_id_column | string | Suggested event ID column (event tables) |
| dimension_id_column | string | Suggested dimension ID column (dimension tables) |
| item_id_column | string | Suggested item ID column (item tables) |
| series_id_column | string | Suggested series ID column (time series tables) |
| natural_key_column | string | Suggested natural key column (SCD tables) |
| event_timestamp_column | string | Suggested event timestamp column (event tables) |
| effective_timestamp_column | string | Suggested effective timestamp column (SCD tables) |
| end_timestamp_column | string | Suggested end timestamp column (SCD tables) |
| snapshot_datetime_column | string | Suggested snapshot datetime column (snapshots tables) |
| calendar_datetime_column | string | Suggested calendar datetime column (calendar tables) |
| reference_datetime_column | string | Suggested reference datetime column (time series tables) |
| record_creation_timestamp_column | string | Suggested record creation timestamp column |
| event_timestamp_schema | object | Timestamp schema for the event timestamp (format, timezone) |
| time_interval_unit | string | Suggested time interval unit (time series tables) |
Only fields relevant to the detected `table_type` will be populated; others will be null.
Table type values:
| Value | Description |
|---|---|
| "event_table" | Event log with timestamps |
| "item_table" | Item-level data linked to events |
| "scd_table" | Slowly changing dimension |
| "dimension_table" | Static reference data |
| "snapshots_table" | Periodic snapshots of state |
| "calendar_table" | Calendar/date features |
| "time_series_table" | Regular time series data |
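Since only the per-type fields are populated, it can be handy to keep a lookup of which suggestions to read for each detected type. The mapping below summarizes the per-type fields from the response table above; treat it as a convenience for your own code, not part of the API.

```python
# Which analysis fields carry suggestions for each detected table_type.
# Derived from the response field table above; not part of the API itself.
KEY_FIELDS_BY_TYPE = {
    "event_table": ["event_id_column", "event_timestamp_column"],
    "item_table": ["item_id_column"],
    "scd_table": ["natural_key_column", "effective_timestamp_column", "end_timestamp_column"],
    "dimension_table": ["dimension_id_column"],
    "snapshots_table": ["snapshot_datetime_column"],
    "calendar_table": ["calendar_datetime_column"],
    "time_series_table": ["series_id_column", "reference_datetime_column", "time_interval_unit"],
}

def suggested_settings(analysis):
    """Extract the populated per-type suggestions from an analysis result."""
    table_type = analysis["table_type"]
    fields = KEY_FIELDS_BY_TYPE.get(table_type)
    if fields is None:
        raise ValueError(f"Unknown table_type: {table_type!r}")
    return {field: analysis.get(field) for field in fields}
```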