Table EDA

Run Table EDA

import featurebyte as fb

# table_id is the ID of a table that has already been registered.
client = fb.Configurations().get_client()

response = client.post(
    "/table_eda",
    json={
        "table_id": table_id,
        "analysis_size": 10000,
        "seed": 42,
    },
)
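# The POST starts an asynchronous task; block until it finishes.
task_id = response.json()["id"]
wait_for_task(client, task_id)  # helper not defined in this section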
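
`wait_for_task` is not defined in this section. As a rough illustration of what such a helper might do, the sketch below polls a status callable until a terminal status is returned. The function name `poll_until_done`, the status names `"SUCCESS"` and `"FAILURE"`, and the default interval and timeout are all assumptions for illustration, not the documented behavior of the real helper.

```python
import time
from typing import Callable, FrozenSet


def poll_until_done(
    fetch_status: Callable[[], str],
    # Assumed terminal status names; adjust to whatever the task API reports.
    done_statuses: FrozenSet[str] = frozenset({"SUCCESS", "FAILURE"}),
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status() repeatedly until it returns a terminal status."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in done_statuses:
            return status
        # Give up once the deadline has passed instead of looping forever.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status!r} after {timeout} seconds")
        sleep(interval)
```

Here `fetch_status` would wrap whichever request returns the current status of `task_id`. Passing the status lookup as a callable keeps the polling loop independent of the exact task endpoint.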

Parameters:

Parameter	Type	Required	Description
`table_id`	string	Yes	ID of the registered table
`analysis_size`	integer	Yes	Number of rows to sample for analysis
`seed`	integer	No	Random seed for reproducibility
`from_timestamp`	string	No	Start timestamp filter (ISO format)
`to_timestamp`	string	No	End timestamp filter (ISO format)
`timestamp_column`	string	No	Column to use for timestamp filtering
`overwrite`	boolean	No	Overwrite existing EDA results (default: `false`)
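
The optional parameters can be combined to restrict the analysis to a time window. The sketch below builds such a request body; the helper name `build_table_eda_payload` and its validation rules are hypothetical, and it assumes the API accepts the strings produced by `datetime.isoformat()`.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def build_table_eda_payload(
    table_id: str,
    analysis_size: int,
    seed: Optional[int] = None,
    from_timestamp: Optional[datetime] = None,
    to_timestamp: Optional[datetime] = None,
    timestamp_column: Optional[str] = None,
    overwrite: bool = False,
) -> Dict[str, Any]:
    """Build a /table_eda request body, omitting optional fields left unset."""
    if analysis_size <= 0:
        raise ValueError("analysis_size must be a positive integer")
    if from_timestamp and to_timestamp and from_timestamp > to_timestamp:
        raise ValueError("from_timestamp must not be later than to_timestamp")

    payload: Dict[str, Any] = {"table_id": table_id, "analysis_size": analysis_size}
    optional = {
        "seed": seed,
        # Assumption: ISO 8601 strings from isoformat() are accepted as-is.
        "from_timestamp": from_timestamp.isoformat() if from_timestamp else None,
        "to_timestamp": to_timestamp.isoformat() if to_timestamp else None,
        "timestamp_column": timestamp_column,
    }
    payload.update({key: value for key, value in optional.items() if value is not None})
    if overwrite:
        payload["overwrite"] = True
    return payload
```

The result can be passed directly as `json=` to `client.post("/table_eda", ...)`. Whether `timestamp_column` must accompany the timestamp filters is not stated here, so the helper does not enforce it.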

List Table EDA Runs

response = client.get(
    "/table_eda",
    params={"table_id": table_id, "page": 1, "page_size": 20},
)
eda_runs = response.json()["data"]
table_analysis_id = eda_runs[0]["_id"]

Response fields (each item in `data`):

Field	Type	Description
`id`	string	Table EDA ID
`table_id`	string	Table that was analyzed
`analysis_size`	integer	Number of rows sampled
`seed`	integer	Random seed used
`last_seen`	datetime	When the analysis was last run
`from_timestamp`	datetime	Start of analysis time range (if filtered)
`to_timestamp`	datetime	End of analysis time range (if filtered)
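
When there are more runs than fit on one page, the `page` and `page_size` parameters can be used to walk through all of them. The sketch below is one way to do that. The helper name `iter_items` is hypothetical, and it assumes pages are numbered from 1 and that a page shorter than `page_size` is the last one.

```python
from typing import Any, Callable, Dict, Iterator, List


def iter_items(
    fetch_page: Callable[[int, int], List[Dict[str, Any]]],
    page_size: int = 20,
    max_pages: int = 1000,
) -> Iterator[Dict[str, Any]]:
    """Yield items page by page until a short (or empty) page is returned."""
    for page in range(1, max_pages + 1):
        items = fetch_page(page, page_size)
        yield from items
        # Assumption: fewer items than requested means there are no more pages.
        if len(items) < page_size:
            return
```

With the client above, `fetch_page` could be `lambda page, size: client.get("/table_eda", params={"table_id": table_id, "page": page, "page_size": size}).json()["data"]`. The `max_pages` cap is only a guard against an endless loop if the stop condition never triggers.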

View Column-Level Analysis

List Column Analyses

response = client.patch(
    "/column_analysis",
    json={
        "table_analysis_id": table_analysis_id,
    },
)
columns = response.json()["data"]

Response fields (each item in `data`):

Field	Type	Description
`id`	string	Column analysis ID
`table_analysis_id`	string	Parent table EDA ID
`table_id`	string	Table ID
`column_name`	string	Column being analyzed
`column_dtype`	string	Data type (e.g., `"INT"`, `"FLOAT"`, `"VARCHAR"`)
`column_description`	string	Column description
`issues`	array	Detected data quality issues (missing values, outliers, etc.)
`cleaning_reco`	array	Recommended cleaning operations
`semantic_level_1`	string	Detected semantic type
`summary`	object	Summary statistics (count, unique, min, max, mean, etc.)
`outdated`	boolean	Whether the analysis needs to be refreshed

Filter to only show columns with issues:

response = client.patch(
    "/column_analysis",
    json={
        "table_analysis_id": table_analysis_id,
        "issues": True,
    },
)
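
Once the column analyses are retrieved, the documented fields can be condensed into a per-column overview. The sketch below is a minimal example; the helper name `summarize_issues` is hypothetical, and the entries of `issues` and `cleaning_reco` are treated as opaque because their structure is not described here.

```python
from typing import Any, Dict, List


def summarize_issues(columns: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Map column_name to a short summary for every column with detected issues."""
    summary: Dict[str, Dict[str, Any]] = {}
    for column in columns:
        issues = column.get("issues") or []
        if not issues:
            continue  # skip columns without detected issues
        summary[column["column_name"]] = {
            "dtype": column.get("column_dtype"),
            "issue_count": len(issues),
            "recommendation_count": len(column.get("cleaning_reco") or []),
            "outdated": bool(column.get("outdated")),
        }
    return summary
```

Calling `summarize_issues(response.json()["data"])` gives a quick view of which columns need attention and whether any analysis is marked `outdated`.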

Get Detailed Column Analysis

column_analysis_id = columns[0]["_id"]

response = client.patch(f"/column_analysis/{column_analysis_id}")
analysis = response.json()

The response contains the same fields as each item in the column analysis list above.

Column EDA Plots

Generate interactive Bokeh plots for a column's data distribution.

Get Plot Options

Check which categories are available for the column (e.g., dictionary keys or embedding dimensions):

response = client.request(
    "OPTIONS",
    f"/column_eda/{column_analysis_id}",
    params={"cleaned": False},
)
options = response.json()

Parameters:

Parameter	Type	Required	Description
`cleaned`	boolean	Yes	`false` for raw data categories, `true` for categories after cleaning operations

Response fields:

Field	Type	Description
`column_categories`	array	Available categories for the column (dictionary keys or embedding dimensions to filter by)
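
Each entry in `column_categories` can be passed as `column_category` when requesting plots. The sketch below builds one set of query parameters per category. The helper name `plot_params_per_category` is hypothetical, and falling back to a single unfiltered request when the list is empty is an assumption about how columns without categories should be handled.

```python
from typing import Any, Dict, List, Sequence


def plot_params_per_category(
    column_categories: Sequence[Any],
    cleaned: bool = False,
    output_format: str = "html",
) -> List[Dict[str, Any]]:
    """Return one query-parameter dict per category, or one unfiltered dict."""
    base = {"cleaned": cleaned, "output_format": output_format}
    if not column_categories:
        # Assumption: no categories means a single request without a filter.
        return [dict(base)]
    return [{**base, "column_category": category} for category in column_categories]
```

Each returned dict can then be used as the `params` argument of a plot request for `/column_eda/{column_analysis_id}`.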

Get Rendered Plots

response = client.patch(
    f"/column_eda/{column_analysis_id}",
    params={
        "cleaned": False,
        "height": 500,
        "width": 1000,
        "font_size": 16,
        "output_format": "html",
    },
)
plots = response.json()

Parameters:

Parameter	Type	Required	Description
`cleaned`	boolean	Yes	`false` for raw data, `true` for data after cleaning operations are applied
`column_category`	any	No	Filter by dictionary key or embedding dimension (from OPTIONS response)
`height`	integer	No	Plot height in pixels (default: 500)
`width`	integer	No	Plot width in pixels (default: 1000)
`font_size`	integer	No	Font size in pixels (default: 16)
`output_format`	string	No	Output format: `"html"` (default) or `"json"`

Response fields (each item in the response array):

Field	Type	Description
`plot_type`	string	Type of plot (e.g., `"distribution"`, `"histogram"`)
`plots`	array	List of rendered plot objects, each with a `content` field containing self-contained Bokeh HTML

The response is a plot list. See Displaying Plots for how to extract and render plots in Jupyter, save as HTML, or embed in a web application.
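
As a minimal sketch of working with this structure, the function below maps each rendered plot to a file name so the HTML can be saved to disk. The helper name `plots_to_files` and the file naming scheme are hypothetical. It relies only on the `plot_type`, `plots`, and `content` fields described above and assumes `output_format` was `"html"`.

```python
import re
from typing import Any, Dict, List


def plots_to_files(plot_groups: List[Dict[str, Any]], prefix: str = "plot") -> Dict[str, str]:
    """Map a generated file name to the HTML content of each rendered plot."""
    files: Dict[str, str] = {}
    for group in plot_groups:
        # Keep only characters that are safe in file names.
        plot_type = re.sub(r"[^A-Za-z0-9_-]+", "_", str(group.get("plot_type", "unknown")))
        for plot in group.get("plots", []):
            content = plot.get("content")
            if content:
                # A running counter keeps names unique even if plot types repeat.
                files[f"{prefix}_{len(files):02d}_{plot_type}.html"] = content
    return files
```

Each entry can be written out with `pathlib.Path(name).write_text(content, encoding="utf-8")` and opened in a browser.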

Compare Raw vs Cleaned

Generate plots for both raw and cleaned data to see the effect of cleaning operations:

# Raw data
raw_plots = client.patch(
    f"/column_eda/{column_analysis_id}",
    params={"cleaned": False},
).json()

# After cleaning operations
cleaned_plots = client.patch(
    f"/column_eda/{column_analysis_id}",
    params={"cleaned": True},
).json()
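
To view the two responses side by side, the plots can be paired by `plot_type`. The sketch below is one way to do that. The helper name `pair_by_plot_type` is hypothetical, and it assumes each `plot_type` appears at most once per response.

```python
from typing import Any, Dict, List, Optional, Tuple

PlotGroup = Dict[str, Any]


def pair_by_plot_type(
    raw_groups: List[PlotGroup],
    cleaned_groups: List[PlotGroup],
) -> Dict[str, Tuple[Optional[PlotGroup], Optional[PlotGroup]]]:
    """Return {plot_type: (raw_group, cleaned_group)}; a missing side is None."""
    # Assumption: plot_type is unique within each response.
    raw = {group["plot_type"]: group for group in raw_groups}
    cleaned = {group["plot_type"]: group for group in cleaned_groups}
    return {
        plot_type: (raw.get(plot_type), cleaned.get(plot_type))
        for plot_type in sorted(raw.keys() | cleaned.keys())
    }
```

Calling `pair_by_plot_type(raw_plots, cleaned_plots)` returns pairs that can be rendered next to each other. An entry with `None` on one side indicates a plot type that exists only for the raw or only for the cleaned data.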