featurebyte.SourceTable.create_snapshots_table
create_snapshots_table(
name: str,
snapshot_datetime_column: str,
snapshot_datetime_schema: TimestampSchema,
time_interval: TimeInterval,
series_id_column: Optional[str],
record_creation_timestamp_column: Optional[str]=None,
description: Optional[str]=None,
datetime_partition_column: Optional[str]=None,
datetime_partition_schema: Optional[TimestampSchema]=None
) -> SnapshotsTable
Description
Creates a SnapshotsTable object from a source table and adds it to the catalog. To create a SnapshotsTable, you need to identify the column that holds the key of the entity being snapshotted (the series ID) and the column that holds the snapshot datetime.
After creation, you can optionally add column-level metadata to the table to further support feature engineering. This can include tagging columns that identify or reference entities, describing the semantics of the columns, specifying default cleaning operations, or adding column descriptions.
Parameters
- name: str
  The desired name for the new table.
- snapshot_datetime_column: str
  Column containing the datetime of each snapshot.
- snapshot_datetime_schema: TimestampSchema
  Schema of the snapshot datetime column. Specifying the timezone through a timezone column is not supported.
- time_interval: TimeInterval
  Frequency of the snapshots. Only intervals expressed in a single time unit (for example, 1 day or 1 week) are supported.
- series_id_column: Optional[str]
  Column identifying the entity being snapshotted. Its values must be unique within each snapshot datetime.
- record_creation_timestamp_column: Optional[str]
  Optional column containing the timestamp at which each record was created.
- description: Optional[str]
  Optional description for the new table.
- datetime_partition_column: Optional[str]
  Optional datetime column by which the snapshots table is partitioned.
- datetime_partition_schema: Optional[TimestampSchema]
  Optional timestamp schema for the datetime partition column.
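The series_id_column requirement (one row per entity per snapshot datetime) can be checked on the source data before registering the table. The sketch below is a hypothetical, plain-Python pre-check and is not part of the featurebyte API; it assumes you have already read the source rows as (snapshot_datetime, series_id) pairs.

```python
from collections import Counter
from datetime import date
from typing import Hashable, Iterable, List, Tuple


def find_duplicate_snapshot_keys(
    rows: Iterable[Tuple[Hashable, Hashable]],
) -> List[Tuple[Hashable, Hashable]]:
    """Return (snapshot_datetime, series_id) pairs that occur more than once.

    Hypothetical helper, not part of featurebyte: each row is a
    (snapshot_datetime, series_id) pair read from the source table.
    """
    counts = Counter(rows)
    # Any key seen more than once breaks the one-row-per-entity-per-snapshot rule.
    return sorted(key for key, n in counts.items() if n > 1)


if __name__ == "__main__":
    rows = [
        (date(2024, 1, 1), "store-a"),
        (date(2024, 1, 1), "store-b"),
        (date(2024, 1, 2), "store-a"),
        (date(2024, 1, 2), "store-a"),  # same store twice on the same snapshot date
    ]
    print(find_duplicate_snapshot_keys(rows))
```

An empty result means the candidate series ID column is unique within each snapshot datetime.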
Returns
- SnapshotsTable
SnapshotsTable created from the source table.
Examples
Create a snapshots table from a source table.
>>> # Register GROCERYPROFILES as a snapshots table
>>> source_table = ds.get_source_table(
... database_name="spark_catalog", schema_name="GROCERY", table_name="GROCERYPROFILES"
... )
>>> grocery_profiles_table = source_table.create_snapshots_table(
... name="GROCERYPROFILES",
... snapshot_datetime_column="Date",
... snapshot_datetime_schema=TimestampSchema(timezone="Etc/UTC"),
... time_interval=TimeInterval(value=1, unit="DAY"),
... series_id_column="StoreGuid",
... record_creation_timestamp_column="record_available_at",
... )
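The example declares a daily snapshot frequency with TimeInterval(value=1, unit="DAY"). Before registering, it can help to confirm that the source actually contains a snapshot for every day in its range. The sketch below is a hypothetical plain-Python check, not part of featurebyte, and assumes the snapshot dates have already been loaded as Python date values.

```python
from datetime import date, timedelta
from typing import Iterable, List


def find_missing_daily_snapshots(snapshot_dates: Iterable[date]) -> List[date]:
    """Return calendar days with no snapshot between the first and last snapshot date.

    Hypothetical helper, not part of featurebyte; assumes a 1-day interval.
    """
    distinct = sorted(set(snapshot_dates))
    missing: List[date] = []
    for prev, curr in zip(distinct, distinct[1:]):
        day = prev + timedelta(days=1)
        # Record every day skipped between two consecutive snapshot dates.
        while day < curr:
            missing.append(day)
            day += timedelta(days=1)
    return missing


if __name__ == "__main__":
    dates = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 5)]
    print(find_missing_daily_snapshots(dates))  # January 3 and 4 have no snapshot
```

Duplicate dates are collapsed first, so several entities sharing a snapshot date do not affect the result.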
See Also
- TimestampSchema: Schema for a timestamp column that can include timezone information.
- TimeIntervalUnit: Time interval unit for the time series.