featurebyte.FeatureList.compute_historical_features
compute_historical_features(
    observation_set: DataFrame,
    serving_names_mapping: Optional[Dict[str, str]]=None
) -> DataFrame
Description
Returns a DataFrame with feature values for analysis, model training, or evaluation. The historical features request data consists of an observation set that combines historical points-in-time and key values of the primary entity from the feature list.
Associated serving entities can also be used.
The first computation may take longer, but subsequent calls will be faster because partially aggregated data (tiles) is pre-computed and saved.
A training data observation set should typically meet the following criteria:
- be collected from a time period that starts only after the earliest data availability timestamp plus the longest time window in the features
- be collected from a time period that ends before the latest data timestamp minus the time window of the target value
- use points in time that align with the anticipated timing of the use case inference, whether it is based on a regular schedule, triggered by an event, or driven by any other timing mechanism
- not have duplicate rows
- for the same entity, have points in time separated by intervals greater than the horizon of the target, to avoid leakage
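The criteria above can be checked before requesting features. The following is a minimal sketch using pandas only; it is not part of the featurebyte API. The helper `check_observation_set`, its parameters, and all dates and horizons in the usage part are hypothetical and chosen for illustration. The inference-timing criterion depends on the use case and is not checked here.

```python
import pandas as pd


def check_observation_set(
    observation_set: pd.DataFrame,
    entity_column: str,
    earliest_data: pd.Timestamp,
    latest_data: pd.Timestamp,
    longest_feature_window: pd.Timedelta,
    target_horizon: pd.Timedelta,
) -> list:
    """Hypothetical helper: return the list of violated criteria (empty if none)."""
    problems = []
    points = observation_set["POINT_IN_TIME"]

    # Start only after the longest feature window is fully covered by data.
    if points.min() <= earliest_data + longest_feature_window:
        problems.append("observation period starts too early")

    # End early enough that the target can still be observed.
    if points.max() >= latest_data - target_horizon:
        problems.append("observation period ends too late")

    # No duplicate rows.
    if observation_set.duplicated().any():
        problems.append("duplicate rows")

    # For each entity, consecutive points in time must be more than
    # one target horizon apart.
    gaps = (
        observation_set.sort_values([entity_column, "POINT_IN_TIME"])
        .groupby(entity_column)["POINT_IN_TIME"]
        .diff()
        .dropna()
    )
    if (gaps <= target_horizon).any():
        problems.append("points in time too close for the same entity")

    return problems


# Illustrative observation set: one customer, one point every two days.
observation_set = pd.DataFrame({
    "POINT_IN_TIME": pd.date_range(start="2022-04-15", end="2022-04-30", freq="2D"),
    "GROCERYCUSTOMERGUID": ["a2828c3b-036c-4e2e-9bd6-30c9ee9a20e3"] * 8,
})

# Assumed data availability and feature window (not taken from any real catalog).
common = dict(
    entity_column="GROCERYCUSTOMERGUID",
    earliest_data=pd.Timestamp("2022-01-01"),
    latest_data=pd.Timestamp("2022-06-01"),
    longest_feature_window=pd.Timedelta(days=60),
)

# A 1-day target horizon is shorter than the 2-day spacing: no violations.
ok_problems = check_observation_set(
    observation_set, target_horizon=pd.Timedelta(days=1), **common
)

# A 7-day target horizon is longer than the 2-day spacing: leakage risk.
tight_problems = check_observation_set(
    observation_set, target_horizon=pd.Timedelta(days=7), **common
)
print(ok_problems, tight_problems)
```

Returning a list of violated criteria rather than raising on the first one makes it easier to see every problem with an observation set in a single pass.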
Parameters
- observation_set: DataFrame
Observation set DataFrame that combines historical points-in-time with values of the feature primary entity or its descendants (serving entities). The column containing the point-in-time values should be named POINT_IN_TIME, while the columns representing entity values should be named using accepted serving names for the entity.
- serving_names_mapping: Optional[Dict[str, str]]
Optional serving names mapping, used when the observation set has different serving name columns than those defined in Entities. It maps each original serving name to the new name.
Returns
- DataFrame
Materialized historical features.
Note: POINT_IN_TIME values will be converted to UTC time.
Examples
Create a feature list with two features.
>>> feature_list = fb.FeatureList(
... [
... catalog.get_feature("InvoiceCount_60days"),
... catalog.get_feature("InvoiceAmountAvg_60days"),
... ],
... name="InvoiceFeatures",
... )
>>> observation_set = pd.DataFrame({
... "POINT_IN_TIME": pd.date_range(start="2022-04-15", end="2022-04-30", freq="2D"),
... "GROCERYCUSTOMERGUID": ["a2828c3b-036c-4e2e-9bd6-30c9ee9a20e3"] * 8,
... })
>>> feature_list.compute_historical_features(observation_set)
   POINT_IN_TIME                   GROCERYCUSTOMERGUID  InvoiceCount_60days  InvoiceAmountAvg_60days
0     2022-04-15  a2828c3b-036c-4e2e-9bd6-30c9ee9a20e3                  9.0                10.223333
1     2022-04-17  a2828c3b-036c-4e2e-9bd6-30c9ee9a20e3                  9.0                10.223333
2     2022-04-19  a2828c3b-036c-4e2e-9bd6-30c9ee9a20e3                  9.0                10.223333
3     2022-04-21  a2828c3b-036c-4e2e-9bd6-30c9ee9a20e3                 10.0                 9.799000
4     2022-04-23  a2828c3b-036c-4e2e-9bd6-30c9ee9a20e3                 10.0                 9.799000
5     2022-04-25  a2828c3b-036c-4e2e-9bd6-30c9ee9a20e3                  9.0                 9.034444
6     2022-04-27  a2828c3b-036c-4e2e-9bd6-30c9ee9a20e3                 10.0                 9.715000
7     2022-04-29  a2828c3b-036c-4e2e-9bd6-30c9ee9a20e3                 10.0                 9.715000
Retrieve materialized historical features with serving names mapping.
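A possible sketch of such a call is shown below. It assumes the observation set names its entity column `CUSTOMERGUID` (a hypothetical name) instead of the serving name `GROCERYCUSTOMERGUID` used in the example above, and that the mapping goes from the original serving name to the new column name, as described under Parameters. The featurebyte call itself is left as a comment because it needs an active catalog; only the pandas part runs standalone.

```python
import pandas as pd

# Observation set whose entity column uses a different name
# ("CUSTOMERGUID" is a hypothetical column name for illustration).
observation_set = pd.DataFrame({
    "POINT_IN_TIME": pd.date_range(start="2022-04-15", end="2022-04-30", freq="2D"),
    "CUSTOMERGUID": ["a2828c3b-036c-4e2e-9bd6-30c9ee9a20e3"] * 8,
})

# Original serving name -> column name used in this observation set.
serving_names_mapping = {"GROCERYCUSTOMERGUID": "CUSTOMERGUID"}

# Sanity check: every mapped column must exist in the observation set.
missing = [
    new_name
    for new_name in serving_names_mapping.values()
    if new_name not in observation_set.columns
]
if missing:
    raise ValueError(f"Columns missing from observation set: {missing}")

# With an active catalog and the feature_list created earlier, the call would be:
# historical_features = feature_list.compute_historical_features(
#     observation_set,
#     serving_names_mapping=serving_names_mapping,
# )
```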
See Also
- FeatureGroup.preview: Preview feature group.
- Feature.preview: Preview feature.