featurebyte.view.GroupBy.forward_aggregate_asat¶
Description¶
The forward_aggregate_asat method of a GroupBy instance returns an Aggregate "as at" Target object. The object aggregates data from the column specified by the value_column parameter using the aggregation method provided by the method parameter. By default, the aggregation is done on rows active at the point-in-time indicated in the target request. The primary entity of the Target is determined by the grouping key of the GroupBy instance. These aggregation operations are exclusively available for Slowly Changing Dimension (SCD) views. The grouping key used in the GroupBy instance should not be the natural key of the SCD view.
For instance, an aggregate "as at" target from a Credit Cards table could be the count of credit cards held by a customer at the point-in-time indicated in the target request.
If an offset is defined, the aggregation uses the rows of the SCD view that are active at the point-in-time indicated in the target request plus the specified offset.
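The "as at" semantics can be illustrated without the library. The following is a minimal sketch in plain Python, not featurebyte's implementation: the row layout, the helper name count_active_as_at, and the convention that a row is active from its start time (inclusive) until its end time (exclusive) are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical SCD rows (assumed layout, not a featurebyte structure):
# (natural key, customer, effective from, effective until or None if still current)
CARD_ROWS = [
    ("card-1", "cust-A", datetime(2023, 1, 1), None),
    ("card-2", "cust-A", datetime(2022, 6, 1), datetime(2023, 3, 15)),
    ("card-3", "cust-B", datetime(2023, 2, 1), None),
    ("card-4", "cust-A", datetime(2023, 5, 15), None),
]


def count_active_as_at(rows, customer, point_in_time, offset=timedelta(0)):
    """Count the rows of `customer` that are active at point_in_time + offset."""
    # The offset moves the evaluation time forward from the requested point-in-time.
    as_at = point_in_time + offset
    return sum(
        1
        for _key, cust, start, end in rows
        # Assumed convention: start is inclusive, end is exclusive.
        if cust == customer and start <= as_at and (end is None or as_at < end)
    )


request_time = datetime(2023, 4, 1)
# Without an offset only card-1 is active for cust-A.
no_offset = count_active_as_at(CARD_ROWS, "cust-A", request_time)
# Twelve weeks later card-4 has been opened as well.
with_offset = count_active_as_at(
    CARD_ROWS, "cust-A", request_time, offset=timedelta(weeks=12)
)
print(no_offset, with_offset)
```

In this sketch the grouping key is the customer rather than the card, which mirrors the requirement that the grouping key is not the natural key of the SCD view.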
If the GroupBy instance involves computation across a categorical column, the returned Target object is a Cross Aggregate "as at" Target. In this scenario, the target value after materialization is a dictionary with keys representing the categories of the categorical column and their corresponding values indicating the aggregated values for each category. You may choose to fill the target value with a default value if the column to be aggregated is empty.
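The shape of a materialized cross aggregate value can be pictured with a small plain-Python sketch. It is illustrative only and does not call featurebyte: the row layout, the card_type categories, and the helper count_by_category_as_at are hypothetical, and the active-row convention (start inclusive, end exclusive) is an assumption.

```python
from collections import Counter
from datetime import datetime

# Hypothetical SCD rows (assumed layout):
# (customer, card type, effective from, effective until or None if still current)
CARD_TYPE_ROWS = [
    ("cust-A", "gold", datetime(2023, 1, 1), None),
    ("cust-A", "silver", datetime(2023, 2, 1), None),
    ("cust-A", "gold", datetime(2023, 3, 1), None),
    ("cust-A", "platinum", datetime(2022, 1, 1), datetime(2022, 12, 31)),
    ("cust-B", "silver", datetime(2023, 1, 15), None),
]


def count_by_category_as_at(rows, customer, as_at):
    """Return {category: count} over the rows of `customer` active at `as_at`."""
    counts = Counter(
        category
        for cust, category, start, end in rows
        # Assumed convention: start is inclusive, end is exclusive.
        if cust == customer and start <= as_at and (end is None or as_at < end)
    )
    # Keys are the categories, values are the aggregated value per category.
    return dict(counts)


print(count_by_category_as_at(CARD_TYPE_ROWS, "cust-A", datetime(2023, 4, 1)))
```

Categories with no active rows at the evaluation time, such as the expired platinum card here, simply do not appear as keys in this sketch.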
It is possible to perform additional transformations on the Target object, and the Target object is added to the catalog solely when explicitly saved.
Parameters¶
- value_column: Optional[str]
Column to be aggregated
- method: Optional[Literal["sum", "avg", "min", "max", "count", "na_count", "std", "latest"]]
Aggregation method
- target_name: Optional[str]
Output target name
- offset: Optional[str]
Optional offset to apply to the point in time column in the target request. The aggregation result will be as at the point in time adjusted by this offset. Format of offset is "{size}{unit}", where size is a positive integer and unit is one of the following:
"ns": nanosecond
"us": microsecond
"ms": millisecond
"s": second
"m": minute
"h": hour
"d": day
"w": week
- fill_value: Union[StrictInt, StrictFloat, StrictStr, bool, NoneType]
Value to fill if the value in the column is empty
- skip_fill_na: bool
default: False
Whether to skip filling NaN values
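The "{size}{unit}" offset format described above can be checked with a short validator. This is a sketch under stated assumptions, not the parser featurebyte uses: parse_offset_ns is a hypothetical helper, and returning the offset as an integer number of nanoseconds is a choice made here so that the "ns" unit is representable.

```python
import re

# Nanoseconds per unit, for the units listed in the offset format.
_UNIT_TO_NS = {
    "ns": 1,
    "us": 1_000,
    "ms": 1_000_000,
    "s": 1_000_000_000,
    "m": 60 * 1_000_000_000,
    "h": 3_600 * 1_000_000_000,
    "d": 86_400 * 1_000_000_000,
    "w": 7 * 86_400 * 1_000_000_000,
}
_OFFSET_RE = re.compile(r"([0-9]+)(ns|us|ms|s|m|h|d|w)")


def parse_offset_ns(offset: str) -> int:
    """Convert an offset such as "12w" into nanoseconds (hypothetical helper)."""
    match = _OFFSET_RE.fullmatch(offset)
    # The size must be a positive integer, so "0d" is rejected as well.
    if match is None or int(match.group(1)) <= 0:
        raise ValueError(f"invalid offset: {offset!r}")
    size, unit = match.groups()
    return int(size) * _UNIT_TO_NS[unit]


print(parse_offset_ns("12w"))
```

Strings such as "1.5h", "3 days", or "0d" raise ValueError in this sketch because they do not consist of a positive integer directly followed by one of the listed units.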
Returns¶
- Target
Examples¶
Count number of active cards per customer at a point-in-time.
>>> import featurebyte as fb
>>> # Filter active cards (credit_card_accounts is assumed to be an SCD view)
>>> cond = credit_card_accounts["status"] == "active"
>>> # Group by customer
>>> active_credit_card_by_cust = credit_card_accounts[cond].groupby(
...     "CustomerID"
... )
>>> # Count the cards active at the point-in-time of the target request
>>> target = active_credit_card_by_cust.forward_aggregate_asat(
...     method=fb.AggFunc.COUNT,
...     target_name="Number of Active Credit Cards",
... )
Count number of active cards per customer 12 weeks after a point-in-time