16. Manage Feature Life Cycle
In this section, you'll learn how to update the readiness of features so that the system and its users know whether a feature is ready for production.
As data evolves, it’s essential to:
- Update the table's default settings to reflect these changes.
- Create a new version of the feature that uses the updated settings.
- Create a new feature list version that uses these new default feature versions.
When the availability or freshness of a source table changes, we can create new feature versions with adjusted feature job settings. When data quality degrades, we can create a new feature version with cleaning operations that address the new quality issues.
Old feature versions must remain available so that Machine Learning tasks that depend on them keep running without disruption.
Important Note for FeatureByte Enterprise Users with Approval Flow¶
In Catalogs with Approval Flow enabled, changes in table metadata initiate a review process. This process recommends new versions of the features and feature lists linked to these tables, so that new models and deployments use versions that address any data issues.
For a deeper dive, check out the 'Deploy and Serve a Feature List' and 'Manage Feature Life Cycle' sections of our UI tutorials.
import featurebyte as fb
import pandas as pd
# Set your profile to the tutorial environment
fb.use_profile("tutorial")
catalog_name = "Grocery Dataset Tutorial"
catalog = fb.Catalog.activate(catalog_name)
16:16:27 | WARNING | Service endpoint is inaccessible: http://featurebyte-server:8088
16:16:27 | INFO | Using profile: tutorial
16:16:27 | INFO | Using configuration file at: /Users/gxav/.featurebyte/config.yaml
16:16:27 | INFO | Active profile: tutorial (https://tutorials.featurebyte.com/api/v1)
16:16:27 | WARNING | Remote SDK version (1.1.0.dev7) is different from local (1.1.0.dev1). Update local SDK to avoid unexpected behavior.
16:16:27 | INFO | No catalog activated.
16:16:27 | INFO | Catalog activated: Grocery Dataset Tutorial
Get feature from catalog¶
# Get CUSTOMER_Avg_of_invoice_Amount_28d
customer_avg_of_invoice_amount_28d = catalog.get_feature("CUSTOMER_Avg_of_invoice_Amount_28d")
Check feature definition file¶
customer_avg_of_invoice_amount_28d.definition
# Generated by SDK version: 1.1.0.dev7
from bson import ObjectId
from featurebyte import ColumnCleaningOperation
from featurebyte import DisguisedValueImputation
from featurebyte import EventTable
from featurebyte import FeatureJobSetting
from featurebyte import ValueBeyondEndpointImputation
# event_table name: "GROCERYINVOICE"
event_table = EventTable.get_by_id(ObjectId("666956c38080c62d0dc616e0"))
event_view = event_table.get_view(
    view_mode="manual",
    drop_column_names=["record_available_at"],
    column_cleaning_operations=[
        ColumnCleaningOperation(
            column_name="Amount",
            cleaning_operations=[
                DisguisedValueImputation(
                    imputed_value=None, disguised_values=[-99.0, -98.0]
                ),
                ValueBeyondEndpointImputation(
                    type="less_than", end_point=0.0, imputed_value=0.0
                ),
                ValueBeyondEndpointImputation(
                    type="greater_than", end_point=2000.0, imputed_value=2000.0
                ),
            ],
        )
    ],
)
grouped = event_view.groupby(
    by_keys=["GroceryCustomerGuid"], category=None
).aggregate_over(
    value_column="Amount",
    method="avg",
    windows=["28d"],
    feature_names=["CUSTOMER_Avg_of_invoice_Amount_28d"],
    feature_job_setting=FeatureJobSetting(
        blind_spot="120s", period="3600s", offset="120s"
    ),
    skip_fill_na=True,
    offset=None,
)
feat = grouped["CUSTOMER_Avg_of_invoice_Amount_28d"]
output = feat
output.save(_id=ObjectId("6669575033eb5cd5aebc1fee"))
Update Feature Readiness¶
# List features and readiness
display(catalog.list_features())
 | id | name | dtype | readiness | online_enabled | tables | primary_tables | entities | primary_entities | created_at |
---|---|---|---|---|---|---|---|---|---|---|
0 | 666957cd3fab5208644858b2 | CUSTOMER_Mean_vector_of_item_product_ProductGr... | FLOAT | DRAFT | False | [GROCERYINVOICE, INVOICEITEMS, GROCERYPRODUCT] | [GROCERYINVOICE, INVOICEITEMS] | [customer] | [customer] | 2024-06-12T08:11:35.077000 |
1 | 6669578ded0c9d417ba58fff | CUSTOMER_vs_OVERALL_item_TotalCost_across_prod... | FLOAT | PRODUCTION_READY | False | [GROCERYINVOICE, INVOICEITEMS, GROCERYPRODUCT] | [INVOICEITEMS] | [customer] | [customer] | 2024-06-12T08:08:56.428000 |
2 | 6669577ca1b61f71af4710cd | CUSTOMER_Latest_invoice_Amount_Z_Score_to_invo... | FLOAT | PRODUCTION_READY | False | [GROCERYINVOICE] | [GROCERYINVOICE] | [customer] | [customer] | 2024-06-12T08:08:34.434000 |
3 | 6669575033eb5cd5aebc1ff0 | CUSTOMER_Std_of_invoice_Amount_28d | FLOAT | DRAFT | False | [GROCERYINVOICE] | [GROCERYINVOICE] | [customer] | [customer] | 2024-06-12T08:08:09.891000 |
4 | 6669575033eb5cd5aebc1fef | CUSTOMER_Std_of_invoice_Amount_14d | FLOAT | PRODUCTION_READY | False | [GROCERYINVOICE] | [GROCERYINVOICE] | [customer] | [customer] | 2024-06-12T08:08:09.341000 |
5 | 6669575033eb5cd5aebc1fee | CUSTOMER_Avg_of_invoice_Amount_28d | FLOAT | DRAFT | False | [GROCERYINVOICE] | [GROCERYINVOICE] | [customer] | [customer] | 2024-06-12T08:08:08.818000 |
6 | 6669575033eb5cd5aebc1fed | CUSTOMER_Avg_of_invoice_Amount_14d | FLOAT | PRODUCTION_READY | False | [GROCERYINVOICE] | [GROCERYINVOICE] | [customer] | [customer] | 2024-06-12T08:08:08.270000 |
7 | 6669575033eb5cd5aebc1fec | CUSTOMER_Count_of_invoice_28d | INT | DRAFT | False | [GROCERYINVOICE] | [GROCERYINVOICE] | [customer] | [customer] | 2024-06-12T08:08:07.750000 |
8 | 6669575033eb5cd5aebc1feb | CUSTOMER_Count_of_invoice_14d | INT | PRODUCTION_READY | False | [GROCERYINVOICE] | [GROCERYINVOICE] | [customer] | [customer] | 2024-06-12T08:08:07.361000 |
9 | 6669575033eb5cd5aebc1fea | CUSTOMER_Latest_invoice_Amount | FLOAT | PRODUCTION_READY | False | [GROCERYINVOICE] | [GROCERYINVOICE] | [customer] | [customer] | 2024-06-12T08:08:06.967000 |
10 | 6669575033eb5cd5aebc1fe6 | CUSTOMER_x_PRODUCTGROUP_Sum_of_item_TotalCost_28d | FLOAT | DRAFT | False | [GROCERYINVOICE, INVOICEITEMS, GROCERYPRODUCT] | [INVOICEITEMS] | [customer, productgroup] | [customer, productgroup] | 2024-06-12T08:08:06.426000 |
11 | 6669575033eb5cd5aebc1fe5 | CUSTOMER_x_PRODUCTGROUP_Sum_of_item_TotalCost_14d | FLOAT | PRODUCTION_READY | False | [GROCERYINVOICE, INVOICEITEMS, GROCERYPRODUCT] | [INVOICEITEMS] | [customer, productgroup] | [customer, productgroup] | 2024-06-12T08:08:05.758000 |
12 | 6669575033eb5cd5aebc1fe9 | CUSTOMER_x_PRODUCTGROUP_Time_Since_Latest_Time... | FLOAT | PRODUCTION_READY | False | [GROCERYINVOICE, INVOICEITEMS, GROCERYPRODUCT] | [INVOICEITEMS] | [customer, productgroup] | [customer, productgroup] | 2024-06-12T08:08:05.091000 |
13 | 666957381ecbdd152339ded8 | CUSTOMER_Age_band | VARCHAR | PRODUCTION_READY | False | [GROCERYCUSTOMER] | [GROCERYCUSTOMER] | [customer] | [customer] | 2024-06-12T08:07:33.012000 |
14 | 666957381ecbdd152339dece | CUSTOMER_Age | INT | DRAFT | False | [GROCERYCUSTOMER] | [GROCERYCUSTOMER] | [customer] | [customer] | 2024-06-12T08:07:26.610000 |
# Update the readiness of the feature you want to share with others to Public Draft
customer_avg_of_invoice_amount_28d.update_readiness("PUBLIC_DRAFT")
# Check readiness
print(
    f" {customer_avg_of_invoice_amount_28d.name} readiness:",
    customer_avg_of_invoice_amount_28d.readiness,
)
CUSTOMER_Avg_of_invoice_Amount_28d readiness: PUBLIC_DRAFT
Collect additional information on a feature¶
# Get metadata on the feature
customer_avg_of_invoice_amount_28d.info()
name | CUSTOMER_Avg_of_invoice_Amount_28d
created_at | 2024-06-12 08:08:08
updated_at | 2024-06-12 08:16:28
description | None
entities |
primary_entity |
tables |
version_count | 1
catalog_name | Grocery Dataset Tutorial
dtype | FLOAT
primary_table |
default_version_mode | AUTO
default_feature_id | 6669575033eb5cd5aebc1fee
version |
readiness |
table_feature_job_setting |
table_cleaning_operation |
versions_info | None
metadata |
namespace_description | Avg of invoice Amount for the customer over a 28d period.
Update Default Feature Job Setting at the table level¶
# Get GROCERYINVOICE table
invoice_table = catalog.get_table("GROCERYINVOICE")
# Get current Default Feature Job Setting
invoice_table.default_feature_job_setting
FeatureJobSetting(blind_spot='120s', period='3600s', offset='120s', execution_buffer='0s')
# List past analysis
past_analysis = invoice_table.list_feature_job_setting_analysis()
# Get past analysis
analysis_id = past_analysis.id.to_list()[0]
analysis = fb.FeatureJobSettingAnalysis.get_by_id(analysis_id)
# Backtest new setting
new_feature_job_setting = fb.FeatureJobSetting(
    blind_spot='240s',
    period=invoice_table.default_feature_job_setting.period,
    offset=invoice_table.default_feature_job_setting.offset
)
backtest_result = analysis.backtest(feature_job_setting=new_feature_job_setting)
Done! |████████████████████████████████████████| 100% in 6.1s (0.17%/s)
- Period = 3600 s / Offset = 120 s / Blind spot = 240 s
The backtest found that all records would have been processed on time.
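To make the backtest result concrete, here is a minimal sketch of the timing logic in plain Python. It is not the FeatureByte implementation. It assumes that jobs run at `offset + k * period` seconds and that each job only reads events whose timestamp is strictly before `job_time - blind_spot`. The helper names and the sample record are hypothetical.

```python
def first_job_covering(event_ts, period, offset, blind_spot):
    """Return the time (in seconds) of the first job whose window includes event_ts.

    Assumption: jobs run at offset + k * period, and a job reads only events
    with a timestamp strictly before job_time - blind_spot.
    """
    k = (event_ts + blind_spot - offset) // period + 1
    return offset + k * period


def is_processed_on_time(event_ts, available_ts, period, offset, blind_spot):
    """True if the record is already available when its first job runs."""
    return available_ts <= first_job_covering(event_ts, period, offset, blind_spot)


# Hypothetical record: event at t=3590s that lands in the warehouse 200s later.
event_ts, available_ts = 3590, 3790

on_time_120 = is_processed_on_time(
    event_ts, available_ts, period=3600, offset=120, blind_spot=120
)
on_time_240 = is_processed_on_time(
    event_ts, available_ts, period=3600, offset=120, blind_spot=240
)
print(on_time_120, on_time_240)
```

Under these assumptions, the record is missed with a 120 s blind spot (its first job runs at 3720 s, before the record arrives) and processed on time with a 240 s blind spot (its first job runs at 7320 s). A larger blind spot trades freshness for robustness to late-arriving data.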
# Update Default Feature Job Setting
invoice_table.update_default_feature_job_setting(new_feature_job_setting)
Note:
- This new Default Feature Job Setting will be used by default by any new feature using the table.
- In the Enterprise platform, an approval process is associated with the change in the Default Feature Job Setting and a request to create new versions of existing features is automatically triggered.
Update Default Cleaning Operations at the table level¶
# Get GROCERYINVOICE table
invoice_table = catalog.get_table("GROCERYINVOICE")
# Get Info on columns
columns_info = pd.DataFrame(invoice_table.info(verbose=True)['columns_info'])
display(columns_info)
 | name | dtype | entity | semantic | critical_data_info | description |
---|---|---|---|---|---|---|
0 | GroceryInvoiceGuid | VARCHAR | invoice | event_id | None | Unique identifier of each row in the table, in... |
1 | GroceryCustomerGuid | VARCHAR | customer | None | None | Unique identifier for each customer, in GUID f... |
2 | Timestamp | TIMESTAMP | None | event_timestamp | None | The GMT timestamp of when this invoice transac... |
3 | tz_offset | VARCHAR | None | time_zone | None | The local timezone offset of the invoice event. |
4 | record_available_at | TIMESTAMP | None | record_creation_timestamp | None | A timestamp for when this row was added to the... |
5 | Amount | FLOAT | None | None | {'cleaning_operations': [{'imputed_value': Non... | The total amount of the invoice, including all... |
# Get Current Cleaning Operation for Amount column
for info in columns_info.loc[columns_info.name=="Amount"]["critical_data_info"]:
    print(info)
{'cleaning_operations': [{'imputed_value': None, 'type': 'disguised', 'disguised_values': [-99.0, -98.0]}, {'imputed_value': 0.0, 'type': 'less_than', 'end_point': 0.0}, {'imputed_value': 2000.0, 'type': 'greater_than', 'end_point': 2000.0}]}
# Update Cleaning Operations by adding -96 as a new disguised missing value
new_cleaning_operations = [
    fb.DisguisedValueImputation(disguised_values=[-99, -98, -96], imputed_value=None),
    fb.ValueBeyondEndpointImputation(
        type="less_than", end_point=0, imputed_value=0
    ),
    fb.ValueBeyondEndpointImputation(
        type="greater_than", end_point=2000, imputed_value=2000
    ),
]
invoice_table["Amount"].update_critical_data_info(
    cleaning_operations=new_cleaning_operations
)
Note:
- These new Default Cleaning Operations will be used by default by any new feature using the table column.
- In the Enterprise platform, an approval process is associated with the change in the Default Cleaning Operations, and a request to create new versions of existing features using the table column is automatically triggered.
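As an illustration of what the three cleaning operations on the `Amount` column are meant to do, the sketch below mimics them on single values in plain Python. It is an approximation for intuition only, not FeatureByte's implementation: it assumes the operations are applied in the listed order and that a missing value is represented by `None`. The function name `clean_amount` and the sample values are hypothetical.

```python
from typing import Optional

# Disguised missing values from the updated cleaning operations.
DISGUISED_VALUES = {-99.0, -98.0, -96.0}


def clean_amount(value: Optional[float]) -> Optional[float]:
    """Mimic the configured cleaning of the Amount column on one value."""
    # 1. Disguised values are treated as missing (imputed_value=None).
    if value is None or value in DISGUISED_VALUES:
        return None
    # 2. Values below the end point 0 are imputed with 0.
    if value < 0:
        return 0.0
    # 3. Values above the end point 2000 are imputed with 2000.
    if value > 2000:
        return 2000.0
    return value


raw_amounts = [12.5, -99, -96, -3.0, 2500.0]
cleaned = [clean_amount(v) for v in raw_amounts]
print(cleaned)
```

With the earlier settings, `-96` would not have been recognized as a disguised value and would have been imputed with `0` by the `less_than` rule, which would bias averages of `Amount` downward.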
Change feature job setting and cleaning operations of a feature¶
# Get feature CUSTOMER_Avg_of_invoice_Amount_14d
customer_avg_of_invoice_amount_14d = catalog.get_feature("CUSTOMER_Avg_of_invoice_Amount_14d")
# Get current feature job setting
customer_avg_of_invoice_amount_14d.info()["table_feature_job_setting"]
{'this': [{'table_name': 'GROCERYINVOICE', 'feature_job_setting': {'blind_spot': '120s', 'period': '3600s', 'offset': '120s', 'execution_buffer': '0s'}}], 'default': [{'table_name': 'GROCERYINVOICE', 'feature_job_setting': {'blind_spot': '120s', 'period': '3600s', 'offset': '120s', 'execution_buffer': '0s'}}]}
# Get current cleaning operations
customer_avg_of_invoice_amount_14d.info()["table_cleaning_operation"]
{'this': [{'table_name': 'GROCERYINVOICE', 'column_cleaning_operations': [{'column_name': 'Amount', 'cleaning_operations': [{'imputed_value': None, 'type': 'disguised', 'disguised_values': [-99.0, -98.0]}, {'imputed_value': 0.0, 'type': 'less_than', 'end_point': 0.0}, {'imputed_value': 2000.0, 'type': 'greater_than', 'end_point': 2000.0}]}]}], 'default': [{'table_name': 'GROCERYINVOICE', 'column_cleaning_operations': [{'column_name': 'Amount', 'cleaning_operations': [{'imputed_value': None, 'type': 'disguised', 'disguised_values': [-99.0, -98.0]}, {'imputed_value': 0.0, 'type': 'less_than', 'end_point': 0.0}, {'imputed_value': 2000.0, 'type': 'greater_than', 'end_point': 2000.0}]}]}]}
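Before creating a new version, it can help to check whether the feature version's current job setting still matches the table's updated default. The following is a minimal sketch, not part of the FeatureByte SDK: the helper `needs_new_version` is hypothetical, the `this` entry is copied from the `info()` output above, and the table default is written as a plain dictionary using the `240s` blind spot set earlier.

```python
def needs_new_version(feature_settings, table_name, table_default):
    """Return True if the feature's job setting for table_name differs from table_default."""
    for entry in feature_settings:
        if entry["table_name"] == table_name:
            current = entry["feature_job_setting"]
            # Compare only the keys present in the table default.
            return any(current.get(key) != value for key, value in table_default.items())
    # The feature does not use this table, so the change does not affect it.
    return False


# "this" entry taken from the table_feature_job_setting output above.
this_settings = [
    {
        "table_name": "GROCERYINVOICE",
        "feature_job_setting": {
            "blind_spot": "120s",
            "period": "3600s",
            "offset": "120s",
            "execution_buffer": "0s",
        },
    }
]
# Assumed representation of the updated table default.
new_table_default = {"blind_spot": "240s", "period": "3600s", "offset": "120s"}

outdated = needs_new_version(this_settings, "GROCERYINVOICE", new_table_default)
print(outdated)
```

Here the check reports that the feature version still uses the `120s` blind spot, which is why a new version is created in the next step.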
# Deprecate current default version
customer_avg_of_invoice_amount_14d.update_readiness("DEPRECATED")
# Create new version
new_version = customer_avg_of_invoice_amount_14d.create_new_version(
    table_feature_job_settings=[
        fb.TableFeatureJobSetting(
            table_name="GROCERYINVOICE",
            feature_job_setting=new_feature_job_setting
        )
    ],
    table_cleaning_operations=[
        fb.TableCleaningOperation(
            table_name="GROCERYINVOICE",
            column_cleaning_operations=[
                fb.ColumnCleaningOperation(
                    column_name="Amount",
                    cleaning_operations=new_cleaning_operations
                )
            ]
        )
    ]
)
# Check new version is the default
print(
    f"version_name: {new_version.version} \n",
    f"is the new version the new default? {new_version.is_default}"
)
version_name: V240612_1
 is the new version the new default? True
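The new version becomes the default because the feature's `default_version_mode` is `AUTO`. The sketch below illustrates that selection rule in plain Python, under the assumption that the default is the version with the highest readiness and that the most recent version wins ties. The readiness ranking, the helper `pick_default`, the old version name `V240612`, and the `DRAFT` readiness of the new version are assumptions made for illustration.

```python
# Assumed ordering of readiness levels, from lowest to highest.
READINESS_RANK = {
    "DEPRECATED": 0,
    "DRAFT": 1,
    "PUBLIC_DRAFT": 2,
    "PRODUCTION_READY": 3,
}


def pick_default(versions):
    """Pick the default version name.

    versions: list of (version_name, readiness, creation_order) tuples.
    The highest readiness wins; the latest creation_order breaks ties.
    """
    best = max(versions, key=lambda v: (READINESS_RANK[v[1]], v[2]))
    return best[0]


# After deprecating the old version, the new draft version is selected.
after_deprecation = pick_default(
    [("V240612", "DEPRECATED", 0), ("V240612_1", "DRAFT", 1)]
)
# Without the deprecation step, the old production-ready version would stay default.
without_deprecation = pick_default(
    [("V240612", "PRODUCTION_READY", 0), ("V240612_1", "DRAFT", 1)]
)
print(after_deprecation, without_deprecation)
```

Under these assumptions, this is why the old version is deprecated before the new one is created: otherwise the old version would keep a higher readiness than the new one and remain the default.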