FeatureByte's Declarative Feature Engineering Framework
FeatureByte SDK is the core engine of FeatureByte's Self-Service Feature Platform. It is a free, source-available package designed to:
- Create state-of-the-art features, not data pipelines: Create features for machine learning with just a few lines of code and leave the plumbing and pipelining to FeatureByte. We orchestrate the data operations, whether time-window aggregations or backfilling, so you can deliver more value from your data.
- Improve accuracy through data: Use the intuitive feature declaration framework to turn creative ideas into training data in minutes. Move past the limitations of ad-hoc pipelines and build features with greater scale, complexity, and freshness.
- Streamline machine learning data pipelines: Get more value from AI, faster. Deploy and serve features in minutes instead of weeks or months. Declare features in Python and automatically generate optimized data pipelines, all from tools you love, such as Jupyter Notebooks.
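To make "declare features, not pipelines" concrete, here is a minimal sketch of the general idea under stated assumptions: a small Python spec for a time-window sum is rendered into a point-in-time SQL query. `WindowSumSpec`, `to_sql`, the `observations` table, its `point_in_time` column, and the `Timestamp` column are all hypothetical names, and the interval syntax is ANSI-style (real SQL dialects differ). This is not the SQL FeatureByte generates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowSumSpec:
    """Hypothetical declarative spec for a time-window sum feature."""

    source_table: str
    entity_column: str
    value_column: str
    timestamp_column: str
    window_days: int
    feature_name: str
    fill_value: float = 0


def to_sql(spec: WindowSumSpec, observation_table: str = "observations") -> str:
    """Render an illustrative point-in-time query: for each observation row,
    sum the events in (point_in_time - window, point_in_time]."""
    return (
        f"SELECT o.{spec.entity_column}, o.point_in_time,\n"
        f"  COALESCE(SUM(e.{spec.value_column}), {spec.fill_value}) AS {spec.feature_name}\n"
        f"FROM {observation_table} AS o\n"
        f"LEFT JOIN {spec.source_table} AS e\n"
        f"  ON e.{spec.entity_column} = o.{spec.entity_column}\n"
        f"  AND e.{spec.timestamp_column} <= o.point_in_time\n"
        f"  AND e.{spec.timestamp_column} > o.point_in_time - INTERVAL '{spec.window_days}' DAY\n"
        f"GROUP BY o.{spec.entity_column}, o.point_in_time"
    )


# The user declares *what* to compute; the query is generated from the spec
spec = WindowSumSpec(
    source_table="GROCERYINVOICE",
    entity_column="GroceryCustomerGuid",
    value_column="Amount",
    timestamp_column="Timestamp",  # hypothetical timestamp column name
    window_days=7,
    feature_name="CustomerTotalSpent_7d",
)
query = to_sql(spec)
print(query)
```

The `LEFT JOIN` keeps observation rows that have no matching events, and `COALESCE` turns their empty sum into the fill value.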
Take charge of the entire ML feature lifecycle
Feature engineering and management don't have to be complicated. With FeatureByte, you can create, experiment with, serve, and manage your features in one tool.
- Create and share state-of-the-art ML features effortlessly
- Search and reuse features to create feature lists tailored to your use case
```python
# Assumes `catalog` is the active FeatureByte catalog
# Get view from catalog
invoice_view = catalog.get_view("GROCERYINVOICE")

# Declare features of total spent by customer
# in the past 7 and 28 days
customer_purchases = invoice_view.groupby(
    "GroceryCustomerGuid"
).aggregate_over(
    "Amount",
    method="sum",
    feature_names=[
        "CustomerTotalSpent_7d",
        "CustomerTotalSpent_28d",
    ],
    fill_value=0,
    windows=["7d", "28d"],
)

# Save the new features to the catalog
customer_purchases.save()
```
```python
# Get feature list from the catalog
feature_list = catalog.get_feature_list(
    "200 Features on Active Customers"
)

# Get an observation set from the catalog
observation_set = catalog.get_observation_table(
    "5M rows of active Customers in 2021-2022"
)

# Compute training data and
# store it in the feature store for reuse and audit
training = feature_list.compute_historical_feature_table(
    observation_set,
    name="Training set to predict purchases next 2w",
)
```
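Conceptually, each row of a historical feature table pairs an entity and a point in time from the observation set with feature values computed only from data available at that time. The stdlib-only sketch below mimics what `CustomerTotalSpent_7d` and `CustomerTotalSpent_28d` represent on toy data. It assumes a window of (t - w, t] ending at each row's point in time and uses the declared `fill_value=0`. It ignores FeatureByte's feature job settings (such as the blind spot), and the function `window_sum` and the sample records are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical invoice events: (customer_id, timestamp, amount)
invoices = [
    ("c1", datetime(2022, 1, 1), 20.0),
    ("c1", datetime(2022, 1, 10), 35.0),
    ("c1", datetime(2022, 1, 20), 15.0),
    ("c2", datetime(2022, 1, 5), 50.0),
]

# Hypothetical observation set: (customer_id, point_in_time)
observations = [
    ("c1", datetime(2022, 1, 21)),
    ("c1", datetime(2022, 1, 12)),
    ("c2", datetime(2022, 1, 21)),
    ("c3", datetime(2022, 1, 21)),
]


def window_sum(customer, point_in_time, window, fill_value=0.0):
    """Sum amounts for `customer` with timestamps in (point_in_time - window, point_in_time]."""
    amounts = [
        amount
        for cust, ts, amount in invoices
        if cust == customer and point_in_time - window < ts <= point_in_time
    ]
    # No events in the window: fall back to fill_value, like fill_value=0 above
    return sum(amounts) if amounts else fill_value


training_rows = [
    {
        "customer": customer,
        "point_in_time": pit,
        "CustomerTotalSpent_7d": window_sum(customer, pit, timedelta(days=7)),
        "CustomerTotalSpent_28d": window_sum(customer, pit, timedelta(days=28)),
    }
    for customer, pit in observations
]

for row in training_rows:
    print(row)
```

In this toy data, the January 20 purchase is excluded from the c1 row observed on January 12, because only events at or before each row's point in time are counted. That property is what keeps backfilled training data free of future leakage.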
- Immediately access historical features through automated backfilling, and let FeatureByte handle the complexity of time-aware SQL
- Experiment on live data at scale and innovate faster
- Iterate rapidly with different feature lists to create more accurate models
- Deploy AI data pipelines and serve features in minutes
- Access features with low latency
- Reduce costs and security risk by performing computations in your existing data platform
- Ensure data consistency between model training and inference
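A common cause of training/serving skew is re-implementing a feature separately for the online path. The sketch below uses only hypothetical names and toy data. It shows how a rewrite with slightly different window boundaries returns a different value for the same customer and timestamp, while evaluating one shared definition on both paths returns the same value, given the same data. It illustrates the general principle only, not how FeatureByte implements serving.

```python
from datetime import datetime, timedelta

# Hypothetical purchase history for one customer: (timestamp, amount)
purchases = [
    (datetime(2022, 3, 3), 40.0),
    (datetime(2022, 3, 6), 25.0),
    (datetime(2022, 3, 10), 10.0),
]
WINDOW = timedelta(days=7)


def total_spent_7d(history, as_of):
    """Shared definition: sum of amounts in (as_of - 7 days, as_of]."""
    return sum(amount for ts, amount in history if as_of - WINDOW < ts <= as_of)


def total_spent_7d_reimplemented(history, as_of):
    """A hand-rewritten online version with subtly different bounds: [as_of - 7 days, as_of)."""
    return sum(amount for ts, amount in history if as_of - WINDOW <= ts < as_of)


as_of = datetime(2022, 3, 10)
training_value = total_spent_7d(purchases, as_of)  # offline path
serving_shared = total_spent_7d(purchases, as_of)  # online path, same definition
serving_rewritten = total_spent_7d_reimplemented(purchases, as_of)  # online path, rewritten

print(training_value, serving_shared, serving_rewritten)
```

The rewritten version counts the March 3 purchase and drops the March 10 one, so the model would see 65.0 in production after training on 35.0.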
```python
# Get feature list from the catalog
feature_list = catalog.get_feature_list(
    "200 Features on Active Customers"
)

# Create deployment
deployment = feature_list.deploy(
    name="Features for customer purchases next 2w",
)

# Activate deployment
deployment.enable()

# Get shell script template for online serving
deployment.get_online_serving_code(language="sh")
```
```python
import featurebyte as fb

# Get table from catalog
items_table = catalog.get_table("INVOICEITEMS")

# Discount must be neither missing nor negative:
# impute missing values with 0 and replace values below 0 with 0
items_table.Discount.update_critical_data_info(
    cleaning_operations=[
        fb.MissingValueImputation(imputed_value=0),
        fb.ValueBeyondEndpointImputation(
            type="less_than",
            end_point=0,
            imputed_value=0,
        ),
    ]
)
```
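The two cleaning operations declared on `Discount` can be read as: replace a missing discount with 0, then replace any discount below the end point 0 with 0. The plain-Python sketch below mimics that behavior on a list of raw values so the effect is easy to check. `clean_discount` is a hypothetical helper; it assumes the operations apply in the order listed and that `type="less_than"` means strictly below the end point. It is not FeatureByte's implementation.

```python
from typing import List, Optional


def clean_discount(values: List[Optional[float]]) -> List[float]:
    """Hypothetical mimic of the two cleaning operations declared on Discount."""
    cleaned = []
    for value in values:
        # Like MissingValueImputation(imputed_value=0): missing -> 0
        if value is None:
            value = 0.0
        # Like ValueBeyondEndpointImputation(type="less_than", end_point=0, imputed_value=0):
        # values strictly below 0 -> 0
        if value < 0:
            value = 0.0
        cleaned.append(value)
    return cleaned


raw = [5.0, None, -2.5, 0.0, 12.0]
print(clean_discount(raw))  # [5.0, 0.0, 0.0, 0.0, 12.0]
```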
- Organize feature engineering assets with domain-specific catalogs
- Centralize cleaning operations and feature job configurations
- Distinguish prototype features from production-ready ones
- Create new versions of your features to handle changes in data
- Keep full lineage of your training data and features in production
- Monitor the health of feature pipelines centrally
Unlock AI at scale in your enterprise
Want to scale your AI operations?
Benefit from the FeatureByte SDK and much more with FeatureByte Enterprise:
- AI-Powered Copilot: Automatically generate state-of-the-art features tailored to your use case.
- User-Friendly Interface: Ensure effortless collaboration and efficient management.
- Self-Organizing Feature Catalog: Promote feature reuse and reduce redundancy.
- Real-Time Monitoring: Oversee the health and costs of your feature pipeline centrally.
- Robust Governance: Implement role-based access controls and adhere to best-practice workflows.