Skip to content

featurebyte.View.join

join(
other_view: View,
on: Union[str, NoneType]=None,
how: Literal["left", "inner"]="left",
rsuffix: str="",
rprefix: str=""
) -> ViewT

Description

To join two views, use the join() method of the left view and specify the right view object in the other_view parameter. The method will match rows from both views based on a shared key, which is either the primary key of the right view or the natural key if the right view is a Slowly Changing Dimension (SCD) view.

If the shared key identifies an entity that is referenced in the left view or the column name of the shared key is the same in both views, the join() method will automatically identify the column in the left view to use for the join.

By default, a left join is performed, and the resulting view will have the same number of rows as the left view. However, you can set the how parameter to 'inner' to perform an inner join. In this case, the resulting view will only contain rows where there is a match between the columns in both tables.

When the right view is an SCD view, the event timestamp of the left view determines which record of the right view to join.

Parameters

  • other_view: View
    The right view that you want to join. Its primary key (natural key if the view is an SCD) must be represented by a column in the left view.

  • on: Union[str, NoneType]
    Column name in the left view to use for the join. Note the ‘on’ argument is optional if:
    - the column name of the shared key is the same in both views.
    - the shared key identifies an entity that is referenced in the left view and the columns in the two views have been tagged with the entity reference.

  • how: Literal["left", "inner"]
    default: "left"
    The argument is optional. The default value is ‘left’, which indicates a left join. The resulting view will have the same number of rows as the left view. If ‘inner’ is selected, the resulting view will only contain rows where there is a match between the columns in both tables.

  • rsuffix: str
    default: ""
    The argument is used to disambiguate overalapping column names after join by adding the suffix to the right view's column names. The default rsuffix is an empty string.

  • rprefix: str
    default: ""
    The argument is used to disambiguate overalapping column names after join by adding the prefix to the right view's column names. The default rprefix is an empty string.

Returns

  • ViewT
    The joined view. Its type is the same as the left view.

Examples

Use the automated mode if one of the 2 following conditions are met:

  • the column name of the shared key is the same in both views.
  • the shared key identifies an entity that is referenced in the left view and the columns in the two views have been tagged with the entity reference.

>>> items_view = catalog.get_view("INVOICEITEMS")
>>> product_view = catalog.get_view("GROCERYPRODUCT")
>>> items_view_with_product_group = items_view.join(product_view)
If not, specify the column name in the left view that represents the right view’s primary key (natural key if the view is an SCD).

>>> items_view = catalog.get_view("INVOICEITEMS")
>>> product_view = catalog.get_view("GROCERYPRODUCT")
>>> items_view_with_product_group = items_view.join(
...   product_view, on="GroceryProductGuid"
... )

Use an inner join if you want the returned view to be filtered and contain only the rows that have matching values in both views.

>>> items_view = catalog.get_view("INVOICEITEMS")
>>> product_view = catalog.get_view("GROCERYPRODUCT")
>>> items_view_with_non_missing_product_group = items_view.join(
...   product_view, on="GroceryProductGuid", how="inner"
... )