Metrics Stores
vs
Feature Stores
Ramon Perez
With ❤️ from Sydney, AU
Hi! 👋🏼 I'm Ramon
- Senior Product Developer @ Decoded
- Research Associate @ INSEAD
- Data Scientist @ Coder Academy
- Developer Advocate @ Transform
- Statistical Programming Instructor @ London Business School
Former
Currently
Learning Outcomes
By the end of this session, you will:
- Understand what metrics, features, and their stores are.
- Learn about the problems these tools solve.
- Understand what to look for in either store.
- Become aware of how companies are using these solutions.
- Get a sense of the requirements for adopting either.
Agenda
Why...
should you understand them, and
be excited about what they offer?
How...
do stores add value,
what solutions are out there, and
how do I figure out what to look for in them?
What are...
metrics, features, and their stores?
Metrics
If you cannot measure it, you cannot improve it.
Lord Kelvin
Metrics are what companies use to measure, track and improve the components that have a direct effect on their value proposition.
A standard for measuring or evaluating something, especially one that uses figures or statistics.
A standard of measurement.
A quantifiable measure that is used to track and assess the status of a specific process.
What are metrics?
Metrics are Everywhere
Dating Apps
Daily Active Users
Premium Memberships
Churn
Clothing Brands
Customer Lifetime Value
News Outlets
Web Traffic
All Employers Should
Happiness (Employee Satisfaction)
As an Example
Imagine that you owned a bar...
Idx | Item | Price | Date | Quantity | Signature |
---|---|---|---|---|---|
0 | Old Fashion | 17 | 3-Jan-22 | 2 | 0 |
1 | Shiny Palace (S) | 24 | 3-Jan-22 | 4 | 1 |
2 | Multi-vodkaTini (S) | 23 | 3-Jan-22 | 1 | 1 |
3 | Pale Ale | 7 | 3-Jan-22 | 3 | 0 |
4 | Lager | 6 | 4-Jan-22 | 3 | 0 |
5 | Crazy Tiki Tower (S) | 28 | 4-Jan-22 | 2 | 1 |
6 | Bananalicious Martini (S) | 25 | 4-Jan-22 | 4 | 1 |
7 | Aperol Spritz | 15 | 4-Jan-22 | 2 | 0 |
8 | Caribbean Sugarum (S) | 26 | 5-Jan-22 | 2 | 1 |
9 | Watermelicious (S) | 28 | 5-Jan-22 | 1 | 1 |
10 | Manhattan | 17 | 5-Jan-22 | 3 | 0 |
11 | Brown Ale | 7 | 5-Jan-22 | 4 | 0 |
What you get is the following...
But what you care most about is your metrics...
day_of_year | revenue_per_menu_cocktail | revenue_per_classic_drink |
---|---|---|
Day 3 | 119 | 55 |
Day 4 | 156 | 48 |
Day 5 | 80 | 79 |
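As a rough sketch of how daily metrics like these could be derived from the order rows, the snippet below sums price × quantity per day and splits it by the Signature flag. The function name `daily_revenue` and the tuple layout are illustrative assumptions, not part of any particular tool.

```python
from collections import defaultdict

# (item, price, date, quantity, signature) rows copied from the orders table.
ORDERS = [
    ("Old Fashion", 17, "3-Jan-22", 2, 0),
    ("Shiny Palace (S)", 24, "3-Jan-22", 4, 1),
    ("Multi-vodkaTini (S)", 23, "3-Jan-22", 1, 1),
    ("Pale Ale", 7, "3-Jan-22", 3, 0),
    ("Lager", 6, "4-Jan-22", 3, 0),
    ("Crazy Tiki Tower (S)", 28, "4-Jan-22", 2, 1),
    ("Bananalicious Martini (S)", 25, "4-Jan-22", 4, 1),
    ("Aperol Spritz", 15, "4-Jan-22", 2, 0),
    ("Caribbean Sugarum (S)", 26, "5-Jan-22", 2, 1),
    ("Watermelicious (S)", 28, "5-Jan-22", 1, 1),
    ("Manhattan", 17, "5-Jan-22", 3, 0),
    ("Brown Ale", 7, "5-Jan-22", 4, 0),
]


def daily_revenue(orders):
    """Sum price * quantity per day, split by the Signature flag."""
    totals = defaultdict(lambda: {"signature": 0, "classic": 0})
    for _item, price, date, quantity, signature in orders:
        kind = "signature" if signature else "classic"
        totals[date][kind] += price * quantity
    return dict(totals)


for date, revenue in daily_revenue(ORDERS).items():
    print(date, revenue)
```

The point is that the raw rows stay as they are, and the metric is just a small, repeatable piece of logic on top of them.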
A Recent Survey Found...
Source: How do organizations manage metrics today? by Transform.co
Metrics are...
KPIs are...
and
A Quick Note on these 2, which are often used interchangeably
Tied to the success of the company.
Data points aggregated over time.
Used to track a company's financial health, among many things.
Tied to a specific goal such as the success of an ad campaign.
Frequencies and aggregated numbers that provide direction towards achieving a goal.
Can change regularly for similar or different goals that may or may not affect revenue.
Characteristics of
good metrics
Have a clear goal
Have a consistent definition
Are actionable
Are relevant to the bottom line
Are measurable
Are readable
Metrics Stores
What's a Metrics Store?
A Metrics Store is a tool that allows you to define metrics as code, govern them, and serve them to a variety of downstream applications.
Data Storage
Metrics Store
Data Sources
Data Analysis and other tools
This assumes an ETL approach took place.
Metrics Stores allow you to productionise data for different use cases and stakeholders of varying technical levels.
Data Analysis and other tools
This assumes an ELT approach took place.
Data Storage
Metrics Store
Transformations
Data Sources
Depending on the tool, these could happen together.
What was it like before
Metrics Stores?
Each use case would have its own metrics logic definition prior to the analytics stage.
Data Analysis and other tools
Data Storage
Metrics Logic +
Data Sources
How do they work?
Connect to the flavour(s) of warehouse or database(s) available at your organization.
Data producers define metrics as code in multiple ways.
Downstream Tools
BI
Metrics Store
SaaS
ML
Data Producers
Marketing
Product Team
Finance
Downstream tools connect to your Store.
Metrics Store Mechanics
1. Metrics Framework
2. Metrics Catalog
3. Metrics API
0. Connect to your Data Warehouse
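To make the three mechanics a little more concrete, here is a minimal in-memory sketch. Everything in it (`Metric`, `MetricsStore`, `register`, `catalog`, `query`) is a hypothetical name chosen for illustration and does not mirror the API of any real metrics store; the list of rows passed to `query` stands in for the warehouse connection in step 0.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, float]


@dataclass(frozen=True)
class Metric:
    """A metric defined as code: a name, an owner, and the logic itself."""
    name: str
    owner: str
    description: str
    compute: Callable[[List[Row]], float]


class MetricsStore:
    """Hypothetical stand-in for a metrics store, kept in memory."""

    def __init__(self) -> None:
        self._metrics: Dict[str, Metric] = {}

    def register(self, metric: Metric) -> None:
        # 1. Metrics Framework: each metric is defined exactly once.
        if metric.name in self._metrics:
            raise ValueError(f"metric {metric.name!r} is already defined")
        self._metrics[metric.name] = metric

    def catalog(self) -> List[Dict[str, str]]:
        # 2. Metrics Catalog: who owns what, and what it means.
        return [
            {"name": m.name, "owner": m.owner, "description": m.description}
            for m in sorted(self._metrics.values(), key=lambda m: m.name)
        ]

    def query(self, name: str, rows: List[Row]) -> float:
        # 3. Metrics API: every downstream tool runs the same definition.
        return self._metrics[name].compute(rows)


store = MetricsStore()
revenue = Metric(
    name="revenue",
    owner="finance",
    description="Sum of price * quantity",
    compute=lambda rows: sum(r["price"] * r["quantity"] for r in rows),
)
store.register(revenue)

rows = [{"price": 17, "quantity": 2}, {"price": 7, "quantity": 3}]
print(store.catalog())
print(store.query("revenue", rows))
```

Because BI, SaaS, and ML consumers would all call `query`, a change to the definition happens in one place rather than in every dashboard.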
What problems do they solve?
Inconsistent metric definitions across teams.
Time wasted writing queries/code rather than deriving insights.
Inaccurate values at the time of reporting.
Untraceability of code/queries.
Lack of governance and trust. Who created what, when and how?
Duplicate data.
Increased costs with the use of cloud resources.
A Recent Survey Found...
Source: How do organizations manage metrics today? by Transform.co
Who uses them?
Airbnb has Minerva
LinkedIn has UMP
Uber has uMetric (and M3)
Features
Features are numeric representations of raw data that serve as the fuel for machine learning models.
What are features?
Student | Month | Income |
---|---|---|
Yes | Feb | $20K |
No | Mar | $75K |
No | Jul | $60K |
Yes | Jan | $22K |
Yes | Dec | $10K |
Not Features
Features
Student | Month | Income |
---|---|---|
1 | 2 | 20000 |
0 | 3 | 75000 |
0 | 7 | 60000 |
1 | 1 | 22000 |
1 | 12 | 10000 |
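A small sketch of that conversion, assuming the raw values look exactly like the ones in the first table ("Yes"/"No", three-letter month names, and incomes written as "$20K"). The helper name `encode` is made up for this example.

```python
MONTH_NAMES = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}

RAW = [
    {"Student": "Yes", "Month": "Feb", "Income": "$20K"},
    {"Student": "No", "Month": "Mar", "Income": "$75K"},
    {"Student": "No", "Month": "Jul", "Income": "$60K"},
    {"Student": "Yes", "Month": "Jan", "Income": "$22K"},
    {"Student": "Yes", "Month": "Dec", "Income": "$10K"},
]


def encode(row):
    """Turn one raw row into purely numeric feature values."""
    return {
        "Student": 1 if row["Student"] == "Yes" else 0,
        "Month": MONTHS[row["Month"]],
        # "$20K" -> "20" -> 20000
        "Income": int(row["Income"].strip("$K")) * 1000,
    }


features = [encode(row) for row in RAW]
for row in features:
    print(row)
```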
The Machine Learning Workflow
Problem/Goal
Data Sources
Prepare Data
Train Model
Evaluate
Fine Tune
Deploy
Monitor
Features and Labels
How do we create features? 🤔
By re-coding all non-numerical values into numerical ones, e.g. turning the answers to a Likert scale-type question into numbers
Date_Time |
---|
31-12-2021 14:22:55 |
20-07-2021 11:40:13 |
By extracting information from different data points, e.g. dates (not those dates 👩❤️👨)
Day | Month | Year | Hour | Min | Secs |
---|---|---|---|---|---|
31 | 12 | 2021 | 14 | 22 | 55 |
20 | 7 | 2021 | 11 | 40 | 13 |
How satisfied were you with your chicken burger?
Very Dissatisfied | Dissatisfied | Neutral | Satisfied | Very Satisfied
Before | After |
---|---|
Satisfied | 4 |
Dissatisfied | 2 |
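Both ideas, extracting parts of a timestamp and re-coding a Likert answer, can be sketched in a few lines. The function names are illustrative, and the 1 to 5 mapping is an assumption that is consistent with the Before/After values shown (Dissatisfied is 2, Satisfied is 4).

```python
from datetime import datetime

# Assumed 1-5 coding of the five answer options, lowest to highest.
LIKERT = {
    "Very Dissatisfied": 1,
    "Dissatisfied": 2,
    "Neutral": 3,
    "Satisfied": 4,
    "Very Satisfied": 5,
}


def likert_to_number(answer):
    """Re-code a Likert answer as an ordinal number."""
    return LIKERT[answer]


def split_timestamp(text):
    """Break a 'DD-MM-YYYY HH:MM:SS' string into separate numeric features."""
    ts = datetime.strptime(text, "%d-%m-%Y %H:%M:%S")
    return {
        "Day": ts.day, "Month": ts.month, "Year": ts.year,
        "Hour": ts.hour, "Min": ts.minute, "Secs": ts.second,
    }


print(split_timestamp("31-12-2021 14:22:55"))
print(likert_to_number("Satisfied"))
```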
There are many other ways...
That was structured data, what about unstructured...
dad_jokes |
---|
How did the picture end up in jail? It was framed! |
I made a pencil with two erasers. It was pointless. |
Where do lizards go to fix their fallen tails? The retail shop. |
Idx | did | do | end | ... | to | two | up | was |
---|---|---|---|---|---|---|---|---|
1 | 1 | 0 | 1 | ... | 0 | 0 | 1 | 1 |
2 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 1 |
3 | 0 | 1 | 0 | ... | 1 | 0 | 0 | 0 |
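The word-count table above is a bag-of-words representation. A minimal sketch of how one could be built with only the standard library follows; dropping single-letter tokens such as "I" and "a" is an assumed convention, and `tokenize` and `bag_of_words` are illustrative names.

```python
import re
from collections import Counter

JOKES = [
    "How did the picture end up in jail? It was framed!",
    "I made a pencil with two erasers. It was pointless.",
    "Where do lizards go to fix their fallen tails? The retail shop.",
]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens of two or more letters."""
    return re.findall(r"[a-z]{2,}", text.lower())


def bag_of_words(docs):
    """Return the sorted vocabulary and one row of word counts per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    matrix = [[count[word] for word in vocabulary] for count in counts]
    return vocabulary, matrix


vocabulary, matrix = bag_of_words(JOKES)
print(vocabulary)
for row in matrix:
    print(row)
```

Each joke becomes one row of numbers, which is the shape a model can work with.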
What is feature engineering?
...is the process of formulating the most appropriate features given the data, the model, and the task. ~ Alice Zheng and Amanda Casari
...it is using domain knowledge of the data to create new features that increase the signal coming from the data.
Feature engineering is important because it helps us...
Capture subtle to complex relationships in the raw data.
Increase the accuracy of our models.
Get rid of duplicate features.
Reduce the magnitude and scale of the features.
Deal with outliers.
Reduce the dimensionality of our data.
How to select a good feature set?
Strive for simplicity.
Select independent features.
Avoid useless features.
Avoid redundant features.
Pick a good starting point for the minimum number of features your problem will need.
Feature Stores
What's a feature store?
...an ML-specific data system that runs data pipelines that transform raw data into feature values, stores and manages the feature data itself, and serves data consistently for training and inference purposes. ~ Willem Pienaar & Mike Del Balso, 2021
What problems do they solve?
Lack of management of old features, and access to new ones.
Slow online serving (i.e. streaming and real-time predictions).
Inconsistent data transformations.
Constant rebuilds of data pipelines.
Monitoring the health of feature pipelines in production.
Having to provide extensive engineering support.
How do they work?
Problem/Goal
Get Data
Feature Store
Model
Deploy
Transform
Online
Batch
Serve
Define and Register
Share and Monitor
How do they work?
Problem/Goal
Get Data
Online
Web Applications
Logs
External APIs
Events
How do they work?
Problem/Goal
Get Data
Batch
Snowflake
BigQuery
Redshift
Azure Synapse Analytics
How do they work?
Problem/Goal
Get Data
Feature Store
Online
Batch
Define and Register
How do they work?
Problem/Goal
Get Data
Feature Store
Transform
Online
Batch
How do they work?
Problem/Goal
Get Data
Feature Store
Online
Batch
Serve
How do they work?
Problem/Goal
Get Data
Feature Store
Online
Batch
Share and Monitor
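The walkthrough above (define and register, transform, serve online and in batch) can be sketched as a toy in-memory class. This is an assumption-laden illustration: `FeatureStore`, `register`, `ingest`, `get_online_features`, and `get_training_rows` are hypothetical names and do not correspond to the API of Feast or any other product.

```python
from typing import Callable, Dict, List


class FeatureStore:
    """Hypothetical stand-in for a feature store, kept in memory."""

    def __init__(self) -> None:
        self._transforms: Dict[str, Callable[[dict], float]] = {}
        self._offline: List[dict] = []       # history, used for batch training
        self._online: Dict[int, dict] = {}   # latest values, used for serving

    def register(self, name: str, transform: Callable[[dict], float]) -> None:
        # Define and Register: one definition per feature.
        self._transforms[name] = transform

    def ingest(self, entity_id: int, raw: dict) -> dict:
        # Transform: the same logic feeds both the batch and online paths.
        features = {name: fn(raw) for name, fn in self._transforms.items()}
        self._offline.append({"entity_id": entity_id, **features})
        self._online[entity_id] = features
        return features

    def get_online_features(self, entity_id: int) -> dict:
        # Serve (online): latest feature values for one entity.
        return dict(self._online[entity_id])

    def get_training_rows(self) -> List[dict]:
        # Serve (batch): the full history for model training.
        return [dict(row) for row in self._offline]


store = FeatureStore()
store.register("tenure_years", lambda raw: raw["tenure_months"] / 12)
store.register("is_student", lambda raw: int(raw["student"] == "Yes"))

store.ingest(1234, {"tenure_months": 36, "student": "Yes"})
store.ingest(1234, {"tenure_months": 48, "student": "No"})

print(store.get_online_features(1234))
print(store.get_training_rows())
```

Since training rows and online lookups come from the same registered transforms, the two cannot silently drift apart in how a feature is computed.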
Who uses them?
Spotify has JukeBox
Airbnb has Zipline
Gojek has Feast
Uber has Michelangelo Palette
Why do we need metrics?
To measure and improve business outcomes.
To track how the evolution of our products and services affects the bottom line.
To test the effects of new products and services in different areas of the business.
To understand our customers' behaviour.
To spot dips and spikes in performance and prevent churn and turnover.
Metrics are the "shared language" for organizations to make decisions on.
Why do we need features?
To represent raw data from the real world.
To improve our products and services with machine learning.
To train machine learning models.
To understand our customers' behaviour.
To provide recommendations to consumers.
Why do we need metrics stores?
To standardize the way in which goals are tracked within the organization.
To apply well-tested software engineering best practices to our analytics functions.
To let our visualization and reporting tools do what they do best and move all metrics' logic to a single place.
To stop duplicating tables at the warehouse level at the time of metrics logic definition.
Because the number of teams doing and taking advantage of analytics across organisations continues to increase.
Why do we need feature stores?
Because serving predictions in real-time is hard.
To reduce the latency between getting raw data, transforming it, and making a prediction.
To stop duplicating features for the same purpose.
To automatically backfill newly selected features as needed.
To detect drift between data sources.
Stores Advantages
Feature
Metrics
Automate metrics/features creation.
Automatically backfill metrics/feature computation and logging.
Enable software engineering best practices.
Increase consistency between training and serving data.
Enable the sharing of metrics/features across different teams.
Reduce costs.
Increase experimental/productionization velocity.
Build trust among end-users with consistent definitions.
Abstract away the complexity from multiple data pipelines.
Stores Disadvantages
Feature
Metrics
Can have a steep learning curve for data professionals without a coding background.
Advanced level of engineering required to set up.
Still early days which means a lot of testing and development is still in progress.
Complex feature creation in real-time is still a challenge.
It is easier to adopt for big companies.
Handling thousands of features is a challenge.
Handling large datasets can be challenging.
Similarities
In both, you can define things once and use them everywhere.
Update your definitions once and the changes happen globally, which means metrics/features get backfilled.
Both act as a centralized repository of knowledge to help create value from data.
Both work best with, and are optimized for, structured data.
Both have interfaces to similar tools like JupyterLab, RStudio, etc.
Differences
- Metrics can be fed to a machine learning model, but features might be meaningless as metrics.
- Feature stores are optimized for online usage while metrics stores work mostly (although not solely) with batch data.
- While a metrics store sits as a layer between the data storage and any downstream tool, feature stores work for and around the machine learning lifecycle.
- Metrics tend to have a long shelf-life while features can come and go as the products and services of the company mature.
- Metrics Stores should support simple-to-complex metrics while Feature Stores should make the simple ones easy to implement to iterate fast and effectively.
Metrics vs Features Example
Churn, defined in the metrics context, is the rate at which customers stop doing business with us. In terms of features and in the machine learning context, churn is a 1 or a 0.
For Metrics, Churn Looks Like...
month | total # | churned | churn rate |
---|---|---|---|
12 | 7023 | 150 | 2.13% |
11 | 7090 | 110 | 1.55% |
10 | 6903 | 133 | 1.92% |
9 | 7541 | 98 | 1.29% |
8 | 7209 | 122 | 1.69% |
7 | 7387 | 170 | 2.30% |
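As a worked example, the churn rate here is simply churned customers divided by total customers for the month. The sketch below uses the monthly counts from the slide; `churn_rate` is an illustrative name.

```python
# (month, total customers, churned customers) taken from the table.
MONTHLY = [
    (12, 7023, 150),
    (11, 7090, 110),
    (10, 6903, 133),
    (9, 7541, 98),
    (8, 7209, 122),
    (7, 7387, 170),
]


def churn_rate(total, churned):
    """Share of customers who stopped doing business with us."""
    if total <= 0:
        raise ValueError("total must be positive")
    return churned / total


for month, total, churned in MONTHLY:
    print(f"month {month}: {churn_rate(total, churned):.1%}")
```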
In Terms of ML, Churn Looks Like...
ID | Gender | Tenure | Total Charges | Churn |
---|---|---|---|---|
1234 | 1 | 3 | 4150 | 1 |
5678 | 0 | 30 | 10110 | 0 |
9101 | 1 | 42 | 7133 | 0 |
1213 | 1 | 10 | 598 | 0 |
1415 | 0 | 25 | 9122 | 1 |
1617 | 1 | 13 | 2170 | 0 |
Value Added
Metrics Stores
Feature Stores
- Single source of truth for any downstream tool.
- Having the metric logic separated from downstream tools allows more people to focus on deriving insights from data.
- Define metrics as code and make it so that changes to a definition have to follow software engineering best practices.
- Enable ownership of a metric.
- Hand over most of the feature creation process and serving to a single system and focus on creating and productionising ML.
- Monitor distribution shifts between training and incoming data.
- Low latency between incoming data, its transformation, and prediction.
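To give a feel for what monitoring a distribution shift can mean in practice, here is a sketch using the population stability index (PSI). PSI is just one common choice and is an assumption on my part; the slides do not name a specific method, and real feature stores may use different statistics. The function name `psi` and the bin count are illustrative.

```python
import math


def psi(expected, actual, bins=4):
    """Population stability index between training and incoming values.

    Bins are equal-width and derived from the training data only.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant data

    def shares(values):
        counts = [0] * bins
        for value in values:
            index = int((value - lo) / width)
            # Values outside the training range fall into the edge bins.
            counts[min(max(index, 0), bins - 1)] += 1
        # A small floor avoids taking the log of zero for empty bins.
        return [max(count / len(values), 1e-6) for count in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


training = [1, 2, 3, 4, 5, 6, 7, 8]
incoming = [5, 6, 7, 8, 9, 10, 11, 12]

print(psi(training, training))  # identical distributions
print(psi(training, incoming))  # shifted distribution
```

A commonly cited rule of thumb treats a PSI above roughly 0.2 as a shift worth investigating, though the right threshold depends on the feature and the model.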
What are our options?
...for metrics stores
...for feature stores
What to look for in either solution?
Metrics Stores
- Metrics Framework - Define and manage metrics once
- Metrics Catalog - An interface with metric knowledge
- Metrics API - Meet consumers where they are
Feature Stores
- Single feature definition for production and training
- Distribution consistency guarantee between serving and training
- Seamless integration between online and batch serving
When should you adopt a metrics store?
- If you want to empower more people at your organization to take advantage of your data.
- If different teams are making decisions based on the same metrics but with different definitions.
- If a lot of the analytics cycle is spent writing queries to get data.
When should you adopt a feature store?
- When ML is not a "nice-to-have" at your organization.
- When online serving is a crucial part of your business.
- When the number of features your models rely on is large.
- When you need a way to monitor distribution shifts between historical and incoming data.
To Recap...
Metrics allow us to track what matters for our business.
Features are the fuel of our machine learning models and they often need reshaping before we get to use them.
Metrics Stores provide us with a way to write, govern, and serve metrics in a common language.
Feature Stores abstract away the reshaping of features while providing scalability in both online serving and offline training.
If you need consistency, scalability, reusability, and a performance boost in your analytics/ML operations then adopt...
Thanks!
Metrics Stores and Feature Stores are like the conductor of an orchestra: the musicians can still perform without one, but you can only hope for good synchronization.