Metrics Stores

vs

Feature Stores

Ramon Perez

With ❤️ from Sydney, AU

Hi! 👋🏼 I'm Ramon

  • Senior Product Developer @ Decoded
  • Research Associate @ INSEAD
  • Data Scientist @ Coder Academy
  • Developer Advocate @ Transform
  • Statistical Programming Instructor @ London Business School

Former

Currently

Learning Outcomes

By the end of this session, you will:

  1. Understand what metrics, features, and their stores are.
  2. Learn about the problems these tools solve.
  3. Understand what to look for in either store.
  4. Become aware of how companies are using these solutions.
  5. Get a sense of the requirements for adopting either.

Agenda

Why... 

should you understand them, and

be excited about what they offer?

How...

do stores add value,

what solutions are out there, and

how do I figure out what to look for in them?

What are...

metrics, features, and their stores?

Metrics

If you cannot measure it, you cannot improve it.

Lord Kelvin

Metrics are what companies use to measure, track and improve the components that have a direct effect on their value proposition.

A standard for measuring or evaluating something, especially one that uses figures or statistics.

A standard of measurement.

A quantifiable measure that is used to track and assess the status of a specific process.

What are metrics?


Metrics are Everywhere

Dating Apps

Daily Active Users

Premium Memberships

Churn

Clothing Brands

Customer Lifetime Value

News Outlets

Web Traffic

All Employers Should

Happiness (Employee Satisfaction)

A Recent Survey Found...

Source: How do organizations manage metrics today? by Transform.co

Metrics are...

KPIs are...

and

A Quick Note on these 2, which are often used interchangeably

Tied to the success of the company.

Data points aggregated over time.

Used to track a company's financial health, among many things.

Tied to a specific goal such as the success of an ad campaign.

Frequencies and aggregated numbers that provide direction towards achieving a goal.

Can change regularly for similar or different goals that may or may not affect revenue.

Characteristics of

good metrics

Have a clear goal

Have a consistent definition

Are actionable

Are relevant to the bottom line

Are measurable

Are readable

Metrics Stores

What's a Metrics Store?

A Metrics Store is a tool that allows you to define metrics as code, govern them, and serve them to a variety of downstream applications.
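
To make "metrics as code" more concrete, here is a minimal sketch in Python. Everything in it (the `Metric` dataclass, the `MetricsStore` class, the `daily_active_users` metric, and the sample events) is hypothetical and made up for illustration; it does not reflect the API of any particular product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass(frozen=True)
class Metric:
    """Hypothetical metric definition: the logic plus governance metadata."""
    name: str
    owner: str
    description: str
    compute: Callable[[List[Row]], float]


class MetricsStore:
    """Toy registry that holds exactly one definition per metric name."""

    def __init__(self) -> None:
        self._metrics: Dict[str, Metric] = {}

    def register(self, metric: Metric) -> None:
        # Governance: refuse a second, possibly conflicting, definition.
        if metric.name in self._metrics:
            raise ValueError(f"metric {metric.name!r} is already defined")
        self._metrics[metric.name] = metric

    def query(self, name: str, rows: List[Row]) -> float:
        # Every downstream tool goes through the same definition.
        return self._metrics[name].compute(rows)


def count_distinct_users(rows: List[Row]) -> float:
    """Count the distinct user ids that appear in the events."""
    return float(len({row["user_id"] for row in rows}))


store = MetricsStore()
store.register(Metric(
    name="daily_active_users",
    owner="product-team",
    description="Distinct users with at least one event in the day.",
    compute=count_distinct_users,
))

events = [{"user_id": 1}, {"user_id": 2}, {"user_id": 1}]
dau = store.query("daily_active_users", events)
print(dau)
```

Because the definition lives in one place, a BI dashboard and a notebook that both call `query("daily_active_users", ...)` would get the same number for the same data.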

Data Storage

Metrics Store

Data Sources


Data Analysis and other tools

This assumes an ETL approach took place.

In addition, Metrics Stores allow you to productionize data for diverse business use cases and stakeholders of varying technical levels.

Data Analysis and other tools

This assumes an ELT approach took place.

Data Storage

Metrics Store

Data Sources


Transformations

What was it like before

Metrics Stores?

Each use case would have its own metrics logic definition prior to the analytics stage.

Data Analysis and other tools

Data Storage

Metrics Logic       +

Data Sources


What problems do they solve?

Inconsistent metrics definitions across teams.

Time wasted writing queries/code rather than deriving insights.

Inaccurate values at the time of reporting.

Untraceability of code/queries.

Lack of governance and trust. Who created what, when and how?

Duplicate data.

Increased costs with the use of cloud resources.

A Recent Survey Found...

Source: How do organizations manage metrics today? by Transform.co

How do they work?

Connect to the flavour(s) of warehouse or database(s) available at your organization.

Data producers define metrics as code in multiple ways.

Downstream  Tools

BI

Metrics Store

SaaS

ML

Data Producers

Marketing

Product Team

Finance

Downstream tools connect to your Store.

Who uses them?

Airbnb has Minerva

LinkedIn has UMP

Uber has uMetric (and M3)

Features

The Machine Learning Workflow

Problem/Goal

Data Sources

Prepare Data

Train Model

Evaluate

Fine Tune

Deploy

Monitor

Features and Labels

Features are numeric representations of raw data that serve as the fuel for machine learning models.

What are features?

Not Features

Student  Month  Income
Yes      Feb    $20K
No       Mar    $75K
No       Jul    $60K
Yes      Jan    $22K
Yes      Dec    $10K

Features

Student  Month  Income
1        2      20000
0        3      75000
0        7      60000
1        1      22000
1        12     10000

How do we create features? 🤔

By re-coding all non-numerical values into numerical ones, e.g. turning the answers to a Likert-scale question into numbers

How satisfied were you with your chicken burger?

Very Dissatisfied | Dissatisfied | Neutral | Satisfied | Very Satisfied

Before        After
Satisfied     4
Dissatisfied  2

By extracting information from different data points, e.g. dates (not those dates 👩‍❤️‍👨)

Date_Time
31-12-2021 14:22:55
20-07-2021 11:40:13

Day  Month  Year  Hour  Min  Secs
31   12     2021  14    22   55
20   7      2021  11    40   13
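
Both techniques on this slide, re-coding a Likert answer and splitting a timestamp into parts, can be sketched with the Python standard library. The function names and the 1-to-5 encoding are assumptions for illustration; the encoding is chosen so that "Satisfied" maps to 4 and "Dissatisfied" to 2, as in the Before/After table.

```python
from datetime import datetime

# Ordinal encoding for the Likert scale (1 = lowest, 5 = highest).
LIKERT_SCALE = {
    "Very Dissatisfied": 1,
    "Dissatisfied": 2,
    "Neutral": 3,
    "Satisfied": 4,
    "Very Satisfied": 5,
}


def encode_likert(answer: str) -> int:
    """Re-code a Likert answer into its numeric rank."""
    return LIKERT_SCALE[answer]


def extract_datetime_features(raw: str) -> dict:
    """Split a 'DD-MM-YYYY HH:MM:SS' string into numeric parts."""
    parsed = datetime.strptime(raw, "%d-%m-%Y %H:%M:%S")
    return {
        "day": parsed.day,
        "month": parsed.month,
        "year": parsed.year,
        "hour": parsed.hour,
        "min": parsed.minute,
        "secs": parsed.second,
    }


print(encode_likert("Satisfied"))      # 4
print(encode_likert("Dissatisfied"))   # 2
print(extract_datetime_features("31-12-2021 14:22:55"))
```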

That was structured data, what about unstructured...

dad_jokes
1  How did the picture end up in jail? It was framed!
2  I made a pencil with two erasers. It was pointless.
3  Where do lizards go to fix their fallen tails? The retail shop.

   did  do  end  ...  to  two  up  was
1  1    0   1    ...  0   0    1   1
2  0    0   0    ...  0   1    0   1
3  0    1   0    ...  1   0    0   0
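
The word-count table above is commonly called a bag-of-words representation. Here is a minimal sketch of how such a table could be built with the standard library only; the tokenizer (lowercase, letters and apostrophes only) is a simplifying assumption, and the full vocabulary it produces has more columns than the excerpt shown on the slide.

```python
import re
from typing import List

jokes = [
    "How did the picture end up in jail? It was framed!",
    "I made a pencil with two erasers. It was pointless.",
    "Where do lizards go to fix their fallen tails? The retail shop.",
]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


# One column per distinct word, in alphabetical order.
vocabulary = sorted({token for joke in jokes for token in tokenize(joke)})

# One row per joke; each cell counts how often the word occurs in that joke.
matrix = [
    [tokenize(joke).count(word) for word in vocabulary]
    for joke in jokes
]

for word in ("did", "do", "two", "was"):
    column = [row[vocabulary.index(word)] for row in matrix]
    print(word, column)
```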

What is feature engineering?

...is the process of formulating the most appropriate features given the data, the model, and the task. ~ Alice Zheng and Amanda Casari

...it is using domain knowledge of the data to create new features that increase the signal coming from the data.

Feature engineering is important because it helps us...

Capture subtle to complex relationships in the raw data.

Increase the accuracy of our models.

Get rid of duplicate features.

Reduce the magnitude and scale of the features.

Deal with outliers.

Reduce the dimensionality of our data.
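
Two of the points above, reducing the scale of features and dealing with outliers, can be illustrated with a short sketch. Min-max scaling and clipping with Tukey's 1.5 × IQR rule are techniques chosen here for illustration, not ones named in the slides, and the sample values are made up.

```python
from statistics import quantiles
from typing import List


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale values linearly into the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def clip_outliers_iqr(values: List[float], k: float = 1.5) -> List[float]:
    """Clip values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


incomes = [20000.0, 75000.0, 60000.0, 22000.0, 10000.0]
scaled = min_max_scale(incomes)
print(scaled)

# Hypothetical measurements with one extreme value at the end.
readings = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 10.0, 12.0, 500.0]
clipped = clip_outliers_iqr(readings)
print(clipped)
```

Clipping before scaling is usually the safer order, since a single extreme value would otherwise squeeze all the other scaled values towards zero.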

How to select a good feature set?

Strive for simplicity.

Select independent features.

Avoid useless features.

Avoid redundant features.

Pick a good starting point for the minimum number of features your problem will need.

Feature Stores

What's a feature store?

...an ML-specific data system that runs data pipelines that transform raw data into feature values, stores and manages the feature data itself, and serves data consistently for training and inference purposes. ~ Willem Pienaar & Mike Del Balso, 2021

What problems do they solve?

Redundancy when building features.

Slow online serving (i.e. real-time predictions).

The scaling of large ML models.

Discovering new features at scale.

Monitoring the health of feature pipelines in production.

Having to provide extensive engineering support.

How do they work?

Problem/Goal

Get Data

Feature Store

Model

Deploy

Transform

Online

Batch

Serve

Define and Register

Share and Monitor
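
The flow in the diagram above (define and register, transform, then serve in batch and online) can be sketched as a toy in-memory class. `ToyFeatureStore`, its methods, and the `charges_per_month` feature are all hypothetical names invented for this example; real feature stores differ considerably in their APIs and storage layers.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


class ToyFeatureStore:
    """Hypothetical in-memory feature store, for illustration only."""

    def __init__(self) -> None:
        self._transforms: Dict[str, Callable[[Row], float]] = {}
        self._online: Dict[str, Dict[str, float]] = {}

    def register(self, name: str, transform: Callable[[Row], float]) -> None:
        """Define and register: one transformation per feature name."""
        self._transforms[name] = transform

    def _compute(self, raw: Row) -> Dict[str, float]:
        # Transform: apply every registered definition to the raw record.
        return {name: fn(raw) for name, fn in self._transforms.items()}

    def get_batch(self, raw_rows: List[Row]) -> List[Dict[str, float]]:
        """Batch path: feature rows for offline training."""
        return [self._compute(raw) for raw in raw_rows]

    def ingest(self, entity_id: str, raw: Row) -> None:
        """Precompute features so online reads are a dictionary lookup."""
        self._online[entity_id] = self._compute(raw)

    def get_online(self, entity_id: str) -> Dict[str, float]:
        """Online path: low-latency lookup at prediction time."""
        return self._online[entity_id]


def charges_per_month(raw: Row) -> float:
    # Guard against a zero tenure for brand-new customers.
    return raw["total_charges"] / max(raw["tenure"], 1.0)


store = ToyFeatureStore()
store.register("charges_per_month", charges_per_month)
store.register("tenure", lambda raw: raw["tenure"])

raw_customer = {"tenure": 10.0, "total_charges": 598.0}
training_rows = store.get_batch([raw_customer])
store.ingest("1213", raw_customer)
online_row = store.get_online("1213")
print(training_rows[0] == online_row)
```

Since both paths run the same registered transformations, the values used for training and for serving cannot drift apart because of duplicated logic.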

Who uses them?

Spotify has JukeBox

Airbnb has Zipline

Gojek has Feast

Uber has Michelangelo Palette

Netflix has Metaflow

Why do we need metrics?

To measure and improve business outcomes.

To track how the evolution of our products and services affects the bottom line.

To test the effects of new products and services in different areas of the business.

To understand our customers' behaviour.

To spot dips and spikes in performance and prevent churn and turnover.

Metrics are the "shared language" for organizations to make decisions on.

Why do we need features?

To represent raw data from the real world.

To improve our products and services with machine learning.

To train machine learning models.

To understand our customers' behaviour.

To provide recommendations to consumers.

Why do we need metrics stores?

To standardize the way in which goals are tracked within the organization.

To apply well-tested software engineering best practices to our analytics functions. 

To let our visualization and reporting tools do what they do best and move all metrics' logic to a single place.

To stop duplicating tables at the warehouse level at the time of metrics logic definition.

Because the number of teams doing and taking advantage of analytics across organisations continues to increase.

Why do we need feature stores?

Because serving predictions in real-time is hard.

To reduce the latency between getting raw data, transforming it, and making a prediction.

To stop duplicating features for the same purpose.

To automatically backfill newly selected features as needed.

To detect drift between data sources.

Stores Advantages

Feature

Metrics

Automate metrics/features creation.

Automatically backfill metrics/feature computation and logging.

Enable software engineering best practices.

Increase consistency between training and serving data.

Enable the sharing of metrics/features across different teams.

Reduce costs.

Increase experimental/productionization velocity.

Build trust among end-users with consistent definitions.

Abstract away the complexity from multiple data pipelines.

Stores Disadvantages

Feature

Metrics

Can have a steep learning curve for data professionals without a coding background.

Advanced level of engineering required to set up.

Still early days which means a lot of testing and development is still in progress.

Complex feature creation in real-time is still a challenge.

It is easier to adopt for big companies.

Handling thousands of features is a challenge.

Handling large datasets can be challenging.

Similarities

In both, you can define things once and use them everywhere.

Update your definitions once and the changes happen globally, which means metrics/features get backfilled.

Both act as a centralized repository of knowledge to help create value from data.

Both work best with, and are optimized for, structured data.

Both have interfaces to similar tools like JupyterLab, RStudio, etc.

Differences

  • Metrics can be fed to a machine learning model, but features might be meaningless as metrics.
  • Feature stores are optimized for online usage while metrics stores work mostly (although not solely) with batch data.
  • While a metrics store sits as a layer between the data storage and any downstream tool, feature stores work for and around the machine learning lifecycle.
  • Metrics tend to have a long shelf-life while features can come and go as the products and services of the company mature.
  • Metrics Stores should support simple-to-complex metrics while Feature Stores should make the simple ones easy to implement to iterate fast and effectively.

Metrics vs Features Example

Churn, defined in the metrics context, is the rate at which customers stop doing business with us. In terms of features and in the machine learning context, churn is a 1 or a 0.

For Metrics, Churn Looks Like...

month  total  # churned  churn rate
12     7023   150        2.13%
11     7090   110        1.55%
10     6903   133        1.92%
9      7541   98         1.29%
8      7209   122        1.69%
7      7387   170        2.30%
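
The churn rate column is the number of churned customers divided by the total for the month. A minimal sketch of that calculation follows; the function name is made up, and truncating (rather than rounding) to two decimals is an assumption chosen because it reproduces the percentages shown in the table.

```python
from typing import List, Tuple

# (month, total customers, churned customers), copied from the table.
monthly: List[Tuple[int, int, int]] = [
    (12, 7023, 150),
    (11, 7090, 110),
    (10, 6903, 133),
    (9, 7541, 98),
    (8, 7209, 122),
    (7, 7387, 170),
]


def churn_rate_pct(churned: int, total: int) -> float:
    """Churn rate as a percentage, truncated to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Integer arithmetic avoids floating-point surprises when truncating.
    return (churned * 10000 // total) / 100


rates = [churn_rate_pct(churned, total) for _, total, churned in monthly]

for (month, _, _), rate in zip(monthly, rates):
    print(f"{month:>2}  {rate:.2f}%")
```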

In Terms of ML, Churn looks...

ID    Gender  Tenure  Total Charges  Churn
1234  1       3       4150           1
5678  0       30      10110          0
9101  1       42      7133           0
1213  1       10      598            0
1415  0       25      9122           1
1617  1       13      2170           0

Value Added

Metrics Stores

  • Single source of truth for any downstream tool.
  • Having the metric logic separated from downstream tools allows more people to focus on deriving insights from data.
  • Define metrics as code and make it so that changes to a definition have to follow software engineering best practices.
  • Enable ownership of a metric.

Feature Stores

  • Hand over most of the feature creation process and serving to a single system and focus on creating and productionising ML.
  • Monitor distribution shifts between training and incoming data.
  • Low latency between incoming data, its transformation, and prediction.

What are our options?

...for metrics stores

...for feature stores

What to look for in either solution?

Metrics Stores

  • Metrics Framework - Define and manage metrics once
  • Metrics Catalog - An interface with metric knowledge
  • Metric API - Meet consumers where they are at

Feature Stores

  • Single feature definition for production and training
  • Distribution consistency guarantee between serving and training
  • Seamless integration between online and batch serving

When should you adopt a metrics store?

  • If you want to empower more people at your organization to take advantage of your data.
  • If different teams are making decisions based on the same metrics but with different definitions.
  • If a lot of the analytics cycle is spent writing queries to get data.

When should you adopt a feature store?

  • When ML is not a "nice-to-have" at your organization.
  • When online serving is a crucial part of your business.
  • When the number of features your models rely on is large; or
  • When you need a way to monitor distribution shifts between historical and incoming data.
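
As a rough illustration of the last point, monitoring distribution shifts between historical and incoming data, here is a deliberately simple sketch. Comparing means in units of the training standard deviation is only one of many possible checks, and the function names, the threshold of 2.0, and the sample values are all assumptions made for this example.

```python
from statistics import mean, stdev
from typing import List


def mean_shift_in_std(train: List[float], incoming: List[float]) -> float:
    """Absolute difference in means, in units of the training std dev."""
    spread = stdev(train)
    if spread == 0:
        raise ValueError("training data has no variation")
    return abs(mean(incoming) - mean(train)) / spread


def has_drifted(train: List[float], incoming: List[float],
                threshold: float = 2.0) -> bool:
    """Flag the feature when the mean moved more than `threshold` std devs."""
    return mean_shift_in_std(train, incoming) > threshold


# Hypothetical historical values for one feature.
historical = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 10.0, 12.0]

similar_batch = [11.0, 12.0, 10.0, 13.0]
shifted_batch = [30.0, 32.0, 31.0, 29.0]

print(has_drifted(historical, similar_batch))
print(has_drifted(historical, shifted_batch))
```

A production system would typically compare full distributions rather than just the means, but the idea of checking each incoming batch against a training baseline is the same.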

To Recap...

Metrics allow us to track what matters for our business.

Features are the fuel of our machine learning models and they often need reshaping before we get to use them.

Metrics Stores provide us with a way to write, govern, and serve metrics in a common language.

Feature Stores abstract away the reshaping of features while providing scalability in both online serving and offline training.

If you need consistency, scalability, reusability, and a performance boost in your analytics/ML operations then adopt...

Thanks!

Metrics Stores and Feature Stores are like the maestro/conductor of an orchestra: while the musicians can still perform without one, you can only hope for good synchronization.

Metrics Stores vs Feature Stores

By ramongz
