The modeling pipeline is a tool that you can use to streamline your model building workflow.
Your time is valuable; the pipeline saves you time.
If we all use the same methods to construct our models, comparative evaluation becomes much easier.
We will all have the same language to talk about our models.
Input:
Output:
-- Aggregate the training rows per feature combination.
INSERT OVERWRITE TABLE model_epmi02_training_prod
SELECT
  -- summed enrolled per 1000 summed impressions
  CAST(numerator AS FLOAT) / CAST(denominator AS FLOAT) * CAST(1000 AS FLOAT) AS target_variable,
  -- summed impressions, in thousands
  CAST(denominator AS FLOAT) / CAST(1000 AS FLOAT) AS weight,
  course_epmv, course_rpmv, course_interest, subcat_interest, persona
FROM (
  SELECT
    course_epmv, course_rpmv, course_interest, subcat_interest, persona,
    SUM(enrolled) AS numerator,
    SUM(impressions) AS denominator
  FROM dm_dataset_epmi02_prod
  WHERE push_flag = 0 AND search_flag = 0 AND dataset = 'training'
  GROUP BY
    course_epmv, course_rpmv, course_interest, subcat_interest, persona
) x;
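To make the arithmetic in the query above easier to check by hand, here is a minimal Python sketch of the same aggregation. It is an illustration only, not part of the pipeline: the function name build_training_rows, the FEATURES tuple, and the sample rows (including every feature value in them) are hypothetical. It also assumes that a group with zero impressions should get an empty target, which is how Hive is expected to treat division by zero.

```python
from collections import defaultdict

# Grouping columns, copied from the query's GROUP BY clause.
FEATURES = ("course_epmv", "course_rpmv", "course_interest",
            "subcat_interest", "persona")


def build_training_rows(rows):
    """Hypothetical helper mirroring the query: filter, group, then derive
    target_variable and weight for each feature combination."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum(enrolled), sum(impressions)]
    for row in rows:
        # Same filter as the WHERE clause.
        if (row["push_flag"] != 0 or row["search_flag"] != 0
                or row["dataset"] != "training"):
            continue
        key = tuple(row[name] for name in FEATURES)
        totals[key][0] += row["enrolled"]
        totals[key][1] += row["impressions"]

    result = []
    for key, (numerator, denominator) in totals.items():
        record = dict(zip(FEATURES, key))
        # Assumption: an empty group yields None instead of raising.
        record["target_variable"] = (
            numerator / denominator * 1000.0 if denominator else None
        )
        record["weight"] = denominator / 1000.0
        result.append(record)
    return result


# Made-up sample data: two rows share a feature combination, one is filtered out.
base = {"course_epmv": 1, "course_rpmv": 2, "course_interest": 3,
        "subcat_interest": 4, "persona": "learner",
        "push_flag": 0, "search_flag": 0, "dataset": "training"}
example_rows = [
    {**base, "enrolled": 3, "impressions": 1000},
    {**base, "enrolled": 1, "impressions": 1000},
    {**base, "enrolled": 9, "impressions": 500, "push_flag": 1},  # excluded
]

training_rows = build_training_rows(example_rows)
for record in training_rows:
    print(record)
```

Note that weight multiplied by target_variable gives back the group's summed enrolled count. A weighted average of target_variable over any set of groups is therefore that set's overall enrolled count per 1000 impressions, so groups with few impressions do not dominate the fit.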
https://udemywiki.atlassian.net/wiki/display/ENG/Modeling+Pipeline