MASTER IN INFORMATICS AND COMPUTING ENGINEERING
Supervisor: Vera Lucia Miguéis Oliveira e Silva
Second supervisor: Ivo Pereira
Maria João dos Santos Aguiar e Mira Paulo
1
- Easy Recovery of Investments.
- Supports an easy way of measuring campaign success.
- The most cost-effective way of reach customers.
EMAIL MARKETING
2
OPEN RATES
EMAIL OVERLOAD
MARKETING STRATEGIES FAIL
9999...
3
SENDER NAME & SUBJECT-LINE
Using Data Mining techniques, develop a model capable of classifying a certain subject-line, regarding its quality, from 1 to 5 stars.
The only factors recipients will see at first glance when opening the mail box are:
4
- Studies using Data Mining techniques are limited to a set of techniques;
- Studies using secondary data do not evaluate the impact of using personalized messages, neither the impact of the country and business sector to which the campaign is sent;
- Secondary studies are not applied to specific business problems.
5
- There exists several websites seeking to help customers on choosing the best subject-line, but contain merely descriptive guidelines;
- MailChimp, Market leader in marketing automation, offers an insufficient tool;
- Solutions in the market continue to be scarce and become easily outdated.
6
1.
2.
3.
To predict the subject quality taking into account not only structural but also content features, which could impact the overall subject quality.
To compare performance results regarding distinct data mining techniques.
To increase E-goi customers engagement with the E-goi platform, through an innovative and helpful service capable of analyzing a subject-line quality.
7
DATA RESTRICTIONS
- Comprised of 140. 000 email campaigns;
- Sent to at least 100 subscribers;
- Sent over one week before data collection;
- PT, EN or ES languages.
VARIABLES
- Subject-line,
- Open Rate,
- Country,
- Sector
Most relevant languages
8
Equal width binning technique.
9
Open Rate Distribution
SUBJECT LINE
STRUCTURAL FEATURES
CONTENT FEATURES
PAST PERFORMANCE
BAG OF WORDS
10
- Number of words
- Number of characters
- Upper Case Percentage
- Punctuation
- Prefixes
- Emojis
- Personalization
- Special characters
- Numbers
- Currency
FEATURES
11
1st Approach: Past Performance
- Number of lemmas
- Lemmas Past Performance
- Average
- Weighted Average
- Maximum
FEATURES
12
2nd Approach: Bag of Words
- List of TF-IDF values (relavence)
TERM FREQUENCY–INVERSE DOCUMENT FREQUENCY.
FEATURES
13
Feature: Number of words
14
Feature: Currency
15
Feature: Lemmas Past Performance
16
17
1.
Sector + Country + Structural Features
2.
2.1
2.2
2.3
Sector + Country + Structural Features + Past Performance Features
Weighted Average
Average
Maximum
3.
Sector + Country + Structural Features + Bag of Words Features
- Naive Bayes
- Support Vector Machine
- Random Forest
- Decision Tree
- Gradient Boosting
- Neural Network
18
19
What is the most accurate model we can get regarding the business challenge?
Percentage of the correctly labeled subjects to the total of subjects.
Seek a balance between Precision and Recall, avoiding false negatives and false positives, for each of the five classes.
ACCURACY
F1 SCORE
20
Perfect Recall
Perfect Precision
21
| Exp. 1 | Random Forest | 60.4% | 60.2% |
|---|---|---|---|
| Exp. 2.1 | Random Forest | 61.7% | 62.1% |
| Exp. 2.2 | Random Forest | 62.2% | 62.4% |
| Exp. 2.3 | Random Forest | 61.5% | 61.8% |
| Exp. 3 | Random Forest | 60.3% | 60.6% |
EXPERIMENT
ALGORITHM
ACCURACY
F1 SCORE
PREDICTED
ACTUAL
Confusion matrix : Experiment 2.2
22
23
1.
2.
3.
New agnostic tool for supporting customers on creating engaging and relevant subject-lines, contributing to more emails opened and, thus, more successful marketing campaigns.
First tool embedded into a marketing automation platform, and prepared for adjusting itself to the natural evolution of trends and patterns over time.
25
Effective prediction of the subject quality: 62.4% in terms of Accuracy and 62.2% in terms of F1 Score.
4.
Research paper submitted to International Journal of Production Economics.
26
- Time and day of the week the email campaign was delivered;
- Sender recognition and reputation;
- Customer database quality;
- Subjectiveness when talking about an interesting email campaign.
Maria João Mira Paulo