Movies and related products sales prediction

"My company wants to sell movie-related products. We need to choose the target movies ASAP!"

How should I choose the target movies?

Blu-ray

DVD

Product demand should be estimated from Blu-ray/DVD sales

Data Source

  • Box office sales data (web scraping)
    • Boxofficemojo.com
  • DVD/Blu-ray sales (SQL)
    • ~2000 titles from 2006 to 2015
    • Opusdata.com

Opening week gross?

[Figure: total Blu-ray/DVD units sold (10 million) vs. opening week gross (100 million $)]

R^2 = 0.44
BDDVD = \beta_0 + \beta_1x_1
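A single-predictor fit like the one above can be sketched in plain Python. This is a minimal illustration of least squares and R^2, not the deck's actual code; the function name `fit_simple_ols` and the sample numbers are made up for the example.

```python
def fit_simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x. Returns (b0, b1, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope is covariance(x, y) divided by variance(x).
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return b0, b1, 1.0 - ss_res / ss_tot


# Made-up illustrative numbers, NOT the box office data used in the deck.
opening_week_gross = [1.0, 2.0, 3.0, 4.0, 5.0]
bddvd_units = [2.0, 1.0, 4.0, 3.0, 5.0]
b0, b1, r2 = fit_simple_ols(opening_week_gross, bddvd_units)
print(f"beta_0={b0:.3f}, beta_1={b1:.3f}, R^2={r2:.3f}")
```

An R^2 of 0.44 means the predictor explains less than half of the variance in sales, which is why more variables are tried next.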

Not good enough.

How about first month gross?

BDDVD = \beta_0 + \beta_1x_1

Wait.

It's not over yet.

Popularity Holding Index (PHI)

PHI = (4th week gross per theater) / (average of first 2 weeks' gross per theater)
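The PHI definition can be sketched as a small function. This assumes the "average of first 2-week gross per theater" means the mean of the week 1 and week 2 per-theater grosses; the function name and the sample figures are hypothetical.

```python
def popularity_holding_index(weekly_gross, weekly_theaters):
    """PHI = week-4 gross per theater / mean of week-1 and week-2 gross per theater.

    Both arguments are lists ordered by week, starting at week 1.
    """
    if len(weekly_gross) < 4 or len(weekly_theaters) < 4:
        raise ValueError("need at least four weeks of data")
    per_theater = [g / t for g, t in zip(weekly_gross[:4], weekly_theaters[:4])]
    # ASSUMPTION: simple mean of the first two weekly per-theater values.
    early_average = (per_theater[0] + per_theater[1]) / 2
    return per_theater[3] / early_average


# Made-up example: a movie whose per-theater gross drops sharply by week 4.
gross = [100_000_000, 60_000_000, 30_000_000, 16_000_000]
theaters = [4000, 4000, 3500, 3200]
phi = popularity_holding_index(gross, theaters)
print(phi)
```

A PHI near 1 would mean the movie is holding its per-theater audience; a value well below 1 means it faded.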

[Figure: weekly gross ($) and # of theaters by week, marked at weeks 1.5 and 4 (100% vs. 20%)]

Most movies fade quickly

[Figure: weekly gross ($) and # of theaters by week]

R^2 = 0.64
BDDVD = \beta_0 + \beta_1x_1 + \beta_2x_2

[Figure: BD/DVD sales (10 million units) vs. PHI and 4-week total gross (100 million $)]

Variables

1. Four-week total gross ($)

2. Popularity holding index (PHI)

3. Movie genre

Model

Total Blu-ray/DVD sales = -254700 + 0.348 x (4wk gross) + 530500 x (PHI) + (genre)

genre example: Animation = 1420000
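As a worked example, the fitted formula can be wrapped in a function. The coefficients are the ones shown above. Only the Animation genre term appears in the deck, so this sketch treats every other genre as contributing 0, which is an assumption and not the fitted model; the function and constant names are hypothetical.

```python
INTERCEPT = -254700
COEF_4WK_GROSS = 0.348
COEF_PHI = 530500
# Only the Animation term is given in the deck; other genre terms are unknown.
GENRE_TERMS = {"Animation": 1420000}


def predict_bddvd_sales(gross_4wk, phi, genre=None):
    """Predicted total Blu-ray/DVD sales from 4-week gross ($), PHI and genre."""
    # ASSUMPTION: a genre without a known term contributes 0 here.
    genre_term = GENRE_TERMS.get(genre, 0)
    return INTERCEPT + COEF_4WK_GROSS * gross_4wk + COEF_PHI * phi + genre_term


# Made-up inputs: $10 million 4-week gross, PHI of 0.2.
animated = predict_bddvd_sales(10_000_000, 0.2, "Animation")
other = predict_bddvd_sales(10_000_000, 0.2)
print(animated - other)  # difference is the Animation genre term
```

With the same gross and PHI, the Animation term alone shifts the prediction by 1420000.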

 

R^2 = 0.65
BDDVD = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3
import statsmodels.formula.api as smf
model = smf.ols('bddvdsale ~ total_4w_gross + w4_2w_phi + genres', df).fit()

"Bluray_DVD = beta0 + beta1*total_4w_gross + beta2*w4_2w_phi + beta3*genres"

PHI and 4week_gross are independent

Check, check
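One simple way to run such a check is the Pearson correlation between the two predictors. The deck does not say which check was used, so this is only a sketch with a hypothetical helper; note that a correlation near 0 rules out a linear relationship but does not by itself prove independence.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Made-up values, NOT the deck's data: 4-week gross ($) and PHI per movie.
gross_4wk = [50e6, 120e6, 80e6, 200e6, 30e6]
phi = [0.30, 0.22, 0.18, 0.25, 0.21]
print(pearson_r(gross_4wk, phi))
```

If the two predictors were strongly correlated, their coefficients in the regression would be hard to interpret separately.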

The Music and History genres do not have enough samples

Final model: predicted vs. true values for 400 movie titles from a separate test data set

My suggestions

Bet on these movies now.

DVD Prediction (2015 Box Office)

  • 6108065
  • 6008853
  • 5937283
  • 3706274

* Furious 7 and Home are not included because they have not yet passed week 4.

Root mean squared error (training data) = 1669312
Root mean squared error (testing data) = 1779348
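The two error figures above are close (the test error is about 7% higher), which suggests the model is not badly overfit. For reference, RMSE can be computed as in this minimal sketch; the helper name and sample values are illustrative only.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up sales figures and predictions, NOT the deck's data.
true_sales = [2_000_000, 5_000_000, 1_000_000]
predicted_sales = [2_500_000, 4_000_000, 1_500_000]
print(rmse(true_sales, predicted_sales))
```

RMSE is in the same units as the target, so it can be read directly against the predicted sales numbers.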

Popularity holding index (PHI) for rating

Thank you!

Movie lasting ever after

By tohei


How can we predict Blu-ray/DVD sales?
