A recommendation engine for WordPress in 100 lines

by Piero Savastano
feat. Pollo Watzlawick 

€€€

GOAL: Increase user engagement

  • deadline:        yesterday
  • infrastructure:  20 €/y hosting
  • budget:          1500 visibility points

Mr Watzlawick's reasonable request

Good old collaborative filtering

Reality check

  • not all users are logged in
  • CMS not built for batch processing
  • PHP good for apps but not for data science

Let's get creative

  • content-based recommendation
    (no need for user data)

  • update only a few recommendations at a time (online training)

  • leverage existing WP actions & filters

Similarity graph

{
    "A": {
        "B": 0.1,
        "C": 0.8
    },
    "B": {
        "A": 0.1,
        "C": 0.5
    },
    "C": {
        "A": 0.8,
        "B": 0.5
    }
}

Similarity graph

wp_postmeta table
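Each node of the graph becomes one wp_postmeta entry: the post's own row holds the list of its neighbors and their similarities. A plain-PHP sketch of that split, assuming the letters A, B, C stand in for real (integer) post IDs and that the meta key is `dropout_recommendations`; `graph_to_postmeta_rows()` is a hypothetical helper. WordPress serializes array meta values before storing them, which `serialize()` imitates here.

```php
<?php
// Hypothetical in-memory copy of the similarity graph (letters stand in for post IDs)
$graph = array(
    'A' => array( 'B' => 0.1, 'C' => 0.8 ),
    'B' => array( 'A' => 0.1, 'C' => 0.5 ),
    'C' => array( 'A' => 0.8, 'B' => 0.5 ),
);

/**
 * Split the graph into one row per post, shaped like wp_postmeta
 * (post_id, meta_key, meta_value). Array values are stored serialized.
 */
function graph_to_postmeta_rows( array $graph, $meta_key = 'dropout_recommendations' ) {
    $rows = array();
    foreach ( $graph as $post_id => $neighbors ) {
        $rows[] = array(
            'post_id'    => $post_id,
            'meta_key'   => $meta_key,
            'meta_value' => serialize( $neighbors ),
        );
    }
    return $rows;
}

foreach ( graph_to_postmeta_rows( $graph ) as $row ) {
    echo $row['post_id'], ' | ', $row['meta_key'], ' | ', $row['meta_value'], PHP_EOL;
}
```

Storing one row per post means that reading a post's recommendations is a single meta lookup, with no need to load the whole graph.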

Similarity between posts

Mr Watzlawick is a charming chicken delivering brilliant communication

Paul Watzlawick proposed a brilliant theory of human communication

union        = { mr, watzlawick, charming, chicken, delivering, brilliant, communication, paul, proposed, theory, human }

intersection = { watzlawick, brilliant, communication }

similarity = count(intersection) / count(union) = 3 / 11 = .27
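The worked example can be checked in a few lines of plain PHP. This is a minimal sketch: the stopword list (`is`, `a`, `of`) is an assumption, just large enough to reproduce the slide, and `to_word_set()` / `jaccard()` are hypothetical helper names.

```php
<?php
// Assumed stopword list: only the words needed to reproduce the example
$stopwords = array( 'is', 'a', 'of' );

// Lowercase, split on spaces, keep each word once, drop stopwords
function to_word_set( $text, array $stopwords ) {
    $words = explode( ' ', strtolower( $text ) );
    return array_values( array_diff( array_unique( $words ), $stopwords ) );
}

// Jaccard coefficient: size of the intersection over size of the union
function jaccard( array $a, array $b ) {
    $intersection = array_intersect( $a, $b );
    $union        = array_unique( array_merge( $a, $b ) );
    return count( $union ) === 0 ? 0.0 : count( $intersection ) / count( $union );
}

$a = to_word_set( 'Mr Watzlawick is a charming chicken delivering brilliant communication', $stopwords );
$b = to_word_set( 'Paul Watzlawick proposed a brilliant theory of human communication', $stopwords );

printf(
    "%d / %d = %.2f\n",
    count( array_intersect( $a, $b ) ),
    count( array_unique( array_merge( $a, $b ) ) ),
    jaccard( $a, $b )
);  // prints "3 / 11 = 0.27"
```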

Similarity between posts

    /**
     * Measure text similarity between two posts
     * using the Jaccard (a.k.a. Tanimoto) coefficient.
     */
    public static function similarity( $post_id_a, $post_id_b ) {

        // Transform each post into a bag of words
        $bow_a = self::post_2_bag_of_words( $post_id_a );
        $bow_b = self::post_2_bag_of_words( $post_id_b );

        if ( empty( $bow_a ) || empty( $bow_b ) ) {
            return 0;
        }

        // Jaccard: size of the intersection over size of the union
        $intersection = array_unique( array_intersect( $bow_a, $bow_b ) );
        $union        = array_unique( array_merge( $bow_a, $bow_b ) );

        return count( $intersection ) / count( $union );
    }
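Jaccard, Tanimoto and Dice are often named together, but only the first two coincide on sets. Dice counts the overlap twice, 2·|A∩B| / (|A| + |B|), so it is always at least as large as Jaccard; the two are related by D = 2J / (1 + J). A small comparison sketch on the word sets from the example (inputs are assumed to contain each word once; the function names are hypothetical):

```php
<?php
// Jaccard (= Tanimoto on sets): |A ∩ B| / |A ∪ B|
function jaccard_coefficient( array $a, array $b ) {
    $union = array_unique( array_merge( $a, $b ) );
    return count( $union ) ? count( array_intersect( $a, $b ) ) / count( $union ) : 0.0;
}

// Dice: 2 |A ∩ B| / (|A| + |B|)
function dice_coefficient( array $a, array $b ) {
    $total = count( $a ) + count( $b );
    return $total ? 2 * count( array_intersect( $a, $b ) ) / $total : 0.0;
}

$a = array( 'mr', 'watzlawick', 'charming', 'chicken', 'delivering', 'brilliant', 'communication' );
$b = array( 'paul', 'watzlawick', 'proposed', 'brilliant', 'theory', 'human', 'communication' );

printf( "Jaccard %.2f, Dice %.2f\n", jaccard_coefficient( $a, $b ), dice_coefficient( $a, $b ) );
// prints "Jaccard 0.27, Dice 0.43"
```

Both rank pairs of posts in the same order (D is an increasing function of J), so for picking the top recommendations the choice does not matter; only the displayed scores differ.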

Similarity between posts

/**
 * Transform post content into a bag of words
 * (actually a set: each word is kept once)
 */
public static function post_2_bag_of_words( $post_id ) {

    $post = get_post( $post_id );

    if ( is_null( $post ) ) {
        return array();
    }

    $post_content  = $post->post_content;
    $clean_content = strtolower( wp_strip_all_tags( $post_content ) );

    $clean_content = str_replace(
        array( ".", ",", ":", ";", "!", "?", "'", '"', "(", ")" ), "", $clean_content
    );   // No punctuation

    // Separate words on any whitespace (spaces, tabs, newlines), skipping empty tokens
    $tokens = array_unique( preg_split( '/\s+/', $clean_content, -1, PREG_SPLIT_NO_EMPTY ) );

    return array_diff( $tokens, self::$stopwords );   // No stopwords (list defined in the class)
}
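post_2_bag_of_words() needs a running WordPress. To try the tokenizer on plain strings, here is a rough stand-alone sketch; assumptions: `strip_tags()` replaces `wp_strip_all_tags()` (it does not drop the contents of script/style tags), the stopword list is a tiny hypothetical one, and `text_2_bag_of_words()` is a made-up name. Splitting on any whitespace matters because post content keeps its line breaks, and splitting on single spaces would glue the words around a newline into one token.

```php
<?php
// Hypothetical WordPress-free version of the tokenizer
function text_2_bag_of_words( $html, array $stopwords ) {
    $text = strtolower( strip_tags( $html ) );
    // Drop the same punctuation characters as the plugin
    $text = str_replace( array( '.', ',', ':', ';', '!', '?', "'", '"', '(', ')' ), '', $text );
    // Split on any run of whitespace and skip empty tokens
    $tokens = preg_split( '/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY );
    // Keep each word once, remove stopwords, renumber keys 0..n-1
    return array_values( array_diff( array_unique( $tokens ), $stopwords ) );
}

$bag = text_2_bag_of_words(
    "<p>Mr Watzlawick is a charming chicken,\ndelivering brilliant communication!</p>",
    array( 'is', 'a' )
);
print_r( $bag );
```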

Update the similarity graph

/**
 * Build recommendations at post save
 */
function dropout_update_recommendations( $post_id ) {
	
	// kick asses right here
}
add_action( 'save_post', 'dropout_update_recommendations' );

Update the similarity graph

    
   
    // Only handle posts and pages (e.g. skip revisions)
    if ( ! in_array( get_post_type( $post_id ), array( 'post', 'page' ) ) ) {
        return;
    }

    // Take the latest 50 published posts and pages
    $latest_posts = wp_get_recent_posts(
        array(
            'numberposts' => 50,
            'post_type'   => array( 'post', 'page' ),
            'post_status' => 'publish'
        )
    );

Update the similarity graph


$similarities = array();
	
foreach( $latest_posts as $another_post ){
	
  if( $another_post['ID'] !== $post_id ) {
		
    // Measure similarity
    $similarity = Dropout::similarity( $post_id, $another_post['ID'] );
    $similarities[ $another_post['ID'] ] = $similarity;
    
    // ... more asses to kick here
  }
}
	
// Save recommendations for this post
update_post_meta( $post_id, 'dropout_recommendations', $similarities );

Update the similarity graph


$similarities = array();
	
foreach( $latest_posts as $another_post ){
	
  if( $another_post['ID'] !== $post_id ) {
		
    // Measure similarity
    $similarity = Dropout::similarity( $post_id, $another_post['ID'] );
    $similarities[ $another_post['ID'] ] = $similarity;
    
    // Update similarities for the other post
    $already_computed_similarities =
        get_post_meta( $another_post['ID'], 'dropout_recommendations', true );
    
    $already_computed_similarities[$post_id] = $similarity;
    
    update_post_meta( $another_post['ID'], 'dropout_recommendations', $already_computed_similarities );
  }
}
	
// Save recommendations for this post
update_post_meta( $post_id, 'dropout_recommendations', $similarities );
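This is the "online training" idea: saving one post only recomputes the edges that touch it, in both directions. An in-memory sketch of the same update, assuming a plain array `$meta` stands in for wp_postmeta and a callable stands in for `Dropout::similarity()`; `update_graph_for_post()` and the toy similarity are hypothetical. It also covers the case where the other post has no stored recommendations yet.

```php
<?php
// $meta: post ID => array( neighbor ID => similarity ), a stand-in for wp_postmeta
function update_graph_for_post( array &$meta, $post_id, array $other_ids, callable $similarity_fn ) {
    $similarities = array();
    foreach ( $other_ids as $other_id ) {
        if ( $other_id === $post_id ) {
            continue;   // no self-loops
        }
        $similarity = $similarity_fn( $post_id, $other_id );
        $similarities[ $other_id ] = $similarity;

        // A post without stored recommendations starts from an empty array
        if ( ! isset( $meta[ $other_id ] ) || ! is_array( $meta[ $other_id ] ) ) {
            $meta[ $other_id ] = array();
        }
        $meta[ $other_id ][ $post_id ] = $similarity;   // reverse edge
    }
    $meta[ $post_id ] = $similarities;                   // forward edges
}

$meta = array();
// Hypothetical similarity: closer IDs are "more similar"
$toy_similarity = function ( $a, $b ) { return 1 / ( 1 + abs( $a - $b ) ); };
update_graph_for_post( $meta, 3, array( 1, 2, 3 ), $toy_similarity );
print_r( $meta );
```

Each save costs at most one similarity computation per recent post, which is what keeps the approach viable on cheap hosting without batch jobs.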

Query the similarity graph

/**
 * Get recommendations when a post is shown
 */
function dropout_get_recommendations( $content ) {
	
        // kick asses here like there is no tomorrow
}
add_filter( 'the_content', 'dropout_get_recommendations' );

Query the similarity graph


// Get similarities with other posts
$recommendations = get_post_meta( get_the_ID(), 'dropout_recommendations', true );
if( empty( $recommendations ) ) {
    return $content;
}
	
// Sort by similarity
arsort( $recommendations ); 

// Take n most similar                       
$recommendations = array_slice($recommendations, 0, 5, true);
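This query step is a k-nearest-neighbors lookup on one row of the graph. A stand-alone sketch with hypothetical data and a hypothetical `top_k_neighbors()` helper: `arsort()` orders by value, highest first, keeping the post IDs as keys, and the fourth argument of `array_slice()` must be `true`, otherwise integer keys (the post IDs) are renumbered 0..k-1.

```php
<?php
// Keep the k most similar neighbors, preserving post IDs as keys
function top_k_neighbors( array $similarities, $k = 5 ) {
    arsort( $similarities );                              // by value, descending, keys kept
    return array_slice( $similarities, 0, $k, true );     // true = preserve integer keys
}

// Hypothetical row of the graph: post ID => similarity
$row = array( 12 => 0.10, 7 => 0.42, 31 => 0.27, 5 => 0.05, 18 => 0.33, 40 => 0.19 );
print_r( top_k_neighbors( $row, 3 ) );   // post IDs 7, 18, 31
```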

Query the similarity graph


	
// Loop over recommendations and print link, title, similarity.
// TODO: this should be templatable
$html = '<h3>See also</h3><ul>';
foreach ( $recommendations as $recom_id => $similarity ) {
    $recom_post = get_post( $recom_id );
    // Skip posts deleted or unpublished after the graph was built
    if ( $recom_post && 'publish' === $recom_post->post_status ) {
        $html .= '<li><a href="' . esc_url( get_permalink( $recom_post->ID ) ) . '">'
               . esc_html( $recom_post->post_title )
               . '</a> (' . round( $similarity, 5 ) . ')</li>';
    }
}
$html .= '</ul>';

// Return content with added recommendations
return $content . $html;
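Post titles are user content, so they should be escaped before being concatenated into HTML. A WordPress-free sketch of the rendering step, assuming a hypothetical `$posts` array (ID => title and URL) in place of `get_post()` / `get_permalink()` and plain `htmlspecialchars()` in place of WordPress's escaping helpers; `render_recommendations()` is a made-up name.

```php
<?php
// $recommendations: post ID => similarity; $posts: post ID => array( 'title', 'url' )
function render_recommendations( array $recommendations, array $posts ) {
    $html = '<h3>See also</h3><ul>';
    foreach ( $recommendations as $id => $similarity ) {
        if ( ! isset( $posts[ $id ] ) ) {
            continue;   // skip posts that no longer exist
        }
        $html .= '<li><a href="' . htmlspecialchars( $posts[ $id ]['url'], ENT_QUOTES ) . '">'
               . htmlspecialchars( $posts[ $id ]['title'], ENT_QUOTES )
               . '</a> (' . round( $similarity, 5 ) . ')</li>';
    }
    return $html . '</ul>';
}

$posts = array( 7 => array( 'title' => 'Fish & Chips <3', 'url' => '/?p=7' ) );
echo render_recommendations( array( 7 => 0.42, 99 => 0.3 ), $posts ), PHP_EOL;
```

Without escaping, the `<3` in the title would be parsed as the start of a tag and break the list markup.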

Code tour

Conclusions

  • only comparing against the latest 50 posts
  • not measuring word importance (TF-IDF)
  • it's more than 100 lines
  • if you want to dig deeper: k-nearest neighbors
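To make "word importance" concrete: with TF-IDF each word is weighted by how often it appears in a post and how rare it is across posts, and posts are compared with cosine similarity instead of Jaccard. A hypothetical sketch on already-tokenized documents, using one common variant, idf = log(N / df) (others add smoothing); `tfidf_vectors()` and `cosine_similarity()` are made-up names and the three documents are toy data.

```php
<?php
// $docs: key => list of tokens. Returns key => array( word => tf * idf )
function tfidf_vectors( array $docs ) {
    $n  = count( $docs );
    $df = array();                                        // document frequency per word
    foreach ( $docs as $tokens ) {
        foreach ( array_unique( $tokens ) as $word ) {
            $df[ $word ] = ( $df[ $word ] ?? 0 ) + 1;
        }
    }
    $vectors = array();
    foreach ( $docs as $key => $tokens ) {
        $vec = array();
        foreach ( array_count_values( $tokens ) as $word => $tf ) {
            $vec[ $word ] = $tf * log( $n / $df[ $word ] );   // words in every doc weigh 0
        }
        $vectors[ $key ] = $vec;
    }
    return $vectors;
}

// Cosine of the angle between two sparse vectors
function cosine_similarity( array $u, array $v ) {
    $dot = 0.0;
    foreach ( $u as $word => $weight ) {
        $dot += $weight * ( $v[ $word ] ?? 0.0 );
    }
    $square = function ( $x ) { return $x * $x; };
    $norm_u = sqrt( array_sum( array_map( $square, $u ) ) );
    $norm_v = sqrt( array_sum( array_map( $square, $v ) ) );
    return ( $norm_u > 0 && $norm_v > 0 ) ? $dot / ( $norm_u * $norm_v ) : 0.0;
}

$docs = array(
    'a' => array( 'watzlawick', 'chicken', 'communication' ),
    'b' => array( 'watzlawick', 'theory', 'communication' ),
    'c' => array( 'watzlawick', 'chicken', 'recipe' ),
);
$vectors = tfidf_vectors( $docs );
printf( "a~b %.3f, a~c %.3f, b~c %.3f\n",
    cosine_similarity( $vectors['a'], $vectors['b'] ),
    cosine_similarity( $vectors['a'], $vectors['c'] ),
    cosine_similarity( $vectors['b'], $vectors['c'] ) );
```

Here "watzlawick" appears in every document, so it gets weight 0 and no longer makes b and c look related.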

What is ML

"Field of study that gives computers the ability to learn without being explicitly programmed"

A. Samuel

Artificial Intelligence

Bio-inspired AI

Machine Learning

Deep Learning

Explicit Programming

f is coded by the programmer:

step-by-step instructions to map x to y

Machine Learning

the programmer gives the machine examples of x and y:

the machine figures out f on its own

y = f(x)
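The difference between the two slides fits in a few lines. A toy sketch, assuming the simplest possible learner: f is assumed to be a straight line y = w·x + b, and w and b are estimated from (x, y) examples by ordinary least squares. `f_explicit()` and `fit_line()` are hypothetical names, and the example requires at least two distinct x values.

```php
<?php
// Explicit programming: the programmer writes f
function f_explicit( $x ) {
    return 2 * $x + 1;
}

// Machine learning (toy): estimate w and b of y = w*x + b from examples
function fit_line( array $xs, array $ys ) {
    $n      = count( $xs );
    $mean_x = array_sum( $xs ) / $n;
    $mean_y = array_sum( $ys ) / $n;
    $cov = 0.0;
    $var = 0.0;
    foreach ( $xs as $i => $x ) {
        $cov += ( $x - $mean_x ) * ( $ys[ $i ] - $mean_y );
        $var += ( $x - $mean_x ) ** 2;
    }
    $w = $cov / $var;               // slope
    $b = $mean_y - $w * $mean_x;    // intercept
    return function ( $x ) use ( $w, $b ) { return $w * $x + $b; };
}

$xs = array( 0, 1, 2, 3, 4 );
$ys = array_map( 'f_explicit', $xs );   // examples of (x, y)
$f_learned = fit_line( $xs, $ys );
echo $f_learned( 10 ), PHP_EOL;         // prints 21, same as f_explicit(10)
```

The learned function never sees the code of `f_explicit()`, only its input/output pairs.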

Data flow

The Pollo must go on