Streamlining Site Migrations

By Daniel Nakhla

A quick intro into the dev behind migrating site content into mediaos

Agenda

  • Review of the migration tool/concept: Quack
    • Philosophy
    • Process 
    • Code
  • Discuss going forward

(Q)uack Philosophy

  • Distributed
    • Able to run on multiple machines, so we can control speed and scale to a site of any size
  • Source Agnostic
    • Able to import data from any site and source or mix of sources
      • DB Connection
      • Binary files
      • Site Scraping
  • Modular and Reusable
    • Written in a manner that allows us to reuse code from one migration to the next

Distributed

Uses AWS SQS to distribute the work across any number of machines / EC2 instances.

Cluster SSH is used to kick off the jobs across the instances.
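
A minimal sketch of the queue setup, assuming aws-sdk v2; the region and queue name are illustrative, not Quack's actual config:

// Sketch only: region and queue name are assumptions, not Quack's real config.
var AWS = require('aws-sdk');
var sqs = new AWS.SQS({ region: 'us-east-1' });

// One shared queue: every EC2 instance polls it, and SQS hands each message
// to only one consumer at a time. Adding instances is how we control speed.
sqs.createQueue({ QueueName: 'quack-migration' }, function(err, data) {
    if (err) { return console.error(err); }
    console.log('Queue ready at', data.QueueUrl);
});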

Source Agnostic

The data driver is modular, so we can pull data using page scraping, a DB connection, binary files, or a combination of sources, whichever gets the data fastest. For example, this driver scrapes an image and its metadata from the live Prima site:

function getImage(post_id, typeOfCall) {
    // Default to the post permalink; any second argument switches to the attachment permalink.
    if (typeof typeOfCall === 'undefined') {
        typeOfCall = "/?p=";
    } else {
        typeOfCall = "/?attachment_id=";
    }
    return new RSVP.Promise(function(resolve, reject) {
        curl.fromUrl(config.live_url + typeOfCall + post_id, function(err, res) {
            if (!err && typeOfCall === "/?attachment_id=" && res.images.length > 0) {
                // Swap the 300x300 thumbnail for the full-size image; on this
                // site's markup the feature thumbnail sits at index 6.
                res.image = res.images[6].replace('-300x300.jpg', '.jpg');
                res.link = config.live_url + typeOfCall + post_id;
                // Strip the site suffix and de-slugify the title.
                res.title = res.title.replace(' - Prima', '').replace(/-/g, ' ').trim();
                if (res.openGraphDescription === '') {
                    res.openGraphDescription = res.title;
                }
                resolve(res);
            } else {
                // Reject curl errors and non-attachment calls, so the promise settles exactly once.
                reject('Curl Failed');
            }
        });
    });
}
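
A hypothetical call for an attachment; note that any defined second argument selects the attachment URL:

// Hypothetical usage: chain the result into the next migration step.
getImage(1234, 'attachment')
    .then(function(res) {
        console.log(res.image, res.title);
    })
    .catch(console.error);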

Modular and Reusable

Quack is written in NodeJS and relies heavily on modules and promises. This allows us to reuse code between migrations and take advantage of the npm ecosystem.
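
A minimal sketch of what that modularity looks like in practice; the file name and interface here are hypothetical, not Quack's actual driver API:

// drivers/scrape.js -- hypothetical example of a swappable data driver.
var RSVP = require('rsvp');

module.exports = {
    // Every driver exposes the same promise-returning interface, so a
    // migration can swap scraping for a DB connection without touching
    // the rest of the pipeline.
    getPost: function(post_id) {
        return new RSVP.Promise(function(resolve, reject) {
            // ...fetch and normalize the post here...
            resolve({ post_id: post_id });
        });
    }
};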

High Level Overview of Process 


main.js

  1. Define the 'message' (an ID and a job_name)
  2. Create the corresponding job_name function that takes the message ID as a parameter
  3. Create the Q
  4. Populate the Q with all the messages
  5. Create the EC2 instances
  6. Upload Quack onto each instance
  7. Run consumer.js

Practically Speaking

  1. Define the URLs as the messages and the job_name as process_url.
  2. Define a process_url function that scrapes the given URL for the fields needed.
  3. Create the Q. For Prima I did this manually and put it in a config file.
  4. Populate the Q with all the URLs you want to import, using a sitemap or crawler (see the producer sketch below).
  5. Create the EC2 instances. For Prima I did this manually.
  6. Upload Quack onto each instance. For Prima I did this manually.
  7. Run consumer.js. For Prima I did this manually.
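
A minimal sketch of the producer side of main.js, assuming aws-sdk v2; the queue URL and URL list are illustrative, not Prima's real values:

// main.js sketch -- queue URL and URLs are assumptions for illustration.
var AWS = require('aws-sdk');
var sqs = new AWS.SQS({ region: 'us-east-1' });

var QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/quack-migration';

// In a real migration this list comes from a sitemap or crawler.
var urls = [
    'http://example.com/?p=100',
    'http://example.com/?p=101'
];

// One message per URL; consumer.js picks these up on any instance.
urls.forEach(function(url) {
    sqs.sendMessage({
        QueueUrl: QUEUE_URL,
        MessageBody: JSON.stringify({ job_name: 'process_url', url: url })
    }, function(err) {
        if (err) { console.error('Failed to queue ' + url, err); }
    });
});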

High Level Overview of Process 


consumer.js

  1. Get a message from the Q
  2. Run the job from the message, passing the ID as a parameter
  3. Delete the message only on success
  4. Get the next message

Practically Speaking

  1. Get the URL and job from the Q.
  2. Pass the URL to the process_url function.
  3. Delete the message from the Q only on success.
  4. Get the next message.

In the Prima migration, the consumer dispatches on the message type:

function processMessage(message) {
    var postMessage = JSON.parse(message.Body);
    // postMessage.type decides which job we need to run
    switch (postMessage.type) {
        case 'article':
            // postMessage.post_id is the message lookup ID for the job
            getArticle(posts[postMessage.post_id])
                .then(getFeatureImage)
                .then(insertImage)
                .then(insertEditor)
                .then(addRedirects)
                .then(insertArticle)
                .then(function(post) {
                    // delete the message from the Q, then get a new one
                    removeFromQueue(message);
                    main();
                })
                .catch(handleErr);
            break;
        case 'gallery':
            getGallery(posts[postMessage.post_id])
                .then(getFeatureImage)
                .then(insertImage)
                .then(insertEditor)
                .then(addRedirects)
                .then(getGalleryImages)
                .then(insertGallery)
                .then(function(post) {
                    removeFromQueue(message);
                    main();
                })
                .catch(handleErr);
            break;
    }
}
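
processMessage relies on two queue helpers, main and removeFromQueue; the names follow the code above, but the bodies below are a sketch assuming aws-sdk v2:

// Sketch only: function names come from the code above, bodies are assumptions.
function main() {
    sqs.receiveMessage({
        QueueUrl: QUEUE_URL,
        MaxNumberOfMessages: 1
    }, function(err, data) {
        if (!err && data.Messages && data.Messages.length > 0) {
            processMessage(data.Messages[0]);
        }
    });
}

function removeFromQueue(message) {
    // Deleting only after the promise chain resolves is what gives us
    // "only on success": a failed job's message becomes visible again
    // and is retried by another consumer.
    sqs.deleteMessage({
        QueueUrl: QUEUE_URL,
        ReceiptHandle: message.ReceiptHandle
    }, function(err) {
        if (err) { console.error(err); }
    });
}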

Beta Code

Messages are deleted from the Q only on success, so a failed job's message stays on the queue and is retried.

Future Plans

  • Write a RAMS DB connector and remove the current PHP dependency
  • Refactor the code so it's more modular
  • Automate the site scaffolding
  • Create and update RAMS models to handle features as necessary
    • Recipes
    • Collections
    • TV shows
  • Create a frontend UI
  • 2 to X weeks of work?

Now what?
