Task


  1. Legacy static online learning site handcrafted over 10 years
  2. Built with Dreamweaver + Adobe Director
  3. 1000+ different individual "learning modules"

Solution


  1. Django + Scrapy
  2. Screen scrape the legacy site, populate Django models



+

Django + Scrapy


requirements.txt:
Scrapy==0.16.4
settings.py:
os.environ.setdefault('SCRAPY_SETTINGS_MODULE', 'ingest.settings')

INSTALLED_APPS = [
'ingest',
]

apps/ingest/settings.py:
BOT_NAME = 'ingest'
BOT_VERSION = '1.0'
SPIDER_MODULES = ['ingest.spiders']
NEWSPIDER_MODULE = 'ingest.spiders'
DEFAULT_ITEM_CLASS = 'ingest.items.IngestItem'
USER_AGENT = '%s/%s' % (BOT_NAME, BOT_VERSION)
ITEM_PIPELINES = [
'ingest.pipelines.DatabaseModelsPipeline',
]



INgest IT...


   apps/ingest/management/commands/ingest.py:

import os
from django.core.management.base import BaseCommand, CommandError
from scrapy.cmdline import execute

class Command(BaseCommand):
    help = 'Ingest the old project'
    def handle(self, *args, **options):
execute(['scrapy', 'runspider', os.path.join(settings.PROJECT_PATH, 'apps/ingest/spiders/ingest_spider.py')])


kCHEESETHXBYE


@eedeep

dan@commoncode.com.au

Scrape My Django

By eedeep

Scrape My Django

  • 1,500