Task
- Legacy static online learning site handcrafted over 10 years
- Built with Dreamweaver + Adobe Director
- 1000+ different individual "learning modules"

Solution
- Django + Scrapy
- Screen scrape the legacy site, populate Django models
+ 
Django + Scrapy
requirements.txt:Scrapy==0.16.4settings.py:
os.environ.setdefault('SCRAPY_SETTINGS_MODULE', 'ingest.settings')apps/ingest/settings.py:
INSTALLED_APPS = [
'ingest',
]BOT_NAME = 'ingest'
BOT_VERSION = '1.0'
SPIDER_MODULES = ['ingest.spiders']
NEWSPIDER_MODULE = 'ingest.spiders'
DEFAULT_ITEM_CLASS = 'ingest.items.IngestItem'
USER_AGENT = '%s/%s' % (BOT_NAME, BOT_VERSION)
ITEM_PIPELINES = [
'ingest.pipelines.DatabaseModelsPipeline',
]
INgest IT...
apps/ingest/management/commands/ingest.py:
import os
from django.core.management.base import BaseCommand, CommandError
from scrapy.cmdline import execute
class Command(BaseCommand):
help = 'Ingest the old project'
def handle(self, *args, **options):
execute(['scrapy', 'runspider', os.path.join(settings.PROJECT_PATH, 'apps/ingest/spiders/ingest_spider.py')])
kCHEESETHXBYE
@eedeep
dan@commoncode.com.au
Scrape My Django
By eedeep
Scrape My Django
- 1,500