Victoria Shoes
Major Product Launch
Project Summary
Complete Website Overhaul
UX High Fidelity Mocks Complete
May 2021 MVP Delivery
August 2021 Product Launch
~250k Daily Page Views Current
~25M Daily Page Views Expected
Ensure Proper Site Reliability
Dynamic Deployment on Launch Date
Upskill/Train Customer Engineers
Restricted A/B Focus Testing Environment
Application Monitoring and Notifications
On-Demand Deployment
Main Concerns
Lithuanian Separatist Group
Previous DDoS Attacks
Focus Group Testing
Multiple Versions Available
CMO to Approve Each Feature
Preventing Leaks
Site Reliability
No Dedicated Database Administrators
Engineer Training
Hosting Platform
Public Cloud or On-Premises Hosting
Datacenter Investment
Concern:
Lithuanian Group
Knowns
Crude DDoS attack indicates unsophisticated technical knowledge
VMs crashed at approximately 2500 requests/min
Large assets are requested separately from page requests
Assumptions
Attacks will continue with the new product launch
Attacks may become more sophisticated
Attacks will likely target commonly known exploits and attack vectors
Mitigation Plan
Store thumbnail/various scaled assets on CDN
Implement automatic IP banning based on:
Number of Authentication Requests
Number of Requests per Minute
Known Botnet Agents
Number of 404 Requests per Minute
Known bad routes
Implement pre-forking web services
Limits the impact of memory leaks, since workers can be recycled
Allows for parallel request processing
Blocked requests can be killed without affecting other workers
CAPTCHAs
Prevent brute force authentication attempts
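The requests-per-minute ban rule above can be sketched as a sliding-window counter. This is a minimal illustration, not a production design: the class name, the threshold, and the in-memory storage are all assumptions, and a real deployment would more likely sit in a reverse proxy or WAF. The same counter could be fed with 404 responses or authentication attempts instead of raw requests.

```python
from collections import defaultdict, deque


class SlidingWindowBanList:
    """Ban an IP once it exceeds max_requests within window_seconds.

    Hypothetical helper for illustration; thresholds are assumptions.
    """

    def __init__(self, max_requests: int, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self.banned = set()

    def allow(self, ip: str, now: float) -> bool:
        """Record one request at time `now`; return False if the IP is banned."""
        if ip in self.banned:
            return False
        hits = self._hits[ip]
        hits.append(now)
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) > self.max_requests:
            self.banned.add(ip)
            del self._hits[ip]
            return False
        return True
```

Passing the timestamp in explicitly keeps the sketch easy to test; a caller would normally supply `time.monotonic()`.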
Concern:
Focus Group Testing
Knowns
Focus group must be secured from internet traffic
Requires multiple versions for A/B testing and approval
CMO must approve before feature is accepted
Assumptions
The presentation layer will be the primary differentiation between focus group versions
Development velocity should be minimally impacted by focus group testing
Mitigation Plan
Scalable application architecture
Clean Architecture (Ports and Adapters Pattern)
Allows for multiple versions and interchangeable components
Separates business logic from implementation details
Allows developers to continue with new features while awaiting approval from CMO
Variants can be deployed utilizing proper configuration management tools
Automated deployment and environment configurations allow for rapid setup with minimal staff intervention
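A minimal sketch of how the port/adapter split could keep focus-group variants confined to the presentation layer. Every name here (`Product`, `ProductPresenter`, the two variant classes, the `PRESENTERS` mapping) is hypothetical and only illustrates the pattern; it is not taken from the customer's codebase.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Product:
    name: str
    price_cents: int


class ProductPresenter(Protocol):
    """Port: the business logic only knows this interface."""

    def render(self, product: Product) -> str: ...


class VariantAPresenter:
    """Adapter for a hypothetical focus-group variant A."""

    def render(self, product: Product) -> str:
        return f"{product.name} - ${product.price_cents / 100:.2f}"


class VariantBPresenter:
    """Adapter for a hypothetical focus-group variant B."""

    def render(self, product: Product) -> str:
        return f"${product.price_cents / 100:.2f} | {product.name.upper()}"


def show_product(product: Product, presenter: ProductPresenter) -> str:
    """Business logic: identical for every variant."""
    return presenter.render(product)


# Which adapter is active would come from deployment configuration.
PRESENTERS = {"A": VariantAPresenter(), "B": VariantBPresenter()}
```

Because `show_product` depends only on the port, a new variant can be added or dropped after CMO review without touching the business logic.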
Concern:
Site Reliability
Knowns
Site catalog and content stored in MySQL database
Relatively small (<500) catalog of products
Current database schema to be utilized going forward
Production support via customer engineers
Assumptions
The database schema is likely not optimized
On-premises data center cannot handle expected site traffic
Most of the site is fairly static in nature
Engineers are not currently equipped to support the site
Mitigation Plan
A deep review of database optimizations
Proper indices, storage, column types, and sizes
Hardware evaluation
Query profiling and caching
Self-documenting code
Documentation is never stale
Request/Response validation is automatic
Utilize in-memory caching services, e.g. Memcached
Minimizes disk reads
Improves response times
Application and system monitoring
User telemetry collection
Monitoring and reporting tools
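The in-memory caching item above could follow a cache-aside pattern, sketched below. `TTLCache` is a tiny in-process stand-in for a service such as Memcached, and `get_product`, `load_from_db`, and the key format are assumptions made for illustration. With a catalog of fewer than 500 mostly static products, nearly every read should be served from the cache.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-process stand-in for a caching service (illustration only)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, dict]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: dict) -> None:
        self._store[key] = (self._clock() + self.ttl_seconds, value)


def get_product(product_id: int, cache: TTLCache,
                load_from_db: Callable[[int], dict]) -> dict:
    """Cache-aside read: try the cache, fall back to the database on a miss."""
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    row = load_from_db(product_id)  # hypothetical database loader
    cache.set(key, row)
    return row
```

The injectable `clock` exists only so expiry can be exercised without sleeping.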
Concern:
Hosting Platform
Hosting Contention
A large capital expenditure has already gone into upgrading the data center.
It is assumed by some that a public cloud provider is the only cost-effective solution.
Launch date traffic increase is estimated to be 100x the current traffic.
On-prem and public cloud providers both have their strengths and challenges.
Downtime is not an option.*
Downtime is Unavoidable
Inter-Connect Peering and Backbone Providers
Disputes over billing and data transmissions
Aging equipment
Malicious DDoS Attacks
Internet Service Providers
Electrical Grid Outages
Downed Transmission Lines
Construction Mishaps
Trans-oceanic Cables
Sharks...
Ship anchors, and other equipment
How to Maximize Uptime?
Host on-premises, hoping for the best.
Host development and testing environments on-premises, deploy production on AWS.
Host on-premises and scale to AWS as needed.
Host on AWS and fail-over to on-premises as needed.
Host entirely on AWS, recouping some of the data center investment via the sale of equipment.
Public Cloud (AWS)
Pros
Scalability is only limited by operational cost
Many support options available
No single point of failure
Amazon retail integrations
Cons
The operational cost can quickly get out of hand
Hard to move away from AWS services once set up
Configuration and setup can be fairly confusing for the uninitiated
Self-Hosting
Pros
More granular control of the system
Easier to port to a public cloud provider later
Utilizes the updated data center
Lower operational cost
Cons
Limited support options
Scale is limited by hardware footprint, capital budget, and physical resources
Points of failure are not distributed
Plan: Benchmark Tests
Unknowns:
The necessary level of support.
The capabilities of the customer's data center.
The skillset of engineers.
It is nearly impossible to determine the most cost-effective solution without data.
Cost analysis must be performed to determine the scale and operational cost of each option.
Maintainability will be determined by engineers' skillset.
A hybrid solution is likely the most effective option.
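Before benchmark data exists, a rough back-of-the-envelope estimate can frame the cost analysis. The sketch below uses only figures from this deck (~250k and ~25M daily page views, VMs crashing at roughly 2500 requests/min). It assumes, without confirmation, that one page view equals one request, that the crash figure applies per VM, and that traffic is evenly spread unless a hypothetical `peak_factor` is supplied.

```python
import math

MINUTES_PER_DAY = 24 * 60


def average_per_minute(daily_page_views: int) -> float:
    """Average page views per minute, assuming an even spread over the day."""
    return daily_page_views / MINUTES_PER_DAY


def instances_needed(daily_page_views: int, per_instance_limit: float,
                     peak_factor: float = 1.0,
                     requests_per_view: float = 1.0) -> int:
    """Instances required so that none exceeds per_instance_limit per minute.

    peak_factor and requests_per_view are assumptions to be replaced
    with measured values from the benchmark tests.
    """
    peak_rate = (average_per_minute(daily_page_views)
                 * peak_factor * requests_per_view)
    return math.ceil(peak_rate / per_instance_limit)


current = average_per_minute(250_000)      # about 173.6 per minute
expected = average_per_minute(25_000_000)  # about 17,361.1 per minute
flat = instances_needed(25_000_000, 2500)                   # 7
peaky = instances_needed(25_000_000, 2500, peak_factor=3)   # 21
```

Even with perfectly flat traffic, the expected average is about seven times the observed crash rate, which is why the benchmark needs real peak and per-request numbers before any option is priced.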
Suggestion:
Development Strategy
Clean Architecture
Automated Integration
Code is branched for new features
Peer-Reviewed
Automated unit, integration, regression tests
Create feature flag and artifact
Artifacts deployed to focus group
User Testing/Feedback
CMO approves artifact
The artifact is tagged and integrated
Application configuration updated and tagged
Production-ready
Final application configuration is deployed
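The approval gates in this flow can be sketched as a simple check that a pipeline step might run before tagging an artifact. The `Artifact` fields and function names are hypothetical; in practice the CI/CD tool would supply these signals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Artifact:
    """Hypothetical record of the gates an artifact has cleared."""
    name: str
    peer_reviewed: bool = False
    tests_passed: bool = False
    focus_group_passed: bool = False
    cmo_approved: bool = False


def missing_gates(artifact: Artifact) -> List[str]:
    """Return the gates that still block promotion, in pipeline order."""
    gates = [
        ("peer review", artifact.peer_reviewed),
        ("automated tests", artifact.tests_passed),
        ("focus group", artifact.focus_group_passed),
        ("CMO approval", artifact.cmo_approved),
    ]
    return [label for label, passed in gates if not passed]


def is_production_ready(artifact: Artifact) -> bool:
    """An artifact is production-ready only when every gate has passed."""
    return not missing_gates(artifact)
```

Reporting the missing gates, rather than a bare yes/no, lets developers keep working on other features while seeing exactly what an artifact is waiting on.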
Tools/Services
Declarative CI/CD pipelines
Drone.io
Concourse CI
Artifact repositories
Sonatype Nexus
JFrog
Containerized configurations
Suggestion:
Hosting Strategy
AWS
ELB
Highly Available
Security Services
Autoscaling
EC2
Application Components
CloudFront
Global CDN
Datacenter
CI/CD Pipelines
Artifact Repository
Libraries/tools
Container images
Focus Group
VPN secured
Mock environment using containerized components
Hybrid Solution*
Suggestion:
Deployment Strategy
Deployment
Deploy the current content on the revamped site once it:
Passes focus group
Approved by CMO
All tests pass
This allows time for actual user feedback:
To fix bugs and user issues
Hardening of framework
New content should utilize the same framework
Flagged as unavailable/inactive
Activate new content using:
Internal company chat services
Twitter bot
Webhook
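A minimal sketch of the webhook option: new content ships flagged inactive, and a signed request flips the flag on launch day. The payload shape, the HMAC-SHA256 signing scheme, and the flag name are assumptions for illustration; a chat command or Twitter bot would call the same handler.

```python
import hashlib
import hmac
import json
from typing import Dict


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 signature of the request body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def handle_activation(flags: Dict[str, bool], secret: bytes,
                      body: bytes, signature: str) -> bool:
    """Activate the flag named in the payload if the signature is valid.

    Returns True only when a known flag was switched on.
    """
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(sign(secret, body), signature):
        return False
    payload = json.loads(body)
    flag = payload.get("flag")
    if flag not in flags:
        return False
    flags[flag] = True
    return True
```

Verifying the signature matters here because an unauthenticated activation endpoint would be an easy route to leaking unreleased content.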
Plan:
Production Support and Reliability Strategy
Proactive Monitoring
Leverage WWT partners to ingest metrics from the full stack
Network bandwidth and request metrics
Application usage, bottlenecks and stack trace
System resources
Receive alerts before a problem occurs
Utilize visualization tools, e.g. Grafana, to recognize problematic patterns
Use services like Chronograf to review and search logs related to customer issues
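One way to receive alerts before a problem occurs is to extrapolate a metric's trend and alert when it is projected to cross a limit soon. The sketch below fits a least-squares line to recent samples; the function names, the sample format of (minute, value) pairs, and the thresholds are all assumptions, and a monitoring platform would normally provide this kind of prediction itself.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (minute, metric value)


def minutes_until_threshold(samples: Sequence[Sample],
                            threshold: float) -> Optional[float]:
    """Estimate minutes until the metric reaches threshold.

    Returns 0.0 if already at or above it, or None when the trend is
    flat, falling, or cannot be fitted.
    """
    if not samples:
        return None
    last_value = samples[-1][1]
    if last_value >= threshold:
        return 0.0
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    denom = sum((t - mean_t) ** 2 for t, _ in samples)
    if denom == 0:
        return None
    # Least-squares slope: metric units gained per minute.
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / denom
    if slope <= 0:
        return None
    return (threshold - last_value) / slope


def should_alert(samples: Sequence[Sample], threshold: float,
                 lead_minutes: float) -> bool:
    """Alert when the threshold is projected to be hit within lead_minutes."""
    eta = minutes_until_threshold(samples, threshold)
    return eta is not None and eta <= lead_minutes
```

A linear fit is a deliberately simple choice; it suits steadily growing metrics such as disk or memory usage better than bursty request rates.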
Site Reliability
Leverage AWS auto-scaling features to handle sudden increases in site traffic
Distribute traffic across multiple regions, and properly utilize CDNs to optimize site availability and responsiveness
Utilize Infrastructure as Code to enable deployment across various cloud providers if necessary
Implement tooling to detect malicious activity and automatically ban offending IPs