Victoria Shoes
Major Product Launch
Project Summary
- Complete Website Overhaul
- UX High Fidelity Mocks Complete
- May 2021 MVP Delivery
- August 2021 Product Launch
- ~250k Daily Page Views Current
- ~25M Daily Page Views Expected
- Ensure Proper Site Reliability
- Dynamic Deployment on Launch Date
- Upskill/Train Customer Engineers
- Restricted A/B Focus Testing Environment
- Application Monitoring and Notifications
- On-Demand Deployment
Main Concerns
- Lithuanian Separatist Group
- Previous DDoS Attacks
- Focus Group Testing
- Multiple Versions Available
- CMO to Approve Each Feature
- Preventing Leaks
- Site Reliability
- No Dedicated Database Administrators
- Engineer Training
- Hosting Platform
- Public Cloud or On-Premises Hosting
- Datacenter Investment
Concern:
Lithuanian Group
Knowns
- Crude DDoS attack suggests limited technical sophistication
- VMs crashed at approximately 2500 requests/min
- Requests for large assets separate from page requests
Assumptions
- Attacks will continue with new product launch
- Attacks may become more sophisticated
- Attacks will likely target commonly known exploits and attack vectors
Mitigation Plan
- Store thumbnail/various scaled assets on CDN
- Implement auto IP banning solutions for
- Number of Authentication Requests
- Number of Requests per Minute
- Known Botnet Agents
- Number of 404 Requests per Minute
- Known bad routes
- Implement pre-forking web services
- Limits the impact of memory leaks, since worker processes can be recycled
- Allows for parallel request processing
- A blocked worker can be killed without affecting other requests
- CAPTCHAs
- Prevent brute force authentication attempts
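The "Number of Requests per Minute" ban rule could be sketched as a sliding-window counter per IP. This is a minimal illustration, not a production solution: the class name `RequestRateBanList`, the threshold of 120 requests per minute, and the in-memory storage are all assumptions, and a real deployment would more likely enforce this at the load balancer or firewall.

```python
from collections import defaultdict, deque


class RequestRateBanList:
    """Ban an IP once it exceeds max_requests within a sliding time window.

    Hypothetical helper for illustration; thresholds are assumed values.
    """

    def __init__(self, max_requests: int = 120, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self.banned = set()

    def allow(self, ip: str, now: float) -> bool:
        """Record one request at time `now` (seconds); False means rejected."""
        if ip in self.banned:
            return False
        hits = self._hits[ip]
        hits.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) > self.max_requests:
            self.banned.add(ip)
            del self._hits[ip]  # no need to keep counting a banned IP
            return False
        return True
```

The same structure applies to the other rules in the list (authentication attempts, 404 responses) by keeping one counter per rule.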
Concern:
Focus Group Testing
Knowns
- Focus group must be secured from internet traffic
- Requires multiple versions for A/B testing and approval
- CMO must approve before feature is accepted
Assumptions
- The presentation layer will be the primary differentiation between focus group versions
- Development velocity should be minimally impacted by focus group testing
Mitigation Plan
- Scalable application architecture
- Clean Architecture (Ports and Adapters pattern)
- Allows for multiple versions and interchangeable components
- Separates business logic from implementation details
- Allows developers to continue with new features while awaiting approval from CMO
- Variants can be deployed utilizing proper configuration management tools
- Automated deployment and environment configurations allow for rapid setup with minimal staff intervention
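A minimal sketch of the ports-and-adapters idea, assuming the presentation layer is what differs between focus group versions. All names here (`Product`, `ProductPresenter`, `VariantAPresenter`, `VariantBPresenter`, `build_presenter`) are hypothetical and only illustrate how a variant could be selected by configuration without touching business logic.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Product:
    name: str
    price_cents: int


class ProductPresenter(Protocol):
    """Port: the only thing the business logic knows about presentation."""

    def render(self, product: Product) -> str: ...


class VariantAPresenter:
    """Adapter for focus group version A."""

    def render(self, product: Product) -> str:
        return f"{product.name} - ${product.price_cents / 100:.2f}"


class VariantBPresenter:
    """Adapter for focus group version B."""

    def render(self, product: Product) -> str:
        return f"NEW! {product.name.upper()} (${product.price_cents / 100:.2f})"


# The variant key would come from environment configuration.
PRESENTERS = {"a": VariantAPresenter, "b": VariantBPresenter}


def build_presenter(variant: str) -> ProductPresenter:
    return PRESENTERS[variant]()


def product_page(product: Product, presenter: ProductPresenter) -> str:
    """Business logic depends on the port, never on a concrete variant."""
    return presenter.render(product)
```

Adding a third variant means adding one adapter and one configuration entry; `product_page` stays unchanged while approval is pending.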
Concern:
Site Reliability
Knowns
- Site catalog and content stored in MySQL database
- Relatively small (<500) catalog of products
- Current database schema to be utilized going forward
- Production support via customer engineers
Assumptions
- The database schema is not likely optimized
- On-premises data center cannot handle expected site traffic
- Most of the site is fairly static in nature
- Engineers are not currently equipped to support site
Mitigation Plan
- A deep review of database optimizations
- Proper indices, storage, column types, and sizes
- Hardware evaluation
- Query profiling and caching
- Self-documenting code
- Documentation is never stale
- Request/Response validation is automatic
- Utilize in-memory caching services, e.g. Memcached
- Minimizes disk reads
- Improves response times
- Application and system monitoring
- User telemetry collection
- Monitoring and reporting tools
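The caching item above could follow a cache-aside pattern: check the cache first and read the database only on a miss. The sketch below uses an in-process `TTLCache` class as a stand-in for a service such as Memcached; `TTLCache`, `get_product`, and the `load_from_db` callback are hypothetical names, and the five-minute expiry in the usage note is an assumed value.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for a cache service such as Memcached."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, dict]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired entries count as misses
            return None
        return value

    def set(self, key: str, value: dict) -> None:
        self._store[key] = (self._clock() + self.ttl_seconds, value)


def get_product(product_id: int, cache: TTLCache,
                load_from_db: Callable[[int], dict]) -> dict:
    """Cache-aside read: the database is only queried on a cache miss."""
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    product = load_from_db(product_id)
    cache.set(key, product)
    return product
```

With a catalog of fewer than 500 mostly static products, a call such as `get_product(42, TTLCache(ttl_seconds=300), load_from_db)` would let most reads skip MySQL entirely between expiries.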
Concern:
Hosting Platform
Hosting Contention
- A large capital expenditure has already been made to upgrade the data center.
- It is assumed by some that a public cloud provider is the only cost-effective solution.
- Launch date traffic increase is estimated to be 100x the current traffic.
- On-prem and public cloud providers both have their strengths and challenges.
- Downtime is not an option.*
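A back-of-the-envelope check of the 100x estimate, using only figures stated in this deck. It assumes traffic is spread evenly across the day and treats one page view as one request, both of which understate real load: launch traffic is bursty and each page view triggers several requests.

```python
MINUTES_PER_DAY = 24 * 60

current_daily_views = 250_000       # from the project summary
expected_daily_views = 25_000_000   # from the project summary
vm_crash_requests_per_min = 2_500   # rate at which VMs crashed in the earlier attack

growth_factor = expected_daily_views / current_daily_views
current_avg_per_min = current_daily_views / MINUTES_PER_DAY
expected_avg_per_min = expected_daily_views / MINUTES_PER_DAY

# How far the expected *average* load sits above the observed crash rate.
ratio_to_crash_rate = expected_avg_per_min / vm_crash_requests_per_min

print(f"growth: {growth_factor:.0f}x")
print(f"average page views/min: {current_avg_per_min:.1f} now, "
      f"{expected_avg_per_min:.1f} expected")
print(f"expected average is {ratio_to_crash_rate:.2f}x the observed crash rate")
```

Even under these optimistic assumptions, the expected average of roughly 17,361 page views per minute is about 6.9 times the rate that crashed the VMs, before accounting for peaks.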
Downtime is Unavoidable
- Inter-Connect Peering and Backbone Providers
- Disputes over billing and data transmissions
- Aging equipment
- Malicious DDoS Attacks
- Internet Service Providers
- Electrical Grid Outages
- Downed Transmission Lines
- Construction Mishaps
- Trans-oceanic Cables
- Sharks...
- Ship anchors, and other equipment
How to Maximize Uptime?
- Host on-premises, hoping for the best.
- Host development and testing environments on-premises, deploy production on AWS.
- Host on-premises and scale to AWS as needed.
- Host on AWS and fail over to on-premises as needed.
- Host entirely on AWS, recouping some of the data center investment via the sale of equipment.
Public Cloud (AWS)
Pros
- Scalability is only limited by operational cost
- Many support options available
- No single point of failure
- Amazon retail integrations
Cons
- The operational cost can quickly get out of hand
- Hard to move away from AWS services once set up
- Configuration and setup can be confusing for the uninitiated
Self-Hosting
Pros
- More granular control of the system
- Easier to port to a public cloud provider later
- Utilizes the updated data center
- Lower operational cost
Cons
- Limited support options
- The scale is limited by hardware footprint, capital budget, and physical resources
- Points of failure are not distributed
Plan: Benchmark Tests
- Unknowns:
- The necessary level of support.
- The capabilities of the customer's data center.
- The skillset of engineers.
- It is nearly impossible to determine the most cost-effective solution without data.
- Cost analysis must be performed to determine the scale and operational cost of each option.
- Maintainability will be determined by engineers' skillset.
- A hybrid solution is likely the most effective option.
Suggestion:
Development Strategy
Clean Architecture
Automated Integration
- Code is branched for new features
- Peer-Reviewed
- Automated unit, integration, regression tests
- Create feature flag and artifact
- Artifacts deployed to focus group
- User Testing/Feedback
- CMO approves artifact
- The artifact is tagged and integrated
- Application configuration updated and tagged
- Production-ready
- Final application configuration is deployed
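The approval gates in this flow could be modeled as data, so the application configuration only enables a feature flag once every gate has passed. This is a sketch under assumptions: `FeatureArtifact`, `missing_gates`, and `feature_flags` are hypothetical names, and the gate list is a simplified reading of the steps above.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class FeatureArtifact:
    """One feature branch's artifact and the gates it has cleared."""

    name: str
    peer_reviewed: bool = False
    tests_passed: bool = False
    focus_group_tested: bool = False
    cmo_approved: bool = False

    def missing_gates(self) -> List[str]:
        """Return the gates this artifact has not yet cleared, in pipeline order."""
        gates = {
            "peer review": self.peer_reviewed,
            "automated tests": self.tests_passed,
            "focus group testing": self.focus_group_tested,
            "CMO approval": self.cmo_approved,
        }
        return [gate for gate, passed in gates.items() if not passed]


def feature_flags(artifacts: Iterable[FeatureArtifact]) -> Dict[str, bool]:
    """Build the feature-flag section of the application configuration."""
    return {a.name: not a.missing_gates() for a in artifacts}
```

Because unapproved features are simply flagged off, developers can keep integrating new work while an artifact waits on the CMO.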
Tools/Services
- Declarative CI/CD pipelines
- Drone.io
- Concourse CI
- Artifact repositories
- Sonatype Nexus
- JFrog
- Containerized configurations
Suggestion:
Hosting Strategy
AWS
- ELB
- Highly Available
- Security Services
- Autoscaling
- EC2
- Application Components
- CloudFront
- Global CDN
Datacenter
- CI/CD Pipelines
- Artifact Repository
- Libraries/tools
- Container images
- Focus Group
- VPN secured
- Mock environment using containerized components
Hybrid Solution*
Suggestion:
Deployment Strategy
Deployment
- Deploy the current content on the revamped site once it:
- Passes focus group
- Approved by CMO
- All tests pass
- This allows time for actual user feedback:
- To fix bugs and user issues
- Hardening of framework
- New content should use the same framework
- Flagged as unavailable/inactive
- Activate new content utilizing
- Internal company chat services
- Twitter bot
- Webhook
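Whichever trigger is used, the activation request should be authenticated so that inactive content cannot be switched on by an outsider. A minimal sketch of a signed webhook handler follows; the payload shape (`content_id`), the function names `sign` and `handle_activation`, and the shared-secret HMAC scheme are all assumptions for illustration, not a description of any specific chat or Twitter integration.

```python
import hashlib
import hmac
import json


def sign(body: bytes, secret: bytes) -> str:
    """Hex HMAC-SHA256 signature the sender attaches to the request."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def handle_activation(body: bytes, signature: str, secret: bytes,
                      flags: dict) -> bool:
    """Activate the content named in a signed webhook body.

    Returns True only if the signature is valid and the content is known.
    """
    # Constant-time comparison avoids leaking signature prefixes.
    if not hmac.compare_digest(sign(body, secret), signature):
        return False  # reject unsigned or tampered requests
    content_id = json.loads(body)["content_id"]
    if content_id not in flags:
        return False  # unknown content is never created implicitly
    flags[content_id] = True
    return True
```

Here `flags` stands in for wherever the inactive/active state of pre-deployed content is stored.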
Plan:
Production Support and Reliability Strategy
Proactive Monitoring
- Leverage WWT partners to ingest metrics from the full stack
- Network bandwidth and request metrics
- Application usage, bottlenecks and stack trace
- System resources
- Receive alerts before a problem occurs
- Utilize visualization tools, e.g. Grafana, to recognize problematic patterns
- Services like Chronograf to review and search logs related to customer issues
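"Receive alerts before a problem occurs" could be approximated by warning when a rolling average approaches a known limit rather than when it is exceeded. The sketch below is illustrative only: `EarlyWarning`, the 80% warning fraction, and the five-sample window are assumed choices, and in practice this logic would live in the monitoring stack rather than in application code.

```python
from collections import deque
from statistics import mean


class EarlyWarning:
    """Warn when the rolling mean of a metric nears a known limit."""

    def __init__(self, limit: float, warn_fraction: float = 0.8,
                 window: int = 5):
        self.warn_at = limit * warn_fraction
        self._samples = deque(maxlen=window)  # keeps only the newest samples

    def observe(self, value: float) -> bool:
        """Add one sample; True means the warning level has been reached."""
        self._samples.append(value)
        return mean(self._samples) >= self.warn_at
```

For example, `EarlyWarning(limit=2500)` would start warning at a sustained 2000 requests/min, ahead of the rate at which the VMs previously crashed.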
Site Reliability
- Leverage AWS auto-scaling features to handle sudden increases in site traffic
- Distribute traffic across multiple regions, and properly utilize CDNs to optimize site availability and responsiveness
- Utilize Infrastructure as Code to enable deployment across various cloud providers if necessary
- Implement tooling to detect malicious activity and automatic IP banning
Victoria Shoes
By Teagan Glenn