Sprint 42 Review

Spark/EMR

  • Gained a better understanding of processing Kinesis streams
  • Gained a better understanding of logging and log configuration within EMR
  • Created repeatable steps for dev to launch a cluster with Spark
  • Tested writing individual records to:
    • S3
    • Elasticsearch
    • HDFS
    • Local Filesystem

CIS Architecture

  • Created document for CIS Architecture
    • Shows changes to Dovetail System (domain events and pub/sub subsystem)
    • Shows changes to Connect Insights for Students
  • Reached a general consensus over several meetings
  • A "live query" approach is being considered instead of a CIS pipeline
  • Initial testing will begin soon to determine whether this is feasible
    • Requires sample data loaded into a production-scale ES cluster that is actively indexing data

MHE Metrics

https://github.mheducation.com/MHEducation/mhe-aws-metrics

  • A general-purpose library to support the recording of customer metrics
  • Support for custom metrics
  • Caches metrics for bulk writes (saves costs)
  • Added to the dvtl-input-api
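
The caching behaviour above could look roughly like the following. This is a minimal sketch and not the mhe-aws-metrics API: the class name BufferedMetrics, the sink callable, the batch_size default, and the metric field names are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Metric = Dict[str, object]


class BufferedMetrics:
    """Collects metrics in memory and hands them to a sink in batches.

    Hypothetical sketch: `sink` stands in for whatever bulk-write call
    the real library makes. It is not the mhe-aws-metrics API.
    """

    def __init__(self, sink: Callable[[List[Metric]], None], batch_size: int = 20) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._sink = sink
        self._batch_size = batch_size
        self._buffer: List[Metric] = []

    def record(self, name: str, value: float, unit: str = "Count") -> None:
        """Buffer one metric; flush automatically once the batch is full."""
        self._buffer.append({"name": name, "value": value, "unit": unit})
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered as a single bulk write (no-op if empty)."""
        if not self._buffer:
            return
        batch, self._buffer = self._buffer, []
        self._sink(batch)
```

With a batch size of 20, recording 100 metrics results in 5 sink calls instead of 100, which is where the cost saving would come from. A caller would still need to call flush() on shutdown so a partial batch is not lost.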

Caliper Utility Library

https://github.mheducation.com/MHEducation/mhe-caliper-utils

  • Created a small library to perform operations on one or many Caliper events simultaneously
    • getVersion(events) - Identifies Caliper events as 1.0beta or 1.0
    • getShortName(events) - Returns a short name used to refer to the event
    • upgradeVersion(events, version) - Updates a collection of events to the specified version
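
The "one or many" calling convention can be sketched as below. This is an illustrative Python analogue, not the library's code: the snake_case function names, the schemaVersion key, and the rule that a missing key means 1.0beta are assumptions, and a real upgrade would transform the event body rather than only stamp a version.

```python
from typing import Callable, Dict, List, Union

Event = Dict[str, object]
OneOrMany = Union[Event, List[Event]]

# ASSUMPTION: hypothetical marker key, used only so the sketch can run.
# The real library's detection rule is not described in these notes.
VERSION_KEY = "schemaVersion"


def _version_of(event: Event) -> str:
    """Placeholder rule: events without the marker are treated as 1.0beta."""
    return str(event.get(VERSION_KEY, "1.0beta"))


def _apply(events: OneOrMany, fn: Callable[[Event], object]) -> object:
    """Run fn on a single event, or on each event of a list."""
    if isinstance(events, list):
        return [fn(e) for e in events]
    return fn(events)


def get_version(events: OneOrMany) -> object:
    """Version string for one event, or a list of versions for many."""
    return _apply(events, _version_of)


def upgrade_version(events: OneOrMany, version: str) -> object:
    """Return events at the requested version without mutating the inputs."""
    def upgrade(event: Event) -> Event:
        if _version_of(event) == version:
            return event  # already current, passed through unchanged
        upgraded = dict(event)
        upgraded[VERSION_KEY] = version
        return upgraded

    return _apply(events, upgrade)
```

Routing every public function through one helper keeps the single-event and collection paths from drifting apart.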

Amazon Elasticsearch

  • Experienced problems in Production where memory constraints took down the Audit API
    • Will attempt to upgrade the instances for this cluster this week

Dovetail Pipeline

  • Created a Kinesis/Lambda version of the Spark/EMR pipeline we have been investigating
    • Detects older versions of events
    • Upgrades events to version 1.0
    • Indexes events to Elasticsearch
    • Adds an audit record
  • Coming soon
    • SNS publishing
    • Custom metrics
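
The four pipeline steps above could be wired together roughly as follows. This is a sketch under stated assumptions, not the Dovetail code: the function names, the schemaVersion key, the audit record fields, and the injected index and audit callables are hypothetical. The one grounded detail is the Lambda event shape for Kinesis, where each record carries its payload base64-encoded under Records[].kinesis.data.

```python
import base64
import json
from typing import Callable, Dict, List

Event = Dict[str, object]

TARGET_VERSION = "1.0"
VERSION_KEY = "schemaVersion"  # ASSUMPTION: hypothetical version marker


def decode_records(lambda_event: Dict[str, object]) -> List[Event]:
    """Decode the base64 JSON payload of each Kinesis record."""
    records = lambda_event.get("Records", [])
    return [json.loads(base64.b64decode(r["kinesis"]["data"])) for r in records]


def process(
    lambda_event: Dict[str, object],
    index: Callable[[Event], None],
    audit: Callable[[Dict[str, object]], None],
) -> int:
    """Detect old events, upgrade them, index them, and add an audit record.

    `index` and `audit` are injected stand-ins for the Elasticsearch and
    audit writers, so the flow can be exercised without AWS.
    """
    processed = 0
    for event in decode_records(lambda_event):
        needs_upgrade = event.get(VERSION_KEY) != TARGET_VERSION
        if needs_upgrade:
            # Placeholder upgrade: a real one would transform the event body.
            event = {**event, VERSION_KEY: TARGET_VERSION}
        index(event)
        audit({"action": "indexed", "upgraded": needs_upgrade})
        processed += 1
    return processed
```

In a deployed function, a thin handler(event, context) wrapper would call process with real writers. Keeping the writers injectable also leaves room for the SNS publishing and custom metrics steps listed as coming soon.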

Multiple Indices ES Cluster

  • Documented strategy
  • Socialized the strategy with Richard

Output/Query API

  • Pass-through implementation of ES query
  • Utilizes the multi-indices alias
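
A pass-through of this kind might be sketched as below. The function names, the injected send callable, and the alias name used in the usage note are assumptions. The only detail taken from Elasticsearch itself is that a search against an alias is a request to /<alias>/_search with the query as the JSON body.

```python
import json
from typing import Callable, Dict, Tuple


def build_search_request(alias: str, query_body: Dict[str, object]) -> Tuple[str, str, str]:
    """Return (method, path, body) for a search against an index alias.

    The caller's query is forwarded unchanged; only the target is chosen here.
    """
    if not alias or "/" in alias:
        raise ValueError("alias must be a non-empty name without '/'")
    return "POST", f"/{alias}/_search", json.dumps(query_body)


def query(
    alias: str,
    query_body: Dict[str, object],
    send: Callable[[str, str, str], Dict[str, object]],
) -> Dict[str, object]:
    """Forward the query through `send` and return the response as-is.

    `send` is a hypothetical stand-in for the HTTP client that talks to
    the cluster, injected so the sketch runs without a network.
    """
    method, path, body = build_search_request(alias, query_body)
    return send(method, path, body)
```

For example, build_search_request("events-all", {"size": 1}) targets /events-all/_search, where events-all is a made-up alias name. Because callers only ever see the alias, the indices behind it can be added or rotated without changing the API.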

By James Cook