ASOS Technical Assessment

George Kierstein, Diana Kantor
Richard Kauffold, Ryan Berkheimer

NCEI/SESB 2016

Outline

  • Technical Assessment
  • Options and Recommendations
  • ASOS Overview
  • Further Information

ASOS Overview 

  • Workflow
  • System Diagram
  • Core Functionality

Core Functionality

Ingests, quality-controls (QC), and formats ASOS/AWOS instrument data, incorporating other data sets.

  • Low-Resolution
  • High-Resolution
  • Types of QC
  • ASOS/AWOS

Core Functionality

ASOS/AWOS

ASOS: Automated Surface Observing System

  • At ~960 of the larger airports and some other locations
  • Collects elements that include temperature, precipitation, pressure, winds, visibility, present weather and others
  • Data transmitted hourly via METAR messages and also provided in Summary of Day and Month messages
  • Additional data at higher resolution (1-min/5-min) also collected

Core Functionality

ASOS/AWOS

AWOS: Automated Weather Observing System

  • At ~1000 of the smaller airports
  • Typically provide only METAR observations (hourly or 20-minute frequency)
  • Some observe fewer elements than the ASOS stations

Core Functionality

Low-Resolution Data Ingest

  • Transmitted over the GTS [backup through Modem dial-in] and ingested by Data Operations Branch for about 2000 stations
  • METAR (observations reported hourly, on 20-minute timescales, or whenever weather conditions require a Special report)
  • DSM/MSM (Daily and Monthly summary messages)
  • CF6 (Preliminary listing of Summary of Day/Month statistics for the month via NWS WFO AWIPS; may include values corrected by NWS personnel)
  • MAPSO (Observations taken at 6 Pacific Island stations)
  • CRN and Solar Data added before publication

Core Functionality

High-Resolution Data Ingest

  • Collected via modem dial-in to ~960 stations
  • 1-minute/5-minute data
  • Additional source for fill-in of otherwise missing data as described on previous slides
  • System Log information for NWS Diagnostics
  • Provided via ftp and in DSI-640x series

NOTE: This modem dial-in data source is referred to as ‘ASOS Instrument Network’ on subsequent slides.

Core Functionality

Types of QC

Interactive QC on a subset of ~480 ASOS stations
LCD Publications, CD Publications
Flags set in Level 1 are reviewed by Met Tech

Level 1

Automated checks; sets flags for all but CRN and SURFRAD networks

Level 2

ISD Format checking; run in DS and again by DAB. All stations receive this level of QC at a minimum.

Level 3

Workflow

3 Stages

Initial Ingest and automated QC

Computer-Aided Manual QC

Final Formatting, publishing and Archival

System Diagram

Stage 1 - Ingest & QC

Stage 2 - Manual QC

Stage 3 - Publishing

Workflow

Stages over time

System Diagram

Technical Assessment Findings

  • Workflow
  • Production/Maintenance
  • "Low-Level"
  • General Observations
  • Code Metrics

Findings: Low-Level

Codebase lacks consistency across the system.

Missing effective or comprehensive error handling.

Leverages approaches and language features that are no longer preferred for system development.

Un-integrated polyglot system.
(FORTRAN, Java, Bash, Cron, etc.)

Findings: Workflow 

No comprehensive logging.

No error resilience, despite high reliability requirements.

Boundaries between workflow and computational tasks inconsistent.
(i.e. little internal uniformity)

Primarily manual, introducing a high risk of unrecoverable human error.

Findings: Production/Maintenance

Inflexible deployment approach.

Issues require human intervention from a skilled operator familiar with the system architecture, etc.

Single point of failure (Humbolt)

No means to monitor status in real time without understanding the tasks being run and where to look for intermediate results.

No resilience to intermittent ASOS device availability. 

Findings: General Observations

Nicely de-composable at the task and unit level.

Many tasks do not involve precision math or high computational cost.

Significant amounts of dead code in current codebase.

Output formats well-defined and slow-changing.

Findings: General Observations

ASOS Instrument Network

Changes very slowly.

Data format extremely stable.

Dial-in tech unchanged for almost a decade.

Upcoming improvements include direct networking access.

MADIS already provides almost all data we obtain via dial-up.

Findings: Code Metrics

Phase 1 (Ingest)

Each functional unit (e.g. CF6 ingest) is broken down separately, followed by the total for the ingest phase:

     Sources

              METAR:             2037
              DSM/MSM:           2146
              CF6:                878
              ASOS Network:      2488
              MAPSO:             3130

     Total LOC:                10,679
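
As a quick consistency check on the Phase 1 figures, the per-source counts can be summed and expressed as shares of the total. This is only a sketch in Python (chosen for brevity, not because the assessed system uses it); the function names are illustrative and the numbers are copied from the slide.

```python
# Lines of code per Phase 1 ingest source, copied from the slide.
PHASE1_LOC = {
    "METAR": 2037,
    "DSM/MSM": 2146,
    "CF6": 878,
    "ASOS Network": 2488,
    "MAPSO": 3130,
}


def total_loc(loc_by_source):
    """Sum the per-source line counts."""
    return sum(loc_by_source.values())


def share_by_source(loc_by_source):
    """Return each source's share of the total as a percentage (one decimal)."""
    total = total_loc(loc_by_source)
    return {name: round(100.0 * loc / total, 1) for name, loc in loc_by_source.items()}
```

The sum reproduces the stated total of 10,679, and MAPSO, which serves only 6 stations, accounts for roughly 29% of the ingest code.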

Findings: Code Metrics

NOTE: Phase 2 (Manual QC with a GUI) is not commensurable with the LOC metric.

The codebase totals approximately 75k lines of executable code. Most of this is generated. Approximately 50% is business logic.

Findings: Code Metrics

Phase 3 (Publishing)

This section is monolithic, consisting of a number of complex BASH scripts that are hard-coded into one piece of application logic.

Contents

    Scripts:            12605
    Ish_drvr.f:           370
    Blddfty.f:            267
    Mk3505name.f:          53
    Fixpacisd.f:          146

Total LOC:             13,441
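
The Phase 3 figures above can be checked the same way, and the split between shell scripts and Fortran quantifies the "monolithic" observation. A minimal Python sketch with illustrative names, using only the numbers from the slide:

```python
# Lines of code per Phase 3 component, copied from the slide.
PHASE3_LOC = {
    "Scripts": 12605,
    "Ish_drvr.f": 370,
    "Blddfty.f": 267,
    "Mk3505name.f": 53,
    "Fixpacisd.f": 146,
}


def script_share(loc_by_component):
    """Percentage of Phase 3 code that lives in the shell scripts (one decimal)."""
    total = sum(loc_by_component.values())
    return round(100.0 * loc_by_component["Scripts"] / total, 1)


def fortran_loc(loc_by_component):
    """Total lines in the Fortran sources (components whose name ends in .f)."""
    return sum(loc for name, loc in loc_by_component.items() if name.endswith(".f"))
```

Under these numbers the scripts make up about 94% of the publishing phase, with 836 lines of Fortran.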

Recommendations

  • Global Concerns
  • Language Specific
  • Architecture
  • ASOS Instrument Ingest
  • Summary

Architecture

  • Modules
  • Units
  • Terminology
  • Workflow

Terminology

Workflow Stages

Units

Modules

Modules

A module comprises a single computational task that communicates its results via a plugin API.

  • Java/FORTRAN interop (JNA/JNI)
  • Logging
  • Error Handling
  • Event messaging

Error Handling

Logging

Events

Module API
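
To make the module idea concrete, here is one possible shape for such a wrapper, sketched in Python for brevity. Everything in it (the `Event` record, the `Module` class, the event kinds, the logger name) is hypothetical and not taken from the ASOS codebase; the assessed system would more likely express this in Java with FORTRAN interop.

```python
import logging
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical event record passed from a module to its owner."""
    module: str
    kind: str          # "started", "completed", or "error"
    detail: str = ""


class Module:
    """Wraps one computational task behind a uniform (hypothetical) plugin API."""

    def __init__(self, name, task, emit):
        self.name = name
        self.task = task      # callable doing the actual computation
        self.emit = emit      # callable receiving Event objects
        self.log = logging.getLogger(f"asos.module.{name}")

    def run(self, payload):
        """Run the task; failures are logged and reported, never propagated."""
        self.emit(Event(self.name, "started"))
        try:
            result = self.task(payload)
        except Exception as exc:
            self.log.exception("task failed")
            self.emit(Event(self.name, "error", str(exc)))
            return None
        self.log.info("task completed")
        self.emit(Event(self.name, "completed"))
        return result
```

The point of the design is that the caller sees the same three event kinds regardless of what the task does, so logging and error handling are added once in the wrapper rather than in each task.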

Units

A unit is a conceptually grouped set of tasks composed of modules and their workflow.

  • Manages execution and monitoring of each module that is part of its internal workflow.
  • Responds to errors or events reported by modules that alter the internal workflow.

[Diagram labels: Modules (x2), Unit API]

Ex: Unit Completed Action

Ex: Recoverable Module error
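
The "recoverable module error" case might look like the following sketch, in which a unit runs its modules in order and re-runs one that reports a recoverable failure. The `RecoverableError` type, the `run_unit` function and the retry limit are all assumptions made for illustration.

```python
class RecoverableError(Exception):
    """Hypothetical marker for failures a unit is allowed to retry."""


def run_unit(modules, payload, max_retries=2):
    """Run (name, task) pairs in order, feeding each result to the next task.

    Returns (status, value, history), where status is "completed" or "failed".
    """
    history = []
    value = payload
    for name, task in modules:
        retries = 0
        while True:
            try:
                value = task(value)
                history.append((name, "completed"))
                break
            except RecoverableError:
                if retries == max_retries:
                    history.append((name, "error"))
                    return "failed", None, history
                retries += 1
                history.append((name, "retry"))
            except Exception:
                # Anything else is treated as unrecoverable for this unit.
                history.append((name, "error"))
                return "failed", None, history
    return "completed", value, history
```

The returned history is what a unit could forward to the workflow as its "unit completed" or failure report.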

Workflow Pipeline

Encapsulates and executes the business logic of a set of functional units.

  • Responds to errors or events reported by units that alter the overall workflow.
  • Provides status updates, reporting and opportunity for immediate error recovery.
  • Handles external events from users who play a role in the workflow.
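
One way to picture the pipeline is as a small state machine over the three workflow stages, advancing on unit events and waiting for a user event at the manual QC stage. The sketch below is an assumption-laden illustration: the stage keys, event names and `Pipeline` class are invented here, not part of any existing system.

```python
STAGES = ["ingest_and_auto_qc", "manual_qc", "publish_and_archive"]

# Event that completes each stage (event names are hypothetical).
ADVANCE_ON = {
    "ingest_and_auto_qc": "unit_completed",
    "manual_qc": "user_approved",
    "publish_and_archive": "unit_completed",
}


class Pipeline:
    """Tracks the current stage and keeps a human-readable status report."""

    def __init__(self):
        self.index = 0
        self.state = "running"    # "running", "halted", or "done"
        self.report = []

    @property
    def stage(self):
        return None if self.state == "done" else STAGES[self.index]

    def handle(self, event):
        """Apply one event from a unit or a user to the workflow."""
        if self.state != "running":
            self.report.append(f"ignored {event} while {self.state}")
            return
        stage = self.stage
        if event == "unit_error":
            self.state = "halted"
            self.report.append(f"{stage}: halted on error")
        elif event == ADVANCE_ON[stage]:
            self.report.append(f"{stage}: completed")
            if self.index == len(STAGES) - 1:
                self.state = "done"
            else:
                self.index += 1
        else:
            self.report.append(f"{stage}: ignored {event}")

    def resume(self):
        """Operator action: continue from the stage that halted."""
        if self.state == "halted":
            self.state = "running"
            self.report.append(f"{self.stage}: resumed")
```

Because the report records every transition, an operator can see where the workflow stands without knowing where intermediate files live.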

[Diagram labels: Unit (x3), Enforces Business Logic, Handles Errors, Ingest System, Archive, Downstream Processing, User Activity, Reporting]

Global Concerns

Remove dead code.

Move hard-coded paths and deployment/runtime configuration options into a system-global set of properties that can be set during deployment or at runtime.
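
As an illustration of what a system-global set of properties could look like, the sketch below reads settings from the environment and falls back to defaults. The property names, default paths and email address are placeholders invented for this example, not values from the ASOS system.

```python
import os
from dataclasses import dataclass

# Placeholder defaults; real values would be supplied at deployment time.
DEFAULTS = {
    "ASOS_INGEST_DIR": "/data/asos/ingest",
    "ASOS_ARCHIVE_DIR": "/data/asos/archive",
    "ASOS_ALERT_EMAIL": "ops@example.org",
}


@dataclass(frozen=True)
class Settings:
    ingest_dir: str
    archive_dir: str
    alert_email: str


def load_settings(env=None):
    """Build Settings from a mapping (os.environ by default), using DEFAULTS for gaps."""
    env = os.environ if env is None else env

    def get(key):
        return env.get(key, DEFAULTS[key])

    return Settings(
        ingest_dir=get("ASOS_INGEST_DIR"),
        archive_dir=get("ASOS_ARCHIVE_DIR"),
        alert_email=get("ASOS_ALERT_EMAIL"),
    )
```

Code then asks `load_settings()` for a path instead of embedding it, so moving to a new server means changing the environment rather than editing source files.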


Build a comprehensive and well-structured code repository.

Add automated build and deployment tools.

Language-Specific

See Assessment for detailed breakdown of tasks in each language.

Language-level concerns differ in scope and complexity for each language. 

Sample of Java Details

  • Delete commented-out or otherwise dead code.
  • Eliminate hard-coded paths, emails, etc., which are pervasive
    throughout most of the files.
  • Separate monolithic methods into separate functions for API integration.
  • Reorganize files according to related functionality.
  • Reorganize project architecture.

Sample of FORTRAN Details

  • Reformat for consistency and readability.

  • Delete commented-out code.

  • Add or enhance top-level documentation for each file.

  • Eliminate hard-coded paths, emails, etc., which are pervasive throughout most of the files.

  • Replace ‘include’ directives with ‘use’ statements pointing to well-organized modules.

  • Merge files containing only very small subroutines or variable declarations into larger files to eliminate the need for 80 separate Fortran files.

  • Reorganize files according to related functionality, rather than keeping all Fortran files together in a single directory.

  • Minimal GOTO removal, only where obvious and easy.

ASOS Instrument Ingest 

Using MADIS to upgrade the ingest of 'High-Res' ASOS Instrument data.

Module

Unit API

  • Changed from executable to library loaded by module.
  • Add logging at decision-points and critical sections
  • Add status reporting for progress and events (such as errors).
  • Crashes contained and recoverable.

(ASOS Instrument Network)

[Diagram labels: Unit (x5)]

Functional Module API

  • Each computational task encapsulated in a Module.
  • Business logic expressed declaratively as configuration files.
  • Functional library listens to events from Modules and responds accordingly.
  • Can re-run failed modules, etc.
  • Aggregates logging from each module, sends events to workflow.
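
"Business logic expressed declaratively" could mean something like the following: the reaction to each module event lives in a configuration document, and the code only looks it up. The JSON layout, rule fields, module names and action names are assumptions made for this sketch.

```python
import json

# Hypothetical configuration; in practice this would be a file on disk.
CONFIG_TEXT = """
{
  "default": "halt",
  "rules": [
    {"module": "dial_out", "event": "station_unreachable", "action": "requeue"},
    {"module": "dial_out", "event": "modems_down", "action": "escalate"},
    {"module": "*", "event": "recoverable_error", "action": "rerun"}
  ]
}
"""


def load_rules(text):
    """Parse the configuration text into a dictionary."""
    return json.loads(text)


def decide(config, module, event):
    """Return the action of the first rule matching (module, event), else the default."""
    for rule in config["rules"]:
        if rule["module"] in (module, "*") and rule["event"] == event:
            return rule["action"]
    return config["default"]
```

Changing how the system reacts to an event then becomes a configuration edit rather than a code change.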
     

Generalized Functional Module API and container implemented

(ASOS Instrument Network)

Workflow 

Workflow Engine designed to integrate into Ingest System

(ASOS Instrument Network)

Station Unreachable

Dial-Out Functional Unit

Put Station back on processing list

Modems Down

Unrecoverable Error

Process Station List

Log Event

Escalate Failure/Retry Unit

Halt Processing
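
The diagram labels above suggest three outcomes for the dial-out unit. The sketch below encodes one plausible reading of them: an unreachable station is logged and put back on the list, a modem outage is escalated, and any other error halts processing. That mapping, the exception types, the retry limit and the station identifiers in the usage note are all assumptions.

```python
from collections import deque


class StationUnreachable(Exception):
    """Hypothetical: this one station did not answer."""


class ModemsDown(Exception):
    """Hypothetical: the local modem bank is unavailable."""


def process_station_list(stations, dial, max_attempts=3):
    """Dial each station; requeue unreachable ones, stop on wider failures."""
    queue = deque((station, 1) for station in stations)
    done, events = [], []
    status = "completed"
    while queue:
        station, attempt = queue.popleft()
        try:
            dial(station)
            done.append(station)
        except StationUnreachable:
            events.append(f"unreachable: {station} (attempt {attempt})")
            if attempt < max_attempts:
                queue.append((station, attempt + 1))   # back on the list
            else:
                events.append(f"giving up: {station}")
        except ModemsDown:
            events.append("modems down: escalating")
            queue.appendleft((station, attempt))        # keep it for a later retry
            status = "escalated"
            break
        except Exception as exc:
            events.append(f"unrecoverable: {exc}")
            queue.appendleft((station, attempt))
            status = "halted"
            break
    return {
        "status": status,
        "done": done,
        "remaining": [s for s, _ in queue],
        "events": events,
    }
```

For example, calling it with `["KAVL", "KCLT"]` and a `dial` function that fails once for the first station processes the second station first and then retries the first, which is the kind of resilience to intermittent device availability the findings call for.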

Development Flexibility

APIs allow targeted redesign that can be pushed to production.

[Diagram labels: Units, Unit API (x2), Data Retrieval, Parsing and formatting, MADIS]

Functional Unit API

  • Collects and distributes "High-Res" (1-minute) data.
  • Eliminates "near-realtime" requirements.
  • Dial-out system reliability risks externalized.
  • Short-term: still requires the modem system to retrieve one file (the device syslog).
  • Simplifies development, reducing development and maintenance costs.

Dial Out System Upgrade

Summary 

(NOTE: All estimates are in 'Full-Time Employee' time. See Document for details)

An estimate for an upper bound of total work:

There are a number of 'orthogonal' improvements that could be done in arbitrary order (if done at all).

FORTRAN:         8-16 FTE months

Java:            9-18 FTE months

BASH:            ~7 FTE months

Architecture:    5-10 FTE weeks

Total:           ~2.1-3.6 FTE years
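
The total can be reproduced from the component estimates. The short Python sketch below does the arithmetic, assuming 52/12 weeks per month to convert the architecture estimate; that conversion factor and the names used are assumptions of this example, not something the assessment states.

```python
WEEKS_PER_MONTH = 52 / 12  # assumed conversion factor

# (low, high) estimates in FTE months, taken from the list above.
ESTIMATES_MONTHS = {
    "FORTRAN": (8, 16),
    "Java": (9, 18),
    "BASH": (7, 7),
    "Architecture": (5 / WEEKS_PER_MONTH, 10 / WEEKS_PER_MONTH),
}


def total_fte_years(estimates):
    """Sum the low and high bounds and convert months to years (one decimal)."""
    low = sum(lo for lo, _ in estimates.values())
    high = sum(hi for _, hi in estimates.values())
    return round(low / 12, 1), round(high / 12, 1)
```

The bounds come to about 25.2 and 43.3 FTE months, which round to the 2.1 and 3.6 FTE years quoted above.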

Next Steps

  • Summary
  • Options

Options

  • Minimum Effort
  • Refactor
  • Re-architect

Option(1): Minimum Effort

  • Migrate off IBM Server

Pros

  • Initial cost is minimal.
  • Deployed to modern 3-tier server environment.

Cons

  • Continue to maintain a high-risk system.
  • Key personnel retiring.

Estimate: 3-6 FTE months 

Option(2): Refactor

  • Migrate off IBM Server

  • Refactor FORTRAN code

  • Refactor Java code

  • Re-write BASH

Pros

  • Produces a sustainable, well understood system.

Cons

  • Remains an isolated stove pipe system.
  • Does not address workflow issues.

Estimate:  17 - 27 FTE months

"Mission critical re-engineering, no architecture."

Option(3): Re-Architect

  • Migrate off IBM Server

Pros

  • Greatly enhanced logging and error handling.
  • Improved workflow.
  • Can incrementally address high-risk software.

Cons

  • Continued reliance on ASOS modems.

Estimate:  13 - 23 FTE months

(Recommended)

Summary

The approach we believe is most likely to succeed in addressing both short-term concerns and longer-term organizational goals is an incremental one that advances a mature architecture and is compatible with the common ingest system under development for future deployment on the 3-tier production environment.

 

  • Guidance is required to set the scope of the development roadmap. 
  • Once scope is provided, a formal project plan with estimates can be created.

Many Thanks!
