2015 SpaceApps

Data Projects that were awesome

Jason Duley

Big (Data + Think) 2

04-21-2015

Topics

  • Why Data Challenge?
  • The Challenge 
  • Highlighted Solutions
  • Next Steps
  • Production Usage

Big (Data + Think) 2 - jason.duley@nasa.gov

Why Data Challenge?

Open Data publishers have limited time, resources and domain knowledge to optimally develop metadata to make datasets more easily accessible and discoverable for citizens working to solve data-driven problems

Why Data Challenge?

Background:

  • Many are interested in new ways to leverage open data and integrate these information assets into innovative databases, services and applications

Problem:

  • Inconsistent metadata, such as keyword tagging, impairs the ability of information engines such as Google to discover assets

The Challenge

Discover keywords to describe the potential, hidden, secondary uses of open data using any technique that might help discover new keywords.  Some seedling ideas:

  • Crowdsourcing Approaches
    • display data asset information online, query people about how the assets can be used
  • Predictive Analytics or Machine-Learning
    • compare the metadata and the data of one information asset to another in order to find new keywords
  • Unique Identifier Analysis of Published Data
    • search the web, discover who has already used the data and for what purpose, then catalog it

We provided a Challenge Starter Toolkit

#humans, #datatreasurehunting, #advanced

Highlighted Solutions

20 projects were submitted for the Data Treasure Hunt challenge

Notables:

  • Degrees of Data
  • Open Data Gold Digger
  • Keyword Distillery
  • NYSpaceTag



Degrees of Data

Approach:

  • Solve using Twitter API
  • Leverage people tagging their tweets with #hashtags
  1. Search for tweets containing the input keyword as a hashtag
  2. Treat all the other hashtags in those tweets as "relevant" keywords
  3. Count the occurrences of each hashtag
  4. Average the counts to set a threshold
  5. Output any keywords whose count is over that threshold
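
The five steps above can be sketched in Python. This is a minimal sketch, not the team's code: the tweets are assumed to be already fetched (no Twitter API call is made here), and the `related_hashtags` function, the `HASHTAG` pattern and the sample tweets are hypothetical names and data introduced for illustration.

```python
import re
from collections import Counter

# Assumed hashtag pattern: '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def related_hashtags(tweets, keyword):
    """Return hashtags that co-occur with `keyword` more often than average."""
    keyword = keyword.lower()
    counts = Counter()
    for text in tweets:
        tags = {t.lower() for t in HASHTAG.findall(text)}
        if keyword not in tags:
            continue  # step 1: keep only tweets tagged with the keyword
        counts.update(tags - {keyword})  # steps 2-3: count the other hashtags
    if not counts:
        return []
    threshold = sum(counts.values()) / len(counts)  # step 4: mean count
    # step 5: keep hashtags whose count is strictly over the threshold
    return sorted(tag for tag, n in counts.items() if n > threshold)


# Made-up sample tweets, for illustration only.
tweets = [
    "Mapping craters #mars #nasa #opendata",
    "New rover images #mars #nasa",
    "Dust storm season #mars #weather",
    "Unrelated #coffee #monday",
]
print(related_hashtags(tweets, "mars"))
```

With a plain mean as the threshold, only hashtags that are more common than the typical co-occurring hashtag survive, so a long tail of one-off tags is dropped.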

 

     http://bit.ly/1ISxXdW

Open Data Gold Digger

Approach:

  • Focused on mining large conglomerates of raw data into smaller streams of useful, relevant data


      http://bit.ly/1zGhhOq

Keyword Distillery

Approach:

  • Keywords were given scaled weights by inverting the number of search results returned
  • Public data was then crawled, and each dataset was scanned for matching keywords
  • Keyword frequency was calculated and stored in the relationship map
  • Frequency was calculated by dividing the total number of keyword instances by the length of the document
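
The weighting and frequency calculations described above can be sketched as follows. This is a minimal sketch under stated assumptions: search-result counts are supplied by the caller, "length of the document" is taken to mean the number of word tokens, and no further scaling of the weights is applied because the source does not specify one. All function names and the sample data are hypothetical.

```python
import re


def keyword_weight(search_result_count):
    """Inverse of the search-result count: rarer keywords weigh more."""
    if search_result_count <= 0:
        return 0.0
    return 1.0 / search_result_count


def keyword_frequency(keyword, document):
    """Keyword instances divided by document length (in word tokens)."""
    words = re.findall(r"\w+", document.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


def build_relationship_map(datasets, keywords):
    """Map each dataset name to {keyword: frequency} for keywords it contains."""
    relationship_map = {}
    for name, text in datasets.items():
        freqs = {k: keyword_frequency(k, text) for k in keywords}
        relationship_map[name] = {k: f for k, f in freqs.items() if f > 0}
    return relationship_map


# Made-up sample data, for illustration only.
datasets = {"lunar": "Lunar soil samples and lunar rock samples"}
rel = build_relationship_map(datasets, ["lunar", "mars"])
print(rel)
print(keyword_weight(4))
```

Dividing by document length keeps long descriptions from dominating the map simply because they contain more words.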

 

     http://bit.ly/1ISxXdW

NYSpaceTag

Approach:

  • NLP scripts were run over 16k NASA datasets, generating keywords from dataset titles and descriptions
  • Built a Recommendation Engine
    • devised a metric for measuring similarity of projects
    • based on synonym sets
  • Built a Visual Search Engine
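
One way a synonym-based similarity metric might look is sketched below. The source does not say which metric the team devised; this sketch assumes Jaccard overlap of keyword sets after mapping each keyword to a canonical synonym. The `SYNONYMS` table, the function names and the sample catalog are all hypothetical.

```python
# Hypothetical synonym table: maps a keyword to its canonical form.
SYNONYMS = {
    "moon": "lunar",
    "satellite": "spacecraft",
    "probe": "spacecraft",
}


def canonical(keywords):
    """Lowercase keywords and collapse synonyms to one canonical term."""
    return {SYNONYMS.get(k.lower(), k.lower()) for k in keywords}


def similarity(a, b):
    """Jaccard similarity of two keyword lists after synonym collapsing."""
    sa, sb = canonical(a), canonical(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def recommend(target, catalog, top_n=3):
    """Return up to `top_n` other datasets most similar to `target`."""
    scored = [
        (similarity(catalog[target], kws), name)
        for name, kws in catalog.items()
        if name != target
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first
    return [name for score, name in scored[:top_n] if score > 0]


# Made-up catalog, for illustration only.
catalog = {
    "A": ["moon", "probe", "imaging"],
    "B": ["lunar", "satellite"],
    "C": ["ocean", "temperature"],
}
print(recommend("A", catalog))
```

Collapsing synonyms first lets "moon" and "lunar" count as a match even though the raw strings differ.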

 

      http://bit.ly/1yLS4a9

Next Steps

  • Work with Project team(s) to mature software
  • Compare keyword generation to Alchemy API based approaches
  • Package features as services
  • Integrate into production use
  • Calculate and show upper management the ROI from SpaceApps

SAC Data Projects

By Jason Duley

From the 2015 SpaceApps Challenge code-a-thon event, and specifically for the Data Treasure Hunt Challenge, we've identified a few projects created by our participants worth highlighting.
