Expressing 

your self data

(for non-programmers)

Oct. 30, 2013

ejpark04@snu.ac.kr

http://lucypark.kr

Creative Commons License

Who are you?


  • I’m a PhD candidate at SNU and a data scientist at Team POPONG.
  • My interests lie in the usability, openness, and freedom of data.
  • Friends call me a dreamer or a romantic to a fault, but I enjoy living my life as such.
  • In my free time, I surf the web or chuckle at xkcd.
  • I love continuous learning and pursue community action.
(Source: http://pic.twitter.com/KLVQNnATh8)

Our objective


(Source: http://courses2.cit.cornell.edu/info4302_2012fa/results.php)

A lot more out there...


(Source: http://www.programmableweb.com/mashup-tag-cloud)

The BIG Picture


1. Defining your problem

2. Searching the solution space

3. Locating your data sources

4. Crunching the data

5. Convincing others

1. Defining your problem


 

What are you interested in?

  • Is there a problem you've been wanting to solve? 
  • If not, it may be easier to start by exploring resources.

1. Defining your problem


What is the scope of your problem?
  • What are the boundaries? 
  • What is it that you're not going to solve? 
  • What are the assumptions? Are they realistic? 
  • What are the constraints?

What are the challenges?
  • Do you have enough resources to solve your problem?
  • Time, ability, enthusiasm, ...

What happens when you solve this problem?
  • How does it benefit your life?

1. Defining your problem



    Define your problem 
    in one sentence.

    2. Searching the solution space


    What are the causes of the problem?
    • Why did this problem occur in the first place?
    • Is there one major cause, or do several causes work together?
    • This is where you brainstorm with your teammates.

    How have others approached your problem?
    • Is your problem something new?  (Probably not.)
    • Where is the main research community located? (the "domain")
    • What jargon do they use? (literature survey)
    • Google Scholar at your service.

    2. Searching the solution space


    Choose your own approach.
    • Specify your hypothesis.
    • Try to solve one problem at a time. (divide and conquer)
    • The more time you invest, the better the performance.
    • What ingredients (data) do you need?

    2. Searching the solution space




    Now, try defining your solution in one sentence.

    3. Locating your data sources


    So, where can you get the data?



    Data is practically everywhere.
    (Though the data you really need is never there.)

    4. Crunching the data


    Get to know your data
    (a.k.a. "Data Exploration")

    (Source: http://www.biofortis.com/products/qiagram/)



    Tools
    Spotfire, Tableau, ...
    Google Trends, Google graphs, Google Fusion Tables
    PHP, Python, HTML, CSS, JavaScript


    Algorithms
    ...

    4. Crunching the data


    What's the type of your data?
    Numerical? Textual? Graphical? Spatial? Temporal? ...

    What are the dimensions?
    • Variables & records (i.e., the columns & rows)
    • How many variables? What types of variables? (e.g., nominal, ordinal, interval, ratio)
    • How many records?
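
    For those curious what these questions look like in practice, here is a minimal sketch in Python. The sample data, the function names (guess_type, describe), and the simple "numerical vs. textual" guess are all assumptions made up for illustration; they are not part of any particular tool. Deciding whether a variable is nominal, ordinal, interval, or ratio still takes human judgment.

```python
import csv
import io

# Hypothetical sample data, invented for illustration only.
SAMPLE_CSV = """name,party,age,terms
Kim,A,52,2
Lee,B,47,1
Park,A,61,3
"""


def guess_type(values):
    """Guess whether a column is numerical or textual.

    A column counts as numerical only if every value parses as a number.
    """
    try:
        for v in values:
            float(v)
        return "numerical"
    except ValueError:
        return "textual"


def describe(csv_text):
    """Return (number of records, {variable name: guessed type})."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    variables = rows[0].keys() if rows else []
    types = {name: guess_type([row[name] for row in rows]) for name in variables}
    return len(rows), types


n_records, variable_types = describe(SAMPLE_CSV)
print("records:", n_records)
print("variables:", len(variable_types))
for name, kind in variable_types.items():
    print(f"  {name}: {kind}")
```

    The same three answers (how many records, how many variables, what type each one is) are what you would look for first in a spreadsheet or in any of the tools listed earlier.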

    4. Crunching the data


    Some very easy and useful tools

    Publishing on the Web


    cf. http://en.wikipedia.org/wiki/Programming_languages_used_in_most_popular_websites

    5. Convincing others


    Reporting & Presenting the results
    • Your conclusions (the performance) are important, but your reasoning counts too.
    • What were your options? Why did or didn't you choose an option?
    • What were your assumptions? How valid are they?
    • What did you consider as variables, and what did you fix?
    • What did you consider but not implement? (Future work)


    Galleries you can check out

    5. Convincing others


    Some more useful tools

      Some tips


      Plan ahead.
      You have deadlines. Make milestones, and keep them.
      Even if your results don't satisfy your standards, get over it.

      Keep the development cycle short.
      First make something that runs.
      Then enhance the performance. Then add more components.

      Work as a team.
      Find out what your teammates do best.
      Know what they want.

      Documentation on-the-fly helps.
      It really does.

      Source: http://upload.wikimedia.org/wikipedia/commons/a/a9/Mars_Science_Laboratory_Curiosity_rover.jpg

      Korea National Assembly, Now

      Visualization of the seating of the 18th National Assembly of Korea.


      http://labs.popong.com/codenamu/

      Now let's try one

