Expressing your data
(for non-programmers)
Who are you?
- I’m a PhD candidate at SNU and a Data Scientist at Team POPONG.
- My interests lie in the usability, openness, and freedom of data.
- Friends call me a dreamer or a romantic to a fault, but I enjoy living my life as such.
- In my free time, I surf the web or chuckle at xkcd.
- I love continuous learning and pursuing community action.
Our objective
A lot more out there...
The BIG Picture
1. Defining your problem
2. Searching the solution space
3. Locating your data sources
4. Crunching the data
5. Convincing others
1. Defining your problem
What are you interested in?
- Is there a problem you've been wanting to solve?
- If there isn't, it may be easier to start by exploring resources.
What is the scope of your problem?
- What are the boundaries?
- What is it that you're not going to solve?
- What are the assumptions? Are they realistic?
- What are the constraints?
What are the challenges?
- Do you have enough resources to solve your problem?
- Time, ability, enthusiasm, ...
What happens when you solve this problem?
- How does it benefit your life?
Define your problem in one sentence.
2. Searching the solution space
What are the causes of the problem?
- Why did this problem occur in the first place?
- Is there a major cause, or do some work together?
- This is where you brainstorm with your teammates.
How have others approached your problem?
- Is your problem something new? (Probably not.)
- Where is the main research community located? (the "domain")
- What jargon do they use? (literature survey)
- Google Scholar is at your service.
Choose your own approach.
- Specify your hypothesis.
- Try to solve one problem at a time. (divide and conquer)
- The more time you invest, the better the performance.
- What ingredients (data) do you need?
Now, try defining your solution in one sentence.
3. Locating your data sources
So, where can you get the data?
- Create your own data
- Gather data from the Web (via crawling)
- If you can't "crawl", try an automated tool such as import.io
- Use data organized by others
- Competitions: http://www.kaggle.com/competitions
- Text: http://trec.nist.gov/data.html
- Log data: http://dumps.wikimedia.org/
- APIs: http://www.programmableweb.com/apis
Data is practically everywhere.
(Though the data you really need is never there.)
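For readers curious what "crawling" looks like in practice, here is a minimal sketch using only Python's standard library. It pulls the rows out of an HTML table. The page content (`SAMPLE_HTML`) and the names `TableParser` and `extract_rows` are made up for illustration; a real crawler would first download the page from a URL and would have to cope with much messier markup.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real crawler would download this from a URL.
SAMPLE_HTML = """
<table>
  <tr><th>name</th><th>score</th></tr>
  <tr><td>alpha</td><td>10</td></tr>
  <tr><td>beta</td><td>25</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # rows that have been read completely
        self._row = None        # the row currently being read, if any
        self._in_cell = False   # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell:
            self._row[-1] += data.strip()


def extract_rows(html_text):
    """Return the table contents as a list of rows (lists of strings)."""
    parser = TableParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows


rows = extract_rows(SAMPLE_HTML)
header, records = rows[0], rows[1:]
print(header)
print(records)
```

The first row is treated as the header and the rest as records, which matches the "variables and records" view of data used later in these slides.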
4. Crunching the data
Get to know your data
(a.k.a. "Data Exploration")
Tools
Spotfire, Tableau, ...
Google Trends, Google graphs, Google Fusion Tables
PHP, Python, HTML, CSS, JavaScript
Algorithms
...
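As one small illustration of "getting to know your data", the sketch below computes a few descriptive statistics with Python's standard library. The values in `ages` and `regions` are invented for this example, and the helper names `summarize_numeric` and `summarize_nominal` are hypothetical; this is just one simple starting point, not a complete exploration workflow.

```python
import statistics
from collections import Counter

# Hypothetical records, invented only for illustration.
ages = [23, 35, 31, 35, 48, 52, 35]
regions = ["north", "south", "south", "east", "south", "north", "east"]


def summarize_numeric(values):
    """Return basic descriptive statistics for a numerical variable."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def summarize_nominal(values):
    """Return (category, count) pairs, most frequent category first."""
    return Counter(values).most_common()


print(summarize_numeric(ages))
print(summarize_nominal(regions))
```

Numerical variables get a range and a center; categorical ones get frequency counts. Looking at both often reveals typos, outliers, or missing categories before any heavier analysis.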
What's the type of your data?
Numerical? Textual? Graphical? Spatial? Temporal? ...
What are the dimensions?
- Variables & records (i.e., the columns & rows)
- How many variables? Type of variables? (e.g., Nominal, ordinal, interval, ratio)
- How many records?
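The questions above can be answered mechanically for a table stored as CSV. The following is a rough sketch, assuming the first line of the file holds the variable names. `SAMPLE_CSV`, `guess_type`, and `describe` are hypothetical names and data made up for this example. Note that the code only guesses a storage type (integer, decimal, text); deciding whether a variable is nominal, ordinal, interval, or ratio still requires knowing what the values mean.

```python
import csv
import io

# Hypothetical CSV content; in practice this would be read from a file.
SAMPLE_CSV = """city,region,population,rainfall_mm
Alpha,north,120000,830.5
Beta,south,45000,1210.0
Gamma,south,980000,640.2
"""


def guess_type(values):
    """Guess a rough storage type for one column from its string values."""
    for caster, label in ((int, "integer"), (float, "decimal")):
        try:
            for value in values:
                caster(value)
            return label
        except ValueError:
            continue  # at least one value did not fit; try the next type
    return "text"


def describe(csv_text):
    """Return (number of records, variable names, guessed type per variable)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = list(reader)
    variables = reader.fieldnames or []
    types = {
        name: guess_type([record[name] for record in records])
        for name in variables
    }
    return len(records), variables, types


n_records, variables, types = describe(SAMPLE_CSV)
print("records:", n_records)
print("variables:", len(variables))
for name in variables:
    print(" ", name, "->", types[name])
```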
Some very easy and useful tools
- Desktop: Spotfire, Tableau Public
- Web: Google Trends
Publishing on the Web
- Libraries: Google Charts, Google Fusion Tables
- PHP, Python, Java, ...
- HTML, CSS, JavaScript, ...
- ...
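To give a feel for how Python and HTML/CSS fit together when publishing on the Web, here is a minimal sketch that turns a handful of counts into a standalone HTML bar chart. The data in `counts` is invented, and `render_bar_chart` is a hypothetical helper written for this example; charting libraries such as the ones listed above handle axes, legends, and interactivity that this sketch ignores.

```python
import html

# Hypothetical category counts, invented for illustration.
counts = {"north": 2, "south": 3, "east": 1}


def render_bar_chart(data, title, max_width_px=300):
    """Return a standalone HTML page with one horizontal bar per category."""
    largest = max(data.values()) if data else 0
    bars = []
    for label, value in data.items():
        # Scale each bar relative to the largest value.
        width = int(max_width_px * value / largest) if largest else 0
        bars.append(
            f'<div><span class="label">{html.escape(str(label))}</span>'
            f'<span class="bar" style="width:{width}px"></span> {value}</div>'
        )
    style = (
        ".label{display:inline-block;width:80px}"
        ".bar{display:inline-block;height:14px;background:steelblue}"
    )
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        f"<title>{html.escape(title)}</title><style>{style}</style></head>"
        f"<body><h1>{html.escape(title)}</h1>{''.join(bars)}</body></html>"
    )


page = render_bar_chart(counts, "Records per region")
print(page)
```

Saving the contents of `page` to a file ending in `.html` and opening it in a browser shows the chart. Labels are passed through `html.escape` so that unusual characters in the data cannot break the page.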
5. Convincing others
Reporting & Presenting the results
- Your conclusions (the performance) are important, but your reasoning counts too.
- What were your options? Why did or didn't you choose an option?
- What were your assumptions? How valid are they?
- What did you consider as variables, and what did you fix?
- What did you consider but not implement? (Future work)
Galleries you can check out
Some more useful tools
Some tips
Plan ahead.
You have deadlines. Make milestones, and keep them.
Even if your results don't satisfy your standards, get over it.
Keep the development cycle short.
First make something that runs.
Then enhance the performance. Then add more components.
Work as a team.
Find out what your teammates do best.
Know what they want.
Documentation on-the-fly helps.
It really does.
Source: http://upload.wikimedia.org/wikipedia/commons/a/a9/Mars_Science_Laboratory_Curiosity_rover.jpg
Korea National Assembly, Now
Visualization of the seating of the 18th National Assembly of Korea.
Now let's try one
- Problem: Korean Politics
- Data: https://github.com/teampopong/data-for-rnd
Expressing your data
By Eunjeong Lucy Park