Program of the day

09:10 - 09:30  Talk by Emillie Van de Keulenaar
09:30 - 09:45  Talk by Bharath Ganesh
09:45 - 10:00  Data description
10:00 - 10:30  Group forming
10:30 - 17:30  Working
17:30 - 18:30  Presentations
18:30 ->       Drinks
- Food will be served at 12:30
- We meet at 12:30 and 15:30 for a quick progress check: What have we done? Where did we get stuck? What will we do next?

Datathon

Output:

- 5-minute presentation

- Open Science: Presentation + Code should be open access

Goals:

- State and answer a research question

- Learn from each other

Incentives:

- Certificate to best project

- Hopefully a 5-minute presentation on Saturday at IC2S2 (we have only received informal confirmation)

Collaborative doc

https://docs.google.com/document/d/1fQdWl0MJDY7o4eMRMotJf9T_EYSocAPt2NipGrmFQbs/edit

(see #project_ideas)

These slides: slides.com/jgarciab/datathon

Data

        Channels  Comments  Transcripts  Videos
Left         788       74M         340k     35k
Right        950       29M         180k    260k

Note: the tables are not in a one-to-one relationship

Recommendations:

24M recommendations

Seed list + snowball sampling + manual curation
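The collection step above starts from a seed list and expands it by sampling outward before manual curation. A minimal sketch of snowball sampling as a breadth-first expansion, assuming the related channels of any channel can be looked up. The `neighbors` function, the `max_depth` parameter and the toy graph are hypothetical; the slides do not say which channel relations were followed or how many waves were used.

```python
from collections import deque


def snowball_sample(seeds, neighbors, max_depth=2):
    """Breadth-first snowball sampling from a seed list.

    seeds: iterable of starting channel IDs (the seed list).
    neighbors: hypothetical callable mapping a channel ID to related channel IDs.
    max_depth: number of waves to expand beyond the seeds.
    Returns the set of sampled IDs (candidates for manual curation).
    """
    sampled = set(seeds)
    queue = deque((s, 0) for s in sampled)
    while queue:
        channel, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the last wave
        for nb in neighbors(channel):
            if nb not in sampled:
                sampled.add(nb)
                queue.append((nb, depth + 1))
    return sampled


# Toy graph with made-up channel IDs, for illustration only
toy_graph = {"A": ["B", "C"], "B": ["D"], "C": ["A"], "D": ["E"], "E": []}
result = snowball_sample(["A"], lambda c: toy_graph.get(c, []), max_depth=2)
print(sorted(result))  # ['A', 'B', 'C', 'D']
```

"E" is not reached because it is three waves away from the seed; the sampled set would then go through manual curation.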

Channels

ID

Title

Description

View count

Subscription count

Video count

Topic IDs (link in doc)

Comments

ID

username

datum (date)

video_id

channel

likes

dislikes

dataset (r = right, l = left)

comment

Videos

video_id
video_published
channel_id
video_title
video_description
video_tags  
video_category_id  (link in doc)
video_duration  
video_view_count    
video_comment_count    
video_likes_count  
video_dislikes_count
video_topic_ids    (link in doc)

Transcriptions

video_id

transcription

Recommendations

videoID: ID

targetVideoID: Recommended video

publishedAt: Date

channelID: channelID of the videoID

title: title of the targetVideoID

description: description of the targetVideoID
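Each row of the Recommendations table is an edge from `videoID` to `targetVideoID`, so the table can be read as a directed graph. A minimal sketch in Python, assuming each row is available as a dict with those two fields; the rows below are made-up examples, not real data:

```python
from collections import Counter, defaultdict

# Made-up rows using the Recommendations field names, for illustration only
rows = [
    {"videoID": "v1", "targetVideoID": "v2"},
    {"videoID": "v1", "targetVideoID": "v3"},
    {"videoID": "v2", "targetVideoID": "v3"},
    {"videoID": "v3", "targetVideoID": "v1"},
]


def build_graph(rows):
    """Directed adjacency list: video -> list of recommended videos."""
    graph = defaultdict(list)
    for row in rows:
        graph[row["videoID"]].append(row["targetVideoID"])
    return graph


def most_recommended(rows, k=3):
    """Videos that appear most often as a recommendation target (in-degree)."""
    return Counter(row["targetVideoID"] for row in rows).most_common(k)


graph = build_graph(rows)
print(graph["v1"])                # ['v2', 'v3']
print(most_recommended(rows, 1))  # [('v3', 2)]
```

Counting how often a video appears as `targetVideoID` (its in-degree) is one simple way to find the most recommended videos.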

Data (subset)

        Channels  Comments  Transcripts  Videos
Left           5      6.4M          24k     24k
Right          5      3.7M           9k     11k

Recommendations:

861k left

481k right

Available:

- Download (check #data)

- USBs

Subset: top 5 channels per side (left/right) by count
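The slides do not specify which count defines the top 5. A minimal sketch for selecting the top k channels per side by one count column; the keys `side` and `video_count` and the toy records are hypothetical (Channels also has view and subscription counts that could be used instead):

```python
import heapq


def top_k_per_side(channels, count_field, k=5):
    """Return the k channels with the highest value of count_field, per side.

    channels: list of dicts with a "side" key ("l" or "r") and count fields.
    count_field: hypothetical key name, e.g. "video_count".
    """
    by_side = {}
    for ch in channels:
        by_side.setdefault(ch["side"], []).append(ch)
    return {
        side: heapq.nlargest(k, chans, key=lambda c: c[count_field])
        for side, chans in by_side.items()
    }


# Made-up channel records, for illustration only
channels = [
    {"id": "L1", "side": "l", "video_count": 10},
    {"id": "L2", "side": "l", "video_count": 30},
    {"id": "L3", "side": "l", "video_count": 20},
    {"id": "R1", "side": "r", "video_count": 5},
    {"id": "R2", "side": "r", "video_count": 7},
]
top = top_k_per_side(channels, "video_count", k=2)
print([c["id"] for c in top["l"]])  # ['L2', 'L3']
print([c["id"] for c in top["r"]])  # ['R2', 'R1']
```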

Access to complete data

https://hpcn-fmg01.science.uva.nl/db

(Credentials in Slack, see #data)

Access to server

24 cores

1.5 TB memory

Direct access to data

Access is through the computer of a UvA researcher:

- Anna Keuchenius

- Javier Garcia-Bernardo

Datathon

By Javier GB