Program of the day
- 09:10 - 09:30 Talk by Emillie Van de Keulenaar
- 09:30 - 09:45 Talk by Bharath Ganesh
- 09:45 - 10:00 Data description
- 10:00 - 10:30 Group forming
- 10:30 - 17:30 Working
- 17:30 - 18:30 Presentations
- 18:30 -> Drinks
- Food will be served at 12:30
- We meet at 12:30 and 15:30 to quickly discuss progress: What have we been doing? What did we get stuck with? What are we going to do?
Datathon
Output:
- 5-minute presentation
- Open Science: Presentation + Code should be open access
Goals:
- State and answer a research question
- Learn from each other
Incentives:
- Certificate for the best project
- Hopefully a 5-minute presentation on Saturday at IC2S2 (but we have only received informal confirmation)
Collaborative doc
https://docs.google.com/document/d/1fQdWl0MJDY7o4eMRMotJf9T_EYSocAPt2NipGrmFQbs/edit
(see #project_ideas)
These slides: slides.com/jgarciab/datathon
Data
Dataset | Channels | Comments | Transcripts | Videos |
---|---|---|---|---|
Left | 788 | 74M | 340K | 35k |
Right | 950 | 29M | 180K | 260k |
Not one-to-one relationship
Recommendations:
24M recommendations
Seed list + snowball sampling + manual curation
Channels
ID
Title
Description
View count
Subscription count
Video count
Topic IDs (link in doc)
Comments
ID
username
datum
video_id
channel
likes
dislikes
dataset (r/l)
comment
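A possible starting point with the comment fields above: find users who commented in both the left and the right dataset. This is a minimal sketch, assuming each comment has been loaded as a dict keyed by the column names listed above and that the `dataset` column holds the values `"l"` and `"r"`. The function names and the toy rows are hypothetical, not part of the provided data or code.

```python
from collections import defaultdict


def users_by_side(comments):
    """Map each username to the set of dataset labels it appears in."""
    sides = defaultdict(set)
    for row in comments:
        sides[row["username"]].add(row["dataset"])
    return dict(sides)


def cross_side_users(comments):
    """Return the sorted usernames that commented in both datasets."""
    both = {"l", "r"}  # assumed encoding of the `dataset` column
    return sorted(u for u, s in users_by_side(comments).items() if both <= s)


# Toy rows for illustration only (made up, not real data).
sample_comments = [
    {"username": "alice", "dataset": "l", "video_id": "v1", "comment": "first"},
    {"username": "alice", "dataset": "r", "video_id": "v2", "comment": "second"},
    {"username": "bob", "dataset": "l", "video_id": "v1", "comment": "third"},
    {"username": "carol", "dataset": "r", "video_id": "v3", "comment": "fourth"},
]

if __name__ == "__main__":
    print(cross_side_users(sample_comments))
```

With 74M + 29M comments in the full data, a streaming pass like this (one row at a time) avoids holding all comment text in memory.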
Videos
video_id
video_published
channel_id
video_title
video_description
video_tags
video_category_id (link in doc)
video_duration
video_view_count
video_comment_count
video_likes_count
video_dislikes_count
video_topic_ids (link in doc)
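The like and dislike counts above can be combined into a simple per-video approval measure. The sketch below assumes each video is a dict keyed by the column names listed above and that counts may arrive as strings; `like_ratio` is a hypothetical helper, not part of the provided code.

```python
def like_ratio(video):
    """Return likes / (likes + dislikes), or None if the video has no ratings."""
    likes = int(video["video_likes_count"])
    dislikes = int(video["video_dislikes_count"])
    total = likes + dislikes
    if total == 0:
        return None  # avoid dividing by zero for unrated videos
    return likes / total


# Toy rows for illustration only (made up, not real data).
sample_videos = [
    {"video_id": "v1", "video_likes_count": "30", "video_dislikes_count": "10"},
    {"video_id": "v2", "video_likes_count": "0", "video_dislikes_count": "0"},
]

if __name__ == "__main__":
    for video in sample_videos:
        print(video["video_id"], like_ratio(video))
```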
Transcriptions
video_id
transcription
Recommendations
videoID: ID
targetVideoID: Recommended video
publishedAt: Date
channelID: channelID of the videoID
title: title of the targetVideoID
description: description of the targetVideoID
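One way to use the recommendation fields above is to aggregate video-level recommendations into channel-to-channel edges. The recommendation rows give the source channel (`channelID`) but only the ID of the recommended video, so this sketch assumes a lookup from `video_id` to `channel_id` built from the Videos table. The function name, the lookup, and the toy rows are hypothetical.

```python
from collections import Counter


def channel_edges(recommendations, video_to_channel):
    """Count recommendations between channels.

    Returns a Counter keyed by (source_channel, target_channel).
    Recommendations whose target video is missing from the lookup are skipped.
    """
    edges = Counter()
    for rec in recommendations:
        source = rec["channelID"]
        target = video_to_channel.get(rec["targetVideoID"])
        if target is None:
            continue  # target video is not in the Videos table
        edges[(source, target)] += 1
    return edges


# Toy data for illustration only (made up, not real data).
video_to_channel = {"v1": "chA", "v2": "chB", "v3": "chB"}
sample_recs = [
    {"videoID": "v1", "targetVideoID": "v2", "channelID": "chA"},
    {"videoID": "v1", "targetVideoID": "v3", "channelID": "chA"},
    {"videoID": "v2", "targetVideoID": "v1", "channelID": "chB"},
    {"videoID": "v2", "targetVideoID": "v9", "channelID": "chB"},
]

if __name__ == "__main__":
    for (source, target), count in channel_edges(sample_recs, video_to_channel).items():
        print(source, "->", target, count)
```

The resulting weighted edge list can be fed into a network library to study, for example, how often recommendations cross between left and right channels.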
Data (subset)
Dataset | Channels | Comments | Transcripts | Videos |
---|---|---|---|---|
Left | 5 | 6.4M | 24k | 24k |
Right | 5 | 3.7M | 9k | 11k |
Recommendations:
861k left
481k right
Available:
- Download (check #data)
- USBs
Top 5 channels per side by count
Access to complete data
https://hpcn-fmg01.science.uva.nl/db
(Credentials in slack, see #data)
Access to server
24 cores
1.5 TB memory
Direct access to data
Using a computer from a UvA researcher
- Anna Keuchenius
- Javier Garcia-Bernardo
Datathon
By Javier GB