Mapping Disasters in South-East Asia Using Twitter

DEVANSHI VERMA

ABOUT ME

  • Graduated from NSIT, Delhi University, Class of 2018, with a B.E. in Instrumentation and Control Engineering.
  • Published a research paper in an international journal on detecting rumors on Twitter using supervised machine learning.
  • Interned at the Geoinformatics Center in Thailand, working on geospatial data science.
  • Currently working as an analyst at EXL Analytics.

AGENDA

  • Why Twitter Data?

  • How to get the Data?

  • Problem and Solution

  • Framework

  • Python Libraries and Code 

  • Output

  • Testing 

  • Questions?

WHY TWITTER DATA?

  • Twitter is the top source of breaking news. It averaged 335 million users in the first quarter of 2018, with around 6,000 tweets posted per second.
  • It provides fast, real-time information about a large-scale disaster, so a map can be produced within about a minute of messages being posted.

HOW TO GET TWITTER DATA?

TWITTER API
  • An API is a way to request and deliver information.
  • Twitter APIs that return Tweets deliver the data as JSON, which is built on key-value pairs: named attributes and their associated values.
  • Two types of API:
    1. Search API (queries recent past tweets)
    2. Streaming API (delivers tweets in real time)

PROBLEM?

  • Each tweet comes with four types of data dictionary: the Tweet object, the User object, Twitter entities and extended entities.
  • In the Tweet object dictionary, the coordinates are stored in GeoJSON format.
  • PROBLEM: A user has to enable precise location to add this information, and this feature is OFF by default!
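
As a minimal sketch of the problem, the snippet below reads the `coordinates` attribute from a decoded tweet payload. The two sample payloads are made up for illustration, and `tweet_lat_lon` is a hypothetical helper name. The field layout (a GeoJSON Point with longitude first) follows the Tweet object as documented by Twitter.

```python
import json

def tweet_lat_lon(tweet):
    """Return (lat, lon) from a tweet dict, or None when no precise location is attached."""
    point = tweet.get("coordinates")
    if not point or point.get("type") != "Point":
        return None
    lon, lat = point["coordinates"]  # GeoJSON order is [longitude, latitude]
    return lat, lon

# Made-up payloads for illustration only
geotagged = json.loads(
    '{"text": "Flooding near the river", '
    '"coordinates": {"type": "Point", "coordinates": [100.5018, 13.7563]}}'
)
untagged = json.loads('{"text": "Heavy rain in Bangkok", "coordinates": null}')

print(tweet_lat_lon(geotagged))
print(tweet_lat_lon(untagged))
```

The second payload is the usual case: the place is named only in the text, so the location has to be recovered from the words of the tweet.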

SOLUTION

NAMED ENTITY RECOGNITION

FRAMEWORK

FETCH TWEETS

import pandas as pd
import tweepy

# Authorised access to the API (credentials come from a Twitter developer app)
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(OAUTH_TOKEN, OAUTH_TOKEN_SECRET)
api = tweepy.API(auth, wait_on_rate_limit=True)  # wait instead of failing when rate-limited

# Extract the tweets; the standard search endpoint returns at most 100 per request.
# Tweepy 4.x renamed api.search to api.search_tweets.
keyword = '#Flood'
public_tweets = api.search(q=keyword, lang='en', count=100)

# Countries and territories used to filter the tweets
asian_counties = ['cambodia', 'india', 'indonesia', 'malaysia', 'nepal', 'philippines', 'singapore', 'srilanka',
                  'thailand', 'vietnam', 'myanmar', 'bangladesh', 'japan', 'china', 'bhutan', 'korea', 'australia',
                  'taiwan', 'kazakhstan', 'pakistan', 'cook islands', 'fiji', 'vanuatu', 'kiribati', 'micronesia',
                  'nauru', 'niue', 'samoa', 'solomon', 'tonga', 'tuvalu', 'andaman', 'nicobar', 'papua']

# Keep each tweet that mentions at least one listed name (once, even if it names several)
listoftweets = []
time_stamp = []
for tweet in public_tweets:
    text = tweet.text.lower()
    if any(country in text for country in asian_counties):
        listoftweets.append(tweet.text)
        time_stamp.append(tweet.created_at)

# Store the data in a DataFrame
df = pd.DataFrame({'Text': listoftweets, 'Time_stamp': time_stamp})
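
The substring test in the filter also fires on longer words that merely contain a country name, for example 'india' inside 'indianapolis'. One possible refinement, assuming whole-word matches are what is wanted, is a single compiled regular expression with word boundaries. `COUNTRY_RE`, `mentions_country` and the shortened country list are illustrative names, not part of the original code.

```python
import re

countries = ['india', 'japan', 'nepal', 'cook islands', 'solomon']  # shortened list for illustration

# One alternation wrapped in word boundaries; re.escape guards names containing spaces
COUNTRY_RE = re.compile(
    r'\b(?:' + '|'.join(re.escape(c) for c in countries) + r')\b',
    re.IGNORECASE,
)

def mentions_country(text):
    """True when the text contains any listed name as a whole word."""
    return COUNTRY_RE.search(text) is not None

print(mentions_country('Severe #Flood in Japan after heavy rain'))
print(mentions_country('Flood warning for Indianapolis'))
```

Compiling the pattern once also avoids looping over the whole country list for every tweet.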

EXTRACT GPE, LOC AND FAC
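
The extraction code for this step is not shown in the slides, so what follows is only a sketch of how it might look. It assumes spaCy with the `en_core_web_sm` model as the named entity recogniser (the slides do not say which library was used) and builds a dictionary from place name to tweet timestamp, the shape that `dict1` has in the geocoding step. `load_model`, `spacy_entities` and `collect_places` are hypothetical helper names, and keeping the timestamp of the first tweet that names a place is also an assumption.

```python
from functools import lru_cache

PLACE_LABELS = {"GPE", "LOC", "FAC"}  # countries/cities/states, other locations, facilities

@lru_cache(maxsize=1)
def load_model():
    """Load the spaCy pipeline once; assumes en_core_web_sm is installed."""
    import spacy  # imported lazily so the rest of the sketch runs without spaCy
    return spacy.load("en_core_web_sm")

def spacy_entities(text):
    """Return (entity text, entity label) pairs for one tweet."""
    return [(ent.text, ent.label_) for ent in load_model()(text).ents]

def collect_places(texts, timestamps, ner=spacy_entities):
    """Map each place name to the timestamp of the first tweet that names it."""
    places = {}
    for text, created_at in zip(texts, timestamps):
        for name, label in ner(text):
            if label in PLACE_LABELS:
                places.setdefault(name, created_at)
    return places
```

With the DataFrame from the fetch step, the call would be `dict1 = collect_places(df['Text'], df['Time_stamp'])`. The `ner` argument can be replaced by any function that returns (text, label) pairs, which makes it possible to try the logic without downloading a model.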

USE GEOCODER

import geocoder

# Geocode every extracted place name; dict1 maps place name -> tweet timestamp
lat = []
long = []
time_stamp = []
names = []
for name, created_at in dict1.items():
    g = geocoder.google(name)  # Google geocoding requires an API key
    # Skip failed lookups and any result whose latitude is exactly 35.86166
    if g.latlng and g.latlng[0] != 35.86166:
        lat.append(g.latlng[0])
        long.append(g.latlng[1])
        time_stamp.append(created_at)
        names.append(name)

PLOT ON MAP

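import folium
from folium.plugins import MarkerCluster

# Base map centred on South-East Asia
t = folium.Map(location=[11.88, 124], zoom_start=4)

# Cluster nearby markers; one marker per geocoded place
marker_cluster = MarkerCluster().add_to(t)
for la, lo, name, created_at in zip(lat, long, names, time_stamp):
    folium.Marker(
        [la, lo],
        popup='<b>Flood : %s<br> Created on: %s</b>' % (name, created_at),
        icon=folium.Icon(color='blue', icon='info-sign'),
    ).add_to(marker_cluster)

t.add_child(folium.LatLngPopup())  # click anywhere on the map to show latitude/longitude
# 'Mapbox Control Room' is a built-in tile name in older folium releases;
# newer releases may need a different tile set here
folium.TileLayer('Mapbox Control Room').add_to(t)
folium.LayerControl().add_to(t)

t.save('Final_Time_Map.html')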

OUTPUT 

TESTING

FLOODS

No | Disaster | Location | Status | Source
1 | Flood | Japan - Kamo River, Hiroshima, Kyoto, Fukuoka, Okayama, Moyotama, Mabi town, Kurashiki, Nagasaki, Kyushu island | Detected | Floodlist
2 | Flood | Pakistan - Lahore | Detected | Floodlist
3 | Flood | Nepal | Not Detected | Floodlist
4 | Flood | India - Jammu and Kashmir, Karimganj, Srinagar | Detected | Floodlist

EARTHQUAKES

No | Disaster | Location | Status | Source
1 | Earthquake | Japan - Chiba, Tokyo, Fukushima | Detected | USGS
2 | Earthquake | Indonesia - Sumatra | Detected | USGS
3 | Earthquake | Japan - Osaka | False Detected | -
4 | Earthquake | India - Rajasthan | Detected | USGS
5 | Earthquake | Australia - Adelaide SA | Detected | USGS
6 | Earthquake | Taiwan - Taitung County | Detected | USGS

LANDSLIDES

No | Disaster | Location | Status | Source
1 | Landslide | India - Tamenglong district, Manipur, Jammu, Baltal Route, Jammu and Kashmir | Detected | Google
2 | Landslide | Japan - Hiroshima, Kurashiki, Kyushu | Detected | Google
3 | Landslide | China - Beichuan Qiang Autonomous County | Detected | Google
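
The test results can be summarised with a short tally. The status values below are transcribed row by row from the three tables; the variable names are illustrative.

```python
from collections import Counter

# Status column of each test table, in row order
results = {
    "Flood": ["Detected", "Detected", "Not Detected", "Detected"],
    "Earthquake": ["Detected", "Detected", "False Detected",
                   "Detected", "Detected", "Detected"],
    "Landslide": ["Detected", "Detected", "Detected"],
}

# Count how often each status occurs per disaster type
summary = {disaster: Counter(statuses) for disaster, statuses in results.items()}

for disaster, counts in summary.items():
    print(disaster, dict(counts))
```

Counting per status keeps the missed flood in Nepal and the false earthquake detection in Osaka visible as separate kinds of error.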

QUESTIONS?

thisisdevanshi

PyData

By Devanshi Verma
