name | facebook |
---|---|
United Way Worldwide | UnitedWay |
Task Force for Global Health | TheTaskForceforGlobalHealth |
Feeding America | FeedingAmerica |
Salvation Army | SalvationArmyUSA |
YMCA of the USA | YMCA |
St. Jude Children’s Hospital | stjude |
Food for the Poor | FoodForThePoor |
Boys & Girls Club of America | bgca.clubs |
Catholic Charities USA | catholiccharitiesusa |
import facebook


class FacebookClient:
    """Simple class to get basic information on Facebook Pages."""

    def __init__(self, access_token):
        """Initialize GraphAPI object."""
        self.graph = facebook.GraphAPI(access_token=access_token,
                                       version='2.7')

    def get_page_fan_count(self, page_id):
        """Return number of fans for the given page."""
        page = self.graph.get_object(id=page_id, fields='fan_count')
        return page['fan_count']

    def get_page_about(self, page_id):
        """Return some information about the given page."""
        page = self.graph.get_object(id=page_id, fields='about')
        return page['about']
This is not good code!
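One thing worth noting about the client above: each method issues its own request, so fetching both fields costs two round trips per page. The sketch below is one possible variation, not part of the original client. It assumes the Graph API accepts a comma-separated `fields` value; `get_page_summary` and `FakeGraph` are hypothetical names, and `FakeGraph` only stands in for `facebook.GraphAPI` so the example runs offline with made-up sample values.

```python
class FakeGraph:
    """Hypothetical offline stand-in for facebook.GraphAPI (illustration only)."""

    def __init__(self):
        self.calls = 0  # counts how many requests were made

    def get_object(self, id, **args):
        self.calls += 1
        # Sample values only; a real call would hit the network.
        page = {'id': id, 'fan_count': 212501,
                'about': 'To live better, we must Live United.'}
        requested = args.get('fields', '').split(',')
        return {key: page[key] for key in ['id'] + requested if key in page}


def get_page_summary(graph, page_id):
    """Return (fan_count, about) for a page using a single request."""
    page = graph.get_object(id=page_id, fields='fan_count,about')
    # Assumption: 'about' may be missing from the response, so use .get().
    return page['fan_count'], page.get('about')


graph = FakeGraph()
fan_count, about = get_page_summary(graph, 'UnitedWay')
print(fan_count, about, graph.calls)
```

With a real client, `graph` would be the `facebook.GraphAPI` instance stored on `FacebookClient`.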
import os
import pandas as pd
from dotenv import load_dotenv
from facebook_client import FacebookClient
load_dotenv()
fb = FacebookClient(access_token=os.getenv('FACEBOOK_ACCESS_TOKEN'))
nonprofit_df = pd.read_csv('nonprofit_facebook.csv')
nonprofit_df['fan_count'] = nonprofit_df['facebook'].map(fb.get_page_fan_count)
nonprofit_df['about'] = nonprofit_df['facebook'].map(fb.get_page_about)
nonprofit_df.to_csv('output.csv', index=False)
(venv) Michaels-MacBook-Pro:parallel-python-tutorial michael$ python main.py
name | facebook | fan_count | about |
---|---|---|---|
United Way Worldwide | UnitedWay | 212501 | To live better, we must Live United. |
Task Force for Global Health | TheTaskForceforGlobalHealth | 1504 | The Task Force for Global Health provides all people with opportunities to lead healthy, productive lives. |
Feeding America | FeedingAmerica | 603864 | Our mission is to feed America's hungry through a nationwide network of member food banks and engage our country in the fight to end hunger. You can help. |
Salvation Army | SalvationArmyUSA | 307914 | The Salvation Army is committed to doing the most good for the most people in the most need. The nation's largest faith-based charity, The Salvation Army serves 30 million people each year through a broad array of social services. |
YMCA of the USA | YMCA | 351368 | The Y: We're for youth development, healthy living and social responsibility. |
St. Jude Children’s Hospital | stjude | 2216537 | Welcome to the St. Jude Children’s Research Hospital Facebook page. Before you post, please review our posting policy located on the About page. |
Food for the Poor | FoodForThePoor | 370988 | Food For The Poor feeds millions of hungry people throughout the countries we serve. www.foodforthepoor.org |
Boys & Girls Club of America | bgca.clubs | 204625 | Great Futures Start Here! |
Catholic Charities USA | catholiccharitiesusa | 95090 | Working to reduce poverty in America for over 100 years. |
import os
import time
import pandas as pd
from dotenv import load_dotenv
from facebook_client import FacebookClient
load_dotenv()
fb = FacebookClient(access_token=os.getenv('FACEBOOK_ACCESS_TOKEN'))
nonprofit_df = pd.read_csv('nonprofit_facebook.csv')
t0 = time.perf_counter()
nonprofit_df['facebook'].map(fb.get_page_fan_count)
nonprofit_df['facebook'].map(fb.get_page_about)
t1 = time.perf_counter()
print("Sequential map time elapsed: {time} seconds.".format(time=t1 - t0))
t0 = time.perf_counter()
nonprofit_df['facebook'].map(lambda x: x + 'lol')
nonprofit_df['facebook'].map(lambda x: 500)
t1 = time.perf_counter()
print("Non-network map time elapsed: {time} seconds.".format(time=t1 - t0))
(venv) Michaels-MacBook-Pro:parallel-python-tutorial michael$ python timing.py
Sequential map time elapsed: 10.18712910101749 seconds.
Non-network map time elapsed: 0.0004884299705736339 seconds.
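To put the two timings in perspective, here is a small back-of-the-envelope calculation. It assumes the 9 pages from the CSV and one request per `map` call per page (18 requests in total); the variable names are illustrative.

```python
# Timings copied from the output above.
sequential_seconds = 10.18712910101749
non_network_seconds = 0.0004884299705736339

n_pages = 9          # rows in nonprofit_facebook.csv
calls_per_page = 2   # one fan_count request and one about request
n_calls = n_pages * calls_per_page

# Average cost of one call in each case.
per_request = sequential_seconds / n_calls
per_trivial = non_network_seconds / n_calls

# How many times slower the network-bound maps were.
ratio = sequential_seconds / non_network_seconds

print("Average per network call: {:.3f} seconds".format(per_request))
print("Average per trivial call: {:.7f} seconds".format(per_trivial))
print("Network maps were about {:,.0f}x slower".format(ratio))
```

Almost all of the elapsed time is spent waiting on the network rather than computing, which is why running the requests concurrently can help.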
from multiprocessing import Pool
t0 = time.perf_counter()
with Pool(processes=4) as pool:
    pool.map(fb.get_page_fan_count, nonprofit_df['facebook'])
    pool.map(fb.get_page_about, nonprofit_df['facebook'])
t1 = time.perf_counter()
print("Multiprocessing pool time elapsed: {time} seconds.".format(time=t1 - t0))
(venv) Michaels-MacBook-Pro:parallel-python-tutorial michael$ python timing.py
Multiprocessing pool time elapsed: 3.7498577430378646 seconds.
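The same effect can be reproduced without a Facebook token. The sketch below is a self-contained approximation, not the tutorial's code: `fake_fetch` is a hypothetical stand-in that sleeps instead of calling the API, and it swaps `Pool` for `multiprocessing.pool.ThreadPool`, the thread-backed class with the same `map` interface, so it runs without a `__main__` guard.

```python
import time
from multiprocessing.pool import ThreadPool


def fake_fetch(page_id):
    """Hypothetical stand-in for a network call: wait, then return a value."""
    time.sleep(0.05)  # simulated network latency
    return len(page_id)


page_ids = ['UnitedWay', 'TheTaskForceforGlobalHealth', 'FeedingAmerica',
            'SalvationArmyUSA', 'YMCA', 'stjude', 'FoodForThePoor',
            'bgca.clubs', 'catholiccharitiesusa']

# Sequential: each call waits for the previous one to finish.
t0 = time.perf_counter()
sequential = [fake_fetch(p) for p in page_ids]
sequential_time = time.perf_counter() - t0

# Pooled: up to four calls wait at the same time.
t0 = time.perf_counter()
with ThreadPool(processes=4) as pool:
    pooled = pool.map(fake_fetch, page_ids)
pooled_time = time.perf_counter() - t0

print("Sequential: {:.2f} seconds".format(sequential_time))
print("Pooled:     {:.2f} seconds".format(pooled_time))
```

Both versions return the same values in the same order; only the waiting overlaps.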
Dask is a flexible parallel computing library
for analytic computing.
import dask.dataframe as dd
nonprofit_df_dask = dd.read_csv('nonprofit_facebook.csv')
t0 = time.perf_counter()
nonprofit_df_dask['facebook'].map(fb.get_page_fan_count,
                                  meta=('fan_count', int)).compute()
nonprofit_df_dask['facebook'].map(fb.get_page_about,
                                  meta=('about', str)).compute()
t1 = time.perf_counter()
print("Dask time elapsed: {time} seconds.".format(time=t1 - t0))
(venv) Michaels-MacBook-Pro:parallel-python-tutorial michael$ python timing.py
Dask time elapsed: 10.75200344400946 seconds.
import dask.dataframe as dd
nonprofit_df_dask = dd.read_csv('nonprofit_facebook.csv', blocksize=200)
t0 = time.perf_counter()
nonprofit_df_dask['facebook'].map(fb.get_page_fan_count,
                                  meta=('fan_count', int)).compute()
nonprofit_df_dask['facebook'].map(fb.get_page_about,
                                  meta=('about', str)).compute()
t1 = time.perf_counter()
print("Dask (fixed) time elapsed: {time} seconds.".format(time=t1 - t0))
(venv) Michaels-MacBook-Pro:parallel-python-tutorial michael$ python timing.py
Sequential map time elapsed: 10.18712910101749 seconds.
Non-network map time elapsed: 0.0004884299705736339 seconds.
(venv) Michaels-MacBook-Pro:parallel-python-tutorial michael$ python timing.py
Multiprocessing pool time elapsed: 3.7498577430378646 seconds.
(venv) Michaels-MacBook-Pro:parallel-python-tutorial michael$ python timing.py
Dask time elapsed: 10.75200344400946 seconds.
(venv) Michaels-MacBook-Pro:parallel-python-tutorial michael$ python timing.py
Dask (fixed) time elapsed: 3.777425266976934 seconds.
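A plausible reading of these numbers is that the only change, `blocksize=200`, caused the small CSV to be split into more than one partition, giving Dask separate pieces to work on in parallel. The sketch below illustrates that idea with the standard library only. It is an analogy, not Dask's implementation: `fake_fetch`, `split_into_partitions`, and `partitioned_map` are hypothetical names, and the sleep stands in for the network call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(page_id):
    """Hypothetical stand-in for a network call."""
    time.sleep(0.05)  # simulated network latency
    return page_id.upper()


def split_into_partitions(items, n_partitions):
    """Split items into n_partitions contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), n_partitions)
    chunks, start = [], 0
    for i in range(n_partitions):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def map_partition(chunk):
    """Process one partition sequentially, row by row."""
    return [fake_fetch(item) for item in chunk]


def partitioned_map(items, n_partitions):
    """Run partitions in parallel; rows inside a partition stay sequential."""
    chunks = split_into_partitions(items, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as executor:
        results = executor.map(map_partition, chunks)
    return [value for chunk in results for value in chunk]


def timed(items, n_partitions):
    t0 = time.perf_counter()
    out = partitioned_map(items, n_partitions)
    return out, time.perf_counter() - t0


page_ids = ['UnitedWay', 'TheTaskForceforGlobalHealth', 'FeedingAmerica',
            'SalvationArmyUSA', 'YMCA', 'stjude', 'FoodForThePoor',
            'bgca.clubs', 'catholiccharitiesusa']

one_result, one_time = timed(page_ids, 1)        # a single partition
three_result, three_time = timed(page_ids, 3)    # three partitions

print("1 partition:  {:.2f} seconds".format(one_time))
print("3 partitions: {:.2f} seconds".format(three_time))
```

With a single partition there is nothing to run side by side, so the elapsed time matches a plain sequential loop; more partitions shorten the wait without changing the results.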
import os
import pandas as pd
from dotenv import load_dotenv
from facebook_client import FacebookClient
load_dotenv()
fb = FacebookClient(access_token=os.getenv('FACEBOOK_ACCESS_TOKEN'))
nonprofit_df = pd.read_csv('nonprofit_facebook.csv')
nonprofit_df['fan_count'] = nonprofit_df['facebook'].map(fb.get_page_fan_count)
nonprofit_df['about'] = nonprofit_df['facebook'].map(fb.get_page_about)
nonprofit_df.to_csv('output.csv', index=False)
Make Pandas faster
by replacing one line of your code.