Practical programming

Manipulating data

HW Review

A comment on comments


this_code_will_run = 10 + 20
# but_this_code_wont = 20
but_this_will = 200 / 20
'''
not = 10
a = 20
single = 30
line = 40
here = 50
will = 60
either = 70
'''
This is called a comment (the triple-quoted form is really an unused string, commonly used as a multi-line comment):
# a single-line comment
'''
A multi-line
comment.
'''

Lots of things


days_of_week = ['S', 'M', 'T', 'W', 'Th', 'F', 'Sa']

print(len(days_of_week))   # prints out "7"
print(days_of_week[3])     # prints out "W"
print(days_of_week[-1])    # prints out "Sa"
for day in days_of_week:
    if day == 'M':
        # Only for Mondays
        print('I hate Mondays')
    else:
        # Otherwise
        print('TGINM')
days_of_week.append('Sa')
# Now contains ['S', 'M', 'T', 'W', 'Th', 'F', 'Sa', 'Sa']

days_of_week.remove('M')
# Now contains ['S', 'T', 'W', 'Th', 'F', 'Sa', 'Sa']

Unique New York


webpage_views = ['mom', 'linda', 'mom', 'mom', 'joe', 'mom', 'elliott']
print(len(webpage_views))   # prints out "7"

snowflakes = set(webpage_views)
# contains {'elliott', 'joe', 'linda', 'mom'}
print(len(snowflakes))      # prints out "4"
for i in range(1000):
    snowflakes.add('mom')
    
# still contains {'elliott', 'joe', 'linda', 'mom'}
This is called a set, of the form:
unique_things = set(my_list)
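
As a small sketch of what a set is handy for, here are membership checks and a sorted view of the unique values. The visits list is made up for illustration and is not part of the original example.

```python
# Hypothetical visitor log, in the same spirit as webpage_views above
visits = ['mom', 'linda', 'mom', 'joe']

unique_visitors = set(visits)

# "in" checks whether a value is a member of the set
has_joe = 'joe' in unique_visitors
has_bob = 'bob' in unique_visitors

# Sets have no order, so sorted() is used to get an alphabetical list
ordered = sorted(unique_visitors)

print(has_joe)   # prints out "True"
print(has_bob)   # prints out "False"
print(ordered)   # prints out "['joe', 'linda', 'mom']"
```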

A Webster special


definitions = {
    'Python':       'A really fun programming language',
    'dictionary':   'A set of terms with their corresponding definitions',
    'Jeremy Lucas': 'A great ninja with many skillz'
}

print(len(definitions))      # prints out "3"
print(definitions['Python']) # prints out "A really fun programming language"
definitions['data'] = 'A set of qualitative or quantitative values'
print(len(definitions))   # prints out "4"
This is called a dictionary, of the form:
d = {
    'key1': value1,
    'key2': value2,
    ...
}
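
One common use of a dictionary is keeping a running count per key. The sketch below reuses the webpage_views list from the set slide; the view_counts name is an assumption introduced here, and this is only one of several ways to count.

```python
webpage_views = ['mom', 'linda', 'mom', 'mom', 'joe', 'mom', 'elliott']

view_counts = {}
for visitor in webpage_views:
    # .get() returns the current count, or 0 if the visitor is not a key yet
    view_counts[visitor] = view_counts.get(visitor, 0) + 1

print(view_counts['mom'])    # prints out "4"
print(view_counts['joe'])    # prints out "1"
print(len(view_counts))      # prints out "4"
```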

Skimming the dictionary


definitions = {
    'Python':       'A really fun programming language',
    'dictionary':   'A set of terms with their corresponding definitions',
    'Jeremy Lucas': 'A great ninja with many skillz'
}

for term, definition in definitions.items():
    print(term + ' = ' + definition)
    
'''
Python = A really fun programming language
dictionary = A set of terms with their corresponding definitions
Jeremy Lucas = A great ninja with many skillz
'''
To iterate over a dictionary, use the form:
for k, v in d.items():
    # Do stuff with each key and value
    # (in Python 2, d.iteritems() also works)

E I E I/O

  • "Input/Output"
  • The essence of digital communications
    • Keyboard input
    • Display output
    • print function
  • Working with "files"
    • Reading files
    • Writing files
    • In Unix systems, this could mean data streams, hardware devices, or even network sockets

Open for business


data_file = open('/tmp/worldcup.csv')

for line in data_file.readlines():
    # prints each line in the file
    print(line)

# make sure to free up your resources
data_file.close()
# let Python close the file for us when we're done
with open('/tmp/worldcup.csv') as data_file:
    for line in data_file.readlines():
        # prints each line in the file
        print(line)
        
    # data_file is still open
    
# now it's closed

Gooooaaaaaaal


data_file = open('/tmp/galaxycup.csv', 'w')
# write the outcomes of the intergalactic matches
data_file.write('3000-07-02,Mars,10,Jupiter,1000\n')
data_file.write('3000-07-07,Uranus,2,Neptune,3\n')
data_file.write('3000-07-10,Pluto,0,Earth,1\n')

# make sure to free up your resources
data_file.close()
data_file = open('/tmp/galaxycup.csv', 'a')
# write another match
data_file.write('3000-07-02,Venus,4,Mercury,2\n')

# make sure to free up your resources
data_file.close()
with open('/tmp/galaxycup.csv', 'a') as data_file:
    # write another match
    data_file.write('3000-07-02,Venus,4,Mercury,2\n')
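
To see how the 'w' and 'a' modes differ, a rough round trip might look like the sketch below. The scratch path built with tempfile is an assumption made so the example does not touch /tmp/galaxycup.csv.

```python
import os
import tempfile

# Hypothetical scratch file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'galaxycup_demo.csv')

# 'w' creates the file, or wipes it if it already exists
with open(path, 'w') as data_file:
    data_file.write('3000-07-02,Mars,10,Jupiter,1000\n')

# 'a' keeps what is there and adds to the end
with open(path, 'a') as data_file:
    data_file.write('3000-07-07,Uranus,2,Neptune,3\n')

# no mode means 'r', for reading
with open(path) as data_file:
    lines = data_file.readlines()

print(len(lines))   # prints out "2"
```

Opening the same file with 'w' a second time would have left only the last line written.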

Struck sure

  • Delimited formats
    • CSV (commas)
    • TSV (tabs)
  • Self-describing formats
    • JSON (JavaScript object notation)
    • YAML (YAML ain't markup language, originally "yet another markup language")
    • HTML (hypertext markup language)
    • XML (eXtensible markup language)
  • Binary formats
    • Excel spreadsheets
    • JPEG images
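
As a quick illustration of the first two families, here is a sketch that pulls fields out of a delimited line and out of a self-describing JSON string. The CSV line is borrowed from the galaxy cup slide; the JSON text and its keys are made up for this example.

```python
import json

# Delimited: the meaning of each position has to be known in advance
csv_line = '3000-07-10,Pluto,0,Earth,1'
fields = csv_line.split(',')
print(fields[1])         # prints out "Pluto"
print(fields[2])         # prints out "0" (still a string, not a number)

# Self-describing: each value travels with its name (hypothetical keys)
json_text = '{"team": "Earth", "goals": 1}'
record = json.loads(json_text)
print(record['team'])    # prints out "Earth"
print(record['goals'])   # prints out "1" (already a number)
```

Splitting on commas by hand breaks as soon as a value itself contains a comma, which is one reason to prefer a CSV library for real files.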

Imported goods


import csv
# now we can use the CSV library, yayyyyyy

import json
# now we can use the JSON library, yayyyyyy

from __future__ import division
# now we can use division from the future!!
Use an import to include extra functionality:
import my_module
# use my_module
from my_other_module import my_cool_thing
# use my_cool_thing
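
Both import forms can be tried with modules that ship with Python. The choice of math and os.path here is just for illustration.

```python
# Form 1: import the whole module, then reach in with a dot
import math
root = math.sqrt(16)
print(root)    # prints out "4.0"

# Form 2: import one thing by name and use it directly
from os.path import basename
name = basename('/tmp/worldcup.csv')
print(name)    # prints out "worldcup.csv"
```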

CSV please


import csv

with open('/tmp/worldcup.csv') as data_file:
    structured = csv.reader(data_file)
    for record in structured:
        # print out the first part of each record in the file (match date)
        print(record[0])
        
date
2014-06-12
2014-06-13
2014-06-17
2014-06-18
2014-06-23
2014-06-23
...
    

Headers to the rescue!


import csv

with open('/tmp/worldcup.csv') as data_file:
    structured = csv.DictReader(data_file)
    for record in structured:
        # print out the teams involved in each match
        print(record['team_1'] + ' vs. ' + record['team_2'])
        
Brazil vs. Croatia
Mexico vs. Cameroon
Brazil vs. Mexico
Cameroon vs. Croatia
Cameroon vs. Brazil
Croatia vs. Mexico
Spain vs. Netherlands
...

HW: GOOOOOOAAAAAAL


Write a program to calculate the total number of goals scored by each team during the 2014 World Cup (http://goo.gl/xmVDlu).

The scores should be output to a new CSV file with the following format:
team,total_goals
Brazil,114
Germany,103
United States,101
...
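
For the output side, one possible approach is csv.writer, sketched below under a few assumptions: Python 3 (for the newline='' argument), a made-up totals dictionary rather than real results, and a hypothetical scratch path.

```python
import csv
import os
import tempfile

# Made-up totals, not real World Cup results
totals = {'Mars': 10, 'Jupiter': 1000}

# Hypothetical output path in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'totals_demo.csv')

# newline='' is what the csv module recommends when opening files in Python 3
with open(path, 'w', newline='') as out_file:
    writer = csv.writer(out_file)
    writer.writerow(['team', 'total_goals'])    # header row
    for team in sorted(totals):                 # alphabetical by team name
        writer.writerow([team, totals[team]])

# Read the file back to check what was written
with open(path, newline='') as in_file:
    rows = list(csv.reader(in_file))

print(rows[0])   # prints out "['team', 'total_goals']"
print(rows[1])   # prints out "['Jupiter', '1000']"
```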

HW (Hints)


Python's built-in int() function converts a string to a whole number:

score_1 = '2'
score_2 = '4'

total = score_1 + score_2
# oops, this is "24"

total = int(score_1) + int(score_2)
# much better!
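
The same idea extends to a running total, and it helps to know what happens when the text is not a number. A minimal sketch, with made-up score strings:

```python
scores = ['2', '4', '0']    # made-up score strings

total = 0
for score in scores:
    total = total + int(score)
print(total)    # prints out "6"

# int() raises a ValueError when the string is not a whole number
bad = False
try:
    int('two')
except ValueError:
    bad = True
print(bad)      # prints out "True"
```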

Practical programming: Manipulating data

By Jeremy Lucas
