Introduction to Python for Data Science
Data structures in Python
- Lists -- a mutable, ordered sequence of items
- Tuples -- an immutable, ordered sequence of items
- Dictionaries -- an unordered mapping from keys to values
- Sets -- an unordered collection of unique elements
Tuples
Tuples are immutable and are written with parentheses. Their elements can be of mixed types.
t = ("banana", 3.1, False, 4)
t[1]
3.1
Slicing a tuple returns a new tuple.
t[2:]
(False, 4)
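A short sketch of two things the slide only states: a tuple can be unpacked into separate names in one assignment, and because tuples are immutable, item assignment fails. The names name, price, in_stock, count and mutated are illustrative, and the snippet avoids print statements so it runs under both Python 2.7 and Python 3.

```python
t = ("banana", 3.1, False, 4)

# Unpack the four elements into four names in one assignment.
name, price, in_stock, count = t

# Tuples are immutable: item assignment raises TypeError.
try:
    t[0] = "apple"
    mutated = True
except TypeError:
    mutated = False
```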
Sets
A set is an unordered collection with no duplicate elements.
One way to create a set is from a list, which drops the duplicates. Sets allow fast membership testing with the in operator.
basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
fruit = set(basket)
fruit
set(['orange', 'pear', 'apple', 'banana'])
'orange' in fruit
True
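Beyond membership tests, sets support set algebra through operators. A minimal sketch, where citrus is a hypothetical second set and the result names are illustrative:

```python
fruit = set(['orange', 'pear', 'apple', 'banana'])
citrus = set(['orange', 'lemon', 'lime'])  # hypothetical second set

both = fruit & citrus        # intersection: elements in both sets
either = fruit | citrus      # union: elements in either set
only_fruit = fruit - citrus  # difference: in fruit but not in citrus
```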
Lists
Create a list.
l = [4,5,6]
l
[4, 5, 6]
Slice a list from index 1 to the end.
l[1:]
[5, 6]
Operations that can be performed on lists.
Append
l.append("peach")
l
[4, 5, 6, 'peach']
l.append((4.5,1))
l
[4, 5, 6, 'peach', (4.5, 1)]
Check length.
len(l)
5
Remove the element at index 2 (the third element) with pop(), which also returns it.
l.pop(2)
6
l
[4, 5, 'peach', (4.5, 1)]
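Sort the list in place. Python 2 orders unlike types by type (numbers first, then, here, the string before the tuple); Python 3 raises a TypeError on this mixed list.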
l.sort()
l
[4, 5, 'peach', (4.5, 1)]
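Sorting is more predictable on a list of a single type, and both list.sort() and sorted() accept key and reverse arguments. A minimal sketch with an illustrative list of words:

```python
words = ['peach', 'Fig', 'banana', 'kiwi']

# sorted() returns a new list ordered by word length; words is unchanged.
by_length = sorted(words, key=len)

# list.sort() sorts in place; key=str.lower ignores case, reverse=True flips the order.
words.sort(key=str.lower, reverse=True)
```

The difference to remember is that sorted() works on any iterable and leaves its input alone, while list.sort() modifies the list and returns None.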
Find the index of the first matching element.
l.index('peach')
2
Insert an element at index 3, shifting later elements to the right.
l.insert(3, "oranges")
l
[4, 5, 'peach', 'oranges', (4.5, 1)]
We can also build lists. Start with an empty one.
elements = []
for i in range(0, 6):
    print "Adding %d to the list." % i
    elements.append(i)
Adding 0 to the list.
Adding 1 to the list.
Adding 2 to the list.
Adding 3 to the list.
Adding 4 to the list.
Adding 5 to the list.
Iterating through lists.
words = ['tennis', 'court', 'match']
for x in words:
    print x, len(x)
tennis 6
court 5
match 5
Using enumerate() for iteration. It yields both the position index and the value.
for i, x in enumerate(['tennis', 'court', 'match']):
    print i, x, len(x)
0 tennis 6
1 court 5
2 match 5
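enumerate() counts from 0 by default; its start argument changes the first number, which is handy for human-readable numbering. A small sketch, with numbered as an illustrative name:

```python
words = ['tennis', 'court', 'match']

# start=1 numbers the items from 1 instead of 0.
numbered = list(enumerate(words, start=1))
```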
Use range(start, stop, step) to generate integers. The stop value is not included, and in Python 2 the result is a list.
range(5, 10)
[5, 6, 7, 8, 9]
range(0, 10, 3)
[0, 3, 6, 9]
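range() also accepts a negative step for counting down. In Python 3, range() returns a lazy range object rather than a list, so this sketch wraps it in list() to get the values (which also works in Python 2). The variable names are illustrative.

```python
# A negative step counts down; the stop value 0 is still excluded.
countdown = list(range(10, 0, -3))

# Even numbers below 10.
evens = list(range(0, 10, 2))
```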
Dictionaries
There are several ways to create a dictionary.
Write key-value pairs inside braces.
d = {"color": "Blue",
     "sport": "tennis"}
d
{'color': 'Blue', 'sport': 'tennis'}
Pass keyword arguments to dict().
d = dict(number=1, house="ranch", flower='rose')
d
{'flower': 'rose', 'house': 'ranch', 'number': 1}
Create an empty dictionary and add values.
d = {} # empty dictionary
d['number'] = 1
d['house'] = 'ranch'
d['flower'] = 'rose'
d
{'flower': 'rose', 'house': 'ranch', 'number': 1}
Add to a dictionary.
d = {'positano': 4098, 'paris': 4139}
d['rome'] = 4127 #add to dictionary
d
{'paris': 4139, 'positano': 4098, 'rome': 4127}
Look up a value by its key.
d['positano']
4098
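Looking up a key that is not present raises KeyError. A minimal sketch of the usual alternatives, using the same dictionary; 'milan' is just an example of a missing key and the result names are illustrative:

```python
d = {'positano': 4098, 'paris': 4139}
d['rome'] = 4127

# d['milan'] would raise KeyError; get() returns the supplied default instead.
milan = d.get('milan', 0)

# The in operator tests keys, not values.
has_rome = 'rome' in d

# del removes a key together with its value.
del d['paris']
```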
Obtain the keys.
d.keys()
['paris', 'positano', 'rome']
Build a dict from a list of (key, value) tuples.
dict([('paris', 4139), ('rome', 4127), ('positano', 4098)])
{'paris': 4139, 'positano': 4098, 'rome': 4127}
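When keys and values live in two parallel lists, zip() can produce the (key, value) tuples that dict() expects. A sketch, where cities and numbers are illustrative names for the same data:

```python
cities = ['paris', 'rome', 'positano']
numbers = [4139, 4127, 4098]

# zip() pairs the lists element by element; dict() turns each pair into a key/value entry.
d = dict(zip(cities, numbers))
```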
Create a dict with a dict comprehension.
{x: x**2 for x in (2, 4, 6)}
{2: 4, 4: 16, 6: 36}
Iterating through data structures
Comprehensions
List Comprehensions
squares = [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
The same list built with a list comprehension.
squares = [x**2 for x in range(10)]
squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
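List comprehensions can also filter with an if clause and combine several for clauses, which read left to right like nested loops. A minimal sketch; matrix and the result names are illustrative:

```python
# An if clause filters: keep only the squares of even numbers.
even_squares = [x**2 for x in range(10) if x % 2 == 0]

# Two for clauses act like nested loops; this flattens a list of lists.
matrix = [[1, 2, 3], [4, 5, 6]]
flat = [value for row in matrix for value in row]
```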
map() with an anonymous (lambda) function gives the same result. In Python 2, map() returns a list; in Python 3 it returns an iterator.
squares = map(lambda x: x**2, range(10)) #lambda functions
squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
Built-in functions that can be used with lists include filter, map, and reduce. filter() keeps the elements for which the function returns True.
def f(x):
    return x % 3 == 0 or x % 5 == 0
filter(f, range(2, 25))
[3, 5, 6, 9, 10, 12, 15, 18, 20, 21, 24]
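reduce() is mentioned but not shown. A minimal sketch: reduce() folds a two-argument function over a sequence from left to right. It is a built-in in Python 2 but must be imported from functools in Python 3 (that import also works in Python 2.6 and later). The last line restates the filter() call above as a list comprehension. Variable names are illustrative.

```python
from functools import reduce  # built-in in Python 2; lives in functools in Python 3

nums = range(1, 6)  # 1, 2, 3, 4, 5

# Computes ((((1 + 2) + 3) + 4) + 5).
total = reduce(lambda acc, x: acc + x, nums)

# The third argument is the starting value for the accumulator.
product = reduce(lambda acc, x: acc * x, nums, 1)

# Same elements as filter(f, range(2, 25)), written as a comprehension.
multiples = [x for x in range(2, 25) if x % 3 == 0 or x % 5 == 0]
```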
Set comprehensions
{x for x in 'hotairballoon' if x not in 'ola'}
{'b', 'h', 'i', 'n', 'r', 't'}
myset = set([1,2,3,4])
{x ** 2 for x in myset}
{1, 4, 9, 16}
Dictionary comprehensions
Build a dictionary.
{x: x**2 for x in (2, 4, 6)} #build a dictionary
{2: 4, 4: 16, 6: 36}
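Dict comprehensions can transform an existing dictionary as well as build one from scratch. A sketch using the city dictionary from earlier; by_value and large are illustrative names, and the inversion assumes the values are unique and hashable.

```python
d = {'positano': 4098, 'paris': 4139, 'rome': 4127}

# Swap keys and values; assumes every value is unique and hashable.
by_value = {v: k for k, v in d.items()}

# An if clause keeps only the entries that pass the test.
large = {k: v for k, v in d.items() if v > 4100}
```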
Iterating
Looping through a list.
for x in range(10):
    print x
0
1
2
3
4
5
6
7
8
9
Looping with enumerate returns position and value.
for i, v in enumerate(['lemon', 'peach', 'raspberry']):
    print i, v
0 lemon
1 peach
2 raspberry
Loop over two lists at the same time with zip().
questions = ['sport', 'flower', 'season']
answers = ['tennis', 'peony', 'summer']
for q, a in zip(questions, answers):
    print 'What is your {0}? It is {1}.'.format(q, a)
What is your sport? It is tennis.
What is your flower? It is peony.
What is your season? It is summer.
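Two details of zip() are worth knowing: it stops at the shortest input, and zip(*pairs) undoes a zip. A minimal sketch in which one answer is deliberately missing; pairs, qs and ans are illustrative names:

```python
questions = ['sport', 'flower', 'season']
answers = ['tennis', 'peony']  # one answer deliberately missing

# zip() stops at the shortest input, so only two pairs come out.
pairs = list(zip(questions, answers))

# zip(*pairs) reverses the operation, splitting the pairs back into two tuples.
qs, ans = zip(*pairs)
```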
Loop over a sequence in sorted order with sorted(); wrapping the list in set() first drops the duplicates.
vegetables = ['potato', 'corn', 'potato', 'pepper', 'corn', 'onions']
for f in sorted(set(vegetables)):
    print f
corn
onions
pepper
potato
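To loop backwards instead, reversed() walks a sequence from the end without changing it, and sorted() takes reverse=True for descending order. A sketch with illustrative result names:

```python
vegetables = ['potato', 'corn', 'potato', 'pepper', 'corn', 'onions']

# reversed() walks the list backwards without modifying it.
backwards = list(reversed(vegetables))

# reverse=True gives the unique items in descending order.
descending = sorted(set(vegetables), reverse=True)
```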
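Looping through a dictionary.
Get the key and value at the same time using iteritems() (Python 2 only; use items() in Python 3). The variable is named heroes rather than super to avoid shadowing the built-in super().
heroes = {'superman': 'man of steel', 'austinpowers': 'man of mystery'}
for k, v in heroes.iteritems():
    print k, v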
austinpowers man of mystery
superman man of steel
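Since iteritems() was removed in Python 3, items() is the portable choice. Dictionary order is arbitrary in Python 2 (Python 3.7+ keeps insertion order), so wrapping items() in sorted() gives a predictable order. A sketch, with lines as an illustrative name:

```python
heroes = {'superman': 'man of steel', 'austinpowers': 'man of mystery'}

lines = []
# sorted() on (key, value) pairs orders them by key.
for name, title in sorted(heroes.items()):
    lines.append('%s: %s' % (name, title))
```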
Strings
Strings can be indexed or sliced.
Return the character at position 0.
word = 'recommended'
word[0]
'r'
Return the last character.
word[-1]
'd'
Return characters from position 0 (included) to 2 (excluded).
word[0:2]
're'
Return characters from the beginning to position 2 (excluded).
word[:2]
're'
Return characters from position 4 (included) to the end.
word[4:]
'mmended'
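Slices also take negative indices and an optional third field, the step. Like tuples, strings are immutable. A minimal sketch on the same word; the result names are illustrative:

```python
word = 'recommended'

last_three = word[-3:]   # negative indices count from the end
every_other = word[::2]  # the third slice field sets the step
backwards = word[::-1]   # a step of -1 reverses the string

# Strings are immutable, so assigning to a position raises TypeError.
try:
    word[0] = 'R'
    changed = True
except TypeError:
    changed = False
```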
Convert value to string using str().
x = 10 * 3.25
s = 'The value of x is ' + str(x)
s
'The value of x is 32.5'
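For combining many pieces, str.join() is a common alternative to repeated +. It only accepts strings, so numbers still go through str() first. A sketch with illustrative names:

```python
parts = ['tennis', 'court', 'match']

# join() places the separator between the items.
sentence = ' '.join(parts)

# join() accepts only strings, so numbers are converted with str() first.
csv_line = ', '.join(str(n) for n in [1, 2.5, 3])
```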
String formatting
Use the str.format() method.
print 'The season is {} and the weather is "{}!"'.format('summer', 'sunny')
The season is summer and the weather is "sunny!"
Specify argument positions inside the braces with str.format().
print '{0} and {1}'.format('mac', 'cheese')
print '{1} and {0}'.format('mac', 'cheese')
mac and cheese
cheese and mac
Use a colon format specifier inside the braces; .2f means a float with two decimal places.
x = "{0:.2f}".format(10)
# Older %-style formatting
y = "%0.2f" % 10
x, y
('10.00', '10.00')
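The part after the colon can also set alignment, width, a thousands separator, or percentage output, and fields can be named instead of numbered. A sketch of a few options from the format specification mini-language (the comma option needs Python 2.7 or later); the values and names are illustrative:

```python
price = "{0:>8.2f}".format(3.14159)   # right-align in a field 8 characters wide
big = "{0:,}".format(1234567)         # comma as thousands separator
share = "{0:.1%}".format(0.25)        # multiply by 100 and add a percent sign
label = "{name}: {score:.1f}".format(name="match", score=9.456)  # named fields
```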
Learning Resources
- docs.python.org/2/tutorial/
- codecademy.com/en/tracks/python
- learnpython.org/
Thank you
Intro to Python for Data Science
By practicaldatascience