Introduction to Python for Data Science

 

Data structures in Python

  • Lists -- a mutable ordered array of data
  • Tuples -- an immutable ordered array of data
  • Dictionaries -- an unordered mapping from keys to values
  • Sets -- an unordered collection of unique elements

Tuples

Tuples are immutable and written with parentheses. They can hold values of mixed types. Index a tuple by position, starting at 0.

    t = ("banana", 3.1, False, 4)
    t[1]
3.1

Slicing a tuple.

    t[2:]
(False, 4)
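
To make the immutability claim concrete, here is a minimal sketch. The names `changed` and `t2` are illustrative and not from the slides, and the snippet is written to run under both Python 2.7 and Python 3.

```python
t = ("banana", 3.1, False, 4)

# Item assignment is not allowed on a tuple and raises TypeError.
try:
    t[0] = "apple"
    changed = True
except TypeError:
    changed = False

# "Changing" a tuple means building a new one from pieces of the old one.
t2 = ("apple",) + t[1:]

print(changed)  # False
print(t2)       # ('apple', 3.1, False, 4)
```

The original tuple `t` is left untouched; only the new name `t2` refers to the modified values.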

Sets

A set is an unordered collection with no duplicate elements.

One way to create a set is from a list. Sets support fast membership testing.

    basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
    fruit = set(basket)
    fruit
set(['orange', 'pear', 'apple', 'banana'])
    'orange' in fruit
True
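
As a rough sketch of the two ideas above (dropping duplicates and testing membership), the snippet below reuses the basket list. The names `unique_count`, `has_orange`, and `has_grape` are illustrative only.

```python
basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
fruit = set(basket)

# Duplicates collapse: six items in the list, four unique names in the set.
unique_count = len(fruit)

# A set looks items up by hash, so it does not scan every element.
has_orange = 'orange' in fruit
has_grape = 'grape' in fruit

print(unique_count)  # 4
print(has_orange)    # True
print(has_grape)     # False
```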

Lists

Create a list.

    l = [4,5,6]
    l
[4, 5, 6]

Index into a list.

    l[1:]
[5, 6]

Operations that can be performed on lists.

Append

    l.append("peach")
    l
[4, 5, 6, 'peach']
    l.append((4.5,1))
    l
[4, 5, 6, 'peach', (4.5, 1)]

Check length.

    len(l)
5

Pop the element at index 2 (the third element).

    l.pop(2)
    l
[4, 5, 'peach', (4.5, 1)]

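Sort the list in place. Python 2 can order mixed types (numbers first, then other types by type name); Python 3 raises a TypeError for this list.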

    l.sort()
    l
[4, 5, 'peach', (4.5, 1)]
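
For a list of a single type, sorting behaves the same in Python 2 and Python 3. The sketch below contrasts the in-place sort() method with the built-in sorted(). The list `numbers` is a made-up example, not the list `l` from the slides.

```python
numbers = [6, 4, 5]

# sorted() returns a new list and leaves the original untouched.
ordered = sorted(numbers)
unchanged = list(numbers)  # copy taken before the in-place sort

# list.sort() reorders the list in place and returns None.
result = numbers.sort()

print(ordered)    # [4, 5, 6]
print(unchanged)  # [6, 4, 5]
print(numbers)    # [4, 5, 6]
print(result)     # None
```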

Obtain the index of an element.

    l.index('peach')
2

Insert into the list.

    l.insert(3, "oranges")
    l
[4, 5, 'peach', 'oranges', (4.5, 1)]

We can also build lists. Start with an empty one.

    elements = []
    for i in range(0, 6):
        print "Adding %d to the list." % i
        elements.append(i)
Adding 0 to the list.
Adding 1 to the list.
Adding 2 to the list.
Adding 3 to the list.
Adding 4 to the list.
Adding 5 to the list.

Iterating through lists.

    words = ['tennis', 'court', 'match']
    for x in words:
        print x, len(x)
tennis 6
court 5
match 5

Using enumerate for iteration. This returns both the position index and the value.

    for i, x in enumerate(['tennis', 'court', 'match']):
        print i, x, len(x)
0 tennis 6
1 court 5
2 match 5
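
A small sketch of what enumerate produces, using its optional start argument to begin counting at 1. The name `numbered` is illustrative.

```python
words = ['tennis', 'court', 'match']

# enumerate() yields (index, value) pairs; start=1 begins counting at 1.
numbered = list(enumerate(words, start=1))

print(numbered)  # [(1, 'tennis'), (2, 'court'), (3, 'match')]
```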

Use range. Note that the endpoint is not included.

    range(5, 10)
[5, 6, 7, 8, 9]
    range(0, 10, 3)
[0, 3, 6, 9]
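
The list output above is what Python 2 prints. As a sketch that behaves the same under Python 2 and Python 3, wrapping range() in list() always gives a list. The variable names are illustrative.

```python
# list() is needed in Python 3, where range() is lazy, and is harmless
# in Python 2, where range() already returns a list.
five_to_nine = list(range(5, 10))
every_third = list(range(0, 10, 3))
countdown = list(range(10, 0, -2))  # a negative step counts down

print(five_to_nine)  # [5, 6, 7, 8, 9]
print(every_third)   # [0, 3, 6, 9]
print(countdown)     # [10, 8, 6, 4, 2]
```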

Dictionaries

There are several ways to create a dictionary.

Assign key value pairs.

    d = {"color": "Blue",
         "sport": "tennis"}
    d
{'color': 'Blue', 'sport': 'tennis'}

Use the dict() constructor with keyword arguments.

    d = dict(number = 1, house="ranch", flower='rose')
    d
{'flower': 'rose', 'house': 'ranch', 'number': 1}

Create an empty dictionary and add values.

    d = {}  # empty dictionary
    d['number'] = 1
    d['house'] = 'ranch'
    d['flower'] = 'rose'
    d
{'flower': 'rose', 'house': 'ranch', 'number': 1}

Add to a dictionary.

    d = {'positano': 4098, 'paris': 4139}
    d['rome'] = 4127  #add to dictionary
    d
{'paris': 4139, 'positano': 4098, 'rome': 4127}

Return a dict value.

    d['positano']
4098

Obtain the keys.

    d.keys()
['paris', 'positano', 'rome']

Build a dict from a list of tuples.

    dict([('paris', 4139), ('rome', 4127), ('positano', 4098)])
{'paris': 4139, 'positano': 4098, 'rome': 4127}

Use a dict comprehension to create a dict.

    {x: x**2 for x in (2, 4, 6)}
{2: 4, 4: 16, 6: 36}
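
The construction styles above all produce ordinary dictionaries. This sketch checks that three of them give equal results and shows the get() method, which returns a default instead of raising KeyError for a missing key. The variable names and the 'venice' key are made up for illustration.

```python
cities_literal = {'paris': 4139, 'rome': 4127, 'positano': 4098}
cities_pairs = dict([('paris', 4139), ('rome', 4127), ('positano', 4098)])
cities_kwargs = dict(paris=4139, rome=4127, positano=4098)

# Dictionaries compare equal when they hold the same key-value pairs.
same = cities_literal == cities_pairs == cities_kwargs

# d[key] raises KeyError for a missing key; get() returns a default instead.
missing = cities_literal.get('venice', 0)

print(same)     # True
print(missing)  # 0
```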

Iterating through data structures

Comprehensions

List Comprehensions

A list of squares written out by hand.

    squares = [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
    squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Build the same list with a list comprehension.

    squares = [x**2 for x in range(10)]
    squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
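
To show what the comprehension is shorthand for, this sketch builds the list with an explicit loop, with a comprehension, and with a comprehension that has a filter condition. The names `squares_loop`, `squares_comp`, and `even_squares` are illustrative.

```python
# Explicit loop version.
squares_loop = []
for x in range(10):
    squares_loop.append(x ** 2)

# Equivalent list comprehension.
squares_comp = [x ** 2 for x in range(10)]

# A condition at the end keeps only some items.
even_squares = [x ** 2 for x in range(10) if x % 2 == 0]

print(squares_loop == squares_comp)  # True
print(even_squares)                  # [0, 4, 16, 36, 64]
```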

Mapping an anonymous (lambda) function over the range gives the same result.

    squares = map(lambda x: x**2, range(10))  # lambda function
    squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Built-in functions that can be used with lists include filter, map, and reduce.

    def f(x): return x % 3 == 0 or x % 5 == 0
    filter(f, range(2, 25))
[3, 5, 6, 9, 10, 12, 15, 18, 20, 21, 24]
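
The text above mentions reduce, so here is a hedged sketch of all three functions side by side. It imports reduce from functools so that it runs under both Python 2.7 and Python 3, and wraps map() and filter() in list() for the same reason. The variable names are illustrative.

```python
from functools import reduce  # a builtin in Python 2, import needed in Python 3

numbers = list(range(1, 6))  # [1, 2, 3, 4, 5]

# reduce() folds the list into one value: ((((1 + 2) + 3) + 4) + 5)
total = reduce(lambda acc, x: acc + x, numbers)

# list() around map()/filter() gives a list on both Python 2 and Python 3.
doubled = list(map(lambda x: x * 2, numbers))
evens = list(filter(lambda x: x % 2 == 0, numbers))

print(total)    # 15
print(doubled)  # [2, 4, 6, 8, 10]
print(evens)    # [2, 4]
```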

Set comprehensions

    {x for x in 'hotairballoon' if x not in 'ola'}
{'b', 'h', 'i', 'n', 'r', 't'}
    myset = set([1,2,3,4])
    {x ** 2 for x in myset}
{1, 4, 9, 16}

Dictionary comprehensions

Build a dictionary.

    {x: x**2 for x in (2, 4, 6)}  # build a dictionary
{2: 4, 4: 16, 6: 36}

Iterating

Looping over a range with a for statement.

    squares = []  # start from an empty list
    for x in range(10):
        squares.append(x**2)
        print x
0
1
2
3
4
5
6
7
8
9

Looping with enumerate returns position and value.

    for i, v in enumerate(['lemon', 'peach', 'raspberry']):
        print i, v
0 lemon
1 peach
2 raspberry

Loop over two lists at the same time.

    questions = ['sport', 'flower', 'season']
    answers = ['tennis', 'peony', 'summer']
    for q, a in zip(questions, answers):
        print 'What is your {0}?  It is {1}.'.format(q, a)
What is your sport?  It is tennis.
What is your flower?  It is peony.
What is your season?  It is summer.
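
Since zip pairs items by position, the pairs can also be handed to dict() to build a lookup table. This is a sketch under that assumption; `profile` and `pairs` are illustrative names.

```python
questions = ['sport', 'flower', 'season']
answers = ['tennis', 'peony', 'summer']

# zip() pairs items by position; dict() turns the pairs into a mapping.
profile = dict(zip(questions, answers))

# list() shows the pairs themselves (zip() is lazy in Python 3).
pairs = list(zip(questions, answers))

print(profile['flower'])  # peony
print(pairs[0])           # ('sport', 'tennis')
```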

Loop over a sequence in sorted order.

    vegetables = ['potato', 'corn', 'potato', 'pepper', 'corn', 'onions']
    for f in sorted(set(vegetables)):
        print f
corn
onions
pepper
potato

Looping through a dictionary.

Get the key and value at the same time using iteritems() (Python 2 only; Python 3 uses items()).

    heroes = {'superman': 'man of steel', 'austinpowers': 'man of mystery'}
    for k, v in heroes.iteritems():
        print k, v
austinpowers man of mystery
superman man of steel
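
Because a dictionary is an unordered mapping, the loop order above is not guaranteed. A minimal sketch that runs under both Python 2.7 and Python 3 uses items() and sorts by key for a predictable order. The names `heroes` and `lines` are illustrative.

```python
heroes = {'superman': 'man of steel', 'austinpowers': 'man of mystery'}

# items() exists in both Python 2 and Python 3; sorted() orders the pairs by key.
lines = []
for name, title in sorted(heroes.items()):
    lines.append('{0}: {1}'.format(name, title))

print(lines[0])  # austinpowers: man of mystery
print(lines[1])  # superman: man of steel
```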

Strings

Strings can be indexed or sliced.

Return character in position 0.

    word = 'recommended'
    word[0]
'r'

Return last character.

    word[-1]
'd'

Return characters from position 0 (included) to 2 (excluded).

    word[0:2]
're'

Return characters from the beginning to position 2 (excluded).

    word[:2]
're'

Return characters from position 4 (included) to the end.

    word[4:]
'mmended'
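
The included/excluded rule means that two slices sharing a boundary never overlap. The sketch below checks this on the same word; `head`, `tail`, and `rejoined` are illustrative names.

```python
word = 'recommended'

head = word[:4]  # positions 0 to 3
tail = word[4:]  # position 4 to the end

# Because the end index is excluded, the two pieces rebuild the original.
rejoined = head + tail

print(head)              # reco
print(tail)              # mmended
print(rejoined == word)  # True
print(word[-3:])         # ded
```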

Convert value to string using str().

    x = 10 * 3.25
    s = 'The value of x is ' + str(x)
    s
'The value of x is 32.5'

String formatting

Use the str.format() method.

    print 'The season is {} and the weather is "{}!"'.format('summer', 'sunny')
The season is summer and the weather is "sunny!"

Specify position using the str.format() method.

    print '{0} and {1}'.format('mac', 'cheese')
    print '{1} and {0}'.format('mac', 'cheese')
mac and cheese
cheese and mac

Use a format specifier after the colon.

    x = "{0:.2f}".format(10)
    # Older way, with the % operator
    y = "%0.2f" % 10
    x,y
('10.00', '10.00')
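
A few more specifiers, as a sketch: the part after the colon can set precision and width, and fields can be named instead of numbered. The values and the names `price`, `item`, and `cost` are made up for illustration.

```python
price = 32.5

two_decimals = '{0:.2f}'.format(price)  # fixed-point, 2 decimal places
padded = '{0:8.2f}'.format(price)       # right-aligned in 8 characters
named = '{item} costs {cost:.1f}'.format(item='tea', cost=price)

print(two_decimals)  # 32.50
print(repr(padded))  # '   32.50'
print(named)         # tea costs 32.5
```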

Learning Resources

  • docs.python.org/2/tutorial/
  • codecademy.com/en/tracks/python
  • learnpython.org/

Thank you

Founder of Data Open Space

Data Scientist

@inspiredata

dataopenspace.github.io/studyGroup

(Mozilla Science)

Intro to Python for Data Science

By practicaldatascience
