Reversing the Python Data Analysis Lens

Cam Davidson-Pilon

 

Cameron Davidson-Pilon

  • Author of Bayesian Methods for Hackers (print version just released)
  • Data lead at Shopify

   @cmrn_dp

What is this talk about?

2-space vs 4-space indents?

topic modelling Python source files

what is the most controversial 

Python answer on StackOverflow?

data mining requirements.txt files

is flat really better than nested?

follow along at 

bit.ly/pycon2015-cdp

Walmart and Basket Analysis

N products => about N**2 pairs

Plus I want some guarantee of significance 

co-occurrences in requirements.txt files

# after scraping thousands of repos, my dataset looks like this:

library1 = {"matplotlib", "numpy", "scipy", "lifelines"}
library2 = {"django", "psycopg2", "django-markdown", "lxml"}  # django and psycopg2
library3 = {"kombu", "django", "psycopg2"}                    # django and psycopg2
library4 = {"django", "naked", "jsmin", "jsbeautifier"}       # django only
...


| starting_with | ending_with      | probability     |
|---------------|------------------|-----------------|
| django        | psycopg2         |  0.411569638909 |
| django        | gunicorn         |  0.320191599116 |
| django        | dj-database-url  |  0.263448784083 |
| django        | six              |  0.245394252027 |
| django        | requests         |  0.243920412675 |
| django        | wheel            |  0.22402358143  |

\text{probability} = P(\, \cdot \mid \text{django present})
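
As a rough illustration of how a table like the one above could be computed, here is a minimal sketch. It assumes each scraped requirements.txt has already been parsed into a set of package names; the function name `conditional_cooccurrence` and the toy data are hypothetical, not taken from the talk's code.

```python
from collections import Counter


def conditional_cooccurrence(requirement_sets, given):
    """Estimate P(package present | `given` present) for every other package."""
    # Keep only the requirement sets that contain the conditioning package.
    containing = [s for s in requirement_sets if given in s]
    if not containing:
        return {}
    # Count how often each other package appears alongside it.
    counts = Counter(pkg for s in containing for pkg in s if pkg != given)
    return {pkg: n / len(containing) for pkg, n in counts.items()}


# Toy data mirroring the example sets shown earlier (illustrative only).
repos = [
    {"matplotlib", "numpy", "scipy", "lifelines"},
    {"django", "psycopg2", "django-markdown", "lxml"},
    {"kombu", "django", "psycopg2"},
    {"django", "naked", "jsmin", "jsbeautifier"},
]

probs = conditional_cooccurrence(repos, "django")
for pkg, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"django -> {pkg}: {p:.3f}")
```

Conditioning on one package at a time avoids enumerating all of the roughly N**2 pairs up front, although it gives no significance guarantee by itself.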

Pipp

 

pip install pipp

"Get annoying recommendations after pip"

Demo?

 

Based on these relationships, can we create a network of Python packages?

Island of Data Analysis

Island of Annoying Your Friends

Celery Peninsula

Django Ecosystem

see more at

bit.ly/pycon2015-cdp

The plural of anecdote is data!

*so long as you can quickly back it up with thousands of additional anecdotes.


stackoverflow.com/questions/19622133    1173
stackoverflow.com/questions/279237       887
stackoverflow.com/questions/5658622      320
stackoverflow.com/questions/22019341     134
stackoverflow.com/questions/35817        117
stackoverflow.com/questions/1769332       89
stackoverflow.com/questions/377017        86
stackoverflow.com/questions/1189781       73
stackoverflow.com/questions/4124220       70
stackoverflow.com/questions/701802        66

Top 10 Occurrences

Speaking of StackOverflow...

what is the most controversial Python answer on StackOverflow?

Querying StackOverflow Data

declare @VoteStats table (parentid int, id int, U float, D float) 

insert @VoteStats
SELECT 
  a.parentid,
  a.id,
  CAST(SUM(case when (VoteTypeID = 2) then 1. else 0. end) + 1. as float) as U,
  CAST(SUM(case when (VoteTypeID = 3) then 1. else 0. end) + 1. as float) as D
FROM Posts q
JOIN PostTags qt 
  ON qt.postid = q.ID
JOIN Tags T 
  ON T.Id = qt.TagId
JOIN Posts a 
  ON q.id = a.parentid
JOIN Votes 
  ON Votes.PostId = a.Id
WHERE TagName  = 'python'
   and a.PostTypeID = 2 -- these are answers
Group BY a.id, a.parentid

set nocount off

SELECT 
 TOP 100
 parentid,
 id,
 U, D,
 ABS(0.5 - U/(U+D) - 3.5*SQRT(U*D / ((U+D) * (U+D) * (U+D+1)))) + 
   ABS(0.5 - U/(U+D) + 3.5*SQRT(U*D / ((U+D) * (U+D) * (U+D+1)))) as Score
FROM @VoteStats 
ORDER BY Score 

| parentid | id      | U   | D  | Score             |
|----------|---------|-----|----|-------------------|
| 1641219  | 1641305 | 100 | 58 | 0.267581687129904 |
| 366980   | 367082  | 55  | 29 | 0.360985397926758 |
| 904928   | 904941  | 44  | 40 | 0.379197639329681 |
| 1641219  | 1945699 | 49  | 23 | 0.382002382488145 |
| 734368   | 734910  | 48  | 30 | 0.38315203605798  |
| 7479442  | 7479473 | 46  | 23 | 0.394405318873308 |
| 620367   | 620397  | 42  | 24 | 0.411383595098925 |
| 969285   | 969324  | 49  | 20 | 0.420289855072464 |
| 1566266  | 1566285 | 39  | 24 | 0.424918292799399 |
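
The Score expression in the query can be reproduced outside SQL. The sketch below mirrors it term by term: `U/(U+D)` and `SQRT(U*D / ((U+D)*(U+D)*(U+D+1)))` are the mean and standard deviation of a Beta(U, D) distribution, and the score adds up how far the two ends of the interval mean ± 3.5 standard deviations lie from 0.5. The function name `controversy_score` is an assumption; U and D are the already-smoothed counts (votes + 1) as they appear in the table.

```python
import math


def controversy_score(u, d, width=3.5):
    """Lower scores mean the up/down split is confidently close to 50/50."""
    total = u + d
    mean = u / total
    # Standard deviation of a Beta(u, d) distribution.
    sd = math.sqrt(u * d / (total * total * (total + 1)))
    return abs(0.5 - mean - width * sd) + abs(0.5 - mean + width * sd)


# First row of the results table: U = 100, D = 58.
print(controversy_score(100, 58))
```

Sorting by this score in ascending order favours answers that have many votes and a near-even split, rather than answers with only a handful of votes.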

http://stackoverflow.com/questions/<id>

see more at

bit.ly/pycon2015-cdp

Python Debates

 

Let's stop arguing on the internet and instead look at the data.

2-space vs 4-space indents?

Other?

8-spaces

wut?

Other?

1-space

wut?

what is the most popular testing library?

is flat better than nested?

Let's look at the "deepest import" we can find in a library. 

import os                       # 1
import java.utils as utils      # 2
from scipy.stats import beta    # 3

# maximum is 3 here

Call this number the maximum import depth
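
One way this number could be measured is with the standard-library `ast` module. This is a sketch under the counting rule implied by the example above (each dotted component counts once, and a `from ... import name` adds one for the imported name); the function name `max_import_depth` is hypothetical and the talk's actual implementation is not shown.

```python
import ast


def max_import_depth(source):
    """Return the deepest import found in a piece of Python source text."""
    deepest = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import a.b.c` counts the dotted components of each name.
            for alias in node.names:
                deepest = max(deepest, len(alias.name.split(".")))
        elif isinstance(node, ast.ImportFrom):
            # `from a.b import c` counts the module components plus the name.
            # Relative imports such as `from . import x` have no module.
            module_parts = node.module.split(".") if node.module else []
            deepest = max(deepest, len(module_parts) + 1)
    return deepest


example = """\
import os
import java.utils as utils
from scipy.stats import beta
"""
print(max_import_depth(example))  # 3
```

Because the source is only parsed and never executed, the imported modules do not need to be installed.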


from com.sun.org.apache.xerces.internal.impl.io import \
            MalformedByteSequenceException

wut?

Functional Python

 

15% of Python repos import some library that implements some sort of functional syntax.

That includes imports of functools, itertools, toolz and other common functional libraries.
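
A check like this could be approximated per file with the `ast` module. The sketch below is an assumption about the approach, not the talk's code: the set `FUNCTIONAL_MODULES` lists only the three libraries named above, and the function name `imports_functional_library` is hypothetical.

```python
import ast

# Only the libraries named on the slide; the full list used is not given.
FUNCTIONAL_MODULES = {"functools", "itertools", "toolz"}


def imports_functional_library(source):
    """Return True if the source imports any module in FUNCTIONAL_MODULES."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            roots = {alias.name.split(".")[0] for alias in node.names}
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute `from x.y import z`; relative imports are skipped.
            roots = {node.module.split(".")[0]}
        else:
            continue
        if roots & FUNCTIONAL_MODULES:
            return True
    return False


print(imports_functional_library("from itertools import chain\n"))
```

A repository would then count as "functional" if any of its files returns True.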

topic modelling .py files

Breakfast

Dessert

Cafe

topic 1

topic 2

topic 3

"This restaurant had great eggs, pancakes and coffee, and I loved their staff!"

80% topic 1

  0% topic 2

20% topic 3

"breakfast",

"eggs",

"tea", ...

"icing",

"apple",

"cake", ...

"coffee",

"cafe",

"baked", ...

"This restaurant had great eggs, pancakes and coffee, and I loved their staff!"

"We go here every Sunday, football and heroes - no further explanation necessary"

"It's a Hibachi restaurant, nothing special but it's good for groups, even without reservations, the entertainment will occupy the kids and the food is pretty good. I've been twice with my crew and we weren't…"

"This is probably one of the best sandwich spots around. They are what you imagine a dream sandwich to be."

"The bomb is THE BOMB! I did feel a little bloated from all the salt in the deli meats after I ate it, but it was well worth it.  We had it to share between two people and one half was more than enough for myself (along with potato chips of course)."

"I got the Nettie and it was the most glorious combination of fried chicken, bacon, and biscuit I've ever eaten"

"Welcoming atmosphere quaint restaurant  tucked away on s 2nd Street in Williamsburg brooklyn."

"First off, do not attempt to drive in this area, it took us only an hour to find parking. "

"Plated looked gorgeous, but sadly the food was mediocre at best. I was hugely disappointed and didn't know what the buzz was all about. "

"Great little restaurant, check in for a free beer! The decor is very bright with vibrant colors. "

Reviews

LDA

Topic 1: sauce, meal, meat, salad, side

Topic 2: egg, breakfast, bacon, juice, fruit

Topic 3: owner, year, family, business, company

...

...

let's apply this to Python files

from cryptography.fernet import Fernet

from confidant import app
from confidant import log


class CipherManager:
    '''
    Class for encrypting and decrypting strings.
    cipher = CipherManager(key)
    encrypted_text = cipher.encrypt('hello world')
    decrypted_text = cipher.decrypt(encrypted_text)
    '''
    def __init__(self, key, version=2):
        self.key = key
        self.version = version

import boto3
import logging
import redis

from flask import Flask
from flask.ext.session import Session
from flask_sslify import SSLify
from confidant import lru
from confidant import settings

"LRU cache"
import collections


class LRUCache(object):
    """
    Cheap LRU cache implementation.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = collections.OrderedDict()

    def __contains__(self, key):
        return key in self.cache

    def __getitem__(self, key):
        value = self.cache.pop(key)
        self.cache[key] = value
        return value

    def __setitem__(self, key, value):
        try:
            self.cache.pop(key)
        except KeyError:
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)
        self.cache[key] = value

def get_key_arn(key_alias):
    if key_alias not in KEY_METADATA:
        KEY_METADATA[key_alias] = kms.describe_key(
            KeyId='alias/{0}'.format(key_alias)
        )
    return KEY_METADATA[key_alias]['KeyMetadata']['Arn']


def get_key_id(key_alias):
    if key_alias not in KEY_METADATA:
        KEY_METADATA[key_alias] = kms.describe_key(
            KeyId='alias/{0}'.format(key_alias)
        )
    return KEY_METADATA[key_alias]['KeyMetadata']['KeyId']
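
Before a topic model such as LDA can run on files like these, each file has to be reduced to a bag of words. The talk does not show that step, so the following is only a sketch of one possible approach using the standard-library `tokenize` module. The function name `bag_of_words`, and the choice to keep identifiers, strings, and comments while dropping Python keywords, are assumptions.

```python
import io
import keyword
import re
import tokenize
from collections import Counter

WORD = re.compile(r"[A-Za-z]+")


def bag_of_words(source):
    """Count lowercase alphabetic words in identifiers, strings and comments."""
    words = Counter()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            if keyword.iskeyword(tok.string):
                continue  # drop `import`, `class`, `def`, ...
        elif tok.type not in (tokenize.STRING, tokenize.COMMENT):
            continue  # skip operators, numbers, whitespace tokens
        # Splitting on non-letters also breaks snake_case names apart.
        # String prefixes such as r or b are not stripped in this sketch.
        words.update(w.lower() for w in WORD.findall(tok.string))
    return words


sample = (
    "import collections\n"
    "\n"
    "class LRUCache(object):\n"
    "    def __init__(self, capacity):\n"
    "        self.capacity = capacity\n"
)
print(bag_of_words(sample).most_common(3))
```

The resulting per-file word counts are the kind of input a topic-modelling library expects, with one document per .py file.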

LDA

???

python, version, package, author, setup, description, language, copyright, packages, license

setup.py topic?

unittest topic?

test, equal, case, tests, foo, unittest, equals, suite, result, expected

matplotlib topic?

grid, color, plot, plt, label, step, data, width, ax, size

django config topic?

root, app, project, module, admin, application, django, url, site, static

license topics?

license, gnu, public, general, it, free, software, version, you, without

and

license, hep, apache, distributed, may, kind, basis, you, law, licensed

some unfortunate timezone topic

dt, datetime, iso, pipeline, time, zone, dst, timezone, utc, locale

annoying Python packaging topic?

version, dist, info, install, egg, download, pkg, path, os, distribution

see more at

bit.ly/pycon2015-cdp

(ns, store, software, party, other, no, without, oss, copyright, notice)
(troop, theater, no, ants, agent, food, colony, center, armor, water)
(req, callback, alias, cv, aliases, stat, mode, bookmark, gid, callbacks)
(it, are, an, will, on, you, we, can, use, which)
(ctypes, void, function, pointer, uint, char, device, struct, stream, info)
(url, folder, question, answer, urllib, data, title, survey, poll, questions)
(item, items, key, count, pos, curr, data, append, add, keys)
(grid, color, plot, plt, label, step, data, width, ax, size)
(view, photo, url, views, title, metrics, context, detail, photos, django)
(proxy, service, dbus, uri, handler, rpc, response, json, request, content)
(gl, goal, shader, https, twitter, com, cons, rb, targets, tex)
(code, constraint, owner, term, rec, cost, cursor, connection, constraints, balance)
(atk, role, attrs, condition, security, model, arch, modifier, text, attr)
(models, db, field, django, fields, blank, length, char, null, key)
(gtk, event, gdk, window, position, screen, move, start, view, drag)
(config, data, section, train, configuration, cfg, app, training, load, yaml)
(function, args, value, param, action, default, string, add, kwargs, arg)
(atom, spec, team, atoms, energy, jet, depth, pix, species, ring)
(letter, latin, small, de, capital, mark, al, flow, el, la)
(style, frame, freeze, left, right, ctx, font, cell, layer, top)
(info, stop, snd, recipe, trip, trips, disk, registry, ingredient, vm)
(text, attr, pattern, font, re, match, sub, stroke, tag, isalpha)
(graph, merge, add, builder, head, nodes, merged, ghost, problem, node)
(access, acc, permission, titulo, user, impl, read, gadget, ebook, broadcast)
(batch, song, artist, bab, nl, fare, sent, posts, notification, placemark)
(end, var, start, assign, variable, book, joint, gen, begin, variables)
(api, status, http, request, exception, nova, uri, auth, response, headers)
(test, equal, case, tests, foo, unittest, equals, suite, result, expected)
(time, queue, sleep, last, schedule, job, start, rate, total, duration)
(request, actor, profile, post, objects, user, django, form, data, redirect)
(com, http, www, google, https, url, org, res, title, gdata)
(key, params, data, string, md, param, private, hs, signature, digest)
(db, database, sql, conn, execute, select, cursor, table, engine, ds)
(branch, pdf, version, tags, master, git, rev, svn, revision, tag)
(random, sample, function, number, exp, df, samples, labels, math, mean)
(wx, menu, chooser, text, button, freeze, window, on, show, parent)
(root, app, project, module, admin, application, django, url, site, static)
(mock, call, patch, value, anim, parent, called, obj, effect, magic)
(pango, gtk, send, protocol, reactor, transport, my, gio, gdk, control)
(line, lines, pos, append, end, start, parse, match, string, re)
(np, array, data, shape, matrix, numpy, vector, axis, dtype, fit)
(tmp, path, os, ftp, url, robot, local, bcolors, host, cmd)
(char, cms, fields, field, django, agency, db, models, closure, blank)
(visit, write, read, writer, visitor, data, tabla, authorizer, v, last)
(path, os, join, filename, files, sys, directory, py, paths, build)
(field, value, fields, message, short, reserved, reserved, enum, string, obj)
(path, source, repo, cache, sentence, repository, target, vcs, diff, we)
(domain, faction, program, x, gain, andrew, solve, boxed, solver, quest)
(points, edge, point, material, swig, p, vertex, nx, me, p)
(content, lang, string, chrome, title, net, bee, browser, en, web)
(license, hep, apache, distributed, may, kind, basis, you, law, licensed)
(favicon, title, html, doc, last, source, document, url, cv, default)
(request, response, url, client, json, http, data, auth, status, headers)
(python, version, package, author, setup, description, language, copyright, packages, license)
(key, val, rule, plugin, keys, cert, plugins, certificate, value, rules)
(da, setuptools, py, broker, numero, data, ser, include, src, spider)
(entry, feed, gd, board, entries, timeline, user, link, recent, text)
(oprot, iprot, ftype, begin, write, i, fid, binary, end, i)
(report, controller, overlay, banner, script, presentation, slot, policy, bul, bih)
(ac, ic, ph, uk, qmx, nr, susy, i, icf, nz)
(lock, channel, cluster, release, ts, acquire, timeout, locked, leaf, clusters)
(version, dist, info, install, egg, download, pkg, path, os, distribution)
(account, wall, a, sticky, salir, accounts, a, test, management, plone)
(clutter, output, stdout, shell, stderr, subprocess, out, pipe, sys, fd)
(wifi, phone, thumbnails, provider, permission, bluetooth, credential, read, wake, providers)
(dt, datetime, iso, pipeline, time, zone, dst, timezone, utc, locale)
(license, gnu, public, general, it, free, software, version, you, without)
(cloud, inst, node, i, i, i, i, i, i, i)
(search, re, match, results, ss, binding, cc, fac, catalog, outcome)
(token, resource, data, tokens, scope, append, types, current, serial, stream)
(state, block, current, volume, stream, states, cs, blocks, snapshot, node)
(permission, social, texto, preference, frm, sms, verticalalignment, sa, change, alembic)
(op, expr, opcode, expression, accumulator, constants, nrt, operator, r, string)
(user, session, username, password, login, email, group, auth, perm, automated)
(model, query, meta, kwargs, instance, objects, relation, field, value, base)
(text, conf, lexer, php, turtle, keyword, z, border, java, marker)
(address, ip, network, ipv, ipv, net, packet, route, addr, dns)
(page, template, context, html, render, request, td, title, tr, tmpl)
(value, other, values, data, feature, key, number, mask, array, val)
(flags, video, flag, stats, component, ant, key, room, playlist, keymap)
(element, namespace, elements, text, attributes, link, append, attrs, base, rel)
(core, location, val, st, trajectory, ob, tool, roles, xx, portal)
(extension, category, product, order, price, descriptor, ids, number, message, default)
(player, env, players, command, corpus, fs, multiplayer, ticket, instance, attack)
(cache, region, mc, dump, geo, review, pickle, key, cached, regions)
(num, local, count, hand, collatz, files, number, backup, data, numbers)
(email, mail, subject, com, automated, sender, message, attach, mime, transitive)
(factory, trans, article, zope, articles, zpl, dato, fixer, i, transform)
(row, cell, col, csv, tls, media, rows, reader, sheet, private)
(issue, rhs, sem, lhs, nueva, ex, delegate, issues, mezzanine, seat)
(django, middleware, contrib, static, template, node, com, debug, media, path)
(module, moved, moves, urllib, bag, acute, tkinter, attribute, func, grave)
(comment, movie, contact, genre, people, person, key, cb, destination, identifier)
(group, caf, ret, opt, dns, options, opts, driver, wire, notes)
(index, pose, guess, trp, tab, quark, tt, theming, indexes, setp)
(port, instance, host, param, instances, console, ip, ports, switch, security)
(settings, fm, audio, scan, setting, scanner, messenger, monitor, framebuffer, transitfeed)
(x, x, ff, latin, x, x, control, bit, tile, x)
(function, obj, cls, context, module, attribute, method, member, schema, agent)
(node, tree, tag, child, parent, parse, value, root, children, parser)
(args, task, options, sys, parser, command, cmd, option, argv, add)
(form, record, game, forms, data, value, records, you, required, submit)
(socket, server, connection, log, timeout, handler, thread, logging, logger, ssl)
(gtk, widget, text, button, box, label, layout, icon, pango, qt)
(android, bundle, sync, bundles, music, activity, ordered, stub, permission, os)
(ns, fila, nt, av, std, add, station, data, member, const)
(word, words, level, outfile, mesh, export, dic, common, rmax, nltk)
(struct, data, uint, length, header, pack, offset, unpack, bit, byte)
(color, div, html, css, js, span, body, li, origin, red)
(manager, eq, uuid, formatter, tools, tc, torrent, surf, enemy, logger)
(cur, rating, archive, wsdl, soap, oth, grades, grade, snap, xmlsoap)
(cairo, bucket, const, reg, redis, function, ins, exchange, i, grp)
(pointer, image, uint, void, width, height, color, size, font, img)
(encoding, module, string, decode, encode, utf, errors, tb, exc, source)
(url, headers, message, msg, header, http, content, path, scheme, host)
(date, time, datetime, year, day, month, speed, days, start, hour)
(pyxb, sequence, seq, gene, stage, expanded, symbol, transitions, queen, motor)
(update, last, upload, target, metadata, tracker, nick, sp, s, progress)
(table, column, soup, markup, data, string, columns, nullable, html, key)
(size, unsigned, storage, chunk, stream, void, buf, buffer, share, read)

Python developers are incredibly diverse!

Thank you!

@cmrn_dp

References

  1. "Probability/Combinatorics." Wikibooks, The Free Textbook Project. 30 Jan 2015, 16:13 UTC. 12 Oct 2015, 17:19 <https://en.wikibooks.org/w/index.php?title=Probability/Combinatorics&oldid=2760103>.
  2. Sandulescu, Vlad. "Predicting What User Reviews Are about with LDA and Gensim." Predicting What User Reviews Are about with LDA and Gensim. 19 Sept. 2014. Web. 4 Nov. 2015.

Turning the data analysis lens around

By Cam DP

Python is used to analyze an enormous variety of datasets; however, we've never turned that data analysis lens around and examined Python developers! In this presentation, we'll examine, through data, how Python developers write code.