A PYPTUG presentation
by Francois Dion (@f_dion)
First up, Matlab
Thanks to NumPy, SciPy and matplotlib (with its Matlab-style pylab interface)
IPython Notebook
(demo matplotlib)
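Most everyday Matlab idioms have direct NumPy equivalents. Here is a minimal sketch of three that often trip up newcomers: the backslash solve, `linspace` with element-wise power, and matrix versus element-wise products. The matrices are arbitrary examples.

```python
import numpy as np

# Matlab: A = [4 1; 2 3]; b = [1; 2]; x = A \ b
A = np.array([[4.0, 1.0], [2.0, 3.0]])
b = np.array([1.0, 2.0])
x = np.linalg.solve(A, b)      # solves A @ x == b without forming inv(A)

# Matlab: t = linspace(0, 1, 5); y = t.^2   (element-wise power)
t = np.linspace(0, 1, 5)
y = t ** 2                     # ** on arrays is already element-wise

# Matlab: C = A * A   (matrix product)
C = A @ A                      # in NumPy, A * A would be element-wise (Matlab's A .* A)

print(x, y, C, sep="\n")
```

The main trap is the operator swap: Matlab's `*` is NumPy's `@`, and Matlab's `.*` is NumPy's `*`.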
Learn:
Next, SAS
SAS and similar tools (SPSS and, to a much lesser extent, Minitab) have been in use for years, but there are new kids on the block that are changing things.
So, let's first talk about trends
Since 2010, R has displaced everybody else
top 8 data mining/analytics tools used in 2010 (2013)
pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
Python has long been great for data munging and preparation, but less so for data analysis and modeling. pandas helps fill this gap, enabling you to carry out your entire data analysis workflow in Python without having to switch to a more domain-specific language like R.
Combined with the excellent IPython toolkit and other libraries, the environment for doing data analysis in Python excels in performance, productivity, and the ability to collaborate.
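A minimal sketch of that munging-plus-analysis workflow done entirely in pandas: parse a CSV, clean a text column, fill a missing value, then aggregate and reshape. The data, the column names, and the decision to treat a missing value as zero are all hypothetical.

```python
import io

import pandas as pd

# Hypothetical raw export: inconsistent capitalization and one missing value.
RAW = """city,year,sales
Charlotte,2012,100
charlotte,2013,120
Raleigh,2012,80
Raleigh,2013,
"""

df = pd.read_csv(io.StringIO(RAW))
df["city"] = df["city"].str.title()   # "charlotte" -> "Charlotte"
df["sales"] = df["sales"].fillna(0)    # assumption: a missing figure means no sales

# Aggregate: total sales per city.
summary = df.groupby("city")["sales"].sum()

# Reshape: one row per city, one column per year.
table = df.pivot(index="city", columns="year", values="sales")

print(summary)
print(table)
```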
(demo)
pandas does not implement significant modelling functionality outside of linear and panel regression; for this, look to statsmodels and scikit-learn. More work is still needed to make Python a first class statistical modelling environment, but we are well on our way toward that goal.
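For a feel of the model fitting that statsmodels and scikit-learn wrap in richer APIs (standard errors, diagnostics, pipelines), here is a dependency-light stand-in: an ordinary least squares fit on a pandas DataFrame using NumPy's `lstsq`. This is not the statsmodels or scikit-learn API. The data are synthetic, generated from known coefficients, so the fit can be checked.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data: y = 3 + 2*x1 - 1.5*x2 + small noise.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"x1": rng.uniform(0, 10, n), "x2": rng.uniform(0, 5, n)})
df["y"] = 3.0 + 2.0 * df["x1"] - 1.5 * df["x2"] + rng.normal(0, 0.1, n)

# Design matrix: a column of ones for the intercept, then the predictors.
X = np.column_stack([np.ones(n), df[["x1", "x2"]].to_numpy()])

# Least squares solution of X @ coef ~= y.
coef, _, _, _ = np.linalg.lstsq(X, df["y"].to_numpy(), rcond=None)
params = pd.Series(coef, index=["const", "x1", "x2"])
print(params.round(2))
```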
In the meantime, if you want to use R from Python:
Or if you are simply missing ggplot2, see:
and cute pandas
# Python 2 only: obfuscated Mandelbrot set renderer that writes a 1500x1000 image to M.bmp
_ = (
255,
lambda
V ,B,c
:c and Y(V*V+B,B, c
-1)if(abs(V)<6)else
( 2+c-4*abs(V)**-0.4)/i
) ;v, x=1500,1000;C=range(v*x
);import struct;P=struct.pack;M,\
j ='<QIIHHHH',open('M.bmp','wb').write
for X in j('BM'+P(M,v*x*3+26,26,12,v,x,1,24))or C:
    i ,Y=_;j(P('BBB',*(lambda T:(T*80+T**9
    *i-950*T **99,T*70-880*T**18+701*
    T **9 ,T*i**(1-T**45*2)))(sum(
    [ Y(0,(A%3/3.+X%v+(X/v+
    A/3/3.-x/2)/1j)*2.5
    /x -2.7,i)**2 for \
    A in C
    [:9]])
    /9)
    ) )
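Read de-obfuscated (this is our reading of the code, not the author's commentary), the program iterates z -> z*z + c from z = 0, treats |z| >= 6 as escape, caps the work at 255 iterations, averages a 3x3 grid of sub-samples per pixel, and maps a smoothed escape count to RGB. The sketch below keeps only the core escape-time test, with the same radius, iteration cap, and roughly the same window (real part -2.7 to 1.05, imaginary part -1.25 to 1.25). It prints a coarse ASCII picture instead of writing a BMP, and it drops the supersampling and coloring.

```python
def escape_time(c: complex, max_iter: int = 255, radius: float = 6.0) -> int:
    """Iterations of z -> z*z + c (starting from z = 0) before |z| >= radius.

    Returns max_iter when the orbit has not escaped by then, i.e. c is
    treated as a member of the Mandelbrot set.
    """
    z = 0j
    for n in range(max_iter):
        if abs(z) >= radius:
            return n
        z = z * z + c
    return max_iter


def render_ascii(width: int = 60, height: int = 24, max_iter: int = 255) -> list:
    """Coarse text rendering: '#' for points that never escape, '.' otherwise."""
    rows = []
    for row in range(height):
        imag = 1.25 - 2.5 * row / (height - 1)       # top to bottom: +1.25 .. -1.25
        line = []
        for col in range(width):
            real = -2.7 + 3.75 * col / (width - 1)   # left to right: -2.7 .. 1.05
            inside = escape_time(complex(real, imag), max_iter) == max_iter
            line.append("#" if inside else ".")
        rows.append("".join(line))
    return rows


if __name__ == "__main__":
    print("\n".join(render_ascii()))
```

Unlike the original, this runs unchanged on Python 3, since it avoids `str`/`bytes` mixing and integer-division differences.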
from matplotlib import pyplot as plt
import numpy as np

plt.xkcd()  # switch on xkcd-style, hand-drawn rendering

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
# hide the top and right spines and all ticks for a sketch-like look
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
plt.xticks([])
plt.yticks([])
ax.set_ylim([-30, 10])

# flat line that starts dropping at t = 70
data = np.ones(100)
data[70:] -= np.arange(30)

plt.annotate(
    'THE DAY I REALIZED\nI COULD COOK BACON\nWHENEVER I WANTED',
    xy=(70, 1), arrowprops=dict(arrowstyle='->'), xytext=(15, -10))

plt.plot(data)
plt.xlabel('time')
plt.ylabel('my overall health')
plt.show()
Glumpy - http://code.google.com/p/glumpy - a small Python library that uses OpenGL for rapid visualization of (mainly two-dimensional) NumPy arrays. It is less about polished figures for a scientific article and more about rapid visualization of a running simulation.
pyqtgraph - http://luke.campagnola.me/code/pyqtgraph/ - Pure-Python graphics and GUI library for scientific/engineering applications, based on PyQt and NumPy. This library provides fast plotting and image/video display, multidimensional image slicing, volumetric/isosurface rendering, interactive data manipulation tools, and a variety of Qt widgets including an editable property tree, a visual programming flowchart, and a gradient editor.
WrapITK - http://code.google.com/p/wrapitk/ - an interface between ITK (http://itk.org) and several languages, with a particular focus on Python. Using ITK from the Python interpreter is particularly useful for quick and easy prototyping of image analysis procedures. Some glue classes allow data to be passed efficiently to other modules such as NumPy or VTK.
Plotly - https://plot.ly/ - is a collaborative graphing and analytics platform. The web app has an online Python sandbox (NumPy supported) and a grid for data analysis. The Plotly graphing library produces graphs that are interactive, publication quality, and browser-based. Graphs can be styled with Python or a GUI, shared, embedded, and exported.
PyAlgoViz - http://pyalgoviz.appspot.com/ - Python algorithm visualizations done in Python, running in the browser. As you can see, you can interface Python with JavaScript visualization libraries. And speaking of JavaScript...
The folks at Trifacta are making it easy to build visualizations on top of D3 with Vega. Vincent makes it easy to build Vega with Python.
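Under the hood, a Vega chart is a JSON document describing data, scales, axes and marks, which is why it is easy to generate from Python. The sketch below builds such a spec by hand with the standard `json` module. It is not Vincent's API. The key names follow the bar-chart example of recent Vega releases as we understand it; older releases, which Vincent targeted, used somewhat different keys, so check them against the Vega documentation for the version you render with. The function name `bar_chart_spec` and the sample data are hypothetical.

```python
import json


def bar_chart_spec(rows, width=300, height=200):
    """Build a minimal Vega-style bar chart spec from a list of {"category", "amount"} dicts."""
    return {
        "width": width,
        "height": height,
        "data": [{"name": "table", "values": rows}],
        "scales": [
            {"name": "xscale", "type": "band", "range": "width", "padding": 0.1,
             "domain": {"data": "table", "field": "category"}},
            {"name": "yscale", "type": "linear", "range": "height", "nice": True,
             "domain": {"data": "table", "field": "amount"}},
        ],
        "axes": [
            {"orient": "bottom", "scale": "xscale"},
            {"orient": "left", "scale": "yscale"},
        ],
        "marks": [{
            "type": "rect",
            "from": {"data": "table"},
            "encode": {"enter": {
                "x": {"scale": "xscale", "field": "category"},
                "width": {"scale": "xscale", "band": 1},
                "y": {"scale": "yscale", "field": "amount"},
                "y2": {"scale": "yscale", "value": 0},
            }},
        }],
    }


# Hypothetical sample data; in practice it might come from DataFrame.to_dict("records").
rows = [{"category": "A", "amount": 28},
        {"category": "B", "amount": 55},
        {"category": "C", "amount": 43}]
spec_json = json.dumps(bar_chart_spec(rows), indent=2)
print(spec_json)
```

The resulting JSON is what a browser-side Vega runtime would consume. Python's only job is to produce it.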
Bokeh is a Python interactive visualization library for large datasets that natively uses the latest web technologies. Its goal is to provide elegant, concise construction of novel graphics in the style of Protovis/D3, while delivering high-performance interactivity over large data to thin clients.
demo
Tera scale? Giga scale? Hahaha.
Exa scale? Peta scale? OK, it is. What next?
Roll your own?
Look at hardware acceleration
Cloud based?
Time to look at Manta:
Or Hadoop:
(you'll be doing Java too...)
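Hadoop's programming model is MapReduce. A map step emits key/value pairs, the framework sorts and groups them by key, and a reduce step aggregates each group. (Hadoop Streaming also lets the mapper and reducer be any executable reading stdin and writing tab-separated lines, Python scripts included.) The sketch below simulates the three phases in a single process with the classic word count. It illustrates the model only and does not talk to Hadoop.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    # Emit (word, 1) for every word, as a streaming mapper would print "word\t1".
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    # Sum the counts for one key, as a streaming reducer does over its sorted input.
    return word, sum(counts)


def map_reduce(lines):
    pairs = [kv for line in lines for kv in mapper(line)]   # map phase
    pairs.sort(key=itemgetter(0))                          # shuffle/sort phase
    return dict(                                           # reduce phase, one call per key
        reducer(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )


counts = map_reduce(["the quick brown fox", "the lazy dog", "The end"])
print(counts)
```

In a real cluster, the sort/shuffle step is what Hadoop distributes across machines. Mapper and reducer stay this simple.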