Python Self-Study Notes

0. Personal notes

1. Learning Python by Mark Lutz

2. Mastering Python by Rick van Hattem

3. Data Science from Scratch

- First Principles with Python by Joel Grus (self-study notes)

 

4/5/2018

about me

EE degree back in 1982

Z80 was the most popular CPU

Pascal/Fortran/COBOL were popular languages

Apple ][ + BASIC and CP/M

Intel 80386SX PC motherboard designer

......

Interested in Linux since 2016

Z80 CPU

intel 80386SX CPU

 

photo source: wikipedia.org

Apple ][

marconi.jiang@gmail.com

Vocabulary

Python Self-Study Notes

0. System

1. Miscellaneous

 

 

 

 

4/28/2018

Built-in functions

  • dir() lists the names currently defined in the session

    • ['In', 'Out', '_', '__', '___', '__builtin__',
       '__builtins__', '__doc__', '__loader__',
       '__name__', '__package__', '__spec__',
       '_dh', '_i', '_i1', '_ih', '_ii', '_iii', '_oh',
       '_sh', 'exit', 'get_ipython', 'quit']

  • dir(__builtin__) lists the names defined in the built-in module

    • ['ArithmeticError',
       'AssertionError',
       'AttributeError', ....]

  • id(x), id(7)

    • id() is a built-in function in Python. Syntax: id(object)

    • The id() function returns a unique id for the specified object. Every object in Python has its own id, which is assigned when the object is created.

    • The function accepts a single parameter and returns the identity of that object. The identity is unique and constant for the object during its lifetime.

    • In CPython the id is the object's memory address, so it usually differs each time you run the program. CPython also caches the small integers from -5 to 256, so every reference to one of them within a run points to the same object and has the same id.

    • Two objects with non-overlapping lifetimes may have the same id() value.
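The points about id() can be checked with a small sketch. It assumes CPython; the variable names are only illustrative, and the actual id values differ from run to run, so only comparisons are printed.

```python
# Mutating a list keeps the same object, so its id does not change.
items = [1, 2]
id_before = id(items)
items.append(3)
id_after = id(items)
print(id_before == id_after)   # True

# Strings are immutable: "changing" one builds a new object.
original = "spam"
text = original
text = text + "!"
print(text is original)        # False
print(id(text) == id(original))  # False, both objects are alive
```

Comparing ids is only meaningful while both objects are alive, which is why `original` is kept as a second reference.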

Built-in functions (continued)

  • type(a) : shows the type of an object

Mutable / immutable

  • Mutable

    • List

  • Immutable

    • String, Tuple

Installing Anaconda

How to change the Jupyter start-up folder

  • Open cmd (or Anaconda Prompt) and run

     $ jupyter notebook --generate-config

  • This writes a file to

     ~/.jupyter/jupyter_notebook_config.py

  • Search for the following line:

     #c.NotebookApp.notebook_dir = ''

  • Replace it and uncomment it:

     c.NotebookApp.notebook_dir = '/Volumes/HDD160G/Dropbox/'

Make sure you use forward slashes in your path and use /home/user/ instead of ~/ for your home directory.

About import

NumPy - ways to import

  • First, look at how different import styles lead to different ways of referring to the same function

>>> import numpy
>>> a = numpy.array([1, 2, 3, 4])

>>> import numpy as np
>>> a = np.array([1, 2, 3, 4])

>>> from numpy import *
>>> a = array([1, 2, 3, 4])
>>> a
array([1, 2, 3, 4])

>>> a.dtype
dtype('int32')

A call that is easy to get wrong

>>> a = array(1, 2, 3, 4)    # WRONG: array() expects one sequence, not several numbers

>>> a = array([1, 2, 3, 4])  # RIGHT

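About data types

Python Data Types

Numeric Type

  • int : integer. In Python 2, values outside -2**31 ~ 2**31 - 1 were automatically promoted to long integers and shown with a trailing L. In Python 3 there is only int, and its range is practically unlimited, bounded only by available memory
    • 0b/0B : binary
    • 0o/0O : octal
    • 0x/0X : hexadecimal
  • float (long existed only in Python 2)
  • bool
  • complex

String Type

Container Type

  • list - [ ]
  • set - { , }
    • An empty set can only be created with set(), not with {}, because {} is an empty dict
    • To build a set from the individual characters of a string or the elements of a list, you must likewise use set() rather than {}
  • dict - { :  , : } (mapping)
  • tuple - ( )
  • bytes (?)

Reference: book - Python 與量化投資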

>>> S0 = {}
>>> S1 = set()

>>> type(S0)
<class 'dict'>

>>> type(S1)
<class 'set'>

>>> set('Quant')
{'a', 'Q', 'u', 'n', 't'}

>>> type(set('Quant'))
<class 'set'>

>>> {'Quant'}
{'Quant'}

>>> len({'Quant'})
1

>>> type({'Quant'})
<class 'set'>
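The integer notes under Numeric Type can be checked the same way. This is a small sketch for Python 3; the variable names are only illustrative.

```python
# Integer literals in binary, octal and hexadecimal notation.
binary_value = 0b1010
octal_value = 0o17
hex_value = 0xFF
print(binary_value, octal_value, hex_value)   # 10 15 255

# Converting back to the prefixed string forms.
print(bin(10), oct(15), hex(255))             # 0b1010 0o17 0xff

# Python 3 integers grow as needed: no overflow and no trailing L.
big = 2 ** 100
print(type(big).__name__)                     # int
print(big)                                    # 1267650600228229401496703205376
```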

Python Data Types – Learn From Basic To Advanced

About Python's attributes

  • Data structures supported by Python
    • Basic: tuple, generator, list, array (?), series (?), set, dictionary
      • tuple - (1,2,3,4)
      • generator - (ord(x) for x in 'spaam')
      • list - [ord(x) for x in 'spaam']
      • set - {ord(x) for x in 'spaam'}
      • dictionary - {x: ord(x) for x in 'spaam'}
    • Common third-party: DataFrame, np.array
  • Pandas DataFrame
    • df.info()
    • df.columns
    • df.keys()
  • Which data structures does an object itself contain?

Reference: 淺談 Python 的屬性

About Python's attributes - Tuple / Generator

  • Parentheses are used for three different things: grouping, tuple literals, and function calls.
  • Compare (1 + 2) (an integer) and (1, 2) (a tuple).
  • In the generator assignment, the parentheses are for grouping;
  • in the tuple assignment, the parentheses are a tuple literal.
  • Parentheses represent a tuple literal when they contain a comma and are not used for a function call.
  • This works because (1,2,3,4) cannot be a generator: there is nothing to generate, since all the elements are written out rather than a rule to obtain them.

  • Python has no tuple comprehension syntax, so (i for i in sample_list) is always a generator expression. To get a tuple from it, pass it to tuple().

  • Iterating over the generator expression or the list comprehension gives the same items. However, the list comprehension builds the entire list in memory first, while the generator expression produces the items on the fly, so it can be used for very large (and even infinite) sequences.
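The difference between the three bracket forms can be seen in a short sketch; `sample_list` and the other names are only illustrative.

```python
sample_list = [1, 2, 3, 4]

gen = (i * i for i in sample_list)            # generator expression, lazy
squares_list = [i * i for i in sample_list]   # list comprehension, built at once
squares_tuple = tuple(i * i for i in sample_list)  # explicit tuple() call

print(type(gen).__name__)   # generator
print(squares_list)         # [1, 4, 9, 16]
print(squares_tuple)        # (1, 4, 9, 16)

# A generator can be consumed only once.
first_pass = list(gen)
second_pass = list(gen)
print(first_pass)           # [1, 4, 9, 16]
print(second_pass)          # []
```

The empty second pass is the price of laziness: if the items are needed more than once, store them in a list or tuple.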

About Python's attributes - Tuple / List (1/2)

Difference between list and tuple

  • Literal

    someTuple = (1,2)
    someList  = [1,2] 
  • Size

    a = tuple(range(1000))  # a generator would save even more memory
    b = list(range(1000))   # c = (i for i in range(1000))
    
    a.__sizeof__() # 8024   # c.__sizeof__() # 64
    b.__sizeof__() # 9088

    The exact byte counts depend on the Python version and platform. Because a tuple is smaller, operations on it can be slightly faster, but the difference is negligible unless the number of elements is huge.

  • Usage

    Because a list is mutable (and therefore unhashable), it can't be used as a key in a dictionary, whereas a tuple can.

    a    = (1,2)
    b    = [1,2] 
    
    c = {a: 1}     # OK
    c = {b: 1}     # TypeError: unhashable type: 'list'

About Python's attributes - Tuple / List (2/2)

  • Permitted operations

    b    = [1,2]   
    b[0] = 3       # [3, 2]
    
    a    = (1,2)
    a[0] = 3       # TypeError: tuples do not support item assignment

    That also means you can't delete an element from a tuple or sort it in place. You can still use += on both, but with one difference: for a list the existing object is extended (same id), while for a tuple a new object is created and the name is rebound to it (new id).

    a     = (1,2)
    b     = [1,2]  
    
    id(a)          # 140230916716520
    id(b)          # 748527696
    
    a   += (3,)    # (1, 2, 3)
    b   += [3]     # [1, 2, 3]
    
    id(a)          # 140230916878160
    id(b)          # 748527696
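Because raw id values differ on every run, the same point can be shown with a second name bound to each object. This is a small sketch; `a_alias` and `b_alias` are only illustrative names.

```python
a = (1, 2)
b = [1, 2]
a_alias, b_alias = a, b   # second names for the same two objects

a += (3,)   # builds a new tuple and rebinds a
b += [3]    # extends the existing list in place

print(a, a_alias)      # (1, 2, 3) (1, 2)
print(b, b_alias)      # [1, 2, 3] [1, 2, 3]
print(a is a_alias)    # False
print(b is b_alias)    # True
```

The alias of the list sees the change, while the alias of the tuple still holds the old value.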

Comparing the properties and methods of List and Tuple

The methods of list and tuple differ: for example, a list can be sorted with sort(), while a tuple cannot

  • Properties of List:
    List is a collection which is ordered and changeable. Allows duplicate members.

  • Properties of Tuple:
    Tuple is a collection which is ordered and unchangeable. Allows duplicate members.
    Its key properties are being ordered and unchangeable.

     

About NumPy and arrays


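NumPy - array

When you print an array, NumPy displays it in a form similar to nested lists, with the following layout:

  • the last axis is printed from left to right
  • the second-to-last axis is printed from top to bottom
  • the remaining axes are also printed from top to bottom, with each slice separated from the next by an empty line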

 

>>> c = arange(24).reshape(2,3,4)         # 3d array
>>> print(c)
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]] 

>>> c
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

(2, 3, 4) = (z, y, x): the element c[z, y, x] holds z*12 + y*4 + x

z = 0

                 x [ 0 - 3 ]

(y=0) y*4 = 0    [ 0,  1,  2,  3]

(y=1) y*4 = 4    [ 4,  5,  6,  7]

(y=2) y*4 = 8    [ 8,  9, 10, 11]

z = 1 (starting from 3 * 4 = 12)

                 x [ 0 - 3 ]

(y=0) 12 + y*4   [12, 13, 14, 15]

(y=1) 12 + y*4   [16, 17, 18, 19]

(y=2) 12 + y*4   [20, 21, 22, 23]

Note that print() and the interactive prompt display the same array differently.

NumPy - array

Another example

  • A one-dimensional array ( [2, 3] ) has shape (2,)
  • A two-dimensional array with shape (1, 2) is ( [ [2,3] ] ). Because it is two-dimensional, there are two levels of [ ]. With shape (2, 2), for example, the array becomes

( [ [0, 1],

     [2, 3] ] )

In [1]: input_data =np.array([2,3])
In [2]: input_data.reshape(1,2).shape
Out[2]: (1, 2)

In [3]: input_data.reshape(1,2)
Out[3]: array([[2, 3]])

In [4]: input_data.shape
Out[4]: (2,)

In [5]: input_data
Out[5]: array([2, 3])

In [6]: input_data * weights['node_1']
Out[6]: array([-2,  3])

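NumPy - querying object attributes

The dtype and itemsize depend on the OS and platform configuration: the output below shows 'int32' and 4, whereas on my machine they are 'int64' and 8

>>> import numpy as np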
>>> a = np.arange(15).reshape(3, 5)
>>> a
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
>>> a.shape
(3, 5)
>>> a.ndim
2
>>> a.dtype.name
'int32'
>>> a.itemsize
4
>>> a.size
15
>>> type(a)
numpy.ndarray

NumPy - setting object attributes

Complex numbers

>>> c = np.array([[1,2], [3,4]], dtype=complex)
>>> c
array([[ 1.+0.j,  2.+0.j],
       [ 3.+0.j,  4.+0.j]]) 
>>> np.zeros( (3,4) )
array([[0.,  0.,  0.,  0.],
       [0.,  0.,  0.,  0.],
       [0.,  0.,  0.,  0.]])
>>> np.ones( (2,3,4), dtype=np.int16 )    # dtype can also be specified
array([[[ 1, 1, 1, 1],
        [ 1, 1, 1, 1],
        [ 1, 1, 1, 1]],
       [[ 1, 1, 1, 1],
        [ 1, 1, 1, 1],
        [ 1, 1, 1, 1]]], dtype=int16)
>>> np.empty( (2,3) )
array([[1.39069238e-309, 1.39069238e-309, 1.39069238e-309],
       [1.39069238e-309, 1.39069238e-309, 1.39069238e-309]])
>>> np.linspace(0, np.pi, 3)
array([0.        , 1.57079633, 3.14159265])

The function zeros creates an array full of zeros, ones creates an array full of ones, and empty creates an array whose initial content is arbitrary and depends on the state of memory. By default the dtype of the created array is float64.

Other functions: array, zeros, zeros_like, ones, ones_like, empty, empty_like, arange, linspace, rand, randn, fromfunction, fromfile. See: NumPy examples

 

NumPy - computation / statistics

sum(), min(), max()

By specifying the axis parameter you can apply an operation along a given axis of the array:

>>> a = np.arange(12).reshape(3,4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> a.sum()
66
>>> a.min()
0
>>> a.max()
11
>>> a.sum(axis=0)                            # sum of each column
array([12, 15, 18, 21])
>>>
>>> a.min(axis=1)                            # min of each row
array([0, 4, 8])
>>>
>>> a.cumsum(axis=1)                         # cumulative sum along each row
array([[ 0,  1,  3,  6],
       [ 4,  9, 15, 22],
       [ 8, 17, 27, 38]]) 

NumPy - ufunc (universal functions)

NumPy provides familiar mathematical functions such as sin, cos and exp. In NumPy these are called "universal functions" (ufunc). They operate element by element on an array and produce an array as output.

>>> b = np.arange(3)
>>> b
array([0, 1, 2])
>>> np.exp(b)
array([ 1.        ,  2.71828183,  7.3890561 ])
>>> np.sqrt(b)
array([ 0.        ,  1.        ,  1.41421356])
>>> c = np.array([2., -1., 4.])
>>> np.add(b, c)
array([ 2.,  0.,  6.]) 

More functions: all, alltrue, any, apply_along_axis, argmax, argmin, argsort, average, bincount, ceil, clip, conj, conjugate, corrcoef, cov, cross, cumprod, cumsum, diff, dot, floor, inner, inv, lexsort, max, maximum, mean, median, min, minimum, nonzero, outer, prod, re, round, sometrue, sort, std, sum, trace, transpose, var, vdot, vectorize, where. See: NumPy examples

When operating on arrays of different types, the type of the resulting array corresponds to the more general or more precise one (this behavior is called upcasting).
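A minimal sketch of upcasting, assuming NumPy is installed. The array names are only illustrative, and the integer dtype itself is platform dependent, so only the result dtypes are printed.

```python
import numpy as np

ints = np.array([1, 2, 3])            # an integer array
floats = np.array([0.5, 0.5, 0.5])    # a float64 array

# int + float is upcast to the more general type, float64.
mixed = ints + floats
print(mixed)          # [1.5 2.5 3.5]
print(mixed.dtype)    # float64

# float * complex is upcast again, to complex128.
complex_result = mixed * 1j
print(complex_result.dtype)   # complex128
```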

NumPy - to be continued

So far this covers only about 1/5 of the reference material; to be continued

NumPy - array

input_data = ([3, 5])
weights = {'node_0_0': ([2, 4]), 
           'node_0_1': ([ 4, -5]), 
           'node_1_0': ([-1,  2]), 
           'node_1_1': ([1, 2]), 
           'output': ([2, 7])}
input_data * weights['node_0_0']

  • The example above raises

TypeError: can't multiply sequence by non-int of type 'list'

  • input_data has to be an np.array, while the values in weights do not. Why?

input_data = np.array([3, 5])
weights = {'node_0_0': ([2, 4]), 
           'node_0_1': ([ 4, -5]), 
           'node_1_0': ([-1,  2]), 
           'node_1_1': ([1, 2]), 
           'output': ([2, 7])}
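One way to see why only one operand has to be an array: when either side of * is an ndarray, NumPy handles the operator and converts the other operand to an array before multiplying element by element. The sketch below assumes NumPy is installed and uses a cut-down weights dict.

```python
import numpy as np

weights = {'node_0_0': [2, 4]}

# list * list: Python only knows "sequence * int" (repetition), so this fails.
input_list = [3, 5]
try:
    input_list * weights['node_0_0']
except TypeError as exc:
    error_message = str(exc)
print(error_message)   # can't multiply sequence by non-int of type 'list'

# ndarray * list: NumPy converts the list and multiplies element-wise.
input_data = np.array([3, 5])
product = input_data * weights['node_0_0']
print(product)         # [ 6 20]

# list * ndarray also works, because NumPy handles the reflected operation.
reflected = weights['node_0_0'] * input_data
print(reflected)       # [ 6 20]
```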

NumPy - One Hot Encoding

pd.get_dummies and np_utils.to_categorical are used differently

  • from keras.utils import to_categorical
    from keras.utils import np_utils
  • import numpy

About NumPy's random functions

Reference: 为什么你用不好Numpy的random函数?

NumPy - numpy.random vs. Python's random

  • According to Python for Data Analysis, the numpy.random module supplements the built-in Python random module with functions for efficiently generating whole arrays of sample values from many kinds of probability distributions.

About list

  • The difference between append and extend - append

The list.append method appends an object to the end of the list.

my_list.append(object) 

Whatever the object is, whether a number, a string, another list, or something else, it gets added onto the end of my_list as a single entry on the list.

>>> my_list
['foo', 'bar']
>>> my_list.append('baz')
>>> my_list
['foo', 'bar', 'baz']

So keep in mind that a list is an object. If you append another list onto a list, the appended list will be a single object at the end of the list (which may not be what you want):

>>> another_list = [1, 2, 3]
>>> my_list.append(another_list)
>>> my_list
['foo', 'bar', 'baz', [1, 2, 3]]
                     #^^^^^^^^^--- single item on end of list.

Note the last line in particular: the whole object is stored as a single item

  • Also, when appending one pandas DataFrame to another, the index can be ignored

bigdata = data1.append(data2, ignore_index=True)   # DataFrame.append, removed in pandas 2.0
# pandas 2.0 and later: bigdata = pd.concat([data1, data2], ignore_index=True)

About list

  • The difference between append and extend - extend

The list.extend method extends a list by appending elements from an iterable:

my_list.extend(iterable)

So with extend, each element of the iterable gets appended onto the list. For example:

>>> my_list
['foo', 'bar']
>>> another_list = [1, 2, 3]
>>> my_list.extend(another_list)
>>> my_list
['foo', 'bar', 1, 2, 3]

Keep in mind that a string is an iterable, so if you extend a list with a string, you'll append each character as you iterate over the string (which may not be what you want):

>>> my_list.extend('baz')
>>> my_list
['foo', 'bar', 1, 2, 3, 'b', 'a', 'z']

Note the last line in particular: 'baz' is split into 3 items

=> So far I have only used append (I really don't know what extend is for)
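Putting the two methods side by side may make the use case for extend clearer: it joins two lists into one flat list, which append cannot do. A small sketch with illustrative names:

```python
my_list = ['foo', 'bar']

my_list.append([1, 2])     # the whole list becomes one item
my_list.extend([3, 4])     # each element becomes its own item
my_list.extend('hi')       # a string is iterated character by character
print(my_list)             # ['foo', 'bar', [1, 2], 3, 4, 'h', 'i']
print(len(my_list))        # 7

# The + operator builds a new flat list and leaves the operands unchanged.
left = ['a']
right = ['b', 'c']
combined = left + right
print(combined)            # ['a', 'b', 'c']
```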

About dictionary

  • dict.keys()         # keys of dictionary
  • dict.values()      # values of dictionary
  • How to merge two dicts still needs some more time to understand
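For the open question about merging, two common approaches are sketched below (Python 3.5 or later for the unpacking form). The dict names and contents are hypothetical.

```python
defaults = {'color': 'red', 'size': 1}
overrides = {'size': 2, 'shape': 'round'}

# 1. Build a new dict; on duplicate keys the right-hand dict wins.
merged = {**defaults, **overrides}
print(merged)      # {'color': 'red', 'size': 2, 'shape': 'round'}

# 2. Update a copy in place; the originals stay untouched.
in_place = dict(defaults)
in_place.update(overrides)
print(in_place == merged)   # True
print(defaults)    # {'color': 'red', 'size': 1}

print(list(merged.keys()))     # ['color', 'size', 'shape']
print(list(merged.values()))   # ['red', 2, 'round']
```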

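About pandas

  • pandas basics

  • Data structures provided by pandas
    1. Series: for time-series-like data (such as sensor data); essentially a one-dimensional array with an index.
    2. DataFrame: for structured (table-like) data; a two-dimensional data set with a row index and column labels, such as relational database tables or CSV files.
    3. Panel: for three-dimensional data sets with an item index, a row index and column labels. (Panel was removed in pandas 0.25.)
    
  • Indexers by object type

    Object Type   Indexers
    Series        s.loc[indexer]
    DataFrame     df.loc[row_indexer,column_indexer]
    Panel         p.loc[item_indexer,major_indexer,minor_indexer]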
    • pandas basics (indexing)

    • As mentioned when introducing the data structures in the last section, the primary function of indexing with [ ] (a.k.a. __getitem__ for those familiar with implementing class behavior in Python) is selecting out lower-dimensional slices. The following table shows return type values when indexing pandas objects with [ ]:

      Object Type   Selection         Return Value Type
      Series        series[label]     scalar value
      DataFrame     frame[colname]    Series corresponding to colname
      Panel         panel[itemname]   DataFrame corresponding to the itemname

      • About pandas

      • 12 Useful Pandas Techniques in Python for Data Manipulation
      • 1 – Boolean Indexing

      • 2 – Apply Function

      • 3 – Imputing missing files

      • 4 – Pivot Table

      • 5 – Multi-Indexing

      • 6 – Crosstab

      • 7 – Merge DataFrames

      • 8 – Sorting DataFrames

      • 9 – Plotting (Boxplot & Histogram)

      • 10 – Cut function for binning

      • 11 – Coding nominal data

      • 12 – Iterating over rows of a dataframe

      • About pandas

      • df = df.drop(df[df['id'].str.contains('special string')].index, axis=0)
      • df[df['A'].str.contains("hello")]
      • About pandas

      • When numeric data in a DataFrame contains NaN, replace it with 0 (or another number, such as the mean)
        
        df['column'].fillna(0, inplace=True)  # can be another number instead of 0
        
        # or
        age_mean = df['age'].mean()
        df['age'] = df['age'].fillna(age_mean)  # with inplace=True, fillna returns None, so do not assign its result
        
      • map
        
        df['sex']= df['sex'].map({'female':0, 'male': 1}).astype(int)
        
        # or
        male = []
        for i in df['sex']:
            if i=='male':
                male.append(1)
            else:
                male.append(0)
        df['sex'] = male
      • About pandas get_dummies()

      • See to_categorical on the next slide
      • In[1]: df[:2]
        Out[1]:
        embarked    survived	pclass	sex	age	sibsp	parch	fare
        S        0	1	1	0	29.0000	0	0	211.3375
        S        1	1	1	1	0.9167	1	2	151.5500
        In[2]: x_OneHot_df = pd.get_dummies(data=df,columns=["embarked" ])
        In[3]: x_OneHot_df[:2]
        Out[3]: 	
        embarked_C	embarked_Q	embarked_S      survived	pclass	sex	age	sibsp	parch	fare
        0	0	1        0	1	1	0	29.0000	0	0	211.3375
        0	0	1        1	1	1	1	0.9167	1	2	151.5500
        
        # or my dummy way: build each 0/1 column by hand
        embarked_from_cherbourg = []
        embarked_from_queenstown = []
        embarked_from_southampton = []
        
        for i in df['Embarked']:
            if i=='C':
                embarked_from_cherbourg.append(1)
            else:
                embarked_from_cherbourg.append(0)
        
        for i in df['Embarked']:
            if i=='Q':
                embarked_from_queenstown.append(1)
            else:
                embarked_from_queenstown.append(0)
        
        for i in df['Embarked']:
            if i=='S':
                embarked_from_southampton.append(1)
            else:
                embarked_from_southampton.append(0)
        
        df['embarked_from_cherbourg'] = embarked_from_cherbourg
        df['embarked_from_queenstown'] = embarked_from_queenstown
        df['embarked_from_southampton'] = embarked_from_southampton
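The three hand-written loops follow one pattern, so they can be collapsed into a single comprehension. The sketch below uses plain Python lists in place of the DataFrame column; the sample values in `embarked` are hypothetical.

```python
# Hypothetical sample of the 'Embarked' column.
embarked = ['S', 'C', 'Q', 'S']
ports = ['C', 'Q', 'S']

# One 0/1 list per port: 1 where the value matches, 0 elsewhere.
one_hot = {
    port: [1 if value == port else 0 for value in embarked]
    for port in ports
}

print(one_hot['C'])   # [0, 1, 0, 0]
print(one_hot['Q'])   # [0, 0, 1, 0]
print(one_hot['S'])   # [1, 0, 0, 1]

# Every row has exactly one 1 across the three columns.
row_sums = [sum(column[i] for column in one_hot.values())
            for i in range(len(embarked))]
print(row_sums)       # [1, 1, 1, 1]
```

This is the same encoding that pd.get_dummies produces, written out to show what happens to each value.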
      • About keras to_categorical

      • See get_dummies() on the previous slide
      • from numpy import array
        from numpy import argmax
        from keras.utils import to_categorical
        # define example
        data = [1, 3, 2, 0, 3, 2, 2, 1, 0, 1]
        data = array(data)
        print(data)
        # one hot encode
        encoded = to_categorical(data)
        print(encoded)
        # invert encoding of the first row
        inverted = argmax(encoded[0])
        print(inverted)
        
        # Source: https://itw01.com/GJFRE5J.html
      • About pandas

      • For a DataFrame:
      • axis = 0 refers to the index (rows)

      • axis = 1 refers to the columns

      • df.loc['2018-1-1': , ['column1', 'column2']]

      • df.iloc[0: , [0, 1]]    # iloc takes integer positions, not column labels

      • holding_stocks.at[i,'price']= float(stock_price['Close'])

      • df.cumsum(axis = 0)

      • df['column1']

      • df.drop('2018-1-1', axis = 0)

      • df.drop('column1', axis = 1)

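      • How to use lambda

      • bytes & bytearray are the data types for handling byte data

        bytes is immutable

        bytearray is mutable

        Both types are sequences of 8-bit (one byte) unsigned integers in the range 0~255

        They provide many methods similar to those of str, and also support slicing

        However, indexing a single byte returns an int object, while a slice returns a bytes object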

      • # byte.py
        w=b"abc"
        print(w[0])
        print(type(w[0]))
        print(w[:1])
        print(type(w[:1]))
        
        
        
        # byte_2.py
        w=b"\x74\x61\x69\x70\x65\x69"
        print(w)
        a=bytes.fromhex("746169706569")
        print(a)
        print(type(a))
        bytearr = bytearray(a)
        print(bytearr)
        print(type(bytearr))
        bytearr.pop()
        print(bytearr)
        bytearr.pop()
        print(bytearr)
        bytearr.pop()
        print(bytearr)
        bytearr.append(110)
        print(bytearr)
        bytearr.append(97)
        print(bytearr)
        bytearr.append(ord("n"))
        print(bytearr)
      • About sorting

      • sort by values (in ascending order)
      • >>> y107m01.sort_values(by='mom')
      • and display the 20 highest values
      • >>> y107m01.sort_values(by='mom').tail(20)
      • About csv - reading a csv file

      • This program first opens the csv file, then csv.reader() splits each line on commas and returns a list. It works a bit like split, but csv.reader() also handles the quotation marks (" ") for you, which plain split cannot do.
      • # -*- coding: utf-8 -*-
        import csv
        f = open('example.csv', 'r')
        for row in csv.reader(f):
            print(row)
        f.close()
      • The csv module also provides other useful features. For example, it can parse the data into dictionaries, using the first row as the dictionary keys.
      • You can also specify the key names yourself:
      • # -*- coding: utf-8 -*-
        import csv
        f = open('example.csv', 'r')
        for row in csv.DictReader(f, ["日期", "成交股數", "成交金額", "成交筆數", "指數", "漲跌點數"]):
            print(row['指數'])
        f.close()
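The two claims above (quote handling, and the first row used as keys) can be tried without a file on disk. In this sketch io.StringIO stands in for the opened file, and the sample rows are made up.

```python
import csv
import io

# csv.reader respects quotes; str.split does not.
line = 'a,"b,c",d'
print(line.split(','))            # ['a', '"b', 'c"', 'd']
print(next(csv.reader([line])))   # ['a', 'b,c', 'd']

# With no field names given, DictReader takes the keys from the first row.
sample = io.StringIO("date,index\n2018-04-02,10888\n2018-04-03,10822\n")
rows = list(csv.DictReader(sample))
print(len(rows))                  # 2
print(rows[0]['date'])            # 2018-04-02
print(rows[1]['index'])           # 10822
```

Note that every value comes back as a string, so numeric columns still need int() or float().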
      • About csv - writing a csv file

      • When saving a DataFrame to a csv file, set index=False if the DataFrame already has its own index column, so that no extra index is generated
      • df.to_csv(file_name, encoding='utf-8', index=False)
      • If the file was written with index=True, read it back with index_col=0:
      • df = pd.read_csv(file_name, encoding='utf-8', index_col=0)
      • About print

      • Here are some common ways of doing it:
        
        1. Pass it as a tuple:
        print("Total score for %s is %s" % (name, score))
        
        2. Pass it as a dictionary:
        print("Total score for %(n)s is %(s)s" % {'n': name, 's': score})
        
        There's also new-style string formatting, which might be a little easier to read:
        
        3. Use the new-style string formatting:
        print("Total score for {} is {}".format(name, score))
        
        4. Use the new-style string formatting with numbers (useful for reordering or printing the same one multiple times):
        print("Total score for {0} is {1}".format(name, score))
        
        5. Use the new-style string formatting with explicit names:
        print("Total score for {n} is {s}".format(n=name, s=score))
        
        The clearest two, in my opinion:
        
        6.Pass the values as parameters and print will do it:
        print("Total score for", name, "is", score)
        
        7. If you don't want spaces to be inserted automatically by print in the above example, change the sep parameter:
        print("Total score for ", name, " is ", score, sep='')
        
        If you're using Python 2, you won't be able to use the last two because print isn't a function in Python 2. You can, however, import this behavior from __future__:
        from __future__ import print_function
        
        8. Use the f-string formatting introduced in Python 3.6:
        print(f'Total score for {name} is {score}')
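      • About Python exceptions

      • Exception name	Description
        BaseException	Base class of all exceptions
        SystemExit	The interpreter was asked to exit
        KeyboardInterrupt	The user interrupted execution (usually by typing ^C)
        Exception	Base class of regular errors
        StopIteration	The iterator has no more values
        GeneratorExit	Raised inside a generator to signal that it is being closed
        StandardError	Base class of all built-in standard exceptions (Python 2 only)
        ArithmeticError	Base class of all numeric calculation errors
        FloatingPointError	Floating-point calculation error
        OverflowError	A numeric operation exceeded the maximum limit
        ZeroDivisionError	Division (or modulo) by zero (all data types)
        AssertionError	An assert statement failed
        AttributeError	The object has no such attribute
        EOFError	No built-in input; the EOF marker was reached
        EnvironmentError	Base class of operating system errors (alias of OSError in Python 3)
        IOError	Input/output operation failed (alias of OSError in Python 3)
        OSError	Operating system error
        WindowsError	System call failed (Windows only; alias of OSError in Python 3)
        ImportError	Failed to import a module or object
        LookupError	Base class of invalid data lookups
        IndexError	No such index in the sequence
        KeyError	No such key in the mapping
        MemoryError	Out of memory (not fatal to the Python interpreter)
        NameError	Undeclared or uninitialized object (no attribute)
        UnboundLocalError	Access to an uninitialized local variable
        ReferenceError	A weak reference tried to access an object that was already garbage collected
        RuntimeError	General runtime error
        NotImplementedError	Method not implemented yet
        SyntaxError	Python syntax error
        IndentationError	Indentation error
        TabError	Tabs and spaces mixed
        SystemError	General interpreter system error
        TypeError	Operation invalid for the type
        ValueError	Invalid argument passed
        UnicodeError	Unicode-related error
        UnicodeDecodeError	Error during Unicode decoding
        UnicodeEncodeError	Error during Unicode encoding
        UnicodeTranslateError	Error during Unicode translation
        Warning	Base class of warnings
        DeprecationWarning	Warning about deprecated features
        FutureWarning	Warning about constructs whose semantics will change in the future
        OverflowWarning	Old warning about automatic promotion to long (no longer in current Python)
        PendingDeprecationWarning	Warning about features that will be deprecated
        RuntimeWarning	Warning about suspicious runtime behavior
        SyntaxWarning	Warning about suspicious syntax
        UserWarning	Warning generated by user code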
      • How To Best Use Try Except In Python – Especially For Beginners
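A short sketch of how the hierarchy in the table is used with try/except: catching a base class also catches its subclasses. The helper functions `safe_divide` and `lookup` are hypothetical names for illustration.

```python
def safe_divide(numerator, denominator):
    """Return the quotient, or None when dividing by zero."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return None


def lookup(container, key, default="missing"):
    """Catch LookupError, the base class of KeyError and IndexError."""
    try:
        return container[key]
    except LookupError:
        return default


print(safe_divide(6, 3))        # 2.0
print(safe_divide(1, 0))        # None
print(lookup({'a': 1}, 'b'))    # missing  (KeyError)
print(lookup([10, 20], 5))      # missing  (IndexError)
print(issubclass(KeyError, LookupError))              # True
print(issubclass(ZeroDivisionError, ArithmeticError)) # True
```

Catching the narrowest class that fits keeps unrelated errors visible instead of silently swallowing them.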

      • Questions about Python

      • Where a method 'correctly' goes, and whether parentheses are needed
      • list.method(object)

        • x.append(4)                     # modifies x in place, appending 4
      • dict2 = dict1.method()

        • z = x.copy()                      # copies x into z; why not copy(x) or x.copy?

        • df.info()                            # info of df (a DataFrame)

      • module.attribute

        • sys.platform                    # an attribute (a string), not a method; sys.platform() raises TypeError

      • df.method(file_name)

        • df.to_csv(file_name)       # why not to_csv(file_name, df) or file_name.to_csv(df)?
        • pd.to_numeric(s)            # returns s (strings) converted to numeric
      • l = function(s)

        • l = len(s)                           # stores the length of string s in l

      • x.counter = 1

        • x.counter = 1                   # x is an instance of class MyClass; not x(counter)=1
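Some of these questions can be answered by experiment. The sketch below shows the difference between naming a method, calling it, reading a plain attribute, and calling a built-in function; MyClass is a hypothetical empty class.

```python
import sys

x = [3, 1, 2]

# x.copy without parentheses is the method object; x.copy() calls it.
bound_method = x.copy
z = x.copy()
print(callable(bound_method))   # True
print(z == x, z is x)           # True False

# sys.platform is a plain string attribute, so calling it fails.
print(type(sys.platform).__name__)   # str
try:
    sys.platform()
except TypeError:
    call_failed = True
print(call_failed)              # True

# len(s) is a built-in function that calls the object's __len__ method.
print(len(x), x.__len__())      # 3 3


class MyClass:
    pass


# Attributes on an instance are set with dot notation.
instance = MyClass()
instance.counter = 1
print(instance.counter)         # 1
```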

      • Learning Python

      • by Mark Lutz
      • Self-study notes
      • 4/24/2018
      • References

      • Binary Bytes Files

      • Python struct module can both create and unpack packed binary data

      • # coding: utf-8
        # In[1]:
        import struct
        packed = struct.pack('>i4sh', 7, b'spam', 8)
        packed
        # Out[1]:
        b'\x00\x00\x00\x07spam\x00\x08'
        
        # In[2]:
        file = open('binary_data.bin', 'wb')
        file.write(packed)
        file.close()
        
        
        # In[3]:
        data = open('binary_data.bin', 'rb').read()
        data
        # Out[3]:
        b'\x00\x00\x00\x07spam\x00\x08'
        
        # In[4]:
        list(data)
        # Out[4]:
        [0, 0, 0, 7, 115, 112, 97, 109, 0, 8]
        
        # In[5]:
        struct.unpack('>i4sh', data)
        # Out[5]:
        (7, b'spam', 8)
      • Mastering Python

      • by Rick van Hattem
      • A book that can teach you about the more advanced techniques possible within Python
      • Self-study notes
      • 4/5/2018
      • References

      • Chap 1 :Getting Started

      • One Environment per Project
      • Getting a virtual Python environment using venv
      • Bootstrapping pip using ensurepip
      • Installing packages based on distutils (C/C++) with pip
      • Creating a virtual Python Environment

      • venv :

        • distributed with Python since 3.3; simple and straightforward, with no features besides the bare necessities

      • # pyvenv test_venv
        # . ./test_venv/bin/activate
        (test_venv) #
      • or preferably
      • # python3 -m venv test_venv
        # . ./test_venv/bin/activate
        (test_venv) #
      • virtualenv :

        • the most significant difference is the wide variety of Pythons that virtualenv supports.

        • Supports convenient wrappers such as virtualenvwrapper (http://virtualenvwrapper.readthedocs.org/)
      • Manual pip install

      • Download get-pip.py file: http://bootstrap.pypa.io/get-pip.py

      • Execute the get-pip.py file : 
      • # python get-pip.py
      • Install C/C++ packages

      • Debian and Ubuntu
      • # sudo apt-get install python3-dev
      • However, this installs the development headers only ("Python.h"). If you want the compiler and other headers bundled with the install, then the build-dep command is also very useful. Here is an example
      • # sudo apt-get build-dep python3
      • Red Hat, CentOS, and Fedora
      • # sudo yum install python3-devel
        # sudo yum-builddep python3
      • OS X / Windows : read the book
      • Data Science from Scratch

      • - First Principles with Python by Joel Grus (self-study notes)

      • 4/6/2018
      • Import

      • About modules

      • After importing a module, you use its features by prefixing them with the module name
      • import re
        my_regex = re.compile("[0-9]+", re.I)
      • If your code already uses the name re for something else, import the module under an alias
      • import re as regex
        my_regex = regex.compile("[0-9]+", regex.I)
      • When a module name is long, an alias is the natural choice
      • import matplotlib.pyplot as plt
      • If you need only specific values from a module, import them explicitly
      • from collections import defaultdict, Counter
        lookup = defaultdict(int)
        my_counter = Counter()
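      • Commonly used modules

      • Python 2.7 uses integer division by default, so 5 / 2 = 2. We usually want floating-point division, which requires
      • from __future__ import division 
      • After that, when integer division is needed, use the double slash: 5 // 2.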
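In Python 3 this behavior is the default, so no __future__ import is needed. A quick check, with illustrative variable names:

```python
true_quotient = 5 / 2       # true division always returns a float
floor_quotient = 5 // 2     # floor division rounds toward negative infinity
negative_floor = -5 // 2
quotient, remainder = divmod(5, 2)

print(true_quotient)        # 2.5
print(floor_quotient)       # 2
print(negative_floor)       # -3
print(quotient, remainder)  # 2 1
print(4 / 2)                # 2.0  (still a float)
```

The -3 result is worth noting: // floors rather than truncates, so it differs from int(-5 / 2), which gives -2.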
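      • Functions

      • Python functions are first-class, meaning they are objects like any other value: we can assign a function to a variable and pass it as an argument
      • This looks odd at first; here f is a function
      • def double(x):
            return x * 2
        
        def apply_to_one(f):
            return f(1)
        
        my_double = double
        x = apply_to_one(my_double)    # equals 2 
      • So the original name works just as well, which is quite a challenge to intuition
      • my_double = double
        x = apply_to_one(double)       # also equals 2; apply_to_one(double(1)) would fail, since double(1) is the number 2, not a function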
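A runnable sketch of what "first-class" allows: functions can be assigned, passed as arguments, and stored in containers. The `operations` dict and the lambdas are illustrative additions.

```python
def double(x):
    return x * 2


def apply_to_one(f):
    """Call whatever function is passed in with the argument 1."""
    return f(1)


my_double = double                    # a second name for the same function
print(my_double is double)            # True
print(apply_to_one(my_double))        # 2
print(apply_to_one(lambda x: x + 4))  # 5

# Functions can be stored in data structures like any other value.
operations = {'double': double, 'negate': lambda x: -x}
results = {name: f(10) for name, f in operations.items()}
print(results)                        # {'double': 20, 'negate': -10}

# Passing the result of a call instead of the function itself fails.
try:
    apply_to_one(double(1))           # double(1) is the int 2
except TypeError:
    not_callable = True
print(not_callable)                   # True
```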
      • Function parameters can also have default values

      • For example
      • def my_print(message="my default message"):
            print(message)
        
        my_print("hello")     # prints 'hello'
        my_print()            # prints 'my default message'

      Python Self-Study Notes

      By Marconi Jiang
