Python 4 Stock

by Marconi Jiang

4/7/2018

about me

EE degree back in 1982

Z80 was the most popular CPU

Pascal/Fortran/COBOL were popular languages

Apple ][ + BASIC and CP/M

Intel 80386SX PC motherboard designer

......

Interested in Linux since 2016

Z80 CPU

Intel 80386SX CPU

photo source: wikipedia.org

Apple ][

marconi.jiang@gmail.com

References

FinLab: "超簡單用python抓取每月營收" (Fetching monthly revenue with Python, the easy way)

Vocabulary


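Over the long holiday these past few days, I came across this site:

FinLab: "超簡單用python抓取每月營收" (Fetching monthly revenue with Python, the easy way)

With some free time on my hands,

I spent a while cramming Python,

so that I can analyze the stock market more efficiently.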

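Problems I ran into

The original program

What data structure does it actually use? A pandas DataFrame.

It uses Chinese column labels. Possibly because of an encoding problem, I could not look up the corresponding columns, so I renamed the columns in English.

Sorting by mom gave odd results. A DataFrame column holds a single data type, and here every column except 當月營收 was stored as object, that is, as text strings, so the sort was comparing text rather than numbers.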
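Why sorting text "looks odd" can be seen without pandas at all. The sketch below uses made-up month-over-month values (not taken from the MOPS table) and only illustrates the difference between sorting strings and sorting numbers.

```python
# Hypothetical month-over-month values as they arrive from the HTML table: text.
mom_text = ['9.5', '10.2', '-3.1', '100.0']

# Sorting text compares character by character, so '100.0' lands before '9.5'.
as_text = sorted(mom_text)

# Converting to float first gives the numeric order that was intended.
as_numbers = sorted(float(v) for v in mom_text)

print(as_text)     # ['-3.1', '10.2', '100.0', '9.5']
print(as_numbers)  # [-3.1, 9.5, 10.2, 100.0]
```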

import pandas as pd
import requests
from io import StringIO
import time
def monthly_report(year, month):

    # Convert a Western (Gregorian) year to an ROC (Minguo) year
    if year > 1990:
        year -= 1911

    url = 'http://mops.twse.com.tw/nas/t21/sii/t21sc03_'+str(year)+'_'+str(month)+'_0.html'
    if year <= 98:
        url = 'http://mops.twse.com.tw/nas/t21/sii/t21sc03_'+str(year)+'_'+str(month)+'.html'

    # Pretend to be a browser
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

    # Download the page for that year and month and let pandas parse it into DataFrames.
    # headers must be passed by keyword: the second positional argument of requests.get is params.
    r = requests.get(url, headers=headers)
    r.encoding = 'big5'
    html_df = pd.read_html(StringIO(r.text))

    # Clean up the data
    if html_df[0].shape[0] > 500:
        df = html_df[0].copy()
    else:
        df = pd.concat([t for t in html_df if t.shape[1] <= 11])
    df = df[list(range(0,10))]
    column_index = df.index[(df[0] == '公司代號')][0]
    df.columns = df.iloc[column_index]
    df['當月營收'] = pd.to_numeric(df['當月營收'], errors='coerce')
    df = df[~df['當月營收'].isnull()]
    df = df[df['公司代號'] != '合計']

    # Pause, to mimic a human visitor
    time.sleep(5)
    return df

>>> y107m02=monthly_report(107, 2)
>>> y107m02.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 844 entries, 5 to 983
Data columns (total 10 columns):
公司代號         844 non-null object
公司名稱         844 non-null object
當月營收         844 non-null float64
上月營收         844 non-null object
去年當月營收       844 non-null object
上月比較增減(%)    843 non-null object
去年同月增減(%)    844 non-null object
當月累計營收       844 non-null object
去年累計營收       844 non-null object
前期比較增減(%)    844 non-null object
dtypes: float64(1), object(9)
memory usage: 72.5+ KB

y107m02.columns =(['id', 'name', 'm_rev', 'm-1_rev', 'y-1_rev', 'mom', 'yoy', 'cum_rev', 'y-1_cum_rev', 'cum_yoy'])
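Assigning a list to .columns relies on the columns arriving in a fixed order. As an alternative sketch (not the original author's approach), a dictionary passed to rename() names each column explicitly. The two-row frame below is hypothetical and reuses only the labels seen in the info() output.

```python
import pandas as pd

# Hypothetical two-row frame using column labels shown in the info() output.
df = pd.DataFrame({'公司代號': ['1101', '1102'],
                   '當月營收': [100.0, 200.0],
                   '上月比較增減(%)': ['9.5', '-3.1']})

# Map each original label to its English name; labels missing from the dict are left unchanged.
english = {'公司代號': 'id', '當月營收': 'm_rev', '上月比較增減(%)': 'mom'}
renamed = df.rename(columns=english)

print(list(renamed.columns))  # ['id', 'm_rev', 'mom']
```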

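Data format conversion

Sorting by mom gave odd results because the columns were stored as object (text strings) rather than as numbers.
At first I found the Stack Overflow question "Converting strings to floats in a DataFrame", which uses pd.to_numeric(s, errors='ignore'),
but that cannot be applied to a whole DataFrame; it only accepts a list, tuple, 1-d array, or Series. See the pandas.to_numeric documentation.

What worked was DataFrame.convert_objects(convert_numeric=True), which is deprecated; infer_objects() is the recommended replacement going forward.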

import pandas as pd
s = pd.Series(['1.0', '2', -3])
pd.to_numeric(s)
s = pd.Series(['apple', '1.0', '2', -3])
pd.to_numeric(s, errors='ignore')
pd.to_numeric(s, errors='coerce')

>>> y107m02.to_numeric(s, errors='ignore')

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-37d52fefc6f9> in <module>()
----> 1 y107m02.to_numeric(s, errors='ignore')

/Volumes/HDD160G/Applications/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in __getattr__(self, name)
   3612             if name in self._info_axis:
   3613                 return self[name]
-> 3614             return object.__getattribute__(self, name)
   3615 
   3616     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'to_numeric'

>>> y107m02.convert_objects(convert_numeric=True)
/Volumes/HDD160G/Applications/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:1: FutureWarning: convert_objects is deprecated.  To re-infer data dtypes for object columns, use DataFrame.infer_objects()
For all other conversions use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  """Entry point for launching an IPython kernel.

Data format conversion - infer_objects()

I have not managed to get the recommended infer_objects() to work so far, so I am setting it aside for now.
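A likely reason is that infer_objects() only performs a soft conversion and does not parse text such as '9.5' into a number. The sketch below, on a hypothetical frame with made-up values, contrasts it with applying pd.to_numeric column by column, which is what the FutureWarning suggests for this kind of conversion.

```python
import pandas as pd

# Hypothetical sample mimicking the scraped table: numbers stored as text.
df = pd.DataFrame({
    'id': ['1101', '1102', '1103'],
    'name': ['A', 'B', 'C'],
    'mom': ['9.5', '-3.1', '100.0'],
    'yoy': ['12.0', 'n/a', '7.25'],
})

# infer_objects() leaves text columns as text; nothing becomes numeric here.
soft = df.infer_objects()

# Convert only the chosen columns; cells that cannot be parsed become NaN.
numeric_cols = ['mom', 'yoy']
converted = df.copy()
converted[numeric_cols] = converted[numeric_cols].apply(pd.to_numeric, errors='coerce')

print(soft.dtypes)
print(converted.sort_values('mom', ascending=False))
```

Listing the numeric columns by hand keeps id and name as text, which avoids turning stock codes into integers by accident.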

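Reading stock prices

This also started from the FinLab site: "用數學計算日馳何時崩盤" (Using math to work out when 日馳 will crash).

But I hit the error "ModuleNotFoundError: No module named 'pandas_datareader'". Stack Overflow is my favorite debugging aid: I ran the conda command below in the CLI, and also quit and restarted Anaconda to make sure the module was loaded.
Running it again still failed, this time because of a problem with the Yahoo API.

Fortunately there is a fix: "Yahoo! Finance Fix for Pandas Datareader".
First run the pip command below in the CLI,
then modify the program in Anaconda.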

$ conda install -c anaconda pandas-datareader

# Fetch stock prices
from pandas_datareader import data # pip install pandas_datareader
import matplotlib.pyplot as plt    # pip install matplotlib
import pandas as pd                # pip install pandas
%matplotlib inline

data = data.DataReader("1526.tw", "yahoo", "2017-04-01","2018-01-10")
c = data['Close']
c.plot()

ImmediateDeprecationError: 
Yahoo Daily has been immediately deprecated due to large breaks in the API without the
introduction of a stable replacement. Pull Requests to re-enable these data
connectors are welcome.

See https://github.com/pydata/pandas-datareader/issues

from pandas_datareader import data as pdr

import fix_yahoo_finance as yf
yf.pdr_override() # <== that's all it takes :-)

# download dataframe
data = pdr.get_data_yahoo("SPY", start="2017-01-01", end="2017-04-30")

$ pip install fix_yahoo_finance --upgrade --no-cache-dir

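Scraping stock market data with Python

- Monthly revenue data for listed / OTC / emerging-market companies

- Yahoo historical price data

- P/E ratio data for listed / OTC stocks on a chosen date

- Prices and trading information for listed stocks on a chosen date

- Prices and trading information for OTC stocks on a chosen date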

MOPS revenue data for listed companies

MOPS download program (only part is shown; scroll down to see the rest)

From ROC year 99 onward: http://mops.twse.com.tw/nas/t21/sii/t21sc03_{year}_{month}_0.html
Example: http://mops.twse.com.tw/nas/t21/sii/t21sc03_107_2_0.html

ROC year 98 and earlier: http://mops.twse.com.tw/nas/t21/sii/t21sc03_{year}_{month}.html

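# Function to download MOPS monthly revenue data for listed companies
import time
from io import StringIO

import pandas as pd
import requests

def monthly_report(year, month):

    # Convert a Western (Gregorian) year to an ROC (Minguo) year
    if year > 1990:
        year -= 1911

    url = 'http://mops.twse.com.tw/nas/t21/sii/t21sc03_'+str(year)+'_'+str(month)+'_0.html'
    if year <= 98:
        url = 'http://mops.twse.com.tw/nas/t21/sii/t21sc03_'+str(year)+'_'+str(month)+'.html'

    # Pretend to be a browser
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

    # Download the page for that year and month and let pandas parse it into DataFrames.
    # headers must be passed by keyword: the second positional argument of requests.get is params.
    r = requests.get(url, headers=headers)
    r.encoding = 'big5'
    html_df = pd.read_html(StringIO(r.text))

    # Clean up the data
    if html_df[0].shape[0] > 500:
        df = html_df[0].copy()
    else:
        df = pd.concat([t for t in html_df if t.shape[1] <= 11])
    df = df[list(range(0,10))]
    column_index = df.index[(df[0] == '公司代號')][0]
    df.columns = df.iloc[column_index]
    df['當月營收'] = pd.to_numeric(df['當月營收'], errors='coerce')
    df = df[~df['當月營收'].isnull()]
    df = df[df['公司代號'] != '合計'].copy()

    # Pause, to mimic a human visitor
    time.sleep(5)
    df.columns = ['id', 'name', 'm_rev', 'm-1_rev', 'y-1_rev', 'mom', 'yoy',
                  'cum_rev', 'y-1_cum_rev', 'cum_yoy']
    # convert_objects() is deprecated, so convert the numeric columns explicitly
    # (id and name stay as text; unparsable cells become NaN)
    numeric_cols = ['m-1_rev', 'y-1_rev', 'mom', 'yoy', 'cum_rev', 'y-1_cum_rev', 'cum_yoy']
    df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors='coerce')
    return df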

Where to find OTC / emerging-market revenue data

Simple: just change sii in the MOPS URL to otc or rotc.
http://mops.twse.com.tw/nas/t21/sii/t21sc03_{year}_{month}_0.html
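The URL rules can be collected into one small helper. This is only a sketch: mops_url and its market parameter are hypothetical names, and it assumes the year-98 file-name rule applies to otc and rotc in the same way as to sii, which the source does not confirm.

```python
def mops_url(year, month, market='sii'):
    """Build a MOPS monthly-revenue URL.

    market: 'sii' (listed), 'otc' (OTC) or 'rotc' (emerging market).
    """
    if market not in ('sii', 'otc', 'rotc'):
        raise ValueError('unknown market: ' + market)
    # Accept either a Western year (2018) or an ROC year (107).
    if year > 1990:
        year -= 1911
    # From ROC year 99 onward the file name carries a '_0' suffix.
    suffix = '_0.html' if year >= 99 else '.html'
    return 'http://mops.twse.com.tw/nas/t21/{}/t21sc03_{}_{}{}'.format(
        market, year, month, suffix)

print(mops_url(2018, 2))               # listed companies, February of ROC year 107
print(mops_url(107, 2, market='otc'))  # same month, OTC companies
print(mops_url(98, 12))                # older file-name pattern
```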

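Yahoo historical price data

Price-fetching program (only part is shown; click through to view)

Main tickers
- Hang Seng Index ^HSI
  Taiwan Weighted Index (TAIEX) ^TWII
  Shanghai Composite 000001.SS
- Dow Jones Industrial Average ^DJI
  NASDAQ Composite ^IXIC
  PHLX Semiconductor Index ^SOX
- Index tickers for other countries
- Search all Yahoo ticker symbols
- Googling "台指期代號 yahoo finance" (TAIEX futures ticker) turns up some useful pages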

# Fetch stock prices
from pandas_datareader import data as pdr
import fix_yahoo_finance as yf
yf.pdr_override() # <== that's all it takes :-)

import matplotlib.pyplot as plt    # pip install matplotlib
import pandas as pd                # pip install pandas
%matplotlib inline

companytwid = '2330.tw'
stockprices = pdr.get_data_yahoo(companytwid, start="2017-01-01",end="2018-04-08")

c = stockprices['Close']
c.plot()
# End of price fetching

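Screening stocks by P/E ratio

This code can basically be copied, pasted, and used as is!

Print df and you can see that day's data for all listed and OTC stocks.
Next comes stock picking: we want all stocks with a P/E ratio below 15.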

import requests
from io import StringIO
import pandas as pd
import numpy as np
datestr = '20180131'
r = requests.post('http://www.twse.com.tw/exchangeReport/MI_INDEX?response=csv&date=' + datestr + '&type=ALL')
df = pd.read_csv(StringIO("\n".join([i.translate({ord(c): None for c in ' '}) 
                                     for i in r.text.split('\n') 
                                     if len(i.split('",')) == 17 and i[0] != '='])), header=0)

df[pd.to_numeric(df['本益比'], errors='coerce') < 15]

Reference: "超簡單台股每日爬蟲教學" (A super simple daily Taiwan stock scraper tutorial)
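How the P/E filter treats missing values is easier to see on a tiny frame. The rows below are invented, and the placeholder strings '-' and '0.00' are assumptions about how a feed might mark a missing ratio; check the actual download before relying on either.

```python
import pandas as pd

# Hypothetical rows; the placeholder strings '-' and '0.00' are assumptions.
df = pd.DataFrame({'證券代號': ['0001', '0002', '0003', '0004'],
                   '本益比': ['12.50', '-', '31.80', '0.00']})

pe = pd.to_numeric(df['本益比'], errors='coerce')  # '-' becomes NaN

loose = df[pe < 15]                # NaN compares False and is dropped; 0.00 is kept
strict = df[(pe > 0) & (pe < 15)]  # also drops rows whose ratio is reported as 0

print(loose['證券代號'].tolist())
print(strict['證券代號'].tolist())
```

If the real data does report 0.00 for companies without earnings, the stricter condition keeps them out of a "cheap stocks" list.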

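Prices and trading information for listed stocks on a chosen date

This code can basically be copied, pasted, and used as is!

Print df and you can see that day's data for all listed and OTC stocks.
Next comes stock picking: we want all listed and OTC stocks that rose more than 1% that day.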

import requests
from io import StringIO
import pandas as pd
import numpy as np
datestr = '20180131'
r = requests.post('http://www.twse.com.tw/exchangeReport/MI_INDEX?response=csv&date=' + datestr + '&type=ALL')
df = pd.read_csv(StringIO("\n".join([i.translate({ord(c): None for c in ' '}) 
                                     for i in r.text.split('\n') 
                                     if len(i.split('",')) == 17 and i[0] != '='])), header=0)

df[(df['漲跌(+/-)'] == '+') & 
   ((pd.to_numeric(df['漲跌價差'], errors='coerce')/pd.to_numeric(df['收盤價'], errors='coerce')) > 0.01)]

Reference: "超簡單台股每日爬蟲教學" (A super simple daily Taiwan stock scraper tutorial)
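The filter above divides the price change by the same day's close. A daily gain is more commonly measured against the previous close, which makes a small difference near the 1% threshold. The helper below is a hypothetical sketch; it assumes the 漲跌價差 column is always non-negative, with the direction given by the 漲跌(+/-) column.

```python
def pct_gain(close, change, sign):
    """Return the day's move as a fraction of the previous close.

    close  -- closing price
    change -- absolute price difference (assumed non-negative)
    sign   -- '+' or '-' from the 漲跌(+/-) column; anything else means unchanged
    """
    signed = {'+': change, '-': -change}.get(sign, 0.0)
    prev_close = close - signed
    return signed / prev_close

# A stock that went from 100 to 101 is up exactly 1% ...
print(round(pct_gain(101.0, 1.0, '+'), 4))  # 0.01
# ... but change / close comes out slightly under the 0.01 threshold.
print(round(1.0 / 101.0, 4))                # 0.0099
```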

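Prices and trading information for OTC stocks on a chosen date

This code can basically be copied, pasted, and used as is!

Print df and you can see that day's data for all listed and OTC stocks.
Next comes stock picking: we want all listed and OTC stocks that rose more than 1% that day.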

import requests
from io import StringIO
import pandas as pd
import numpy as np
datestr = '20180131'
r = requests.post('http://www.twse.com.tw/exchangeReport/MI_INDEX?response=csv&date=' + datestr + '&type=ALL')
df = pd.read_csv(StringIO("\n".join([i.translate({ord(c): None for c in ' '}) 
                                     for i in r.text.split('\n') 
                                     if len(i.split('",')) == 17 and i[0] != '='])), header=0)

df[(df['漲跌(+/-)'] == '+') & 
   ((pd.to_numeric(df['漲跌價差'], errors='coerce')/pd.to_numeric(df['收盤價'], errors='coerce')) > 0.01)]

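Reference: "Python上櫃資料爬蟲輕鬆做" (Scraping OTC data with Python made easy)

"用 Python 理財：打造小資族選股策略" (Personal finance with Python: building a stock-picking strategy for small investors)

Study notes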

Installing TA-Lib https://github.com/mrjbq7/ta-lib

First you need the underlying TA-Lib C library:
- you can get it with brew install ta-lib if you are using Homebrew (http://brew.sh/), or
- by downloading ta-lib-0.4.0-src.tar.gz from http://ta-lib.org/hdr_dw.html and running ./configure; make; make install.
Then install the Python package (in Python or Anaconda):
- Anaconda's default virtual environment is 'base'

Reference documents:

macbookpro:~ apple$ source activate base
(base) macbookpro:~ apple$ pip install ta-lib

Building an Anaconda Python development environment

on Ubuntu (in a VirtualBox virtual machine on Windows)

Study log

2018/09/15

Step by Step - Anaconda Installation

Step 1: Install Ubuntu under Windows VirtualBox
- Increase the display resolution
Step 2: Install Dropbox
- Install Dropbox headless from the command line
- In the Dropbox menu, enable "Start Dropbox on system startup"
Step 3: Install Anaconda
- Download Anaconda
- Run the installer
- Close and reopen your terminal window for the installation to take effect, or enter the command source ~/.bashrc
- After the install is complete, verify it by opening Anaconda Navigator, a program included with Anaconda: open a terminal window and type anaconda-navigator.

$ cd ~ && wget -O - "https://www.dropbox.com/download?plat=lnx.x86_64" | tar xzf -

$ ~/.dropbox-dist/dropboxd

$ bash ~/Downloads/Anaconda3-5.2.0-Linux-x86_64.sh

$ source ~/.bashrc

$ anaconda-navigator

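Step by Step - Ubuntu Chinese Settings

Ubuntu Chinese input setup
- Click En in the top-right corner of Ubuntu and choose Text Entry Settings
- Add (+) Chinese (Bopomofo) (IBus), or another input method
- Use Super+Space to switch between Chinese and English input (Super is the Windows key)
- Press Shift to switch temporarily between Chinese and English mode
- Sublime Text does not work with the Chinese setup without complicated configuration; not dealing with it for now
Adding Chinese fonts
- https://www.pinyinjoe.com/linux/ubuntu-10-chinese-fonts-openoffice-language-features.htm
- Download fonts from Adobe: https://source.typekit.com/source-han-serif/tw/#get-the-fonts
After all that effort, everything above turned out to be wasted work; following this one article is enough.
It turns out you only need to download SimHei.ttf and save it under anaconda3/lib/python3.7/site-packages/matplotlib/mpl-data/fonts/ttf
Then edit anaconda3/lib/python3.7/site-packages/matplotlib/mpl-data/matplotlibrc
- font.family : sans-serif (remove the leading #)
- font.sans-serif : add SimHei at the front of the list
Delete the ~/.matplotlib directory
Restart the IDE; the first time Matplotlib is used, it rebuilds everything under ~/.matplotlib.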

$ cd ~/anaconda3/lib/python3.7/site-packages/matplotlib/mpl-data/fonts/ttf
$ wget https://github.com/StellarCN/scp_zh/raw/master/fonts/SimHei.ttf
$ vi ~/anaconda3/lib/python3.7/site-packages/matplotlib/mpl-data/matplotlibrc
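The same two settings can also be applied from Python for a single session, without editing matplotlibrc. This is a sketch of an alternative, not the procedure from the referenced article, and it assumes SimHei.ttf is already installed where Matplotlib can find it.

```python
import matplotlib

# Put SimHei first in the sans-serif fallback list, for this session only.
# Assumes SimHei.ttf is already installed where Matplotlib can find it.
current = list(matplotlib.rcParams['font.sans-serif'])
matplotlib.rcParams['font.family'] = 'sans-serif'
matplotlib.rcParams['font.sans-serif'] = ['SimHei'] + [f for f in current if f != 'SimHei']

# Commonly set alongside CJK fonts so minus signs are drawn as ASCII '-'.
matplotlib.rcParams['axes.unicode_minus'] = False

print(matplotlib.rcParams['font.sans-serif'][:3])
```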

Step by Step - Anaconda Set up for Python

Step 4: Install Python packages on Anaconda
Update Anaconda
Packages needed by 帳號管理及紀律.ipynb
- Install pandas (pandas-datareader)
  - An "import is_list_like" error appears; fix it by editing fred.py
- Install fix_yahoo_finance
  - "No module named 'fix_yahoo_finance'" appears; run the pip command below
Packages needed by 20180522 growth.ipynb
- Install numpy

$ conda update conda
$ conda update anaconda-navigator
$ conda update navigator-updater

$ conda install -c anaconda pandas-datareader
$ vi ~/anaconda3/lib/python3.7/site-packages/pandas_datareader/fred.py
$ pip install fix_yahoo_finance --upgrade --no-cache-dir
$ conda install numpy

Stock Market Predictions with LSTM in Python

DataCamp

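Market index analysis with Python

- Calculating when the market will crash

Calculating when the market will crash

Applied to TAIEX (Taiwan weighted index) data; the program is saved at /Dropbox/StockPythonSource/lppl4twii.py
The run in early April predicted a crash in early August (see the daily market log). Re-running it now, in early May, it says another 100 days are needed, so the crash would come around October.
The data collection started from 2017/1/1. I felt the rally had already begun by then and the window might need to be longer, so I extended it back to 2016/1/1 and ran it again. The result was odd: it came out as 398 days, yet 520 days had already passed, meaning the crash should have happened 6 months earlier (2017/12).
I then tried using just one year of data, from 2017/5/4 to 2018/5/4: the result was 314 days with 245 days already elapsed, so the crash is predicted 71 days from now, around the end of August.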
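Turning a remaining day count into a calendar date depends on what a "day" is. The sketch below assumes the counts are trading days (the source does not say) and skips only weekends, so market holidays would push the real date somewhat later. add_trading_days is a hypothetical helper, not part of lppl4twii.py.

```python
from datetime import date, timedelta

def add_trading_days(start, n):
    """Return the date n weekdays after start (Mon-Fri only; holidays ignored)."""
    d = start
    while n > 0:
        d += timedelta(days=1)
        if d.weekday() < 5:  # 0-4 are Monday to Friday
            n -= 1
    return d

# Figures from the one-year run: 71 days remaining as of 2018/5/4.
estimate = add_trading_days(date(2018, 5, 4), 71)
print(estimate)
```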

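Reference: "用數學計算日馳何時崩盤！" (Using math to work out when 日馳 will crash!). The part of the original program that reads Yahoo data needs to be modified.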
