by Marconi Jiang
4/7/2018
EE degree back in 1982
Z80 was the most popular CPU
Pascal/Fortran/COBOL were popular languages
Apple ][ + BASIC and CP/M
Intel 80386SX PC motherboard designer
......
Interested in Linux since 2016
Z80 CPU
Intel 80386SX CPU
photo source: wikipedia.org
Apple ][
marconi.jiang@gmail.com
import pandas as pd
import requests
from io import StringIO
import time

def monthly_report(year, month):
    # Convert a Western (Gregorian) year to an ROC (Minguo) year
    if year > 1990:
        year -= 1911

    url = 'http://mops.twse.com.tw/nas/t21/sii/t21sc03_'+str(year)+'_'+str(month)+'_0.html'
    if year <= 98:
        url = 'http://mops.twse.com.tw/nas/t21/sii/t21sc03_'+str(year)+'_'+str(month)+'.html'

    # Pretend to be a browser
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

    # Download the page for that year and month and let pandas parse it into DataFrames.
    # headers must be passed by keyword; the second positional argument of requests.get is params.
    r = requests.get(url, headers=headers)
    r.encoding = 'big5'
    html_df = pd.read_html(StringIO(r.text))

    # Clean up the data
    if html_df[0].shape[0] > 500:
        df = html_df[0].copy()
    else:
        df = pd.concat([df for df in html_df if df.shape[1] <= 11])
    df = df[list(range(0, 10))]
    column_index = df.index[(df[0] == '公司代號')][0]   # header row ('公司代號' = company code)
    df.columns = df.iloc[column_index]
    df['當月營收'] = pd.to_numeric(df['當月營收'], errors='coerce')   # '當月營收' = revenue this month
    df = df[~df['當月營收'].isnull()]
    df = df[df['公司代號'] != '合計']   # drop the '合計' (total) rows

    # Pause between requests
    time.sleep(5)
    return df

>>> y107m02=monthly_report(107, 2)
>>> y107m02.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 844 entries, 5 to 983
Data columns (total 10 columns):
公司代號 844 non-null object
公司名稱 844 non-null object
當月營收 844 non-null float64
上月營收 844 non-null object
去年當月營收 844 non-null object
上月比較增減(%) 843 non-null object
去年同月增減(%) 844 non-null object
當月累計營收 844 non-null object
去年累計營收 844 non-null object
前期比較增減(%) 844 non-null object
dtypes: float64(1), object(9)
memory usage: 72.5+ KB
y107m02.columns = ['id', 'name', 'm_rev', 'm-1_rev', 'y-1_rev', 'mom', 'yoy', 'cum_rev', 'y-1_cum_rev', 'cum_yoy']
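The year and URL rule inside monthly_report can be checked on its own, without touching the network. The sketch below pulls that rule into two small helpers. The names to_roc_year and build_report_url are hypothetical (they are not in the original code), and the URL pattern is copied from the function above, so it only reflects the site as it was when these slides were written.

```python
BASE = "http://mops.twse.com.tw/nas/t21/sii/t21sc03_"


def to_roc_year(year):
    """Convert a Gregorian year to an ROC year; leave an ROC year unchanged."""
    return year - 1911 if year > 1990 else year


def build_report_url(year, month):
    """Build the monthly revenue page URL using the same rule as monthly_report."""
    roc_year = to_roc_year(year)
    # Pages up to ROC year 98 have no "_0" suffix in the original code.
    suffix = ".html" if roc_year <= 98 else "_0.html"
    return f"{BASE}{roc_year}_{month}{suffix}"


print(build_report_url(2018, 2))
print(build_report_url(2009, 5))
```

Calling build_report_url(2018, 2) and build_report_url(107, 2) gives the same URL, which is the behaviour the `year > 1990` test is there to provide.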
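At first I found the Stack Overflow question "Converting strings to floats in a DataFrame", which uses pd.to_numeric(s, errors='ignore').
That does not work on a whole DataFrame, though; it only accepts a list, tuple, 1-d array, or Series (see the pandas.to_numeric documentation).
For a DataFrame I had to call its convert_objects(convert_numeric=True) method instead; pandas recommends infer_objects() as the replacement going forward.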
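import pandas as pd
s = pd.Series(['1.0', '2', -3])
pd.to_numeric(s)                     # every element can be parsed
s = pd.Series(['apple', '1.0', '2', -3])
pd.to_numeric(s, errors='ignore')    # input is returned unchanged when parsing fails
pd.to_numeric(s, errors='coerce')    # unparseable values become NaN
>>> y107m02.to_numeric(s, errors='ignore')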
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-4-37d52fefc6f9> in <module>()
----> 1 y107m02.to_numeric(s, errors='ignore')
/Volumes/HDD160G/Applications/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in __getattr__(self, name)
3612 if name in self._info_axis:
3613 return self[name]
-> 3614 return object.__getattribute__(self, name)
3615
3616 def __setattr__(self, name, value):
AttributeError: 'DataFrame' object has no attribute 'to_numeric'
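The error above arises because to_numeric is a function in the pandas namespace, not a DataFrame method. One way around it is to call pd.to_numeric on each column in turn, sketched here on a small synthetic table (the values are made up for illustration and are not from the scraped report). The sketch also calls infer_objects(), which only performs a soft conversion and so leaves columns of numeric-looking strings non-numeric.

```python
import pandas as pd

# Synthetic stand-in for the scraped table; the values are invented for illustration.
df = pd.DataFrame(
    {
        "id": ["1101", "1102"],
        "m_rev": ["100", "250"],
        "mom": ["1.5", "n/a"],
    },
    dtype=object,
)

# Soft conversion only: strings that look like numbers are not parsed.
soft = df.infer_objects()

# Parse the chosen columns one at a time; unparseable cells become NaN.
converted = df.copy()
for col in ["m_rev", "mom"]:
    converted[col] = pd.to_numeric(converted[col], errors="coerce")

print(soft.dtypes)
print(converted.dtypes)
```

Listing the numeric columns explicitly keeps identifier columns such as id as strings, so leading zeros in company codes are not lost.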
>>> y107m02.convert_objects(convert_numeric=True)
/Volumes/HDD160G/Applications/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:1: FutureWarning: convert_objects is deprecated. To re-infer data dtypes for object columns, use DataFrame.infer_objects()
For all other conversions use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
"""Entry point for launching an IPython kernel.
The recommended infer_objects() has not worked for me so far, so I am setting it aside for now.
$ conda install -c anaconda pandas-datareader
# Fetch stock prices
from pandas_datareader import data # pip install pandas_datareader
import matplotlib.pyplot as plt # pip install matplotlib
import pandas as pd # pip install pandas
%matplotlib inline
# Use a separate name for the result so the imported `data` module is not shadowed
prices = data.DataReader("1526.tw", "yahoo", "2017-04-01", "2018-01-10")
c = prices['Close']
c.plot()
ImmediateDeprecationError:
Yahoo Daily has been immediately deprecated due to large breaks in the API without the
introduction of a stable replacement. Pull Requests to re-enable these data
connectors are welcome.
See https://github.com/pydata/pandas-datareader/issues
from pandas_datareader import data as pdr
import fix_yahoo_finance as yf
yf.pdr_override() # <== that's all it takes :-)
# download dataframe
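data = pdr.get_data_yahoo("SPY", start="2017-01-01", end="2017-04-30")
$ pip install fix_yahoo_finance --upgrade --no-cache-dir
- Monthly revenue data for listed, OTC, and emerging-market companies
- Yahoo historical stock price data
- P/E ratio data for listed and OTC stocks on a selected date
- Prices and trading information for listed stocks on a selected date
- Prices and trading information for OTC stocks on a selected date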
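# Function to download MOPS monthly revenue data for listed companies
import time
from io import StringIO

import pandas as pd
import requests

def monthly_report(year, month):
    # Convert a Western (Gregorian) year to an ROC (Minguo) year
    if year > 1990:
        year -= 1911

    url = 'http://mops.twse.com.tw/nas/t21/sii/t21sc03_'+str(year)+'_'+str(month)+'_0.html'
    if year <= 98:
        url = 'http://mops.twse.com.tw/nas/t21/sii/t21sc03_'+str(year)+'_'+str(month)+'.html'

    # Pretend to be a browser
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

    # Download the page for that year and month and let pandas parse it into DataFrames.
    # headers must be passed by keyword; the second positional argument of requests.get is params.
    r = requests.get(url, headers=headers)
    r.encoding = 'big5'
    html_df = pd.read_html(StringIO(r.text))

    # Clean up the data
    if html_df[0].shape[0] > 500:
        df = html_df[0].copy()
    else:
        df = pd.concat([df for df in html_df if df.shape[1] <= 11])
    df = df[list(range(0, 10))]
    column_index = df.index[(df[0] == '公司代號')][0]   # header row ('公司代號' = company code)
    df.columns = df.iloc[column_index]
    df['當月營收'] = pd.to_numeric(df['當月營收'], errors='coerce')   # '當月營收' = revenue this month
    df = df[~df['當月營收'].isnull()]
    df = df[df['公司代號'] != '合計']   # drop the '合計' (total) rows

    # Pause between requests
    time.sleep(5)

    # Rename the columns to short English names
    df.columns = ['id', 'name', 'm_rev', 'm-1_rev', 'y-1_rev', 'mom', 'yoy',
                  'cum_rev', 'y-1_cum_rev', 'cum_yoy']
    # convert_objects() is deprecated (see the FutureWarning shown earlier)
    df = df.convert_objects(convert_numeric=True)
    return df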
# Fetch stock prices
from pandas_datareader import data as pdr
import fix_yahoo_finance as yf
yf.pdr_override() # <== that's all it takes :-)
import matplotlib.pyplot as plt # pip install matplotlib
import pandas as pd # pip install pandas
%matplotlib inline
companytwid = '2330.tw'
stockprices = pdr.get_data_yahoo(companytwid, start="2017-01-01",end="2018-04-08")
c = stockprices['Close']
c.plot()
# End of stock price fetch
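Next comes stock screening: we want to select all stocks with a P/E ratio (本益比) below 15.
import requests
from io import StringIO
import pandas as pd
import numpy as np

datestr = '20180131'
r = requests.post('http://www.twse.com.tw/exchangeReport/MI_INDEX?response=csv&date=' + datestr + '&type=ALL')
# Keep only the lines that split into 17 pieces and do not start with '=', strip spaces, then parse as CSV
df = pd.read_csv(StringIO("\n".join([i.translate({ord(c): None for c in ' '})
                                     for i in r.text.split('\n')
                                     if len(i.split('",')) == 17 and i[0] != '='])), header=0)
# Rows whose P/E ratio is below 15
df[pd.to_numeric(df['本益比'], errors='coerce') < 15]
Reference: 超簡單台股每日爬蟲教學 (A Super-Simple Daily Taiwan Stock Crawler Tutorial)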
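The list comprehension above does three things to the raw response before pandas sees it: it keeps only lines that split into a fixed number of pieces on `",`, it drops lines that begin with `=`, and it removes spaces. The same steps are spelled out below as a plain function and run on a tiny synthetic sample. The sample lines and the name keep_table_rows are made up for illustration, and the sample uses 4 pieces per row rather than the 17 used for the real response.

```python
import csv
from io import StringIO


def keep_table_rows(text, n_pieces):
    """Return only the lines that look like table rows, with spaces removed."""
    kept = []
    for line in text.split("\n"):
        # A table row splits into exactly n_pieces on '",' and does not start with '='.
        if len(line.split('",')) == n_pieces and not line.startswith("="):
            kept.append(line.replace(" ", ""))
    return "\n".join(kept)


# Synthetic input: a title line, a header, one good row, one '=' row, one short row.
sample = "\n".join([
    '"Synthetic report title"',
    '"code","name","close",',
    '"0001","Fund A","80.5",',
    '="0002","Fund B","30.1",',
    '"footnote","two fields only",',
])

cleaned = keep_table_rows(sample, n_pieces=4)
rows = list(csv.reader(StringIO(cleaned)))
print(rows)
```

Each row ends with a trailing comma, so splitting on `",` yields one more piece than there are quoted values; that is why three quoted values are matched with n_pieces=4 here.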
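Next comes stock screening: we want to select all listed and OTC stocks that gained more than 1% on the day.
import requests
from io import StringIO
import pandas as pd
import numpy as np

datestr = '20180131'
r = requests.post('http://www.twse.com.tw/exchangeReport/MI_INDEX?response=csv&date=' + datestr + '&type=ALL')
# Keep only the lines that split into 17 pieces and do not start with '=', strip spaces, then parse as CSV
df = pd.read_csv(StringIO("\n".join([i.translate({ord(c): None for c in ' '})
                                     for i in r.text.split('\n')
                                     if len(i.split('",')) == 17 and i[0] != '='])), header=0)
# Rows marked as up ('+') whose price change divided by the closing price exceeds 1%
df[(df['漲跌(+/-)'] == '+') &
   ((pd.to_numeric(df['漲跌價差'], errors='coerce') / pd.to_numeric(df['收盤價'], errors='coerce')) > 0.01)]
Reference: 超簡單台股每日爬蟲教學 (A Super-Simple Daily Taiwan Stock Crawler Tutorial)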
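The filter above divides the price change by the closing price. If 漲跌價差 is the change from the previous close (an assumption about the exchange's column, not something stated in these notes), a day's gain is more commonly measured against the previous close, which is the close minus the change. The sketch below compares the two with made-up prices; the function names are hypothetical.

```python
def gain_vs_close(close, change):
    """Gain as computed by the screen above: change divided by today's close."""
    return change / close


def gain_vs_previous_close(close, change):
    """Gain measured against the previous close (close minus the change)."""
    previous_close = close - change
    return change / previous_close


# Made-up example: the stock closed at 100.0 after rising 1.0 from 99.0.
close, change = 100.0, 1.0
print(gain_vs_close(close, change) > 0.01)
print(gain_vs_previous_close(close, change) > 0.01)
```

For small moves the two measures are close, so the difference only matters for stocks sitting near the 1% threshold.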
Reference: Python上櫃資料爬蟲輕鬆做 (Easy Python Crawling of OTC Stock Data)
Lessons learned
macbookpro:~ apple$ source activate base
(base) macbookpro:~ apple$ pip install ta-lib
Study log
2018/09/15
Close and reopen your terminal window for the installation to take effect, or enter the command source ~/.bashrc.
After the install is complete, verify it by opening Anaconda Navigator, a program included with Anaconda: open a terminal window and type anaconda-navigator.
$ cd ~ && wget -O - "https://www.dropbox.com/download?plat=lnx.x86_64" | tar xzf -
$ ~/.dropbox-dist/dropboxd
$ bash ~/Downloads/Anaconda3-5.2.0-Linux-x86_64.sh
$ source ~/.bashrc
$ anaconda-navigator
Delete the ~/.matplotlib directory.
Restart the IDE; the first time Matplotlib is used, it rebuilds everything under ~/.matplotlib.
$ cd ~/anaconda3/lib/python3.7/site-packages/matplotlib/mpl-data/fonts/ttf
$ wget https://github.com/StellarCN/scp_zh/raw/master/fonts/SimHei.ttf
$ vi ~/anaconda3/lib/python3.7/site-packages/matplotlib/mpl-data/matplotlibrc
$ conda update conda
$ conda update anaconda-navigator
$ conda update navigator-updater
$ conda install -c anaconda pandas-datareader
$ vi ~/anaconda3/lib/python3.7/site-packages/pandas_datareader/fred.py
$ pip install fix_yahoo_finance --upgrade --no-cache-dir
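$ conda install numpy
- Calculating when the market will crash
Reference: 用數學計算日馳何時崩盤! (Using Math to Calculate When 日馳 Will Crash!) The original program needs changes in the part that reads Yahoo data.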