Do you have Tsundoku?
What is Tsundoku (積ん読)?
buying books and not reading them
Jisho.org
Compound of 積む (tsumu, “to pile up”) + 読 (doku, “reading”), punning on 積んどく (tsundoku), contraction of 積んでおく
(tsunde oku, “to leave piled up”).
Wiktionary
Do you read books on a tablet?
I have so many Kindle books.
(3000+ books, including samples and free books)
I have so many unread Kindle books.
It’s hard to find books in the Kindle UI.
No API for Kindle books.
No OAuth
Build an app to manage Kindle books
Steps to scrape Kindle book data
I split this part out into a library (gem).
Check out the amazon_auth gem.
Steps to scrape Kindle book data
I built a library for this part.
Check out the kindle_manager gem.
I assume users update their data periodically (daily). This library will
Helper for page loading
def wait_for_selector(selector, options = {})
  options.fetch(:wait_time, 3).times do
    if session.first(selector)
      break
    else
      sleep(1)
    end
  end
end
attr_accessor :session
def initialize(options)
  @session = options.fetch(:session, nil)
  @options = options
  ...
end
def doc
  Nokogiri.HTML(session.html)
end
def number_of_fetched_books
  # Capybara method
  wait_for_selector('.contentCount_myx')
  # Nokogiri method
  text = doc.css('.contentCount_myx').text
  ...
end
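The polling idea behind wait_for_selector can be tried without a browser. This is only a minimal sketch, not the gem's code: wait_until and its attempts/interval parameters are hypothetical names. Unlike the helper above, it returns true or false, so the caller can tell whether the condition was ever met.

```ruby
# Hypothetical helper: poll a block until it returns a truthy value.
# Returns true as soon as the block succeeds, false after `attempts` tries.
def wait_until(attempts: 3, interval: 1)
  attempts.times do |i|
    return true if yield
    # Skip the sleep after the final attempt
    sleep(interval) if i < attempts - 1
  end
  false
end

# Usage with a fake condition that becomes true on the third check
checks = 0
found = wait_until(attempts: 5, interval: 0) { (checks += 1) >= 3 }
puts "found=#{found} after #{checks} checks"
```

In a Capybara context the block would be something like a selector lookup on the session.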
class BooksAdapter < BaseAdapter
  def load_next_kindle_list
    wait_for_selector('.contentCount_myx')
    current_loop = 0
    while current_loop <= max_scroll_attempts
      if limit && limit < number_of_fetched_books
        break
      elsif has_more_button?
        snapshot_page
        current_loop = 0
        log "Clicking 'Show More'"
        show_more_button.click
      else
        log "Loading books with scrolling #{current_loop+1}"
        session.execute_script "window.scrollBy(0,10000)"
      end
      sleep fetching_interval
      current_loop += 1
    end
    snapshot_page
  end
end
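The counter handling in this loop is easier to see in a browser-free simulation. This is a sketch under assumptions, not part of the gem: simulate_loading and button_at are hypothetical names, and button_at lists the iterations on which a "Show More" button would be visible. It keeps the same counter logic: a click resets the counter to 0, and the shared increment then bumps it to 1.

```ruby
# Hypothetical simulation of the scroll/click loop, without a browser.
# Returns the sequence of actions the loop would take.
def simulate_loading(button_at:, max_scroll_attempts: 3)
  actions = []
  current_loop = 0
  step = 0
  while current_loop <= max_scroll_attempts
    if button_at.include?(step)
      actions << :click
      current_loop = 0 # a click resets the scroll counter
    else
      actions << :scroll
    end
    current_loop += 1
    step += 1
  end
  actions
end

p simulate_loading(button_at: [])
# => [:scroll, :scroll, :scroll, :scroll]
p simulate_loading(button_at: [2])
# => [:scroll, :scroll, :click, :scroll, :scroll, :scroll]
```

So the loop gives up after a run of scrolls with no button, and each click restarts that run.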
Code isn’t clear because
Steps to scrape Kindle book data
class BooksParser
  def initialize(filepath, options = {})
    @filepath = filepath
  end
  def doc
    @doc ||= Nokogiri::HTML(body)
  end
  def body
    @body ||= File.read(@filepath)
  end
end
class BooksParser
  def parse
    @_parsed ||= doc.css("div[id^='contentTabList_']").map { |e| BookRow.new(e) }
  end
end
class BookRow
  def initialize(node)
    @node = node
  end
  def asin
    @_asin ||= @node['name'].gsub(/\AcontentTabList_/, '')
  end
  def title
    @_title ||= @node.css("div[id^='title']").text
  end
end
Steps to scrape Kindle book data
@parser.parse.first.asin
#=> "B004YW6M6G"
@parser.parse.first.title
#=> "Design Patterns in Ruby"
puts @parser.parse.to_json
#=> [{"asin":"B004YW6M6G","title":"Design Patterns in Ruby", ...
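For to_json on the array to emit fields like these, each row object has to know how to serialize itself. The slides don't show how BookRow does it, so this is only one possible sketch using the standard json library. SimpleBook is a hypothetical stand-in for BookRow.

```ruby
require 'json'

# Hypothetical stand-in for BookRow: a plain object exposing asin and title
class SimpleBook
  attr_reader :asin, :title

  def initialize(asin, title)
    @asin = asin
    @title = title
  end

  def to_hash
    { asin: asin, title: title }
  end

  # Array#to_json calls to_json on each element, so delegate to the Hash form
  def to_json(*args)
    to_hash.to_json(*args)
  end
end

books = [SimpleBook.new('B004YW6M6G', 'Design Patterns in Ruby')]
puts books.to_json
```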
I assume this library is used in private projects and on private machines.
It doesn’t strongly protect credentials.
There is a tool called envchain that works with the macOS Keychain.
It can be used as an alternative to dotenv.
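Whether the variables come from envchain or dotenv, the script only sees ENV. A minimal sketch of reading credentials that way follows. The variable names AMAZON_USERNAME and AMAZON_PASSWORD and the load_credentials helper are assumptions for illustration; the gems may expect different names.

```ruby
# Hypothetical variable names; the gems may expect different ones.
REQUIRED_KEYS = %w[AMAZON_USERNAME AMAZON_PASSWORD].freeze

# Reads credentials from the environment (or any Hash-like object) and
# fails early with a clear message if one is missing or empty.
def load_credentials(env = ENV)
  missing = REQUIRED_KEYS.select { |key| env[key].to_s.empty? }
  unless missing.empty?
    raise KeyError, "missing environment variables: #{missing.join(', ')}"
  end
  { username: env['AMAZON_USERNAME'], password: env['AMAZON_PASSWORD'] }
end

# Usage with an in-memory Hash standing in for ENV
creds = load_credentials({ 'AMAZON_USERNAME' => 'me@example.com', 'AMAZON_PASSWORD' => 'secret' })
puts creds[:username]
```

With envchain, the script would be started as something like `envchain amazon ruby fetch_books.rb`, where the namespace and script name are made up here.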
Reading status, rating, public notes
The site for Kindle notes and highlights is closing
(August 1st, originally July 3rd)
New site for Kindle notes and highlights
I have an app to collect Kindle highlights
heroku buildpacks:add https://github.com/heroku/heroku-buildpack-chromedriver
heroku buildpacks:add https://github.com/heroku/heroku-buildpack-xvfb-google-chrome
I have some experience with Capybara/testing.
Ask me later if you have questions about things like
session = Capybara::Session.new(:chrome)
# login
session.visit 'https://github.com/login'
session.fill_in 'login_field', with: ''
session.fill_in 'password', with: ''
session.click_on 'Sign in'
# store cookies
data = Marshal.dump session.driver.browser.manage.all_cookies
File.open('all_cookies.txt', 'wb') {|f| f.write(data)}
session.driver.quit
session = Capybara::Session.new(:chrome)
# First visit is required before restoring cookies
session.visit 'https://github.com/'
# restore cookies (binary read, since the file holds Marshal data)
data = File.binread('all_cookies.txt')
Marshal.load(data).each do |d|
  session.driver.browser.manage.add_cookie d
end
session.visit session.current_url
# Store 'session.driver.browser.manage.all_cookies' into json column
cookies_from_db.each do |d|
  # :name needs to be symbol on 'add_cookie' (symbolize_keys! comes from ActiveSupport)
  d.symbolize_keys!
  # :expires needs to be Time class
  d[:expires] = Time.parse(d[:expires]) if d[:expires]
  session.driver.browser.manage.add_cookie d
end
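The same JSON round trip can be tried without a browser or ActiveSupport. This is a sketch using only the standard library: the cookie values are made up, and restore_cookies is a hypothetical helper. It shows why both conversions are needed, since JSON turns symbol keys and Time values into strings.

```ruby
require 'json'
require 'time'

# Example cookie shaped like a Selenium cookie hash (values are made up)
cookie = { name: 'session_id', value: 'abc123', path: '/', expires: Time.utc(2030, 1, 1) }

# Serialize: store the Time as an ISO 8601 string so it parses back reliably
stored = JSON.generate([cookie.merge(expires: cookie[:expires].iso8601)])

# Hypothetical helper: restore symbol keys and turn :expires back into a Time
def restore_cookies(json)
  JSON.parse(json, symbolize_names: true).map do |c|
    c[:expires] = Time.parse(c[:expires]) if c[:expires]
    c
  end
end

restored = restore_cookies(stored)
puts restored.first[:expires].class
```

Each restored hash could then be passed to add_cookie as in the loop above.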