Why Keep a Budget?
I came to Cluj in 2011 for college.
The first month, I expected to have a lot of expenses.
By the end of the second month:
Where's all my money?
I had X at the beginning of the month and now I have X - X = 0
I had to do something
Not Good Enough
Meet ReceiptBudget
A web app written in Python
- Adding expenses by just taking a photo of the receipt
- all the details, none of the work
- Much more detailed reports
- visualizing expenses on a map
- slicing and dicing data based on month, day, shop
- And backwards compatible with what I had until now
THE OCR ENGINE
- had to be custom built
- receipts are space constrained, so the font is usually very squeezed
- Tesseract (best free alternative) was hit-and-miss - sometimes it worked, sometimes it didn't
- I developed a custom tool just to gather all the data for training
- the image is preprocessed
- edges removed
- uses Random Forests for character segmentation and a linear SVM for character recognition
- accuracy: ~85%
- if one digit in a date is not recognized correctly
- "23/10/13" => "S3/10/13"
- if one letter in TOTAL is changed, how to tell it's not an item?
- TOTA - brand of professional photography lamps
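Both failure modes can at least be detected, even when they cannot be repaired automatically. The following is a minimal sketch, not ReceiptBudget's actual code: it assumes dates are printed as dd/mm/yy as in the example above, and the names `parse_receipt_date` and `total_similarity` are hypothetical.

```python
from datetime import datetime
from difflib import SequenceMatcher


def parse_receipt_date(text):
    """Return a date for a dd/mm/yy string, or None if the text is not a valid date."""
    try:
        return datetime.strptime(text, "%d/%m/%y").date()
    except ValueError:
        # A misread character (e.g. "S3/10/13") ends up here, so the
        # caller can flag the receipt for manual review.
        return None


def total_similarity(word):
    """Similarity in [0, 1] between an OCR'd word and the keyword TOTAL."""
    return SequenceMatcher(None, word.upper(), "TOTAL").ratio()


print(parse_receipt_date("23/10/13"))
print(parse_receipt_date("S3/10/13"))
print(total_similarity("TOTA"), total_similarity("T0TAL"))
```

Plain string similarity rates the real brand name TOTA closer to TOTAL than a made-up misread such as T0TAL, so a similarity score alone does not settle whether a line is the total or an item.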
The goal is to get some insight into spending patterns
If I know when and where I usually spend more money, I can start doing something about it
(at least be more careful when walking past my favorite shop)
Most of my expenses were at Kaufland, which is to be expected: it's a grocery store
But the second most expensive place is the area around my university
I often go to the nearby store to buy snacks
If I were to buy them somewhere else, I could probably save some money
- I spend a lot of money on Mondays
- very little on Saturdays
- There is a peak at the beginning of the month
- probably taxes and rent
- I spend a lot of money in odd places
- the Unknown column, where I didn't write down the place of spending
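Breakdowns like these come down to grouping expenses by a key and summing the amounts. The following is a minimal sketch, assuming each expense is a (date, shop, amount) record. The `Expense` type, the `totals_by` helper, and the sample rows are made up for illustration and are not the app's real schema or data.

```python
from collections import defaultdict
from datetime import date
from typing import Callable, Iterable, NamedTuple, Optional

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


class Expense(NamedTuple):
    day: date
    shop: Optional[str]  # None when the place was not written down
    amount: float


def totals_by(expenses: Iterable[Expense], key: Callable[[Expense], str]) -> dict:
    """Sum amounts per group, where `key` picks the group for each expense."""
    totals = defaultdict(float)
    for expense in expenses:
        totals[key(expense)] += expense.amount
    return dict(totals)


def weekday(expense: Expense) -> str:
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return WEEKDAYS[expense.day.weekday()]


def shop(expense: Expense) -> str:
    # Expenses without a recorded place fall into the "Unknown" group.
    return expense.shop or "Unknown"


# Made-up sample rows, for illustration only.
SAMPLE = [
    Expense(date(2013, 10, 7), "Kaufland", 120.0),   # a Monday
    Expense(date(2013, 10, 7), None, 30.0),          # a Monday
    Expense(date(2013, 10, 12), "Kaufland", 15.5),   # a Saturday
]

by_weekday = totals_by(SAMPLE, weekday)
by_shop = totals_by(SAMPLE, shop)
print(by_weekday)
print(by_shop)
```

Passing the grouping rule in as a function keeps one helper usable for month, day, and shop views alike.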
- deep learning for the OCR engine
- Google's doing it, so it must be good
- use a probabilistic classifier instead of a rule-based one for understanding line contents
- dashboard should make predictions, not just give reports
Results after experimenting with Restricted Boltzmann Machines