So, recently as the Egyptian pound has continued its relentless devaluation journey against the US dollar, I have found myself taking more care with spending. As my credit card bill mushroomed, I decided to get a better understanding of our spending habits. The first step to action is knowledge, and I needed to know what was eating up most of my income. Naturally, as a computer geek, my first impulse was to write a program to solve my issue — and that is exactly what I have done.

As I am currently away from my computers, I decided to try my hand at writing a python script that I could run on an excel file of my credit card bill. For that, I turned to my favorite iOS and iPad python IDE, Pythonista.

My bank, for some reason best known to its management team, exports all numbers in their downloadable credit card bill in text format. Also, they have many unnecessary columns in the file — I only need three, the date the transaction was made, the name of the merchant and the amount. So I cleaned up the excel file a bit by removing the unnecessary columns.

In addition, the excel file uses negative numbers to represent amounts of money that I spent, and positive numbers to represent the amounts I deposited — for example payments I made towards my bill, and transfer of loyalty points into cash. Since the purpose of the exercise was to know what I was spending on, I removed these positive numbers from the file.

I was now ready to write the code that would allow me to see what I was spending on. Here it is

import re 
import openpyxl
from collections import defaultdict
from openpyxl.utils import get_column_letter


infile = 'Data.xlsx'
outfile = 'data_categorized.xlsx'


workbook = openpyxl.load_workbook(infile)
sheet = workbook.active

amazonInsta = re.compile(r'(amazon|instashop|carrefour|talabat|breadfast)', re.IGNORECASE)
pharma = re.compile(r'(mostafa|neqabty|paymob|GEIDE)', re.IGNORECASE)
subenter = re.compile(r'(apple|netflix|starzplay)', re.IGNORECASE)
subsoftware = re.compile(r'(kindle|paypal|adobe|IFTTT|overleaf|Microsoft|google)', re.IGNORECASE)
phone = re.compile(r'(orange|etisalat)', re.IGNORECASE)
Uber = re.compile(r'(uber)', re.IGNORECASE)

# Create a new workbook
new_workbook = openpyxl.Workbook()

# Select the first sheet
new_sheet = new_workbook.active

# Write the headers
new_sheet['A1'] = 'Date'
new_sheet['B1'] = 'Merchant'
new_sheet['C1'] = 'Amount'
new_sheet['D1'] = 'Category'

i=0
for row in sheet.iter_rows(min_row=2, values_only=True):
    date, merchant, amount = row
    if amazonInsta.search(merchant):
        category = 'General Shopping'
    elif pharma.search(merchant):
    		category = 'Health'
    elif Uber.search(merchant):
    		category = 'Uber'
    elif subenter.search(merchant):
    		category = 'Entertainment'
    elif subsoftware.search(merchant): 
    		category = 'Software'
    elif phone.search(merchant):
    		category = 'Phone Bills'
  # Add additional if/elif blocks to match other categories
    else:
        category = 'Other'
 
    # Prepare the data to be written into categories
    new_sheet.cell(row=i+2, column=1, value=date)
    new_sheet.cell(row=i+2, column=2, value=merchant)
    new_sheet.cell(row=i+2, column=3, value=amount)
    new_sheet.cell(row=i+2, column=4, value=category)
    i=i+1
    
for column in new_sheet.columns:
    column_letter = get_column_letter(column[0].column)
    new_sheet.column_dimensions[column_letter].auto_size = True
    
new_workbook.save(outfile)

# Initialize a dictionary to store the category totals
totals = defaultdict(float)

# Iterate through the rows and sum the amounts by category
for row in new_sheet.iter_rows(min_row=2, values_only=True):
    date, merchant, amount, category = row
    if category in totals:
        totals[category] += float(amount)
    else:
        totals[category] = float(amount)

# Print the category totals
for category, total in totals.items():
    print(f'{category}: ${total*-1:.2f}')
    
    
import matplotlib.pyplot as plt

# Convert the totals dictionary to a list of tuples
totals_list = list(totals.items())

# Sort the list by total amount in descending order
totals_list.sort(key=lambda x: x[1], reverse=True)

# Extract the categories and amounts as separate lists
categories = [x[0] for x in totals_list]
amounts = [x[1] for x in totals_list]

# Convert the totalss to positive numbers
posamounts = [-x for x in amounts]

# Create a pie chart
fig, ax = plt.subplots()
ax.pie(posamounts, labels=categories, autopct='%1.1f%%')
ax.set_title('Spending by Category')

# Show the plot
plt.show()

I tried to make the code as self-explanatory as possible with comments everywhere, but I will quickly go over it here again. First, I import the openpyxl library to be able to manipulate excel files from within Pythonista, and then I import a couple of utilities that allow me to work with dictionaries and named columns.

I then read the file from the current directory, and set the sheet to be the active sheet of the opened file. I then define some regular expressions to allow me to categorize my spending habits, you can modify this part to whatever makes sense for your personal spending habits.

The next bit of code creates a new excel file, with a fourth column category, this is the output excel file that I create with the spending data categorized so that I can conduct any analysis that comes to my mind later. I then iterate over all the rows, starting at row two (this assumes that row one contains the headers of the columns and no actual data).

While iterating over the rows, I run the regular expressions over the merchant column, the column that indicates who I made the payment to, and based on what regular expression matches it, I assign it a category. Then I place this data into the new sheet I created and save it to the output excel file. I now have an excel file with all my spending categorized. I can do whatever analysis I want on this data, but before leaving python, I decided to plot a pie chart of my spending habit to get a quick idea of where most of money was going.

I did this by first defining a dictionary to hold the category totals — that is, the total amount spent on each category. I next iterated over each row of the new excel file and added up the amount column into the appropriate entry in the dictionary. You will notice that I cast the data to float here — remember that my bank stores all the numbers in the report as a string.

Once this is done, I print out the numerical amount for each category and then use matplotlib to draw a pie chart of the data. The only interesting thing here is that I convert the amounts to positive values as my bank reports spending as negative numbers.

And that’s it ladies and gents, a quick and dirty method to get an overview of your spending habits on your iPad to iPhone, just get the excel file containing the data into the same folder as the Pythonista script above and run it, you will have an instant breakdown of your spending habits — what you do with this information is totally up to you, but I strongly advise identifying where most of your spending goes and trying to cut out non-essentials in that category.

2 Comments

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.