✅Fetch data from database:
✅We will be using SQL queries to get the Data from the Database .
✅We will also create a connection to Database and store the data in a Data Frame df.
In a Colab code cell, run this:
1import pandas as pd2import matplotlib.pyplot as plt3import matplotlib.patches as mpatches4
5# -------------------------- 6# Connect to Database and Fetch Raw Data7# --------------------------8
9query = """10SELECT 11 job_title,12 location,13 job_type14FROM seek_jobs15WHERE location IS NOT NULL 16 AND job_type IS NOT NULL17"""18
19conn = connect_to_db(DATABASE_URL)20df = pd.read_sql(query, conn)21conn.close()✅Classify Roles and Filter for "Data Analyst"
✅We will all the related key word to Data Analyst and then fetch the all related jobs
In a Colab code cell, run this:
1# -------------------------- 2# Classify Roles and Filter for "Data Analyst"3# --------------------------4
5role_keywords = {6 "Data Analyst": ["data analyst", "data"],7}8
9def classify_role(title):10 title = title.lower()11 for role, keywords in role_keywords.items():12 if any(keyword in title for keyword in keywords):13 return role14 return "Other"15
16df['Role'] = df['job_title'].apply(classify_role)17df = df[df['Role'] == "Data Analyst"]✅This part of code Normalize the location which consist of various small cities into Major City.
In a Colab code cell, run this:
1# -------------------------- 2# Normalize City Names3# --------------------------4
5major_cities = [6 "Brisbane QLD", "Perth WA", "Adelaide SA", "Hobart TAS",7 "Darwin NT", "Sydney NSW", "Melbourne VIC", "Canberra ACT"8]9
10def extract_city(location):11 if "," in location:12 city = location.split(",")[-1].strip()13 else:14 city = location.strip()15 return city if city in major_cities else "Other"16
17df['clean_city'] = df['location'].apply(extract_city)✅Normalize the Job Type.
✅There are many job types but we will consider only with Full Time, Contract/Temp, Part time and rest all in others.
In a Colab code cell, run this:
1# -------------------------- 2# Normalize the Job Type3# --------------------------4
5# Consolidate job types into standard categories6def normalize_job_type(job_type):7 job_type = job_type.lower()8 if "full" in job_type:9 return "Full-time"10 elif "part" in job_type:11 return "Part-time"12 elif "contract" in job_type or "temp" in job_type:13 return "Contract/Temp"14 else:15 return "Other"16
17# Create a new column with the cleaned job type18df['job_type_clean'] = df['job_type'].apply(normalize_job_type)✅We will group and pivot the data to
✅we will also use normalized job_types instead of original one as we need to focused regarding jobs names.
In a Colab code cell, run this:
1# -------------------------- 2# Group and Pivot Data3# --------------------------4
5# Use the normalized job type instead of the original one6grouped = df.groupby(['clean_city', 'job_type_clean']).size().reset_index(name='count')7
8pivot = grouped.pivot(index='clean_city', columns='job_type_clean', values='count').fillna(0)9pivot = pivot.loc[pivot.sum(axis=1).sort_values(ascending=False).index]✅Plot bar chart with value annotations
✅Create a bar chart for job types for various cities(Full-time, Contract/Temp, Part-time, Others)
✅Add job count text annotations to each bar segment
✅Add legend with the count of job type
In a Colab code cell, run this:
1# -------------------------- 2# Plot the Stacked Bar Chart3# --------------------------4
5# Pick colors for commonly expected job types6colors = {7 'Full-time': '#C4409D',8 'Contract/Temp': '#7EC8E3',9 'Part-time': '#F4B400',10 'Other': '#B0B0B0'11}12
13# Use available types from data14available_types = list(pivot.columns)15color_list = [colors.get(t, '#999999') for t in available_types]16
17# Calculate totals for each job type for the legend18type_totals = pivot.sum()19
20# Create figure and axes21fig, ax = plt.subplots(figsize=(20, 14))22
23# Plot the stacked bar chart24pivot.plot(kind='bar', stacked=True, ax=ax, color=color_list)25
26# Title and labels27plt.title("Employment Type Breakdown for Data Analyst Roles (by City)\n05-Mar-2025 to 01-May-2025", fontsize=30)28plt.xlabel("City", fontsize=30)29plt.ylabel("Number of Jobs", fontsize=30)30plt.xticks(rotation=45, ha='right', fontsize=30)31plt.yticks(fontsize=12)32
33# Add count labels inside bars34for i, city in enumerate(pivot.index):35 y_offset = 036 for job_type in pivot.columns:37 count = pivot.loc[city, job_type]38 if count > 0:39 ax.text(i, y_offset + count / 2, str(int(count)), ha='center', va='center', fontsize=20, color='black')40 y_offset += count41
42# Create enhanced legend with totals43legend_labels = [f"{job_type} ({int(type_totals[job_type])})" for job_type in available_types]44plt.legend(legend_labels, title='Job Type', bbox_to_anchor=(1.01, 1), loc='upper left', fontsize=30)45plt.tight_layout()46
47plt.show()