✅Fetch data from database:
✅We will be using SQL queries to get the Data from the Database .
✅We will also create a connection to Database and store the data in a Data Frame df.
In a Colab code cell, run this:
1import pandas as pd2import matplotlib.pyplot as plt3
4# --------------------------5# Fetch data from database6# --------------------------7
8# SQL query to fetch job titles and their associated locations9query = """10SELECT job_title, location11FROM seek_jobs12WHERE location IS NOT NULL AND job_title IS NOT NULL13"""14
15# Connect and fetch data from database into a DataFrame16conn = connect_to_db(DATABASE_URL)17df = pd.read_sql(query, conn)18
19# --------------------------20# SAMPLE OUTPUT (first 5 rows of df):21#22# job_title location23# 0 Data Analyst Brisbane QLD24# 1 Business Analyst South Perth, Perth WA25# 2 Python Developer Sydney NSW26# 3 AWS Engineer Melbourne VIC27# 4 Full Stack Developer Adelaide SA28# --------------------------✅This part of code Normalize the location which consist of various small cities into a Major City.
In a Colab code cell, run this:
1# --------------------------2# Normalize locations into major Australian cities3# --------------------------4
5# List of major cities we're interested in6major_cities = [7 "Brisbane QLD", "Perth WA", "Adelaide SA", "Hobart TAS", "Darwin NT",8 "Sydney NSW", "Melbourne VIC", "Canberra ACT"9]10
11# Extracts major city name from a location string. 12# Assumes format: "Suburb, Major City" and falls back to "Other" if not in known list.13# Example input: "South Perth, Perth WA" → Output: "Perth WA"14# Example input: "Lewisham, Sydney NSW" → Output: "Sydney NSW"15# Example input: "Windsor" → Output: "Other"16def extract_city(location):17 if "," in location:18 city = location.split(",")[-1].strip()19 else:20 city = location.strip()21 return city if city in major_cities else "Other"22
23df['clean_location'] = df['location'].apply(extract_city)
✅ Classifies a job title string into a predefined role label based on keyword match.
✅Returns the first matching role or "Other" if none found.
✅Example input: "Senior Python Developer" → Output: "Python Developer"
✅Example input: "Lead Full Stack Engineer" → Output: "Full Stack Developer"
✅Example input: "Test Automation Engineer" → Output: "Other"
In a Colab code cell, run this:
1# --------------------------2# Classify job roles using keyword mapping3# --------------------------4
5# Dictionary mapping role labels to keywords likely to appear in job titles6role_keywords = {7 "Data Analyst": ["data analyst", "data analysis", "data"],8 "Business Analyst": ["business analyst", "business"],9 "Full Stack Developer": ["full stack", "full-stack"],10 "Next.js Developer": ["next.js", "nextjs"],11 "AWS Cloud Engineer": ["aws", "cloud engineer"],12 "Azure Developer": ["azure"],13 "Python Developer": ["python developer"]14}15
16# Classifies a job title string into a predefined role label based on keyword match.17# Returns the first matching role or "Other" if none found.18# Example input: "Senior Python Developer" → Output: "Python Developer"19# Example input: "Lead Full Stack Engineer" → Output: "Full Stack Developer"20# Example input: "Test Automation Engineer" → Output: "Other"21def classify_role(title):22 title = title.lower()23 for role, keywords in role_keywords.items():24 if any(keyword in title for keyword in keywords):25 return role26 return "Other"27
28df['Role'] = df['job_title'].apply(classify_role)✅Aggregate and structure data
✅Group data by city and role, and count how many jobs fall into each combination
✅ Example:
"Sydney NSW" + "Data Analyst" = 34
"Brisbane QLD" + "Python Developer" = 12
In a Colab code cell, run this:
1# --------------------------2# Aggregate and structure data3# --------------------------4
5# Group data by city and role, and count how many jobs fall into each combination6# Example:7# "Sydney NSW" + "Data Analyst" = 348# "Brisbane QLD" + "Python Developer" = 129grouped = df.groupby(['clean_location', 'Role']).size().reset_index(name='count')10
11# Pivot the grouped data:12# Rows = Cities (e.g., Sydney, Melbourne)13# Columns = Roles (e.g., Python Developer, Data Analyst)14# Values = Count of job postings15pivot = grouped.pivot(index='clean_location', columns='Role', values='count').fillna(0)16
17# Sort the cities by total number of jobs (highest to lowest)18pivot = pivot.loc[pivot.sum(axis=1).sort_values(ascending=False).index]✅Keep only the roles that actually exist in this dataset
✅Reorder columns to desired role display order
In a Colab code cell, run this:
1# --------------------------2# Reorder columns to desired role display order3# --------------------------4
5custom_role_order = [6 "Business Analyst", "Data Analyst", "Full Stack Developer", "Other",7 "Azure Developer", "AWS Cloud Engineer", "Python Developer", "Next.js Developer"8]9
10# Keep only the roles that actually exist in this dataset11existing_roles = [role for role in custom_role_order if role in pivot.columns]12pivot = pivot[existing_roles]✅Plot stacked bar chart with value annotations
✅Create a stacked bar chart for job roles per city
✅Add job count text annotations to each bar segment
In a Colab code cell, run this:
1# --------------------------2# Plot stacked bar chart with value annotations3# --------------------------4
5fig, ax = plt.subplots(figsize=(26, 14))6
7# Create a stacked bar chart for job roles per city8pivot.plot(kind='bar', stacked=True, colormap='tab20', ax=ax)9
10# Chart formatting11plt.title("Job Role Volume Breakdown Across Major Australian Cities /n(05-Mar-2025 to 22-Apr-2025)", fontsize=18)12plt.xlabel("City", fontsize=14)13plt.ylabel("Number of Jobs", fontsize=14)14plt.xticks(rotation=45, ha='right', fontsize=12)15plt.yticks(fontsize=12)16plt.legend(title='Job Roles', bbox_to_anchor=(1.02, 1), loc='upper left', fontsize=10)17plt.grid(axis='y', linestyle='--', alpha=0.5)18
19# Add job count text annotations to each bar segment20for i, city in enumerate(pivot.index):21 y_offset = 022 for role in pivot.columns:23 count = pivot.loc[city, role]24 if count > 0:25 ax.text(i, y_offset + count / 2, str(int(count)), ha='center', va='center', fontsize=10, color='black')26 y_offset += count27
28plt.tight_layout()29plt.show()