AusBiz Consulting

✅Fetch data from database:
✅We will be using SQL queries to get the Data from the Database .
✅We will also create a connection to Database and store the data in a Data Frame df.

In a Colab code cell, run this:

Python

1import pandas as pd
2import matplotlib.pyplot as plt
3
4# --------------------------
5# Fetch data from database
6# --------------------------
7
8# SQL query to fetch job titles and their associated locations
9query = """
10SELECT job_title, location
11FROM seek_jobs
12WHERE location IS NOT NULL AND job_title IS NOT NULL
13"""
14
15# Connect and fetch data from database into a DataFrame
16conn = connect_to_db(DATABASE_URL)
17df = pd.read_sql(query, conn)
18
19# --------------------------
20# SAMPLE OUTPUT (first 5 rows of df):
21#
22#          job_title                        location
23# 0   Data Analyst               Brisbane QLD
24# 1   Business Analyst     South Perth, Perth WA
25# 2   Python Developer            Sydney NSW
26# 3   AWS Engineer                 Melbourne VIC
27# 4   Full Stack Developer      Adelaide SA
28# --------------------------

✅This part of code Normalize the location which consist of various small cities into a Major City.

In a Colab code cell, run this:

Python

1# --------------------------
2#  Normalize locations into major Australian cities
3# --------------------------
4
5# List of major cities we're interested in
6major_cities = [
7    "Brisbane QLD", "Perth WA", "Adelaide SA", "Hobart TAS", "Darwin NT",
8    "Sydney NSW", "Melbourne VIC", "Canberra ACT"
9]
10
11# Extracts major city name from a location string. 
12# Assumes format: "Suburb, Major City" and falls back to "Other" if not in known list.
13# Example input: "South Perth, Perth WA" → Output: "Perth WA"
14# Example input: "Lewisham, Sydney NSW" → Output: "Sydney NSW"
15# Example input: "Windsor" → Output: "Other"
16def extract_city(location):
17    if "," in location:
18        city = location.split(",")[-1].strip()
19    else:
20        city = location.strip()
21    return city if city in major_cities else "Other"
22
23df['clean_location'] = df['location'].apply(extract_city)

✅ Classifies a job title string into a predefined role label based on keyword match.

✅Returns the first matching role or "Other" if none found.

✅Example input: "Senior Python Developer" → Output: "Python Developer"

✅Example input: "Lead Full Stack Engineer" → Output: "Full Stack Developer"

✅Example input: "Test Automation Engineer" → Output: "Other"

In a Colab code cell, run this:

Python

1# --------------------------
2#  Classify job roles using keyword mapping
3# --------------------------
4
5# Dictionary mapping role labels to keywords likely to appear in job titles
6role_keywords = {
7    "Data Analyst": ["data analyst", "data analysis", "data"],
8    "Business Analyst": ["business analyst", "business"],
9    "Full Stack Developer": ["full stack", "full-stack"],
10    "Next.js Developer": ["next.js", "nextjs"],
11    "AWS Cloud Engineer": ["aws", "cloud engineer"],
12    "Azure Developer": ["azure"],
13    "Python Developer": ["python developer"]
14}
15
16# Classifies a job title string into a predefined role label based on keyword match.
17# Returns the first matching role or "Other" if none found.
18# Example input: "Senior Python Developer" → Output: "Python Developer"
19# Example input: "Lead Full Stack Engineer" → Output: "Full Stack Developer"
20# Example input: "Test Automation Engineer" → Output: "Other"
21def classify_role(title):
22    title = title.lower()
23    for role, keywords in role_keywords.items():
24        if any(keyword in title for keyword in keywords):
25            return role
26    return "Other"
27
28df['Role'] = df['job_title'].apply(classify_role)

✅Aggregate and structure data

✅Group data by city and role, and count how many jobs fall into each combination

✅ Example:

"Sydney NSW" + "Data Analyst" = 34

"Brisbane QLD" + "Python Developer" = 12

In a Colab code cell, run this:

Python

1# --------------------------
2#  Aggregate and structure data
3# --------------------------
4
5# Group data by city and role, and count how many jobs fall into each combination
6# Example:
7# "Sydney NSW" + "Data Analyst" = 34
8# "Brisbane QLD" + "Python Developer" = 12
9grouped = df.groupby(['clean_location', 'Role']).size().reset_index(name='count')
10
11# Pivot the grouped data:
12# Rows = Cities (e.g., Sydney, Melbourne)
13# Columns = Roles (e.g., Python Developer, Data Analyst)
14# Values = Count of job postings
15pivot = grouped.pivot(index='clean_location', columns='Role', values='count').fillna(0)
16
17# Sort the cities by total number of jobs (highest to lowest)
18pivot = pivot.loc[pivot.sum(axis=1).sort_values(ascending=False).index]

✅Keep only the roles that actually exist in this dataset

✅Reorder columns to desired role display order

In a Colab code cell, run this:

Python

1# --------------------------
2#  Reorder columns to desired role display order
3# --------------------------
4
5custom_role_order = [
6    "Business Analyst", "Data Analyst", "Full Stack Developer", "Other",
7    "Azure Developer", "AWS Cloud Engineer", "Python Developer", "Next.js Developer"
8]
9
10# Keep only the roles that actually exist in this dataset
11existing_roles = [role for role in custom_role_order if role in pivot.columns]
12pivot = pivot[existing_roles]

✅Plot stacked bar chart with value annotations

✅Create a stacked bar chart for job roles per city

✅Add job count text annotations to each bar segment

In a Colab code cell, run this:

Python

1# --------------------------
2# Plot stacked bar chart with value annotations
3# --------------------------
4
5fig, ax = plt.subplots(figsize=(26, 14))
6
7# Create a stacked bar chart for job roles per city
8pivot.plot(kind='bar', stacked=True, colormap='tab20', ax=ax)
9
10# Chart formatting
11plt.title("Job Role Volume Breakdown Across Major Australian Cities /n(05-Mar-2025 to 22-Apr-2025)", fontsize=18)
12plt.xlabel("City", fontsize=14)
13plt.ylabel("Number of Jobs", fontsize=14)
14plt.xticks(rotation=45, ha='right', fontsize=12)
15plt.yticks(fontsize=12)
16plt.legend(title='Job Roles', bbox_to_anchor=(1.02, 1), loc='upper left', fontsize=10)
17plt.grid(axis='y', linestyle='--', alpha=0.5)
18
19# Add job count text annotations to each bar segment
20for i, city in enumerate(pivot.index):
21    y_offset = 0
22    for role in pivot.columns:
23        count = pivot.loc[city, role]
24        if count > 0:
25            ax.text(i, y_offset + count / 2, str(int(count)), ha='center', va='center', fontsize=10, color='black')
26            y_offset += count
27
28plt.tight_layout()
29plt.show()

1import pandas as pd 2import matplotlib.pyplot as plt 3 4# -------------------------- 5# Fetch data from database 6# -------------------------- 7 8# SQL query to fetch job titles and their associated locations 9query = """ 10SELECT job_title, location 11FROM seek_jobs 12WHERE location IS NOT NULL AND job_title IS NOT NULL 13""" 14 15# Connect and fetch data from database into a DataFrame 16conn = connect_to_db(DATABASE_URL) 17df = pd.read_sql(query, conn) 18 19# -------------------------- 20# SAMPLE OUTPUT (first 5 rows of df): 21# 22# job_title location 23# 0 Data Analyst Brisbane QLD 24# 1 Business Analyst South Perth, Perth WA 25# 2 Python Developer Sydney NSW 26# 3 AWS Engineer Melbourne VIC 27# 4 Full Stack Developer Adelaide SA 28# --------------------------

1# -------------------------- 2# Normalize locations into major Australian cities 3# -------------------------- 4 5# List of major cities we're interested in 6major_cities = [ 7 "Brisbane QLD", "Perth WA", "Adelaide SA", "Hobart TAS", "Darwin NT", 8 "Sydney NSW", "Melbourne VIC", "Canberra ACT" 9] 10 11# Extracts major city name from a location string. 12# Assumes format: "Suburb, Major City" and falls back to "Other" if not in known list. 13# Example input: "South Perth, Perth WA" → Output: "Perth WA" 14# Example input: "Lewisham, Sydney NSW" → Output: "Sydney NSW" 15# Example input: "Windsor" → Output: "Other" 16def extract_city(location): 17 if "," in location: 18 city = location.split(",")[-1].strip() 19 else: 20 city = location.strip() 21 return city if city in major_cities else "Other" 22 23df['clean_location'] = df['location'].apply(extract_city)

1# -------------------------- 2# Classify job roles using keyword mapping 3# -------------------------- 4 5# Dictionary mapping role labels to keywords likely to appear in job titles 6role_keywords = { 7 "Data Analyst": ["data analyst", "data analysis", "data"], 8 "Business Analyst": ["business analyst", "business"], 9 "Full Stack Developer": ["full stack", "full-stack"], 10 "Next.js Developer": ["next.js", "nextjs"], 11 "AWS Cloud Engineer": ["aws", "cloud engineer"], 12 "Azure Developer": ["azure"], 13 "Python Developer": ["python developer"] 14} 15 16# Classifies a job title string into a predefined role label based on keyword match. 17# Returns the first matching role or "Other" if none found. 18# Example input: "Senior Python Developer" → Output: "Python Developer" 19# Example input: "Lead Full Stack Engineer" → Output: "Full Stack Developer" 20# Example input: "Test Automation Engineer" → Output: "Other" 21def classify_role(title): 22 title = title.lower() 23 for role, keywords in role_keywords.items(): 24 if any(keyword in title for keyword in keywords): 25 return role 26 return "Other" 27 28df['Role'] = df['job_title'].apply(classify_role)

1# -------------------------- 2# Aggregate and structure data 3# -------------------------- 4 5# Group data by city and role, and count how many jobs fall into each combination 6# Example: 7# "Sydney NSW" + "Data Analyst" = 34 8# "Brisbane QLD" + "Python Developer" = 12 9grouped = df.groupby(['clean_location', 'Role']).size().reset_index(name='count') 10 11# Pivot the grouped data: 12# Rows = Cities (e.g., Sydney, Melbourne) 13# Columns = Roles (e.g., Python Developer, Data Analyst) 14# Values = Count of job postings 15pivot = grouped.pivot(index='clean_location', columns='Role', values='count').fillna(0) 16 17# Sort the cities by total number of jobs (highest to lowest) 18pivot = pivot.loc[pivot.sum(axis=1).sort_values(ascending=False).index]

1# -------------------------- 2# Reorder columns to desired role display order 3# -------------------------- 4 5custom_role_order = [ 6 "Business Analyst", "Data Analyst", "Full Stack Developer", "Other", 7 "Azure Developer", "AWS Cloud Engineer", "Python Developer", "Next.js Developer" 8] 9 10# Keep only the roles that actually exist in this dataset 11existing_roles = [role for role in custom_role_order if role in pivot.columns] 12pivot = pivot[existing_roles]

1# -------------------------- 2# Plot stacked bar chart with value annotations 3# -------------------------- 4 5fig, ax = plt.subplots(figsize=(26, 14)) 6 7# Create a stacked bar chart for job roles per city 8pivot.plot(kind='bar', stacked=True, colormap='tab20', ax=ax) 9 10# Chart formatting 11plt.title("Job Role Volume Breakdown Across Major Australian Cities /n(05-Mar-2025 to 22-Apr-2025)", fontsize=18) 12plt.xlabel("City", fontsize=14) 13plt.ylabel("Number of Jobs", fontsize=14) 14plt.xticks(rotation=45, ha='right', fontsize=12) 15plt.yticks(fontsize=12) 16plt.legend(title='Job Roles', bbox_to_anchor=(1.02, 1), loc='upper left', fontsize=10) 17plt.grid(axis='y', linestyle='--', alpha=0.5) 18 19# Add job count text annotations to each bar segment 20for i, city in enumerate(pivot.index): 21 y_offset = 0 22 for role in pivot.columns: 23 count = pivot.loc[city, role] 24 if count > 0: 25 ax.text(i, y_offset + count / 2, str(int(count)), ha='center', va='center', fontsize=10, color='black') 26 y_offset += count 27 28plt.tight_layout() 29plt.show()

Chart Job Roles across major cities in Australia (Source Code)

Next Up

Lesson 11: Mini Project: Chart Job Types across different cities (Hands On)

Next Up

Lesson 11: Mini Project: Chart Job Types across different cities (Hands On)