AusBiz Consulting

✅Fetch data from database:
✅We will be using SQL queries to get the Data from the Database .
✅We will also create a connection to Database and store the data in a Data Frame df.

In a Colab code cell, run this:

Python

1import pandas as pd
2import matplotlib.pyplot as plt
3import matplotlib.patches as mpatches
4
5# -------------------------- 
6#  Connect to Database and Fetch Raw Data
7# --------------------------
8
9query = """
10SELECT 
11    job_title,
12    location,
13    job_type
14FROM seek_jobs
15WHERE location IS NOT NULL 
16  AND job_type IS NOT NULL
17"""
18
19conn = connect_to_db(DATABASE_URL)
20df = pd.read_sql(query, conn)
21conn.close()

✅Classify Roles and Filter for "Data Analyst"

✅We will all the related key word to Data Analyst and then fetch the all related jobs

In a Colab code cell, run this:

Python

1# -------------------------- 
2#  Classify Roles and Filter for "Data Analyst"
3# --------------------------
4
5role_keywords = {
6    "Data Analyst": ["data analyst", "data"],
7}
8
9def classify_role(title):
10    title = title.lower()
11    for role, keywords in role_keywords.items():
12        if any(keyword in title for keyword in keywords):
13            return role
14    return "Other"
15
16df['Role'] = df['job_title'].apply(classify_role)
17df = df[df['Role'] == "Data Analyst"]

✅This part of code Normalize the location which consist of various small cities into Major City.

In a Colab code cell, run this:

Python

1# -------------------------- 
2#  Normalize City Names
3# --------------------------
4
5major_cities = [
6    "Brisbane QLD", "Perth WA", "Adelaide SA", "Hobart TAS",
7    "Darwin NT", "Sydney NSW", "Melbourne VIC", "Canberra ACT"
8]
9
10def extract_city(location):
11    if "," in location:
12        city = location.split(",")[-1].strip()
13    else:
14        city = location.strip()
15    return city if city in major_cities else "Other"
16
17df['clean_city'] = df['location'].apply(extract_city)

✅Normalize the Job Type.

✅There are many job types but we will consider only with Full Time, Contract/Temp, Part time and rest all in others.

In a Colab code cell, run this:

Python

1# -------------------------- 
2#  Normalize the Job Type
3# --------------------------
4
5# Consolidate job types into standard categories
6def normalize_job_type(job_type):
7    job_type = job_type.lower()
8    if "full" in job_type:
9        return "Full-time"
10    elif "part" in job_type:
11        return "Part-time"
12    elif "contract" in job_type or "temp" in job_type:
13        return "Contract/Temp"
14    else:
15        return "Other"
16
17# Create a new column with the cleaned job type
18df['job_type_clean'] = df['job_type'].apply(normalize_job_type)

✅We will group and pivot the data to

✅we will also use normalized job_types instead of original one as we need to focused regarding jobs names.

In a Colab code cell, run this:

Python

1# -------------------------- 
2# Group and Pivot Data
3# --------------------------
4
5# Use the normalized job type instead of the original one
6grouped = df.groupby(['clean_city', 'job_type_clean']).size().reset_index(name='count')
7
8pivot = grouped.pivot(index='clean_city', columns='job_type_clean', values='count').fillna(0)
9pivot = pivot.loc[pivot.sum(axis=1).sort_values(ascending=False).index]

✅Plot bar chart with value annotations

✅Create a bar chart for job types for various cities(Full-time, Contract/Temp, Part-time, Others)

✅Add job count text annotations to each bar segment

✅Add legend with the count of job type

In a Colab code cell, run this:

TypeScript

1# -------------------------- 
2#  Plot the Stacked Bar Chart
3# --------------------------
4
5# Pick colors for commonly expected job types
6colors = {
7    'Full-time': '#C4409D',
8    'Contract/Temp': '#7EC8E3',
9    'Part-time': '#F4B400',
10    'Other': '#B0B0B0'
11}
12
13# Use available types from data
14available_types = list(pivot.columns)
15color_list = [colors.get(t, '#999999') for t in available_types]
16
17# Calculate totals for each job type for the legend
18type_totals = pivot.sum()
19
20# Create figure and axes
21fig, ax = plt.subplots(figsize=(20, 14))
22
23# Plot the stacked bar chart
24pivot.plot(kind='bar', stacked=True, ax=ax, color=color_list)
25
26# Title and labels
27plt.title("Employment Type Breakdown for Data Analyst Roles (by City)\n05-Mar-2025 to 01-May-2025", fontsize=30)
28plt.xlabel("City", fontsize=30)
29plt.ylabel("Number of Jobs", fontsize=30)
30plt.xticks(rotation=45, ha='right', fontsize=30)
31plt.yticks(fontsize=12)
32
33# Add count labels inside bars
34for i, city in enumerate(pivot.index):
35    y_offset = 0
36    for job_type in pivot.columns:
37        count = pivot.loc[city, job_type]
38        if count > 0:
39            ax.text(i, y_offset + count / 2, str(int(count)), ha='center', va='center', fontsize=20, color='black')
40            y_offset += count
41
42# Create enhanced legend with totals
43legend_labels = [f"{job_type} ({int(type_totals[job_type])})" for job_type in available_types]
44plt.legend(legend_labels, title='Job Type', bbox_to_anchor=(1.01, 1), loc='upper left', fontsize=30)
45plt.tight_layout()
46
47plt.show()

1import pandas as pd 2import matplotlib.pyplot as plt 3import matplotlib.patches as mpatches 4 5# -------------------------- 6# Connect to Database and Fetch Raw Data 7# -------------------------- 8 9query = """ 10SELECT 11 job_title, 12 location, 13 job_type 14FROM seek_jobs 15WHERE location IS NOT NULL 16 AND job_type IS NOT NULL 17""" 18 19conn = connect_to_db(DATABASE_URL) 20df = pd.read_sql(query, conn) 21conn.close()

1# -------------------------- 2# Classify Roles and Filter for "Data Analyst" 3# -------------------------- 4 5role_keywords = { 6 "Data Analyst": ["data analyst", "data"], 7} 8 9def classify_role(title): 10 title = title.lower() 11 for role, keywords in role_keywords.items(): 12 if any(keyword in title for keyword in keywords): 13 return role 14 return "Other" 15 16df['Role'] = df['job_title'].apply(classify_role) 17df = df[df['Role'] == "Data Analyst"]

1# -------------------------- 2# Normalize City Names 3# -------------------------- 4 5major_cities = [ 6 "Brisbane QLD", "Perth WA", "Adelaide SA", "Hobart TAS", 7 "Darwin NT", "Sydney NSW", "Melbourne VIC", "Canberra ACT" 8] 9 10def extract_city(location): 11 if "," in location: 12 city = location.split(",")[-1].strip() 13 else: 14 city = location.strip() 15 return city if city in major_cities else "Other" 16 17df['clean_city'] = df['location'].apply(extract_city)

1# -------------------------- 2# Normalize the Job Type 3# -------------------------- 4 5# Consolidate job types into standard categories 6def normalize_job_type(job_type): 7 job_type = job_type.lower() 8 if "full" in job_type: 9 return "Full-time" 10 elif "part" in job_type: 11 return "Part-time" 12 elif "contract" in job_type or "temp" in job_type: 13 return "Contract/Temp" 14 else: 15 return "Other" 16 17# Create a new column with the cleaned job type 18df['job_type_clean'] = df['job_type'].apply(normalize_job_type)

1# -------------------------- 2# Group and Pivot Data 3# -------------------------- 4 5# Use the normalized job type instead of the original one 6grouped = df.groupby(['clean_city', 'job_type_clean']).size().reset_index(name='count') 7 8pivot = grouped.pivot(index='clean_city', columns='job_type_clean', values='count').fillna(0) 9pivot = pivot.loc[pivot.sum(axis=1).sort_values(ascending=False).index]

1# -------------------------- 2# Plot the Stacked Bar Chart 3# -------------------------- 4 5# Pick colors for commonly expected job types 6colors = { 7 'Full-time': '#C4409D', 8 'Contract/Temp': '#7EC8E3', 9 'Part-time': '#F4B400', 10 'Other': '#B0B0B0' 11} 12 13# Use available types from data 14available_types = list(pivot.columns) 15color_list = [colors.get(t, '#999999') for t in available_types] 16 17# Calculate totals for each job type for the legend 18type_totals = pivot.sum() 19 20# Create figure and axes 21fig, ax = plt.subplots(figsize=(20, 14)) 22 23# Plot the stacked bar chart 24pivot.plot(kind='bar', stacked=True, ax=ax, color=color_list) 25 26# Title and labels 27plt.title("Employment Type Breakdown for Data Analyst Roles (by City)\n05-Mar-2025 to 01-May-2025", fontsize=30) 28plt.xlabel("City", fontsize=30) 29plt.ylabel("Number of Jobs", fontsize=30) 30plt.xticks(rotation=45, ha='right', fontsize=30) 31plt.yticks(fontsize=12) 32 33# Add count labels inside bars 34for i, city in enumerate(pivot.index): 35 y_offset = 0 36 for job_type in pivot.columns: 37 count = pivot.loc[city, job_type] 38 if count > 0: 39 ax.text(i, y_offset + count / 2, str(int(count)), ha='center', va='center', fontsize=20, color='black') 40 y_offset += count 41 42# Create enhanced legend with totals 43legend_labels = [f"{job_type} ({int(type_totals[job_type])})" for job_type in available_types] 44plt.legend(legend_labels, title='Job Type', bbox_to_anchor=(1.01, 1), loc='upper left', fontsize=30) 45plt.tight_layout() 46 47plt.show()

Chart Job Types across different cities (Source Code)

Next Up

Lesson 13: Mini Project: Job Opportunity Analysis Time Series Analysis (Business Requirements)

Next Up

Lesson 13: Mini Project: Job Opportunity Analysis Time Series Analysis (Business Requirements)