Exploring the Impact of Education, Job Experience, and Income

  • 23 Jul, 2023

The content presented in this article is intended solely for academic purposes. The opinions expressed are based on my personal understanding and research. It’s important to note that the field of big data and the programming languages discussed, such as Python, R, Power BI, Tableau, and SQL, are dynamic and constantly evolving. This article aims to foster learning, exploration, and discussion within the field rather than provide definitive answers. Reader discretion is advised.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from statsmodels.formula.api import ols
marketing1 = pd.read_csv(r'D:\helen\Documents\PythonScripts\datasets\kaggle\ifood_df.csv')
marketing1.head()

OUTPUT

[image: first five rows of marketing1]



Let's create a new column

The function YearsEducation takes a row of data as input and returns a total number of years of education derived from the education level flagged in that row. It checks each education indicator (education_PhD, education_Master, education_Graduation, education_Basic), adds the corresponding number of years to the years variable when the indicator equals 1, and then adds a fixed baseline of 10 years (1 + 1 + 3 + 5) to every row.


def YearsEducation(row):
    """Return the years of education assigned to one row of marketing1."""
    years = 0

    if row['education_PhD'] == 1:
        years += 2  # Add 2 years for PhD

    if row['education_Master'] == 1:
        years += 2  # Add 2 years for master's degree

    if row['education_Graduation'] == 1:
        years += 5  # Add 5 years for graduation

    if row['education_Basic'] == 1:
        years += 10  # Add 10 years for basic

    # Baseline added to every row, regardless of education level
    years += 1 + 1 + 3 + 5  # = 10 years

    return years

marketing1['YearsEducation'] = marketing1.apply(YearsEducation, axis=1)
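
Row-wise apply works, but the same column can probably be computed without a Python-level loop. The following is a minimal sketch, assuming the four dummy columns hold 0/1 values. The toy DataFrame `demo` and the names `weights` and `baseline` are hypothetical and introduced only for illustration; the weights and the 10-year baseline are copied from the function above.

```python
import pandas as pd

# Hypothetical toy data: one row per education flag, plus a row with no flag set.
demo = pd.DataFrame({
    "education_PhD":        [1, 0, 0, 0, 0],
    "education_Master":     [0, 1, 0, 0, 0],
    "education_Graduation": [0, 0, 1, 0, 0],
    "education_Basic":      [0, 0, 0, 1, 0],
})

# Years added per flag, copied from YearsEducation above.
weights = {
    "education_PhD": 2,
    "education_Master": 2,
    "education_Graduation": 5,
    "education_Basic": 10,
}
baseline = 1 + 1 + 3 + 5  # added to every row by the function above

# Multiply each 0/1 column by its weight, sum across columns, add the baseline.
weighted = demo[list(weights)].mul(pd.Series(weights))
demo["YearsEducation"] = weighted.sum(axis=1) + baseline

print(demo["YearsEducation"].tolist())
```

On a real dataset the same expression would be applied to marketing1 instead of `demo`; comparing its result with the apply-based column is a quick way to check that both agree.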
Let's create another new column (variable)

Let's do the same for JobExperience.


def JobExperience(row):
    """Return the years of job experience assigned to one row of marketing1."""
    years = 0

    if row['education_PhD'] == 1:
        years += 2  # Add 2 years of job experience for PhD

    if row['education_Master'] == 1:
        years += 2  # Add 2 years of job experience for master's degree

    if row['education_Graduation'] == 1:
        years += 5  # Add 5 years of job experience for graduation

    if row['education_Basic'] == 1:
        years += 1  # Add 1 year of job experience for basic

    # Baseline added to every row, regardless of education level
    years += 1 + 1 + 3 + 1  # = 6 years

    return years

marketing1['JobExperience'] = marketing1.apply(JobExperience, axis=1)
Let's create one more column (variable) to complete the model

Compute the square of JobExperience with Python's built-in pow() function and store the result in a new column, JobExperienceSquared.

marketing1['JobExperienceSquared'] = marketing1['JobExperience'].apply(lambda x: pow(x, 2))
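
For comparison, the `**` operator on a pandas Series gives the same result without calling a function per element. This is a small sketch on a hypothetical Series named `experience`, which stands in for marketing1['JobExperience']:

```python
import pandas as pd

# Hypothetical values standing in for marketing1['JobExperience'].
experience = pd.Series([6, 7, 8, 11])

squared_apply = experience.apply(lambda x: pow(x, 2))  # element-wise, as above
squared_vector = experience ** 2                        # vectorized equivalent

print(squared_vector.tolist())
```

Both approaches should produce identical values; the vectorized form is usually shorter and faster on large columns.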

Let's fit a multiple regression model with the variables created above.
Income_vs_YearsEducation1 = ols("Income ~ YearsEducation + JobExperience + JobExperienceSquared",
                        data=marketing1).fit()
print(Income_vs_YearsEducation1.params)
OUTPUT
Intercept                4306.238982
YearsEducation          -3482.609229
JobExperience           17753.208824
JobExperienceSquared     -788.168509
dtype: float64

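The intercept of 4306.238982 is the estimated income when YearsEducation, JobExperience, and JobExperienceSquared are all zero. The coefficient for YearsEducation is about -3482, indicating that, holding job experience constant, each additional year of education is associated with a decrease in expected income of approximately $3,482. The coefficient for JobExperience is about 17753 and the coefficient for JobExperienceSquared is about -788, so the two must be read together: income rises with experience at a decreasing rate, with the marginal effect of one more year equal to 17753 - 2 × 788 × JobExperience. That effect stays positive up to roughly 11.3 years of experience and turns negative beyond that point.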
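
The point where the effect of job experience changes sign can be worked out directly from the two coefficients. The sketch below uses the values printed in the output above; the names `b_exp`, `b_exp_sq`, `marginal_effect`, and `turning_point` are hypothetical and chosen only for this illustration.

```python
# Coefficients copied from the regression output above.
b_exp = 17753.208824      # JobExperience
b_exp_sq = -788.168509    # JobExperienceSquared


def marginal_effect(years):
    """Slope of predicted income with respect to experience at `years`."""
    return b_exp + 2 * b_exp_sq * years


# The quadratic peaks where the marginal effect is zero.
turning_point = -b_exp / (2 * b_exp_sq)

print(round(turning_point, 2))
print(round(marginal_effect(5), 2), round(marginal_effect(15), 2))
```

The slope is positive at 5 years and negative at 15 years, which is what a positive linear term combined with a negative squared term implies.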



Let's include the categorical variable maritalstatus in the regression model without creating dummy variables by hand. The formula interface encodes the levels of the variable automatically, and adding + 0 to the formula removes the intercept so that every level gets its own coefficient.

Income_vs_numcateg = ols("Income ~ YearsEducation + JobExperience + JobExperienceSquared + maritalstatus + 0",
                        data=marketing1).fit()
print(Income_vs_numcateg.params)
OUTPUT
maritalstatus[Divorced]     5168.166512
maritalstatus[Married]      4660.514211
maritalstatus[Single]       4707.647165
maritalstatus[Together]     4927.548932
maritalstatus[Widow]        9181.141773
YearsEducation             -3469.387292
JobExperience              17571.756723
JobExperienceSquared        -778.507442
dtype: float64
Here is an interpretation:

Because the formula ends with + 0, the model has no intercept and no reference level. Each maritalstatus coefficient is the intercept for that group, that is, the estimated income when YearsEducation, JobExperience, and JobExperienceSquared are all zero.

maritalstatus[Divorced]: the estimated intercept for divorced individuals is about $5,168.17.
maritalstatus[Married]: the estimated intercept for married individuals is about $4,660.51.
maritalstatus[Single]: the estimated intercept for single individuals is about $4,707.65.
maritalstatus[Together]: the estimated intercept for individuals in a relationship (together) is about $4,927.55.
maritalstatus[Widow]: the estimated intercept for widowed individuals is about $9,181.14.

The difference between two of these coefficients is the estimated income gap between groups with the same education and experience. For example, widowed individuals are estimated to earn about $4,520.63 more than married individuals (9,181.14 - 4,660.51).
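
To see what + 0 changes, it can help to fit both versions on a tiny dataset. The sketch below uses a hypothetical DataFrame `toy` with made-up numbers (not taken from ifood_df.csv) and only the categorical predictor, so that the coefficients are easy to check by hand.

```python
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical toy data, not taken from ifood_df.csv.
toy = pd.DataFrame({
    "income": [10.0, 14.0, 20.0, 24.0, 30.0, 40.0],
    "status": ["Single", "Single", "Married", "Married", "Widow", "Widow"],
})

with_intercept = ols("income ~ status", data=toy).fit()
no_intercept = ols("income ~ status + 0", data=toy).fit()

# With an intercept, the first level in sorted order ("Married") is the
# reference, and the other coefficients are differences from it.
print(with_intercept.params)

# With "+ 0", every level gets its own coefficient; in this
# categorical-only model each one equals the group mean.
print(no_intercept.params)
print(toy.groupby("status")["income"].mean())
```

In the model with additional numeric predictors, the level coefficients are no longer plain group means, but they keep the same role: one intercept per group rather than a difference from a reference level.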

Additionally, the negative coefficient for YearsEducation suggests that an additional year of education is associated with a decrease in income of about $3,469, all else being equal. The coefficient for JobExperience is positive while the coefficient for JobExperienceSquared is negative, which suggests that income rises with job experience at a decreasing rate and eventually declines.
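
As a worked example of how the coefficients combine, the sketch below computes the predicted income for one hypothetical profile: a married individual with YearsEducation = 15 and JobExperience = 11, which are the values the two helper functions above assign to a row where education_Graduation equals 1. The dictionary `coef` and the variable names are assumptions made for this illustration; the numbers are copied from the second regression output.

```python
# Coefficients copied from the second regression output above.
coef = {
    "maritalstatus[Married]": 4660.514211,
    "YearsEducation": -3469.387292,
    "JobExperience": 17571.756723,
    "JobExperienceSquared": -778.507442,
}

# Hypothetical profile: married, education_Graduation == 1.
years_education = 15
job_experience = 11

predicted_income = (
    coef["maritalstatus[Married]"]
    + coef["YearsEducation"] * years_education
    + coef["JobExperience"] * job_experience
    + coef["JobExperienceSquared"] * job_experience ** 2
)

print(round(predicted_income, 2))
```

With a fitted model object available, the predict method of the results object would give the same figure without copying coefficients by hand.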