Data Visualization with Python | Using IfoodDataSet from Kaggle

23 May, 2023

The content presented in this article is intended solely for academic purposes. The opinions expressed are based on my personal understanding and research. It’s important to note that the field of big data and the programming languages discussed, such as Python, R, Power BI, Tableau, and SQL, are dynamic and constantly evolving. This article aims to foster learning, exploration, and discussion within the field rather than provide definitive answers. Reader discretion is advised.

marketing1.columns

OUTPUT

marketing1['marital_Divorced'].unique()

OUTPUT

array([0, 1], dtype=int64)

marketing1['education_Basic'].unique()

OUTPUT

array([0, 1], dtype=int64)

marketing1['kidhome'].unique()

OUTPUT

array([0, 1, 2], dtype=int64)

marketing1['AcceptedCmp5'].unique()

OUTPUT

array([0, 1], dtype=int64)
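
The separate `.unique()` calls above can also be run in a single pass. The following is a minimal sketch on a small made-up frame: `demo` and its values are invented for illustration, and only the column names come from the dataset.

```python
import pandas as pd

# Toy stand-in for marketing1; the values are invented for illustration.
demo = pd.DataFrame({
    'marital_Divorced': [0, 1, 0, 0],
    'education_Basic': [0, 0, 1, 0],
    'kidhome': [0, 1, 2, 1],
    'AcceptedCmp5': [0, 0, 0, 1],
})

# Collect the sorted distinct values of every column in one dictionary.
unique_values = {col: sorted(demo[col].unique().tolist()) for col in demo.columns}

for col, values in unique_values.items():
    print(col, values)
```

On the real data, replacing `demo` with `marketing1` (or a subset of its columns) should give the same kind of overview without repeating the call per column.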

sns.displot(marketing1.marital_Married, bins=10, kde=True, color='red')

Creating New String Columns from Existing int64 Columns

Let’s convert some int64 variables, such as marital, education, _home and AcceptedCmp1, to strings so that the visualizations are easier to understand.

import numpy as np

# Map each 0/1 marital column to the label it stands for.
# (A dict keyed on 0 and 1 would keep only its last pair, because
# repeated keys overwrite each other.)
marital_mapping = {
    'marital_Divorced': 'Divorced',
    'marital_Married': 'Married',
    'marital_Single': 'Single',
    'marital_Together': 'Together',
    'marital_Widow': 'Widow',
}

# Start from the fallback label, then overwrite it wherever a column equals 1.
marketing1['maritalstatus'] = 'Not Divorced'
for column, label in marital_mapping.items():
    marketing1['maritalstatus'] = np.where(marketing1[column] == 1, label, marketing1['maritalstatus'])
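
Because the code above overwrites the label column by column, it quietly assumes that every row has exactly one marital flag set. The sketch below shows one way to check that assumption before trusting the new column. The `demo` frame is invented for illustration and only the column names come from the dataset.

```python
import pandas as pd

# Toy stand-in for marketing1; the last row has no marital flag set.
demo = pd.DataFrame({
    'marital_Divorced': [1, 0, 0, 0],
    'marital_Married':  [0, 1, 0, 0],
    'marital_Single':   [0, 0, 1, 0],
    'marital_Together': [0, 0, 0, 0],
    'marital_Widow':    [0, 0, 0, 0],
})

marital_cols = [c for c in demo.columns if c.startswith('marital_')]

# Number of flags set per row; 1 is the expected value for clean 0/1 columns.
flags_per_row = demo[marital_cols].sum(axis=1)

# Rows with no flag would end up with the fallback label 'Not Divorced'.
rows_without_status = demo.index[flags_per_row == 0].tolist()
# Rows with several flags would keep only the last matching label.
rows_with_conflict = demo.index[flags_per_row > 1].tolist()

print(rows_without_status, rows_with_conflict)
```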

Let’s apply the same approach to build the education variable and its levels.
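
The education code itself is not shown here, so the following is only a sketch of one possible approach, not necessarily the one used for this article. The helper `collapse_dummies` is a hypothetical name, and the columns `education_Graduation` and `education_PhD` are assumed for illustration (only `education_Basic` appears above).

```python
import pandas as pd

def collapse_dummies(df, prefix, fallback='Unknown'):
    """Return one label per row from 0/1 columns that share a prefix."""
    cols = [c for c in df.columns if c.startswith(prefix)]
    # idxmax returns the name of the first column holding the row maximum;
    # slicing off the prefix leaves just the level name.
    labels = df[cols].idxmax(axis=1).str[len(prefix):]
    # Rows where no flag is set get the fallback label instead.
    return labels.where(df[cols].sum(axis=1) > 0, fallback)

# Toy stand-in for marketing1; column names other than education_Basic are assumed.
demo = pd.DataFrame({
    'education_Basic':      [1, 0, 0, 0],
    'education_Graduation': [0, 1, 0, 0],
    'education_PhD':        [0, 0, 1, 0],
})

demo['education'] = collapse_dummies(demo, 'education_')
print(demo['education'].tolist())
```

The same helper could be pointed at the `marital_` prefix, which avoids writing one `np.where` line per level.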

marketing1.loc[0,'Customer_Days']

OUTPUT

In the case above, we need to change the current dtype from int64 to datetime, because Customer_Days is treated here as a date variable: each value is read as a number of days counted from 1970-01-01.

# Make sure the column is numeric; values that cannot be parsed become NaN
marketing1['Customer_Days'] = pd.to_numeric(marketing1['Customer_Days'], errors='coerce')
# Read each value as a number of days after 1970-01-01
marketing1['Customer_Days'] = pd.to_datetime(marketing1['Customer_Days'], origin='1970-01-01', unit='D')

start_date = pd.to_datetime('1976-01-01')
end_date = pd.to_datetime('1977-12-31')

# Keep only the dates that fall inside the [start_date, end_date] window
in_window = marketing1['Customer_Days'].between(start_date, end_date)
accurate_dates = marketing1.loc[in_window, 'Customer_Days']
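
To see what the conversion and the date filter do, it helps to try them on a few hand-picked values. This is a minimal sketch: the day counts and the window bounds are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Invented day counts; None stands in for a missing entry.
days = pd.Series([0, 365, 2500, None])

# Coerce to numeric first so that bad entries become NaN instead of raising.
days = pd.to_numeric(days, errors='coerce')

# Interpret each value as a number of days after the origin date.
dates = pd.to_datetime(days, unit='D', origin='1970-01-01')

# Keep only the dates inside a closed interval; missing dates never match.
mask = dates.between(pd.Timestamp('1970-06-01'), pd.Timestamp('1971-12-31'))
print(dates[mask])
```

Day 0 maps to 1970-01-01 and day 365 to 1971-01-01, so only the second value falls inside the example window.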

# Cast the following new variables to string dtype (object)
marketing1['maritalstatus'] = marketing1['maritalstatus'].astype(str)
marketing1['education'] = marketing1['education'].astype(str)
marketing1['campaing'] = marketing1['campaing'].astype(str)
marketing1['family'] = marketing1['family'].astype(str)
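
The point of the cast is that plotting libraries treat string columns as categories rather than as numbers. Here is a small sketch of the same cast written as a loop, on an invented `demo` frame (the values are made up; only the column names come from the code above):

```python
import pandas as pd

# Toy stand-in for marketing1 with one text column and one integer column.
demo = pd.DataFrame({
    'maritalstatus': ['Married', 'Single', 'Widow'],
    'family': [0, 2, 1],
})

# Cast every listed column to string in one loop.
for col in ['maritalstatus', 'family']:
    demo[col] = demo[col].astype(str)

# The integers are now text labels such as '0' and '2'.
print(demo['family'].tolist())
```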

marketing1.head()

OUTPUT

graphic_date.plot(x='Customer_Days', y='Income', kind = 'line')

OUTPUT

sns.catplot(x='maritalstatus',
            y='Income',
            data=marketing1,
            kind='box',
            hue='family',
            showfliers=False)  # hide outlier markers; used instead of sym="", which newer seaborn releases may reject
plt.show()
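
The box plot above is a picture of per-group quartiles, and the same numbers can be computed directly when exact values are needed. A minimal sketch, using an invented `demo` frame with the same column names:

```python
import pandas as pd

# Toy stand-in for marketing1; incomes are invented for illustration.
demo = pd.DataFrame({
    'maritalstatus': ['Married', 'Married', 'Married', 'Single', 'Single', 'Single'],
    'family': ['0', '0', '0', '1', '1', '1'],
    'Income': [40000, 50000, 60000, 30000, 35000, 70000],
})

# Quartiles per group: the box spans Q1 to Q3 and the inner line is the median.
quartiles = (
    demo.groupby(['maritalstatus', 'family'])['Income']
        .quantile([0.25, 0.5, 0.75])
        .unstack()
)
print(quartiles)
```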

OUTPUT

sns.lmplot(data=marketing1,
            x="Age",
            y="Income",
            hue="maritalstatus")
plt.show()
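
The `lmplot` call above draws one fitted line per marital status. To put a number on those lines, a straight line can be fitted per group with `np.polyfit`. This is only a sketch on invented data, and an ordinary least-squares fit is assumed here as a stand-in for what the plot shows.

```python
import numpy as np
import pandas as pd

# Toy stand-in for marketing1; ages and incomes are invented for illustration.
demo = pd.DataFrame({
    'Age': [30, 40, 50, 30, 40, 50],
    'Income': [30000, 40000, 50000, 20000, 25000, 30000],
    'maritalstatus': ['Married'] * 3 + ['Single'] * 3,
})

# Fit Income = slope * Age + intercept separately for each marital status.
slopes = {}
for status, group in demo.groupby('maritalstatus'):
    slope, intercept = np.polyfit(group['Age'], group['Income'], deg=1)
    slopes[status] = slope

print(slopes)
```

A steeper slope means income rises faster with age in that group, which is what a steeper line in the plot conveys.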