Welcome to the analysis of students' performance
The dataset used here comes from the following Kaggle page: https://www.kaggle.com/datasets/spscientist/students-performance-in-exams
In [13]:
# Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
In [3]:
# Load the dataset (the CSV is expected in the current working directory)
df = pd.read_csv("StudentsPerformance.csv")
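Before exploring, it can be worth confirming that the file parsed as expected. The following is a minimal, self-contained sketch, not part of the original analysis: it assumes the CSV header matches the column names that `df.head()` displays, parses three of those rows from an in-memory string (`SAMPLE_CSV` and `sample` are hypothetical names) and checks the shape, the column dtypes and the number of missing values.

```python
import io

import pandas as pd

# Three of the rows displayed by df.head(), written out as CSV text so the
# sketch runs without StudentsPerformance.csv on disk (hypothetical sample).
SAMPLE_CSV = """gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
female,group B,bachelor's degree,standard,none,72,72,74
female,group C,some college,standard,completed,69,90,88
male,group A,associate's degree,free/reduced,none,47,57,44
"""

sample = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Structural checks after loading: rows x columns, inferred dtypes,
# and the total number of missing cells.
print(sample.shape)
print(sample.dtypes)
print(sample.isna().sum().sum())
```

On the full file, the equivalent checks are `df.shape`, `df.dtypes` and `df.isna().sum()`.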
Data exploration phase
In [4]:
df.head()
Out[4]:
| | gender | race/ethnicity | parental level of education | lunch | test preparation course | math score | reading score | writing score |
|---|---|---|---|---|---|---|---|---|
| 0 | female | group B | bachelor's degree | standard | none | 72 | 72 | 74 |
| 1 | female | group C | some college | standard | completed | 69 | 90 | 88 |
| 2 | female | group B | master's degree | standard | none | 90 | 95 | 93 |
| 3 | male | group A | associate's degree | free/reduced | none | 47 | 57 | 44 |
| 4 | male | group C | some college | standard | none | 76 | 78 | 75 |
In [5]:
df.describe()
Out[5]:
| | math score | reading score | writing score |
|---|---|---|---|
| count | 1000.00000 | 1000.000000 | 1000.000000 |
| mean | 66.08900 | 69.169000 | 68.054000 |
| std | 15.16308 | 14.600192 | 15.195657 |
| min | 0.00000 | 17.000000 | 10.000000 |
| 25% | 57.00000 | 59.000000 | 57.750000 |
| 50% | 66.00000 | 70.000000 | 69.000000 |
| 75% | 77.00000 | 79.000000 | 79.000000 |
| max | 100.00000 | 100.000000 | 100.000000 |
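By default `df.describe()` summarises only the numeric columns, which is why the five categorical columns do not appear in the table above. A minimal sketch, using the five rows displayed by `df.head()` as a stand-in for the full data (`sample` is a hypothetical name), shows how `exclude=[np.number]` produces the matching summary (count, unique, top, freq) for the categorical columns instead:

```python
import numpy as np
import pandas as pd

# The five rows displayed by df.head(), rebuilt by hand so the sketch is
# self-contained (hypothetical stand-in for the full DataFrame).
sample = pd.DataFrame({
    "gender": ["female", "female", "female", "male", "male"],
    "race/ethnicity": ["group B", "group C", "group B", "group A", "group C"],
    "parental level of education": [
        "bachelor's degree", "some college", "master's degree",
        "associate's degree", "some college",
    ],
    "lunch": ["standard", "standard", "standard", "free/reduced", "standard"],
    "test preparation course": ["none", "completed", "none", "none", "none"],
    "math score": [72, 69, 90, 47, 76],
    "reading score": [72, 90, 95, 57, 78],
    "writing score": [74, 88, 93, 44, 75],
})

# Default behaviour: only the numeric score columns are summarised.
numeric_summary = sample.describe()

# Excluding numeric dtypes summarises the categorical columns instead,
# reporting count, unique, top (most frequent value) and freq.
categorical_summary = sample.describe(exclude=[np.number])
print(categorical_summary)
```

Applied to the full data, `df.describe(exclude=[np.number])` gives the same kind of table over all 1000 rows.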
In [17]:
# Bar chart of the value counts of every column, one figure per column
for col in df.columns:
    df[col].value_counts().plot(kind='bar')
    plt.title(col)
    plt.show()
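Because the loop above also runs over the three score columns, it draws one bar per distinct score value, which is hard to read for numeric data. One possible refinement, sketched here on the five rows from `df.head()` (`sample` and `categorical_cols` are hypothetical names), restricts the bar charts to the non-numeric columns and draws them side by side in a single figure:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs as a plain script; omit in Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# The five rows displayed by df.head(), rebuilt by hand (hypothetical sample).
sample = pd.DataFrame({
    "gender": ["female", "female", "female", "male", "male"],
    "race/ethnicity": ["group B", "group C", "group B", "group A", "group C"],
    "parental level of education": [
        "bachelor's degree", "some college", "master's degree",
        "associate's degree", "some college",
    ],
    "lunch": ["standard", "standard", "standard", "free/reduced", "standard"],
    "test preparation course": ["none", "completed", "none", "none", "none"],
    "math score": [72, 69, 90, 47, 76],
    "reading score": [72, 90, 95, 57, 78],
    "writing score": [74, 88, 93, 44, 75],
})

# Keep only the non-numeric (categorical) columns for bar charts.
categorical_cols = sample.select_dtypes(exclude=[np.number]).columns

# One row of panels, one per categorical column; squeeze=False keeps `axes`
# two-dimensional even when there is only a single column.
fig, axes = plt.subplots(
    1, len(categorical_cols), figsize=(4 * len(categorical_cols), 3), squeeze=False
)
for ax, col in zip(axes[0], categorical_cols):
    sample[col].value_counts().plot(kind="bar", ax=ax)
    ax.set_title(col)
fig.tight_layout()
```

The score columns are better served by histograms, as in the following cell.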
In [20]:
# Histogram of each numeric (score) column in its own subplot
df.plot(subplots=True, kind='hist')
plt.tight_layout()
plt.show()
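`df.plot(subplots=True, kind='hist')` plots only the numeric columns, here the three scores, using pandas' default of 10 bins. A variation worth considering, sketched with the score columns of the five `df.head()` rows (`SCORE_COLS`, `sample` and `bin_edges` are hypothetical names), selects the score columns explicitly and passes fixed bin edges, so each bar covers a readable 5-point range between 0 and 100. The 5-point width is an arbitrary choice, not something taken from the original analysis.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs as a plain script; omit in Jupyter
import numpy as np
import pandas as pd

SCORE_COLS = ["math score", "reading score", "writing score"]

# Score columns of the five rows displayed by df.head() (hypothetical sample).
sample = pd.DataFrame({
    "math score": [72, 69, 90, 47, 76],
    "reading score": [72, 90, 95, 57, 78],
    "writing score": [74, 88, 93, 44, 75],
})

# Scores range from 0 to 100 (see df.describe()), so 21 edges at 5-point
# steps give 20 bins shared by every subplot.
bin_edges = np.arange(0, 105, 5)

# One histogram per score column; returns an array of Axes.
axes = sample[SCORE_COLS].plot(kind="hist", subplots=True, bins=bin_edges, figsize=(6, 6))
axes[0].figure.tight_layout()
```

On the full data the call becomes `df[SCORE_COLS].plot(kind="hist", subplots=True, bins=bin_edges)`.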