In [6]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
x = 5  # x starts out bound to an integer
print(x)
x = "Hola"  # the same name can be rebound to a string
print(x)
The tips dataset
This dataset comes from a restaurant and is commonly used to teach exploratory data analysis (EDA). Each row represents one bill (one table) and records the total bill, the tip, and other details of the service.
total_bill: Total amount of the bill, excluding the tip.
tip: The tip given.
sex: Sex of the person who paid the bill.
smoker: Whether the party included smokers.
day: Day of the week.
time: Meal time (Lunch/Dinner).
size: Number of guests in the party.
In [8]:
# importing the tips dataset
import seaborn as sns
df = sns.load_dataset("tips")
df
Out[8]:
In [11]:
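print(df.head())  # preview the first five rows (a bare df.head() is not displayed unless it is the last line of the cell)
df.info()  # column dtypes and non-null counts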
In [13]:
df.describe(include='all')
Out[13]:
In [15]:
df.isna().sum()
Out[15]:
In [24]:
tb = df['total_bill']
plt.figure()
plt.hist(tb, bins=25)
plt.title("Histogram: total_bill")
plt.xlabel("total_bill")
plt.ylabel("count")
plt.show()
In [27]:
plt.figure()
plt.boxplot(tb, vert=False)
plt.title("Boxplot: total_bill")
plt.xlabel("total_bill")  # with vert=False the values run along the x-axis
plt.show()
In [ ]:
# Task 1: Numerically state the quartile values (Q1, median, Q3) and the IQR
# Task 2: Scatter plot total_bill vs tip
# Task 3: Scatter plot tip vs size
# Task 4:
# - What does the data represent?
# - What are the typical ranges?
# - Are there any suspicious values? Why?
# - State one conclusion in plain language.
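As a starting point for Task 1 and the "suspicious values" question, here is a minimal sketch of how quartiles and the IQR can be computed and used to flag outliers. The `bills` array holds made-up values for illustration only, not rows from the tips dataset, and the 1.5 * IQR fences are one common convention (the one matplotlib's boxplot uses by default for its whiskers), not the only possible rule.

```python
import numpy as np

# Hypothetical bill amounts, invented for illustration (not taken from the tips dataset).
bills = np.array([10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 60.0])

# Quartiles: the 25th, 50th and 75th percentiles.
q1, median, q3 = np.percentile(bills, [25, 50, 75])

# Interquartile range: the spread of the middle 50% of the values.
iqr = q3 - q1

# Fences from the 1.5 * IQR rule; values outside them are candidates for a closer look.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
suspicious = bills[(bills < lower_fence) | (bills > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"fences: [{lower_fence}, {upper_fence}]")
print("suspicious values:", suspicious)
```

On the notebook's own data, the same quartiles should be obtainable with `df['total_bill'].quantile([0.25, 0.5, 0.75])`. A value outside the fences is not automatically an error; it is only a prompt to check whether it is plausible for a restaurant bill.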
In [ ]: