Title¶

Subtitle¶

Sub-subtitle¶

This is a paragraph in Markdown. The following equation, $f(x)=x^3$, is evaluated at:

- first element
- second element
- third element

1. First
2. Second
3. Third
   - Sub-element
   - Sub-element

In [6]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

x = 5  # x is bound to an integer
print(x)
x = "Hola"  # the same name is rebound to a string
print(x)

5
Hola
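
The cell above shows that a Python name is not tied to one type: `x` is first bound to an integer and then rebound to a string. A minimal sketch that makes the change visible with the built-in `type` function (the names `first_type` and `second_type` are illustrative):

```python
x = 5
first_type = type(x).__name__   # name of the type currently bound to x
print(first_type)               # int

x = "Hola"
second_type = type(x).__name__  # the same name now refers to a string
print(second_type)              # str
```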

The tips dataset¶

This dataset comes from a restaurant and is commonly used to teach exploratory data analysis (EDA). Each row represents one bill (one table) and records the total bill, the tip, and other details of the service.

total_bill: Total amount of the bill, without the tip.
tip: The tip that was given.
sex: Sex of the person who paid the bill.
smoker: Whether the party at the table included smokers.
day: Day of the week.
time: Meal time (Lunch/Dinner).
size: Number of guests at the table.

In [8]:

# importing the tips dataset
import seaborn as sns
df = sns.load_dataset("tips")
df

Out[8]:

	total_bill	tip	sex	smoker	day	time	size
0	16.99	1.01	Female	No	Sun	Dinner	2
1	10.34	1.66	Male	No	Sun	Dinner	3
2	21.01	3.50	Male	No	Sun	Dinner	3
3	23.68	3.31	Male	No	Sun	Dinner	2
4	24.59	3.61	Female	No	Sun	Dinner	4
...	...	...	...	...	...	...	...
239	29.03	5.92	Male	No	Sat	Dinner	3
240	27.18	2.00	Female	Yes	Sat	Dinner	2
241	22.67	2.00	Male	Yes	Sat	Dinner	2
242	17.82	1.75	Male	No	Sat	Dinner	2
243	18.78	3.00	Female	No	Thur	Dinner	2

244 rows × 7 columns

In [11]:

df.head()  # first five rows; not displayed here because it is not the last expression in the cell
df.info()

<class 'pandas.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   total_bill  244 non-null    float64 
 1   tip         244 non-null    float64 
 2   sex         244 non-null    category
 3   smoker      244 non-null    category
 4   day         244 non-null    category
 5   time        244 non-null    category
 6   size        244 non-null    int64   
dtypes: category(4), float64(2), int64(1)
memory usage: 7.4 KB

In [13]:

df.describe(include='all')

Out[13]:

	total_bill	tip	sex	smoker	day	time	size
count	244.000000	244.000000	244	244	244	244	244.000000
unique	NaN	NaN	2	2	4	2	NaN
top	NaN	NaN	Male	No	Sat	Dinner	NaN
freq	NaN	NaN	157	151	87	176	NaN
mean	19.785943	2.998279	NaN	NaN	NaN	NaN	2.569672
std	8.902412	1.383638	NaN	NaN	NaN	NaN	0.951100
min	3.070000	1.000000	NaN	NaN	NaN	NaN	1.000000
25%	13.347500	2.000000	NaN	NaN	NaN	NaN	2.000000
50%	17.795000	2.900000	NaN	NaN	NaN	NaN	2.000000
75%	24.127500	3.562500	NaN	NaN	NaN	NaN	3.000000
max	50.810000	10.000000	NaN	NaN	NaN	NaN	6.000000

In [15]:

df.isna().sum()

Out[15]:

total_bill    0
tip           0
sex           0
smoker        0
day           0
time          0
size          0
dtype: int64
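
All counts above are zero, so the dataset has no missing values. To see what a non-zero count looks like, the sketch below builds a small frame from three rows of the preview and blanks out one tip on purpose. The missing value is an assumption introduced only for illustration and is not part of the real data; the names `sample` and `missing_per_column` are likewise illustrative.

```python
import numpy as np
import pandas as pd

# Three rows copied from the preview of the tips dataset.
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "tip": [1.01, 1.66, 3.50],
    "size": [2, 3, 3],
})

# Hypothetical gap: overwrite one tip with NaN (not present in the real data).
sample.loc[1, "tip"] = np.nan

# isna() marks each missing cell with True; sum() counts the True values per column.
missing_per_column = sample.isna().sum()
print(missing_per_column)
```

Only the `tip` column reports a missing value here, which is the pattern to look for when the same call is run on a new dataset.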

In [24]:

tb = df['total_bill']
plt.figure()
plt.hist(tb, bins=25)
plt.title("Histogram: total_bill")
plt.xlabel("total_bill")
plt.ylabel("count")
plt.show()

[Figure: histogram of total_bill with 25 bins; x-axis: total_bill, y-axis: count]

In [27]:

plt.figure()
plt.boxplot(tb, vert=False)
plt.title("Boxplot: total_bill")
plt.xlabel("total_bill")  # the values run along the x-axis because vert=False
plt.show()
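
The box in a boxplot spans the first to the third quartile, and matplotlib's default whiskers extend to the most extreme points that lie within 1.5 times the interquartile range (IQR) of the box. As a rough sketch, those limits can be worked out by hand from the 25% and 75% values that `df.describe` reported for `total_bill`; the variable names are illustrative.

```python
# Quartiles of total_bill, copied from the describe() output above.
q1 = 13.3475
q3 = 24.1275

iqr = q3 - q1                  # interquartile range: the width of the box
lower_fence = q1 - 1.5 * iqr   # points below this are drawn as outliers
upper_fence = q3 + 1.5 * iqr   # points above this are drawn as outliers

print(f"IQR = {iqr:.2f}")
print(f"fences = ({lower_fence:.4f}, {upper_fence:.4f})")
```

The minimum bill of 3.07 is above the lower fence, while the maximum of 50.81 is above the upper fence, so the plot should mark at least one high value as an outlier.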

In [ ]:

# Task 1: Numerically state the quartile values (and the IQR)
# Task 2: Scatter plot of total_bill vs tip
# Task 3: Scatter plot of tip vs size
# Task 4: Answer the following questions:
#   - What does the data represent?
#   - What are typical ranges?
#   - Any suspicious values? Why?
#   - One conclusion in plain language.
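
As a possible starting point for Task 2, here is a minimal sketch of a scatter plot in the same style as the cells above. It uses only the ten rows visible in the preview, held in an illustrative frame called `sample`, so that the plotting calls can be tried without loading the dataset; for the actual task, the full `df` would take its place.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Ten rows copied from the preview of the tips dataset (first five and last five).
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59,
                   29.03, 27.18, 22.67, 17.82, 18.78],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61,
            5.92, 2.00, 2.00, 1.75, 3.00],
})

plt.figure()
points = plt.scatter(sample["total_bill"], sample["tip"])  # one marker per bill
plt.title("Scatter: total_bill vs tip")
plt.xlabel("total_bill")
plt.ylabel("tip")
plt.show()
```

Task 3 follows the same pattern with `tip` and `size` as the two columns.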