<html lang="en"> <head> </head>

Title

Subtitle

sub-subtitle

This is a paragraph in Markdown. The following equation $f(x)=x^3$ is evaluated at:

  • first element
  • second element
  • third element
  1. First
  2. Second
  3. Third
    • Sub-element
    • Sub-element
In [6]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

x = 5  # x starts out bound to an int
print(x)
x = "Hola"  # rebinding x to a str is fine: Python names are dynamically typed
print(x)
5
Hola

The tips dataset

This dataset records bills from a restaurant and is commonly used to teach exploratory data analysis (EDA). Each row represents one bill (one table) and records the total bill, the tip, and other details of the service.

  • total_bill: Total amount of the bill, excluding the tip.
  • tip: Tip given.
  • sex: Sex of the person who paid the bill.
  • smoker: Whether the party included smokers.
  • day: Day of the week.
  • time: Meal (Lunch/Dinner).
  • size: Number of guests in the party.
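As a minimal sketch of how these columns look in pandas, the snippet below builds a small frame by hand from rows 0 to 4 of the tips dataset and casts the four label columns to `category`. The name `sample` is an assumption introduced only for this illustration; it is not part of the notebook.

```python
import pandas as pd

# Five rows copied by hand from the tips dataset (rows 0 to 4).
sample = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
        "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
        "sex": ["Female", "Male", "Male", "Male", "Female"],
        "smoker": ["No", "No", "No", "No", "No"],
        "day": ["Sun", "Sun", "Sun", "Sun", "Sun"],
        "time": ["Dinner"] * 5,
        "size": [2, 3, 3, 2, 4],
    }
)

# The four label columns repeat a handful of values, so store them as categories.
for col in ["sex", "smoker", "day", "time"]:
    sample[col] = sample[col].astype("category")

print(sample.dtypes)
```

Note that the party size has to be read with `sample["size"]`: the attribute `sample.size` is the DataFrame's element count, not the column.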
In [8]:
# importing the tips dataset
import seaborn as sns
df = sns.load_dataset("tips")
df
Out[8]:
     total_bill   tip     sex  smoker   day    time  size
0         16.99  1.01  Female      No   Sun  Dinner     2
1         10.34  1.66    Male      No   Sun  Dinner     3
2         21.01  3.50    Male      No   Sun  Dinner     3
3         23.68  3.31    Male      No   Sun  Dinner     2
4         24.59  3.61  Female      No   Sun  Dinner     4
...         ...   ...     ...     ...   ...     ...   ...
239       29.03  5.92    Male      No   Sat  Dinner     3
240       27.18  2.00  Female     Yes   Sat  Dinner     2
241       22.67  2.00    Male     Yes   Sat  Dinner     2
242       17.82  1.75    Male      No   Sat  Dinner     2
243       18.78  3.00  Female      No  Thur  Dinner     2

244 rows × 7 columns

In [11]:
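df.head()  # first five rows; not displayed here because it is not the cell's last expression
df.info()  # prints column names, non-null counts, and dtypes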
<class 'pandas.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   total_bill  244 non-null    float64 
 1   tip         244 non-null    float64 
 2   sex         244 non-null    category
 3   smoker      244 non-null    category
 4   day         244 non-null    category
 5   time        244 non-null    category
 6   size        244 non-null    int64   
dtypes: category(4), float64(2), int64(1)
memory usage: 7.4 KB
In [13]:
df.describe(include='all')
Out[13]:
        total_bill         tip   sex  smoker  day    time        size
count   244.000000  244.000000   244     244  244     244  244.000000
unique         NaN         NaN     2       2    4       2         NaN
top            NaN         NaN  Male      No  Sat  Dinner         NaN
freq           NaN         NaN   157     151   87     176         NaN
mean     19.785943    2.998279   NaN     NaN  NaN     NaN    2.569672
std       8.902412    1.383638   NaN     NaN  NaN     NaN    0.951100
min       3.070000    1.000000   NaN     NaN  NaN     NaN    1.000000
25%      13.347500    2.000000   NaN     NaN  NaN     NaN    2.000000
50%      17.795000    2.900000   NaN     NaN  NaN     NaN    2.000000
75%      24.127500    3.562500   NaN     NaN  NaN     NaN    3.000000
max      50.810000   10.000000   NaN     NaN  NaN     NaN    6.000000
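The 25%, 50%, and 75% rows of the `describe()` output above are the quartiles of each numeric column. A minimal sketch of how the interquartile range follows from them, using the `total_bill` values copied from that output; the 1.5 × IQR fences are Tukey's usual rule of thumb for flagging outliers, brought in here as an assumption rather than something the notebook prescribes.

```python
# Quartiles of total_bill, copied from the describe() output.
q1, median, q3 = 13.3475, 17.795, 24.1275

iqr = q3 - q1                 # spread of the middle 50% of the bills
lower_fence = q1 - 1.5 * iqr  # values below this would be flagged as low outliers
upper_fence = q3 + 1.5 * iqr  # values above this would be flagged as high outliers

print(f"IQR = {iqr:.2f}")
print(f"fences = [{lower_fence:.4f}, {upper_fence:.4f}]")
```

Under this rule the maximum bill of 50.81 lies above the upper fence, while the minimum of 3.07 lies inside the lower one.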
In [15]:
df.isna().sum()
Out[15]:
total_bill    0
tip           0
sex           0
smoker        0
day           0
time          0
size          0
dtype: int64
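Every count above is zero, so the tips data has no missing values. To show what a nonzero count would look like, here is a small sketch on a hypothetical three-row frame; the name `toy` and the deliberately inserted `NaN` are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the NaN in "tip" is introduced on purpose.
toy = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01],
        "tip": [1.01, np.nan, 3.50],
    }
)

# isna() marks each missing cell as True; sum() counts the True values per column.
missing = toy.isna().sum()
print(missing)
```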
In [24]:
tb = df['total_bill']
plt.figure()
plt.hist(tb, bins=25)
plt.title("Histogram: total_bill")
plt.xlabel("total_bill")
plt.ylabel("count")
plt.show()
[Figure: histogram of total_bill, 25 bins]
In [27]:
plt.figure()
plt.boxplot(tb, vert=False)
plt.title("Boxplot: total_bill")
plt.xlabel("total_bill")  # with vert=False the values run along the x-axis
plt.show()
[Figure: horizontal boxplot of total_bill]
In [ ]:
# Task 1: Numerically state the quartile values and the interquartile range (IQR)
# Task 2: Scatter plot of total_bill vs. tip
# Task 3: Scatter plot of tip vs. size
# Task 4:
#   - What does the data represent?
#   - What are typical ranges?
#   - Are there any suspicious values? Why?
#   - One conclusion in plain language.
</html>