## Instructions

**Objective**

## Requirements and Specifications

**Source Code**

import pandas as pd #pandas is a library that makes working with datasheets/tables easy

import math

import numpy as np #NumPy is a scientific library that makes working with arrays of numbers easier

### Read data and parse date

df = pd.read_csv("weatherstats_toronto_daily.csv", parse_dates=['date'])

df.head() #See the first 5 rows

### Some tutorial

Uncomment each line to see the effect

#To get individual columns, try

df.date #or

df['date'] #These yield the same result

#Get the date and average temperature columns

df[['date', 'avg_temperature']]

#Since each column is a Series (like a list), you can use indexes to extract individual entries. You can also iterate through them.

df['date'][0] #Get the 'series' date and then use index to get the first date value

df.date

df[['date', 'avg_temperature']]

df['date'][0]
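As a minimal sketch of indexing and iterating over a column, the example below uses a small made-up dataframe named `sample` (an assumption for illustration, not the real CSV) with the same `date` and `avg_temperature` column names.

```python
import pandas as pd

# Hypothetical three-day sample; the notebook itself reads these columns from the CSV.
sample = pd.DataFrame({
    "date": pd.to_datetime(["2021-05-20", "2021-05-21", "2021-05-22"]),
    "avg_temperature": [14.5, 17.0, 21.3],
})

# Position-based access to the first entry of a column
first_date = sample["date"].iloc[0]

# A Series can be iterated like a list
temps = [t for t in sample["avg_temperature"]]

# Two columns can be walked together with zip
for day, temp in zip(sample["date"], sample["avg_temperature"]):
    print(day.date(), temp)
```

Note that `df['date'][0]` looks up the row *labelled* 0, which works here because the dataframe has the default 0, 1, 2, ... index. After subsetting, the label 0 may no longer exist, so `.iloc[0]` (first row by position) is the safer choice.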

#To subset the data, try

#df[(df['avg_temperature']>30)] #This gives you data for all days where average temperature was more than 30. Remember: You may want to save this subset in a different variable if you want to work with it.

df[(df['avg_temperature']>30)]

#Multiple conditions can be combined

#df[(df['avg_temperature']>30) & (df['max_humidex']>40)]

df[(df['avg_temperature']>30) & (df['max_humidex']>40)]

#Get just one column that meets the criteria

#sbst = df[(df['avg_temperature']>30) & (df['max_humidex']>40)]

#sbst["date"] #Note: the first column shown is the row index, displayed for convenience. It is not one of the dataframe's data columns.

sbst = df[(df['avg_temperature']>30) & (df['max_humidex']>40)]

sbst["date"]
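To see what the conditions above do under the hood, here is a small sketch on a made-up dataframe (`sample` and its values are assumptions for illustration). Each comparison produces a Series of True/False values, and `&` combines them row by row.

```python
import pandas as pd

# Hypothetical four-day sample with the same column names as the weather data
sample = pd.DataFrame({
    "date": pd.to_datetime(["2021-07-01", "2021-07-02", "2021-07-03", "2021-07-04"]),
    "avg_temperature": [31.0, 28.5, 32.2, 30.5],
    "max_humidex": [42.0, 35.0, 39.0, 41.0],
})

hot = sample["avg_temperature"] > 30   # one True/False value per row
humid = sample["max_humidex"] > 40     # one True/False value per row

# Keep only the rows where both conditions are True
subset = sample[hot & humid]
print(subset)
```

The parentheses around each condition in `df[(...) & (...)]` are required because `&` binds more tightly than `>` in Python. The subset keeps the original row labels, which is why the row numbers in the output are not consecutive.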

### Exercises

### Write a function to RETURN a suggestion for clothing. (1 point)

If temperature is less than -10, suggest 'Wear everything' <br>

If temperature is between -10 and 0, suggest 'heavy jacket' <br>

If temperature is between 0 and 10, suggest 'light jacket' <br>

If temperature is between 10 and 20, suggest 'very light jacket' <br>

Above 20, suggest 'go as you like'

def need_a(dt1): #dt1 is a date

    #Get the avg_temperature for dt1. You can check for equality (==) on the date column

    df_sub = df[df["date"] == dt1]

    #Extract the single matching value with .iloc[0] and typecast it to float. Otherwise it remains in a pd.Series, which makes comparisons in if statements hard

    avg_temp = float(df_sub["avg_temperature"].iloc[0])

    print("Temperature today: ", avg_temp)

    suggestion = "None" #This will hold the suggestion

    #Write if statements

    #Assign your suggestion in the variable named suggestion

    if avg_temp < -10:

        suggestion = "Wear everything"

    elif -10 <= avg_temp < 0:

        suggestion = "heavy jacket"

    elif 0 <= avg_temp < 10:

        suggestion = "light jacket"

    elif 10 <= avg_temp < 20:

        suggestion = "very light jacket"

    else: # temperature is equal to or higher than 20

        suggestion = "go as you like"

    return suggestion

#Call function and print the output

op = need_a('2021-05-22')

print(op)

### Write a function to RETURN rain probability for the given date. (1 point)

This time there is no template provided.

If max_relative_humidity > 80 and avg_hourly_cloud_cover_8 > 5.0, there is 'high rain probability' <br>

If max_relative_humidity > 60 and avg_hourly_cloud_cover_8 > 6.0, there is 'medium rain probability' <br>

Otherwise, 'uncertain' <br>

Note: You may need to handle NaN (short for "not a number", used for missing entries)
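The note about NaN matters because a NaN value behaves unusually in comparisons. The following minimal sketch (the variable `value` is just an illustration) shows why a plain `if` check can silently skip a missing entry and how to test for it explicitly.

```python
import math

import numpy as np
import pandas as pd

value = float("nan")  # stands in for a missing entry read from the CSV

# Any ordering comparison with NaN is False, so an "if value > 80" branch never runs
comparison_result = value > 80

# NaN is not even equal to itself, so "== float('nan')" cannot detect it
equals_itself = value == value

# Explicit checks that do detect a missing value
detected_by_math = math.isnan(value)
detected_by_pandas = pd.isna(np.nan)

print(comparison_result, equals_itself, detected_by_math, detected_by_pandas)
```

Either `math.isnan()` (for a single float) or `pd.isna()` (for scalars, Series, and dataframes) can be used before the `if` statements to decide what to return for a missing reading.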

#Write your code here

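def rain_probability(dt1):

    df_sub = df[df["date"] == dt1]

    # Get the humidity and cloud cover readings for the date

    humidity = float(df_sub["max_relative_humidity"].iloc[0])

    cloud_cover = float(df_sub["avg_hourly_cloud_cover_8"].iloc[0])

    # If either value is NaN, there is no basis for a prediction

    if math.isnan(humidity) or math.isnan(cloud_cover):

        return "uncertain"

    if humidity > 80 and cloud_cover > 5.0:

        return "high rain probability"

    if humidity > 60 and cloud_cover > 6.0:

        return "medium rain probability"

    return "uncertain"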

rain_probability('2021-05-22')

### Write a function that takes two dates and returns mean rainfall between those two dates. (2 points)

The rainfall information is in the column 'rain' <br>

You may need to handle NaN, so use the np.nanmean() function from numpy. This function accepts a series (such as a list or a column) and returns the mean, ignoring NaN entries.
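As a quick illustration of why `np.nanmean()` is suggested, the sketch below compares it with a plain mean on a short made-up rainfall series (the values are assumptions, not taken from the dataset).

```python
import numpy as np
import pandas as pd

# Hypothetical rainfall values with one missing entry
rain = pd.Series([0.0, np.nan, 3.0, 1.5])

# A plain NumPy mean is poisoned by the missing value
plain_mean = np.mean(rain.to_numpy())

# np.nanmean ignores NaN entries: (0.0 + 3.0 + 1.5) / 3
nan_mean = np.nanmean(rain)

# The pandas Series.mean() method also skips NaN by default
series_mean = rain.mean()

print(plain_mean, nan_mean, series_mean)
```

Dropping the missing rows first with `.notna()` and then calling `.mean()` gives the same result as `np.nanmean()`, so either approach works for this exercise.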

#Write your code here

def mean_rain(dt1, dt2):

    #Subset to obtain all data between 2 dates. Make sure dt1 is before dt2. You can combine 2 conditions using &.

    df_sub = df[(df["date"] > dt1) & (df["date"] < dt2)]

    # Select only rows with not NaN values in the 'rain' column

    df_sub = df_sub[df_sub["rain"].notna()]

    #Extract just the rain column values

    rain = df_sub["rain"]

    #Calculate mean

    rain_mean = rain.mean()

    #return value

    return rain_mean

#Call the function

mean_rain('2020-05-22', '2021-07-22')

### Use the function you wrote in the previous cell to get the 5-day running average rainfall between the `start_date` and `end_date`. (2 points)

#### For the output, print the start and end date of each 5-day period along with the 5-day mean rainfall. The output should have the following format:

2021-02-20 2021-02-25 <br/>

0.15 <br/>

2021-02-25 2021-03-02 <br/>

1.8 <br/>

...

You will need `date` and `timedelta` from the `datetime` module. <br/>

Convert the dates to `date` objects so that you can add 5 days using `timedelta`. <br/>

You may need to convert the date back to a `string` representation when calling the `mean_rain` function from before. You can do this by using the `str()` function. Note: `str()` is a useful Python function that provides a string representation of most objects. <br/>

Some time periods may output `nan` (Not a Number) and may produce `RuntimeWarning: Mean of empty slice`, because data for some time periods is missing.

#Your code here

#Between start and end date create an average for each 5 days

print("2021-02-20 2021-02-25")

print(mean_rain('2021-02-20', '2021-02-25'))

print("2021-02-25 2021-03-02")

print(mean_rain('2021-02-25', '2021-03-02'))
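The two periods above are written out by hand. A loop over 5-day steps could look like the sketch below. To keep it runnable on its own, it uses a made-up 12-day dataframe and redefines `mean_rain` with the same date filter as before; the `start_date` and `end_date` values and all rainfall numbers are assumptions for illustration only.

```python
from datetime import date, timedelta

import numpy as np
import pandas as pd

# Hypothetical stand-in for the weather data (made-up rainfall values)
df = pd.DataFrame({
    "date": pd.date_range("2021-02-20", periods=12, freq="D"),
    "rain": [0.5, 1.0, np.nan, 0.0, 2.0, 0.5, 4.0, np.nan, 0.0, 2.0, 0.0, 0.0],
})

def mean_rain(dt1, dt2):
    # Same strict date filter as the notebook's mean_rain; Series.mean() skips NaN
    window = df[(df["date"] > dt1) & (df["date"] < dt2)]
    return window["rain"].mean()

start_date = date.fromisoformat("2021-02-20")  # assumed start
end_date = date.fromisoformat("2021-03-02")    # assumed end

results = []
period_start = start_date
# Step forward 5 days at a time while a full period still fits before end_date
while period_start + timedelta(days=5) <= end_date:
    period_end = period_start + timedelta(days=5)
    # str() turns the date objects back into 'YYYY-MM-DD' strings
    avg = mean_rain(str(period_start), str(period_end))
    print(period_start, period_end)
    print(avg)
    results.append((str(period_start), str(period_end), avg))
    period_start = period_end
```

With the real data, only the `start_date` and `end_date` lines and the loop are needed, since `df` and `mean_rain` already exist in the notebook.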

### Write a function named get_mean_temp_month which accepts the month number (e.g. June is 6) and year and returns the average temperature for that month and year. (2 points)

df['date'].dt.month gives you just the month numbers and df['date'].dt.year gives you just the years.
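A short sketch of the `.dt` accessor on a made-up Series of dates (the `dates` variable and its values are assumptions for illustration) shows what these two expressions return and how they can be combined into a filter.

```python
import pandas as pd

# Hypothetical dates; in the notebook this would be df['date']
dates = pd.Series(pd.to_datetime(["2020-06-15", "2020-07-01", "2021-06-30"]))

months = dates.dt.month.tolist()  # month number of each entry
years = dates.dt.year.tolist()    # year of each entry

# Combine both parts to keep only the entries from June 2020
june_2020 = dates[(dates.dt.month == 6) & (dates.dt.year == 2020)]

print(months, years)
print(june_2020)
```

The `.dt` accessor is only available when the column holds datetime values, which is why the CSV was read with `parse_dates=['date']`.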

#Write your code here

#Define function

def get_mean_temp_month(month, year):

    # Select subset

    df_sub = df[(df["date"].dt.month == month) & (df["date"].dt.year == year)]

    # Select rows with not NaN values

    df_sub = df_sub[df_sub["avg_temperature"].notna()]

    # Return average

    return df_sub["avg_temperature"].mean()

#Call function to test

get_mean_temp_month(6, 2020)

### Now use the function to get the mean temperature for April across all the years and plot the trend. (2 points)

The code for plotting using the matplotlib library is provided.

month_means = {}

years = df['date'].dt.year.unique() #This gives the years present in the dataset

#Iterate (for loop) through the years and get the mean temperature for April (month 4) using the function you wrote in the previous cell

#Store the results in the dictionary month_means with the year as the key and the mean temperature as value

# Iterate through years

for yr in years:

    # April is month 4

    this_mean = get_mean_temp_month(4, yr)

    month_means[yr] = this_mean

#Now visualize the changing trends across the years for April

import matplotlib.pyplot as plt

%matplotlib inline

year = list(month_means.keys())

mean_temp = list(month_means.values())

plt.plot(year, mean_temp)