## Instructions

**Objective**

Write a program to implement data visualization in Python.

## Requirements and Specifications

**Source Code**

```
# MANOVA example dataset
https://www.statsmodels.org/dev/generated/statsmodels.multivariate.manova.MANOVA.html
Suppose we have a dataset of various plant varieties (plant_var) and their associated phenotypic measurements for plant heights (height) and canopy volume (canopy_vol). We want to see if plant heights and canopy volume are associated with different plant varieties using MANOVA.
### Load dataset
import pandas as pd
df=pd.read_csv("https://reneshbedre.github.io/assets/posts/ancova/manova_data.csv")
df.head(5)
### Summary statistics and visualization of dataset
Get summary statistics based on each dependent variable
[df.groupby("plant_var")["height"].mean(),df.groupby("plant_var")["height"].count(),df.groupby("plant_var")["height"].std()]
[df.groupby("plant_var")["canopy_vol"].mean(),df.groupby("plant_var")["canopy_vol"].count(),df.groupby("plant_var")["canopy_vol"].std()]
### Visualize dataset
import seaborn as sns
import matplotlib.pyplot as plt
fig, axs = plt.subplots(ncols=2)
sns.boxplot(data=df, x="plant_var", y="height", hue=df.plant_var.tolist(), ax=axs[0])
sns.boxplot(data=df, x="plant_var", y="canopy_vol", hue=df.plant_var.tolist(), ax=axs[1])
plt.show()
### Perform one-way MANOVA
from statsmodels.multivariate.manova import MANOVA
fit = MANOVA.from_formula('height + canopy_vol ~ plant_var', data=df)
print(fit.mv_test())
### Make a Conclusion
The Pillai’s Trace test statistic is statistically significant [Pillai’s Trace = 1.03, F(6, 72) = 12.90, p < 0.001], indicating that plant variety has a statistically significant association with plant height and canopy volume considered jointly.
## Your Task 1
Suppose we have gathered the following data on female athletes in three sports. The
measurements we have made are the athletes' heights and vertical jumps, both in inches. The
data are listed as (height, jump) as follows:
Basketball Players: (66, 27), (65, 29), (68, 26), (64, 29), (67, 29)
Track Athletes: (63, 23), (61, 26), (62, 23), (60, 26)
Softball Players: (62, 23), (65, 21), (63, 21), (62, 23), (63.5, 22), (66, 21.5)
Use Python's statsmodels.multivariate.manova module to conduct the MANOVA F-test using Wilks' Lambda to test for a difference in
(height, jump) mean vectors across the three sports. Make sure you include clear command
lines and relevant output/results with hypotheses, test result(s), and
conclusion(s)/interpretation(s).
# YOUR CODE here
# Define your dataframe
# Check data
# Define a list with the data
data_lst = [
['Basketball Players', 66,27],
['Basketball Players', 65,29],
['Basketball Players', 68,26],
['Basketball Players', 64,29],
['Basketball Players', 67,29],
['Track Athletes', 63,23],
['Track Athletes', 61,26],
['Track Athletes', 62,23],
['Track Athletes', 60,26],
['Softball Players', 62,23],
['Softball Players', 65,21],
['Softball Players', 63,21],
['Softball Players', 62,23],
['Softball Players', 63.5,22],
['Softball Players', 66,21.5]]
# Define column names
columns = ['Type', 'Height', 'Jump']
# Construct the dataframe
data = pd.DataFrame(data = data_lst, columns = columns)
data.head()
# Conduct the MANOVA F-test
fit = MANOVA.from_formula('Height + Jump ~ Type', data=data)
print(fit.mv_test())
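Hypotheses: H0 states that the (Height, Jump) mean vectors are equal across the three sports; H1 states that at least one sport's mean vector differs. For Wilks' lambda on the Type term, the p-value is < 0.05, so we reject the null hypothesis: the mean (Height, Jump) vector differs across sports, meaning that Height and Jump are related to the Type of athlete.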
## Your Task 2 (bonus and optional)
For the above problem, try to calculate the F score in Python without the built-in function, and check the result against the built-in function's output above.
# YOUR CODE HERE
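# Note: this is the F1 score (harmonic mean of precision and recall) used in
# classification; it is not the MANOVA F statistic that the task asks for.
def F_score(prec, recall):
    return 2*(prec*recall)/(prec+recall)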
```
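Task 2 asks for the MANOVA F score computed without the built-in routine. The following is a minimal sketch of one way to do that, not the original author's solution. It assumes exactly two response variables (height and jump), so every matrix is 2x2 and the determinant can be written by hand, and it uses the exact F transformation of Wilks' lambda that holds for two responses. The helper names (`mean_vector`, `sscp`, `det2`, `wilks_f_two_responses`) are hypothetical, and the grouping of the data follows the task statement (5 basketball, 4 track, 6 softball).

```python
from math import sqrt

# (height, jump) pairs grouped by sport, as listed in the task statement.
groups = {
    "Basketball Players": [(66, 27), (65, 29), (68, 26), (64, 29), (67, 29)],
    "Track Athletes": [(63, 23), (61, 26), (62, 23), (60, 26)],
    "Softball Players": [(62, 23), (65, 21), (63, 21), (62, 23),
                         (63.5, 22), (66, 21.5)],
}


def mean_vector(rows):
    """Component-wise mean of a list of (x, y) pairs."""
    n = len(rows)
    return (sum(x for x, _ in rows) / n, sum(y for _, y in rows) / n)


def sscp(rows, center):
    """Sums of squares and cross-products about `center`.

    The symmetric 2x2 matrix is returned as (sxx, sxy, syy).
    """
    cx, cy = center
    sxx = sum((x - cx) ** 2 for x, _ in rows)
    sxy = sum((x - cx) * (y - cy) for x, y in rows)
    syy = sum((y - cy) ** 2 for _, y in rows)
    return sxx, sxy, syy


def det2(m):
    """Determinant of a symmetric 2x2 matrix stored as (sxx, sxy, syy)."""
    sxx, sxy, syy = m
    return sxx * syy - sxy ** 2


def wilks_f_two_responses(groups):
    """Wilks' lambda and its exact F statistic for two response variables."""
    all_rows = [row for rows in groups.values() for row in rows]
    n_total = len(all_rows)
    n_groups = len(groups)

    # E: within-group (error) SSCP, summed over the groups.
    e = [0.0, 0.0, 0.0]
    for rows in groups.values():
        for i, value in enumerate(sscp(rows, mean_vector(rows))):
            e[i] += value

    # T: total SSCP about the grand mean, so T = E + H.
    t = sscp(all_rows, mean_vector(all_rows))

    wilks = det2(e) / det2(t)

    df_h = n_groups - 1        # hypothesis degrees of freedom
    df_e = n_total - n_groups  # error degrees of freedom

    # Exact transformation for two responses:
    # F = ((1 - sqrt(L)) / sqrt(L)) * (df_e - 1) / df_h,
    # with 2 * df_h and 2 * (df_e - 1) degrees of freedom.
    root = sqrt(wilks)
    f_value = (1 - root) / root * (df_e - 1) / df_h
    return wilks, f_value, 2 * df_h, 2 * (df_e - 1)


wilks, f_value, df_num, df_den = wilks_f_two_responses(groups)
print(f"Wilks' lambda = {wilks:.4f}, F({df_num}, {df_den}) = {f_value:.2f}")
```

The printed values can be compared with the Wilks' lambda row of the `Type` term in the `fit.mv_test()` output (Value, Num DF, Den DF, F Value). They should agree up to rounding, provided the dataframe assigns each athlete to the same sport as the dictionary above.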