
Create a Heart Analysis Classifier in Python: Assignment Solution


Instructions

Objective

Write a Python program that builds a heart analysis classifier. The assignment exercises both coding skills and an understanding of data analysis and machine learning concepts: you import the relevant libraries, load and explore the heart-related data, build a machine learning pipeline, and evaluate its performance with cross-validation. The project offers hands-on experience with a real-world dataset.

Requirements and Specifications

[Two images: program to create a heart analysis classifier in Python]

Source Code

## Worksheet 8

### Constructing, Evaluating, and Visualizing Pipelines

### Due on 6/12/21 @ 11:55 pm EST (see Assignment Folder in Sakai)

## Authorized help and collaboration rules

You **may not** collaborate with friends or teammates. You **may** use your notes and class-provided resources (e.g., web links, notebooks, videos, slides) to help you solve the problems below. For effective learning, you should try to complete the worksheet on your own before looking for help.

If you have any questions regarding what is, or is not, authorized, you must ask. Saying after the fact that you didn't understand, or were not sure, is not a valid excuse.

## Python modules

In the coding cell below, include all the Python modules needed to run your project. Some commonly used modules have already been included for you (System, Numpy, Matplotlib, and Pandas).

    import sys
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    from sklearn.pipeline import Pipeline
    from sklearn.linear_model import LogisticRegression
    from sklearn import preprocessing
    from sklearn.model_selection import cross_validate
    from sklearn.model_selection import StratifiedKFold
    from sklearn.preprocessing import MinMaxScaler

    plt.rcParams.update({"font.size": 16, "legend.loc": "upper right"})

## Python Version Check

Verify you're running Python version `3.7.0` or later.

    print(sys.version)

****

### Kaggle: Heart Attack Analysis & Prediction Dataset

The Heart Attack dataset includes the attributes listed below. In total, there are 77 samples (i.e., patients) that make up this dataset.
- Age : Age of the patient
- Sex : Sex of the patient
- cp : Chest pain type
    - Value 1: typical angina
    - Value 2: atypical angina
    - Value 3: non-anginal pain
    - Value 4: asymptomatic
- trtbps : resting blood pressure (in mm Hg)
- chol : cholesterol in mg/dl fetched via BMI sensor
- fbs : (fasting blood sugar > 120 mg/dl)
    - Value 1: true
    - Value 0: false
- rest_ecg : resting electrocardiographic results
    - Value 0: normal
    - Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
    - Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
- thalach : maximum heart rate achieved
- pcp : peak cardiac power
- output : target values
    - Value 0: healthy
    - Value 1: heart condition

    heart_df = pd.read_csv("heart.csv")
    heart_df

****

## Question 1: Visualizing Data Relationships (5 Points)

In the coding cell below, create a combined scatter plot that visualizes the relationship between **healthy** and **heart condition** patients using the following data attributes:

- ``Resting Blood Pressure`` and ``Cholesterol``,
- ``Resting Blood Pressure`` and ``Age``,
- ``Resting Blood Pressure`` and ``Maximum Heart Rate Achieved``, and
- ``Resting Blood Pressure`` and ``Peak Cardiac Power``.

To receive full credit, your plotting solution must use the data provided in the **heart_df** pandas DataFrame and the Matplotlib Python library, and it must be **visually identical** to the `plot shown on the right`.
You may assume:

- The colors used to generate plots are red (heart condition subjects) and blue (healthy subjects)
- The figsize=(15,15)
- The minimum xtick value is 80, the maximum ytick value is 240, and ticks increment by 20

    # Select the rows for healthy patients
    healthy = heart_df[heart_df['output'] == 0]

    # Select patients with a heart condition
    non_healthy = heart_df[heart_df['output'] == 1]

    # 2x2 grid of scatter plots; figsize follows the stated assumption
    fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(15, 15))

    # Resting Blood Pressure and Cholesterol
    axes[0,0].scatter(healthy['trtbps'], healthy['chol'], label='healthy', color='blue')
    axes[0,0].scatter(non_healthy['trtbps'], non_healthy['chol'], label='heart condition', color='red')
    axes[0,0].legend()
    axes[0,0].set_ylabel('Cholesterol')
    axes[0,0].set_xlabel('Resting Blood Pressure')

    # Resting Blood Pressure and Age
    axes[0,1].scatter(healthy['trtbps'], healthy['age'], label='healthy', color='blue')
    axes[0,1].scatter(non_healthy['trtbps'], non_healthy['age'], label='heart condition', color='red')
    axes[0,1].legend()
    axes[0,1].set_ylabel('Age')
    axes[0,1].set_xlabel('Resting Blood Pressure')

    # Resting Blood Pressure and Maximum Heart Rate Achieved
    axes[1,0].scatter(healthy['trtbps'], healthy['thalach'], label='healthy', color='blue')
    axes[1,0].scatter(non_healthy['trtbps'], non_healthy['thalach'], label='heart condition', color='red')
    axes[1,0].legend()
    axes[1,0].set_ylabel('Maximum Heart Rate Achieved')
    axes[1,0].set_xlabel('Resting Blood Pressure')

    # Resting Blood Pressure and Peak Cardiac Power
    axes[1,1].scatter(healthy['trtbps'], healthy['pcp'], label='healthy', color='blue')
    axes[1,1].scatter(non_healthy['trtbps'], non_healthy['pcp'], label='heart condition', color='red')
    axes[1,1].legend()
    axes[1,1].set_ylabel('Peak Cardiac Power')
    axes[1,1].set_xlabel('Resting Blood Pressure')

*****

## Question 2: Construct a Sklearn Pipeline (5 Points)

In the coding cell below, create a 2-stage Sklearn pipeline that has two models in this order:

1. MinMaxScaler Sklearn model. Please name this model ``min_max_scaler`` and set its **feature_range** attribute to (-1,1).
2. LogisticRegression Sklearn model. Please name this model ``logit_classifier``.

This is a very simple question, don't overthink it :)

    # Scaler: maps each feature to the range [-1, 1]
    min_max_scaler = MinMaxScaler(feature_range=(-1, 1))

    # Classifier
    logit_classifier = LogisticRegression()

    # Create the pipeline: scaling first, then classification
    pipe = Pipeline([('scaler', min_max_scaler), ('logistic', logit_classifier)])

*****

## Question 3: Pipeline Performance Evaluation (5 points)

In the coding cell below, evaluate the classification performance of the pipeline using a 10-fold cross-validation approach. The data input (i.e., X matrix) into the pipeline should only include ``Resting Blood Pressure`` and ``Maximum Heart Rate Achieved`` values, and the classification labels (i.e., y vector) should only include the output values.

To receive full credit, your plotting solution must use the data provided in the **heart_df** pandas DataFrame and the Matplotlib Python library, and it must be **visually identical** to the `plot shown on the right`.

Please ensure you:

- Use Sklearn cross_validate
- Set the cross_validate **scoring** attribute to ``accuracy``
- Set the cross_validate **cv** attribute to StratifiedKFold
- Set the StratifiedKFold **n_splits** attribute to 10
- Report Q1, Q2, and Q3 to two decimal places

Hint:

- To compute the first quartile (Q1), second quartile (Q2, or median), and third quartile (Q3) values, use the quantile function in Numpy.
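As a small illustration of the quartile hint, the sketch below applies `np.quantile` to ten made-up fold accuracies. The `scores` values are invented for this example and are not results from the worksheet.

```python
import numpy as np

# Hypothetical accuracies for 10 folds (illustrative values only).
scores = np.array([0.60, 0.64, 0.68, 0.72, 0.76, 0.80, 0.84, 0.88, 0.92, 0.96])

# A single call returns Q1, the median (Q2), and Q3.
quartiles = np.quantile(scores, [0.25, 0.5, 0.75])

# Round to two decimal places, as the question requires.
q1, q2, q3 = np.round(quartiles, 2)
print("Q1 = {:.2f}, Q2 = {:.2f}, Q3 = {:.2f}".format(q1, q2, q3))
```

Passing all three probabilities in one list keeps the quartiles consistent with each other, since they come from the same sorted copy of the data.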
    # Define X: Resting Blood Pressure and Maximum Heart Rate Achieved
    X = heart_df[['trtbps', 'thalach']]
    y = heart_df['output']

    # 10-fold stratified cross-validation, scored by accuracy
    cv_results = cross_validate(pipe, X, y, scoring='accuracy', cv=StratifiedKFold(n_splits=10))

    # Quartiles of the per-fold accuracies, rounded to two decimal places
    results = np.quantile(cv_results['test_score'], [0.25, 0.5, 0.75])
    results = np.round(results, 2)
    Q1 = results[0]
    Q2 = results[1]
    Q3 = results[2]

    plt.figure()
    # Box plot of the per-fold accuracies (not of the three quartile values)
    plt.boxplot(cv_results['test_score'], vert=False)
    plt.xlabel('Classification Accuracy')
    plt.title('Resting Blood Pressure vs. Maximum Heart Rate Achieved')
    plt.text(0.5, 0.5, "Q1 = {:.2f}, Q2 = {:.2f}, Q3 = {:.2f}".format(Q1, Q2, Q3))

*****

## Question 4: Pipeline Performance Evaluation (5 points)

In the coding cell below, evaluate the classification performance of the pipeline using a 10-fold cross-validation approach. The data input (i.e., X matrix) into the pipeline should only include ``Resting Blood Pressure`` and ``Peak Cardiac Power`` values, and the classification labels (i.e., y vector) should only include the output values.

To receive full credit, your plotting solution must use the data provided in the **heart_df** pandas DataFrame and the Matplotlib Python library, and it must be **visually identical** to the `plot shown on the right`.

Please ensure you:

- Use Sklearn cross_validate
- Set the cross_validate **scoring** attribute to ``accuracy``
- Set the cross_validate **cv** attribute to StratifiedKFold
- Set the StratifiedKFold **n_splits** attribute to 10
- Report Q1, Q2, and Q3 to two decimal places

Hint:

- To compute the first quartile (Q1), second quartile (Q2, or median), and third quartile (Q3) values, use the quantile function in Numpy.
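To see what `StratifiedKFold` with `n_splits` set to 10 does, the following sketch splits a synthetic label vector and counts the classes in each test fold. The 30 healthy and 20 heart-condition labels are assumptions made for illustration, not the worksheet's data.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic labels (illustrative only): 30 healthy (0), 20 heart condition (1).
y = np.array([0] * 30 + [1] * 20)
# Placeholder features; only the labels influence a stratified split.
X = np.zeros((len(y), 2))

skf = StratifiedKFold(n_splits=10)

# Count how many samples of each class land in every test fold.
fold_counts = [
    np.bincount(y[test_idx], minlength=2)
    for _, test_idx in skf.split(X, y)
]

for i, counts in enumerate(fold_counts, start=1):
    print("fold {}: healthy={}, heart condition={}".format(i, counts[0], counts[1]))
```

Each test fold keeps the 3:2 class ratio of the full label vector, which is what distinguishes stratified splitting from plain k-fold splitting.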
    # Define X: Resting Blood Pressure and Peak Cardiac Power
    X = heart_df[['trtbps', 'pcp']]
    y = heart_df['output']

    # 10-fold stratified cross-validation, scored by accuracy
    cv_results = cross_validate(pipe, X, y, scoring='accuracy', cv=StratifiedKFold(n_splits=10))

    # Quartiles of the per-fold accuracies, rounded to two decimal places
    results = np.quantile(cv_results['test_score'], [0.25, 0.5, 0.75])
    results = np.round(results, 2)
    Q1 = results[0]
    Q2 = results[1]
    Q3 = results[2]

    plt.figure()
    # Box plot of the per-fold accuracies (not of the three quartile values)
    plt.boxplot(cv_results['test_score'], vert=False)
    plt.xlabel('Classification Accuracy')
    plt.title('Resting Blood Pressure vs. Peak Cardiac Power')
    plt.text(0.5, 0.5, "Q1 = {:.2f}, Q2 = {:.2f}, Q3 = {:.2f}".format(Q1, Q2, Q3))
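Because `heart.csv` is not reproduced here, the scaling, classification, and cross-validation workflow can be tried end to end on synthetic data. The sketch below is a minimal stand-in under stated assumptions: the feature values, class shift, and sample size are all invented, so the resulting accuracies say nothing about the real dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for two numeric features and a binary label
# (assumed values; this is not the real heart.csv).
n = 100
y = np.repeat([0, 1], n // 2)
X = rng.normal(loc=[120.0, 150.0], scale=[15.0, 20.0], size=(n, 2))
X[y == 1] += [10.0, -15.0]  # shift one class so there is something to learn

# Same two-stage structure as the worksheet: scale to [-1, 1], then classify.
pipe = Pipeline([
    ("scaler", MinMaxScaler(feature_range=(-1, 1))),
    ("logistic", LogisticRegression()),
])

cv_results = cross_validate(
    pipe, X, y, scoring="accuracy", cv=StratifiedKFold(n_splits=10)
)
scores = cv_results["test_score"]  # one accuracy value per fold

q1, q2, q3 = np.round(np.quantile(scores, [0.25, 0.5, 0.75]), 2)
print("Q1 = {:.2f}, Q2 = {:.2f}, Q3 = {:.2f}".format(q1, q2, q3))
```

Keeping the scaler inside the pipeline means it is refit on the training portion of every fold, so the held-out fold never influences the scaling range.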