
Python Program to Visualize and Preprocess Data Assignment Solution.


Instructions

Objective
Write a Python homework program to visualize and preprocess data.

Requirements and Specifications


Source Code

# **APCO 1P93: Applied Programming (for Data Science)**

### Winter, 2022

### Instructor: Yifeng (Ethan) Li

### Department of Computer Science, Brock University

### Email:

### TA: Tristan Navikevicius:

---

## **Assignment as Final Exam**

## **Due Date: 10:00pm, Tuesday, April 26th, 2022**

### **Plagiarism = Severe Consequence**

### **Place your work in a zipped folder named _Firstname_Lastname_StudentNumber_ for your submission.**

## **Question 1** (30 points)

In this question, your job is to define a class named `Hydro` to process and visualize a small data set from Alectra Utilities. For your convenience, the structure of this class is given below. You will need to do the following tasks.

* Define a method named `read_data` within the `Hydro` class to load the data from the given text file named `hydro_28-Mar-2022.csv` using the `loadtxt` function in `numpy`. The first row of the data should be assigned to `self.header` and the rest should be assigned to `self.data`. **(3 marks)**

* Define a method named `sort_data` within the `Hydro` class to sort the rows in `self.data` according to the first column (Reading Date) in increasing order. You need to use the function `numpy.argsort`. Note that the sorted data should still be stored in `self.data`. **(3 marks)**

* Define a method named `add_temperature` within the `Hydro` class to load the climate data from the given text file named `climate.txt` and add a new column to `self.data` holding the Daily Mean Temperature for the month of each reading. For example, for `'2021-04-16'` (an element in the first column of `self.data`), `'7.4'` (the Daily Mean Temperature for `'Apr'` in `climate.txt`) should be the corresponding value in the new column. Note that you should also add a new string element `'Daily Mean Temperature'` to `self.header`. **(5 marks)**

* Define a method named `save_data` within the `Hydro` class to concatenate `self.header` and `self.data` and save the result to a csv file. Use the `numpy.savetxt` function. **(2 marks)**

* Define a method named `draw_plots` within the `Hydro` class to draw two subplots. **(12 marks)**

* The first subplot is a mixture of bars (for `Average KWH/Day` in `self.data`) and a line/curve (for `Daily Mean Temperature` in `self.data`) with a shared/twin x-axis but different y-axes. This subplot has a left y-axis and a right y-axis. The left axis is for `Average KWH/Day` and the right axis is for `Daily Mean Temperature`.

* The second subplot is a line plot to visualize the `Current Reading` column in `self.data`.

* The instructor's plot, as a pdf file, is provided with this assignment. You will have to reproduce it as precisely as possible. You may find the `set_rotation`, `set_fontsize`, and `set_color` methods of a text object particularly helpful.

* The figure should be saved to a pdf file.

* After defining the class and methods described above, create an instance of the `Hydro` class. Call the following methods of this instance in sequential order: `read_data`, `sort_data`, `add_temperature`, `draw_plots`, `save_data`. **(5 marks)**
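The `loadtxt`, `argsort`, and `savetxt` steps listed above can be tried on a small in-memory table before touching the real files. The following is only a sketch under stated assumptions: the three-column sample and its values are made up for illustration, and `io.StringIO` stands in for the files on disk.

```python
import io

import numpy as np

# Hypothetical three-column sample; the real hydro file has more columns.
sample = io.StringIO(
    "Reading Date,Current Reading,Average KWH/Day\n"
    "2021-05-14,1200,21.5\n"
    "2021-03-15,1100,30.2\n"
    "2021-04-16,1150,25.0\n"
)

# Read every field as a string so dates and numbers can share one array.
table = np.loadtxt(sample, dtype=str, delimiter=",")
header, data = table[0], table[1:]

# argsort returns the row order that sorts the first column;
# dates written as YYYY-MM-DD sort chronologically as plain strings.
order = np.argsort(data[:, 0])
data = data[order]

# Stack the header on top of the rows and write comma-separated text.
out = io.StringIO()
np.savetxt(out, np.vstack([header, data]), fmt="%s", delimiter=",")
print(out.getvalue())
```

Indexing the array with the result of `argsort` reorders whole rows at once, which keeps every column aligned with its reading date.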

import numpy as np
import matplotlib.pyplot as plt


# define your Hydro class here
class Hydro:
    def __init__(self):
        self.header = None  # will be 1d array of str type, length 8 or 9 (after adding a new field for temperature)
        self.data = None  # will be 2d array of str type, shape (23,8) or (23,9)

    def read_data(self, filename='./hydro_28-Mar-2022.csv'):
        """
        Read the provided hydro data as string data type.
        Assign the header info (first row of the text file) to self.header.
        Assign the rows after the first row of the text file to self.data.

        INPUTS:
        filename: string, file name for the given data set.
        """
        # (3 marks)
        # Load every field as a string; comments=None keeps any '#' characters as data
        table = np.loadtxt(filename, dtype=str, delimiter=',', comments=None)
        # The first row contains the header
        self.header = table[0]
        # The rest contains the data
        self.data = table[1:]
        print('Data from {0} has been successfully loaded.'.format(filename))
        print('The data has header:\n{0}'.format(self.header))
        print('There are {0} rows (excluding the header) in the data set.'.format(self.data.shape[0]))

    def sort_data(self):
        """
        Sort the rows of self.data based on the first field "Reading Date" in increasing order.
        https://numpy.org/doc/stable/reference/generated/numpy.argsort.html
        """
        # (3 marks)
        # argsort gives the row order that sorts the date column;
        # dates such as '2021-04-16' sort chronologically as strings
        order = np.argsort(self.data[:, 0], kind='stable')
        self.data = self.data[order]
        print('The data has been sorted in increasing order.')

    def add_temperature(self, filename='./climate.txt'):
        """
        Load the climate data and add the Daily Mean Temperature for corresponding months as a new column to self.data.
        A new string element 'Daily Mean Temperature' is also added to self.header.
        """
        # Create a list with months. This list maps each month name to its number
        months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
        # (5 marks)
        with open(filename, 'r') as f:
            # Read lines
            lines = f.readlines()
        # Skip header
        lines.pop(0)
        # Skip last line
        lines.pop()
        # Add the new column to the header
        if 'Daily Mean Temperature' not in self.header:
            self.header = np.hstack([self.header, 'Daily Mean Temperature'])
            # Extend the data array by adding a new column of placeholder strings
            column = np.full((self.data.shape[0], 1), '0.0')
            self.data = np.hstack([self.data, column])
        # Now, for each line, split and take the second element (the temperature, kept as a string)
        for line in lines:
            row = line.strip().split('\t')
            month = row[0]
            # Get month number
            month_n = months.index(month) + 1
            # Get temp
            temp = row[1]
            # Now add to data where the recorded date is for the given month
            for i in range(len(self.data)):
                # Take the date and get the month number
                date = self.data[i][0]
                month_number = int(date.split('-')[1])
                # If the month of the current climate row equals the month in the main data,
                # store the temperature in the new column
                if month_number == month_n:
                    self.data[i, -1] = temp

    def save_data(self, filename='./hydro_temp.txt'):
        """
        Save the data to the given filename. The first line of the file contains the header,
        and the remaining lines contain the data, separated by commas.
        """
        # (2 marks)
        # Stack the header on top of the data and write every field as a string
        table = np.vstack([self.header, self.data])
        np.savetxt(filename, table, fmt='%s', delimiter=',')
        print('Data saved to a text file.')

    def draw_plots(self):
        fig, (ax1, ax2) = plt.subplots(2, 1)
        fig.set_size_inches((12, 6))

        # draw bar subplot (4 marks)
        # First, get the average KWH/day as floats
        avg_kwh = self.data[:, 7].astype('float')
        # Get daily mean temp
        daily_temp = self.data[:, -1].astype('float')
        # Get dates
        dates = self.data[:, 0]
        # Now plot
        ax1.bar(dates, avg_kwh, color='green')
        ax1.set_ylabel(self.header[7])
        ax1.tick_params(axis='x', rotation=45)
        # Display values at the top of each bar
        for i, v in enumerate(avg_kwh):
            ax1.text(i - 0.5, v + 0.1, str(v))

        # add a line for temperature to ax1, (3 marks)
        ax11 = ax1.twinx()  # instantiate a second axes that shares the same x-axis
        ax11.plot(range(len(dates)), daily_temp, color='orange', marker='D')
        ax11.set_ylabel(self.header[-1])
        ax1.set_title('Daily Average Electricity Usage')
        ax1.set_xlabel('Date')

        # draw line subplot (4 marks)
        # Get consumption values
        consumption = self.data[:, 6].astype('float')
        # Plot
        ax2.plot(range(len(consumption)), consumption, color='red', marker='o')
        # Fix the tick positions first, then label them with the dates
        ax2.set_xticks(range(len(dates)))
        ax2.set_xticklabels(dates)
        ax2.tick_params(axis='x', rotation=45)
        ax2.set_title('Electricity Consumption')
        ax2.set_ylabel(self.header[6])
        ax2.set_xlabel('Date')

        # save the drawn figure to a pdf file (1 mark)
        fig.tight_layout()
        fig.savefig('fig.pdf')
        plt.show()


# create instance h1 of class Hydro and call its methods
# (5 marks)
h1 = Hydro()
h1.read_data()
h1.sort_data()
h1.add_temperature()
h1.draw_plots()
h1.save_data('output.csv')
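As an aside on `add_temperature`: the month matching can also be done with a dictionary, so each reading needs a single lookup instead of a pass over every climate row. This is a minimal sketch and not the method used above. It assumes `climate.txt` rows have the tab-separated form `Month<TAB>Temperature`; only the `Apr` value (`7.4`) comes from the assignment text, and the other values and the names `climate_lines`, `temp_by_month`, and `reading_dates` are hypothetical.

```python
# Hypothetical climate rows; only the Apr value (7.4) is from the assignment text.
climate_lines = ["Mar\t0.5", "Apr\t7.4", "May\t13.6"]

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Map two-digit month numbers ("04") to temperature strings ("7.4").
temp_by_month = {}
for line in climate_lines:
    name, temp = line.split("\t")[:2]
    temp_by_month[f"{MONTHS.index(name) + 1:02d}"] = temp

# Made-up reading dates in the same YYYY-MM-DD form as the hydro data.
reading_dates = ["2021-04-16", "2021-03-15", "2021-05-14", "2021-04-30"]

# One dictionary lookup per reading, keyed by the month part of the date.
temperatures = [temp_by_month[date.split("-")[1]] for date in reading_dates]
print(temperatures)  # ['7.4', '0.5', '13.6', '7.4']
```

A reading whose month is missing from the climate rows would raise `KeyError` here; `temp_by_month.get(key, '')` is one way to leave such cells empty instead.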

## **Question 2** (Bonus: 6 points)

Q 2.1: Define a function named `min_iterative` using iteration to find the minimal value from a given list of numbers. You must use a `for` or `while` loop in your implementation. You should also use a `try-except` statement to capture and process the `TypeError` caused by non-int and non-float elements in the input list. After defining this function, test it using the lists `[5, -1, 4, -9, 3, 4, 3, 7, -9, 10]` and `[5, -1, 4, -9, 3, 4, 3, 7, 'abc', 10]`, respectively. **(3 bonus marks)**

# answer Q 2.1 here
def min_iterative(lst):
    # Find the min, starting from the first element
    min_val = lst[0]
    for i in range(1, len(lst)):
        try:
            if lst[i] < min_val:
                min_val = lst[i]
        except TypeError:
            # Skip elements that cannot be compared with numbers (e.g. 'abc')
            continue
    return min_val

print(min_iterative([5, -1, 4, -9, 3, 4, 3, 7, -9, 10]))
print(min_iterative([5, -1, 4, -9, 3, 4, 3, 7, 'abc', 10]))
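Because `min_iterative` starts from `lst[0]`, a non-numeric first element would become the running minimum and every later comparison would be skipped. One possible variant, sketched here under the assumption that skipped items should be reported back to the caller, handles that case as well. The name `min_iterative_strict` and its tuple return value are hypothetical and not part of the assignment.

```python
def min_iterative_strict(values):
    """Return (minimum, skipped) for a list that may hold non-numeric items."""
    minimum = None
    skipped = []
    for item in values:
        try:
            # Comparing a str with a number raises TypeError in Python 3.
            if minimum is None or item < minimum:
                # Adding 0 raises TypeError for items such as 'abc' even when
                # nothing has been compared yet (a non-numeric first element).
                minimum = item + 0
        except TypeError:
            skipped.append(item)
    return minimum, skipped


print(min_iterative_strict([5, -1, 4, -9, 3, 4, 3, 7, 'abc', 10]))  # (-9, ['abc'])
print(min_iterative_strict(['abc', 2, 1]))  # (1, ['abc'])
```

Returning `None` for an empty or all-invalid list is a design choice of this sketch; raising `ValueError` would be an equally reasonable alternative.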

Q 2.2: Define a function named `min_recursion` using recursion to find the minimal value of a given list of numbers. You must consider the base case and the recursive case for making progress. A `try-except` statement does not need to be used in this function. After defining this function, test it using the list `[5, -1, 4, -9, 3, 4, 3, 7, -9, 10]`. **(3 bonus marks)**

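# answer Q 2.2 here
def min_recursion(lst):
    print(f"Running min_recursion([{', '.join(map(str, lst))}])...")
    # Base case: a single element is the minimum of its own list
    if len(lst) == 1:
        return lst[0]
    # Recursive case: compare the first element with the minimum of the rest
    return min(lst[0], min_recursion(lst[1:]))

print(min_recursion([5, -1, 4, -9, 3, 4, 3, 7, -9, 10]))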
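`min_recursion` copies the list with `lst[1:]` on every call and recurses once per element, so a list with more elements than Python's recursion limit would fail. A possible alternative, shown only as a sketch, splits an index range in half instead, which keeps the recursion depth close to the base-2 logarithm of the list length. The name `min_recursion_halves` and its `lo`/`hi` parameters are hypothetical.

```python
def min_recursion_halves(values, lo=0, hi=None):
    """Recursive minimum of values[lo:hi], found by splitting the range in half."""
    if hi is None:
        hi = len(values)
    if hi - lo <= 0:
        raise ValueError("cannot take the minimum of an empty range")
    # Base case: a single element is its own minimum.
    if hi - lo == 1:
        return values[lo]
    # Recursive case: solve each half, then combine the two results.
    mid = (lo + hi) // 2
    left = min_recursion_halves(values, lo, mid)
    right = min_recursion_halves(values, mid, hi)
    return left if left <= right else right


print(min_recursion_halves([5, -1, 4, -9, 3, 4, 3, 7, -9, 10]))  # -9
```

Passing index bounds rather than slices means no intermediate lists are created, at the cost of a slightly longer signature.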