Program to Implement Film Filtering Assignment Solution.

Instructions

Objective

Write a program to implement film filtering in python.

Requirements and Specifications

Data Input Ratings file: A text file that contains movie ratings. Each line has the name (with year) of a movie, its rating (range 0-5 inclusive), and the id of the user who rated the movie. A movie can have multiple ratings from different users. A user can rate a particular movie only once. A user can however rate multiple movies. Here's a sample ratings file. Movies file: A text file that contains the genres of movies. Each line has a genre, a movie id, and the name (with year) of the movie. To keep it simple, each movie belongs to a single genre. Here's a sample movies file. Note: A movie name includes the year, since it's possible different movies have the same title, but were made in different years. You may assume that input files will be correctly formatted, and data types will be as expected. So you don't need to write code to catch any formatting or data typing errors. For all computation of rating, do not round up (or otherwise modify) the rating unless otherwise specified. Task 1: Reading Data [10 pts] Write a python assignment function read_ratings_data(f) that takes in a ratings file, and returns a dictionary. The dictionary should have movie as key, and the corresponding list of ratings as value. For example: movie_ratings_dict = { "The Lion King (2019)" : [6.0, 7.5, 5.1], "Titanic (1997)": [7] } [10 pts] Write a function read_movie_genre(f) that takes in a movies file and returns a dictionary. The dictionary should have a one-to-one mapping between movie and genre. For example { "Toy Story (1995)" : "Adventure", "Golden Eye (1995)" : "Action" } Watch out for leading and trailing whitespaces in movie name and genre name, and remove them when encountered. Task 2: Processing Data [8 pts] Genre dictionary Write a function create_genre_dict that takes as a parameter a movie-to-genre dictionary, of the kind created in Task 1.2. The function should return another dictionary in which a genre is mapped to all the movies in that genre. For example: { genre1: [ m1, m2, m3], genre2: [m6, m7] } [8 pts] Average Rating Write a function calculate_average_rating that takes as a parameter a ratings dictionary, of the kind created in Task 1.1. It should return a dictionary where the movie is mapped to its average rating computed from the ratings list. For example: {"Spider-Man (2002)": [3,2,4,5]} ==> {"Spider-Man (2002)": 3.5} Task 3: Recommendation [10 pts] Popularity based In services such as Netflix and Spotify, you often see recommendations with the heading “Popular movies” or “Trending top 10”. Write a function get_popular_movies that takes as parameters a dictionary of movie-to-average rating ( as created in Task 2.2), and an integer n (default should be 10). The function should return a dictionary ( movie:average rating, same structure as input dictionary) of top n movies based on the average ratings. If there are fewer than n movies, it should return all movies in order of top average ratings. [8 pts] Threshold Rating Write a function filter_movies that takes as parameters a dictionary of movie-to-average rating (same as for the popularity based function above), and a threshold rating with default value of 3. The function should filter movies based on the threshold rating, and return a dictionary with same structure as the input. For example, if the threshold rating is 3.5, the returned dictionary should have only those movies from the input whose average rating is equal to or greater than 3.5. [12 pts] Popularity + Genre based In most recommendation systems, genre of the movie/song/book plays an important role. Often features like popularity, genre, artist are combined to present recommendations to a user. Write a function get_popular_in_genre that, given a genre, a genre-to-movies dictionary (as created in Task 2.1), a dictionary of movie:average rating (as created in Task 2.2), and an integer n (default 5), returns the top n most popular movies in that genre based on the average ratings. The return value should be a dictionary of movie-to-average rating of movies that make the cut. If there are fewer than n movies, it should return all movies in order of top average ratings. Genres will be from those in the movie:genre dictionary created in Task 1.2. The genre name will exactly match one of the genres in the dictionary, so you do not need to do any upper or lower case conversion. [8 pts] Genre Rating One important analysis for the content platforms is to determine ratings by genre. Write a function get_genre_rating that takes the same parameters as get_popular_in_genre above, except for n, and returns the average rating of the movies in the given genre. [12 pts] Genre Popularity Write a function genre_popularity that takes as parameters a genre-to-movies dictionary (as created in Task 2.1), a movie-to-average rating dictionary (as created in Task 2.2), and n (default 5), and returns the top-n rated genres as a dictionary of genre:average rating. If there are fewer than n genres, it should return all genres in order of top average ratings. Hint: Use the above get_genre_rating function as a helper. Task 4: User Focused [10 pts] Read the ratings file to return a user-to-movies dictionary that maps user ID to the associated movies and the corresponding ratings. Write a function named read_user_ratings for this, with the ratings file as the parameter. For example: { u1: [ (m1, r1), (m2, r2) ], u2: [ (m3, r3), (m8, r8) ] } where ui is user ID, mi is movie, ri is corresponding rating. You can handle user ID as int or string type, but make sure you consistently use it as the same type everywhere in your code. [12 pts] Write a function get_user_genre that takes as parameters a user id, the user-to-movies dictionary (as created in Task 4.1 above), and the movie-to-genre dictionary (as created in Task 1.2), and returns the top genre that the user likes based on the user's ratings. Here, the top genre for the user will be determined by taking the average rating of the movies genre-wise that the user has rated. If multiple genres have the same highest ratings for the user, return any one of genres (arbitrarily) as the top genre. [12 pts] Recommend 3 most popular (highest average rating) movies from the user's top genre that the user has not yet rated. Write a python assignment function recommend_movies for this, that takes a parameters a user id, the user-to-movies dictionary (as created in Task 4.1 above), the movie-to-genre dictionary (as created in Task 1.2), and the movie-to-average rating dictionary (as created in Task 2.2). The function should return a dictionary of movie-to-average rating. If fewer than 3 movies make the cut, then return all the movies that make the cut in order of top average ratings.

Source Code

# FILL IN ALL THE FUNCTIONS IN THIS TEMPLATE
# MAKE SURE YOU TEST YOUR FUNCTIONS WITH MULTIPLE TEST CASES
# ASIDE FROM THE SAMPLE FILES PROVIDED TO YOU, TEST ON YOUR OWN FILES
# WHEN DONE, SUBMIT THIS FILE TO CANVAS
import math
from collections import defaultdict
from collections import Counter
# YOU MAY NOT CODE ANY OTHER IMPORTS
# ------ TASK 1: READING DATA --------
# 1.1
def read_ratings_data(f):
 # parameter f: movie ratings file name f (e.g. "movieRatingSample.txt")
 # return: dictionary that maps movie to ratings
 # WRITE YOUR CODE BELOW
 # Open file
 lines = None
 with open(f, 'r') as fil:
 lines = fil.readlines()
 # declare dictionary
 ratings = dict()
 if lines != None: # a list of lines was read successfully
 # Loop through lines
 for line in lines:
 # Split
 row = line.split("|")
 name = row[0].strip()
 rating = float(row[1].strip())
 # Add to dictionary
 if not name in ratings:
 ratings[name] = list()
 ratings[name].append(rating)
 return ratings
# 1.2
def read_movie_genre(f):
 # parameter f: movies genre file name f (e.g. "genreMovieSample.txt")
 # return: dictionary that maps movie to genre
 # WRITE YOUR CODE BELOW
 # Open file
 lines = None
 with open(f, 'r') as fil:
 lines = fil.readlines()
 # declare dictionary
 genres = dict()
 if lines != None: # a list of lines was read successfully
 # Loop through lines
 for line in lines:
 # Split
 row = line.split("|")
 genre = row[0].strip()
 name = row[2].strip()
 # Add to dictionary
 genres[name] = genre
 return genres
# ------ TASK 2: PROCESSING DATA --------
# 2.1
def create_genre_dict(d):
 # parameter d: dictionary that maps movie to genre
 # return: dictionary that maps genre to movies
 # WRITE YOUR CODE BELOW
 # Create a dictionary to store the genres
 genres = dict()
 # Now, loop through the dictionary 'd'
 for name, genre in d.items():
 if not genre in genres:
 genres[genre] = list()
 genres[genre].append(name)
 return genres
# 2.2
def calculate_average_rating(d):
 # parameter d: dictionary that maps movie to ratings
 # return: dictionary that maps movie to average rating
 # WRITE YOUR CODE BELOW
 # Create the dictionary that will contains the average rating
 ratings_d = dict()
 for name, ratings in d.items():
 ratings_d[name] = sum(ratings)/len(ratings)
 return ratings_d
# ------ TASK 3: RECOMMENDATION --------
# 3.1
def get_popular_movies(d, n=10):
 # parameter d: dictionary that maps movie to average rating
 # parameter n: integer (for top n), default value 10
 # return: dictionary that maps movie to average rating
 # WRITE YOUR CODE BELOW
 # First of all, sort the dictionary d in descending order
 d_sorted = {k: v for k, v in sorted(d.items(), key = lambda item: item[1], reverse = True)}
 # Now, convert the dictionary into a list
 d_list = list(d_sorted.items())
 # Get the top n items
 top_movies = d_list[:n]
 # Convert again to a dictionary
 top_movies_dict = {item[0]: item[1] for item in top_movies}
 return top_movies_dict
# 3.2
def filter_movies(d, thres_rating=3):
 # parameter d: dictionary that maps movie to average rating
 # parameter thres_rating: threshold rating, default value 3
 # return: dictionary that maps movie to average rating
 # WRITE YOUR CODE BELOW
 # We assume the dict 'd' is already sorted
 filtered_movies = dict()
 for name, rating in d.items():
 if rating >= thres_rating:
 filtered_movies[name] = rating
 return filtered_movies
# 3.3
def get_popular_in_genre(genre, genre_to_movies, movie_to_average_rating, n=5):
 # parameter genre: genre name (e.g. "Comedy")
 # parameter genre_to_movies: dictionary that maps genre to movies
 # parameter movie_to_average_rating: dictionary that maps movie to average rating
 # parameter n: integer (for top n), default value 5
 # return: dictionary that maps movie to average rating
 # WRITE YOUR CODE BELOW
 # First of all, get all movies of the given genre
 movies = genre_to_movies[genre]
 # Now, create a dict of movies to avg rating for the movies of the given genre
 d = {name: rating for name, rating in movie_to_average_rating.items() if name in movies}
 # Now get the top n movies
 top_movies = get_popular_movies(d, n)
 # Return
 return top_movies
# 3.4
def get_genre_rating(genre, genre_to_movies, movie_to_average_rating):
 # parameter genre: genre name (e.g. "Comedy")
 # parameter genre_to_movies: dictionary that maps genre to movies
 # parameter movie_to_average_rating: dictionary that maps movie to average rating
 # return: average rating of movies in genre
 # WRITE YOUR CODE BELOW
 # First of all, get all movies of the given genre
 movies = genre_to_movies[genre]
 # Now, create a dict of movies to avg rating for the movies of the given genre
 d = {name: rating for name, rating in movie_to_average_rating.items() if name in movies}
 # Now, calculate the avg rating between all movies in d
 ratings = list(d.values())
 # Now, calculate the avg rating
 avg_rating = sum(ratings)/len(ratings)
 return avg_rating
# 3.5
def genre_popularity(genre_to_movies, movie_to_average_rating, n=5):
 # parameter genre_to_movies: dictionary that maps genre to movies
 # parameter movie_to_average_rating: dictionary that maps movie to average rating
 # parameter n: integer (for top n), default value 5
 # return: dictionary that maps genre to average rating
 # WRITE YOUR CODE BELOW
 # Create the dictionary that maps the genre to avg rating
 genre_ratings = dict()
 # Loop through all genres
 for genre, movies in genre_to_movies.items():
 # Now calculate the avg rating for the current genre
 rating = get_genre_rating(genre, genre_to_movies, movie_to_average_rating)
 # Add to dict
 genre_ratings[genre] = rating
 # Now, sort the dict
 genre_ratings_sorted = {genre: rating for genre, rating in sorted(genre_ratings.items(), key = lambda item: item[1], reverse = True)}
 # Convert to list
 genre_ratings_lst = list(genre_ratings_sorted.items())
 # Get only the first n
 genre_ratings_lst = genre_ratings_lst[:n]
 # Convert again to dict
 genre_ratings_final = {item[0]: item[1] for item in genre_ratings_lst}
 return genre_ratings_final
# ------ TASK 4: USER FOCUSED --------
# 4.1
def read_user_ratings(f):
 # parameter f: movie ratings file name (e.g. "movieRatingSample.txt")
 # return: dictionary that maps user to movies and ratings
 # WRITE YOUR CODE BELOW
 lines = None
 with open(f, 'r') as fil:
 lines = fil.readlines()
 # Create dict
 data = dict()
 if lines:
 for line in lines:
 row = line.split("|")
 name = row[0]
 rating = float(row[1])
 uid = int(row[2])
 if not uid in data:
 data[uid] = list()
 data[uid].append((name, rating))
 return data
# 4.2
def get_user_genre(user_id, user_to_movies, movie_to_genre):
 # First, get movies and ratings for the given user id
 movies = user_to_movies[user_id]
 # Create a dict that will map genre-to-user-rating
 genres = dict()
 # Loop through all rated movies
 for tup in movies:
 movie = tup[0]
 rating = tup[1]
 # Get genre
 genre = movie_to_genre[movie]
 if not genre in genres:
 genres[genre] = list()
 genres[genre].append(rating)
 # Now, for all genre, calculate the avg rating
 genres = {genre: sum(ratings)/len(ratings) for genre, ratings in genres.items()}
 # Now, sort this dict
 genres_sorted = {genre: rating for genre, rating in sorted(genres.items(), key = lambda item: item[1], reverse=True)}
 # Now, get the first genre from the dict
 return list(genres_sorted.keys())[0]
# parameter user_id: user id
# parameter user_to_movies: dictionary that maps user to movies and ratings
# parameter movie_to_genre: dictionary that maps movie to genre
# return: top genre that user likes
# WRITE YOUR CODE BELOW
# 4.3
def recommend_movies(user_id, user_to_movies, movie_to_genre, movie_to_average_rating):
 # parameter user_id: user id
 # parameter user_to_movies: dictionary that maps user to movies and ratings
 # parameter movie_to_genre: dictionary that maps movie to genre
 # parameter movie_to_average_rating: dictionary that maps movie to average rating
 # return: dictionary that maps movie to average rating
 # WRITE YOUR CODE BELOW
 # First, get user's top genre
 top_genre = get_user_genre(user_id, user_to_movies, movie_to_genre)
 # get a list of all movies for this genre
 genre_movies = [movie for movie, genre in movie_to_genre.items() if genre == top_genre]
 # Now, get a list of all movies (of this genre) that the user rated
 genre_rated_movies = list()
 user_movies = user_to_movies[user_id]
 for tup in user_movies:
 genre_rated_movies.append(tup[0])
 # Now, get all movies of the genre that user has not rated
 not_seen_movies = list(set(genre_movies) - set(genre_rated_movies))
 # Now, get a dict of movie-to-rating for these movies
 not_seen_d = {movie: rating for movie, rating in movie_to_genre.items() if movie in not_seen_movies}
 # Sort dict by rating in descending order
 not_seen_d = {movie: rating for movie, rating in sorted(not_seen_d.items(), key = lambda item: item[1], reverse = True)}
 # Finally, get the top 3 movies
 not_seen_list = list(not_seen_d.items())[:3]
 top_3 = {tup[0]: movie_to_average_rating[tup[0]] for tup in not_seen_list}
 return top_3
# -------- main function for your testing -----
def main():
 # write all your test code here
 # this function will be ignored by us when grading
 # Task 1.1
 ratings_data = read_ratings_data('ratings.txt')
 print("*** TASK 1.1 ***")
 print(ratings_data)
 print("\n\n")
 # Task 1.2
 genre_data = read_movie_genre('genres.txt')
 print("*** TASK 1.2 ***")
 print(genre_data)
 print("\n\n")
 # Task 2.1
 genre_dict = create_genre_dict(genre_data)
 print("*** TASK 2.1 ***")
 print(genre_dict)
 print("\n\n")
 # Task 2.2
 average_rating = calculate_average_rating(ratings_data)
 print("*** TASK 2.2 ***")
 print(average_rating)
 print("\n\n")
 # Task 3.1
 popular_movies = get_popular_movies(average_rating, 10)
 print("*** TASK 3.1 ***")
 print(popular_movies)
 print("\n\n")
 # Task 3.2
 filtered_movies = filter_movies(average_rating, 3)
 print("*** TASK 3.2 ***")
 print(filtered_movies)
 print("\n\n")
 # Task 3.3
 popular_genre = get_popular_in_genre('Comedy', genre_dict, average_rating, 5)
 print("*** TASK 3.3 ***")
 print(popular_genre)
 print("\n\n")
 # Task 3.4
 genre_rating = get_genre_rating('Comedy', genre_dict, average_rating)
 print("*** TASK 3.4 ***")
 print(genre_rating)
 print("\n\n")
 # Task 3.5
 genre_pop = genre_popularity(genre_dict, average_rating, 5)
 print("*** TASK 3.5 ***")
 print(genre_pop)
 print("\n\n")
 # Task 4.1
 user_ratings = read_user_ratings('ratings.txt')
 print("*** TASK 4.1 ***")
 print(user_ratings)
 print("\n\n")
 # Task 4.2
 user_genre = get_user_genre(18, user_ratings, genre_data)
 print("*** TASK 4.2 ***")
 print(user_genre)
 print("\n\n")
 # Task 4.3
 recommended = recommend_movies(18, user_ratings, genre_data, average_rating)
 print("*** TASK 4.3 ***")
 print(recommended)
 print("\n\n")
# DO NOT write ANY CODE (including variable names) outside of any of the above functions
# In other words, ALL code your write (including variable names) MUST be inside one of
# the above functions
# program will start at the following main() function call
# when you execute hw1.py
main()

Python Program to Implement Film Filtering Assignment Solution.

Instructions

Requirements and Specifications