Bumblebee Sela G#3
Conversation
anselrognlie
left a comment
There was a problem hiding this comment.
Looks good overall, but please review my comments, and let me know if you have any questions.
| assert updated_data["watched"] == [{ | ||
| "title": MOVIE_TITLE_1, | ||
| "genre": GENRE_1, | ||
| "rating": RATING_1 | ||
| }] |
There was a problem hiding this comment.
👍 Nice use of the fact that Python compares dictionaries by their contents to simplify these checks, rather than writing out the checks key by key.
| assert len(updated_data["watched"]) == 1 | ||
|
|
||
| raise Exception("Test needs to be completed.") | ||
| assert updated_data["watchlist"] == [] |
There was a problem hiding this comment.
Checking that the watchlist is empty isn't necessary here, since we already checked that the length was 0 earlier.
| # ****** Add assertions here to test that the correct movie was added to "watched" ********** | ||
| # ******************************************************************************************* | ||
| assert updated_data["watchlist"] == [FANTASY_1] | ||
| assert updated_data["watched"] == [FANTASY_2, movie_to_watch] |
There was a problem hiding this comment.
The project description doesn't require that the watched movie ends up at any particular location in the watched list. As a result, rather than checking the results of the watched list against a literal list (which implies an ordering), we might use in checks. We could do the same for the watchlist check, but since there should be only a single movie left there, we actually do know exactly where in the list it is!
| assert INTRIGUE_3 in friends_unique_movies | ||
| assert HORROR_1 in friends_unique_movies | ||
| assert FANTASY_4 in friends_unique_movies |
There was a problem hiding this comment.
👍 The get_friends_unique_watched method doesn't specify the order of the movies (and doing so could restrict the ways in which we might perform uniquing on the movie records). Using in to check only for each movie's presence is a great idea. And since we also know the number of expected records, these checks completely describe the contents of the result.
| } | ||
| recommendations = get_new_rec_by_genre(sonyas_data) | ||
|
|
||
| assert len(recommendations) == 0 |
There was a problem hiding this comment.
Since the specification says this should return a list, it would be more appropriate to check not only that the length of the result is 0, but that the result is itself a list. This is most succinctly checked as
assert recommendations == []Note that while in application code, it's considered idiomatic to check for an empty list as not some_list_variable, we should be aware that this is a pretty loose check. It really checks that whatever some_list_variable refers to is falsy. In a test, we like our checks to be more specific than checks in regular code. If we're more strict in our tests than we need to be, then application code can use more idiomatic code without concern, but can also be more precise where desired.
| # get list of friends unique movie | ||
| # add to empty genre rec only if genre of the movies is equal to user favorite | ||
| # return list | ||
| user_genre = get_most_watched_genre(user_data) |
There was a problem hiding this comment.
👍 The description for this function states that we should "consider the user's most frequently watched genre." This sounds a lot like what get_most_watched_genre calculates! Reusing get_most_watched_genre gets us the required most watched genre information while keeping our code DRY.
| # add to empty genre rec only if genre of the movies is equal to user favorite | ||
| # return list | ||
| user_genre = get_most_watched_genre(user_data) | ||
| friends_unique = get_friends_unique_watched(user_data) |
There was a problem hiding this comment.
👍 The description for this function states that the results should only include movies that a friend has watched, but the user has not. This sounds a lot like what get_friends_unique_watched calculates! Reusing get_friends_unique_watched gets us the unique movies watched by friends, while keeping our code DRY.
All we have left to do is filter the list down to the movies in the same genre as the user's most watched genre.
| genre_reqs = [] | ||
| for data in friends_unique: | ||
| if data["genre"] == user_genre: | ||
| genre_reqs.append(data) |
There was a problem hiding this comment.
Here, we're filtering a list of movies based on a relatively simple criterion: get the movies that have a matching genre. This would be a great fit for a list comprehension. This could be expressed as
genre_reqs = [movie for movie in friends_unique if movie["genre"] == user_genre]This may look a little odd, but they pop up all over the place in Python code for cases like this, so it's worth getting used to reading and writing them.
| # put movies in list only if in user favorites | ||
| # AND NOT in friend's watched | ||
| favorites_recs = [] | ||
| user_unique = get_unique_watched(user_data) |
There was a problem hiding this comment.
👍The description for this function states that the results should only include movies that no friends have watched. This sounds a lot like what get_unique_watched calculates! However, this problem differs slightly in that we're starting from the user's favorites list, not their watched list, whereas get_unique_watched starts from the watched list.
If we make a bit of an assumption that a user cannot have made a movie a favorite that they have not yet already watched, then we could approach this by getting the unique user-watched movies (no friends will have watched these) and then filter them to the movies in the user's favorites list. It turns out this is a safe assumption to make for this project, allowing us to reuse our get_unique_watched.
However, if we assumed that there were no direct connection between the user's watched list and their favorites (maybe a user decides to use their "favorites" data to track some other collection of movies or to just register approval of the concept of a movie), we might not choose to reuse get_unique_watched directly.
In this case, it would be possible to make another helper function, say filter_movies_against_friends_watched(movie_list), that performs the same logic that we have in get_unique_watched, just that it starts with the list of movies passed in rather than assuming it should start with user_data["watched"].
Then for get_unique_watched we could pass in user_data["watched"], while for get_rec_from_favorites we could pass in user_data["favorites"]!
| favorites_recs = [] | ||
| user_unique = get_unique_watched(user_data) | ||
| for data in user_unique: | ||
| if data in user_data["favorites"]: |
There was a problem hiding this comment.
Depending on how large we're expecting the favorites list to become, it could be worth extracting the titles into a collection with more favorable lookup time complexity rather than directly checking that each movie is in the favorites list (linear time).
| for movie in user_data["watched"]: | ||
| genre = movie["genre"] | ||
|
|
||
| if genre in genre_counts: # can maybe do this instead: genre_counts[genre] = genre_counts.get(genre, 0) + 1 | ||
| genre_counts[genre] += 1 | ||
| else: | ||
| genre_counts[genre] = 1 |
There was a problem hiding this comment.
👍 Nice use of a frequency map to count up the appearances of each genre in a single pass through the list of movies. This has better time complexity than trying to count the occurrences of each genre while looking for the most frequent genre, which results in more nested iterations. Instead, we can trade some space to store the results of counting them all once.
| for movie in user_data["watched"]: | ||
| genre = movie["genre"] | ||
|
|
||
| if genre in genre_counts: # can maybe do this instead: genre_counts[genre] = genre_counts.get(genre, 0) + 1 |
There was a problem hiding this comment.
👍 I like the .get() approach since it hides the conditional behind the method call, making this code simpler to read. The .get() approach isn't any more or less efficient (internally, it basically has to do the same thing we're doing here), so we'd really be choosing this for the improved readability.
| if genre_counts[genre] > top_count: | ||
| top_count = genre_counts[genre] | ||
| top_genre = genre |
There was a problem hiding this comment.
Prefer to pull this out of the loop that's building the frequency map into its own loop. Doing so would allow for greater decomposability making it easier to refactor the major steps. When everything is mixed into a single loop, it can be difficult to meaningfully untangle them.
But we could instead write
for movie in user_data["watched"]:
genre = movie["genre"]
if genre in genre_counts:
genre_counts[genre] += 1
else:
genre_counts[genre] = 1
for genre, count in genre_counts.items():
if count > top_count:
top_count = count
top_genre = genrewhich has exactly same complexity, but could be further refactored to
genre_counts = count_genres(user_data["watched"])
top_genre = get_max_genre(genre_counts)assuming we split those two loops into the functions listed above.
No description provided.