Publish date March 28, 2019
How does a recommendation system suggest relevant items based on a user's interests?
For a long time, I have wondered how shopping websites like Flipkart or Amazon, movie platforms like Netflix, or even Medium suggest items based on a user's interests.
As it turns out, the idea is quite simple. Unlike my other blogs, this one will be short, but it will be enough to introduce you to recommendation systems, with working code of course.
In this blog post, I will build a movie recommendation system using The Movies Dataset and deploy it using Flask.
Let's see what Google has to say about it:
A recommender system or a recommendation system (sometimes replacing “system” with a synonym such as platform or engine) is a subclass of information filtering system that seeks to predict the “rating” or “preference” a user would give to an item.
Put in simple language, “a recommendation system suggests relevant items based on the user's interests.”
Recommendation systems are classified into two types: content-based and collaborative filtering. Let's look at each one in turn.
The idea behind a content-based (cognitive filtering) recommendation system is to recommend an item by comparing the content of the items with a user profile. In simple words, I may get a movie recommendation based on how similar its description is to the descriptions of other movies.
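To make "comparing content with a user profile" concrete, here is a minimal sketch. It assumes a made-up four-movie catalogue and builds the user profile as the average TF-IDF vector of the movies the user liked, which is one common choice but not necessarily the only one. The titles, descriptions and the helper `recommend_for_profile` are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy catalogue: title -> short description
catalogue = {
    "Space Rescue": "astronauts stranded in space fight to survive",
    "Mars Colony": "a crew of astronauts builds a colony on mars",
    "Kitchen Wars": "rival chefs compete in a cooking contest",
    "Bake Off Royale": "amateur bakers compete for a royal cooking prize",
}
titles = list(catalogue)
tfidf = TfidfVectorizer(stop_words="english")
item_vectors = tfidf.fit_transform(list(catalogue.values()))

def recommend_for_profile(liked_titles, top_n=1):
    """Rank unseen items by cosine similarity to the user's profile vector."""
    liked_idx = [titles.index(t) for t in liked_titles]
    # User profile = mean TF-IDF vector of the liked items (a 1 x n_terms array)
    profile = np.asarray(item_vectors[liked_idx].mean(axis=0))
    scores = cosine_similarity(profile, item_vectors).ravel()
    ranked = np.argsort(scores)[::-1]  # most similar first
    return [titles[i] for i in ranked if i not in liked_idx][:top_n]

print(recommend_for_profile(["Space Rescue"]))  # → ['Mars Colony']
```

Averaging keeps the sketch short. A real system might also weight items by the user's rating.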
The theory behind collaborative filtering is to work with the interactions between users and items (user IDs and movie IDs). For example, suppose user A likes movies P, Q, R and S, and user B likes movies Q, R, S and T. Since both users like Q, R and S, their tastes are similar. Therefore, movie P will be recommended to user B, and movie T will be recommended to user A.
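The user A / user B example can be reproduced with a tiny user-based sketch. It assumes Jaccard overlap of the liked sets as the similarity measure and a hypothetical threshold of 0.5 for calling two users "similar". Both are illustrative choices, not the method used later in this post.

```python
# Likes from the example above
likes = {
    "A": {"P", "Q", "R", "S"},
    "B": {"Q", "R", "S", "T"},
}

def jaccard(a, b):
    """Share of movies two users have in common (0 = none, 1 = identical)."""
    return len(a & b) / len(a | b)

def recommend(user, likes, min_similarity=0.5):
    seen = likes[user]
    suggestions = set()
    for other, their_likes in likes.items():
        if other == user:
            continue
        # A and B share Q, R, S out of P, Q, R, S, T -> similarity 3/5 = 0.6
        if jaccard(seen, their_likes) >= min_similarity:
            suggestions |= their_likes - seen  # what the similar user liked but we haven't seen
    return sorted(suggestions)

print(recommend("A", likes))  # → ['T']
print(recommend("B", likes))  # → ['P']
```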
Let's start by understanding the data. I have used The Movies Dataset, which contains metadata on over 45,000 movies and 26 million ratings from over 270,000 users. For our purpose, we will use movies_metadata.csv and links_small.csv. The ratings data describes a one-to-many relationship between users and ratings: each user can rate many movies. Before we dive into the code, let's work out our approach. For ease of understanding, I have tried both the content-based and the collaborative filtering approaches on the same dataset.
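The one-to-many relationship is easy to see on a toy frame. This sketch assumes MovieLens-style column names (`userId`, `movieId`, `rating`) for the ratings files. The rows themselves are made up for illustration.

```python
import pandas as pd

# Hypothetical rows shaped like the dataset's ratings files
ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2],
    "movieId": [10, 20, 30, 10, 40],
    "rating":  [4.0, 3.5, 5.0, 2.0, 4.5],
})

# One user -> many ratings: count and average each user's ratings
per_user = ratings.groupby("userId")["rating"].agg(["count", "mean"])
print(per_user)
```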
For content-based filtering, the approach is relatively simple: we convert the text into vectors and then find the movies closest to the given input title using cosine similarity.
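For reference, cosine similarity compares the direction of two vectors rather than their length. Because TF-IDF weights are never negative, the score for two descriptions falls between 0 (no shared terms) and 1 (same direction). A worked example with two small vectors:

```latex
\cos(\mathbf{a}, \mathbf{b})
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2}\,\sqrt{\sum_i b_i^2}}

% Example: a = (1, 1, 0), b = (1, 0, 1)
\cos(\mathbf{a}, \mathbf{b})
  = \frac{1 \cdot 1 + 1 \cdot 0 + 0 \cdot 1}{\sqrt{2}\,\sqrt{2}}
  = \frac{1}{2}
```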
Let’s begin with the code.
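# mount your Google Drive (Colab only) so the dataset paths below resolve
import pandas as pd
from google.colab import drive
drive.mount('/content/drive')
# read the metadata CSV (low_memory=False avoids mixed-type warnings on this file)
md = pd.read_csv('drive/My Drive/Colab Notebooks/Movie_recommendation/movie_dataset/movies_metadata.csv', low_memory=False)
# drop three rows whose 'id' is malformed and would break the int conversion below
md = md.drop([19730, 29503, 35587])
# links_small.csv holds the TMDB ids of the smaller movie subset
# (assumption: it sits in the same folder as movies_metadata.csv)
links_small = pd.read_csv('drive/My Drive/Colab Notebooks/Movie_recommendation/movie_dataset/links_small.csv')
links_small = links_small['tmdbId'].dropna().astype('int')
# keep only the movies that are present in the links_small dataset
md['id'] = md['id'].astype('int')
smd = md[md['id'].isin(links_small)].copy()
smd['tagline'] = smd['tagline'].fillna('')
# merge overview and tagline into a single description
smd['description'] = smd['overview'] + ' ' + smd['tagline']
smd['description'] = smd['description'].fillna('')
from sklearn.feature_extraction.text import TfidfVectorizer
# unigrams to trigrams; min_df=1 keeps every term (newer scikit-learn rejects min_df=0)
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 3), min_df=1, stop_words='english')
tfidf_matrix = tf.fit_transform(smd['description'])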
TF-IDF is very simple to implement, but it needs a few improvements. Phrases like “hate” and “don’t hate” have very different meanings, yet with single-word features TF-IDF treats them as almost the same. So how can we get around this?
This is where bigrams and trigrams help: TF-IDF then builds features from pairs or triples of consecutive words, which can capture the meaning that words carry when they appear together. That is the purpose of ngram_range=(1, 3) in the vectorizer above.
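A small sketch shows which features n-grams add. It uses two made-up sentences and scikit-learn's default tokenizer, which drops one-letter tokens such as "i", and it deliberately leaves `stop_words` unset.

```python
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer

docs = ["i hate this movie", "i do not hate this movie"]

# Unigrams only: "not" and "hate" are separate, unrelated features
uni = TfidfVectorizer(ngram_range=(1, 1)).fit(docs)
# Unigrams + bigrams: adds features such as "not hate" that keep the negation attached
bi = TfidfVectorizer(ngram_range=(1, 2)).fit(docs)

print(sorted(uni.vocabulary_))  # → ['do', 'hate', 'movie', 'not', 'this']
print(sorted(bi.vocabulary_))   # → ['do', 'do not', 'hate', 'hate this', 'movie', 'not', 'not hate', 'this', 'this movie']

# scikit-learn's built-in English stop-word list contains "not"
print("not" in ENGLISH_STOP_WORDS)  # → True
```

One caveat is worth checking on your own data. scikit-learn removes stop words before it forms n-grams. With `stop_words='english'`, as in the vectorizer above, a negation like "not" is therefore dropped before a bigram such as "not hate" can be created.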
Once we have vectors for all the descriptions, we are ready to apply the algorithm that tells us which vectors are similar to each other: cosine similarity.
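Here is a minimal sketch of that step on three made-up descriptions. It relies on the fact that `TfidfVectorizer` L2-normalises each row by default (`norm="l2"`). As a result, the plain dot product from `linear_kernel` already equals cosine similarity and is cheaper to compute on a large `tfidf_matrix`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical descriptions standing in for smd['description']
descriptions = [
    "a young wizard attends a school of magic",
    "a school of magic trains a young witch",
    "a detective hunts a serial killer in the city",
]
tfidf_matrix = TfidfVectorizer(stop_words="english").fit_transform(descriptions)

# Rows are unit-length, so the dot product equals cosine similarity
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)
print(cosine_sim.round(2))

# Same result through the explicit cosine function
assert abs(cosine_sim - cosine_similarity(tfidf_matrix)).max() < 1e-9
```

Entry `[i, j]` is the similarity between movie i and movie j. The first two descriptions share "young", "school" and "magic", so they score well above the third.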
Finally, we create a function that shows the top recommended movies for a given input title. To serve it, I built a small web service with Flask, a micro-framework for Python.
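Below is a minimal sketch of such a function and a Flask endpoint around it, not the author's exact code. It uses a small hypothetical stand-in for `smd` with made-up titles and descriptions. The route `/recommend`, the query parameter `title` and the helper `get_recommendations` are illustrative names. The sketch also assumes titles are unique. In the real metadata some titles repeat, so a lookup by title could match several rows.

```python
import pandas as pd
from flask import Flask, jsonify, request
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Hypothetical stand-in for the smd DataFrame built earlier
smd = pd.DataFrame({
    "title": ["Campus Days", "School Ties", "Dream Heist", "Wormhole"],
    "description": [
        "engineering students rebel against rote learning at college",
        "a teacher inspires students who struggle with rote learning",
        "a thief steals secrets by entering dreams",
        "astronauts travel through a wormhole to save humanity",
    ],
})
tfidf_matrix = TfidfVectorizer(stop_words="english").fit_transform(smd["description"])
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)
# Map each (assumed unique) title to its row position in cosine_sim
indices = pd.Series(range(len(smd)), index=smd["title"])

def get_recommendations(title, top_n=10):
    """Return up to top_n titles ranked by cosine similarity to `title`."""
    idx = indices[title]
    scores = sorted(enumerate(cosine_sim[idx]), key=lambda pair: pair[1], reverse=True)
    ranked = [i for i, _ in scores if i != idx][:top_n]  # skip the movie itself
    return smd["title"].iloc[ranked].tolist()

app = Flask(__name__)

@app.route("/recommend")
def recommend():
    # e.g. GET /recommend?title=Campus%20Days
    title = request.args.get("title", "")
    if title not in indices.index:
        return jsonify(error=f"unknown title: {title}"), 404
    return jsonify(title=title, recommendations=get_recommendations(title, top_n=3))
```

If you save this as `app.py`, you could start it with `flask --app app run` (Flask 2.2 or later) and open `/recommend?title=Campus%20Days` in a browser.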
Running the application.
Movies similar to “3 Idiots.”
Closing note: I hope this blog helps you build your own recommendation system. In my next blog, I'll try to build a generic recommendation system using various embedding techniques and neural networks.
Anchit Jain – Technology Professional – Innovation Group – Big Data | AI | Cloud @YASH Technologies