
How a recommendation system suggests anything relevant based on the user interest

By: Anchit Jain

Publish Date: March 28, 2019

For a long time, I wondered how shopping websites like Flipkart or Amazon, movie platforms like Netflix, or even Medium suggest items based on user interest.
It turns out the idea is quite simple. Unlike my other blogs, this one is short, but it should be enough to introduce you to recommendation systems, and of course it comes with working code.
In this blog post, I will build a movie recommendation system using The Movies Dataset and deploy it using Flask.
Here is what Google has to say about it:
A recommender system or a recommendation system (sometimes replacing “system” with a synonym such as platform or engine) is a subclass of information filtering system that seeks to predict the “rating” or “preference” a user would give to an item.
Put in simple language, a recommendation system suggests anything relevant based on the user’s interest.
Recommendation systems are classified into two types: content-based and collaborative. Let’s try to understand them one by one.
The idea behind a content-based (cognitive filtering) recommendation system is to recommend an item based on a comparison between the content of the items and a user profile. In simple words, I may get a recommendation for a movie based on the descriptions of other movies.

[Figure: content-based filtering]

The theory behind collaborative filtering is to rely on the collaboration of users, working with user and movie IDs rather than content. For example, take two users, A and B: user A likes movies P, Q, R, and S, and user B likes movies Q, R, S, and T. Since both users like movies Q, R, and S, their tastes are similar; therefore, movie P will be recommended to user B, and movie T will be recommended to user A.
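The two-user example can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the method used later in this post: the Jaccard overlap as a similarity measure and the `min_similarity` threshold are assumptions made for the sketch.

```python
# Toy data taken from the example: the movies each user likes.
likes = {
    "A": {"P", "Q", "R", "S"},
    "B": {"Q", "R", "S", "T"},
}

def jaccard(a, b):
    """Overlap between two sets of liked movies, from 0.0 to 1.0."""
    return len(a & b) / len(a | b)

def recommend(user, likes, min_similarity=0.5):
    """Suggest movies liked by similar users that `user` has not liked yet.

    `min_similarity` is an illustrative threshold, not a value from the article.
    """
    own = likes[user]
    suggestions = set()
    for other, movies in likes.items():
        if other != user and jaccard(own, movies) >= min_similarity:
            suggestions |= movies - own
    return sorted(suggestions)

print(recommend("A", likes))  # ['T']
print(recommend("B", likes))  # ['P']
```

Users A and B share three of the five movies between them, so they count as similar and each receives the one movie only the other has liked.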

[Figure: collaborative filtering]

Let’s start by understanding the data. I have used The Movies Dataset. This dataset has metadata on over 45,000 movies and 26 million ratings from over 270,000 users. For our purpose, we will be using movies_metadata.csv and links_small.csv. The dataset describes a one-to-many relationship between users and ratings. Before we dive into the code, let’s figure out our approach to the solution. For ease of understanding, I have tried both filtering approaches (content-based and collaborative) on the same dataset.
For content-based filtering, the approach is relatively simple: we just have to convert the text into vector form and find the closest recommendations to the given input movie title using cosine similarity.
Let’s begin with the code.

  1. Read the dataset from Google Drive into a data frame. Drop a few malformed rows and keep only those movie IDs that are present in the links_small dataset (a lookup for the movie metadata). An important part is to merge the text metadata, which in our case is “overview” and “tagline”, into a single field.
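    import pandas as pd
    # mount your drive
    from google.colab import drive
    drive.mount('/content/drive')
    # read the CSV file
    md = pd.read_csv('drive/My Drive/Colab Notebooks/Movie_recommendation/movie_dataset/movies_metadata.csv')
    # links_small is not defined in the original snippet; this assumes links_small.csv
    # sits in the same folder and that its tmdbId column holds the movie IDs
    links_small = pd.read_csv('drive/My Drive/Colab Notebooks/Movie_recommendation/movie_dataset/links_small.csv')
    links_small = links_small[links_small['tmdbId'].notnull()]['tmdbId'].astype('int')
    # drop the malformed rows by index
    md = md.drop([19730, 29503, 35587])
    # look up all movies that are present in the links_small dataset
    md['id'] = md['id'].astype('int')
    smd = md[md['id'].isin(links_small)].copy()  # copy so the column assignments below are safe
    smd.shape
    # merge overview and tagline into one description column;
    # missing values become empty strings so one missing field does not blank out the other
    smd['tagline'] = smd['tagline'].fillna('')
    smd['overview'] = smd['overview'].fillna('')
    smd['description'] = smd['overview'] + ' ' + smd['tagline']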

     
  2. Once we have gathered all the data we need, we use TF-IDF to vectorize the words. The reason for choosing this algorithm is that it gives less weight to words that occur frequently across documents (for example, “the”, “is”, etc.). When calculating the term frequency, we divide by the total number of words in the document so that longer documents do not have a greater influence than shorter ones.
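    from sklearn.feature_extraction.text import TfidfVectorizer
    # min_df=1 (the default) keeps every term, like the original min_df=0,
    # which newer scikit-learn versions may reject as an invalid integer
    tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 3), min_df=1, stop_words='english')
    tfidf_matrix = tf.fit_transform(smd['description'])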
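To make the weighting concrete, here is a minimal hand-rolled sketch of the textbook TF-IDF formula on three made-up sentences. It is not scikit-learn’s implementation: TfidfVectorizer uses a smoothed variant of the IDF and normalizes each row, so its numbers will differ, but the effect on frequent words is the same.

```python
import math

# Three tiny "documents" standing in for movie descriptions (made-up text).
docs = [
    "the hero saves the city",
    "the villain attacks the city",
    "the hero falls in love",
]
tokenized = [d.split() for d in docs]

def tf(term, tokens):
    """Term frequency: occurrences divided by the document length."""
    return tokens.count(term) / len(tokens)

def idf(term, corpus):
    """Inverse document frequency: log(N / number of documents containing the term)."""
    containing = sum(1 for tokens in corpus if term in tokens)
    return math.log(len(corpus) / containing)

def tfidf(term, tokens, corpus):
    return tf(term, tokens) * idf(term, corpus)

first = tokenized[0]
print(tfidf("the", first, tokenized))    # "the" is in every document, so its weight is 0.0
print(tfidf("hero", first, tokenized))   # appears in two of three documents
print(tfidf("saves", first, tokenized))  # appears in one document only, so it weighs the most
```

Although “the” is the most frequent word in the first sentence, it carries no weight because it appears in every document, while the rarer “saves” ends up with the highest score.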

The TF-IDF implementation is very simple, but it needs a few improvements. Phrases like “hate” and “don’t hate” have very different meanings, yet they look almost the same to TF-IDF when it only considers single words. So how can we get around this?
We can use bigrams or trigrams: TF-IDF then builds vectors from pairs or triples of consecutive words, which lets it distinguish the meaning when words occur together.
Once we have the vectors for all the descriptions, we are ready to step into the algorithm that will eventually tell us which vectors are similar to each other.
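The effect of bigrams can be sketched with a small plain-Python n-gram helper. The `ngrams` function and the two sentences are made up for illustration; scikit-learn’s own tokenizer behaves differently in its details (for example, it drops one-letter tokens by default). One caveat worth checking in your own setup: as far as I know, scikit-learn’s built-in English stop-word list contains “not”, and stop words are removed before n-grams are formed, so with stop_words='english' the negation may be lost anyway.

```python
def ngrams(text, n_min=1, n_max=2):
    """Return all word n-grams of length n_min..n_max, in order of appearance."""
    words = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(words) - n + 1):
            grams.append(" ".join(words[i:i + n]))
    return grams

a = "i hate this movie"
b = "i do not hate this movie"

# With single words only, every word of `a` also appears in `b`.
print(set(ngrams(a, 1, 1)) <= set(ngrams(b, 1, 1)))  # True

# With bigrams, `b` gains features such as "not hate" that `a` does not have,
# and `a` has "i hate", which `b` lacks.
print(sorted(set(ngrams(b, 1, 2)) - set(ngrams(a, 1, 2))))
print(sorted(set(ngrams(a, 1, 2)) - set(ngrams(b, 1, 2))))
```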

# Cosine similarity: TF-IDF rows are L2-normalized by default,
# so the plain dot product (linear kernel) already equals the cosine
from sklearn.metrics.pairwise import linear_kernel
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)
smd = smd.reset_index()
titles = smd['title']
# map every title to its row index
indices = pd.Series(smd.index, index=titles)
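Why a linear kernel can stand in for cosine similarity is easy to check by hand: once vectors are scaled to unit length, their dot product is the cosine of the angle between them. The sketch below uses made-up three-dimensional vectors and hypothetical titles in place of real TF-IDF rows, and ranks the neighbours the same way the service does.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the lengths."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def l2_normalize(u):
    length = math.sqrt(dot(u, u))
    return [a / length for a in u]

# Made-up vectors standing in for three rows of a TF-IDF matrix.
vectors = {
    "Movie X": [1.0, 2.0, 0.0],
    "Movie Y": [2.0, 4.0, 0.0],  # same direction as Movie X
    "Movie Z": [0.0, 1.0, 3.0],
}
unit = {title: l2_normalize(v) for title, v in vectors.items()}

def most_similar(title, top_n=2):
    """Rank the other titles by dot product of unit vectors, highest first."""
    scores = [(other, dot(unit[title], vec)) for other, vec in unit.items() if other != title]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]

print(most_similar("Movie X"))
```

Movie Y points in the same direction as Movie X, so it ranks first with a score of about 1.0, while Movie Z shares only one dimension with it and scores much lower.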

 
Finally, we will create a function that returns the top recommended movies for a given input title. For this task, I have used Flask, a micro web framework for building web services in Python.

# Returns the 5 most similar movies based on the cosine similarity score.
import json
from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def main():
    # the movie title is read from the query string, e.g. /?movie=...
    title = request.args.get('movie')
    idx = indices[title]
    # a title that occurs more than once yields several indices; use the first one
    if isinstance(idx, pd.Series):
        idx = idx.iloc[0]
    print("Index", idx)
    similar_scores = list(enumerate(cosine_sim[idx]))
    similar_scores = sorted(similar_scores, key=lambda x: x[1], reverse=True)
    # skip position 0 (usually the movie itself) and keep the next five
    similar_scores = similar_scores[1:6]
    movie_indices = [i[0] for i in similar_scores]
    output = []
    for item in titles.iloc[movie_indices]:
        output.append(item)
    return json.dumps(output)

if __name__ == "__main__":
    app.run()

 
Running the application.

  1. Navigate to the folder of your code.
  2. Once you’re in your project directory, run the Flask application with the command python predict.py
  3. If all went well, you will see a line with the application’s URL on your terminal.
  4. Copy and paste this URL into your web browser, and you are all set to see the output.

[Figure: movies similar to “3 Idiots”]
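Since the service reads the title from the `movie` query parameter, the title has to be URL-encoded when it contains spaces. The helper below is a small sketch of how such a URL could be built; `build_query_url` is a hypothetical name, and the base address assumes the app is running locally on Flask’s default host and port.

```python
from urllib.parse import urlencode

# Assumption: the app is running locally on Flask's default address.
BASE_URL = "http://127.0.0.1:5000/"

def build_query_url(title, base_url=BASE_URL):
    """Return the URL that asks the service for movies similar to `title`."""
    return base_url + "?" + urlencode({"movie": title})

print(build_query_url("3 Idiots"))  # http://127.0.0.1:5000/?movie=3+Idiots
```

Opening the printed URL in a browser should return a JSON list of titles, provided the title exists in the dataset.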
Closing Note: I hope this blog helps you build your own recommendation system. In my next blog, I’ll try to build a generic recommendation system using various embedding techniques and neural networks.
Anchit Jain -Technology Professional – Innovation Group – Big Data | AI | Cloud @YASH Technologies
