A Guide to Content-Based Filtering In Recommender Systems

Recommender systems play a crucial role in our online lives. Popular media and service platforms like Netflix, YouTube, Amazon, and Facebook invest heavily in delivering personalized adverts and recommendations to drive sales and engagement. Users and customers benefit as well, since they can buy products suited to their tastes and discover new, relevant ones. There are two main types of recommender systems: collaborative filtering and content-based filtering. In this article, we’ll look at both, with a focus on content-based filtering.

Importance of using recommender systems

When there are numerous options to choose from, it’s natural to be confused, whether it’s selecting a flavor of ice cream or a model of headphones. Recommendation systems help by eliminating the options that do not align with our taste or past behavior. The more they have access to our purchasing history and patterns, the more accurate the recommendations are.

A downside to this approach is a lack of good suggestions for new customers, since the system has no previous data on their habits. To tackle this, other methods can be used, such as explicitly asking customers what type of content they want to view or suggesting items that are popular in their geographical location or age group.

There are two types of recommender systems:

  1. Collaborative filtering
  2. Content-based filtering

Collaborative filtering

Collaborative filtering-based recommender systems rely solely on past interactions between users and items to suggest new products. The features of individual items are not considered.

In collaborative filtering, the history of each user's interactions with items is recorded and stored. This is usually represented by a matrix known as the user-item interaction matrix, where rows represent users and columns represent items. Similar users are grouped, and all their interactions are considered when making recommendations to the target user.

Collaborative filtering can be subdivided into two approaches: memory-based and model-based.

Memory-based collaborative approach

The memory-based approach relies solely on the user-item interaction matrix and mathematical calculations to find nearest neighbors and suggest new items. No machine learning (ML) models are used.
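The nearest-neighbor idea can be sketched in a few lines of Python. This is a minimal illustration, not a production method: the user names, movie IDs, and ratings are made up, a rating of 0 is assumed to mean "no interaction", and cosine similarity is just one possible way to compare users.

```python
import math

# Hypothetical ratings: ratings[user][item]; 0 is assumed to mean "no interaction".
ratings = {
    "alice": {"m1": 5, "m2": 4, "m3": 0, "m4": 1},
    "bob":   {"m1": 4, "m2": 5, "m3": 4, "m4": 0},
    "carol": {"m1": 1, "m2": 0, "m3": 2, "m4": 5},
}

def user_similarity(u, v):
    """Cosine similarity between two users' rating rows."""
    dot = sum(u[item] * v[item] for item in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def nearest_neighbor(target):
    """Return the other user whose ratings are most similar to the target's."""
    others = [u for u in ratings if u != target]
    return max(others, key=lambda u: user_similarity(ratings[target], ratings[u]))

def recommend(target):
    """Suggest items the nearest neighbor rated that the target has not seen."""
    neighbor = nearest_neighbor(target)
    return [item for item, r in ratings[neighbor].items()
            if r > 0 and ratings[target][item] == 0]

print(nearest_neighbor("alice"))  # bob
print(recommend("alice"))         # ['m3']
```

Only the interaction matrix is used here; nothing is known about what the movies actually contain.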

Model-based collaborative approach

An underlying model is assumed to explain the user-item interactions. This model is trained on the observed interactions and then used to score and rank items the user has not interacted with yet. Items with a higher compatibility score are recommended to the user.
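One common choice of underlying model is matrix factorization, which the article does not name explicitly; it is used here only as an example. The sketch below assumes made-up ratings, two latent factors, and hand-picked values for the learning rate and regularization.

```python
import random

# Hypothetical observed ratings as (user_index, item_index, rating) triples.
observed = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0), (1, 2, 1.0),
            (2, 1, 1.0), (2, 2, 5.0)]
n_users, n_items, n_factors = 3, 3, 2

# Small random starting values for the user (P) and item (Q) factor matrices.
rng = random.Random(0)
P = [[rng.uniform(0.1, 0.5) for _ in range(n_factors)] for _ in range(n_users)]
Q = [[rng.uniform(0.1, 0.5) for _ in range(n_factors)] for _ in range(n_items)]

def predict(u, i):
    """Compatibility score: dot product of user and item factors."""
    return sum(P[u][k] * Q[i][k] for k in range(n_factors))

def rmse():
    """Root mean squared error over the observed ratings."""
    total = sum((r - predict(u, i)) ** 2 for u, i, r in observed)
    return (total / len(observed)) ** 0.5

initial_error = rmse()

# Tune the model with stochastic gradient descent (assumed hyperparameters).
learning_rate, reg = 0.02, 0.01
for _ in range(500):
    for u, i, r in observed:
        err = r - predict(u, i)
        for k in range(n_factors):
            p, q = P[u][k], Q[i][k]
            P[u][k] += learning_rate * (err * q - reg * p)
            Q[i][k] += learning_rate * (err * p - reg * q)

final_error = rmse()

# Score an item that user 0 has not interacted with yet (item 2).
unseen_score = predict(0, 2)
print(initial_error, final_error, unseen_score)
```

After training, the unseen items of each user can be ranked by their predicted score.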

Content-based filtering

Content-based filtering in recommender systems leverages machine learning algorithms to predict and recommend new but similar items to the user. Recommending products based on their characteristics is only possible if there is a clear set of features for the product and a list of the user’s choices.

The recommender system stores previous user data like clicks, ratings, and likes to create a user profile. The more a customer engages, the more accurate future recommendations are.

To understand this, let’s use a simple example of how a content-based recommender system might work to suggest movies.

Let’s suppose there are four movies and a user has seen and liked the first two.

[Image: Pictorial representation of content-based filtering in a recommender system]

The model automatically suggests the third movie rather than the fourth, since it is more similar to the first two. This similarity can be calculated based on a number of features like the actors and actresses in the movie, the director, the genre, the duration of the film, etc.
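The four-movie example can be sketched as follows. The feature sets are invented for illustration, and Jaccard similarity (shared features divided by total distinct features) is just one simple way to compare movies; it is an assumption, not necessarily what a real system would use.

```python
# Hypothetical movies described by sets of features (genre, director, lead actor).
movies = {
    "movie_1": {"action", "director_a", "actor_x"},
    "movie_2": {"action", "sci-fi", "director_a", "actor_y"},
    "movie_3": {"action", "sci-fi", "director_b", "actor_x"},
    "movie_4": {"romance", "director_c", "actor_z"},
}
liked = ["movie_1", "movie_2"]  # the user has seen and liked the first two

def jaccard(a, b):
    """Shared features divided by all distinct features of the two movies."""
    return len(a & b) / len(a | b)

def average_similarity(candidate):
    """Mean similarity between a candidate and every movie the user liked."""
    return sum(jaccard(movies[candidate], movies[m]) for m in liked) / len(liked)

candidates = [m for m in movies if m not in liked]
ranked = sorted(candidates, key=average_similarity, reverse=True)
print(ranked)  # ['movie_3', 'movie_4']
```

The third movie shares a genre and an actor with the liked movies, so it ranks above the fourth, which shares nothing with them.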

Important terms

Utility matrix

A utility matrix contains the interaction information between users and items. Data gathered from the day-to-day activities of each user is saved in a structured format that captures their likes and dislikes for the items they have interacted with. A value is assigned to every interaction, known as the ‘degree of preference’.

[Image: Example of content-based filtering (utility matrix)]

A few values are missing in the above example of a utility matrix. This is because some users do not interact with every item available on the platform. Note that the goal of the recommender model is to suggest new items based on this utility matrix.
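A sparse utility matrix can be sketched as a dictionary of dictionaries, where a missing key means the user never interacted with the item. The users, items, and preference values below are hypothetical.

```python
# Hypothetical utility matrix: utility[user][item] = degree of preference (1-5).
# Items a user never interacted with are simply absent.
utility = {
    "user_1": {"item_a": 4, "item_c": 5},
    "user_2": {"item_a": 2, "item_b": 5, "item_d": 3},
    "user_3": {"item_b": 1},
}
all_items = ["item_a", "item_b", "item_c", "item_d"]

def unrated_items(user):
    """Items with no recorded preference: the candidates for recommendation."""
    return [item for item in all_items if item not in utility[user]]

def sparsity():
    """Fraction of user-item cells that have no recorded preference."""
    filled = sum(len(row) for row in utility.values())
    return 1 - filled / (len(utility) * len(all_items))

print(unrated_items("user_1"))  # ['item_b', 'item_d']
print(sparsity())               # 0.5
```

The unrated items are exactly the cells the recommender has to reason about when it suggests something new.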

User profile

A user profile is the collection of vectors that define a user’s preferences. The profile is based on the activities and tastes of the user; for example, user ratings, number of clicks on different items, thumbs up or thumbs down on content, etc. This information helps the recommender engine estimate which new items to suggest.
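One simple way to turn ratings into a profile vector is to average the feature vectors of rated items, weighted by how far each rating sits from a neutral score. The sketch below assumes a 1-5 rating scale with 3 as neutral; the movies, features, and ratings are hypothetical.

```python
# Hypothetical item vectors over the features [action, horror, comedy].
item_vectors = {
    "movie_a": [1, 0, 0],
    "movie_b": [1, 0, 1],
    "movie_c": [0, 1, 0],
}
# Hypothetical ratings from a single user on a 1-5 scale.
user_ratings = {"movie_a": 5, "movie_b": 4, "movie_c": 1}

def build_profile(ratings, vectors, neutral=3):
    """Average the item vectors, weighted by (rating - neutral)."""
    n_features = len(next(iter(vectors.values())))
    profile = [0.0] * n_features
    for item, rating in ratings.items():
        weight = rating - neutral  # positive = liked, negative = disliked
        for k, value in enumerate(vectors[item]):
            profile[k] += weight * value
    return [p / len(ratings) for p in profile]

profile = build_profile(user_ratings, item_vectors)
print(profile)  # positive for action, negative for horror, mildly positive for comedy
```

Clicks or thumbs up/down could be folded in the same way by mapping them to positive or negative weights.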

Item profile

For content-based filtering, we require the different features of every individual item to represent their essential qualities. Going back to the movie example, some necessary attributes of movies that will help the recommender system distinguish between them are actors and actresses, director, year of release, genre, IMDb ratings, etc.
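As a rough sketch, a movie's attributes can be encoded as a numeric vector: one slot per genre plus scaled numeric attributes. The genre list, the year range used for scaling, and the example movie are all assumptions; actors and directors could be encoded with additional one-hot slots in the same way.

```python
# Assumed, fixed list of genres; each becomes one slot in the item vector.
GENRES = ["action", "comedy", "horror", "sci-fi"]

def item_profile(movie):
    """Encode a movie dict as [genre flags..., scaled year, scaled rating]."""
    genre_part = [1.0 if g in movie["genres"] else 0.0 for g in GENRES]
    # Scale numeric attributes to roughly [0, 1]; the ranges are assumptions.
    year_part = (movie["year"] - 1950) / (2030 - 1950)
    rating_part = movie["rating"] / 10.0
    return genre_part + [year_part, rating_part]

# Hypothetical movie record.
movie = {"title": "Example Movie", "genres": ["action", "sci-fi"],
         "year": 2010, "rating": 8.0}
vector = item_profile(movie)
print(vector)  # [1.0, 0.0, 0.0, 1.0, 0.75, 0.8]
```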

There are generally two popular methods used in content-based filtering: cosine distance and the classification approach.

Cosine distance

Here, the cosine distance between the user vector and the item vector is used to determine preference. Let’s understand this with an example: our target user enjoys watching action movies and somewhat dislikes horror and thrillers. In that user's vector, the action component has a positive value and the horror component has a negative value.

Now, consider a new movie released in the sci-fi action genre. Since our user prefers action movies, the cosine of the angle between the movie vector and the user vector will be a large positive fraction. That corresponds to a small angle and a small cosine distance, which means it's a good recommendation for our user. If the cosine distance is large, we generally ignore the item since it's likely a bad recommendation.
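Cosine similarity is the dot product of two vectors divided by the product of their lengths, and cosine distance is commonly taken as one minus that similarity. The sketch below applies this to the example above; the feature order and all vector values are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """Small when the vectors point the same way, large when they oppose."""
    return 1.0 - cosine_similarity(a, b)

# Assumed feature order: [action, horror, sci-fi]; values are hypothetical.
user = [0.9, -0.6, 0.3]              # likes action, somewhat dislikes horror
scifi_action_movie = [1.0, 0.0, 1.0]
horror_movie = [0.0, 1.0, 0.0]

print(cosine_distance(user, scifi_action_movie))  # small: good recommendation
print(cosine_distance(user, horror_movie))        # large: skip this item
```

With these values, the sci-fi action movie has a similarity of roughly 0.76 to the user, while the horror movie's similarity is negative.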

Classification approach

Classification algorithms like Bayesian classifiers or decision tree models can be used to make recommendations. For example, every level of a decision tree can be used to filter out the various preferences of the user to make a more refined choice.
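As a minimal sketch of the Bayesian option, the code below fits a tiny naive Bayes classifier on binary item features and predicts the probability that the user will like a new item. The training data is invented, and the Laplace smoothing constants are a conventional choice rather than anything prescribed by the article.

```python
import math

# Hypothetical training data: binary item features [is_action, is_horror, is_scifi]
# and whether the user liked the item (1) or not (0).
training = [
    ([1, 0, 1], 1),
    ([1, 0, 0], 1),
    ([1, 1, 0], 0),
    ([0, 1, 0], 0),
    ([0, 0, 1], 1),
    ([0, 1, 1], 0),
]

def fit(data):
    """Estimate class priors and per-feature likelihoods with Laplace smoothing."""
    model = {}
    n_features = len(data[0][0])
    for label in (0, 1):
        rows = [x for x, y in data if y == label]
        prior = len(rows) / len(data)
        # P(feature_k = 1 | label), smoothed so it is never exactly 0 or 1.
        likelihood = [(sum(r[k] for r in rows) + 1) / (len(rows) + 2)
                      for k in range(n_features)]
        model[label] = (prior, likelihood)
    return model

def like_probability(model, x):
    """Return P(liked | item features x) under the naive Bayes assumption."""
    log_scores = {}
    for label, (prior, likelihood) in model.items():
        score = math.log(prior)
        for value, p in zip(x, likelihood):
            score += math.log(p if value else 1 - p)
        log_scores[label] = score
    # Normalize the two log scores into a probability.
    top = max(log_scores.values())
    exp = {label: math.exp(s - top) for label, s in log_scores.items()}
    return exp[1] / (exp[0] + exp[1])

model = fit(training)
print(like_probability(model, [1, 0, 1]))  # about 0.9 for a sci-fi action item
print(like_probability(model, [0, 1, 0]))  # about 0.1 for a horror item
```

Items whose predicted probability clears a chosen threshold would be recommended; a decision tree could be swapped in for the same role.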

Content-based filtering: Advantages and disadvantages

Advantages

  • It is easily scalable to a large number of customers since the data of other users is not required for recommending something to a particular user.
  • Since the recommendations are based on the day-to-day activities of the user, all the preferences and parameters of the suggestions are finely tuned to the user’s choice. Therefore, the model can recommend specific niche items that other users might not be interested in.
  • New items can be suggested as soon as they are launched, without waiting for other users to interact with them, since their features are available from the start.

Disadvantages

  • Building a content-based recommender engine requires a lot of domain knowledge since the feature selection of the items is mostly hard-coded into the system. Thus, the model is only as good as the knowledge of the one building it.
  • The model can only recommend items similar to the user’s present interests. Hence, it cannot discover or expand into new areas that might interest the user.
  • The cold start problem is a significant drawback since the engine does not have sufficient information about a new user to start making suggestions.
  • It is hard to make new recommendations to not-so-active users.

Collaborative filtering vs content-based filtering for recommender systems

  • Content-based filtering methods require a good deal of information about an item’s features, rather than about users’ interactions with it. For products like clothes, these features can be size, color, brand, material, etc.; for movies, they can be actors, genre, director, year of release, etc.
  • Collaborative filtering, on the other hand, uses historical interactions between users and items to group users with similar tastes and suggest items that are popular within the group to the target user.
  • Content-based filtering models are heavily based on domain knowledge since the item features are hand-engineered into the system. Collaborative filtering does not need such in-depth domain knowledge since all the embeddings are automatically learned.
  • Collaborative filtering systems require only the user behavior data, whereas content-based methods require both user and item data.

In this article, we discussed content-based filtering, which is one type of recommender system. We also briefly touched on collaborative filtering, another class of recommender systems. We saw that the content-based approach can make suggestions in two ways: a vector space approach based on cosine distance and a classification approach, both of which have their advantages and disadvantages.

Recommender systems are used by organizations and companies to automate the process of suggesting new content and products to their consumer base. They are widely used in the current e-commerce/online business environment. The next time you are suggested something online that you seem to like, you know exactly how it ended up on your feed!

Author

    Turing

Frequently Asked Questions

Content-based filtering uses item features to suggest additional products that are similar to what users already like by leveraging their past behavior or explicit feedback. It employs machine learning algorithms to group similar items together based on their intrinsic features.

Content-based filtering is generally used in recommender systems designed for companies offering various products, services, or content.

In the hybrid approach, both collaborative and content-based techniques are used to make recommendations. This helps the recommender engine get the best of both worlds. Using the methods separately leads to limitations such as the cold start problem and a lack of diversity in suggestions. With a hybrid approach, these limitations can be mitigated and more accurate recommendations become possible.

Content-based recommender engines can operate using two methods. One employs a classification model, while the other uses the vector space method. The classification approach uses machine learning models like decision trees, whereas the vector space method uses the distance between the user and item vectors to make suggestions.
