Building Recommendation Engines for Platforms

Pre-login and post-login recommenders

Posted by Katherine Li on June 30, 2018

Different Types of Intelligent Recommendation Engines

You might have noticed, or even purchased, items recommended by Amazon. There are actually several types of recommender systems on the site, both before and after you log in.

Pre-Login: Similar Items Recommendation

When you search for a MacBook with Touch Bar on Amazon, it lists similar laptops to help you make a purchase decision. This recommender doesn't need your purchase history.

Pre-Login: Customers Who Bought This Also Bought

Based on other customers' purchase history, this system recommends related products, accessories in this case. A customer who just bought a MacBook with Touch Bar probably won't purchase another Mac, but might buy an adapter instead.

Post-Login: Personalized Preference Recommender

After you log in, Amazon can leverage your purchase history to recommend items for you, which is more personalized.

Inspired by Amazon, I am going to build three types of recommenders for a course platform.

Building Out an AI Recommendation System

Pre-Login: Similar Items Recommendation

Content-Based Recommender System

To recommend courses similar to the one you searched for, I built a content-based recommender system that compares courses based on their features (listed below). For example, if two courses share a similar summary or fall under the same category (say, Finance), they should have a higher similarity score and be cross-recommended. The similarity matrix for each course feature can be generated with a different data science algorithm. I assigned an importance weight to each matrix/feature after discussing its business importance with stakeholders, and then calculated the weighted-sum matrix (see the sketch after the feature list below).

[Course Summary Similarity] Text Vectorization (TF-IDF) + Cosine Similarity

Here is a brief introduction to how the similarity between two course summaries is calculated. The first step is natural language preprocessing: removing punctuation, stopwords, and extra whitespace, stemming, and so on.
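A minimal sketch of that preprocessing step, using NLTK; the sample summary is made up, and a production pipeline would likely include more steps:

```python
# Minimal preprocessing sketch: lowercase, strip punctuation,
# drop stopwords, and stem. The example text is invented.
import string
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("stopwords", quiet=True)

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

def preprocess(text):
    # Lowercase and remove punctuation.
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    # Tokenize on whitespace, drop stopwords, stem what remains.
    return [stemmer.stem(t) for t in text.split() if t not in stop_words]

print(preprocess("An introduction to corporate finance, accounting, and valuation."))
# ['introduct', 'corpor', 'financ', 'account', 'valuat']
```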

The second step is to map each summary to a vector, and I tried two methods. Starting with word embeddings, I leveraged a pre-trained Doc2Vec model (developed by researchers at Google), which is trained as a shallow neural network. But I found that another method, TF-IDF (Term Frequency-Inverse Document Frequency), performed better for summary comparison. Basically, two summaries are considered more similar if they share similar words, especially words that are rare across all summaries, unlike common ones such as 'a' and 'the'.
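Here is a minimal sketch of the TF-IDF + cosine similarity step with scikit-learn; the course summaries are invented for illustration and the real features and parameters differ:

```python
# Minimal TF-IDF + cosine similarity sketch; summaries are made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

summaries = [
    "Learn corporate finance fundamentals: valuation, budgeting, and capital markets.",
    "An introduction to valuation and capital budgeting for finance professionals.",
    "Hands-on Python programming for absolute beginners.",
]

# TfidfVectorizer handles tokenization and stopword removal; rare terms
# get higher weight than common ones like 'a' and 'the'.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(summaries)

# similarity[i][j] is the cosine similarity between summaries i and j.
similarity = cosine_similarity(tfidf_matrix)
print(similarity.round(2))
# The two finance summaries score much higher with each other
# than either does with the Python course.
```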

[Key Word Similarity] Word Embedding (GloVe) + Multi-Label Similarity

[Other Feature Similarity] Brand, category, media format, etc.
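To illustrate the weighted-sum step mentioned above, here is a small sketch; the matrices, feature names, and weights below are made-up placeholders, not the real business weights:

```python
# Sketch of combining per-feature similarity matrices with business-driven
# weights. All values here are invented for a two-course example.
import numpy as np

# Each matrix is (n_courses x n_courses), one per feature.
summary_sim  = np.array([[1.0, 0.8], [0.8, 1.0]])  # e.g., TF-IDF + cosine
keyword_sim  = np.array([[1.0, 0.5], [0.5, 1.0]])  # e.g., GloVe-based
category_sim = np.array([[1.0, 1.0], [1.0, 1.0]])  # same category -> 1

# Hypothetical importance weights agreed on with the business; sum to 1.
weights = {"summary": 0.5, "keyword": 0.3, "category": 0.2}

combined = (weights["summary"] * summary_sim
            + weights["keyword"] * keyword_sim
            + weights["category"] * category_sim)
print(combined)
# For a given course, the most similar course is the off-diagonal
# argmax of its row in the combined matrix.
```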

Due to data confidentiality, I won't post the code and dataset for this recommender engine, but that's a brief overview of my similarity algorithm.

Pre-Login: Customers Who Bought This Also Bought

Collaborative Filtering Recommender

The common way to construct a 'Customers who bought this item also bought' algorithm is to build a co-occurrence matrix. How does a co-occurrence matrix work? Let's take an example. Suppose we have user history data indicating that Alice has taken courses 1, 2, and 3; Charles has taken courses 1 and 3; and Bob has taken courses 3 and 4.

User      Courses taken
Alice     1, 2, 3
Charles   1, 3
Bob       3, 4

To compute the co-occurrence matrix from the table above, we count, for each course pair (say, course 1 and course 3), how many users have taken both courses. For this pair the value is 2 (Alice and Charles), and we fill that value into the matrix.

We then apply some normalization (I will skip those steps) to get the normalized matrix, and we can make recommendations based on it. For example, if a user searches for course 3, then, excluding course 3 itself, the highest value in that course's row is 0.67, which belongs to course 1. Therefore, course 1 is recommended first. Basically, the logic is 'Customers who bought this item also bought'.
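Here is a minimal, self-contained sketch of the whole computation on the toy history above. The normalization shown (dividing each row by the number of users who took the row's course) is one simple choice, and it reproduces the 0.67 value:

```python
# Co-occurrence matrix sketch on the toy user history above.
import numpy as np

history = {"Alice": [1, 2, 3], "Charles": [1, 3], "Bob": [3, 4]}
courses = [1, 2, 3, 4]
idx = {c: i for i, c in enumerate(courses)}

# cooc[i][j] = number of users who took both course i and course j.
cooc = np.zeros((len(courses), len(courses)))
for taken in history.values():
    for a in taken:
        for b in taken:
            cooc[idx[a], idx[b]] += 1

# Normalize each row by the diagonal (users who took the row's course).
normalized = cooc / np.diag(cooc)[:, None]

# 'Customers who took course 3 also took': highest off-diagonal value in row 3.
row = normalized[idx[3]].copy()
row[idx[3]] = 0  # never recommend the course itself
print(courses[int(row.argmax())], row.max().round(2))  # course 1, 0.67
```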

I have posted partial code for this algorithm and generated a fake user history dataset to run it. Click here to view the code -> JupyterNotebook

Post-Login: Personalized Preference Recommender

Collaborative Filtering Recommender System

This model shares the same logic as the previous one, but it is based on the user's personal history. For example, say a user named Katherine has taken courses 2 and 3. Going back to the normalized co-occurrence matrix, I add up those two rows element-wise to get a score for every course. Courses the user has already taken are excluded from the results; among the remaining courses, course 1 has the highest summed score in this example, so it ranks first in the recommendations.
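Reusing the toy data from the co-occurrence sketch above (courses, idx, normalized), a minimal sketch of the personalized ranking could look like this; Katherine's history here is a made-up example:

```python
# Personalized ranking sketch; continues from the co-occurrence sketch above.
import numpy as np

taken = [2, 3]  # e.g., Katherine's course history (illustrative)

# Sum the normalized co-occurrence rows for every course the user has taken.
scores = normalized[[idx[c] for c in taken]].sum(axis=0)

# Never recommend courses the user has already taken.
for c in taken:
    scores[idx[c]] = -np.inf

ranked = [courses[i] for i in np.argsort(scores)[::-1] if np.isfinite(scores[i])]
print(ranked)  # [1, 4]: course 1 ranks first for this user
```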