The Science -and Art- of Matchmaking: AI vs. Collaborative Filtering
The importance of matchmaking is hard to overstate in today’s connected world, whether it means recommending a new product to buy, a new video to watch, a book to read, or even some complete stranger to fall for!
Traditionally, matchmaking has been done by Collaborative Filtering techniques such as Matrix Factorisation.
In online shopping applications, for instance, matrix factorisation maps both items and users into the same latent space, which represents the underlying interactions between users and items. The intuition behind matrix factorisation is that the latent features capture how users rate items. Given the latent representations of the users and the items, we can then predict how much users will like items they have not yet rated (see https://buildingrecommenders.wordpress.com/2015/11/18/overview-of-recommender-algorithms-part-2).
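To make the idea concrete, here is a minimal sketch of matrix factorisation trained with stochastic gradient descent. The toy ratings, the function names (`factorise`, `predict`) and the hyperparameters are all assumptions chosen for illustration; a production recommender would use a dedicated library and far more data.

```python
import random

def factorise(ratings, n_users, n_items, k=2, steps=5000, lr=0.01, reg=0.02, seed=0):
    """Learn k-dimensional latent vectors for users (P) and items (Q)
    from (user, item, rating) triples using stochastic gradient descent."""
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            # Prediction is the dot product of the user and item vectors
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step with L2 regularisation
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating of item i by user u."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

# Hypothetical toy data: (user, item, rating) for 3 users and 3 items
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 1), (2, 2, 5)]
P, Q = factorise(ratings, n_users=3, n_items=3)

# User 0 never rated item 2; the latent vectors still give an estimate
print(round(predict(P, Q, 0, 2), 2))
```

The last line is the interesting one: the model produces a score for a user and item pair that never appeared in the training data, which is exactly what a recommender needs.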
A notable limitation of Collaborative Filtering techniques is that they can’t generate accurate predictions for users who do not interact with products (e.g. by liking products or adding them to a wish list). This is often called the cold-start problem. The same is true for social media platforms when users do not connect with others: the matchmaking has nothing to learn from and cannot suggest connections of interest to the user.
The field of Natural Language Processing (NLP) has improved significantly in recent years, thanks to the massive amount of available data, coupled with increases in computing power and efficiency.
One challenge associated with NLP is data representation. One answer to this challenge is to represent words as vectors (word embeddings), with Word2Vec being a popular algorithm for doing so. Word2Vec, as the name suggests, converts words into vectors. What makes the algorithm useful is that words used in similar contexts are placed near each other in the vector space. The vectors can then be combined with other data and fed into an AI model, perhaps an Artificial Neural Network (ANN), to perform high-quality prediction or classification. As an example, YouTube has recently replaced its Matrix Factorisation based recommender system with an Artificial Neural Network to predict watch time and also to avoid a phenomenon known as “click-bait” (see https://research.google.com/pubs/pub45530.html).
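The following sketch illustrates what "near each other" means in practice. The three-dimensional vectors are hand-made toy values, not real Word2Vec output (real embeddings are learned from a large corpus and typically have hundreds of dimensions), and the helper names `cosine` and `embed_text` are hypothetical.

```python
import math

# Toy, hand-made vectors standing in for learned word embeddings
toy_vectors = {
    "film":   [0.9, 0.1, 0.0],
    "movie":  [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity: close to 1 for vectors pointing the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def embed_text(text, vectors, dim=3):
    """Represent a text as the average of the vectors of its known words."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]

# Similar words score higher than unrelated ones
print(cosine(toy_vectors["film"], toy_vectors["movie"]) >
      cosine(toy_vectors["film"], toy_vectors["banana"]))

# A text vector can be concatenated with other (already scaled) features,
# here a made-up numerical value, before being fed to a model
features = embed_text("film movie", toy_vectors) + [0.35]
print(len(features))
```

Averaging word vectors is only one simple way to turn a whole text into a single vector; it is used here because it keeps the example short.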
Simple Solution to a Complex Problem
Well, not that simple, but you get the point! I was asked to propose a solution to connect users in a social media network based on vague criteria. Simply put, the criteria were unclear because of a lack of quality training data. Most of the input data was free text (user profile, story, goal, etc.), which made it even harder to analyse.
We proposed a two-fold solution:
1. Convert text data into vectors and combine them with the other numerical and categorical input data
2. Train a k-means clustering model to group users with similar profiles
And that’s it. Other users within the same cluster are presented as potential matches, sorted by Euclidean distance from the user in question (closest first). The proposed solution may not break any world record, but given the data (which lacked target labels), a supervised algorithm was not an option.
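The two steps can be sketched roughly as follows. This is not the code used in the project: the user vectors are invented two-dimensional points standing in for the combined text and profile features, and the k-means implementation uses a deterministic farthest-first initialisation (an assumption made here to keep the example reproducible) instead of the random or k-means++ initialisation a library would normally use.

```python
import math

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with farthest-first initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def matches(user, points, labels):
    """Other users in the same cluster, closest first (Euclidean distance)."""
    same = [j for j in range(len(points))
            if j != user and labels[j] == labels[user]]
    return sorted(same, key=lambda j: math.dist(points[user], points[j]))

# Hypothetical user feature vectors forming two obvious groups
users = [[0.0, 0.1], [0.2, 0.0], [0.3, 0.4],
         [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
labels = kmeans(users, k=2)
print(matches(0, users, labels))  # indices of suggested matches for user 0
```

Because features on larger scales dominate the Euclidean distance, real profile data would normally be scaled before clustering.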