Improve Content Discovery in Products through AI

This post was originally posted on Medium.


A vast majority of us in today’s digitally native world spend a considerable part of our day online. A huge portion of that time is vied for by multiple social media platforms.

Loosely bundled as media platforms (social or not), they each propose a different value:

- Twitter provides a way to ingest quick information and news
- LinkedIn promises a valuable network and professional content
- Instagram helps you discover great photos and be inspired
- TikTok and Facebook provide entertainment when you need to switch your brain off
- Medium is committed to exposing you to useful articles

The propositions these products make can be loosely mapped across a spectrum, with education and information at one end and entertainment at the other. While squandering our time on entertainment, we expect no more than a few smiles. However, from time spent on products at the other end of the spectrum (Twitter, LinkedIn, Medium), we expect a certain outcome: learning.

The more a product serves user needs, the more irreplaceable it becomes, and the higher the engagement it drives. To this end, each of these products tries to drive up its “Learning per user minute”.

Hence, it becomes crucial for products to recommend content that users like and gain value from.

Most products today use a combination of the following features of content (e.g., a post or an article) to judge its relevance and expose it to the right users:

- The recency of the content and whether it includes media/links
- The profile of the author of the content and your relation to the author (e.g., strength of connection)
- Your interaction with similar content (e.g., likes, shares)
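As a rough sketch of how such signals might be combined, the snippet below scores a post with a weighted sum. The field names, the weights, and the 24-hour recency half-life are all illustrative assumptions; they are not values any platform is known to use.

```python
from typing import NamedTuple


class Post(NamedTuple):
    age_hours: float            # time since the post was published
    has_media: bool             # whether it carries an image, video, or link
    connection_strength: float  # 0..1, how close the viewer is to the author
    similar_interactions: int   # viewer's likes/shares on similar content


def relevance(post: Post, half_life_hours: float = 24.0) -> float:
    """Weighted sum of the signals; every weight here is hypothetical."""
    recency = 0.5 ** (post.age_hours / half_life_hours)  # halves every half-life
    media = 1.0 if post.has_media else 0.0
    # Saturating transform so a handful of interactions does not dominate.
    affinity = post.similar_interactions / (post.similar_interactions + 5)
    return (0.4 * recency + 0.1 * media
            + 0.3 * post.connection_strength + 0.2 * affinity)


fresh = Post(age_hours=2, has_media=True, connection_strength=0.8, similar_interactions=10)
stale = Post(age_hours=72, has_media=False, connection_strength=0.8, similar_interactions=10)

# Rank candidate posts for a feed, most relevant first.
ranked = sorted([stale, fresh], key=relevance, reverse=True)
```

Note that nothing in this score looks at what the post actually says, which is the gap the rest of this article is about.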

The entire mechanism of deciding relevance is heavily based upon the premise that you will like what other (similar) users on the platform have liked. The unfortunate consequence of the strategy, however, is:

- You never get a chance to view great content because few people, or none, in your network appreciated it
- Your choice to decide for yourself how good the content is gets taken away

The simple reason your choice does not get respected is that a product contains too much content. To the product, each piece is little more than a bag of words. The topic of the content is barely understood, and quality is judged using the number of shares as a proxy.

Better and more relevant content could be exposed to you if the product could additionally understand (A) what the content is about, and (B) how its quality ranks against other pieces in the same category.

The quality of content is, however, not a standalone characteristic. It is judged based on the topic of content, the depth into that topic, and the kind of arguments that have been made related to that content.


But, don’t the given titles indicate topics? Why is cataloging the theme a difficult thing to do?

The short answer to the first question is no.

Content headlines are often misnomers. At times they are intentionally misleading, written to improve SEO and drive higher engagement. And then there are those crafty titles that mean and indicate nothing! So, how do we even know the topic of an article?

The first thought that comes to mind is simple: use something like a word cloud. If the article contains the word “Product” many times, it is likely that the article is about Product. The frequency of a term in a document, though very useful in a lot of problems, has limited utility here. A single word tells us very little. We need more granularity.
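To make the limitation concrete, here is a minimal term-frequency sketch using only the Python standard library. The stopword list and the sample text are made up for illustration.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "for", "it", "on"}


def top_terms(text: str, k: int = 3) -> list:
    """Return the k most frequent non-stopword terms with their counts."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(k)


article = (
    "A product roadmap aligns the product team. "
    "The product vision guides the roadmap, and the team owns the product."
)
print(top_terms(article))
```

The most frequent term comes out as "product", which says the piece is about Product but nothing about which aspect of it.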

Okay, so does a combination of 2, 3, or 4 words occurring together work? This would be more accurate, but it still does not solve our problem. An n-gram model, which considers sequences of adjacent words, can be used to understand the content better. But we need something more. We need an in-depth topic.
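A small sketch of counting word n-grams, again with the standard library only. The sample sentence is invented for illustration.

```python
import re
from collections import Counter


def ngrams(text: str, n: int) -> Counter:
    """Count every run of n consecutive words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


text = "time management for product managers makes product managers more effective"
bigrams = ngrams(text, 2)
print(bigrams.most_common(1))
```

The pair ("product", "managers") surfaces as the most common bigram. That is more specific than "product" alone, but it still does not tell us that the piece is about time management for that audience.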

To give you an example, labeling content in the “Product Management” category is good. But there are so many subtopics and so many different angles. The question that should be answered is: what aspect of “Product Management” does the content talk about?

Content under “Product Management” might be about:

  • How is AI Product Management different?

  • Time management for Product Managers

  • Vision is important for a Product Manager

  • How to execute a Product Strategy

The list of topics is almost never-ending.

Two approaches, or a mix of both, could be used to address this:

  • A way to judge the in-depth topic

  • A text summary of a piece, which is a good indicator of its intention

Let’s talk Topics

Today, a variety of AI approaches address the automatic classification of a database of chats and messages into topics. The approach works in two steps:
- Unsupervised topic modeling to identify topics within a corpus of textual data
- A supervised classification algorithm to predict the topic of new content

This approach has been described for detecting topics in a database of chats. A similar approach could be used to judge the “Topic” of content. Various models, including Latent Dirichlet Allocation (LDA) and Explicit Semantic Analysis, could be further developed to support the use case.

Credits: Christine Doig
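The second, supervised step can be sketched in a few lines. This is not LDA: it is a toy nearest-centroid classifier over word counts, and the hand-written labeled examples below are hypothetical stand-ins for the topic groups an unsupervised model would produce.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical labeled examples; in practice the grouping would come from
# the unsupervised step (for instance, documents clustered by a topic model).
labeled = {
    "time management": [
        "block your calendar and protect focus time every week",
        "prioritize meetings and say no to save time",
    ],
    "product strategy": [
        "turn the vision into a roadmap and execute the strategy",
        "align the roadmap with strategy and company goals",
    ],
}

# One centroid per topic: the summed term counts of its examples.
centroids = {
    topic: sum((vectorize(doc) for doc in docs), Counter())
    for topic, docs in labeled.items()
}


def predict_topic(text: str) -> str:
    """Assign the topic whose centroid is most similar to the text."""
    vec = vectorize(text)
    return max(centroids, key=lambda topic: cosine(vec, centroids[topic]))


print(predict_topic("how to execute a product strategy from the roadmap"))
```

A production system would replace the raw counts with a proper topic model or learned embeddings, but the shape stays the same: discover topics from the corpus first, then classify each new piece against them.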


A content summary would be great too! If a 10k-word article could be condensed into a single line or a few lines, that would be fairly indicative of what the author wants to convey. The topic of a piece could also be derived from, or considered similar to, its summary. Text summarization is getting better every day.
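As a simple illustration, the sketch below does extractive summarization: it scores each sentence by the average corpus frequency of its words and keeps the top ones. This is a classic baseline rather than a state-of-the-art method, and the stopword list and sample text are assumptions made for the example.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "the", "is", "was", "of", "to", "in", "for"}


def tokens(text: str) -> list:
    """Lowercased words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text: str, max_sentences: int = 1) -> str:
    """Keep the sentences whose words are most frequent across the whole text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokens(text))

    def score(sentence: str) -> float:
        words = tokens(sentence)
        # Average rather than sum, so long sentences are not favored by length alone.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


text = (
    "Product managers need a clear vision. "
    "A clear vision helps product managers align the team. "
    "Lunch was late today."
)
print(summarize(text))
```

The off-topic sentence about lunch scores lowest because none of its words recur elsewhere, which is the intuition behind using a summary as a hint about the topic.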


How do we judge the quality of that content with respect to all the others available on the same topic?

The quality of a piece can be judged on multiple criteria, including the number of arguments made and the strength of each, the diversity of perspectives considered, the language, and many more. A few of these attributes are judged subjectively, while others stay the same across users. Most products do not judge content objectively right now, but they should. A platform should always have an opinion of the content it carries.

There have been interesting breakthroughs in AI that could be leveraged to judge the quality of content. Some of them are:
- Understanding the relevance of content to the associated topic of an article
- Discovering the structure, or the absence of it, in a document
- Machine comprehension, which will help in understanding the topic in depth and provide some key points a piece of content means to answer

Work is under way to “develop metrics for Evaluating the quality of narrative thread extracted from news stories”. A similar approach could be developed to judge the quality of content.


Now that the Product knows the content, can we give users more freedom to select the best content?

All of us Product Managers strike a difficult and delicate balance in how much to ask of users. Too many choices or inputs could cost us our users. This is exactly why the 5-star rating and NPS mechanisms of feedback are popular and thriving.

But another conscious choice we need to make is: at what point do we say that we are making too many calls on behalf of users? It means demarcating between giving users what they like and leading users to what we want them to like (which is also what is available on the platform). There is no right or wrong answer here. There are just opinions. But it would be great if the calls that we do make for users work in their favor and align with what they want.

The approach outlined above would help users discover better-quality content and stay more engaged with a product.

If you like this article and want more, follow me on Twitter and connect with me on LinkedIn here.
