The amount of data on social platforms grows every day.
How can we use this data to discover what is happening on Facebook or Twitter every day, or even every hour?
We can apply topic modeling to this problem:
– INPUT: A dataset of Facebook posts from the most popular fanpages in Vietnam at a specific time.
– OUTPUT: The hottest topics on Vietnamese Facebook at that time, each with its keywords.
We use Latent Dirichlet Allocation (LDA) as the topic modeling technique.
LDA is a probabilistic model that discovers the topics in a corpus (a set of documents): it represents each document as a mixture of topics and each topic as a distribution over words.
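To make the idea concrete, here is a minimal sketch of LDA trained with collapsed Gibbs sampling, written in plain Python. It is not the implementation used for the results in this post. The function names (`lda_gibbs`, `top_keywords`), the hyperparameters `alpha` and `beta`, and the toy English posts are all assumptions for illustration. A real pipeline would also need tokenization and stop-word removal for Vietnamese text.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not optimized).

    docs is a list of token lists. Returns (doc_topic, topic_word, vocab).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables updated during sampling.
    doc_topic_counts = [[0] * num_topics for _ in docs]
    topic_word_counts = [[0] * vocab_size for _ in range(num_topics)]
    topic_totals = [0] * num_topics

    # Start by assigning every token to a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic_counts[d][z] += 1
            topic_word_counts[z][word_id[w]] += 1
            topic_totals[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic_counts[d][z] -= 1
                topic_word_counts[z][wid] -= 1
                topic_totals[z] -= 1
                # Weight of each topic given all other assignments.
                weights = [
                    (doc_topic_counts[d][k] + alpha)
                    * (topic_word_counts[k][wid] + beta)
                    / (topic_totals[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic_counts[d][z] += 1
                topic_word_counts[z][wid] += 1
                topic_totals[z] += 1

    # Smoothed estimates of the two distributions LDA describes.
    doc_topic = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic_counts)
    ]
    topic_word = [
        [(c + beta) / (topic_totals[k] + vocab_size * beta) for c in row]
        for k, row in enumerate(topic_word_counts)
    ]
    return doc_topic, topic_word, vocab


def top_keywords(topic_word, vocab, n=3):
    """Return the n most probable words of each topic."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
        for row in topic_word
    ]


# Made-up toy posts, already tokenized.
posts = [
    "football match goal team".split(),
    "team win football league".split(),
    "recipe noodle soup cook".split(),
    "cook soup recipe kitchen".split(),
]
doc_topic, topic_word, vocab = lda_gibbs(posts, num_topics=2)
for k, words in enumerate(top_keywords(topic_word, vocab)):
    print(f"topic {k}: {words}")
```

Each row of `doc_topic` is the topic mixture of one post, and each row of `topic_word` is one topic's distribution over the vocabulary. The keywords reported for a topic are simply its most probable words.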
In October 2019, we found that these were the hottest topics among Vietnamese Facebook users:
We can also represent the fanpages as vectors and compare them to find similarities:
Here is a comparison between pages based on their content in October 2019. The bluer a square is, the more similar the two fanpages are:
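A pairwise comparison like this can be sketched in a few lines. The post does not say how the pages were vectorized or which similarity measure was used, so this example assumes each page is represented by a topic-proportion vector and that pages are compared with cosine similarity. The page names and numbers are made up.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # Treat an all-zero vector as similar to nothing.
    return dot / (norm_u * norm_v)


def similarity_matrix(page_vectors):
    """Return page names and the matrix of pairwise similarities."""
    names = list(page_vectors)
    matrix = [
        [cosine_similarity(page_vectors[a], page_vectors[b]) for b in names]
        for a in names
    ]
    return names, matrix


# Hypothetical topic-proportion vectors, one per fanpage.
page_vectors = {
    "page_sports": [0.8, 0.1, 0.1],
    "page_football": [0.7, 0.2, 0.1],
    "page_cooking": [0.1, 0.1, 0.8],
}
names, matrix = similarity_matrix(page_vectors)
for name, row in zip(names, matrix):
    print(name, [round(x, 2) for x in row])
```

Values close to 1 correspond to the darkest squares in a heatmap of this matrix. In this toy data the two sports pages score much higher with each other than either does with the cooking page.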
With a proper crawler and an improved topic modeling technique, we can track the hot topics on Facebook or any other social network every hour, or even every second.