r/MMA Text Analysis: July 26
Introduction:
Scraping r/MMA through Reddit's API is a good way to gauge the sentiment around, and the attention paid to, particular fighters, events, or ideas. Using the API is relatively simple and requires only a few lines of code. I found a great video tutorial at the following link.
I scraped the ~1000 most recent r/MMA post titles; the most common keywords and phrases are listed below. I also used a basic sentiment classifier to find the most positively and negatively worded titles.
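As a rough illustration of the scraping step, here is a minimal sketch that pages through a subreddit's public "new" listing using only the Python standard library. This is not necessarily the approach from the tutorial: the function names, the User-Agent string, and the choice of the unauthenticated JSON listing are all assumptions made for this example. Reddit may rate-limit or reject unauthenticated requests, so an authenticated client may be needed in practice.

```python
import json
import urllib.parse
import urllib.request

LISTING_URL = "https://www.reddit.com/r/{subreddit}/new.json"


def fetch_page(subreddit, after=None, limit=100):
    """Fetch one page of the newest posts as parsed JSON (needs network access)."""
    params = {"limit": limit}
    if after:
        params["after"] = after
    url = LISTING_URL.format(subreddit=subreddit) + "?" + urllib.parse.urlencode(params)
    # Placeholder User-Agent (an assumption); use a descriptive one of your own.
    request = urllib.request.Request(url, headers={"User-Agent": "mma-title-analysis/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def collect_titles(subreddit, max_titles=1000, fetch=fetch_page):
    """Page through the listing until max_titles are collected or it runs out."""
    titles, after = [], None
    while len(titles) < max_titles:
        page = fetch(subreddit, after=after)
        children = page["data"]["children"]
        titles.extend(child["data"]["title"] for child in children)
        # "after" is the cursor for the next page; None means there are no more pages.
        after = page["data"]["after"]
        if not children or after is None:
            break
    return titles[:max_titles]
```

Calling collect_titles("MMA") would then return a list of title strings. The fetch parameter exists only so the paging logic can be exercised with a stand-in function instead of a live request.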
Findings:
The most common keywords (excluding non-meaningful words like “UFC,” “vs,” etc.):
spoiler (96 times)
McGregor (90 times)
264 (78 times)
Poirier (74 times)
Bellator (68 times)
Islam (39 times)
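A keyword count like the one above can be sketched in a few lines. The stop list here is hypothetical: the post only names "UFC" and "vs" as excluded words, so the remaining entries are assumptions, as are the function names and the simple alphanumeric tokenizer.

```python
import re
from collections import Counter

# Hypothetical stop list; only "ufc" and "vs" are named in the post.
STOPWORDS = {
    "ufc", "vs", "the", "a", "an", "of", "to", "in", "for", "and",
    "is", "it", "that", "from", "at", "on", "with",
}


def tokenize(title):
    """Lowercase a title and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())


def top_keywords(titles, n=6):
    """Count non-stopword tokens across all titles and return the n most common."""
    counts = Counter(
        token
        for title in titles
        for token in tokenize(title)
        if token not in STOPWORDS
    )
    return counts.most_common(n)
```

Tokens come back lowercased, so "McGregor" is counted as "mcgregor". A larger stop list, such as one shipped with an NLP library, would likely give cleaner results on real data.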
The most common phrases (or n-grams) are:
UFC 264 (71 times)
Conor McGregor (55 times)
Dustin Poirier (49 times)
Main Event (41 times)
Discussion Thread (36 times)
Islam Makhachev (26 times)
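The phrase counts can be produced the same way by sliding a window over each title's tokens. The sketch below assumes two-word phrases (bigrams), which matches the examples listed; the function names and tokenizer are illustrative, not the code used for the post.

```python
import re
from collections import Counter


def ngrams(tokens, n=2):
    """Return consecutive n-token windows as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_phrases(titles, n=2, k=6):
    """Count n-grams within each title (never across titles) and return the k most common."""
    counts = Counter()
    for title in titles:
        tokens = re.findall(r"[a-z0-9]+", title.lower())
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)
```

Counting per title keeps a phrase from spanning the end of one title and the start of the next. No stop words are removed here, since a phrase like "UFC 264" depends on a token that a keyword stop list would drop.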
Using a sentiment classifier that rates each post title by how positive or negative it is (e.g., very positive, mostly neutral, etc.), I found that the following titles are classified most negatively:
“Is It Bad For The UFC That Conor McGregor Lost To Dustin Poirier?”
“Brutal shot from UFC 264 prelim fight”
“Dangerous Conor is desperate McGregor coach Owen Roddy at UFC 264”
The titles classified most positively are:
“Love him or hate him Rise of Sean Strickland”
“Great interview with Dan Hooker prior to his fight with Al Iaquinta back in 2019”
“A great sincere interview from Kamaru Usman”
“Best of the Month on UFC FIGHT PASS The Future is Now”
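The post does not say which classifier was used, so the following is only a toy stand-in showing the general idea of scoring titles and bucketing them by magnitude. The lexicon, its weights, and the label thresholds are all hypothetical and far too small for real use.

```python
import re

# Toy lexicon (an assumption for illustration); weights are arbitrary.
LEXICON = {
    "great": 2, "best": 2, "love": 2, "sincere": 1,
    "bad": -2, "brutal": -2, "dangerous": -2, "desperate": -2,
    "lost": -1, "hate": -2,
}


def sentiment_score(title):
    """Sum the lexicon weights of the words in a title; 0 means neutral or unknown."""
    tokens = re.findall(r"[a-z]+", title.lower())
    return sum(LEXICON.get(token, 0) for token in tokens)


def label(score):
    """Map a raw score to a coarse magnitude label."""
    if score >= 3:
        return "very positive"
    if score > 0:
        return "positive"
    if score == 0:
        return "mostly neutral"
    if score > -3:
        return "negative"
    return "very negative"
```

Sorting with sorted(titles, key=sentiment_score) puts the most negative titles first and the most positive last. A word-sum scorer like this has obvious blind spots: in “Love him or hate him Rise of Sean Strickland” the two opposing words cancel out, so it will not necessarily agree with a trained classifier.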
Thoughts and Further Work:
LDA topic modeling, a more sophisticated sentiment classifier, and data gathered from other subreddits would give a better summary of recent online discourse about MMA.