Parth's Portfolio

Methodology

1. Fetching Tweets

The Twitter API was used to collect the most recent vaccination-related tweets, posted from 8 to 15 August 2021, using relevant hashtags. The CoVaxxy web dashboard developed by Verna (2021) was used to loosely separate pro- and anti-vaccination tweets. The hashtag #getvaccinated was used to fetch tweets leaning pro-vaccination, while #vaccineinjury, #vaccinesideeffects, #vaccineskill, #nomandatoryvaccine, #covidvaccinescam, #leaveourkidsalone, #notovaccinepassports, and #mybodymychoice were used to fetch tweets leaning anti-vaccination. Around 2,500 tweets were collected for each side.
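The hashtag split above maps naturally onto one search query per side. The sketch below assumes the Twitter API v2 recent-search endpoint, which only reaches back about seven days and so fits the one-week window. The retweet and language filters, the page size, and the helper names (`build_query`, `build_params`, `fetch_page`) are assumptions for illustration, not the original collection script.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone

# Twitter API v2 recent-search endpoint (assumed; covers roughly the last 7 days).
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"

PRO_HASHTAGS = ["getvaccinated"]
ANTI_HASHTAGS = [
    "vaccineinjury", "vaccinesideeffects", "vaccineskill", "nomandatoryvaccine",
    "covidvaccinescam", "leaveourkidsalone", "notovaccinepassports", "mybodymychoice",
]


def build_query(hashtags, exclude_retweets=True, lang="en"):
    """OR-join the hashtags; the retweet and language filters are assumptions."""
    query = "(" + " OR ".join("#" + tag for tag in hashtags) + ")"
    if exclude_retweets:
        query += " -is:retweet"
    if lang:
        query += " lang:" + lang
    return query


def build_params(hashtags, start, end, max_results=100):
    """Query parameters for one page of results (the API allows 10-100 per page)."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # RFC 3339 timestamps in UTC
    return {
        "query": build_query(hashtags),
        "start_time": start.astimezone(timezone.utc).strftime(fmt),
        "end_time": end.astimezone(timezone.utc).strftime(fmt),
        "max_results": max_results,
    }


def fetch_page(params, bearer_token, next_token=None):
    """Request one page; pass the returned meta['next_token'] to get the next one."""
    if next_token:
        params = {**params, "next_token": next_token}
    url = SEARCH_URL + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(url, headers={"Authorization": "Bearer " + bearer_token})
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `fetch_page(build_params(ANTI_HASHTAGS, start, end), token)` repeatedly, feeding back each `next_token`, would page through one side's results until no token is returned.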

2. Cleaning and Preparing Tweets for Clustering

The raw tweets contained URLs, user mentions, hashtag symbols (#), emojis, special characters, and mixed-case letters. These were removed and the text was lowercased to reduce complexity, turning each tweet into tidy text. To reduce complexity further, short words and stop words were also removed. Finally, the text was split into individual words (a bag of words), which were then standardized by lemmatization and stemming.
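The cleaning steps above can be sketched with regular expressions alone. This is a minimal illustration, assuming a three-character minimum word length and a deliberately tiny stop-word list; a real run would use a full list (for example NLTK's or gensim's). Stemming is left as an optional callable so any stemmer can be plugged in.

```python
import re

# Tiny illustrative stop-word list (assumption); real pipelines use a full list.
STOP_WORDS = {
    "the", "and", "for", "are", "was", "you", "your", "this",
    "that", "with", "have", "not", "but", "its", "our", "they",
}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # drops '#', digits, punctuation, emojis


def clean_tweet(text, min_len=3, stem=None):
    """Return the cleaned bag of words for one tweet."""
    text = URL_RE.sub(" ", text)        # remove links
    text = MENTION_RE.sub(" ", text)    # remove @user mentions
    text = text.lower()                 # normalize mixed case
    text = NON_LETTER_RE.sub(" ", text) # keep only a-z (hashtag words survive)
    tokens = [t for t in text.split() if len(t) >= min_len and t not in STOP_WORDS]
    if stem is not None:
        tokens = [stem(t) for t in tokens]
    return tokens
```

Passing `stem=nltk.stem.PorterStemmer().stem` would yield stems such as "vaccin" and "peopl", the form the words take in the topic tables that follow.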

3. Clustering for Sentiment Analysis

Two topic-modeling algorithms were used to cluster the tweets by sentiment: Non-negative Matrix Factorization (NMF) and Latent Dirichlet Allocation (LDA), implemented with the sklearn and gensim packages for Python. To apply them, the tweet tokens were first vectorized: raw term-frequency (TF) counts were fed to LDA, while term frequency-inverse document frequency (TF-IDF) weights were fed to NMF.

NMF Clustering

Each topic was labelled as pro-vaccination, anti-vaccination, or neutral based on its top words.

Top Words (stemmed)	Sentiment
vaccin covid covidvaccin getvaccin pfizer fulli vaccineswork get effect peopl	Neutral
mybodymychoic freedom choic bodi right want mandat say abort peopl	Anti Vaccination
getvaccin wearamask mask covid peopl children kid amp wear maskup	Pro Vaccination
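The table labels topics, not individual tweets. One common way to carry the labels over to tweets, assumed here rather than taken from the original analysis, is to give each tweet the sentiment of its highest-weighted topic in the document-topic matrix (for example the output of `nmf.transform(tfidf)` in scikit-learn). The label order below assumes topic 0 is the first table row, topic 1 the second, and so on.

```python
from collections import Counter

# Assumed topic order: same as the rows of the NMF table.
TOPIC_LABELS = ["Neutral", "Anti Vaccination", "Pro Vaccination"]


def dominant_topic(weights):
    """Index of the largest topic weight for one tweet (first index on ties)."""
    return max(range(len(weights)), key=weights.__getitem__)


def sentiment_counts(doc_topic, labels=TOPIC_LABELS):
    """Label every tweet with its dominant topic's sentiment and tally them."""
    return Counter(labels[dominant_topic(row)] for row in doc_topic)
```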

LDA Clustering

Each topic was labelled as pro-vaccination, anti-vaccination, or neutral based on its top words.

Top Words (stemmed)	Sentiment
vaccin getvaccin covid amp peopl day year today like wearamask	Neutral
vaccin mybodymychoic covid peopl freedom right choic novaccin getvaccin mandat	Anti Vaccination
getvaccin covid wearamask vaccin mask school kid children peopl maskup	Pro Vaccination
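One quick way to see how consistently the two models describe each sentiment is to compare their top words. This small sketch uses the words exactly as listed in the NMF and LDA tables, with Jaccard overlap (shared words divided by all distinct words) as an assumed similarity measure; the helper names are illustrative.

```python
# Top ten stemmed words per sentiment, copied from the NMF and LDA tables.
NMF_TOPICS = {
    "Neutral": "vaccin covid covidvaccin getvaccin pfizer fulli vaccineswork get effect peopl",
    "Anti Vaccination": "mybodymychoic freedom choic bodi right want mandat say abort peopl",
    "Pro Vaccination": "getvaccin wearamask mask covid peopl children kid amp wear maskup",
}
LDA_TOPICS = {
    "Neutral": "vaccin getvaccin covid amp peopl day year today like wearamask",
    "Anti Vaccination": "vaccin mybodymychoic covid peopl freedom right choic novaccin getvaccin mandat",
    "Pro Vaccination": "getvaccin covid wearamask vaccin mask school kid children peopl maskup",
}


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two word sets."""
    return len(a & b) / len(a | b)


def compare(nmf_topics, lda_topics):
    """Shared words and rounded Jaccard overlap for each sentiment label."""
    result = {}
    for label in nmf_topics:
        a, b = set(nmf_topics[label].split()), set(lda_topics[label].split())
        result[label] = (sorted(a & b), round(jaccard(a, b), 2))
    return result
```

With these lists, the Pro Vaccination topics share 8 of their 10 words, the Anti Vaccination topics 6, and the Neutral topics 4.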

4. Bot Assessment

I used the BotOrNot package developed by Mkearney to get a bot probability score for every user who retweeted a given tweet. This was done for the 500 most popular tweets, with up to 100 retweeting users per tweet considered when calculating the bot score. The result was, for each tweet, the ratio of bots to humans among the users who retweeted it.
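BotOrNot is an R package, so this Python sketch assumes its per-user bot probabilities have already been exported and grouped by tweet. The 0.5 cut-off for calling a user a bot is an assumption (the threshold is not stated above), and the input layout and function names are hypothetical; the cap of 100 retweeters per tweet follows the text.

```python
MAX_RETWEETERS = 100  # at most 100 retweeters scored per tweet
BOT_THRESHOLD = 0.5   # assumed cut-off; the original threshold is not stated


def bot_human_split(scores, threshold=BOT_THRESHOLD, cap=MAX_RETWEETERS):
    """Count likely bots and humans among (at most `cap`) retweeters' bot probabilities."""
    scored = scores[:cap]
    bots = sum(1 for p in scored if p >= threshold)
    return bots, len(scored) - bots


def bot_ratios(retweeter_scores, threshold=BOT_THRESHOLD):
    """Bot-to-human ratio per tweet id; tweets with no scored retweeters are skipped."""
    ratios = {}
    for tweet_id, scores in retweeter_scores.items():
        bots, humans = bot_human_split(scores, threshold)
        if humans:
            ratios[tweet_id] = bots / humans
        elif bots:
            ratios[tweet_id] = float("inf")  # every scored retweeter looks like a bot
    return ratios
```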

Sentiment Analysis of Vaccination Related Tweets and Relationship to Social Bots