The models cover research questions 1, 2, and 5. The remaining research questions do not require modelling and can be answered with statistical analysis and visualization techniques.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LinearSegmentedColormap
import scipy.stats as stats
from sklearn.feature_extraction.text import CountVectorizer
from wordcloud import WordCloud
df=pd.read_csv("featured_reviews.csv", parse_dates=["timestamp_created"])
df.columns
Index(['game_name', 'review', 'voted_up', 'timestamp_created', 'author_num_games_owned', 'author_num_reviews', 'author_playtime_at_review', 'author_playtime_last_two_weeks', 'author_playtime_forever', 'review_length', 'difficulty_word_count', 'mentions_difficulty', 'open_world', 'competitive', 'puzzle', 'multiplayer', 'fantasy', 'rpg', 'platformer', 'simulation', 'third_person', 'first_person', 'base_building', 'turn_based', 'crafting', 'soulslike', 'action', 'roguelike', 'adventure', 'metroidvania', 'co_op', '2d', 'crpg', 'sandbox', 'deckbuilding', 'survival', 'strategy', 'shooter', 'experience_level_experienced', 'experience_level_intermediate', 'sentiment_score'], dtype='object')
Research Question 3
Which genres are most associated with mentions of difficulty in reviews?

genre_columns = ['open_world', 'competitive', 'puzzle',
                 'multiplayer', 'fantasy', 'rpg', 'platformer', 'simulation',
                 'third_person', 'first_person', 'base_building', 'turn_based',
                 'crafting', 'soulslike', 'action', 'roguelike', 'adventure',
                 'metroidvania', 'co_op', '2d', 'crpg', 'sandbox', 'deckbuilding',
                 'survival', 'strategy', 'shooter']
# Calculate the proportion of reviews mentioning difficulty for each genre
genre_difficulty_mentions = {
    genre: df[df[genre] == 1]['mentions_difficulty'].mean()
    for genre in genre_columns
}
genre_difficulty_mentions = sorted(genre_difficulty_mentions.items(), key=lambda x: x[1], reverse=True)
genres, proportions = zip(*genre_difficulty_mentions)
colors = plt.cm.magma(np.linspace(0.2, 0.8, len(proportions)))
plt.figure(figsize=(10, 8))
bars = plt.barh(list(genres), list(proportions), color=colors, alpha=0.9)
for bar in bars:
    plt.text(bar.get_width() + 0.01, bar.get_y() + bar.get_height()/2,
             f'{bar.get_width():.4f}', va='center', fontsize=9)
plt.title('Proportion of Reviews Mentioning Difficulty by Genre', fontsize=14, pad=15)
plt.xlabel('Proportion Mentioning Difficulty', fontsize=12)
plt.ylabel('Genre', fontsize=12)
plt.gca().invert_yaxis()
plt.grid(axis='x', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
For the "co_op" and "metroidvania" genres, more than 18% of reviews use at least one word related to difficulty. The "multiplayer" and "competitive" genres have the lowest proportions of reviews with difficulty-related words, at around 8% and 7% respectively.
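To check whether a gap like this between two genres is larger than chance, a two-by-two contingency test can be run on the mention counts. Below is a minimal sketch using `scipy.stats.chi2_contingency`; the counts are illustrative assumptions, not values from the dataset.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Illustrative counts (assumptions, not taken from the dataset):
# rows = two genres, columns = [mentions difficulty, does not mention]
table = np.array([[180, 820],   # a co_op-like genre
                  [ 80, 920]])  # a multiplayer-like genre

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.3g}")
```

With real data, the two rows would come from counting `mentions_difficulty` within each genre's subset of `df`.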
Research Question 4
How does the sentiment score distribution differ between genres?

# Calculate the average sentiment score for each genre
genre_sentiment = {
    genre: df[df[genre] == 1]['sentiment_score'].mean()
    for genre in genre_columns
}
# Sort genres by sentiment score
genre_sentiment = sorted(genre_sentiment.items(), key=lambda x: x[1], reverse=True)
# Extract genres and their corresponding average sentiment scores
genres, avg_sentiments = zip(*genre_sentiment)
plt.figure(figsize=(10, 8))
colors = plt.cm.viridis(np.linspace(0.2, 0.8, len(avg_sentiments)))
sizes = [50 + 200 * sentiment for sentiment in avg_sentiments]
plt.scatter(avg_sentiments, genres, s=sizes, c=colors, alpha=0.8, edgecolors='k')
for i, sentiment in enumerate(avg_sentiments):
    plt.text(sentiment + 0.01, i, f'{sentiment:.4f}', va='center', fontsize=9)
plt.title('Average Sentiment Score by Genre', fontsize=14, pad=15)
plt.xlabel('Average Sentiment Score', fontsize=12)
plt.ylabel('Genre', fontsize=12)
plt.grid(axis='x', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
The "competitive" and "multiplayer" genres have the lowest average sentiment scores for reviews, at 0.18 and 0.21 respectively; "crpg" and "fantasy" have the highest, at 0.47 and 0.46 respectively.
Research Question 6
Does the player's experience level (beginner vs. experienced) affect their sentiment towards game difficulty?

experience_data = df.melt(
    id_vars=['experience_level_experienced', 'experience_level_intermediate'],
    value_vars=['sentiment_score'],
    var_name='Metric',
    value_name='Sentiment Score'
)
# Derive a single experience label from the one-hot columns
experience_data['Experience Level'] = 'Beginner'
experience_data.loc[experience_data['experience_level_experienced'] == 1, 'Experience Level'] = 'Experienced'
experience_data.loc[experience_data['experience_level_intermediate'] == 1, 'Experience Level'] = 'Intermediate'
plt.figure(figsize=(8, 6))
boxplot = sns.boxplot(
    x='Experience Level',
    y='Sentiment Score',
    hue='Experience Level',
    data=experience_data,
    palette="Set3",
    legend=False
)
groups = experience_data.groupby('Experience Level')['Sentiment Score']
for i, experience_level in enumerate(groups.groups):
    # Annotate each box with its median value
    median = groups.get_group(experience_level).median()
    plt.text(
        i, median, f'{median:.2f}',
        horizontalalignment='center', color='black', weight='bold'
    )
plt.title('Distribution of Sentiment Scores by Experience Level', fontsize=14, pad=15)
plt.ylabel('Sentiment Score', fontsize=12)
plt.xlabel('Experience Level', fontsize=12)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
All experience levels have the same median sentiment score, with slightly different upper quartiles. We will perform a one-way ANOVA to determine whether there is a statistically significant difference in these scores.
from scipy.stats import f_oneway
beginner_scores = experience_data[experience_data['Experience Level'] == 'Beginner']['Sentiment Score']
experienced_scores = experience_data[experience_data['Experience Level'] == 'Experienced']['Sentiment Score']
intermediate_scores = experience_data[experience_data['Experience Level'] == 'Intermediate']['Sentiment Score']
# ANOVA test for all three groups
f_stat, p_value = f_oneway(beginner_scores, experienced_scores, intermediate_scores)
print(f"ANOVA Test Statistic: {f_stat:.4f}, P-value: {p_value}")
ANOVA Test Statistic: 84.6515, P-value: 2.0367702092444877e-37
The test suggests a statistically significant difference in scores between the experience levels. We can run Tukey's HSD test to analyse the pairwise differences.
from statsmodels.stats.multicomp import pairwise_tukeyhsd
# Prepare data for Tukey's test
tukey_data = experience_data[['Sentiment Score', 'Experience Level']]
# Perform Tukey's HSD test
tukey = pairwise_tukeyhsd(
endog=tukey_data['Sentiment Score'],
groups=tukey_data['Experience Level'],
alpha=0.05
)
print(tukey)
     Multiple Comparison of Means - Tukey HSD, FWER=0.05
==============================================================
  group1      group2    meandiff p-adj   lower   upper  reject
--------------------------------------------------------------
Beginner    Experienced   0.0704    0.0  0.0569  0.0839   True
Beginner    Intermediate  0.0651    0.0  0.0511  0.0792   True
Experienced Intermediate -0.0053 0.5772 -0.0176  0.0071  False
--------------------------------------------------------------
From this test we can see that there is a statistically significant difference in sentiment scores between beginner and experienced players, and between beginner and intermediate players, but no statistically significant difference between intermediate and experienced players.
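Statistical significance alone does not say how large these gaps are. A sketch of an effect-size calculation (Cohen's d) is shown below on synthetic scores chosen to mimic the ~0.07 mean difference reported by the Tukey test; the means and standard deviation are assumptions, not values from the dataset.

```python
import numpy as np

def cohens_d(a, b):
    """Standardised mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * np.var(a, ddof=1) + (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2)
    return (np.mean(b) - np.mean(a)) / np.sqrt(pooled_var)

rng = np.random.default_rng(0)
# Synthetic scores chosen to mimic a ~0.07 mean gap with sd ~0.30
beginner = rng.normal(0.30, 0.30, 5000)
experienced = rng.normal(0.37, 0.30, 5000)

d = cohens_d(beginner, experienced)
print(f"Cohen's d = {d:.3f}")
```

A d in this range (roughly 0.2) would conventionally be read as a small effect: significant with large samples, but modest in practical terms.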
Research Question 7
How does the number of games owned by a player correlate with their review sentiment?

df_filtered = df[['author_num_games_owned', 'sentiment_score']]
# Correlation: Calculate Pearson correlation coefficient
correlation, p_value = stats.pearsonr(
df_filtered['author_num_games_owned'],
df_filtered['sentiment_score']
)
print(f"Pearson Correlation: {correlation:.4f}, P-value: {p_value}")
plt.figure(figsize=(10, 6))
sns.scatterplot(
x='author_num_games_owned',
y='sentiment_score',
data=df_filtered,
alpha=0.6,
color="teal"
)
sns.regplot(
x='author_num_games_owned',
y='sentiment_score',
data=df_filtered,
scatter=False,
color="darkblue",
line_kws={'label': f'r={correlation:.2f}'}
)
plt.title('Correlation Between Number of Games Owned and Review Sentiment', fontsize=14)
plt.xlabel('Number of Games Owned', fontsize=12)
plt.ylabel('Sentiment Score', fontsize=12)
plt.grid(alpha=0.5)
plt.show()
Pearson Correlation: 0.0110, P-value: 0.022958515545024167
Although the p-value falls below 0.05, a Pearson coefficient of 0.011 is negligible: there is virtually no linear relationship between the number of games authors own and the sentiment scores of their reviews.
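Because game-ownership counts are typically heavily right-skewed, a rank-based correlation is a useful robustness check alongside Pearson's r. A minimal sketch with `scipy.stats.spearmanr` on synthetic stand-in data (the distributions are assumptions, not the real columns):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(42)
# Synthetic stand-ins: heavily right-skewed ownership counts and
# sentiment scores drawn independently of them (an assumption)
games_owned = rng.lognormal(mean=4.0, sigma=1.5, size=2000).astype(int)
sentiment = np.clip(rng.normal(0.35, 0.25, size=2000), -1.0, 1.0)

rho, p = spearmanr(games_owned, sentiment)
print(f"Spearman rho = {rho:.4f}, p = {p:.3g}")
```

On the real data this would simply be `spearmanr(df['author_num_games_owned'], df['sentiment_score'])`; if rho also lands near zero, the no-correlation conclusion holds under skew.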
Research Question 8
Do reviews mentioning difficulty tend to be more positive or negative overall?

# Filter reviews mentioning difficulty
difficulty_reviews = df[df['mentions_difficulty'] == 1]
positive_reviews = difficulty_reviews[difficulty_reviews['voted_up'] == 1]
negative_reviews = difficulty_reviews[difficulty_reviews['voted_up'] == 0]
sns.histplot(positive_reviews['sentiment_score'], color='green', label='Positive', kde=True)
sns.histplot(negative_reviews['sentiment_score'], color='red', label='Negative', kde=True)
plt.legend()
plt.title('Sentiment Score Distribution for Difficulty Mentions')
plt.show()
We can see that the majority of reviews mentioning difficulty tend to be positive, with sentiment scores on the higher side.
Research Question 9
Are difficulty mentions more common in reviews with shorter playtimes compared to longer playtimes?

df['playtime_bin'] = pd.cut(df['author_playtime_forever'], bins=[0, 5, 20, 50, 100, float('inf')],
                            labels=['<5 hours', '5-20 hours', '20-50 hours', '50-100 hours', '>100 hours'])
difficulty_playtime = df.groupby('playtime_bin', observed=False)['mentions_difficulty'].mean().reset_index()
stick_color = "gold"
head_colors = plt.cm.cool(np.linspace(0.2, 0.8, len(difficulty_playtime)))
plt.figure(figsize=(10, 6))
for index, row in difficulty_playtime.iterrows():
    plt.plot([index, index], [0, row['mentions_difficulty']], color=stick_color, alpha=0.8, linewidth=2)
    plt.scatter(index, row['mentions_difficulty'], color=head_colors[index], s=100, zorder=3)
for index, row in difficulty_playtime.iterrows():
    plt.text(index, row['mentions_difficulty'] + 0.01, f'{row["mentions_difficulty"]:.2%}',
             ha='center', va='bottom', fontsize=10, color='black')
plt.ylim(0, 0.3)
plt.title('Proportion of Difficulty Mentions by Playtime', fontsize=14, pad=15)
plt.xlabel('Playtime Bin', fontsize=12)
plt.ylabel('Proportion of Reviews Mentioning Difficulty', fontsize=12)
plt.xticks(range(len(difficulty_playtime)), difficulty_playtime['playtime_bin'], rotation=45, fontsize=10)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
Around 25% of reviews with less than 5 hours of total playtime mention difficulty, and the proportion of reviews mentioning difficulty tends to decrease as playtime increases.
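The apparent downward trend can also be checked without binning, by rank-correlating raw playtime with the difficulty flag. A sketch on synthetic data where the mention probability decays with playtime (that decay is an assumption for illustration, not a fact about the dataset):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(7)
n = 3000
# Synthetic playtimes in hours (right-skewed, as playtime usually is)
playtime = rng.lognormal(mean=3.0, sigma=1.2, size=n)
# Assumption for illustration: mention probability decays with playtime
p_mention = 0.25 * np.exp(-playtime / 50)
mentions = (rng.random(n) < p_mention).astype(int)

rho, p = spearmanr(playtime, mentions)
print(f"Spearman rho = {rho:.3f}, p = {p:.3g}")
```

On the real data this would be `spearmanr(df['author_playtime_forever'], df['mentions_difficulty'])`; a significant negative rho would confirm the trend seen across the bins.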
Research Question 10
What are the most commonly expressed words in negative reviews mentioning difficulty?

# Filter negative reviews mentioning difficulty
negative_difficulty_reviews = df[(df['mentions_difficulty'] == 1) & (df['voted_up'] == 0)]['review']
vectorizer = CountVectorizer(stop_words='english', max_features=50)
word_freq = vectorizer.fit_transform(negative_difficulty_reviews)
word_freq_df = pd.DataFrame(word_freq.toarray(), columns=vectorizer.get_feature_names_out())
word_cloud = WordCloud(width=800, height=400, background_color='white').generate_from_frequencies(word_freq_df.sum().to_dict())
plt.figure(figsize=(10, 6))
plt.imshow(word_cloud, interpolation='bilinear')
plt.axis('off')
plt.title('Most Common Words in Negative Reviews Mentioning Difficulty')
plt.show()
As with our previous word clouds, generic words like "game", "play", "time" and "like" are common here.