The models cover research questions 1, 2, and 5. The remaining research questions do not require modelling and can be answered with statistical analysis and visualization.

In [27]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LinearSegmentedColormap
import scipy.stats as stats
from sklearn.feature_extraction.text import CountVectorizer
from wordcloud import WordCloud
In [3]:
df = pd.read_csv("featured_reviews.csv", parse_dates=["timestamp_created"])
In [4]:
df.columns
Out[4]:
Index(['game_name', 'review', 'voted_up', 'timestamp_created',
       'author_num_games_owned', 'author_num_reviews',
       'author_playtime_at_review', 'author_playtime_last_two_weeks',
       'author_playtime_forever', 'review_length', 'difficulty_word_count',
       'mentions_difficulty', 'open_world', 'competitive', 'puzzle',
       'multiplayer', 'fantasy', 'rpg', 'platformer', 'simulation',
       'third_person', 'first_person', 'base_building', 'turn_based',
       'crafting', 'soulslike', 'action', 'roguelike', 'adventure',
       'metroidvania', 'co_op', '2d', 'crpg', 'sandbox', 'deckbuilding',
       'survival', 'strategy', 'shooter', 'experience_level_experienced',
       'experience_level_intermediate', 'sentiment_score'],
      dtype='object')

Research Question 3

Which genres are most associated with mentions of difficulty in reviews?
In [6]:
genre_columns=['open_world', 'competitive', 'puzzle',
       'multiplayer', 'fantasy', 'rpg', 'platformer', 'simulation',
       'third_person', 'first_person', 'base_building', 'turn_based',
       'crafting', 'soulslike', 'action', 'roguelike', 'adventure',
       'metroidvania', 'co_op', '2d', 'crpg', 'sandbox', 'deckbuilding',
       'survival', 'strategy', 'shooter']
In [7]:
# Calculate difficulty mentions for each genre
genre_difficulty_mentions = {
    genre: df[df[genre] == 1]['mentions_difficulty'].mean()
    for genre in genre_columns
}

genre_difficulty_mentions = sorted(genre_difficulty_mentions.items(), key=lambda x: x[1], reverse=True)

genres, proportions = zip(*genre_difficulty_mentions)

colors = plt.cm.magma(np.linspace(0.2, 0.8, len(proportions)))

plt.figure(figsize=(10, 8))
bars = plt.barh(list(genres), list(proportions), color=colors, alpha=0.9)

for bar in bars:
    plt.text(bar.get_width() + 0.01, bar.get_y() + bar.get_height()/2,
             f'{bar.get_width():.4f}', va='center', fontsize=9)

plt.title('Proportion of Reviews Mentioning Difficulty by Genre', fontsize=14, pad=15)
plt.xlabel('Proportion Mentioning Difficulty', fontsize=12)
plt.ylabel('Genre', fontsize=12)
plt.gca().invert_yaxis() 
plt.grid(axis='x', linestyle='--', alpha=0.7)

plt.tight_layout()
plt.show()
[Figure: horizontal bar chart of the proportion of reviews mentioning difficulty, by genre]

For the "co_op" and "metroidvania" genres, more than 18% of reviews use at least one word related to difficulty. The "multiplayer" and "competitive" genres have the lowest proportions of such reviews, at around 8% and 7% respectively.
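As a follow-up (not part of the original analysis), a chi-square test on a 2x2 contingency table could check whether the gap between the highest and lowest mention rates is statistically significant. The counts below are illustrative placeholders, not values taken from the dataset:

```python
from scipy.stats import chi2_contingency

# Hypothetical counts: [mentions difficulty, does not mention] per genre
contingency = [[180, 820],   # e.g. a high-mention genre such as co_op
               [70, 930]]    # e.g. a low-mention genre such as competitive

chi2, p_value, dof, expected = chi2_contingency(contingency)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.3g}")
```

On the real data, the counts would come from `df[df[genre] == 1]['mentions_difficulty'].value_counts()` for each genre.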

Research Question 4

How does the sentiment score distribution differ between genres?
In [36]:
# Calculate average sentiment score for each genre
genre_sentiment = {
    genre: df[df[genre] == 1]['sentiment_score'].mean()
    for genre in genre_columns
}

# Sort genres by sentiment score
genre_sentiment = sorted(genre_sentiment.items(), key=lambda x: x[1], reverse=True)

# Extract genres and their corresponding average sentiment scores
genres, avg_sentiments = zip(*genre_sentiment)

plt.figure(figsize=(10, 8))
colors = plt.cm.viridis(np.linspace(0.2, 0.8, len(avg_sentiments)))
sizes = [50 + 200 * sentiment for sentiment in avg_sentiments] 

plt.scatter(avg_sentiments, genres, s=sizes, c=colors, alpha=0.8, edgecolors='k')

for i, sentiment in enumerate(avg_sentiments):
    plt.text(sentiment + 0.01, i, f'{sentiment:.4f}', va='center', fontsize=9)

plt.title('Average Sentiment Score by Genre', fontsize=14, pad=15)
plt.xlabel('Average Sentiment Score', fontsize=12)
plt.ylabel('Genre', fontsize=12)
plt.grid(axis='x', linestyle='--', alpha=0.7)
plt.tight_layout()

plt.show()
[Figure: scatter plot of average sentiment score by genre]

The "competitive" and "multiplayer" genres have the lowest average review sentiment scores, at 0.18 and 0.21 respectively; "crpg" and "fantasy" have the highest, at 0.47 and 0.46 respectively.
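Comparing full distributions between the extreme genres would strengthen this observation. The sketch below runs a Mann-Whitney U test on synthetic samples (the means and spread are placeholders, not fitted to the data); on the real data one would pass `df[df['crpg'] == 1]['sentiment_score']` and the corresponding 'competitive' slice instead:

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Synthetic stand-ins for the per-genre sentiment samples
rng = np.random.default_rng(0)
crpg_scores = rng.normal(0.47, 0.20, 500)
competitive_scores = rng.normal(0.18, 0.20, 500)

u_stat, p_value = mannwhitneyu(crpg_scores, competitive_scores, alternative='two-sided')
print(f"U = {u_stat:.0f}, p = {p_value:.3g}")
```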

Research Question 6

Does the player's experience level (beginner vs. experienced) affect their sentiment towards game difficulty?
In [47]:
experience_data = df.melt(
    id_vars=['experience_level_experienced', 'experience_level_intermediate'],
    value_vars=['sentiment_score'],
    var_name='Metric',
    value_name='Sentiment Score'
)

experience_data['Experience Level'] = 'Beginner'
experience_data.loc[df['experience_level_experienced'] == 1, 'Experience Level'] = 'Experienced'
experience_data.loc[df['experience_level_intermediate'] == 1, 'Experience Level'] = 'Intermediate'

plt.figure(figsize=(8, 6))

boxplot = sns.boxplot(
    x='Experience Level', 
    y='Sentiment Score', 
    data=experience_data,
    hue='Experience Level',
    palette="Set3",
    legend=False
)

# Annotate each box with its median
groups = experience_data.groupby('Experience Level')['Sentiment Score']
for i, experience_level in enumerate(groups.groups):
    median = groups.get_group(experience_level).median()
    plt.text(
        i, median, f'{median:.2f}', 
        horizontalalignment='center', color='black', weight='bold'
    )

plt.title('Distribution of Sentiment Scores by Experience Level', fontsize=14, pad=15)
plt.ylabel('Sentiment Score', fontsize=12)
plt.xlabel('Experience Level', fontsize=12)
plt.grid(axis='y', linestyle='--', alpha=0.7)

plt.tight_layout()
plt.show()
[Figure: box plot of sentiment score distribution by experience level]

All experience levels have the same median sentiment score, with slightly different upper quartiles. We will perform an ANOVA test to determine whether there is a statistically significant difference between these scores.

In [14]:
from scipy.stats import f_oneway

beginner_scores = experience_data[experience_data['Experience Level'] == 'Beginner']['Sentiment Score']
experienced_scores = experience_data[experience_data['Experience Level'] == 'Experienced']['Sentiment Score']
intermediate_scores = experience_data[experience_data['Experience Level'] == 'Intermediate']['Sentiment Score']

# ANOVA test for all three groups
f_stat, p_value = f_oneway(beginner_scores, experienced_scores, intermediate_scores)
print(f"ANOVA Test Statistic: {f_stat:.4f}, P-value: {p_value}")
ANOVA Test Statistic: 84.6515, P-value: 2.0367702092444877e-37

The test suggests there is a statistically significant difference in scores between the experience levels. We can perform Tukey's HSD test to analyse the pairwise differences further.

In [15]:
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Prepare data for Tukey's test
tukey_data = experience_data[['Sentiment Score', 'Experience Level']]

# Perform Tukey's HSD test
tukey = pairwise_tukeyhsd(
    endog=tukey_data['Sentiment Score'],
    groups=tukey_data['Experience Level'],
    alpha=0.05
)
print(tukey)
     Multiple Comparison of Means - Tukey HSD, FWER=0.05      
==============================================================
   group1      group2    meandiff p-adj   lower  upper  reject
--------------------------------------------------------------
   Beginner  Experienced   0.0704    0.0  0.0569 0.0839   True
   Beginner Intermediate   0.0651    0.0  0.0511 0.0792   True
Experienced Intermediate  -0.0053 0.5772 -0.0176 0.0071  False
--------------------------------------------------------------

From this test we can see a statistically significant difference in sentiment scores between beginner and experienced players, and between beginner and intermediate players, but not between intermediate and experienced players.
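Statistical significance at this sample size does not imply a large effect. A standardized effect size such as Cohen's d would put the ~0.07 mean differences in context; the sketch below uses synthetic stand-in samples (group means and spreads are placeholders, not fitted values):

```python
import numpy as np

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * np.var(a, ddof=1) + (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2)
    return (np.mean(a) - np.mean(b)) / np.sqrt(pooled_var)

# Synthetic samples standing in for the real sentiment scores
rng = np.random.default_rng(1)
beginner = rng.normal(0.30, 0.25, 400)
experienced = rng.normal(0.37, 0.25, 400)
print(f"Cohen's d: {cohens_d(experienced, beginner):.3f}")
```

A d in the 0.2-0.3 range is conventionally "small", i.e. a difference that is detectable but modest in practical terms.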

Research Question 7

How does the number of games owned by a player correlate with their review sentiment?
In [18]:
df_filtered = df[['author_num_games_owned', 'sentiment_score']]

# Correlation: Calculate Pearson correlation coefficient
correlation, p_value = stats.pearsonr(
    df_filtered['author_num_games_owned'],
    df_filtered['sentiment_score']
)
print(f"Pearson Correlation: {correlation:.4f}, P-value: {p_value}")

plt.figure(figsize=(10, 6))
sns.scatterplot(
    x='author_num_games_owned',
    y='sentiment_score',
    data=df_filtered,
    alpha=0.6,
    color="teal"
)
sns.regplot(
    x='author_num_games_owned',
    y='sentiment_score',
    data=df_filtered,
    scatter=False,
    color="darkblue",
    line_kws={'label': f'r={correlation:.2f}'}
)

plt.title('Correlation Between Number of Games Owned and Review Sentiment', fontsize=14)
plt.xlabel('Number of Games Owned', fontsize=12)
plt.ylabel('Sentiment Score', fontsize=12)
plt.grid(alpha=0.5)
plt.show()
Pearson Correlation: 0.0110, P-value: 0.022958515545024167
[Figure: scatter plot with regression line of sentiment score vs. number of games owned]

Although the correlation is statistically significant (p ≈ 0.023), its magnitude (r ≈ 0.011) is negligible: there is virtually no relationship between the number of games authors own and the sentiment scores of their reviews.
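Because games-owned counts are typically heavy-tailed, a rank-based coefficient is a reasonable robustness check alongside Pearson's r. The sketch below runs Spearman's rho on synthetic data with no monotone link; on the real data, one would pass the two columns of `df_filtered`:

```python
import numpy as np
from scipy.stats import spearmanr

# Synthetic heavy-tailed "games owned" values, independent of sentiment
rng = np.random.default_rng(42)
games_owned = rng.lognormal(mean=3.0, sigma=1.5, size=2000)
sentiment = rng.uniform(-1.0, 1.0, size=2000)

rho, p_value = spearmanr(games_owned, sentiment)
print(f"Spearman rho = {rho:.4f}, p = {p_value:.3g}")
```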

Research Question 8

Do reviews mentioning difficulty tend to be more positive or negative overall?
In [21]:
# Filter reviews mentioning difficulty
difficulty_reviews = df[df['mentions_difficulty'] == 1]

positive_reviews = difficulty_reviews[difficulty_reviews['voted_up'] == 1]
negative_reviews = difficulty_reviews[difficulty_reviews['voted_up'] == 0]

sns.histplot(positive_reviews['sentiment_score'], color='green', label='Positive', kde=True)
sns.histplot(negative_reviews['sentiment_score'], color='red', label='Negative', kde=True)
plt.legend()
plt.title('Sentiment Score Distribution for Difficulty Mentions')
plt.show()
[Figure: overlaid sentiment score histograms for positive and negative reviews mentioning difficulty]

We can see that the majority of reviews mentioning difficulty are positive, with sentiment scores skewed towards the higher end.
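Since the histograms compare raw counts, a simple rate comparison complements them: what share of difficulty-mentioning reviews are upvoted versus the overall share? The toy frame below only illustrates the computation (column names match the notebook; the values are made up):

```python
import pandas as pd

# Toy stand-in for the real dataset
toy = pd.DataFrame({
    'mentions_difficulty': [1, 1, 1, 1, 0, 0, 0, 0],
    'voted_up':            [1, 1, 0, 0, 1, 1, 1, 1],
})

overall_rate = toy['voted_up'].mean()
difficulty_rate = toy.loc[toy['mentions_difficulty'] == 1, 'voted_up'].mean()
print(f"overall: {overall_rate:.2f}, difficulty-mentioning: {difficulty_rate:.2f}")
```

On the real data, `df['voted_up'].mean()` versus `difficulty_reviews['voted_up'].mean()` gives the same comparison in one line each.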

Research Question 9

Are difficulty mentions more common in reviews with shorter playtimes compared to longer playtimes?
In [25]:
df['playtime_bin'] = pd.cut(df['author_playtime_forever'], bins=[0, 5, 20, 50, 100, float('inf')],
                            labels=['<5 hours', '5-20 hours', '20-50 hours', '50-100 hours', '>100 hours'])

difficulty_playtime = df.groupby('playtime_bin', observed=False)['mentions_difficulty'].mean().reset_index()

stick_color = "gold"
head_colors = plt.cm.cool(np.linspace(0.2, 0.8, len(difficulty_playtime)))

plt.figure(figsize=(10, 6))
for index, row in difficulty_playtime.iterrows():
    plt.plot([index, index], [0, row['mentions_difficulty']], color=stick_color, alpha=0.8, linewidth=2)
    plt.scatter(index, row['mentions_difficulty'], color=head_colors[index], s=100, zorder=3)

for index, row in difficulty_playtime.iterrows():
    plt.text(index, row['mentions_difficulty'] + 0.01, f'{row["mentions_difficulty"]:.2%}', 
             ha='center', va='bottom', fontsize=10, color='black')

plt.ylim(0, 0.3)

plt.title('Proportion of Difficulty Mentions by Playtime', fontsize=14, pad=15)
plt.xlabel('Playtime Bin', fontsize=12)
plt.ylabel('Proportion of Reviews Mentioning Difficulty', fontsize=12)
plt.xticks(range(len(difficulty_playtime)), difficulty_playtime['playtime_bin'], rotation=45, fontsize=10)
plt.grid(axis='y', linestyle='--', alpha=0.7)

plt.tight_layout()
plt.show()
[Figure: lollipop chart of difficulty-mention proportion by playtime bin]

Around 25% of reviews from players with less than 5 hours of total playtime mention difficulty. The proportion decreases steadily as playtime increases.
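The visual trend can be backed by a quick monotonicity check, e.g. Spearman's rho between bin order and mention rate. The proportions below are illustrative placeholders read off the chart, not recomputed values:

```python
from scipy.stats import spearmanr

bin_order = [0, 1, 2, 3, 4]                     # <5h, 5-20h, 20-50h, 50-100h, >100h
mention_rates = [0.25, 0.20, 0.17, 0.15, 0.12]  # placeholder per-bin proportions

rho, p_value = spearmanr(bin_order, mention_rates)
print(f"Spearman rho = {rho:.2f}")
```

A rank test on five aggregated points has little power; correlating `author_playtime_forever` with `mentions_difficulty` over the raw rows would be a stronger version of the same check.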

Research Question 10

What are the most commonly expressed words in negative reviews mentioning difficulty?
In [29]:
# Filter negative reviews mentioning difficulty
negative_difficulty_reviews = df[(df['mentions_difficulty'] == 1) & (df['voted_up'] == 0)]['review']

vectorizer = CountVectorizer(stop_words='english', max_features=50)
word_freq = vectorizer.fit_transform(negative_difficulty_reviews)
word_freq_df = pd.DataFrame(word_freq.toarray(), columns=vectorizer.get_feature_names_out())

word_cloud = WordCloud(width=800, height=400, background_color='white').generate_from_frequencies(word_freq_df.sum().to_dict())
plt.figure(figsize=(10, 6))
plt.imshow(word_cloud, interpolation='bilinear')
plt.axis('off')
plt.title('Most Common Words in Negative Reviews Mentioning Difficulty')
plt.show()
[Figure: word cloud of the most common words in negative reviews mentioning difficulty]

As in our previous word clouds, generic words like "game", "play", "time" and "like" dominate here.
