Harshit Kumar

Friendship paradox: Facebook

Friday, August 18, 2017

According to a 2012 study by Pew Research Center’s Internet and American Life Project¹:

On Facebook, the average person has 245 friends. However, the average friend of a person has 359 Facebook friends. The finding, that people’s friends have more friends than they do, was nearly universal. Only those who had among the 10% largest friends lists (over 780 friends) had friends who on average had smaller networks than their own.

This is the digital reflection of what’s known as the friendship paradox², the phenomenon, first observed by the sociologist Scott L. Feld in 1991, that most people have fewer friends than their friends have, on average.
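To see why the paradox arises, here is a minimal sketch on a made-up network. The names and friendships below are hypothetical and chosen only for illustration: one well-connected person ("hub") appears in everyone else's friend list, which pulls up the average friend count that each person sees among their friends.

```python
from statistics import mean

# Hypothetical toy network: each person maps to the list of their friends.
# Friendships are mutual, so every link appears in both lists.
friends = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub", "b"],
    "b": ["hub", "a"],
    "c": ["hub"],
    "d": ["hub"],
}

# Number of friends each person has.
degree = {person: len(fs) for person, fs in friends.items()}

# For each person, the average friend count of their friends.
friends_avg = {
    person: mean(degree[f] for f in fs) for person, fs in friends.items()
}

# Average number of friends per person.
mean_degree = mean(degree.values())
# Average, over all people, of their friends' average friend count.
mean_friends_avg = mean(friends_avg.values())
# How many people have fewer friends than their friends do on average.
fewer_than_friends = sum(degree[p] < friends_avg[p] for p in friends)

print(f"Average friend count: {mean_degree}")
print(f"Average friend count of friends: {mean_friends_avg}")
print(f"People with fewer friends than their friends: {fewer_than_friends} of {len(friends)}")
```

In this toy network the average person has 2 friends, the friends of a person have 3.1 friends on average, and 4 of the 5 people have fewer friends than their friends do. Only the hub is an exception, much like the top 10% in the Pew study.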

The generalized friendship paradox states that the friendship paradox applies to other characteristics as well. For example, one’s co-authors are on average likely to be more prominent, with more publications, more citations and more collaborators, and one’s followers on Twitter tend to have more followers than one does.

Let’s check this for our Facebook friends.

Get the data

Navigate to the friends list on your Facebook profile. Scroll down until all your friends are loaded on the page, then Select All (Ctrl-A) and Copy (Ctrl-C). Paste (Ctrl-V) the copied content into any regex editor³.
Note: The data can also be collected by web scraping with Python’s BeautifulSoup and requests libraries.

Clean the data

Now we need to find every friend count in the pasted content. The regular expression [,\d]+ friends does this: it matches text such as 345 friends or 1,191 friends. Grab all the matches and save them to a text file, say facebook_friends.txt.

Note: This method won’t give us the friend count of every friend, because the Facebook friends page shows only a mutual-friends count for friends who have their privacy set to Only me.
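If you would rather skip the regex editor, the same extraction can be sketched with Python's re module. The pasted text below is a made-up sample, not real Facebook output; the names and the exact layout are assumptions.

```python
import re

# Hypothetical sample of text copied from a friends page.
pasted = """
Alice Example
637 friends
Bob Example
12 mutual friends
Carol Example
1,191 friends
"""

# Same pattern as in the text: digits and commas followed by " friends".
# "12 mutual friends" is skipped because "mutual" sits between the number
# and the word "friends".
matches = re.findall(r"[,\d]+ friends", pasted)

# Strip the thousands separator and convert each count to an integer.
counts = [int(m.split()[0].replace(",", "")) for m in matches]

print(matches)
print(counts)
```

Writing the matches to facebook_friends.txt, one per line, would give a file in the same shape as the one read in the analysis step.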

Analyse the data

Now, open a Python console and analyse the data.

Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:51:32) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux

# Import necessary modules  
In[2]: import pandas as pd
  ...: import matplotlib.pyplot as plt

# read the data
In[3]: df = pd.read_table('facebook_friends.txt', sep=' ', thousands=',', header=None, names=['friend_count', 'text'])
# check the structure of data
In[4]: print(df.head())
   friend_count     text
0           637  friends
1           101  friends
2           350  friends
3          1191  friends
4           300  friends
# description of data
In[5]: print(df.describe())
       friend_count
count    116.000000
mean     594.146552
std      829.748765
min       14.000000
25%      218.500000
50%      380.000000
75%      647.000000
max     4972.000000

Now, let’s plot the data.

# average no of friends
In[6]: avg = df['friend_count'].mean()  # select the numeric column; 'text' holds strings
# median no of friends
In[7]: median = df['friend_count'].median()
# count of my friends
In[8]: my_friends = 208

# plot a histogram
In[9]: df.hist(bins=20)
In[10]: plt.xlabel('Friend count')
In[11]: plt.ylabel('Number')
In[12]: plt.suptitle("Histogram of Friend Counts")
In[13]: for (x, c) in zip([avg, median, my_friends], ['k', 'b', 'r']):
   ...:    plt.vlines(x, 0, 40, colors=c)
In[14]: plt.show()

The vertical lines (black, blue, and red) represent the mean, the median, and my own friend count, respectively.

It appears the paradox holds true for me as well!
I have 208 Facebook friends, but on average, my friends have 594 Facebook friends.
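The mean is not the only way to make this comparison. As a rough sketch, the snippet below also computes the median and the share of friends who have more friends than I do. It uses only the five friend counts shown in the df.head() output above, not the full 116-row data set, so its numbers differ from the ones in the describe() table.

```python
from statistics import mean, median

# First five friend counts from the df.head() output; not the full data set.
friend_counts = [637, 101, 350, 1191, 300]
my_friends = 208

# Mean and median friend count among these friends.
avg = mean(friend_counts)
mid = median(friend_counts)

# Fraction of these friends who have more friends than I do.
share_with_more = sum(c > my_friends for c in friend_counts) / len(friend_counts)

print(f"Mean: {avg}, median: {mid}")
print(f"Share of friends with more than {my_friends} friends: {share_with_more:.0%}")
```

Applied to the full column, for example with list(df['friend_count']), the same three numbers show whether the paradox holds for the typical friend and not just on average.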

Footnotes:
1: 2012 Pew Research Center’s study
2: Friendship paradox
3: Regex editor
4: Image source
