For our music MOOC, Play With Your Music (PWYM), 5282 folks signed up to learn together. We wanted to put them in groups based on their musical taste, hoping to improve group cohesion and collaboration.
We decided to use The Echo Nest since they have an API that helps with doing just that! The Echo Nest provides taste profiles – a collection of songs or artists representing a user’s musical preference. The Echo Nest can then relate one taste profile to another – higher scores mean that taste profiles are more alike.
During signup we asked users to input 5 artists they like. Users had to input free text and this resulted in creatively spelled artist names. The Echo Nest API deals with this by trying to resolve artist names, so the problem wasn’t disastrous! This is a step that can be improved in the future.
The top 100 artists (look carefully, Beatles are there twice).
For every user that signed up we created a taste profile with the 5 artists they provided. We used Pyechonest – a Python library implementing The Echo Nest API – to create the taste profiles.
Here is an excerpt of the Python code to create the taste profiles:
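As a rough illustration of this step (a sketch under assumptions, not the code used for PWYM), the snippet below turns one user's five free-text artist names into a list of update items. The `action`/`item`/`item_id`/`artist_name` layout is assumed to match what The Echo Nest catalog update call expected, and `build_taste_profile_items` and the `item_id` scheme are hypothetical names chosen for this example. Handing the list to Pyechonest is left out because it needs an API key.

```python
def build_taste_profile_items(user_id, artist_names):
    """Build update items for one user's taste profile.

    The item layout is an assumption about the catalog update format;
    the item_id scheme (user id plus position) is purely illustrative.
    """
    items = []
    for index, name in enumerate(artist_names):
        # Collapse stray whitespace from the free-text signup field.
        cleaned = " ".join(name.split())
        if not cleaned:
            continue  # skip fields the user left blank
        items.append({
            "action": "update",
            "item": {
                "item_id": f"{user_id}-{index}",
                "artist_name": cleaned,
            },
        })
    return items


items = build_taste_profile_items("user42", ["  The Beatles ", "", "Radiohead"])
```

Spelling is deliberately not corrected here, since the text above says name resolution was left to The Echo Nest.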
For every taste profile we retrieved a likeness score for the 40 most similar taste profiles. Pyechonest didn’t implement this call, so we used the Python requests library to call the API.
Here is an excerpt of the code to get likeness scores for taste profiles:
This formed a weighted graph that we stored in an adjacency list. See the graph below of only a small part of the data – finding groups where all the people are somewhat alike seems daunting!
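To make the data structure concrete, here is a minimal sketch of a weighted adjacency list in Python. It assumes the likeness scores have already been collected as `(profile, other_profile, score)` triples. The function name and the sample ids and scores are made up for illustration.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Store (profile, other_profile, score) triples as a dict of dicts.

    graph[a][b] holds the likeness score reported for b when querying a.
    The graph is kept directed: b being in a's top matches does not
    guarantee that a appears in b's.
    """
    graph = defaultdict(dict)
    for profile, other, score in edges:
        graph[profile][other] = score
    return dict(graph)


# Made-up ids and scores, only to show the shape of the result.
graph = build_adjacency([
    ("u1", "u2", 0.9),
    ("u1", "u3", 0.4),
    ("u2", "u1", 0.9),
])
```

With at most 40 neighbours stored per user, the structure stays small even for thousands of users, and a missing pair can simply be treated as a score of zero.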
Graph of only 81 people!
Next we had to create actual groups. We decided on groups of 40 people to leave enough space for lurkers without overwhelming groups with too much messaging.
We wanted to put users in groups so that the taste profiles for users in the same group would have high similarity scores. We could not simply group every user with the 39 users whose taste profiles matched theirs best, since there would be no guarantee that those 39 users would be similar to each other.
We were not the first people with this problem! The k-means clustering algorithm does almost what we wanted! The only problem was that k-means doesn’t create groups of equal size.
We adapted the algorithm to keep the group size fixed. Here is the pseudo code for the algorithm:
1. randomly assign users to groups
2. for every group
       for every user in the current group
           calculate the possible group scores for the user in all the other groups
           if the user has a higher group score in one of the other groups
               find the user in the other group with the lowest group score
               swap the two users
3. while we are significantly improving the average group score, go back to step 2
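The pseudocode above could be sketched in Python roughly as follows. This is a minimal sketch, not the code used for PWYM. It assumes the similarity scores live in a nested dict `sim[user][other]`, that a user's group score is the sum of their similarities to the other members of a group, and that "significantly improving" means the average score rises by more than a small `tolerance`. All function names, `tolerance`, `max_rounds` and `seed` are illustrative.

```python
import random
from statistics import mean


def user_score(user, members, sim):
    """Sum of the similarities between `user` and the other `members`."""
    return sum(sim.get(user, {}).get(m, 0.0) for m in members if m != user)


def average_score(groups, sim):
    """Mean group score over all users."""
    return mean(user_score(u, g, sim) for g in groups for u in g)


def swap_pass(groups, sim):
    """One sweep of step 2: move users to better groups by swapping."""
    for gi, group in enumerate(groups):
        for user in list(group):  # snapshot, since swaps mutate the group
            best_gj, best = None, user_score(user, group, sim)
            for gj, candidate in enumerate(groups):
                if gj == gi:
                    continue
                score = user_score(user, candidate, sim)
                if score > best:
                    best_gj, best = gj, score
            if best_gj is None:
                continue  # no other group suits this user better
            target = groups[best_gj]
            # Swap with the member who fits the target group least.
            weakest = min(target, key=lambda m: user_score(m, target, sim))
            group[group.index(user)] = weakest
            target[target.index(weakest)] = user


def group_users(users, sim, group_size, tolerance=1e-6, max_rounds=100, seed=0):
    """Fixed-size grouping: random start, then repeated swap passes."""
    shuffled = list(users)
    random.Random(seed).shuffle(shuffled)
    groups = [shuffled[i:i + group_size]
              for i in range(0, len(shuffled), group_size)]
    score = average_score(groups, sim)
    for _ in range(max_rounds):
        candidate = [list(g) for g in groups]
        swap_pass(candidate, sim)
        new_score = average_score(candidate, sim)
        if new_score - score <= tolerance:
            break  # keep the previous grouping if the pass did not help
        groups, score = candidate, new_score
    return groups


# Toy data: a/b like each other, c/d like each other.
sim = {"a": {"b": 1.0}, "b": {"a": 1.0}, "c": {"d": 1.0}, "d": {"c": 1.0}}
groups = group_users(["a", "b", "c", "d"], sim, group_size=2)
```

Because every move is a swap, group sizes never change. A single swap is not guaranteed to raise the average score, so this sketch only keeps a pass when the average actually improves.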
We used this algorithm to group the 5282 users who signed up into 133 groups. The last group had only 2 users so we added them into the groups where they fit the best to end up with 132 groups.
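The group counts follow from simple division, and folding in the leftover users can be sketched in a few lines. This is an illustration only: "fit the best" is assumed here to mean the group with the highest summed similarity to the leftover user, `fold_in_leftovers` is a hypothetical helper, and the sample similarity data is made up.

```python
# 5282 users in groups of 40: 132 full groups plus 2 users left over.
full_groups, leftover = divmod(5282, 40)


def fold_in_leftovers(groups, leftovers, sim):
    """Append each leftover user to the group they are most similar to.

    `sim[user][other]` is assumed to hold likeness scores; missing pairs
    count as zero.
    """
    for user in leftovers:
        scores = sim.get(user, {})
        best = max(groups, key=lambda g: sum(scores.get(m, 0.0) for m in g))
        best.append(user)
    return groups


# Made-up example: "e" is closest to the second group.
example = fold_in_leftovers(
    [["a", "b"], ["c", "d"]], ["e"], {"e": {"c": 0.8, "a": 0.1}}
)
```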
By using the Echo Nest API and adapting existing algorithms we were able to try out a novel way of grouping learners. The idea of being grouped this way appealed to many learners who signed up, and they awaited being assigned to a group with much anticipation. This in turn resulted in higher engagement by the learners around musical interest.
If experiments like this appeal to you, reach out to folks at P2PU with your suggestions for novel ways of grouping users in different subject areas and for measuring the success of those groupings.
A big thank you goes out to The Echo Nest, and specifically Paul Lamere, for creating a useful service, for their excellent support and for sponsoring the additional API bandwidth we needed to pull this off!