A Rhyme Detection analysis Framework and Webservice for Hip-Hop/Rap Lyrics
Created by Alex Marozick, Abduarraheem Elfandi, and Juno Mayer
Want to contribute to the project? Check out our Wiki for installation and usage instructions
Example Playlist for Testing our Spotify Integration:
https://open.spotify.com/playlist/5SiGw4acTtJWGeAEZjsZHY?si=xr3wHmaSQnKhLPCF-CDRJQ
https://open.spotify.com/playlist/5TZkls9cEOzWDR6qCxwDot?si=TT1T9favTGynYp0w7qUvIg
Rap Analyzer visualizes the way that a rapper rhymes in a given song by color coding groups of words that rhyme. Simply search a song on the homepage or log in with Spotify to analyze your favorite Hip-Hop playlist and recently listened songs. Your searches query our custom database of analyzed lyrics of over 100 artists.
We used the LyricsGenius python library to collect lyrics for over 100 artists. This library runs on a combination of the official Genius API and web-scraping tools. Due to Intellectual Property law and licensing, we cannot provide this database for other developers looking to contribute to this project, however, we have provided the scripts which we used to build our database. More information on using these tools is available in our wiki
English is a fickle language. For Example: through, rough, and trough share the same '-ough' ending despite being pronounced differently (/THro͞o/, /rəf/, /trôf/). This presents an interesting problem as rhymes cannot be detected via spelling alone. Moreover, rappers frequently coerce pronunciations to force rhymes where they would not normally occur.
To account for these problems, we break words down into a list of phonemes (phonetic syllables which describe how a word is pronounced) using the Python Natual Language Toolkit and the CMU Pronouncing Dictionary (CMUDict) which contains 39 phonemes and three levels of syllabic stress. For example, Cheese
-> CH IY Z
. We compare the ending phonemes of words to look for rhymes and address pronunciation coercion in two ways:
-
Multiple Word Pronunciations
CMUDict contains multiple pronunciations for many words. For Example, we catch a rhyme between
"Business"
and"Witness"
by comparing two pronunciations of"Business"
(B IH1 Z N AH0 S
andB IH1 Z N IH0 S
) with the single pronunciations of"Witness"
(W IH1 T N AH0 S
) -
Rebuilding Truncated Suffixes
Many rappers use a truncated suffix, ending present participle
-ing
words within'
Dodgin' bullets, reapin' what you sow And stackin' up the footage, livin' on the go
Dodgin
andReapin
are not official words listed in the CMUDict, so we rebuild these suffixes toDodging
andReaping
to catch rhymes between words of this form and words with a completeing
endingNote: For words that are not in the CMUDict, direct string equality is used.
Our algorithm considers one verse or chorus at a time to mitigate false positives derived from rhymes detected across sections. In a given section, our algorithm operates as follows:
rhymeNumberList = [-1] * Length of section
rhymeNumber = 0
for A in section:
for B in section[index of A onward]:
if A rhymes with B:
Mark A with rhymeNumber
Mark All instances of B with RhymeNumber
rhymeNumbers[IndexOf(A)] = rhymeNumber
RhymeNumber++
A section's rhyme information is stored as a list of rhyme numbers: word A and word B rhyme if RhymeList[IndexOf(A)] == RhymeList[IndexOf(B)]
, We use rhyme numbers to determine which color to ascribe to a set of words on the Rap Analyzer Web service.
In addition to viewing one song at a time by searching on the Rap Analyzer Homepage, users can log in with their Spotify account to grant Rap Analyzer access to their playlists and recently listened songs. Users can then select a playlist or number of recently listened songs they wish to analyze. Highlighted Lyrics are presented to the user with a dropdown menu to switch between songs.
A word's Rhyme Number is used to determine the color of its highlight: Words with the same rhyme number rhyme with each other and are given the same highlight color. A word's rhyme number is mapped to an HSL hue value from 0 to 360. To obtain a set of diverse highlighting colors. Rhyme Numbers and are mapped to hue values as follows:
Rhyme Number Hue
[0,17] [n * 10 for n in range(0,360) if n % 2 == 0] (20,40,60,...360)
[18,36] [n * 10 for n in range(0,360) if n % 2 != 0] (10,30,50,...350)
The brightness value of a color alternates between a light and dark highlight to further differentiate sets of rhymes
A rhyme number of -1 corresponds to no highlighting.
Web-scraping lyrics using LyricsGenius proved too inefficient to rely on for realtime analysis of more than one song at a time. We opted to build a MongDB database of over 100 artists (most of which had 100-300 songs), their lyrics, and their rhyme information in the following format:
{Database:
{'artist1' : [
{'song': "songName1" , 'lyrics' : "lyrics go here", 'rhyme' [[List of Rhyme Numbers]]}
{'song': "songName2" , 'lyrics' : "lyrics go here", 'rhyme' [[List of Rhyme Numbers]]}
]
}
{'artist2' : [
{'song': "songName1" , 'lyrics' : "lyrics go here", 'rhyme' [[List of Rhyme Numbers]]}
{'song': "songName2" , 'lyrics' : "lyrics go here", 'rhyme' [[List of Rhyme Numbers]]}
]
}
}
Artist and song names were made lowercase and stripped of any special characters. ('.' replaced with '' and '$' replaced with 's' to conform to MongoDB restrictions) MongoDB's Text indexing to allows users to search our database in a case-insensitive manner.
-
Using Rhyme Information to generate Statistics:
We intend to calculate statistics for artists based on rhyming such as their average amount of unique rhymes per song. This information will be merged with the present Spotify page to allow a user to compare artists on their favorites list -
Improving Rhyme Detection
There are many ways to improve our algorithm, including ignoring tense and plural word endings as rappers often coerce rhymes between present and past tense, or single and plural words. Furthermore, we could implement an algorithm that searches for common strings of vowels in words to detect multi-word rhyme schemes. Eventually, we want to apply Machine Learning models to rhyme detection to allow for the coverage of areas where our algorithm falls short.