Simple recommendation system in Ruby
I’m looking for new job; so yesterday I went through my old Rails projects, trying to describe them for my updated CV. I found interesting old project where I wrote recommendation system. Nothing fancy, just simple tag based recommendation for blog articles. I decided to extract some of the code and blog about it.
Algorithm used for recommendation is based on Jaccard Index and is also known as the Jaccard similarity coefficient. Jaccard Index was invented by botany professor Paul Jaccard and is basically a number that represents similarity and diversity of sample sets.
How it works?
Take current item (blog post for example), and its properties that describe it the best (tags, categories or words). Then you take every other item and you calculate division
of intersection
and union
. Mathematical can this be represented with following equation.
Equation will give you number between 0 and 1. You can also easily calculate dissimilarity between items - represented with next equation.
Example implementation with Ruby
I’ll be recommending books based on words in titles. So simple Book
class will do the job just fine.
class Book < Struct.new(:title)
# Array of unique words longer than 2 characters.
# Could also be array of tags or categories.
def words
@words ||= self.title.gsub(/[a-zA-Z]{3,}/).map(&:downcase).uniq.sort
end
end
BookRecommender
class is initialized using current book and array of books. The method recommendations
will loop over array and set jaccard_index
for each item; then items will be sorted accordingly.
class BookRecommender
def initialize book, books
@book, @books = book, books
end
def recommendations
# Map jaccard_index to each item and sort array
@books.map! do |this_book|
# We can define jaccard_index getter in runtime as singleton...
this_book.define_singleton_method(:jaccard_index) do
@jaccard_index
end
# also setter
this_book.define_singleton_method("jaccard_index=") do |index|
@jaccard_index = index || 0.0
end
# Calculate intersection between sets
intersection = (@book.words & this_book.words).size
# ... and union
union = (@book.words | this_book.words).size
# Assign the division and rescue if division is not possible with 0
this_book.jaccard_index = (intersection.to_f / union.to_f) rescue 0.0
this_book
# Sort items
end.sort_by { |book| 1 - book.jaccard_index }
end
end
For demonstration purposes…
# ...
# Read data and define array of books
BOOKS = DATA.read.split("\n").map { |l| Book.new(l) }
# Define current book
current_book = Book.new("Ruby programming language")
# Do recommendation...
books = BookRecommender.new(current_book, BOOKS).recommendations
books.each do |book|
puts "#{book.title} (#{'%.2f' % book.jaccard_index})"
end
__END__
Finding the best language for the job
Could Ruby save the day
Python will rock your world
Is Ruby better than Python
Programming in Ruby is fun
Python to the moon
Programming languages of the future
Sample output for title “Ruby programming language” with Jaccard index visible on the right side.
Programming in Ruby is fun (0.50)
Programming languages of the future (0.17)
Is Ruby better than Python (0.17)
Could Ruby save the day (0.14)
Finding the best language for the job (0.12)
Python to the moon (0.00)
Python will rock your world (0.00)
Conclusion
This is purely Ruby based solution - source code can be found on my gist. I’ve also written PostgreSQL versions that uses tags. Please have in mind that if the sets are bigger the code will get slower; so don’t do recommendations each time you “open” item. Define background service that will calculate recommendations in background!
Hope that you can use this in your projects and please - give me some feedback. Thanks! :)