Fun with Ferret

Posted on September 05, 2006

I've been using Lucene for many years on many different projects. Thanks to Dave Balmain, the library has been ported to Ruby, and the port seems to be even faster than the original. It's called Ferret. I've just started playing with it, and so far it looks very promising.

Some time ago, Zenspider posted a neat trick for searching the ri database. Since Ruby 1.8.5, this database has grown significantly in size, and every installed gem extends it with its own documentation. Walking through all those YAML files and grepping for something interesting can take a while.

This particular task is a good fit for what Ferret offers.
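
To see why an index helps here, below is a toy inverted index in plain Ruby. It is only a sketch of the general idea, not how Ferret is implemented, and the sample entries and helper names (`build_inverted_index`, `lookup`) are made up for illustration.

```ruby
# Toy inverted index: maps each lowercased word to the names of the
# entries whose text contains it. Illustration only, not Ferret's internals.
def build_inverted_index(docs)
  index = Hash.new { |hash, term| hash[term] = [] }
  docs.each do |name, text|
    text.downcase.scan(/\w+/).uniq.each { |term| index[term] << name }
  end
  index
end

# Hypothetical sample data standing in for ri documentation entries.
SAMPLE_DOCS = {
  "Process::kill" => "Sends the given signal to the specified process id",
  "String#split"  => "Divides str into substrings based on a delimiter",
  "Array#join"    => "Returns a string created by joining the elements"
}

INVERTED = build_inverted_index(SAMPLE_DOCS)

# A lookup is a single hash access instead of a scan over every file.
def lookup(index, term)
  index.fetch(term.downcase, [])
end
```

The index is built once, and after that each search only touches the entries for the requested term rather than every YAML file.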

Install Ferret:

    $ sudo gem install ferret

Build the index using the following script (ri_indexer):

    $ cat ri_indexer
    #!/usr/bin/env ruby

    require "rdoc/ri/ri_driver"
    require "rubygems"
    require "ferret"
    require "find"
    require "yaml"
    include Ferret

    INDEX_FILE = File.expand_path('~/.ri_index')

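    # Describe the fields: :name is kept without term vectors,
    # :content is indexed for searching but not stored in the index.
    fis = Index::FieldInfos.new
    fis.add_field :name, :term_vector => :no
    fis.add_field :content, :store => :no
    fis.create_index(INDEX_FILE)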

    index = I.new(:path => INDEX_FILE, :create => true)

    dirs = RI::Paths::PATH
    dirs.each do |dir|
      Find.find(dir) do |fn|
        next unless File.file?(fn)
        doc = YAML.load(File.read(fn))
        next unless doc.respond_to?(:comment)
        next unless doc.comment
        index << {
          :name => doc.full_name,
          :content => doc.comment.map { |f| f.body if f.respond_to?(:body) }.join("\n")
        }
      end
    end
    index.optimize
    index.close
    $ ./ri_indexer
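
The `:content` expression in the indexer maps every comment fragment to its body, or to `nil` when the fragment has no `body` method. The sketch below replays that step with made-up stand-in classes (`TextFragment`, `BlankFragment`) rather than the real ri classes, and compares it with a variant that drops the `nil` entries first.

```ruby
# Hypothetical stand-ins for the fragment objects found in an ri comment.
TextFragment = Struct.new(:body)  # a fragment that carries text
class BlankFragment; end          # a fragment with no #body method

# Same expression as the indexer: fragments without #body become nil,
# and Array#join renders nil as an empty string.
def flatten(fragments)
  fragments.map { |f| f.body if f.respond_to?(:body) }.join("\n")
end

# Variant that drops the nils first, so no empty lines are produced.
def flatten_compact(fragments)
  fragments.map { |f| f.body if f.respond_to?(:body) }.compact.join("\n")
end

FRAGMENTS = [TextFragment.new("first"), BlankFragment.new, TextFragment.new("second")]
```

Here `flatten(FRAGMENTS)` leaves an empty line between the two bodies, while `flatten_compact(FRAGMENTS)` does not. The empty lines contain no terms, so they should be harmless for searching; `compact` is just a little tidier.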

Now you can use this script for searching (ri_search):

    $ cat ri_search
    #!/usr/bin/env ruby

    require "rubygems"
    require "ferret"
    require "rdoc/ri/ri_driver"
    include Ferret

    INDEX_FILE = File.expand_path('~/.ri_index')

    query = ARGV.join(' ')
    ARGV.clear

    RI::Options.instance.use_stdout = true
    ri = RiDriver.new
    index = I.new(:path => INDEX_FILE)
    index.search_each(query) do |id, score|
      puts
      begin
        ri.get_info_for(index[id][:name])
      rescue Exception
        puts $!.message
      end
    end
    $ ./ri_search kill

This is much faster than the original script. It also accepts quite sophisticated query expressions. For example:

    $ ri_search rescue AND public
    $ ri_search +split -String
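
In the second query, `+` marks a term that must be present and `-` a term that must not be. The plain-Ruby sketch below mimics just that rule on a string of text. It is an illustration of the semantics under my own simplifications (whole-word matching, unprefixed terms ignored), not Ferret's query parser, and `parse_terms` and `matches?` are made-up helper names.

```ruby
# Split a query into required ("+term") and prohibited ("-term") words.
# Tokens without a prefix are ignored in this simplified sketch.
def parse_terms(query)
  required = []
  prohibited = []
  query.split.each do |token|
    case token
    when /\A\+(.+)/ then required << $1.downcase
    when /\A-(.+)/  then prohibited << $1.downcase
    end
  end
  [required, prohibited]
end

# True when the text contains every required word and no prohibited word.
def matches?(text, query)
  words = text.downcase.scan(/\w+/)
  required, prohibited = parse_terms(query)
  required.all? { |term| words.include?(term) } &&
    prohibited.none? { |term| words.include?(term) }
end
```

With this rule, a text that mentions "split" but not "String" matches `+split -String`, while one that mentions both does not.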

Refer to Ferret's Trac website for more information about this wonderful library.
