Category Archives: ruby

Quora answers “Brilliant” problem

Came across this question on Quora (Quora is an infinite time sink BTW!) The original question is at

Wonder why people post puzzle-type questions on Quora!

Anyway, the highest-ranked answer gives a nice way to approach the problem. I would not have been able to think that way, so I just used brute force.
Here is the ruby code to solve it:

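def testit(a)
  # Split a three-digit number into its hundreds, tens and units digits
  x, y, z = a / 100 % 10, a / 10 % 10, a % 10
  a == x*x + y*y + z*z + 543
end

# Keep every three-digit candidate that passes the test
q = (111..999).select { |n| testit(n) }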

The answer is 2626 (the sum of the numbers that pass the test), the last 3 digits being 626.
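As a quick check of that total, here is a minimal sketch that reruns the brute-force search and adds up the matches. The names digit_square_match?, matches and total are only for illustration, and summing the matches is my reading of how the final answer is formed.

```ruby
# True when n equals the sum of the squares of its three digits plus 543
def digit_square_match?(n)
  x, y, z = n / 100 % 10, n / 10 % 10, n % 10
  n == x * x + y * y + z * z + 543
end

matches = (111..999).select { |n| digit_square_match?(n) }
total = matches.sum

p matches        # the numbers that pass the test
puts total       # their sum
puts total % 1000 # last three digits of the sum
```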

RSS Feeds for Indian Columnists

Frustrated by Indian Express's inability to provide individual syndication feeds for its columnists, I have written scripts to parse the HTML pages and generate the feeds myself.

Here are the feeds for:
Shekhar Gupta
Tavleen Singh
R. Jagannathan (DNA India)
Arun Shourie
Sudheendra Kulkarni
Ila Patnaik
Pratap Bhanu Mehta

If you want this for another columnist, let me know and I will add that too. This is very easy to do for Indian Express columnists as I already have the script, but I can also help with other websites.

P.S. The script is in ruby and I will release the source after I fix some things and clean it up some more.

Update: May 29, 2009
Added new feeds for Arun Shourie, Sudheendra Kulkarni and Ila Patnaik

Update: June 22, 2009 - Added columns feed for R. Jagannathan of DNA India

Update: August 11, 2009 - Feeds for C. Rajamohan, Harsha Bhogle, Shailaja Bajpai.

Update: Sept. 10, 2009 - You can use my shared page from Google Reader to see all the new posts from all of these columnists on a single page.

Update: July 18, 2011 - Added Karan Thapar.

opml to csv converter

This is a first step in being able to make all my planets configurable from anywhere. The following ruby script parses the OPML file specified on the command line and generates a comma-separated file with the XML feed URL and feed title. In case of nested outline elements, it picks only the elements which actually have an xmlUrl attribute. This flattens the OPML hierarchy, which Google Reader uses to implement labels and Bloglines uses to implement folders.

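require 'csv'
require "rexml/document"
include REXML

# Use the OPML file named on the command line, else fall back to opml.xml
if ARGV.length >= 1
  fname = ARGV[0]
else
  fname = "opml.xml"
end

doc = Document.new(File.new(fname))
CSV.open('csvfile.csv', 'w') do |writer|
  # One CSV row per feed outline: feed URL, then feed title
  doc.elements.each("//outline[@type='rss']") { |element|
    writer << [element.attribute("xmlUrl").value, element.attribute("text").value]
  }
end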
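To illustrate the flattening, here is a small self-contained sketch that selects on the xmlUrl attribute itself rather than on type. The sample OPML, the feed URLs and the feed_rows helper are all made up for the example.

```ruby
require 'rexml/document'

# Made-up OPML with one nested (folder/label) outline and one top-level feed
SAMPLE_OPML = <<~XML
  <opml version="1.0">
    <body>
      <outline text="News">
        <outline text="Example Feed" type="rss" xmlUrl="http://example.com/feed.xml"/>
      </outline>
      <outline text="Top Level Feed" type="rss" xmlUrl="http://example.org/rss"/>
    </body>
  </opml>
XML

# Return [feed URL, feed title] pairs for every outline that carries an xmlUrl,
# however deeply it is nested; folder outlines without xmlUrl are skipped.
def feed_rows(opml_text)
  doc = REXML::Document.new(opml_text)
  rows = []
  doc.elements.each("//outline[@xmlUrl]") do |el|
    rows << [el.attributes["xmlUrl"], el.attributes["text"]]
  end
  rows
end

feed_rows(SAMPLE_OPML).each { |url, title| puts "#{url},#{title}" }
```

The "News" outline has no xmlUrl, so it drops out and only the two feeds remain.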

Using Ruby for integer format conversions

Here are some recipes for interpreting a stream of bytes as different C types. I will keep adding more to this as I go...

Convert a byte stream (embedded in a string) into 1 byte signed integers:
=> [-4, -3, -2, -1]
Convert a byte stream (embedded in a string) into 1 byte unsigned integers:
=> [252, 253, 254, 255]
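One way to get results like the two above is String#unpack, where the directive "c" reads signed 8-bit integers and "C" reads unsigned ones. This is a minimal sketch; the four sample bytes are chosen to match the outputs shown and may not be the exact input used originally.

```ruby
# Four raw bytes: 0xFC, 0xFD, 0xFE, 0xFF (as a binary string)
bytes = "\xFC\xFD\xFE\xFF".b

signed   = bytes.unpack("c*")  # each byte as a signed 8-bit integer
unsigned = bytes.unpack("C*")  # each byte as an unsigned 8-bit integer

p signed
p unsigned
```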

Fixing the geocoder gem.

I wanted to use the ruby geocoder library on a Windows machine, but the installation of the gem failed with a weird error. I checked the rubyforge project page to see if someone else had a similar problem. Someone had, but the bug had been open for a long time. I decided to fix the issue and found that the problem was that the Windows platform does not allow the characters '?' and '&' in a filename, with or without escaping, period. The files in question were used (in a very innovative way, I must say!) to test the library by modifying http.rb to return the test data file contents instead of fetching the URL from the net (yay, open classes in ruby!). I fixed the problem by changing the filenames to use '_' instead of '?' and '__' instead of '&'.
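The renaming rule can be sketched in a couple of lines. The helper name windows_safe_name and the sample filename are made up for illustration, and this is not the actual patch that went into the gem.

```ruby
# Map the characters that broke the install to underscore sequences:
# '?' becomes '_' and '&' becomes '__'
def windows_safe_name(name)
  name.gsub("&", "__").gsub("?", "_")
end

puts windows_safe_name("geocode?appid=x&location=y")
```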

I wrote to the developer, but there was no response. Anyway, I managed to create a new gem with the changed files, so it should be installable on Windows now. Here are the files if you want to try installing the gem. (Also including the tgz because... it got generated anyway!)

More about Phishtank API

Here is what will be good-to-have from API:

  • Good documentation about each interface, e.g. how is callback_url used by the auth.frob.request API?
  • Description of all possible fields in the response (all possible XML elements and their possible values)
  • Some test URLs and emails which will return known responses (i.e. phishy URL, good URL, not in the database etc.)
  • Developer mailing list/wiki
  • Response should always honor the responseformat parameter if specified and valid

Phish Tank is a new service which aims to help weed out phishing URLs and email addresses using the wisdom of the crowds. Users can submit emails/URLs which they suspect of fraud, and others can vote on whether they really are fraudulent or not. I think it is a great concept. There is a REST API through which applications can embed this web service. So for example, there could be an Outlook plugin which displays "phishy" email addresses in a special way in order to alert the user immediately. The same goes for web browsers, which can render phishing websites with a special stylesheet. Applications can also add an interface for the user to submit suspect pages and emails easily without using a web browser.

I checked out the API and it does not feel like it is fully baked! There are interfaces for authorization and checking email/url status and submitting new emails/urls. Some things that stand out immediately are:

  • Exclusive use of SSL for the API access.
  • Parameter authentication (i.e. including cryptographic digest of all the parameters to ensure that parameters are not changed using man-in-the-middle attack)
  • Choice of xml or php output.

The api calling sequence works like this:

  1. User registers on the web for API access and gets api key and shared secret
  2. Using the API, the application gets a frob (what is behind the name?) and an authorization URL using auth.frob.request
  3. User has to authorize the frob using the authorization URL specified in the response. (Optionally you can specify a callback URL which the server will call for authorization. I will need to check this from home when I have access to a server; the docs are very thin about the mechanism.)
  4. Once authorized, the app uses the frob and gets a token for short-term API access (30 minutes in my tests) (auth.token.request)
  5. App can check the token status, which tells the remaining time on the token. (auth.token.status)
  6. App can revoke the token when it is done using it. (auth.token.revoke)
  7. The other APIs (check.url, submit.url and so on) then use the token.

I did not understand why there is a need for a frob in this. Why can't you just get the token from the API key and shared secret? What problem are they solving with this indirection?

Anyway, here is the ruby script that I used for testing this... I am planning to turn this into a module, but providing it here for early access...

P.S. The check_url interface is not working: I am getting an invalid token error, yet the same token can be revoked successfully.
P.P.S. The API uses SSL (no cleartext API is available), and ruby's open-uri library insists on checking the server's SSL certificate, which always fails (probably because the signer needs to be trusted by openssl). I had to change it locally to ignore SSL verification in order to proceed.
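For reference, an alternative to patching open-uri locally is to build the HTTPS client yourself with Net::HTTP and turn verification off there. This is only a sketch for throwaway testing: the helper name and the URL are placeholders, and skipping certificate checks removes the protection SSL is supposed to give.

```ruby
require 'net/http'
require 'openssl'
require 'uri'

# Build (but do not open) an HTTPS client that skips certificate verification.
# Placeholder helper for testing only; do not use this against real traffic.
def insecure_https_client(url)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == "https")
  http.verify_mode = OpenSSL::SSL::VERIFY_NONE # disables certificate checks
  http
end

client = insecure_https_client("https://api.example.invalid/")
puts client.use_ssl?
```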

Update (Oct/12/06): the check.url interface is finally working. For this API, the signature needs to be calculated before escaping the URL. I refactored the ruby script a bit to remove redundant code and moved the configuration to a separate file. I still need to work on the response parser and make it general for all types of responses. XML parsing gets so ugly so fast, it's amazing!
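The ordering issue can be sketched like this. The signing scheme below (MD5 of the shared secret followed by the name-sorted parameters) and the parameter name sig are assumptions made purely for illustration, not the documented PhishTank scheme. The point is only that the digest is taken over the raw values and escaping happens afterwards.

```ruby
require 'cgi'
require 'digest/md5'

# Hypothetical signing scheme (an assumption, not the documented one):
# MD5 of the shared secret followed by the parameters sorted by name.
def sign(params, shared_secret)
  raw = params.sort.map { |k, v| "#{k}#{v}" }.join
  Digest::MD5.hexdigest(shared_secret + raw)
end

# Sign the raw values first, then escape everything for the query string.
def signed_query(params, shared_secret)
  signature = sign(params, shared_secret)
  all = params.merge("sig" => signature) # "sig" is a made-up parameter name
  all.map { |k, v| "#{CGI.escape(k)}=#{CGI.escape(v)}" }.join("&")
end

params = { "url" => "http://example.com/login?a=1&b=2", "token" => "abc" }
puts signed_query(params, "not-a-real-secret")
```

Signing the already-escaped URL would produce a different digest, which is one plausible way to end up with a rejected request.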

Ruby script to get a list of all mp3 files in a directory sorted durationwise.

Here is some ruby practice...

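require 'mp3info'

# Return [filename, length in seconds] pairs for every file matching the glob
def mp3len(dirname)
  lengths = []
  Dir[dirname].each do |f|
    Mp3Info.open(f) do |info|
      lengths << [f, info.length]
    end
  end
  lengths
end

puts "Enter directory containing mp3 files"
dirname = File.expand_path(gets.chomp)
puts "Searching #{dirname}"
dirname += "/**/*.mp3"

# Sort by duration, shortest first
s = mp3len(dirname).sort { |a, b| a[1] <=> b[1] }
s.each { |q| puts "#{q[0]} => #{q[1]} seconds" }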

Update: Here is the link to the file on github (for updates) or the git repository

Escaping URLs vs escaping HTML

I often get confused between the two types of escaping you need to do when developing web applications: URL escaping and HTML escaping. This is a short note about when to use which.

URL escaping (also called HTTP encoding or URL encoding) is used to escape the characters not permitted in a URL (e.g. to generate the query string to be passed from forms). These characters are encoded as % followed by the hexadecimal ASCII character code, e.g. %20 for the space character. (RFC 1738 defines the syntax and semantics of URLs.)

HTML escaping is used when writing HTML documents where you do not want the browser to interpret the HTML special characters, e.g. you need to use &amp; to represent &. Wikipedia has an article describing escape sequences for commonly used special HTML characters.

Scripting languages have libraries which can do URL and HTML encoding and decoding for you:

Python: urllib.quote and urllib.unquote (urllib.parse.quote and urllib.parse.unquote in Python 3), html.escape, html.unescape (Python 3)

Ruby: CGI::escape, CGI::unescape, CGI::escapeHTML, CGI::unescapeHTML
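Here is a minimal sketch of the difference using the Ruby functions listed above. The sample strings are made up. Note that CGI.escape writes a space as + (the form-encoding convention) rather than %20.

```ruby
require 'cgi'

# URL escaping: for values that go into a query string
url_escaped = CGI.escape("Tom & Jerry")

# HTML escaping: for text that goes into an HTML page
html_escaped = CGI.escapeHTML("Tom & Jerry <b>")

puts url_escaped
puts html_escaped

# Both directions are reversible
puts CGI.unescape(url_escaped)
puts CGI.unescapeHTML(html_escaped)
```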

Yahoo Geocoding in ruby…

Find latitude and longitude of any address

Yahoo just released a new beta of their maps web service. Here is a small ruby script (inspired by Rasmus's PHP code) that I wrote that returns the latitude and longitude of the address provided...

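require 'open-uri'
require "rexml/document"
include REXML
puts 'Enter Location: '
# `result` holds the XML returned by the geocoding request for that location
doc = Document.new(result)
# Assumes each match is a <Result> element carrying a precision attribute
doc.elements.each("//Result") do |r|
  print "Precision: ", r.attributes["precision"], "\n"
  r.elements.each { |c| print c.name, " : ", c.text, "\n" }
end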
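The parsing half of the script can be tried without the web service. In this sketch the XML is a made-up response: the element names, the precision value and the coordinates are assumptions for illustration, not real output from Yahoo.

```ruby
require 'rexml/document'

# Made-up response for illustration only; names and values are assumptions
SAMPLE_RESPONSE = <<~XML
  <ResultSet>
    <Result precision="address">
      <Latitude>37.0</Latitude>
      <Longitude>-122.0</Longitude>
    </Result>
  </ResultSet>
XML

# Collect one line for the precision attribute and one per child element
def describe_results(xml)
  lines = []
  doc = REXML::Document.new(xml)
  doc.elements.each("//Result") do |r|
    lines << "Precision: #{r.attributes['precision']}"
    r.elements.each { |c| lines << "#{c.name} : #{c.text}" }
  end
  lines
end

puts describe_results(SAMPLE_RESPONSE)
```

Iterating over r.elements rather than r.children skips the whitespace text nodes between tags, which have no name.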

Update: Here is a link to the script on github.