More about Phishtank API

Here is what would be good to have from the phishtank.com API:

  • Good documentation for each interface, e.g. how is callback_url used by the auth.frob.request API?
  • Description of all possible fields in the response (all possible XML elements and their possible values)
  • Some test URLs and emails which will return known responses (i.e. phishy URL, good URL, not in the database etc.)
  • Developer mailing list/wiki
  • Response should always honor the responseformat parameter if specified and valid

Phish Tank

http://www.phishtank.com is a new service which aims to help weed out phishing URLs and email addresses using the wisdom of crowds. Users can submit emails/URLs which they suspect of fraud, and others can vote on whether they really are fraudulent. I think it is a great concept. There is a REST API through which applications can embed this web service. For example, there could be an Outlook plugin which displays "phishy" email addresses in a special way to alert the user immediately. The same goes for web browsers, which could render phishing websites with a special stylesheet. Applications can also add an interface for the user to submit suspect pages and emails easily without using a web browser.

I checked out the API and it does not feel like it is fully baked! There are interfaces for authorization and checking email/url status and submitting new emails/urls. Some things that stand out immediately are:

  • Exclusive use of SSL for the API access.
  • Parameter authentication (i.e. including a cryptographic digest of all the parameters to ensure that they are not changed by a man-in-the-middle attack)
  • Choice of xml or php output.

The api calling sequence works like this:

  1. User registers on the web for API access and gets api key and shared secret
  2. Using the API, application gets a frob (what is behind the name ?) and authorization url using auth.frob.request
  3. User has to authorize the frob using the authorization url specified in the response. (Optionally you can specify a callback url which the server will call for authorization. I will need to check this from home when I have access to a server -- the docs are very thin about the mechanism.)
  4. Once authorized, the app uses the frob to get a token for short-term API access (30 minutes in my tests) (auth.token.request)
  5. App can check the token status, which tells the remaining time on the token. (auth.token.status)
  6. App can revoke the token when it is done using it. (auth.token.revoke)
  7. The APIs for check.url, check.email, submit.url, submit.email then use the token.

I did not understand why there is a need for FROB in this, why can't you just get the token from api key and shared secret ? What problem are they solving by this indirection ?

Anyway, here is the ruby script that I used for testing this... I am planning to turn this into a module, but providing it here for early access...
phishtank.rb
config.yml

P.S. The check_url interface is not working: I am getting an invalid token error, even though the same token can be revoked successfully.
P.P.S. The API uses SSL (no cleartext api available) and ruby's open-uri library insists on checking the server SSL certificate which always fails (probably because signer needs to be trusted by openssl), I had to change it locally to ignore ssl verification in order to proceed.
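For reference, here is a minimal sketch of how the verification behaviour can be controlled with Net::HTTP instead of patching open-uri. The helper name https_client and the CA bundle path are hypothetical, and the host is simply the site name from this post, not a documented API endpoint. Turning verification off removes the protection against man-in-the-middle attacks, so pointing OpenSSL at a trusted CA bundle is the safer choice when one is available.

```ruby
require 'net/http'
require 'openssl'

# Build an HTTPS client object without opening a connection.
# https_client is a hypothetical helper, not part of the phishtank script.
def https_client(host, verify: true, ca_file: nil)
  http = Net::HTTP.new(host, 443)
  http.use_ssl = true
  # VERIFY_NONE reproduces the "ignore ssl verification" workaround (insecure).
  http.verify_mode = verify ? OpenSSL::SSL::VERIFY_PEER : OpenSSL::SSL::VERIFY_NONE
  # Optional CA bundle so that the server certificate can be verified.
  http.ca_file = ca_file if ca_file
  http
end

insecure = https_client('www.phishtank.com', verify: false)
puts insecure.verify_mode == OpenSSL::SSL::VERIFY_NONE
```

Nothing is sent over the network until a request is made on the returned object.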

Update (Oct/12/06): the check.url interface is finally working. For this API, the signature needs to be calculated before escaping the url. I refactored the ruby script a bit to remove redundant code and moved the configuration to a separate file. I still need to work on the response parser and make it general for all types of responses. XML parsing gets so ugly so fast, it's amazing!
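To illustrate the ordering issue, here is a rough sketch of "sign first, escape afterwards". The signing scheme itself is an assumption: this post does not spell it out, so the sketch uses an MD5 digest of the shared secret followed by the parameters sorted by name. The parameter names api_sig and token, and the helpers sign_params and signed_query, are made up for the example.

```ruby
require 'digest/md5'
require 'cgi'

# Hypothetical signing scheme (an assumption, not taken from the API docs):
# MD5 of the shared secret followed by name/value pairs sorted by name.
def sign_params(params, shared_secret)
  base = params.sort.map { |name, value| "#{name}#{value}" }.join
  Digest::MD5.hexdigest(shared_secret + base)
end

# Sign the raw (unescaped) values first, then escape everything
# while building the query string.
def signed_query(params, shared_secret)
  signature = sign_params(params, shared_secret)
  params.merge('api_sig' => signature)
        .sort
        .map { |name, value| "#{CGI.escape(name)}=#{CGI.escape(value)}" }
        .join('&')
end

params = { 'url' => 'http://example.com/login?user=a b', 'token' => 'hypothetical-token' }
puts signed_query(params, 'hypothetical-secret')
```

If the values were escaped before signing, the digest would be computed over different bytes than the ones the server sees after decoding, which would be one way to end up with a rejected request.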

Book Update

Haven't been updating the blog recently... Here are some books I read in the last few weeks...

  • A View from the TOP (Audio Book) by Zig Ziglar. A very good audio programme about achieving significance in all aspects of life - Health, Finance, Relationships, Spirituality. This may be the first time I encountered someone being so open about his religious beliefs in a self-help program.
  • Think and Grow Rich by Napoleon Hill - about creating a burning desire to achieve success and generating ideas.
  • Digital Fortress by Dan Brown - I was fascinated by the two earlier books by Dan Brown - Angels and Demons and The DaVinci Code - and this one deals with topics that are dearer to me - Security, Encryption, NSA. But I did not find it as gripping as the first two. I was particularly turned off by the concepts (e.g. mutation strings) that the author invents for the story to advance; such things just turn the brain off. (That makes me think that maybe I enjoyed the first two books because I do not have any knowledge about the topics of Pope, Illuminati, Christian history)
  • Deception Point by Dan Brown. This was even more boring, about a new discovery by NASA, the politics, cover-ups, yawn...

Ruby script to get a list of all mp3 files in a directory sorted durationwise.

Here is some ruby practice...

require 'mp3info'

# Return [path, length_in_seconds] pairs for every file matching the glob pattern.
def mp3len(pattern)
  summary = []
  Dir[pattern].each do |f|
    Mp3Info.open(f) do |info|
      summary.push([f, info.length])
    end
  end
  summary
end

puts "Enter directory containing mp3 files"
dirname = File.expand_path(gets.chomp)
puts "Searching #{dirname}"
pattern = dirname + "/**/*.mp3"

# Sort by duration, shortest first.
s = mp3len(pattern).sort { |a, b| a[1] <=> b[1] }
s.each { |q| puts "#{q[0]} => #{q[1]} seconds" }

Update: Here is the link to the file on github (for updates) or the git repository

Using GPG from behind proxies.

I have struggled a lot to get GPG working inside corporate firewalls. It is cumbersome to set up the tools to automatically request keys from a keyserver for signature validation. I finally found the magic options for doing this from behind an HTTP proxy. Just keeping this command here for reference.

gpg --keyserver x-hkp://pgp.mit.edu --keyserver-options honor-http-proxy --recv-key 060C80C2

Fair Use and Google Book Search

There is a big copyright violation fight going on between Copyright Holders and Google about what is fair use and what is a violation. I came across a great article by Cory Doctorow on this issue. He is firmly on the side of Google on this issue and lists the three main points of contention:

  1. Google should cut copyright holders in for a slice of any revenue that comes from this.
  2. Google should have obtained permission before scanning the GBS books
  3. Although Google only shows excerpts, wily hackers could eventually piece together enough excerpts to reproduce the entire GBS library and then post it on the Internet

He then goes on to explain why each of the three points is invalid. As he rightly points out, the biggest threat to an author isn't piracy, it's obscurity.

He also quotes Tim O'Reilly, who says piracy is progressive taxation. This appears in this article, which is also a must read. That article was written in 2002 and was about the legality of online file sharing.

If I were a copyright holder (I mean a big copyright holder), my own stand would be to let Google scan and index all my work (with possible penalties if any evidence was found that people can hack Google's system to reconstruct the complete work). The publishing industry will definitely move online and I will gain more if people can find links to my work when they are searching for related content.

Book update

Here are the books read in the past few weeks:

Blink : The Power of Thinking Without Thinking by Malcolm Gladwell is about the snap judgements (the author calls this thin-slicing) that we make about things and people. It gives a lot of examples where people make judgements about certain objects (e.g. whether a statue is genuine or fake) or people (whether a teacher is good or not). In many cases the judgements are amazingly correct with no scientific logic behind them, but in other cases they are plain wrong. Malcolm has also written about the topic in this New Yorker article. Though the book has a lot of fascinating examples, I found much of the material common sense and, as you would guess, there is no clear verdict on whether such judgements are good or bad.

How to Win Friends and Influence People by Dale Carnegie is about the various things that make you popular among people. As we all find out by experience, the people who are most popular and/or make a lot of money are not necessarily geniuses, but they all have very good people skills. Some pieces of advice from the book: Never criticize, Praise (not flatter) people to give them importance, Make people want to do the things that you want them to do, Smile. Overall a good book with a nice conversational style (though the examples given are of people from way back in the past - hey, the book was written in 1936!)

Permission Marketing : Turning Strangers Into Friends And Friends Into Customers by Seth Godin is about a new way of marketing where the marketer, instead of interrupting the consumer, builds a long-term relationship with him. Marketers offer some goodies in order to get permission to send messages to consumers and then continue offering more and more bait to obtain more permissions. A very good read. Even though Seth works for Yahoo, it seems like their competitor is using his concepts in much more effective ways.

Emacs Rectangle Editing quick help

Enter picture mode (ESC-x picture-mode)

Marking rectangle

  • Go to top left corner, press CTRL-Space.
  • Go to bottom right, press CTRL-x CTRL-x (This selects the rectangle)

Working on rectangle

  • Press CTRL-x r k to kill the rectangle (and make it available as 'last-killed-rectangle')
  • Press CTRL-x r d to delete the rectangle
  • Press CTRL-x r y to yank 'last-killed-rectangle' with its upper left corner at point.
  • Press ESC-x clear-rectangle to fill the rectangle with spaces.

Finally quit the picture mode with CTRL-c CTRL-c

Book updates

This blog is turning into just a list of books I am reading. Anyway, here are some new books read in the last month or so:

Execution: The Discipline of Getting Things Done -- by Larry Bossidy and Ram Charan (Re-read).

Working with Emotional Intelligence -- by Daniel Goleman - develops the ideas of Emotional Intelligence further and provides practical approach for implementing it. The concept of Emotional Intelligence (to me) is very obvious and I did not have much to take from this book.

Permission Marketing : Turning Strangers Into Friends And Friends Into Customers -- by Seth Godin

Currently reading: Ready for Anything: 52 Productivity Principles for Work and Life -- by David Allen.

Escaping URLs vs escaping HTML

I often get confused between the two types of escaping you need to do when developing web applications: URL escaping and HTML escaping. This is a short note about when to use which.

URL escaping (or HTTP encoding or URL encoding) is used to escape characters that are not permitted in a URL (e.g. to generate the query string to be passed from forms). These characters are encoded as % followed by the hexadecimal ASCII code of the character, e.g. %20 for the space character. (RFC 1738 defines the syntax and semantics of URLs)

HTML escaping is used when writing HTML documents where you do not want the browser to interpret the HTML special characters, e.g. you need to use &amp; to represent &. Wikipedia has an article describing escape sequences for commonly used special HTML characters.

Scripting languages have libraries which can do URL and HTML encoding and decoding for you:

Python: urllib.quote and urllib.unquote , ???, ???

Ruby: CGI::escape, CGI::unescape, CGI::escapeHTML, CGI::unescapeHTML
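Here is a quick sketch of the difference using the Ruby functions listed above. The sample strings are made up. Note that CGI.escape uses the form-encoding convention, so a space comes out as + rather than %20.

```ruby
require 'cgi'

query_value = 'Tom & Jerry?'        # goes into a URL query string
html_text   = '<b>Tom & Jerry</b>'  # goes into the body of an HTML page

# URL escaping: reserved characters become %XX, spaces become +.
url_escaped = CGI.escape(query_value)
puts url_escaped    # Tom+%26+Jerry%3F

# HTML escaping: only the characters special to HTML are replaced.
html_escaped = CGI.escapeHTML(html_text)
puts html_escaped   # &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;

# Both conversions are reversible.
puts CGI.unescape(url_escaped) == query_value
puts CGI.unescapeHTML(html_escaped) == html_text
```

The same & ends up as %26 in one case and &amp; in the other, which is exactly why mixing up the two kinds of escaping produces broken links or broken markup.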

Playing mp3 links right in your browser

The del.icio.us folks have a nifty piece of javascript which adds a small play button to all the mp3 links on your webpage. (This in fact embeds a small shockwave/flash script for each link.) All you need to do is include the following code in the head section of your webpage:


<script type="text/javascript" src="http://del.icio.us/js/playtagger"></script>

Here is a link to their webpage (notice the small play button just before the link to the audio file).

Here are some popular audio links contributed by del.icio.us users.

Update: And then there is Yahoo! Media player with nicer look and more features. You just need to add
<script type="text/javascript" src="http://mediaplayer.yahoo.com/js"></script>

at the end of the html (just before closing </body>)

Yahoo Geocoding in ruby…

Find latitude and longitude of any address

Yahoo just released a new beta of their maps webservice. Here is a small ruby script (inspired by Rasmus's PHP code) that I wrote, which returns the latitude and longitude of the address provided...

require 'open-uri'
require 'rexml/document'
include REXML

url = 'http://api.local.yahoo.com/MapsService/V1/geocode?appid=yahoomap.rb&location='
puts 'Enter Location: '
address = gets.chomp
# URI.escape was removed in Ruby 3.0; encode the query value explicitly.
address = URI.encode_www_form_component(address)
result = URI(url + address).read
doc = Document.new(result)
r = doc.elements["/ResultSet/Result"]
print "Precision: ", r.attributes["precision"], "\n"
# Print each child element of the result, skipping whitespace text nodes.
r.elements.each { |c| print c.name, " : ", c.text, "\n" }

Update: Here is a link to the script on github.