Category Archives: Programming

Escaping URLs vs escaping HTML

I often get confused between two types of escaping you need to do when developing web applications: URL escaping and HTML escaping. This is a short note about when you should use which.

URL escaping (or HTTP encoding or URL encoding) is used to escape the characters that are not permitted in a URL (e.g. to generate the query string passed from a form). These characters are encoded as % followed by the hexadecimal value of the character's ASCII code, e.g. %20 for the space character. (RFC 1738 defines the syntax and semantics of URLs.)

HTML escaping is used when writing HTML documents where you do not want the browser to interpret the HTML special characters, e.g. you need to write &amp; to represent &. Wikipedia has an article describing escape sequences for commonly used special HTML characters.

Scripting languages have libraries which can do URL and HTML encoding and decoding for you:

Python: urllib.quote and urllib.unquote , ???, ???

Ruby: CGI::escape, CGI::unescape, CGI::escapeHTML, CGI::unescapeHTML
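
For current Python 3, the quoting functions live in urllib.parse rather than directly in urllib, and the standard html module covers the HTML side. The snippet below is only a small sketch of the difference between the two kinds of escaping; the sample string is made up for illustration.

```python
from html import escape, unescape
from urllib.parse import quote, unquote

raw = "fish & chips <today>"  # made-up sample input

# URL escaping: unsafe characters become % plus their hex code
url_part = quote(raw)
print(url_part)            # fish%20%26%20chips%20%3Ctoday%3E
print(unquote(url_part))   # fish & chips <today>

# HTML escaping: markup characters become named entities
html_part = escape(raw)
print(html_part)           # fish &amp; chips &lt;today&gt;
print(unescape(html_part)) # fish & chips <today>
```

Use the first when the text goes into a URL and the second when it goes into the body of an HTML page.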

Playing mp3 links right in your browser

The del.icio.us folks have a nifty piece of JavaScript code which adds a small play button to all the mp3 links on your webpage. (This in fact embeds a small Shockwave/Flash script for each link.) All you need to do is include the following code in the head section of your webpage:


<script type="text/javascript" src="http://del.icio.us/js/playtagger"></script>

Here is a link to their webpage. (Notice the small play button just before the link to the audio file.)

Here are some popular audio links contributed by del.icio.us users.

Update: And then there is Yahoo! Media player with nicer look and more features. You just need to add
<script type="text/javascript" src="http://mediaplayer.yahoo.com/js"></script>

at the end of the HTML (just before the closing </body> tag).

Yahoo Geocoding in ruby…

Find latitude and longitude of any address

Yahoo just released a new beta of their maps webservice. Here is a small Ruby script (inspired by Rasmus's PHP code) that I wrote; it returns the latitude and longitude of the address provided...

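require 'open-uri'
require 'rexml/document'
include REXML

url = 'http://api.local.yahoo.com/MapsService/V1/geocode?appid=yahoomap.rb&location='
puts 'Enter Location: '
# chomp drops the trailing newline that gets leaves on the input
address = gets.chomp
# URI.escape was removed in Ruby 3.0; this escapes a single query value
address = URI.encode_www_form_component(address)
result = URI(url + address).read
doc = Document.new(result)
r = doc.elements['/ResultSet/Result']
print "Precision: ", r.attributes["precision"], "\n"
# walk child elements only, skipping whitespace text nodes between them
r.elements.each { |c| print c.name, " : ", c.text, "\n" }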

Update: Here is a link to the script in github or to the .

Library Lookup greasemonkey script for San Diego public library and San Diego County library

I promised here that I would polish my library lookup script and post it, but I haven't yet found the time to do that. I am posting it as it is so someone can work on it and make it better...

I have tried to look for the ISBN in both SDCL and SDPL, and it inserts the message and the link correctly. However, in some cases Amazon inserts more elements inside the same div container, so the links appear much later (not immediately following the title). I have an idea of how to fix this and am going to fix it very soon...

Get the LibraryLookup Greasemonkey script from here.

(Usage: Right-click on the link and select "Install User Script" from the menu, OR open it in Firefox, go to Tools/Install this user script, and click OK. From now on you will see whether any book that you are browsing on amazon.com is in SDCL or SDPL!)

Update: 11/30/05. The new bookburro extension for Firefox supports the SDPL and SDCL libraries by default, so I will not have to update my miserly scripts now...

Django middleware

In the django project settings there is a key called MIDDLEWARE_CLASSES, which is a tuple of strings naming the classes that implement the middleware methods. The Django base handler (TBD explain what this class does) reads this setting and initializes three of its own attributes: _request_middleware, _view_middleware and _response_middleware. It goes through the list of middleware classes, instantiates each of them, and adds the bound process_request method to the _request_middleware attribute, the process_view method to _view_middleware and the process_response method to _response_middleware.

If a middleware method returns something (indicating that it has taken some action), the base handler's get_response method returns immediately. Otherwise it proceeds further.
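
To make the flow concrete, here is a stripped-down sketch in plain Python. It is not Django's actual code: the two middleware classes, the dict standing in for a request, and the string responses are all invented for illustration, and the handler takes classes directly instead of importing them from strings.

```python
class BlockBots:
    """Hypothetical middleware: short-circuits requests from a banned agent."""
    def process_request(self, request):
        if request.get("user_agent") == "badbot":
            return "403 Forbidden"
        return None  # nothing to do, let the request continue


class AddHeader:
    """Hypothetical middleware: post-processes the response."""
    def process_response(self, request, response):
        return response + " [X-Seen]"


class BaseHandler:
    """Toy handler that collects bound middleware methods into three lists."""
    def __init__(self, middleware_classes, view):
        self._request_middleware = []
        self._view_middleware = []
        self._response_middleware = []
        self._view = view
        for cls in middleware_classes:
            mw = cls()  # instantiate once, keep only the bound methods
            if hasattr(mw, "process_request"):
                self._request_middleware.append(mw.process_request)
            if hasattr(mw, "process_view"):
                self._view_middleware.append(mw.process_view)
            if hasattr(mw, "process_response"):
                self._response_middleware.append(mw.process_response)

    def get_response(self, request):
        # a middleware that returns something ends the processing right away
        for method in self._request_middleware:
            response = method(request)
            if response is not None:
                return response
        for method in self._view_middleware:
            response = method(request, self._view)
            if response is not None:
                return response
        response = self._view(request)
        for method in self._response_middleware:
            response = method(request, response)
        return response


handler = BaseHandler([BlockBots, AddHeader], lambda request: "hello")
print(handler.get_response({"user_agent": "firefox"}))  # hello [X-Seen]
print(handler.get_response({"user_agent": "badbot"}))   # 403 Forbidden
```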

There are 4 middleware classes built in:

  • AdminUserRequired middleware class implements the process_view method and silently returns if the user in the request is logged in and is a valid admin user. Otherwise it returns login page with appropriate error message.
  • CommonMiddleware implements the process_view and process_response methods. process_view rejects forbidden user-agents, prepends www to the URL and appends a trailing slash if desired. process_response checks whether there is a matching flat file that can be sent for 404s and can also email a note to the managers about broken links. process_response also manages ETags.
  • CacheMiddleware implements process_request to check pages (not containing any query string) against cache. process_response adds pages to cache as needed.
  • XViewMiddleware (used internally for documentation) implements process_view and attaches 'X-View' header to internal HEAD requests.

Update: 10/14/2005. There are some updates to the middleware that ships with django, and it is now "officially" documented here.

django decorators

The Django framework uses some design patterns. There is a directory called decorator which currently has two decorators. (A decorator is just a function that wraps an original function and dynamically adds functionality to it depending on the situation.)

  • funcA = login_required(funcA)
    This replaces funcA with a function that checks whether the user is logged in. It calls the original function if the user is indeed logged in, and redirects anonymous users to the login page.
  • funcB=cache_page(funcB,cache_timeout,key_prefix)
    The original function is changed to look into the cached pages.
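
The wrapping idea itself needs nothing from Django. Below is a minimal sketch of a login_required-style decorator in plain Python; it is not Django's implementation, and the dict used as a request, the "user" key and the redirect string are assumptions made up for the example.

```python
import functools


def login_required(view):
    """Return a replacement for view that only runs it for logged-in users."""
    @functools.wraps(view)
    def wrapper(request, *args, **kwargs):
        if request.get("user") is None:
            # anonymous user: send them to the (hypothetical) login page
            return "redirect:/login/"
        return view(request, *args, **kwargs)
    return wrapper


def show_profile(request):
    return "profile of " + request["user"]


# same pattern as funcA = login_required(funcA)
show_profile = login_required(show_profile)

print(show_profile({"user": "alice"}))  # profile of alice
print(show_profile({"user": None}))     # redirect:/login/
```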

Getting to know the django web framework

I was just about to abandon python and join the ruby camp to be able to use the wonderful rails framework for web application development. (They do have very good documentation and an impressive video demo which you should check out!) But then came the announcement of Django. I really like the python language and feel that I can understand someone else's python code (though ruby looks equally fascinating). I read through the tutorials, checked out the svn repository and worked with the tutorials. This framework looks easy to use and seemingly makes your application portable enough to use any of the underlying database backends (postgresql, mysql, sqlite) and webservers (apache, lighttpd, twistedweb etc). In the first few days after the announcement there were very hectic updates on the code and documentation front (with support for the sqlite backend, a standalone server, and a new tutorial and documentation about generic views and form fields coming in a matter of a couple of days). This has now gradually slowed down.

I decided to write a sample application (rebate tracking) and immediately hit some issues. I am trying to make the user registration and login/logout part work, but am not following how that is hooked into the framework. The users added with admin interface do not get recognized by the authentication code. Tried IRC help, but haven't been able to get anyone who can help. I am studying the code right now. I am going to look closely into the decorator and middleware code now. I will write about my progress here.

Greasemonkey: Control your web!

Greasemonkey is a plugin for the Firefox browser that lets you assign DHTML scripts to various domains. What's the big deal, you ask? This lets you correct some annoying problems some websites have or even add some nice features to your regular websites.

There are tons of user-contributed scripts for doing fun things, like adding waypoints to Google Maps, removing ads from indiatimes.com etc.

The disadvantages of this are: 1. it is Firefox specific, and 2. it works only on the computer where you installed the extension and scripts. But hey! it's still way cool...

Update: 02/22/05
Here is a very nice application of this: Let's say you are browsing amazon.com for some books. How about checking your local public library's catalogue for the same book and displaying that information right next to the Amazon book title? This has been done by Jon Udell (but it looks like he has taken down his script) and Bill Stilwell. I managed to tweak his script to search San Diego Public Library. Hurray! Contact me if you are interested. I will polish the script and put it here in a few days anyway...

Some more blog posts on greasemonkey:
http://taint.org/2005/03/16/201734a.html
http://javascript.weblogsinc.com/entry/1234000273026520/

**Update: 05/05/12**
Hmmm... [Dive into Greasemonkey](http://diveintogreasemonkey.org/ "Dive into Greasemonkey")

**Update: 05/18/05**
Wow this monkey is getting bigger and bigger.

[Here is](http://www.wired.com/news/technology/0,1282,67527,00.html "Firefox Users Monkey With the Web") a link to a recent wired article on greasemonkey.

Rebate tracker use cases

  • Adding a new rebate:
    Step 1 Enter zip code for mailing address (zip+ext)
    Step 2 A list of rebates with same submission zip code is displayed
    Step 3
    Case 1 - Click on one of the links to prepopulate rebate form with entered information.
    (Display Submission address, Contact information, rebate validity, postmark date, expect check date)
    OR
    Case 2 - Enter new rebate and capture all information:
    (Product Name*), (Rebate amount*), (Rebate Description*), (Purchase valid from*), (Purchase valid until*), (Must postmark by*), (Expect check in*), (to ) weeks, (Submission address*), (City*), (State*), (Zip*), (Zipext), (Product Website), (Rebate Form URL), (Enquiry Phone), (Enquiry Email), (Inquiry Website)
    Display info as in Case 1.

    Step 4 Get (Purchase Location), (Purchase Date), (Purchase Price), (Date Mailed), (Postage), (Status: one of Mailed, Processing, Approved, Declined, Received, Void), (date Completed)
    Allow user to save/cancel data

  • Change Status of a rebate
    Currently active (status != Declined, Received, Void) rebates are displayed. One of them is selected, display the same data as in Step 4. and allow user to change and save data.
  • Periodic run
    Go through currently active rebates and email reminder for all rebates falling after (mail date + Expect check low)
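
The periodic run could look roughly like the sketch below. The field names (status, date_mailed, expect_check_low_weeks) and the sample rebates are invented for illustration, and actually sending the email is left out.

```python
from datetime import date, timedelta

# statuses that mean a rebate is no longer active
INACTIVE = {"Declined", "Received", "Void"}


def rebates_due_for_reminder(rebates, today):
    """Return active rebates whose earliest expected check date has passed."""
    due = []
    for rebate in rebates:
        if rebate["status"] in INACTIVE:
            continue
        earliest_check = rebate["date_mailed"] + timedelta(
            weeks=rebate["expect_check_low_weeks"])
        if today > earliest_check:
            due.append(rebate)
    return due


# made-up sample data
rebates = [
    {"name": "router", "status": "Mailed",
     "date_mailed": date(2005, 1, 1), "expect_check_low_weeks": 6},
    {"name": "printer", "status": "Received",
     "date_mailed": date(2005, 1, 1), "expect_check_low_weeks": 6},
    {"name": "camera", "status": "Processing",
     "date_mailed": date(2005, 2, 20), "expect_check_low_weeks": 8},
]

for rebate in rebates_due_for_reminder(rebates, date(2005, 3, 1)):
    print("Reminder:", rebate["name"])  # Reminder: router
```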

Lateral Thinking…

How will you write a program to find jumbled words ?

The shotgun approach is the one anyone is bound to try first, i.e. for every permutation of the letters, check whether there is a match in the dictionary of words. You might do some optimizations to ignore repetitions etc., but an n-letter word has n! permutations, so this is an O(n!) solution.

I read an elegant way to solve this here. The trick is to notice that the real answer and the jumbled word look the same when their letters are sorted.
(Let's ignore the time to sort the words for now, which is O(n*log(n)) I believe for decent algorithms.)

Here is a python snippet to solve the jumble:

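#!/usr/bin/env python3
# Note: many systems keep the word list at /usr/share/dict/words instead.
def find_jumble(jumble, word_file='/usr/dict/words'):
    """Yield every dictionary word with the same letters as jumble."""
    sorted_jumble = sort_chars(jumble)
    # 'with' closes the file; iterating reads one line at a time
    with open(word_file, 'r') as words:
        for dictword in words:
            if sorted_jumble == sort_chars(dictword):
                yield dictword.strip()

def sort_chars(word):
    """Return the letters of word, lowercased and sorted."""
    return sorted(word.strip().lower())

while True:
    inp = input("Enter word: ")
    if not inp:
        break
    for ans in find_jumble(inp):
        print("Answer =", ans)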
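
If you want to solve many jumbles in a row, the same sorted-letters trick can be used to build a lookup table once, so each query becomes a single dictionary lookup instead of a scan of the word file. This is just a sketch of that variation; the function names and the tiny word list are made up for the example.

```python
from collections import defaultdict


def signature(word):
    """Sorted, lowercased letters of word, joined into a string key."""
    return "".join(sorted(word.strip().lower()))


def build_index(words):
    """Map each signature to the list of words that share it."""
    index = defaultdict(list)
    for word in words:
        index[signature(word)].append(word.strip().lower())
    return index


def solve(index, jumble):
    """Return all indexed words that are rearrangements of jumble."""
    return index.get(signature(jumble), [])


index = build_index(["listen", "silent", "enlist", "google"])
print(solve(index, "tinsel"))  # ['listen', 'silent', 'enlist']
print(solve(index, "xyz"))     # []
```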