Category Archives: Python

First impressions on Google Wave

I was really looking forward to get a chance to play with Google Wave. If you have not heard about this, Google is experimenting with a new means of communication and collaboration (to destroy any remains of your unused time!). Watch the video of the demo from Google IO conference to get a feel for the technology.

The catch line is - How would email be if it is designed today ?

Being Google, they are fairly open about the entire technology.Google Wave is a collection of several components which work in concert to bring us this amazing way to collaborate and communicate. There is the wave server (which hosts the waves. Google provides an implementation and others are free to implement it in their own control), federation protocol (which is open specifications protocol and allows the servers to talk to each other), the client (typically your web browser which you use to interact with the wave server, but there is a sample text client and emacs based client in development as well!), the gadgets (small pieces of code that are embedded in documents and provide rich look and feel and additional functionality to the wave) and the robots (robot participants in the wave which can do cool things like correct spelling as you type, syntax highlight code while it is being pasted in the wave, translate language etc.)

I have spent some time in developing a robot called Nokar (meaning assistant or servant in Marathi/Hindi) which can do several things when invited to a conversation - Insert images based on specified keywords, translate text between a set of 20 languages among some other geeky functions. The intention was to learn about the robot protocol. I also created some pages which use the embed API. This allows any web page to embed a wave conversation (or a subset of it). I am also going to experiment with the Gadgets in the next few weeks. I will try to document my process in next few posts.

Simple way to share files on intranet.

If you want to download files from machine A to machine B and have python installed on machine A, here is a very simple way to do it:

On machine A, open a command window and change directory to where the files are and run this command:

python -m SimpleHTTPServer

This command starts a web server serving files from that directory.

On machine B, just open a browser and type the ip address of machine A and port 8000 and you can see all the files. When the transfer is done, simply press control-C in the command window. A simple way to temporarily share your files across the network!

django unicode integration: fix for venus djando template

I just upgraded django tree which recently merged in the unicode support. This immediately broke django templates for venus. Here is what you need to change in planet/shell/dj.py to account for new django changes:

43c43,46
< f.write(t.render(context)) --- > ss = t.render(context)
> if isinstance(ss,unicode):
> ss=ss.encode('utf-8')
> f.write(ss)

This is probably due to render returning unicode strings which need to be converted to byte-streams.

Update: I found out that my changes broke it for people using older version of django. I have updated the patch above to account for that.

django queryset weirdness

I needed to reset the django admin password and found this page which tells us how to do this. It did not work for me, however I tried to get the user object explicitly, reset the password and save it and it worked!
Here is a session describing the behavior:

> python manage.py shell
In [1]: from django.contrib.auth.models import User
In [2]: u=User.objects.all()
In [3]: u[0].password
Out[3]: 'sha1$0913d$6c5cfefb89b3c77dc8573e466a943c7acd177f6b'
In [9]: u[0].set_password('testme')
In [10]: u[0].save()
In [11]: u[0].password
Out[11]: 'sha1$0913d$6c5cfefb89b3c77dc8573e466a943c7acd177f6b'
In [12]: q=User.objects.get(id=1)
In [13]: q
Out[13]: <user : root>
In [14]: q.set_password('testme')
In [15]: q.save()
In [16]: q.password
Out[16]: 'sha1$24e9c$868ff39f08c3bde96397e33ea6a8847a658a16bc'

Maybe this is related to queryset caching, which would print the same password even after calling set_password. But it looks like in the first method, the password does not even get written to the database. This might be a bug. I will need to run more tests and report (or possibly patch) it...

Update: Perhaps I am not clear in the above post. The weirdness is that in the above two scenarios u[0] and q should essentially reference the same user object, but calling set_password (and later save) method on the u[0] reference does not seem to work (i.e. the password attribute remains unchanged and changed password cannot be used on the admin pages) but calling the same method on q reference seems to work!

Update2:I got it finally. Thanks all (Esp SmileyChris) ! every array slice is a new query! so in the first solution, I need this to make it work:

u=User.objects.all()
u[0].password # to display current hash (Query #1)
q=u[0] # (Query #2)
q.set_password('testme')
q.save()
q.password
u[0].password # (Query #3) this should print updated hash!

Escaping URLs vs escaping HTML

I often get confused between two types of escaping you need to do when developing web applications: URL escaping and HTML escaping. This is a short note about when should you use what.

URL Escaping (or HTTP encoding or URL encoding) is used to escape the characters not allowed to be permitted for a URL (e.g. To generate the query string to be passed from forms). The way to encode these characters is to use % and hexadecimal code for the ASCII character code. e.g. %20 for space character. (RFC 1738 defines the syntax and symantics of URLs)

HTML escaping is used when writing HTML documents where you do not want the browser to interpret the HTML special characters. e.g. You need to use &amp; to represent &. Wikipedia has an article describing escape sequences for comonly used special HTML characters.

Scripting languages have libraries which can do URL and HTML encoding and decoding for you:

Python: urllib.quote and urllib.unquote , ???, ???

Ruby: CGI::escape, CGI::unescape, CGI::escapeHTML, CGI::unescapeHTML

Django middleware

In the django project settings there is a key called MIDDLEWARE_CLASSES which is a tuple of strings implementing the middleware methods. Django base handler (TBD explain what this class does) reads this setting and initializes three of its own attributes: _request_middleware, _view_middleware and _response_middleware. It goes through the list of middleware classes instantiates each of them and adds the bound method process_request to the _request_middleware attribute, process_view method to view_middleware and process_reponse method to _response middleware.

If the middleware class method returns something (indicating that it has taken some action), the basehandler get_response method returns immediately. Otherwise it proceeds further.

There are 4 middleware classes built in:

  • AdminUserRequired middleware class implements the process_view method and silently returns if the user in the request is logged in and is a valid admin user. Otherwise it returns login page with appropriate error message.
  • CommonMiddleware implements process_view and process_response methods. process_view rejects forbidden user-agents and prepends URL with www and appends trailing slash if desired. process_response checks if there is a matching flat file present that can be sent for 404's and can also send email note to managers about broken links. process_response also manages ETags
  • CacheMiddleware implements process_request to check pages (not containing any query string) against cache. process_response adds pages to cache as needed.
  • XViewMiddleware (used internally for documentation) implements process_view and attaches 'X-View' header to internal HEAD requests.

Update: 10/14/2005. There are some updates to the middleware that ships with django and is now "officially" documented here.

django decorators

Django framework has used some design patterns. There is a directory called decorator which currently has two decorators: (decorator is just a method which dynamically adds additional functionality to original method depending on the situation)

  • funcA = login_required(funcA)
    This replaces the funcA with a function which checks if the user is logged in and calls original function if the user is indeed logged in and redirects the anonymous users to login page.
  • funcB=cache_page(funcB,cache_timeout,key_prefix)
    The original function is changed to look into the cached pages.

Getting to know the django web framework

I was just about to abandon python and join the ruby camp to be able to use the wonderful rails framework for web application development. (They do have very good documentation and impressive video demo which you should check out!) But then came the announcement of Django. I really like the python language and feel that I can understand someone else's python code, (though ruby looks equally fascinating). I read through the tutorials and checked out the svn repository and worked with the tutorials. This framework looks easy to use and seemingly makes your application portable enough to use any of the underlying database backends (postgresql, mysql, sqlite) and webservers (apache, lighthttpd, twistedweb etc). The initial few days after the announcement, there were very hectic updates on the code and documentation front (with support for sqlite backend , standalone server and new tutorial and documentation about generic views and form fields coming in a matter of a couple of days). This has now gradually slowed down.

I decided to write a sample application (rebate tracking) and immediately hit some issues. I am trying to make the user registration and login/logout part work, but am not following how that is hooked into the framework. The users added with admin interface do not get recognized by the authentication code. Tried IRC help, but haven't been able to get anyone who can help. I am studying the code right now. I am going to look closely into the decorator and middleware code now. I will write about my progress here.

Zeroconf for seamless networking…

I have HP's all in one device which is ethernet enabled. So all computers on the LAN can print to it/scan from it etc. The printer seems to use DHCP server to acquire the IP address and did not provide any name to the DHCP server. The windows version of software managed to detect the printer correctly and also everytime the address is changed. Then after the obligatory nmap port scan I discovered that it runs something called rendezvous protocol for autodiscovery. This is now standardized by IETF Zeroconf working group and is promoted by apple for seamless network configuration, autodiscovery etc. for SoHo users (and used by iPod, is built into MacOSX etc.) This also seems to be supported by HP, IBM.

There is ofcourse a competitive proposal called uPnP
(universal plug and pray play) which is endorsed by Microsoft.

Anyway, found some interesting links on zeroconf:
A very good article on O'Reilly network about zeroconf.

An interview with Stuart Cheshire (now with Apple) who authored zeroconf IETF RFC's.

An interesting thread on linux mailing list about adding this stuff to linux and politics behind such a thing!

Python and its implementation of zeroconf.

Rebate tracker use cases

  • Adding a new rebate:
    Step 1 Enter zip code for mailing address (zip+ext)
    Step 2 A list of rebates with same submission zip code is displayed
    Step 3
    Case 1 - Click on one of the links to prepopulate rebate form with entered information.
    (Display Submission address, Contact information, rebate validity, postmark date, expect check date)
    OR
    Case 2 - Enter new rebate and capture all information:
    (Product Name*), (Rebate amount*), (Rebate Description*), (Purchase valid from*), (Purchase valid until*), (Must postmark by*), (Expect check in*), (to ) weeks, (Submission address*), (City*), (State*), (Zip*), (Zipext), (Product Website), (Rebate Form URL), (Enquiry Phone), (Enquiry Email), (Inquiry Website)
    Display info as in Case 1.

    Step 4 Get (Purchase Location), (Purchase Date), (Purchase Price), (Date Mailed), (Postage), (Status: one of Mailed, Processing, Approved, Declined, Received, Void), (date Completed)
    Allow user to save/cancel data

  • Change Status of a rebate
    Currently active (status != Declined, Received, Void) rebates are displayed. One of them is selected, display the same data as in Step 4. and allow user to change and save data.
  • Periodic run
    Go through currently active rebates and email reminder for all rebates falling after (mail date + Expect check low)

Lateral Thinking…

How will you write a program to find jumbled words ?

The shotgun approach is the first one anyone is bound to follow at first. i.e. For all permutations of the letters, find if there is a match in the dictionary of words. You might do some optimizations to ignore repetitions etc. But this is O(n^2) complexity solution.

I read an elegant way to solve this here. The trick is to notice that the real answer and the jumbled word look the same when they the letters are sorted.
(Let's ignore the time to sort the words for now, which is O(n*log(n)) I believe for decent algorithms.)

Here is a python snippet to solve the jumble:

#!/bin/env python
def find_jumble(jumble, word_file='/usr/dict/words'):
    sorted_jumble = sort_chars(jumble)
    for dictword in open(word_file, 'r').xreadlines():
        if sorted_jumble == sort_chars(dictword):
            yield dictword

def sort_chars(word):
    w = list(word.strip().lower())
    w.sort()
    return w

while(1):
    inp = raw_input("Enter word: ")
    if not inp: break
    for ans in find_jumble(inp):
        print "Answer = ", ans