Category Archives: Internet

Yahoo Geocoding in ruby…

Find latitude and longitude of any address

Yahoo just released a new beta of their maps web service. Here is a small Ruby script (inspired by Rasmus's PHP code) that I wrote; it returns the latitude and longitude of the address provided...

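require 'open-uri'
require 'rexml/document'
include REXML
# Geocoding endpoint; the escaped address is appended as the location parameter
url = 'http://api.local.yahoo.com/MapsService/V1/geocode?appid=yahoomap.rb&location='
puts 'Enter Location: '
# chomp drops the trailing newline that gets returns
address = gets.chomp
# URI.escape no longer exists in Ruby 3.0+, so use the form encoder instead
address = URI.encode_www_form_component(address)
result = URI(url + address).read
doc = Document.new(result)
r = doc.elements['/ResultSet/Result']
print 'Precision: ', r.attributes['precision'], "\n"
# Walk child elements only; children would also yield whitespace text nodes
r.elements.each { |c| print c.name, ' : ', c.text, "\n" }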

Update: Here is a link to the script on GitHub.
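To try the parsing half of the script without calling the web service, here is a minimal sketch that runs the same REXML calls against a canned response. The sample XML is an assumption, made up to match the element names the script expects (`ResultSet`, `Result`, a `precision` attribute); it was not captured from the real service. `parse_geocode` is a hypothetical helper name.

```ruby
require 'rexml/document'

# Hypothetical sample response; the element names are assumed for illustration.
SAMPLE_RESPONSE = <<~XML
  <ResultSet>
    <Result precision="address">
      <Latitude>32.7157</Latitude>
      <Longitude>-117.1611</Longitude>
      <City>San Diego</City>
    </Result>
  </ResultSet>
XML

# Returns the precision attribute plus a hash of child element names and texts.
def parse_geocode(xml)
  doc = REXML::Document.new(xml)
  result = doc.elements['/ResultSet/Result']
  fields = {}
  # elements.each visits child elements only, skipping whitespace text nodes
  result.elements.each { |child| fields[child.name] = child.text }
  { precision: result.attributes['precision'], fields: fields }
end

geo = parse_geocode(SAMPLE_RESPONSE)
puts "Precision: #{geo[:precision]}"
geo[:fields].each { |name, text| puts "#{name} : #{text}" }
```

Returning a hash instead of printing directly makes it easy to pick out just the latitude and longitude later.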

Library Lookup greasemonkey script for San Diego public library and San Diego County library

I promised here that I would polish my library lookup script and post it, but I haven't found any time to do that yet. I am posting it as it is so someone could work on it and make it better...

The script tries to look up the ISBN in both SDCL and SDPL, and it inserts the message and the link correctly. However, in some cases Amazon inserts more elements inside the same div container, so the links appear much later (not immediately following the title). I have an idea of how to fix this and am going to do so very soon...
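The lookup hinges on pulling the ISBN out of the address of the Amazon page. Here is a rough sketch of that one step, written in Ruby rather than the userscript's JavaScript. The URL pattern, the example URL and the helper names `extract_isbn` and `valid_isbn10?` are assumptions for illustration; they are not taken from the actual script.

```ruby
# Assumed pattern: a 10-character ISBN (9 digits plus a digit or X)
# sitting between slashes somewhere in the URL path.
ISBN_PATTERN = %r{/(\d{9}[\dXx])(?:/|\?|$)}

# Returns the ISBN-10 found in the URL, or nil if there is none.
def extract_isbn(url)
  match = ISBN_PATTERN.match(url)
  match && match[1].upcase
end

# ISBN-10 check: digits weighted 10 down to 1 must sum to a multiple of 11.
def valid_isbn10?(isbn)
  return false unless isbn =~ /\A\d{9}[\dX]\z/
  weighted = isbn.chars.each_with_index.map do |ch, i|
    (ch == 'X' ? 10 : ch.to_i) * (10 - i)
  end
  (weighted.sum % 11).zero?
end

url = 'http://www.amazon.com/exec/obidos/ASIN/0306406152/'
isbn = extract_isbn(url)
puts "ISBN: #{isbn}, checksum ok: #{valid_isbn10?(isbn)}"
```

Validating the checksum before querying the library catalogue avoids lookups for product codes that merely look like ISBNs.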

Get the LibraryLookup Greasemonkey script from here.

(Usage: Right click on the link and select "Install User Script" from the menu, OR open it in Firefox, go to Tools/Install this user script and click OK. From now on you will see whether any book that you are browsing on amazon.com is in SDCL or SDPL!)

Update: 11/30/05. The new bookburro extension for Firefox supports the SDPL and SDCL libraries by default, so I will not have to update my miserly scripts now...

Anonymity on Internet

On the internet nobody knows you are a dog
Unfortunately it's only true in cartoons! You leave behind a surprisingly easy-to-follow trail of the websites you visit. Visit Test anonymity if you want to find out what web servers can know about you. A determined person can find out which websites you browsed, what you did on each of them etc.

There are some commercial services like anonymizer that insert a random proxy between you and the destination web server. There are also a number of HTTP/Socks proxies that you can use. But then all of your traffic is subject to monitoring by these people.

The Freenet project takes anonymity to the other extreme: you can access content that you may not be able to access otherwise, and it also provides anti-censorship / anti-banning features. But it has always been very slow and prone to protocol changes (i.e. sites working the previous day do not work the next day because of the release of a new protocol and peer software).

The Tor project takes another approach. The endpoints are still the same, but all your packets are routed through a random chain of Tor routers. The routing technology is called onion routing: the packet is wrapped in one layer of encryption per hop, each router peels off only its own layer, and so each router learns only the previous and the next hop. No single router knows both the sender and the final destination. There is a provision for hidden services (any TCP protocol), which are not accessible from the regular internet and which come close to achieving what Freenet does. I have been using Tor for some time now and have noticed some things:
* The performance is improving a great deal (as more and more Tor nodes are commissioned, it should get better still)
* You can get routed through a completely different continent, so going to Google might open their German page (because they send you the German page if they detect that your IP address is from Germany)
* This service might be easily abused by spammers who will definitely want to route spam through Tor, child pornographers who can host "hidden services", and illegal content downloaders. (Though I believe many Tor nodes block SMTP and peer-to-peer traffic.) I guess there is a price to be paid for "really free speech"
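The layering idea behind onion routing can be shown with a toy example. This is only a sketch under loud assumptions: it uses AES-256-CBC from Ruby's OpenSSL bindings with locally generated keys, whereas real Tor negotiates keys per circuit and uses its own cell format. The helper names (`add_layer`, `peel_layer`, `wrap`) are made up for the illustration.

```ruby
require 'openssl'

# Encrypts data with key and prefixes the random 16-byte IV to the result.
def add_layer(data, key)
  cipher = OpenSSL::Cipher.new('aes-256-cbc')
  cipher.encrypt
  cipher.key = key
  iv = cipher.random_iv
  iv + cipher.update(data) + cipher.final
end

# Removes one layer: splits off the IV, then decrypts the remainder.
def peel_layer(data, key)
  cipher = OpenSSL::Cipher.new('aes-256-cbc')
  cipher.decrypt
  cipher.key = key
  cipher.iv = data[0, 16]
  cipher.update(data[16..-1]) + cipher.final
end

# The sender wraps for the last relay first, so the first relay's layer is outermost.
def wrap(message, keys)
  keys.reverse.reduce(message) { |packet, key| add_layer(packet, key) }
end

# One 32-byte key per relay (assumption: generated locally for the demo).
relay_keys = Array.new(3) { OpenSSL::Random.random_bytes(32) }
packet = wrap('GET / HTTP/1.0', relay_keys)

# Each relay in turn peels exactly one layer with its own key.
relay_keys.each_with_index do |key, hop|
  packet = peel_layer(packet, key)
  puts "after relay #{hop + 1}: #{packet.bytesize} bytes"
end
puts "exit relay sees: #{packet}"
```

Only after the last relay has removed its layer does the original request reappear; every earlier relay sees nothing but ciphertext meant for the next hop.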

Zeroconf for seamless networking…

I have an HP all-in-one device which is Ethernet enabled, so all computers on the LAN can print to it, scan from it etc. The printer seems to use the DHCP server to acquire its IP address and does not provide any name to the DHCP server. The Windows version of the software managed to detect the printer correctly, even every time the address changed. After the obligatory nmap port scan I discovered that it runs something called the Rendezvous protocol for autodiscovery. This is now standardized by the IETF Zeroconf working group and is promoted by Apple for seamless network configuration, autodiscovery etc. for SoHo users (it is used by the iPod, is built into Mac OS X etc.). It also seems to be supported by HP and IBM.

There is of course a competing proposal called UPnP (Universal Plug and Play, or "plug and pray"), which is endorsed by Microsoft.

Anyway, found some interesting links on zeroconf:
A very good article on O'Reilly network about zeroconf.

An interview with Stuart Cheshire (now with Apple), who authored the zeroconf IETF RFCs.

An interesting thread on a Linux mailing list about adding this stuff to Linux, and the politics behind such a thing!

Python and its implementation of zeroconf.

Amazon’s “Statistically Improbable Phrases”

Amazon.com has quietly introduced the concept of "Statistically Improbable Phrases" (SIP's). They scan entire books (with the permission of the publishers/authors) so that users can "search inside" the book. In the process they analyze phrases that occur frequently in the book but rarely occur outside of it. So for the book WCDMA for UMTS: Radio Access for Third Generation Mobile Communications, they list the following SIP's:

mobile transmission power, time division scheduling, power control signalling, uplink transmission power, more multipath diversity, cell interference ratio, own cell interference, air interface load, power control headroom, allocated bit rates, physical layer procedures section, kbps real time data, cell change order, uplink loading, radio resource management algorithms, adjacent channel interference problems, soft handover base stations, enhanced access channel, outer loop power control, fast power control, uplink coverage, downlink orthogonal codes, macro diversity gain, admission control strategy, system information blocks

Now using these SIP's you can search for other books containing them. I think this is a very clever use of word frequency analysis to automatically extract the keywords of a book. So by searching for the SIP "mobile transmission power" they list the following books:

10 references in WCDMA for UMTS: Radio Access for Third Generation Mobile Communications, Revised Edition by Harri Holma (Editor), Antti Toskala (Editor)

7 references in WCDMA for UMTS, 2nd Edition by Harri Holma (Editor), Antti Toskala (Editor)

5 references in WCDMA: Towards IP Mobility and Mobile Internet by Tero Ojanpera (Editor), Ramjee Prasad (Editor)

1 reference in Adaptive Blind Signal and Image Processing by Andrzej Cichocki, Shun-ichi Amari

1 reference in Wireless Networks by P. Nicopolitidis, et al
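Amazon does not say how the phrases are actually computed, so the following is only a guess at the general idea: count two-word phrases in one book and rank them by how often they occur there relative to a background corpus. The scoring formula, the tiny sample texts and the function names are all assumptions made for this sketch.

```ruby
# Counts every phrase of the given size (in words) in the text.
def phrase_counts(text, size = 2)
  words = text.downcase.scan(/[a-z]+/)
  counts = Hash.new(0)
  words.each_cons(size) { |group| counts[group.join(' ')] += 1 }
  counts
end

# Ranks the book's phrases by count in the book divided by
# (count in the corpus + 1); the +1 avoids dividing by zero.
def improbable_phrases(book, corpus, top = 3)
  corpus_counts = phrase_counts(corpus)
  scored = phrase_counts(book).map do |phrase, count|
    [phrase, count.to_f / (corpus_counts[phrase] + 1)]
  end
  # Highest score first; ties are broken alphabetically for stable output
  scored.sort_by { |phrase, score| [-score, phrase] }.first(top).map(&:first)
end

# Made-up miniature texts standing in for a book and for "all other books".
book = 'fast power control and outer loop power control use power control commands'
corpus = 'the remote control and the power of the loop and the use of commands'

puts improbable_phrases(book, corpus).inspect
```

A phrase that is frequent in the book but also common everywhere else gets divided down, which is what keeps ordinary filler phrases off the list.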

BTW, here is what amazon.com says about SIP's

Update: (2005/05/05) There is a Wired article on this topic, Judging a Book by Its Contents, which talks about SIP's

Greasemonkey: Control your web!

Greasemonkey is a plugin for the Firefox browser that lets you assign DHTML scripts to various domains. What's the big deal, you ask? This lets you correct some annoying problems some websites have, or even add some nice features to the websites you visit regularly.

There are tons of user contributed scripts for doing fun things, like adding waypoints to Google Maps, removing ads from indiatimes.com etc.

The disadvantages are that (1) this is Firefox specific and (2) it works only on the computer on which you installed the extension and scripts. But hey! It's still way cool...

Update: 02/22/05
Here is a very nice application of this: Let's say you are browsing amazon.com for some books. How about checking your local public library's catalogue for the same book and displaying that information right next to the Amazon book title? This has been done by Jon Udell (but it looks like he has taken down his script) and Bill Stilwell. I managed to tweak his script to search the San Diego Public Library. Hurray! Contact me if you are interested. I will polish the script and put it here in a few days anyway...

Some more blog posts on greasemonkey:
http://taint.org/2005/03/16/201734a.html
http://javascript.weblogsinc.com/entry/1234000273026520/

**Update: 05/05/12**
Hmmm... [Dive into Greasemonkey](http://diveintogreasemonkey.org/ "Dive into Greasemonkey")

**Update: 05/18/05**
Wow this monkey is getting bigger and bigger.

[Here is](http://www.wired.com/news/technology/0,1282,67527,00.html "Firefox Users Monkey With the Web") a link to a recent wired article on greasemonkey.

Delicious

I use Del.icio.us as my online bookmark manager. It's so simple to use, yet so powerful. I especially like the capability to post the bookmark to my account using simple javascript bookmarks.

Here are my bookmarks.

These are the good things about this goody:

  • Everything is wide open, with no proprietary crap.
  • Bookmarks are tagged using one or more keywords that you choose.
  • There are tons of autogenerated RSS feeds: for all your bookmarks, for others' bookmarks, for everyone's bookmarks, for a particular tag from all users; you get the idea.
  • There is a nice API allowing you to use the data as you please (like on a sidebar on your website for starters)

One does wonder about how long a good thing can last for free. Hopefully this will last.

I use the following shell script to regularly back up my bookmarks to an XML file:


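#!/bin/bash
# read -s and -p are bash features, so run under bash rather than plain sh
umask 077  # the backup file is readable by its owner only
read -r -s -p "Enter your password : " pw
echo
# %Y%j%H%M%S: year, day of year and zero-padded time, so the name has no spaces
curl -u "amit:${pw}" 'http://del.icio.us/api/posts/all' > "del.icio.us-backup-$(date +%Y%j%H%M%S).xml"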

Bittorrent

BitTorrent is a peer-to-peer protocol used for file distribution. What is good about it is that every downloader also acts as an uploader. The file is divided into smaller chunks, each with its own SHA1 hash, so every chunk can be verified independently of the rest. This is great for countering "slashdot effects", e.g. when downloading ISO images immediately after they are released. I doubt, though, that it is very useful for longer term links...
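The chunk-and-hash idea is easy to sketch. This is a simplification under stated assumptions: the piece size and helper names are made up, and the digests are kept as hex strings in an array, whereas a real .torrent file stores them in its own binary metadata format.

```ruby
require 'digest'

# Splits data into fixed-size pieces (the last one may be shorter).
def split_pieces(data, piece_size)
  data.bytes.each_slice(piece_size).map { |bytes| bytes.pack('C*') }
end

# The publisher records one SHA1 digest per piece.
def piece_hashes(data, piece_size)
  split_pieces(data, piece_size).map { |piece| Digest::SHA1.hexdigest(piece) }
end

# A downloader accepts a piece only if its digest matches the published one.
def valid_piece?(piece, index, hashes)
  Digest::SHA1.hexdigest(piece) == hashes[index]
end

data = 'some file contents ' * 10
hashes = piece_hashes(data, 64)
pieces = split_pieces(data, 64)

puts "pieces: #{pieces.size}"
puts "piece 0 ok: #{valid_piece?(pieces[0], 0, hashes)}"
puts "tampered piece ok: #{valid_piece?(pieces[0].reverse, 0, hashes)}"
```

Because each piece is checked on its own, a bad piece from one peer can be thrown away and fetched again from another without restarting the whole download.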

There are many websites which host the torrent files for movies, music, TV shows, software apps.

It would be a cool project to keep searching for keywords appearing on such websites and automatically download the matching torrents to browse. Maybe have RSS feeds based on keywords... Hmmmm...

extended del.icio.us bookmarklets

Via: http://www.cs.ucf.edu/~cmillward/delish.php


bookmarklets that I've found to extend del.icio.us functionality

Note:

In the scripts that post to your account, you will need to change USERNAME to your own username. I tried to name the bookmarklets as usefully as possible, so hopefully the link title is appropriate.

del.icio.us linkulator
Use this bookmarklet to look at the del.icio.us history for any link you come across.
via negatendo. written by Brett O'Connor.

extended del.icio.us post with prompt
This will post the current page to your del.icio.us account and include in the extended field whatever text you have selected on the page. If no text is selected, it will prompt you to enter some.
modified by Seb. originally posted by Bowen Dwelle.

extended del.icio.us post
This is my slight modification of the script above. It posts to del.icio.us, but it does not prompt you if you have not selected any text. For the most part, I find this more convenient.

Drupal Markdown plugin progress…

After spending a couple of days figuring out the Drupal module engine, I think I now have a workable version of the markdown plugin ready. It works for me and a few others for now.

Some people are concerned about lock-in to a particular text format. This is because, in Drupal, the data is stored in the text format (markdown/textile) and is processed every time the node is viewed. There are good and bad things about this. The good thing is that you are working at a higher level (really?) compared to raw HTML, so all of your modifications will be at that level. The bad thing is the lock-in, i.e. you are committed to the markdown or textile format!

There are a couple of ways to counter this. One beauty of markdown is that there exists html2text, which converts HTML to text... valid markdown text! But unfortunately
html2text(markdown(txt)) != txt (it's close but not exact; it can never be)

So maybe we should store the markdown output in the database (instead of the text format as now) and run html2text every time we want to edit/modify?

Update: 2005/05/05 Recent versions of drupal already include markdown and textile plugin

How to choose good passwords?

Came across this nice snippet on WSJ:
(via: Rajesh Jain of emergic.org)

I came across this article by Jeremy Wagstaff which is still as relevant today:

Base the password on mnemonics or acronyms, not words or names. Use your favorite song titles, movies, football teams as starters. It's got to be something that you know a lot about, but not something that other people can find out about you -- such as your birthday, your place of birth, or your kids' names. The first letters of the movie The Year of Living Dangerously, for example, could be used in conjunction with its two main stars, Mel Gibson and Sigourney Weaver, to read "tyoldmgsw."

That's just the start. Now you have something you can remember, but it's still just basic letters. You need to turn some of them into numbers, punctuation symbols and capitals. Try turning the "o" into a similar-looking zero, the "l" into a one and the "s" into a five. That would give you "ty01dmg5w" which is a lot better, and still easy to remember, since the numbers are similar to the letters they've replaced.

This, sadly, is still not good enough. The people who write hacking programs are on to this kind of trick, so your password is still vulnerable. It needs an extra trick or two. Try capitalizing the family-name letters, alter the 0 to similar-looking bracket marks (), and move the numeric characters one key to the left on your keyboard.

If your passwords are as good as that, then you should be safe. But there's still a weakness, and it's still human. Never give your passwords to anyone, don't reuse them for different accounts, and change them every few months. Store them on your personal digital assistant if you like, but remember that, even if it's in a well-encrypted file, all your valuable information is just one password away from being accessed by someone. If they steal your device, chances are they're eager enough to try to crack the password protecting all your passwords. Passwords are better kept in your head, triggered by things you'll never forget.
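The first two steps of the scheme quoted above are mechanical enough to sketch in a few lines. The function names are my own; the third step (capitals, brackets and shifted keys) is left out because the article does not spell out the exact result.

```ruby
# Step 1: the first letter of every word, lowercased.
def acronym(phrase)
  phrase.split.map { |word| word[0].downcase }.join
end

# Step 2: swap letters for the look-alike digits named in the article
# (o becomes 0, l becomes 1, s becomes 5).
def substitute_digits(text)
  text.tr('ols', '015')
end

base = acronym('The Year of Living Dangerously') + acronym('Mel Gibson Sigourney Weaver')
puts base                     # tyoldmgsw
puts substitute_digits(base)  # ty01dmg5w
```

Both printed values match the examples in the article, which also makes the article's warning concrete: a transformation this regular is just as easy for a cracking program to apply.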

Contact me, but spam me NOT…

There is no mailto link on this website to contact me, the obvious reason being spammers who harvest the email addresses from webpages. Earlier they used to hound the newsgroups, then they turned to collecting mailto links, and now they are just scanning the pages to collect anything that looks like an email address.

One way to fight spam is to waste their resources. I had some hidden links on my pages which were not visible to "normal" visitors (i.e. a link in an HTML comment, or a link with text of the same color as the page background etc.). The only people who will access those links are spammers (and a few curious people who want to view the source of each webpage they visit ;-)). The link then takes them to a dynamically generated webpage, which contains a lot of links to similar dynamic pages and has plenty of meaningless email addresses. The intention is just to pollute the spam databases (i.e. to lower the value of the database by adding lots of noise to the signal, if you know what I mean). Here is one example of such a scheme. You have to be careful, though, to keep search engine spiders from falling into this trap.
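A trap page of this kind takes very little code. The sketch below is not the script I was running; the `/trap/` path, the helper names and the use of the reserved `.example` domain (so that no real mailbox can ever be hit) are all assumptions made for illustration.

```ruby
# Random lowercase string; the content is deliberately meaningless.
def random_word(length = 8)
  Array.new(length) { ('a'..'z').to_a.sample }.join
end

# Fake address under the reserved .example domain.
def fake_email
  "#{random_word}@#{random_word}.example"
end

# Builds one trap page: fake addresses plus links to further trap pages.
# The /trap/ path is hypothetical and would be excluded in robots.txt
# so that well-behaved search engine spiders stay out.
def trap_page(emails: 20, links: 5)
  address_items = Array.new(emails) do
    address = fake_email
    %(<li><a href="mailto:#{address}">#{address}</a></li>)
  end
  link_items = Array.new(links) do
    %(<li><a href="/trap/#{random_word}">#{random_word}</a></li>)
  end
  "<html><body><ul>\n#{(address_items + link_items).join("\n")}\n</ul></body></html>"
end

puts trap_page(emails: 3, links: 2)
```

Since every link leads to another freshly generated page, a harvester that follows them keeps collecting noise for as long as it is willing to crawl.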

I have now taken those links down. The benefits to me are minimal, and spammers adapt (by learning not to follow such links), so this is more of a cat-and-mouse game.

I am thinking of just adding a simple contact form, through which people who do not know my email address can send email to me. To keep spammers away, it will have a simple puzzle (something like what day is it today, which color is the sky etc.) selected randomly from a list and used to validate the form. I am sure something already exists to do exactly this...
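The puzzle part could be as small as this. It is only a sketch of the idea: the question list, the answers and the function names are placeholders, and a real form would also have to remember which question it showed to each visitor.

```ruby
# Hypothetical question list; answers are compared case-insensitively.
PUZZLES = {
  'Which color is the sky on a clear day?' => 'blue',
  'How many days are there in a week?' => '7',
  'What is two plus three?' => '5'
}.freeze

# Picks the question to show with the form.
def pick_puzzle
  PUZZLES.keys.sample
end

# Accepts the submission only if the answer matches the chosen question.
def correct_answer?(question, answer)
  expected = PUZZLES[question]
  !expected.nil? && answer.to_s.strip.downcase == expected
end

question = pick_puzzle
puts question
puts correct_answer?(question, PUZZLES[question])
puts correct_answer?(question, 'no idea')
```

Looking the answer up by question on the server side means the form never has to carry the expected answer in a hidden field.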

Markdown plugin for drupal

I use Drupal as the content management engine for some of the community websites that I maintain. I think Drupal is one of the best CMS engines out there. The recent versions using the xtemplate theme look very pleasing to the eye.

Unfortunately to add new content you have to type in HTML, and most of the HTML editors produce non-compliant HTML. I was looking for a way to type in simple formatted text which will be picked by some drupal module and entered as compliant HTML into the database ready to serve! I came across something called markdown which is exactly what I need. There is additional software called html2text which takes html and converts to markdown format. It should be possible (and dare I say easy) to integrate these two little beasts into drupal. I do definitely want to work on this in my spare time.

A few links:
How to create modules

Macrotags comes very close to what I need; maybe I should try to reuse its code base

drupal textile plugin

Markdown PHP