Amazon’s “Statistically Improbable Phrases”

Amazon.com has quietly introduced the concept of "Statistically Improbable Phrases" (SIPs). They scan entire books (with the permission of the publishers/authors) so that users can "search inside" the book. Along the way they analyze the text for phrases that occur frequently in that book but rarely or never in other books. For the book WCDMA for UMTS: Radio Access for Third Generation Mobile Communications, they list the following SIPs:

mobile transmission power, time division scheduling, power control signalling, uplink transmission power, more multipath diversity, cell interference ratio, own cell interference, air interface load, power control headroom, allocated bit rates, physical layer procedures section, kbps real time data, cell change order, uplink loading, radio resource management algorithms, adjacent channel interference problems, soft handover base stations, enhanced access channel, outer loop power control, fast power control, uplink coverage, downlink orthogonal codes, macro diversity gain, admission control strategy, system information blocks
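The idea described above (phrases that are frequent in one book but rare everywhere else) can be sketched in a few lines of Python. Amazon has not published its actual method, so everything here is an assumption made for illustration: the function names `ngrams` and `improbable_phrases`, the fixed phrase length `n`, the `min_count` cutoff, and the simple frequency-ratio score with add-one smoothing.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Return all n-word phrases in text, lowercased, punctuation dropped."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def improbable_phrases(book, corpus, n=3, min_count=2, top=5):
    """Rank phrases that are frequent in `book` but rare in `corpus`.

    Hypothetical scoring: (rate in this book) / (smoothed rate in other books).
    """
    book_counts = Counter(ngrams(book, n))
    corpus_counts = Counter()
    for other in corpus:
        corpus_counts.update(ngrams(other, n))

    book_total = sum(book_counts.values()) or 1
    corpus_total = sum(corpus_counts.values()) or 1

    scored = []
    for phrase, count in book_counts.items():
        if count < min_count:
            continue  # ignore phrases that are not frequent in this book
        book_rate = count / book_total
        # Add-one smoothing so phrases unseen in the corpus do not divide by zero.
        corpus_rate = (corpus_counts[phrase] + 1) / (corpus_total + 1)
        scored.append((book_rate / corpus_rate, phrase))

    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [phrase for _, phrase in scored[:top]]
```

This sketch lets phrases run across sentence boundaries and only looks at one phrase length at a time, which a real system would presumably handle more carefully.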

Using these SIPs, you can then search for other books that contain them. I think this is a very clever use of word frequency analysis to automatically extract the keywords of a book. For example, searching for the SIP "mobile transmission power" returns the following books:

10 references in WCDMA for UMTS: Radio Access for Third Generation Mobile Communications, Revised Edition by Harri Holma (Editor), Antti Toskala (Editor)

7 references in WCDMA for UMTS, 2nd Edition by Harri Holma (Editor), Antti Toskala (Editor)

5 references in WCDMA: Towards IP Mobility and Mobile Internet by Tero Ojanpera (Editor), Ramjee Prasad (Editor)

1 reference in Adaptive Blind Signal and Image Processing by Andrzej Cichocki, Shun-ichi Amari

1 reference in Wireless Networks by P. Nicopolitidis, et al
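A result list like the one above amounts to counting how often the phrase appears in each scanned book and sorting by that count. Here is a rough sketch of that lookup; the names `tokenize`, `count_phrase` and `search_by_phrase`, and the `library` dictionary mapping titles to full text, are all hypothetical and not anything Amazon has documented.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def count_phrase(phrase, text):
    """Count occurrences of `phrase` in `text`, matching whole words only."""
    target = tokenize(phrase)
    if not target:
        return 0
    words = tokenize(text)
    n = len(target)
    return sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)


def search_by_phrase(phrase, library):
    """Return (count, title) pairs for books containing the phrase, most hits first."""
    hits = []
    for title, text in library.items():
        count = count_phrase(phrase, text)
        if count > 0:
            hits.append((count, title))
    # Sort by descending count, then by title for a stable order.
    hits.sort(key=lambda item: (-item[0], item[1]))
    return hits
```

Each pair it returns could then be printed as "N references in Title". Scanning every book on each query would be far too slow at Amazon's scale, so a real system would presumably use a prebuilt index instead.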

BTW, here is what Amazon.com says about SIPs

Update (2005/05/05): There is a Wired article on this topic, "Judging a Book by Its Contents", which talks about SIPs.