Escaping URLs vs escaping HTML

I often get confused between the two types of escaping you need to do when developing web applications: URL escaping and HTML escaping. This is a short note about when you should use which.

URL escaping (also called URL encoding or percent-encoding) is used to escape characters that are not permitted in a URL, e.g. when generating the query string passed from a form. Each such character is encoded as % followed by the two-digit hexadecimal code of its byte value, e.g. %20 for the space character. (RFC 1738 defines the syntax and semantics of URLs; RFC 3986 is the current specification of the generic URI syntax.)
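As a rough illustration of the percent-encoding described above, here is a minimal sketch using Python 3's standard urllib.parse module. The sample strings and the "q" and "page" parameter names are made up for the example.

```python
from urllib.parse import quote, unquote, urlencode

# Percent-encode a single value. By default quote() leaves "/" unescaped,
# so pass safe="" when the value is going into a query string.
encoded = quote("rock & roll/jazz", safe="")
print(encoded)  # rock%20%26%20roll%2Fjazz

# urlencode() builds a whole query string from form-style data.
# It encodes spaces as "+" rather than "%20".
query = urlencode({"q": "rock & roll", "page": 2})
print(query)  # q=rock+%26+roll&page=2

# Decoding reverses the escaping.
assert unquote(encoded) == "rock & roll/jazz"
```

Note that the & inside the value becomes %26, while the & that separates the two parameters stays literal. That is the whole point of escaping the values.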

HTML escaping is used when writing HTML documents where you do not want the browser to interpret HTML special characters as markup, e.g. you need to write &amp; to represent a literal &. Wikipedia has an article describing escape sequences for commonly used special HTML characters.
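A small sketch of HTML escaping, assuming Python 3 and its standard html module; the sample text is invented for the example.

```python
from html import escape, unescape

# Text that would be interpreted as markup if written into a page as-is.
user_text = '<b>Tom & "Jerry"</b>'

# escape() replaces &, < and > (and, by default, quote characters) with entities.
safe_text = escape(user_text)
print(safe_text)  # &lt;b&gt;Tom &amp; &quot;Jerry&quot;&lt;/b&gt;

# unescape() converts the entities back to the original characters.
assert unescape(safe_text) == user_text
```

The browser renders the escaped string as the literal text, tags and all, instead of treating it as a bold element.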

Scripting languages have libraries which can do URL and HTML encoding and decoding for you:

Python: urllib.quote and urllib.unquote (urllib.parse.quote and urllib.parse.unquote in Python 3), ???, ???

Ruby: CGI::escape, CGI::unescape, CGI::escapeHTML, CGI::unescapeHTML
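The two kinds of escaping often meet when you put a link into a page: the query string values need URL escaping, and the finished URL then needs HTML escaping because it sits inside an attribute. Here is a sketch of that, assuming Python 3, where html.escape does the HTML side. The search_link helper, the /search path and the "q" and "lang" parameters are hypothetical names used only for illustration.

```python
from html import escape
from urllib.parse import urlencode

def search_link(term: str) -> str:
    """Hypothetical helper: build an <a> tag that links to a search for `term`."""
    # Step 1: URL-escape the values as they go into the query string.
    url = "/search?" + urlencode({"q": term, "lang": "en"})
    # Step 2: HTML-escape the URL for the href attribute,
    # and the raw term for the visible link text.
    return '<a href="{}">{}</a>'.format(escape(url), escape(term))

print(search_link("fish & <chips>"))
# <a href="/search?q=fish+%26+%3Cchips%3E&amp;lang=en">fish &amp; &lt;chips&gt;</a>
```

The same input ends up escaped differently in the two places: %26 and %3C inside the URL, &amp; and &lt; in the link text. The & separating the parameters is left alone by URL escaping but becomes &amp; once the URL is written into HTML.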
