Updated. Reposted from my other journal: http://brianary.blogspot.com/2005/07/entify-your-html.html
To the embarrassingly uninformed third-party vendors of web-based applications, I present a quick look at HTML entities. This is Chapter One stuff in even the most basic HTML book, but I still get puzzled, dismissive, and even indignant replies when I request fixes for simple HTML bugs.
Three important characters:
<, >, and &
These characters are special to HTML for processing. In the text or attribute values of a page, you must use entities that stand for them: &lt;, &gt;, and &amp; (respectively). In attributes, " should also be replaced with &quot; (you can also use &quot; in text, but it isn't a requirement).
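As a rough illustration of the text and attribute rules, here is a minimal Python sketch using the standard library's html.escape(). The sample strings and the span element are made up for the example; the point is only that quote=False handles the three characters needed in text, while quote=True also replaces the double quote for attribute values.

```python
import html

# Text content: only <, > and & have to be replaced.
text = html.escape("if a < b && b > c", quote=False)
print(text)

# Attribute value: the double quote has to be replaced as well.
title = html.escape('The "big" <idea>', quote=True)
print('<span title="' + title + '">...</span>')
```

The first line printed contains &lt;, &amp; and &gt; in place of the raw characters, and the title attribute in the second carries &quot; instead of a quote that would otherwise end the attribute early.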
The Web Is A Big Place
If you forget to entify your special characters, some browsers will sometimes let you get away with it. If you intend to produce code for the widest possible audience (which is the whole point of the Internet, after all), it is best not to assume your indiscretions will always go unnoticed; better to do it right to start with, and you won't have to double-check every support call ($$$) to see if unentified HTML is part of the problem.
Unentified HTML Is Insecure HTML
Cross-Site Scripting (XSS) attacks that inject markup through page text or attribute values are caused by unentified HTML, and can be prevented using entities. The liability of such an attack, though potentially considerable, is nothing compared to the loss of client trust.
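To make the risk concrete, here is a small Python sketch of a page fragment built from user input, with and without entities. The render_comment() helper and the attack string are hypothetical, invented for this example; only html.escape() comes from the standard library.

```python
import html


def render_comment(comment: str, entify: bool = True) -> str:
    """Hypothetical helper: wrap a user-supplied comment in a paragraph."""
    body = html.escape(comment, quote=True) if entify else comment
    return "<p>" + body + "</p>"


# Input a malicious visitor might submit in a comment form.
attack = '<script>alert("stolen cookie")</script>'

unsafe = render_comment(attack, entify=False)  # script tag reaches the browser intact
safe = render_comment(attack)                  # script tag is shown as harmless text

print(unsafe)
print(safe)
```

In the unentified version the browser sees a real script element and runs it; in the entified version it sees only text that happens to look like one.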
Nearly every web development language has a single function you can call to entify the contents of string or text variables (numeric and date/time variables do not typically require escaping), e.g. Server.HTMLEncode() in Active Server Pages or htmlentities() in PHP. In cases where the language does not provide such a function, writing one is trivial: four search-and-replace calls (do the ampersand first, or the ampersands introduced by the other entities will get escaped a second time).
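For a language without a built-in, the four search-and-replace calls might look like the following Python sketch. The function names entify() and entify_wrong_order() are made up for illustration; the second exists only to show what goes wrong when the ampersand is not handled first.

```python
def entify(text: str) -> str:
    """Replace the four HTML-special characters with their entities."""
    return (
        text.replace("&", "&amp;")   # ampersand first, before any entities exist
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")
    )


def entify_wrong_order(text: str) -> str:
    """Deliberately broken: the ampersand is replaced after the less-than sign."""
    return text.replace("<", "&lt;").replace("&", "&amp;")


print(entify('<a href="x?a=1&b=2">'))
print(entify_wrong_order("<"))
```

The wrong-order version turns a single < into &amp;lt;, which a browser displays as the literal text &lt; rather than as a less-than sign.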
It just kills me how often I see unencoded HTML (of the severity that actually breaks things), and how defensive companies get when it's pointed out. As if it were a lengthy or difficult fix.