HTML Purifier


HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, it will also make sure your documents are standards compliant, something only achievable with a comprehensive knowledge of W3C's specifications. Tired of using BBCode due to the current landscape of deficient or insecure HTML filters? Have a WYSIWYG editor but never been able to use it? Looking for high-quality, standards-compliant, open-source components for that application you're building? HTML Purifier is for you!

I'd just like to say we use HTML Purifier in IRIS for filtering emails against XSS attacks and we've been more than impressed.
— Chris Corbyn, Senior IRIS Developer

Background

There are a number of open-source HTML filtering solutions out there on the web already (e.g. PEAR's HTML_Safe, kses and SafeHtmlChecker.class.php). What sets HTML Purifier apart from them? Aren't all of these choices “secure”?

When it comes to HTML, attention to detail is key. Does the library demonstrate an in-depth knowledge of the DTD that defines HTML? Does it perform its filtering off a robust whitelist rather than a usually outdated blacklist? Does it take the care to check every single attribute in the document for validity? Does it actually understand tag markup, or pay lip service with a series of deficient regexes and str_replace calls?

Somewhere along the way, all of HTML Purifier's predecessors fall flat. HTML_Safe dooms itself to attacks of the future by using a blacklist. Configurable filters like kses and PHP Input Filter still cannot validate the contents inside attributes. With all these gaps in coverage, none of the usual libraries come close to achieving standards compliance. There is a user-unfriendly, draconian XML-based filter called Safe HTML Checker, but even it forgets that <a> tags cannot be nested within each other!

Know thy enemy. Wily hackers have a huge arsenal of XSS hidden within the depths of the HTML specification. HTML Purifier takes its effectiveness from the fact that it will decompose the whole document into tokens, and rigorously process the tokens by removing non-whitelisted elements, transforming bad practice tags like font into span, properly checking the nesting of tags and their children and validating all attributes according to their RFCs. HTML Purifier's comprehensive algorithms are complemented by a breadth of knowledge, ensuring that richly formatted documents pass through unstripped.

Compare HTML Purifier with other filters

To my knowledge, there is nothing else in the wild that offers protection from XSS, standards-compliance, and the corrective processing of poorly formed HTML simultaneously. Don't take my word for it though: do your research. Investigate the other libraries, and decide for yourself who you would prefer to be the gatekeeper to your system.

To find out more, you can read the Comparison for a play-by-play analysis of the major filter libraries currently out there.

[Y]ou save my day by allowing me not to write another damned HTML parser.
— Joseph Halter, Technical Director at Akira Web

News

SVN viewer and migration

Tue, 17 April 2007 20:08:11 EDT

ViewVC for viewing our SVN repository and RSS changelog feeds for most of our HTML pages (for example, the changelog for this page is at index.rss) were rolled out a few weeks ago. Feel free to check them out.

Also, I've purchased the htmlpurifier.org domain so this website will be migrating to that address soon. I'm not in any particular hurry to get the migration done, but I hope to see some other changes in the website as well when the move is made. ;-)

Pro::PHP Podcast mention

Mon, 09 April 2007 23:23:44 EDT

I'd like to thank Pro::PHP podcast for mentioning HTML Purifier on their April 5, 2007 show. I've always been a fan of their informative podcasts, and was delighted to discover that they had decided to include HTML Purifier on the program list (even though it was at the very end).

Against my better judgment, I have a few clarifications I'd like to make about the podcast:

  • While HTML Purifier can use Tidy, it's completely optional. Tidy is used to pretty-print the output HTML.
  • We do use the XSS cheatsheet for testing the library, but I actually did not know about the cheat-sheet until the library was well under development.
  • Yes, the top domain is actually a school band website that I'm borrowing hosting from. I'm playing around with getting a dedicated domain at htmlpurifier.org.

Once again, thanks for mentioning the library. Perhaps someday I'll do a screencast going through some of HTML Purifier's major features.

HTML Purifier 1.6.0 released

Sun, 01 April 2007 23:40:59 EDT

Sorry, no April Fool's joke this year. To compensate, we have the 1.6.0 “Long Overdue” release. This version contains support for a number of deprecated attributes HTML Purifier should have had from the very beginning, including the name, bgcolor, border, width and height attributes. The CSS property 'height', the rel and rev attributes, and ID blacklist regexps are also available. In addition, HTML Purifier will give a friendly error message when you try to enable an element or attribute that doesn't exist.

All in all, this is a fairly compact release, but it does address some common requests brought up in the Forums, so I suggest you upgrade anyway. You can check News for a complete changelog, but there's not much else.

A note to you distributors

Wed, 28 March 2007 21:05:12 EDT

Yes, TikiWiki and PHProjekt, I'm looking at you. I am absolutely delighted that these two fairly popular and robust open-source projects are using my library. However, I am not at all pleased at the fact that you have not been keeping up to date with HTML Purifier releases.

I entreat ye, please sign up for the announcement list and keep my library up-to-date! It's not difficult, I keep backwards compatibility, and it makes your users happy! Especially that DOM XML bug, which seems to have been far more serious than I originally thought it was. That is all.

Update: I'm happy to say that PHProjekt has updated the library to 1.6.0. Still waiting on a response from TikiWiki though.

PEAR channel available

Sat, 24 March 2007 20:27:42 EDT

At the prompting of Lars Olesen, HTML Purifier now has its very own PEAR channel. This means that installing HTML Purifier is as simple as:

pear channel-discover hp.jpsband.org
pear install hp/HTMLPurifier

Plugins

HTML Purifier is a great library to integrate with existing CMSes and other applications or WYSIWYG editors. Currently, we have plugins for:

This plugin is on top of my favorite list[.] I am going to heavily depend on it since my clients insist on having WYSIWYG and I insist on having pages that validate and are semantically sound.
— David Molliere, MODx Marketing & Design Team

Plugins for other major applications gladly accepted!

Demo

Enter your HTML and see how it will be filtered!

Download

The current version is 1.6.0. Pick your distribution:

The PHP5-strict version is exactly the same as the regular version with a few tweaks to prevent it from complaining with E_STRICT warnings. This library is open-source, licensed under the LGPL v2.1+.

HTML Purifier is also available as a PEAR package. You can install it by executing:

pear channel-discover hp.jpsband.org
pear install hp/HTMLPurifier

You can also grab the latest developmental code from our Subversion repository. Simply execute this command:

svn co http://hp.jpsband.org/svnroot/htmlpurifier/trunk ./

...or browse anonymously at that address. Previous releases can be obtained by browsing the release directory or checking code out of the tags/ directory. You can also use ViewVC to view the repository.

SHA-1 checksums:

088569ae55d99bdbbee6031215ecc26f60489b70 htmlpurifier-1.6.0-strict.tar.gz
3deb033d6b20c22e7883cf2f7f719605fe6dd161 htmlpurifier-1.6.0-strict.zip
b4eed7787b84b7a86b24beaa5394616600780ceb htmlpurifier-1.6.0.tar.gz
3e375e83bc782e031362ce49c559e0d4f2511b6f htmlpurifier-1.6.0.zip
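
If you'd like to check a download against the checksums above, something along these lines should do. This is only a minimal sketch: verify_sha1 is a hypothetical helper I'm making up for illustration (it is not shipped with HTML Purifier), and it assumes GNU coreutils' sha1sum is on your path (some systems provide shasum instead).

```shell
#!/bin/sh
# verify_sha1 FILE EXPECTED_HEX
# Hypothetical helper, not part of HTML Purifier. Returns 0 when the
# SHA-1 digest of FILE equals EXPECTED_HEX, 1 on a mismatch, and 2 when
# FILE does not exist. Assumes GNU coreutils' sha1sum is installed.
verify_sha1() {
    file=$1
    expected=$2
    if [ ! -f "$file" ]; then
        echo "missing: $file" >&2
        return 2
    fi
    # sha1sum prints "<digest>  <name>"; keep only the digest.
    actual=$(sha1sum "$file" | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}

# Example: check the regular tarball against the digest listed above.
# This only runs if the archive is present in the current directory.
if [ -f htmlpurifier-1.6.0.tar.gz ]; then
    verify_sha1 htmlpurifier-1.6.0.tar.gz b4eed7787b84b7a86b24beaa5394616600780ceb
fi
```

A matching checksum only tells you the download wasn't corrupted or swapped for a different file; it says nothing about who made the release.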

There are also .sig files which you can use to cryptographically verify that the release is from me, Edward Z. Yang. You can find my public key here (0x869C48DA). My key's fingerprint is: 3FA8 E9A9 7385 B691 A6FC B3CB A933 BE7D 869C 48DA.

Verify with this command:

gpg --verify $filename.sig

You can be notified of new releases by a low-traffic announce list. Subscribe here:

Resources

Spread the Word!

Help spread awareness about HTML Purifier by:

Contact

You can send me an email at htmlpurifier@jpsband.org. However, I prefer that you use the forums for asking general support questions (response time will be the same, I promise!). Any emails I receive will be considered public: if I think a solution I thought up to help you would be particularly useful to others, expect it to show up on the website.
