HTML Purifier

Configuration Documentation

Core

Core features that are always available.

Core.Encoding

Type:Case-insensitive string
Default:
"utf-8"
Used by:HTMLPurifier/Encoder.php
If for some reason you are unable to convert all webpages to UTF-8, you can use this directive as a stop-gap compatibility change to let HTML Purifier deal with non-UTF-8 input. This technique has notable deficiencies: absolutely no characters outside of the selected character encoding will be preserved, not even the ones that have been ampersand escaped (this is due to a UTF-8 specific feature that automatically resolves all entities). That makes it pretty useless for anything except the most I18N-blind applications, although %Core.EscapeNonASCIICharacters fixes this trouble with another tradeoff. If iconv is not enabled, this directive only accepts ISO-8859-1.

Core.EscapeNonASCIICharacters

Type:Boolean
Default:
false
Used by:HTMLPurifier/Encoder.php
This directive overcomes a deficiency in %Core.Encoding by blindly converting all non-ASCII characters into decimal numeric entities before converting the text to its native encoding. This means that even characters that can be expressed in the non-UTF-8 encoding will be entity-ized, which can be a real downer for encodings like Big5. It also assumes that the ASCII repertoire is available, although this is the case for almost all encodings. Anyway, use UTF-8! This directive has been available since 1.4.0.
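A minimal sketch of how the two encoding directives might be combined for a legacy ISO-8859-1 site. It assumes the library directory is on the include path and uses the three-argument set() call (namespace, directive, value) of the 1.x releases this page documents; the sample input string is hypothetical.

```php
<?php
// Assumption: the HTML Purifier library directory is on the include path.
require_once 'HTMLPurifier.php';

$config = HTMLPurifier_Config::createDefault();

// Treat input and output as ISO-8859-1 instead of the default UTF-8.
$config->set('Core', 'Encoding', 'ISO-8859-1');

// Turn non-ASCII characters into numeric entities so that characters
// outside ISO-8859-1 are not lost on output.
$config->set('Core', 'EscapeNonASCIICharacters', true);

$purifier = new HTMLPurifier($config);

// Hypothetical input: "café" encoded as ISO-8859-1 (0xE9 is e-acute).
$dirty_html = "<p>caf\xE9</p>";
$clean_html = $purifier->purify($dirty_html);
```

ISO-8859-1 is used here because it is the one non-UTF-8 encoding the text above says is accepted even without iconv.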

Core.AcceptFullDocuments

Type:Boolean
Default:
true
Used by:HTMLPurifier/Lexer.php
This parameter determines whether or not the filter should accept full HTML documents, not just HTML fragments. When on, it will drop all sections except the content between the body tags.

Core.CleanUTF8DuringGeneration

Type:Boolean
Default:
false
Used by:HTMLPurifier/Generator.php
When true, HTMLPurifier_Generator will also check all strings it escapes for UTF-8 well-formedness as a defense in depth measure. This could cause a considerable performance impact, and is not strictly necessary due to the fact that the Lexers should have ensured that all the UTF-8 strings were well-formed. Note that the configuration value is only read at the beginning of generateFromTokens.

Core.XHTML

Type:Boolean
Default:
true
Used by:HTMLPurifier/Generator.php
Determines whether or not the output is XHTML. When disabled, HTML Purifier goes into HTML 4.01 mode and removes XHTML-specific markup constructs, such as boolean attribute expansion and trailing slashes in empty tags. This directive has been available since 1.1.

Core.TidyFormat

Type:Boolean
Default:
false
Used by:HTMLPurifier/Generator.php

Determines whether or not to run Tidy on the final output for pretty formatting reasons, such as indentation and wrap.

This can greatly improve readability for editors who are hand-editing the HTML, but is by no means necessary as HTML Purifier has already fixed all major errors the HTML may have had. Tidy is a non-default extension, and this directive will silently fail if Tidy is not available.

If you are looking to make the overall look of your page's source better, I recommend running Tidy on the entire page rather than just user-content (after all, the indentation relative to the containing blocks will be incorrect).

This directive has been available since 1.1.1.

Core.EscapeInvalidTags

Type:Boolean
Default:
false
Used by:HTMLPurifier/Strategy.php
When true, invalid tags will be written back to the document as plain text. Otherwise, they are silently dropped.

Core.EscapeInvalidChildren

Type:Boolean
Default:
false
Used by:HTMLPurifier/ChildDef.php
When true, a child that is not allowed in the context of the parent element will be transformed into text as if it were ASCII. When false, that element and all internal tags will be dropped, though text will be preserved. There is no option for dropping the element but preserving child nodes.

Core.RemoveInvalidImg

Type:Boolean
Default:
true
Used by:HTMLPurifier/Strategy/RemoveForeignElements.php
This directive enables pre-emptive URI checking in img tags, as the attribute validation strategy is not authorized to remove elements from the document. This directive has been available since 1.3.0; revert to pre-1.3.0 behavior by setting it to false.

Attr

Features regarding attribute validation.

Attr.EnableID

Type:Boolean
Default:
false
Used by:HTMLPurifier/AttrDef/HTML/ID.php
Allows the ID attribute in HTML. This is disabled by default due to the fact that without proper configuration user input can easily break the validation of a webpage by specifying an ID that is already on the surrounding HTML. If you don't mind throwing caution to the wind, enable this directive, but I strongly recommend you also consider blacklisting IDs you use (%Attr.IDBlacklist) or prefixing all user supplied IDs (%Attr.IDPrefix). This directive has been available since 1.2.0, and when set to true reverts to the behavior of pre-1.2.0 versions.

Attr.IDPrefix

Type:String
Default:
""
Used by:HTMLPurifier/AttrDef/HTML/ID.php
String to prefix to IDs. If you have no idea what IDs your pages may use, you may opt to simply add a prefix to all user-submitted ID attributes so that they are still usable, but will not conflict with core page IDs. Example: setting the directive to 'user_' will result in a user-submitted 'foo' becoming 'user_foo'. Be sure to set %Attr.EnableID to true before using this. This directive has been available since 1.2.0.

Attr.IDPrefixLocal

Type:String
Default:
""
Used by:HTMLPurifier/AttrDef/HTML/ID.php
Temporary prefix for IDs used in conjunction with %Attr.IDPrefix. If you need to allow multiple sets of user content on a web page, you may need to have a separate prefix that changes with each iteration. This way, separately submitted pieces of user content displayed on the same page don't clobber each other. Ideal values are unique identifiers for the content it represents (e.g. the id of the row in the database). Be sure to add a separator (like an underscore) at the end. Warning: this directive will not work unless %Attr.IDPrefix is set to a non-empty value! This directive has been available since 1.2.0.
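The three ID directives above are meant to be used together. The following is a sketch only, assuming the library is on the include path and the 1.x three-argument set() call; the variable $row_id and the prefix strings are hypothetical.

```php
<?php
// Assumption: the HTML Purifier library directory is on the include path.
require_once 'HTMLPurifier.php';

// Hypothetical: primary key of the comment currently being rendered.
$row_id = 42;

$config = HTMLPurifier_Config::createDefault();

// IDs are rejected unless this is switched on.
$config->set('Attr', 'EnableID', true);

// Site-wide prefix that keeps user IDs away from the page's own IDs.
$config->set('Attr', 'IDPrefix', 'user_');

// Per-item prefix, ending in a separator, so two comments on the same
// page cannot collide with each other.
$config->set('Attr', 'IDPrefixLocal', 'comment' . $row_id . '_');

$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify('<p id="foo">Hello</p>');
```

Because the local prefix changes per item, a configuration like this would have to be rebuilt (or updated) for each piece of user content on the page.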

Attr.IDBlacklistRegexp

Type:String (or null)
Default:
null
Used by:HTMLPurifier/AttrDef/HTML/ID.php
PCRE regular expression to be matched against all IDs. If the expression matches, the ID is rejected. Use this with care: it may cause significant performance degradation. ID matching is done after all other validation. This directive has been available since 1.6.0.

Attr.AllowedRel

Type:Lookup array
Default:
Array
(
)
Used by:HTMLPurifier/AttrDef/HTML/LinkTypes.php
List of allowed forward document relationships in the rel attribute. Common values may be nofollow or print. By default, this is empty, meaning that no document relationships are allowed. This directive has been available since 1.6.0.
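Since the directive is a lookup array, one way to allow a relationship is to pass an array keyed by the permitted values. This is a sketch under the same assumptions as the rest of this page's 1.x API (include path set up, three-argument set()).

```php
<?php
// Assumption: the HTML Purifier library directory is on the include path.
require_once 'HTMLPurifier.php';

$config = HTMLPurifier_Config::createDefault();

// Lookup array: the keys are the allowed rel values.
$config->set('Attr', 'AllowedRel', array('nofollow' => true));

$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify(
    '<a href="http://example.com/" rel="nofollow">link</a>'
);
```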

Attr.AllowedRev

Type:Lookup array
Default:
Array
(
)
Used by:HTMLPurifier/AttrDef/HTML/LinkTypes.php
List of allowed reverse document relationships in the rev attribute. This attribute is a bit of an edge-case; if you don't know what it is for, stay away. This directive has been available since 1.6.0.

Attr.DefaultTextDir

Type:String
Allowed values: "ltr", "rtl"
Default:
"ltr"
Used by:HTMLPurifier/AttrTransform/BdoDir.php
Defines the default text direction (ltr or rtl) of the document being parsed. This generally is the same as the value of the dir attribute in HTML, or ltr if that is not specified.

Attr.DefaultInvalidImage

Type:String
Default:
""
Used by:HTMLPurifier/AttrTransform/ImgRequired.php
This is the default image an img tag will be pointed to if it does not have a valid src attribute. In future versions, we may allow the image tag to be removed completely, but due to design issues, this is not possible right now.

Attr.DefaultInvalidImageAlt

Type:String
Default:
"Invalid image"
Used by:HTMLPurifier/AttrTransform/ImgRequired.php
This is the content of the alt attribute of an invalid image if the user has not previously specified an alt attribute. It has no effect when the image is valid but there was no alt attribute present.

Attr.IDBlacklist

Type:Array list
Default:
Array
(
)
Used by:HTMLPurifier/Strategy/ValidateAttributes.php
Array of IDs not allowed in the document.

URI

Features regarding Uniform Resource Identifiers.

URI.AllowedSchemes

Type:Lookup array
Default:
Array
(
    [http] => 1
    [https] => 1
    [mailto] => 1
    [ftp] => 1
    [irc] => 1
    [nntp] => 1
    [news] => 1
)
Used by:HTMLPurifier/URISchemeRegistry.php
Whitelist that defines the schemes that a URI is allowed to have. This prevents XSS attacks from using pseudo-schemes like javascript or mocha.
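As an illustration, the whitelist could be narrowed to web and mail links only. The sketch assumes the 1.x three-argument set() call and that the library is on the include path; the choice of schemes is only an example.

```php
<?php
// Assumption: the HTML Purifier library directory is on the include path.
require_once 'HTMLPurifier.php';

$config = HTMLPurifier_Config::createDefault();

// Replace the default lookup array with a smaller one.
// Schemes not listed here (ftp, irc, nntp, news) are no longer allowed.
$config->set('URI', 'AllowedSchemes', array(
    'http'   => true,
    'https'  => true,
    'mailto' => true,
));

$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify('<a href="ftp://example.com/file">file</a>');
```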

URI.OverrideAllowedSchemes

Type:Boolean
Default:
true
Used by:HTMLPurifier/URISchemeRegistry.php
If this is set to true (which it is by default), you can override %URI.AllowedSchemes by simply registering a HTMLPurifier_URIScheme to the registry. If false, you will also have to update that directive in order to add more schemes.

URI.DefaultScheme

Type:String
Default:
"http"
Used by:HTMLPurifier/AttrDef/URI.php
Defines through what scheme the output will be served, in order to select the proper object validator when no scheme information is present.

URI.Host

Type:String (or null)
Default:
null
Used by:HTMLPurifier/AttrDef/URI.php
Defines the domain name of the server, so we can determine whether or not an absolute URI is from your website. Not strictly necessary, as users should be using relative URIs to reference resources on your website. It will, however, let you use absolute URIs to link to subdomains of the domain you post here: e.g. example.com will allow sub.example.com. However, higher up domains will still be excluded: if you set %URI.Host to sub.example.com, example.com will be blocked. This directive has been available since 1.2.0.

URI.DisableExternal

Type:Boolean
Default:
false
Used by:HTMLPurifier/AttrDef/URI.php
Disables links to external websites. This is a highly effective anti-spam and anti-pagerank-leech measure, but comes at a hefty price: no links or images outside of your domain will be allowed. Non-linkified URIs will still be preserved. If you want to be able to link to subdomains or use absolute URIs, specify %URI.Host for your website. This directive has been available since 1.2.0.
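A sketch of pairing %URI.Host with %URI.DisableExternal, so that absolute links to your own domain and its subdomains stay usable while everything else is blocked. The host name is a placeholder, and the 1.x three-argument set() call and include path are assumed.

```php
<?php
// Assumption: the HTML Purifier library directory is on the include path.
require_once 'HTMLPurifier.php';

$config = HTMLPurifier_Config::createDefault();

// Placeholder domain: per the description above, sub.example.com
// would also count as internal.
$config->set('URI', 'Host', 'example.com');

// Refuse links and images that point outside that host.
$config->set('URI', 'DisableExternal', true);

$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify(
    '<a href="http://sub.example.com/page">internal</a> '
    . '<a href="http://elsewhere.test/">external</a>'
);
```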

URI.DisableExternalResources

Type:Boolean
Default:
false
Used by:HTMLPurifier/AttrDef/URI.php
Disables the embedding of external resources, preventing users from embedding things like images from other hosts. This prevents access tracking (good for email viewers), bandwidth leeching, cross-site request forging, goatse.cx posting, and other nasties, but also results in a loss of end-user functionality (they can't directly post a pic they posted from Flickr anymore). Use it if you don't have a robust user-content moderation team. This directive has been available since 1.3.0.

URI.DisableResources

Type:Boolean
Default:
false
Used by:HTMLPurifier/AttrDef/URI.php
Disables embedding resources, essentially meaning no pictures. You can still link to them though. See %URI.DisableExternalResources for why this might be a good idea. This directive has been available since 1.3.0.

URI.Munge

Type:String (or null)
Default:
null
Used by:HTMLPurifier/AttrDef/URI.php
Munges all browsable (usually http, https and ftp) URIs into some URL redirection service. Pass this directive a URI, with %s inserted where the url-encoded original URI should be inserted (sample: http://www.google.com/url?q=%s). This prevents PageRank leaks, while being as transparent as possible to users (you may also want to add some client side JavaScript to override the text in the statusbar). Warning: many security experts believe that this form of protection does not deter spam-bots. You can also use this directive to redirect users to a splash page telling them they are leaving your website. This directive has been available since 1.3.0.
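For instance, the sample redirector from the description could be configured as follows. This is a sketch assuming the 1.x three-argument set() call and that the library is on the include path.

```php
<?php
// Assumption: the HTML Purifier library directory is on the include path.
require_once 'HTMLPurifier.php';

$config = HTMLPurifier_Config::createDefault();

// %s marks where the url-encoded original URI is substituted.
$config->set('URI', 'Munge', 'http://www.google.com/url?q=%s');

$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify('<a href="http://example.com/">site</a>');
```

Pointing the directive at a page on your own site instead would give the "you are leaving this website" splash page mentioned above.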

URI.HostBlacklist

Type:Array list
Default:
Array
(
)
Used by:HTMLPurifier/AttrDef/URI.php
List of strings that are forbidden in the host of any URI. Use it to kill domain names of spam, etc. Note that it will catch anything in the domain, so moo.com will catch moo.com.example.com. This directive has been available since 1.3.0.
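A short sketch of a blacklist with two entries. The second host name is made up for illustration, and the 1.x three-argument set() call and include path are assumed.

```php
<?php
// Assumption: the HTML Purifier library directory is on the include path.
require_once 'HTMLPurifier.php';

$config = HTMLPurifier_Config::createDefault();

// Array list of forbidden host fragments. Per the description above,
// 'moo.com' also catches hosts such as moo.com.example.com.
$config->set('URI', 'HostBlacklist', array('moo.com', 'spam.invalid'));

$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify('<a href="http://moo.com/">moo</a>');
```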

URI.Disable

Type:Boolean
Default:
false
Used by:HTMLPurifier/AttrDef/URI.php
Disables all URIs in all forms. Not sure why you'd want to do that (after all, the Internet's founded on the notion of a hyperlink). This directive has been available since 1.3.0.

HTML

Configuration regarding allowed HTML.

HTML.Doctype

Type:String (or null)
Default:
null
Used by:HTMLPurifier/HTMLModuleManager.php
Doctype to use, valid values are HTML 4.01 Transitional, HTML 4.01 Strict, XHTML 1.0 Transitional, XHTML 1.0 Strict, XHTML 1.1. Technically speaking this is not actually a doctype (as it does not identify a corresponding DTD), but we are using this name for sake of simplicity. This will override any older directives like %Core.XHTML or %HTML.Strict.

HTML.Strict

Type:Boolean
Default:
false
Used by:HTMLPurifier/HTMLDefinition.php
Determines whether or not to use Transitional (loose) or Strict rulesets. This directive has been available since 1.3.0.

HTML.BlockWrapper

Type:String
Default:
"p"
Used by:HTMLPurifier/HTMLDefinition.php
String name of element to wrap inline elements that are inside a block context. This only occurs in the children of blockquote in strict mode. Example: with the default value, <blockquote>Foo</blockquote> would become <blockquote><p>Foo</p></blockquote>. The <p> tags can be replaced with whatever you desire, as long as it is a block level element. This directive has been available since 1.3.0.
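Because the wrapper only takes effect in strict mode, the two directives would normally be set together. The sketch below swaps the default p for div as an example; it assumes the 1.x three-argument set() call and that the library is on the include path.

```php
<?php
// Assumption: the HTML Purifier library directory is on the include path.
require_once 'HTMLPurifier.php';

$config = HTMLPurifier_Config::createDefault();

// The block wrapper is only applied under the Strict ruleset.
$config->set('HTML', 'Strict', true);

// Wrap stray inline content inside blockquote with div instead of p.
$config->set('HTML', 'BlockWrapper', 'div');

$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify('<blockquote>Foo</blockquote>');
```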

HTML.Parent

Type:String
Default:
"div"
Used by:HTMLPurifier/HTMLDefinition.php
String name of the element that the HTML fragment passed to the library will be inserted in. An interesting variation would be using span as the parent element, meaning that only inline tags would be allowed. This directive has been available since 1.3.0.

HTML.AllowedElements

Type:Lookup array (or null)
Default:
null
Used by:HTMLPurifier/HTMLDefinition.php
If HTML Purifier's tag set is unsatisfactory for your needs, you can overload it with your own list of tags to allow. Note that this method is subtractive: it does its job by taking away from HTML Purifier's usual feature set, so you cannot add a tag that HTML Purifier never supported in the first place (like embed, form or head). If you change this, you probably also want to change %HTML.AllowedAttributes. Warning: If another directive conflicts with the elements here, that directive will win and override. This directive has been available since 1.3.0.

HTML.AllowedAttributes

Type:Lookup array (or null)
Default:
null
Used by:HTMLPurifier/HTMLDefinition.php
If HTML Purifier's attribute set is unsatisfactory, overload it! The syntax is 'tag.attr' or '*.attr' for the global attributes (style, id, class, dir, lang, xml:lang). Warning: If another directive conflicts with the attributes here, that directive will win and override. For example, %Attr.EnableID will take precedence over *.id in this directive. You must set that directive to true before you can use IDs at all. This directive has been available since 1.3.0.
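A sketch of restricting both elements and attributes to a small set, using the 'tag.attr' and '*.attr' syntax described above. The particular tags and attributes are only an example; the 1.x three-argument set() call and include path are assumed.

```php
<?php
// Assumption: the HTML Purifier library directory is on the include path.
require_once 'HTMLPurifier.php';

$config = HTMLPurifier_Config::createDefault();

// Subtractive: only elements HTML Purifier already supports can be listed.
$config->set('HTML', 'AllowedElements', array(
    'p'  => true,
    'em' => true,
    'a'  => true,
));

// 'tag.attr' for one element, '*.attr' for a global attribute.
$config->set('HTML', 'AllowedAttributes', array(
    'a.href' => true,
    '*.lang' => true,
));

$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify(
    '<p lang="en">See <a href="http://example.com/">this</a>.</p>'
);
```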

CSS

Configuration regarding allowed CSS.

No configuration directives defined for this namespace.

Test

Developer testing configuration for our unit tests.

Test.ForceNoIconv

Type:Boolean
Default:
false
Used by:HTMLPurifier/Encoder.php
When set to true, HTMLPurifier_Encoder will act as if iconv does not exist and use only pure PHP implementations.