HTML Purifier
Configuration Documentation
Core
Core features that are always available.
Core.Encoding
| Type: | Case-insensitive string |
|---|---|
| Default: | "utf-8" |
| Used by: | HTMLPurifier/Encoder.php |
If for some reason you are unable to convert all webpages to UTF-8, you can use this directive as a stop-gap compatibility change to let HTML Purifier deal with non-UTF-8 input. This technique has notable deficiencies: absolutely no characters outside of the selected character encoding will be preserved, not even the ones that have been ampersand-escaped (this is due to a UTF-8-specific feature that automatically resolves all entities). That makes it pretty useless for anything except the most I18N-blind applications, although %Core.EscapeNonASCIICharacters fixes this problem with a different tradeoff. This directive only accepts ISO-8859-1 if iconv is not enabled.
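The following Python model makes the loss concrete. It assumes the behavior described above: entities are resolved to real characters first, and any character the target encoding cannot represent is silently dropped. `html.unescape` and `str.encode(..., errors="ignore")` stand in for HTML Purifier's internal PHP steps. `to_legacy_encoding` is a hypothetical name. This is a sketch, not the library's code.

```python
import html


def to_legacy_encoding(fragment: str, encoding: str = "iso-8859-1") -> str:
    """Model of the lossy round trip described for %Core.Encoding (not HTML Purifier's code)."""
    # Step 1: entities such as &#8364; are resolved to real characters,
    # mirroring the UTF-8-specific entity resolution mentioned above.
    text = html.unescape(fragment)
    # Step 2: characters the target encoding cannot represent are dropped.
    encoded = text.encode(encoding, errors="ignore")
    return encoded.decode(encoding)


if __name__ == "__main__":
    # The euro sign is not in ISO-8859-1, so even its escaped form is lost.
    print(to_legacy_encoding("caf&eacute; &#8364;5"))
```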
Core.EscapeNonASCIICharacters
| Type: | Boolean |
|---|---|
| Default: | false |
| Used by: | HTMLPurifier/Encoder.php |
This directive overcomes a deficiency in %Core.Encoding by blindly converting all non-ASCII characters into decimal numeric entities before converting the output to its native encoding. This means that even characters that can be expressed in the non-UTF-8 encoding will be entity-ized, which can be a real downer for encodings like Big5. It also assumes that the ASCII repertoire is available, although this is the case for almost all encodings. Anyway, use UTF-8! This directive has been available since 1.4.0.
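Here is a minimal Python sketch of the escaping step. It assumes the step works as described: every non-ASCII character becomes a decimal numeric entity before encoding. The function names are hypothetical, and this is not HTML Purifier's implementation.

```python
def escape_non_ascii(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric entity."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


def to_legacy_encoding_escaped(text: str, encoding: str = "iso-8859-1") -> bytes:
    # After escaping, the text is pure ASCII, so any ASCII-compatible
    # encoding can represent it without loss.
    return escape_non_ascii(text).encode(encoding)


if __name__ == "__main__":
    print(escape_non_ascii("café €5"))  # caf&#233; &#8364;5
```

Note that `中` comes out as `&#20013;` even though Big5 could represent it directly. That is the tradeoff mentioned above.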
Core.AcceptFullDocuments
| Type: | Boolean |
|---|---|
| Default: | true |
| Used by: | HTMLPurifier/Lexer.php |
This parameter determines whether or not the filter should accept full HTML documents, not just HTML fragments. When enabled, it will drop everything except the content between the body tags.
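The following rough Python illustration shows what "drop everything except the body content" means. A regular expression stands in for HTML Purifier's lexer; a real parser is more robust. `extract_body` is a hypothetical helper.

```python
import re

# Non-greedy match of whatever sits between <body ...> and </body>.
_BODY_RE = re.compile(r"<body\b[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)


def extract_body(document: str) -> str:
    """Return the body contents of a full document, or the input unchanged if it is a fragment."""
    match = _BODY_RE.search(document)
    return match.group(1) if match else document
```

Input without a body element is treated as a fragment and returned unchanged.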
Core.CleanUTF8DuringGeneration
| Type: | Boolean |
|---|---|
| Default: | false |
| Used by: | HTMLPurifier/Generator.php |
When true, HTMLPurifier_Generator will also check all strings it escapes for UTF-8 well-formedness as a defense-in-depth measure. This could cause a considerable performance impact. It is not strictly necessary, because the lexers should already have ensured that all UTF-8 strings are well-formed. Note that the configuration value is only read at the beginning of generateFromTokens.
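For reference, here is a Python sketch of a well-formedness check and a cleaning pass. It rests on assumptions: `clean_utf8` simply drops ill-formed byte sequences, and HTML Purifier's own PHP routine may apply additional rules. Both function names are hypothetical.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """True if the bytes decode as UTF-8 without any errors."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def clean_utf8(data: bytes) -> str:
    """Drop ill-formed UTF-8 byte sequences, keeping everything that decodes cleanly."""
    return data.decode("utf-8", errors="ignore")
```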
Core.XHTML
| Type: | Boolean |
|---|---|
| Default: | true |
| Used by: | HTMLPurifier/Generator.php |
Determines whether the output is XHTML. When disabled, HTML Purifier switches to HTML 4.01 and removes XHTML-specific markup constructs, such as boolean attribute expansion and trailing slashes in empty tags. This directive has been available since 1.1.
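Here is a small Python sketch of the two output styles. It assumes the differences are exactly the ones named above: boolean attribute expansion and trailing slashes. `render_empty_tag` is hypothetical and not part of HTML Purifier.

```python
import html
from typing import Dict, Union


def render_empty_tag(name: str, attrs: Dict[str, Union[str, bool]], xhtml: bool = True) -> str:
    """Render an empty element as XHTML or HTML 4.01 output would (illustrative only)."""
    parts = [name]
    for key, value in attrs.items():
        if value is True:
            # Boolean attribute: XHTML expands it to key="key"; HTML 4.01 may minimize it.
            parts.append(f'{key}="{key}"' if xhtml else key)
        else:
            parts.append(f'{key}="{html.escape(str(value))}"')
    # XHTML closes empty elements with " />"; HTML 4.01 uses a plain ">".
    return "<" + " ".join(parts) + (" />" if xhtml else ">")
```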
| Type: | Boolean |
|---|---|
| Default: | false |
| Used by: | HTMLPurifier/Generator.php |
Determines whether or not to run Tidy on the final output for pretty-formatting purposes, such as indentation and wrapping.
This can greatly improve readability for editors who are hand-editing the HTML, but is by no means necessary, as HTML Purifier has already fixed all major errors the HTML may have had. Tidy is a non-default extension, and this directive will silently fail if Tidy is not available.
If you are looking to make the overall look of your page's source better, I recommend running Tidy on the entire page rather than just the user content (after all, the indentation relative to the containing blocks will be incorrect).
This directive has been available since 1.1.1.
| Type: | Boolean |
|---|---|
| Default: | false |
| Used by: | HTMLPurifier/Strategy.php |
When true, invalid tags will be written back to the document as plain text. Otherwise, they are silently dropped.
Core.EscapeInvalidChildren
| Type: | Boolean |
|---|---|
| Default: | false |
| Used by: | HTMLPurifier/ChildDef.php |
When true, a child that is not allowed in the context of its parent element will be escaped and written out as plain text. When false, that element and all of its internal tags will be dropped, though the text will be preserved. There is no option for dropping the element but preserving its child nodes.
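Here is a Python sketch of the two behaviors. It uses a hypothetical `(kind, value)` token model rather than HTML Purifier's real token classes, and `handle_invalid_child` is likewise hypothetical.

```python
import html
from typing import List, Tuple

# Hypothetical token model: (kind, value) with kind "start", "end" or "text".
Token = Tuple[str, str]


def handle_invalid_child(tokens: List[Token], escape: bool = False) -> str:
    """Either escape a disallowed child subtree as text, or drop its tags and keep its text."""
    out = []
    for kind, value in tokens:
        if kind == "text":
            out.append(html.escape(value, quote=False))
        elif escape:
            # Escaping enabled: the tag itself becomes visible text.
            tag = f"<{value}>" if kind == "start" else f"</{value}>"
            out.append(html.escape(tag, quote=False))
        # Otherwise the tag is dropped, but the surrounding text survives.
    return "".join(out)
```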
Core.RemoveInvalidImg
| Type: | Boolean |
|---|---|
| Default: | true |
| Used by: | HTMLPurifier/Strategy/RemoveForeignElements.php |
This directive enables pre-emptive URI checking in img tags, because the attribute validation strategy is not authorized to remove elements from the document. This directive has been available since 1.3.0; revert to the pre-1.3.0 behavior by setting it to false.
Attr
Features regarding attribute validation.
Attr.EnableID
| Type: | Boolean |
|---|---|
| Default: | false |
| Used by: | HTMLPurifier/AttrDef/HTML/ID.php |
Allows the ID attribute in HTML. This is disabled by default because, without proper configuration, user input can easily break the validation of a webpage by specifying an ID that is already used in the surrounding HTML. If you don't mind throwing caution to the wind, enable this directive, but I strongly recommend you also consider blacklisting the IDs you use (%Attr.IDBlacklist) or prefixing all user-supplied IDs (%Attr.IDPrefix). This directive has been available since 1.2.0, and when set to true it reverts to the behavior of pre-1.2.0 versions.
Attr.IDPrefix
| Type: | String |
|---|---|
| Default: | "" |
| Used by: | HTMLPurifier/AttrDef/HTML/ID.php |
String to prefix to IDs. If you have no idea what IDs your pages may use, you may opt to simply add a prefix to all user-submitted ID attributes so that they are still usable but will not conflict with core page IDs. Example: setting the directive to 'user_' will cause a user-submitted 'foo' to become 'user_foo'. Be sure to set %Attr.EnableID to true before using this. This directive has been available since 1.2.0.
Attr.IDPrefixLocal
| Type: | String |
|---|---|
| Default: | "" |
| Used by: | HTMLPurifier/AttrDef/HTML/ID.php |
Temporary prefix for IDs, used in conjunction with %Attr.IDPrefix. If you need to allow multiple sets of user content on a web page, you may need a separate prefix that changes with each iteration. This way, separately submitted pieces of user content displayed on the same page don't clobber each other. Ideal values are unique identifiers for the content they represent (e.g. the id of the row in the database). Be sure to add a separator (like an underscore) at the end. Warning: this directive will not work unless %Attr.IDPrefix is set to a non-empty value! This directive has been available since 1.2.0.
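Here is a Python sketch of how the two prefixes combine. The order (global prefix, then local prefix, then the user's ID) is an assumption consistent with the descriptions above. `prefix_id` is a hypothetical helper.

```python
def prefix_id(user_id: str, id_prefix: str = "", id_prefix_local: str = "") -> str:
    """Build the final ID from %Attr.IDPrefix and %Attr.IDPrefixLocal (illustrative; order assumed)."""
    if not id_prefix:
        # IDPrefixLocal has no effect unless IDPrefix is non-empty.
        return user_id
    return id_prefix + id_prefix_local + user_id
```

For example, two comments rendered on one page could use local prefixes `42_` and `43_`, so a `foo` ID in each becomes `user_42_foo` and `user_43_foo`.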
Attr.IDBlacklistRegexp
| Type: | String (or null) |
|---|---|
| Default: | null |
| Used by: | HTMLPurifier/AttrDef/HTML/ID.php |
PCRE regular expression to be matched against all IDs. If the expression matches, the ID is rejected. Use this with care: it may cause significant performance degradation. ID matching is done after all other validation. This directive has been available since 1.6.0.
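Here is a minimal Python model of the check. Note one assumption: PHP's PCRE functions expect delimited patterns such as `/^nav/`, while Python's `re` module takes the bare pattern, so the patterns here omit delimiters. `id_allowed` is hypothetical.

```python
import re
from typing import Optional


def id_allowed(candidate: str, blacklist_regexp: Optional[str] = None) -> bool:
    """Reject an ID if the blacklist pattern matches anywhere in it (model of the described check)."""
    if blacklist_regexp is None:
        # Default (null): no regexp blacklist is applied.
        return True
    return re.search(blacklist_regexp, candidate) is None
```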
Attr.AllowedRel
| Type: | Lookup array |
|---|---|
| Default: | Array ( ) |
| Used by: | HTMLPurifier/AttrDef/HTML/LinkTypes.php |
List of allowed forward document relationships in the rel attribute. Common values may be nofollow or print. By default, this is empty, meaning that no document relationships are allowed. This directive has been available since 1.6.0.
Attr.AllowedRev
| Type: | Lookup array |
|---|---|
| Default: | Array ( ) |
| Used by: | HTMLPurifier/AttrDef/HTML/LinkTypes.php |
List of allowed reverse document relationships in the rev attribute. This attribute is a bit of an edge case; if you don't know what it is for, stay away. This directive has been available since 1.6.0.
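Both directives act as whitelists over a space-separated list of link types. Here is a Python sketch under that assumption. Lowercasing the values is also an assumption, and `filter_link_types` is hypothetical.

```python
from typing import Optional, Set


def filter_link_types(value: str, allowed: Set[str]) -> Optional[str]:
    """Keep only whitelisted, space-separated link types; None means drop the attribute."""
    kept = [token for token in value.lower().split() if token in allowed]
    return " ".join(kept) if kept else None
```

With the default empty lookup, every rel or rev value is removed.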
Attr.DefaultTextDir
| Type: | String |
|---|---|
| Allowed values: | "ltr", "rtl" |
| Default: | "ltr" |
| Used by: | HTMLPurifier/AttrTransform/BdoDir.php |
Defines the default text direction (ltr or rtl) of the document being parsed. This generally is the same as the value of the dir attribute in HTML, or ltr if that is not specified.
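Judging by the Used by entry (BdoDir.php), this default appears to be filled in when a bdo element lacks a dir attribute, since bdo requires one. Here is a Python sketch under that assumption, with a hypothetical `ensure_dir` helper.

```python
from typing import Dict


def ensure_dir(attrs: Dict[str, str], default_dir: str = "ltr") -> Dict[str, str]:
    """Fill in a missing dir attribute with the configured default; existing values win."""
    if default_dir not in ("ltr", "rtl"):
        raise ValueError("default_dir must be 'ltr' or 'rtl'")
    if "dir" in attrs:
        return attrs
    # Return a new dict so the caller's attributes are not mutated.
    return {**attrs, "dir": default_dir}
```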
Attr.DefaultInvalidImage
| Type: | String |
|---|---|
| Default: | "" |
| Used by: | HTMLPurifier/AttrTransform/ImgRequired.php |
This is the default image an img tag will be pointed to if it does not have a valid src attribute. In future versions, we may allow the image tag to be removed completely, but due to design issues, this is not possible right now.
Attr.DefaultInvalidImageAlt
| Type: | String |
|---|---|
| Default: | "Invalid image" |
| Used by: | HTMLPurifier/AttrTransform/ImgRequired.php |
This is the content of the alt attribute of an invalid image if the user had not previously specified an alt attribute. It has no effect when the image is valid but there was no alt attribute present.
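Here is a Python sketch of how %Attr.DefaultInvalidImage and %Attr.DefaultInvalidImageAlt interact. `src_is_valid` is a hypothetical stand-in for the URI validation HTML Purifier actually performs, and `apply_img_defaults` is also hypothetical.

```python
from typing import Dict


def apply_img_defaults(attrs: Dict[str, str], src_is_valid: bool,
                       default_src: str = "",
                       default_alt: str = "Invalid image") -> Dict[str, str]:
    """Model of %Attr.DefaultInvalidImage and %Attr.DefaultInvalidImageAlt."""
    result = dict(attrs)  # never mutate the caller's dict
    if not src_is_valid:
        # Point the image at the configured fallback...
        result["src"] = default_src
        # ...and supply alt text only if the user did not give any.
        result.setdefault("alt", default_alt)
    # A valid image is left untouched, even without an alt attribute.
    return result
```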
Attr.IDBlacklist
| Type: | Array list |
|---|---|
| Default: | Array ( ) |
| Used by: | HTMLPurifier/Strategy/ValidateAttributes.php |
Array of IDs not allowed in the document.
URI
Features regarding Uniform Resource Identifiers.
URI.AllowedSchemes
| Type: | Lookup array |
|---|---|
| Default: | Array ( [http] => 1, [https] => 1, [mailto] => 1, [ftp] => 1, [irc] => 1, [nntp] => 1, [news] => 1 ) |
| Used by: | HTMLPurifier/URISchemeRegistry.php |
Whitelist that defines the schemes that a URI is allowed to have. This prevents XSS attacks from using pseudo-schemes like javascript or mocha.
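Here is a minimal Python sketch of a scheme whitelist. It uses `urllib.parse.urlsplit` to extract the scheme and does not reproduce HTML Purifier's URI parser. `scheme_allowed` is hypothetical.

```python
from urllib.parse import urlsplit

# Mirrors the default %URI.AllowedSchemes lookup listed above.
ALLOWED_SCHEMES = {"http", "https", "mailto", "ftp", "irc", "nntp", "news"}


def scheme_allowed(uri: str, allowed=ALLOWED_SCHEMES) -> bool:
    """Whitelist check: reject any URI whose scheme is not explicitly allowed."""
    scheme = urlsplit(uri.strip()).scheme  # urlsplit lowercases the scheme
    # No scheme means a relative URI (assumed acceptable; see %URI.DefaultScheme).
    return scheme == "" or scheme in allowed
```

Treating scheme-less (relative) URIs as acceptable reflects the role of %URI.DefaultScheme. Whether the library handles them identically is an assumption.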
URI.OverrideAllowedSchemes
| Type: | Boolean |
|---|---|
| Default: | true |
| Used by: | HTMLPurifier/URISchemeRegistry.php |
If this is set to true (which it is by default), you can override %URI.AllowedSchemes by simply registering an HTMLPurifier_URIScheme with the registry. If false, you will also have to update that directive in order to add more schemes.
URI.DefaultScheme
| Type: | String |
|---|---|
| Default: | "http" |
| Used by: | HTMLPurifier/AttrDef/URI.php |
Defines through what scheme the output will be served, in order to select the proper object validator when no scheme information is present.
URI.Host
| Type: | String (or null) |
|---|---|
| Default: | null |
| Used by: | HTMLPurifier/AttrDef/URI.php |
Defines the domain name of the server, so we can determine whether or not an absolute URI is from your website. This is not strictly necessary, as users should be using relative URIs to reference resources on your website. It will, however, let you use absolute URIs to link to subdomains of the domain you post here: e.g. example.com will allow sub.example.com. However, higher-level domains will still be excluded: if you set %URI.Host to sub.example.com, example.com will be blocked. This directive has been available since 1.2.0.
URI.DisableExternal
| Type: | Boolean |
|---|---|
| Default: | false |
| Used by: | HTMLPurifier/AttrDef/URI.php |
Disables links to external websites. This is a highly effective anti-spam and anti-PageRank-leech measure, but comes at a hefty price: no links or images outside of your domain will be allowed. Non-linkified URIs will still be preserved. If you want to be able to link to subdomains or use absolute URIs, specify %URI.Host for your website. This directive has been available since 1.2.0.
URI.DisableExternalResources
| Type: | Boolean |
|---|---|
| Default: | false |
| Used by: | HTMLPurifier/AttrDef/URI.php |
Disables the embedding of external resources, preventing users from embedding things like images from other hosts. This prevents access tracking (good for email viewers), bandwidth leeching, cross-site request forgery, goatse.cx posting, and other nasties, but also results in a loss of end-user functionality (users can no longer directly embed a picture they uploaded to Flickr). Use it if you don't have a robust user-content moderation team. This directive has been available since 1.3.0.
URI.DisableResources
| Type: | Boolean |
|---|---|
| Default: | false |
| Used by: | HTMLPurifier/AttrDef/URI.php |
Disables embedding resources, essentially meaning no pictures. You can still link to them though. See %URI.DisableExternalResources for why this might be a good idea. This directive has been available since 1.3.0.
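The following Python model shows how %URI.Host, %URI.DisableExternal, %URI.DisableExternalResources and %URI.DisableResources combine. It sketches the documented rules and is not HTML Purifier's code. `embedded` is a hypothetical flag meaning the URI is loaded as a resource (for example an img src) rather than followed as a link. Both function names are also hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def is_external(uri: str, our_host: Optional[str]) -> bool:
    """A URI is external if it names a host that is neither our_host nor a subdomain of it."""
    host = urlsplit(uri).hostname  # lowercased by urlsplit; None for relative URIs
    if host is None:
        return False
    if our_host is None:
        # Without %URI.Host, every absolute URI counts as external.
        return True
    our_host = our_host.lower()
    return host != our_host and not host.endswith("." + our_host)


def uri_allowed(uri: str, embedded: bool, our_host: Optional[str] = None,
                disable_external: bool = False,
                disable_external_resources: bool = False,
                disable_resources: bool = False) -> bool:
    """Model of how the URI.Disable* flags combine for a single URI."""
    if embedded and disable_resources:
        return False
    if is_external(uri, our_host):
        if disable_external:
            return False
        if embedded and disable_external_resources:
            return False
    return True
```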
URI.Munge
| Type: | String (or null) |
|---|---|
| Default: | null |
| Used by: | HTMLPurifier/AttrDef/URI.php |
Munges all browsable (usually http, https and ftp) URIs into some URL redirection service. Pass this directive a URI, with %s placed where the URL-encoded original URI should be inserted (sample: http://www.google.com/url?q=%s). This prevents PageRank leaks while being as transparent as possible to users (you may also want to add some client-side JavaScript to override the text in the status bar). Warning: many security experts believe that this form of protection does not deter spam-bots. You can also use this directive to redirect users to a splash page telling them they are leaving your website. This directive has been available since 1.3.0.
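Here is a Python sketch of the munging step. It assumes only http, https and ftp count as browsable and that relative URIs are left alone; the library's exact rule may differ. `munge_uri` is hypothetical. `quote(..., safe="")` percent-encodes the whole original URI so it survives as a single query value.

```python
from urllib.parse import quote, urlsplit

BROWSABLE_SCHEMES = {"http", "https", "ftp"}


def munge_uri(uri: str, template: str) -> str:
    """Route browsable absolute URIs through a redirect template containing %s."""
    if urlsplit(uri).scheme not in BROWSABLE_SCHEMES:
        # mailto:, relative URIs, etc. are left untouched in this sketch.
        return uri
    # safe="" so that ':', '/', '?', '&' and '=' are all percent-encoded.
    return template.replace("%s", quote(uri, safe=""))
```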
URI.HostBlacklist
| Type: | Array list |
|---|---|
| Default: | Array ( ) |
| Used by: | HTMLPurifier/AttrDef/URI.php |
List of strings that are forbidden in the host of any URI. Use it to kill domain names of spam, etc. Note that it will catch anything in the domain, so moo.com will catch moo.com.example.com. This directive has been available since 1.3.0.
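Here is a Python sketch of the substring matching described above. Only the URI's host is inspected, and `host_blacklisted` is hypothetical.

```python
from typing import Iterable
from urllib.parse import urlsplit


def host_blacklisted(uri: str, blacklist: Iterable[str]) -> bool:
    """True if any blacklisted string appears anywhere in the URI's host."""
    host = urlsplit(uri).hostname or ""  # relative URIs have no host
    # Empty entries are skipped so they cannot match every host.
    return any(entry.lower() in host for entry in blacklist if entry)
```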
URI.Disable
| Type: | Boolean |
|---|---|
| Default: | false |
| Used by: | HTMLPurifier/AttrDef/URI.php |
Disables all URIs in all forms. Not sure why you'd want to do that (after all, the Internet's founded on the notion of a hyperlink). This directive has been available since 1.3.0.
HTML
Configuration regarding allowed HTML.
HTML.Doctype
| Type: | String (or null) |
|---|---|
| Default: | null |
| Used by: | HTMLPurifier/HTMLModuleManager.php |
Doctype to use; valid values are HTML 4.01 Transitional, HTML 4.01 Strict, XHTML 1.0 Transitional, XHTML 1.0 Strict, and XHTML 1.1. Technically speaking, this is not actually a doctype (as it does not identify a corresponding DTD), but we are using this name for the sake of simplicity. This will override any older directives like %Core.XHTML or %HTML.Strict.
HTML.Strict
| Type: | Boolean |
|---|---|
| Default: | false |
| Used by: | HTMLPurifier/HTMLDefinition.php |
Determines whether to use the Transitional (loose) or Strict ruleset. This directive has been available since 1.3.0.
HTML.BlockWrapper
| Type: | String |
|---|---|
| Default: | "p" |
| Used by: | HTMLPurifier/HTMLDefinition.php |
Name of the element used to wrap inline elements that are inside a block context. This only occurs in the children of blockquote in strict mode. Example: with the default value, <blockquote>Foo</blockquote> would become <blockquote><p>Foo</p></blockquote>. The <p> tags can be replaced with whatever you desire, as long as it is a block-level element. This directive has been available since 1.3.0.
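The wrapping can be modeled in Python with a hypothetical `(kind, markup)` child list. Grouping consecutive inline children into one wrapper is an assumption about how runs are handled, not a description of HTML Purifier's code. `wrap_inline_runs` is hypothetical.

```python
from typing import List, Tuple

# Hypothetical child model: (kind, markup) where kind is "inline" or "block".
Child = Tuple[str, str]


def wrap_inline_runs(children: List[Child], wrapper: str = "p") -> str:
    """Wrap each run of consecutive inline children in the block wrapper element."""
    out: List[str] = []
    run: List[str] = []

    def flush() -> None:
        # Emit the pending inline run (if any) inside one wrapper element.
        if run:
            out.append(f"<{wrapper}>{''.join(run)}</{wrapper}>")
            run.clear()

    for kind, markup in children:
        if kind == "inline":
            run.append(markup)
        else:
            flush()
            out.append(markup)
    flush()
    return "".join(out)
```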
HTML.Parent
| Type: | String |
|---|---|
| Default: | "div" |
| Used by: | HTMLPurifier/HTMLDefinition.php |
Name of the element that the HTML fragment passed to the library will be inserted into. An interesting variation would be using span as the parent element, meaning that only inline tags would be allowed. This directive has been available since 1.3.0.
HTML.AllowedElements
| Type: | Lookup array (or null) |
|---|---|
| Default: | null |
| Used by: | HTMLPurifier/HTMLDefinition.php |
If HTML Purifier's tag set is unsatisfactory for your needs, you can overload it with your own list of tags to allow. Note that this method is subtractive: it does its job by taking away from HTML Purifier's usual feature set, so you cannot add a tag that HTML Purifier never supported in the first place (like embed, form or head). If you change this, you probably also want to change %HTML.AllowedAttributes. Warning: if another directive conflicts with the elements here, that directive will win and override. This directive has been available since 1.3.0.
HTML.AllowedAttributes
| Type: | Lookup array (or null) |
|---|---|
| Default: | null |
| Used by: | HTMLPurifier/HTMLDefinition.php |
If HTML Purifier's attribute set is unsatisfactory, overload it! The syntax is 'tag.attr', or '*.attr' for the global attributes (style, id, class, dir, lang, xml:lang). Warning: if another directive conflicts with the attributes here, that directive will win and override. For example, %Attr.EnableID will take precedence over *.id in this directive. You must set that directive to true before you can use IDs at all. This directive has been available since 1.3.0.
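Here is a Python sketch of the 'tag.attr' / '*.attr' syntax and the precedence of %Attr.EnableID. `attribute_allowed` and its `enable_id` parameter are hypothetical.

```python
from typing import Set


def attribute_allowed(tag: str, attr: str, allowed: Set[str], enable_id: bool = False) -> bool:
    """Check a tag/attribute pair against 'tag.attr' and '*.attr' entries (illustrative model)."""
    # %Attr.EnableID takes precedence: without it, id is never allowed.
    if attr == "id" and not enable_id:
        return False
    return f"{tag}.{attr}" in allowed or f"*.{attr}" in allowed
```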
CSS
Configuration regarding allowed CSS.
No configuration directives defined for this namespace.
Test
Developer testing configuration for our unit tests.
Test.ForceNoIconv
| Type: | Boolean |
|---|---|
| Default: | false |
| Used by: | HTMLPurifier/Encoder.php |
When set to true, HTMLPurifier_Encoder will act as if iconv does not exist and use only pure PHP implementations.