What This Tool Does
This tool converts special characters (like <, >, &, ", ') into their HTML entity equivalents (&lt;, &gt;, &amp;, &quot;, &#39;) so they display as literal text in web pages instead of being interpreted as HTML. It also decodes entities back to plain characters. Useful for displaying code, escaping user-generated content, and, as one layer of defense, preventing XSS attacks.
Inputs Explained
- Mode: Encode (text → entities) or Decode (entities → text).
- Scope: Minimal (only the 5 HTML-significant characters) or Full (those plus all non-ASCII characters).
- Style: Named entities (&amp;) or numeric codes (&#38;) when encoding.
- Input: Paste text or HTML-escaped content to process.
How It Works
Encoding replaces characters using either named entities (for the 5 core HTML characters plus common symbols) or numeric character references (&#N; in decimal or &#xN; in hexadecimal). Decoding uses the browser's native parser to resolve all standard HTML5 entity names and numeric codes back to their Unicode characters.
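A minimal Python sketch of that encode/decode logic, assuming the tool's Scope and Style options map to hypothetical `scope` and `style` parameters. Names come from the standard library's HTML 4 table (`html.entities.codepoint2name`), so the apostrophe has no name and falls back to &#39;. `html.unescape`, which applies the HTML5 reference rules, stands in for the browser parser. This is an illustration, not the tool's actual source.

```python
import html
from html.entities import codepoint2name

# The five HTML-significant characters that Minimal scope encodes.
UNSAFE = set('<>&"\'')


def encode(text: str, scope: str = "minimal", style: str = "named") -> str:
    """Replace in-scope characters with entity references (hypothetical API).

    scope: "minimal" (only the five unsafe characters) or
           "full" (those plus every non-ASCII character).
    style: "named" (&name; when an HTML 4 name exists) or "numeric" (&#N;).
    """
    out = []
    for ch in text:  # iterating a str yields whole code points, not UTF-16 units
        cp = ord(ch)
        in_scope = ch in UNSAFE or (scope == "full" and cp > 0x7F)
        if not in_scope:
            out.append(ch)
        elif style == "named" and cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#{cp};")  # decimal numeric character reference
    return "".join(out)


def decode(text: str) -> str:
    """Resolve named, decimal, and hex references using HTML5 rules."""
    return html.unescape(text)
```

For example, encode("<b>café</b>", scope="full") yields &lt;b&gt;caf&eacute;&lt;/b&gt;, and adding style="numeric" yields &#60;b&#62;caf&#233;&#60;/b&#62;.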
Formula / Logic Used
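Encode: each in-scope character c becomes &name; (Named style, when a name exists for c) or &#n;, where n is the Unicode code point of c (Numeric style, or when no name exists).
Decode: each &name;, &#n;, or &#xh; reference becomes the character it identifies (n in decimal, h in hexadecimal).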
Step-by-Step Example
Input: <h1>Hello & World</h1>
Encoded (named): &lt;h1&gt;Hello &amp; World&lt;/h1&gt;
Encoded (numeric): &#60;h1&#62;Hello &#38; World&#60;/h1&#62;
Decoding the encoded output returns the original input.
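The same example can be reproduced with Python's standard library, as a sketch: `html.escape` stands in for the named style and a one-line comprehension for the numeric style. Note that `html.escape` writes the apostrophe as &#x27;, which differs from this tool's &#39; but decodes to the same character.

```python
import html

source = "<h1>Hello & World</h1>"

# Named style: html.escape covers & < > (and quotes, which this input lacks).
named = html.escape(source)

# Numeric style: decimal references for the five HTML-significant characters.
numeric = "".join(f"&#{ord(c)};" if c in '<>&"\'' else c for c in source)

print(named)    # &lt;h1&gt;Hello &amp; World&lt;/h1&gt;
print(numeric)  # &#60;h1&#62;Hello &#38; World&#60;/h1&#62;

# Decoding either form restores the original input.
assert html.unescape(named) == source
assert html.unescape(numeric) == source
```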
Use Cases
- Display code in HTML: Show HTML or code snippets on web pages as literal text instead of rendered markup.
- XSS prevention: Escape user-submitted content before rendering to prevent cross-site scripting attacks.
- Email templates: Encode special characters in HTML email content to prevent rendering issues.
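- Forum and CMS: Escape user-submitted posts and comments when rendering them to avoid HTML injection; storing the raw text and encoding at output time keeps it reusable in non-HTML contexts.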
- Debugging: Decode entities in scraped or exported HTML to read the original content.
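For the XSS-prevention and forum cases, a minimal Python sketch of escaping untrusted text at render time. The `COMMENT` template and `render_comment` helper are hypothetical, and the user values are placed only in element text content, the context this kind of encoding covers.

```python
import html
from string import Template

# Hypothetical comment markup; $author and $body receive untrusted text
# and sit only in element text content (not attributes, scripts, or URLs).
COMMENT = Template("<div class=\"comment\"><b>$author</b><p>$body</p></div>")


def render_comment(author: str, body: str) -> str:
    # Escape when building the page; the inputs themselves stay unchanged.
    return COMMENT.substitute(author=html.escape(author), body=html.escape(body))


print(render_comment("eve", "<script>alert(1)</script>"))
# <div class="comment"><b>eve</b><p>&lt;script&gt;alert(1)&lt;/script&gt;</p></div>
```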
Assumptions and Limitations
- Encoding only protects against HTML injection in HTML contexts. Use separate escaping for attribute values, JavaScript, CSS, or URL contexts.
- Decoding uses the browser's HTML parser — it handles all standard HTML5 entities but may not recognize deprecated or non-standard ones.
- Numeric references to control characters (for example &#1; or &#27;) are parse errors in HTML5; browsers still decode most of them, but the result may not display as expected.
- For complete protection against XSS, encoding alone isn't enough — combine with Content Security Policy and input validation.
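A short Python sketch of why entity encoding does not cover URL contexts: a `javascript:` URL contains none of the five characters, so encoding passes it through unchanged. The `safe_href` helper and its scheme allowlist are hypothetical illustrations of one common mitigation (allowlisting URL schemes), not a complete defense.

```python
import html
from urllib.parse import urlsplit

payload = "javascript:alert(document.cookie)"

# Entity encoding leaves this string untouched: it contains none of the five
# HTML-significant characters, yet it runs script when used as an href.
assert html.escape(payload) == payload

# Hypothetical allowlist; relative URLs (empty scheme) are rejected here too.
ALLOWED_SCHEMES = {"http", "https"}


def safe_href(url: str) -> str:
    """Return a value for href="...", or "#" if the scheme is not allowlisted."""
    scheme = urlsplit(url.strip()).scheme.lower()
    if scheme not in ALLOWED_SCHEMES:
        return "#"
    # Only after the scheme check, escape the URL for the quoted attribute.
    return html.escape(url, quote=True)
```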
Frequently Asked Questions
What's the difference between named and numeric entities?
Named entities (&lt;, &amp;) are easier to read. Numeric entities (&#60;, &#38;) exist for every Unicode character, which is useful when the encoding is unclear or when including non-ASCII characters in ASCII-only contexts. Both produce the same rendered output.
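A quick Python check, using `html.unescape` as a stand-in for the browser's decoder, that named, decimal, and hexadecimal references resolve to the same character:

```python
import html

# Three spellings of the same character: named, decimal, hexadecimal.
forms = {
    "<": ["&lt;", "&#60;", "&#x3C;"],
    "&": ["&amp;", "&#38;", "&#x26;"],
    "©": ["&copy;", "&#169;", "&#xA9;"],
}

for char, refs in forms.items():
    for ref in refs:
        # html.unescape applies HTML5 reference rules, as a browser parser would.
        assert html.unescape(ref) == char, ref
```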
Do I need to encode all special characters?
In HTML text content, encoding these 5 characters is enough: <, >, &, ", and '. When the character encoding of the output is uncertain, also encoding all non-ASCII characters (Full scope) is more robust, because the result is pure ASCII.
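When output must be pure ASCII, a compact way to approximate the Full scope in Python is to escape the five characters first and then let the `xmlcharrefreplace` error handler turn every remaining non-ASCII code point into a decimal reference. The `encode_full` name is hypothetical, and `html.escape` writes the apostrophe as &#x27; rather than this tool's &#39;.

```python
import html


def encode_full(text: str) -> str:
    # Step 1: escape the five HTML-significant characters.
    escaped = html.escape(text, quote=True)
    # Step 2: turn every non-ASCII code point into a decimal reference (&#N;).
    return escaped.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(encode_full("Café <b>«naïve»</b>"))
# Caf&#233; &lt;b&gt;&#171;na&#239;ve&#187;&lt;/b&gt;
```

The order matters: running `xmlcharrefreplace` first would create & characters that the escape step would then double-encode.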
Does this prevent XSS attacks?
HTML encoding prevents XSS in HTML text contexts. For attribute values, JavaScript strings, or CSS contexts, different encoding is required. Always use context-aware escaping in web applications.
What about Unicode characters like emoji?
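The Full scope encodes all non-ASCII characters, including emoji. A code point above U+FFFF should become a single numeric reference (😀, U+1F600, is &#128512;); a pair of UTF-16 surrogate references such as &#55357;&#56832; is a parse error in HTML5 and decodes to two U+FFFD replacement characters. Minimal scope leaves Unicode as-is, which works in UTF-8 HTML.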
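A small Python check of the surrogate issue, as a sketch: the emoji is written once as a single numeric reference and once as two references built from its UTF-16 code units (the shape that JavaScript loops over `charCodeAt` produce). Only the single reference survives decoding; `html.unescape` follows the HTML5 rule that maps surrogate references to U+FFFD.

```python
import html

emoji = "😀"  # U+1F600, outside the Basic Multilingual Plane

# Correct: one reference to the full code point.
single = f"&#{ord(emoji)};"  # "&#128512;"

# Incorrect: one reference per UTF-16 code unit.
units = emoji.encode("utf-16-be")
pair = "".join(
    f"&#{int.from_bytes(units[i:i + 2], 'big')};" for i in range(0, len(units), 2)
)  # "&#55357;&#56832;"

assert html.unescape(single) == emoji
# HTML5 maps references to surrogate code points to U+FFFD.
assert html.unescape(pair) == "\ufffd\ufffd"
```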
Why does &#39; appear instead of &apos;?
HTML 4 did not define &apos; (it came from XML). For maximum compatibility across HTML versions, &#39; is preferred for the apostrophe. HTML5 supports both, but this tool defaults to the safer form.
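Python's standard library tables reflect this history: `html.entities.name2codepoint` holds the HTML 4 names and `html.entities.html5` the HTML5 ones. A quick check:

```python
import html
from html.entities import html5, name2codepoint

# &apos; is in the HTML5 named-reference table but not in HTML 4's.
assert "apos;" in html5
assert "apos" not in name2codepoint

# Both spellings decode to the same apostrophe under HTML5 rules.
assert html.unescape("&apos;") == html.unescape("&#39;") == "'"
```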
Can I decode entities from scraped HTML?
Yes. Paste the HTML content into the tool with Mode set to Decode. All standard HTML5 entities (including named ones like &copy; and numeric ones like &#169;) will resolve to their actual characters.
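The same kind of decoding in Python, as a sketch with a made-up scraped string: `html.unescape` resolves named, decimal, and hexadecimal references in one pass. Note that &nbsp; decodes to U+00A0 (no-break space), not an ordinary space, which can trip up later string comparisons.

```python
import html

scraped = "Caf&eacute; &amp; Bar&nbsp;&ndash; &#8220;Open&#8221; &#x2713;"
text = html.unescape(scraped)

# &nbsp; becomes U+00A0, &ndash; U+2013, and the curly quotes U+201C/U+201D.
assert text == "Café & Bar\u00a0\u2013 \u201cOpen\u201d \u2713"
assert " " not in text.split("Bar")[1][:1]  # the gap after "Bar" is U+00A0
```

`html.unescape` only resolves references; any tags in the scraped content are left in place.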
Is my data stored anywhere?
No. Both encoding and decoding run entirely in your browser using native JavaScript and the browser's HTML parser. Text never leaves your device.
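What's the equivalent of PHP's htmlspecialchars?
With ENT_QUOTES set (part of the default flags since PHP 8.1), htmlspecialchars encodes the same 5 HTML-significant characters (<, >, &, ", ') that the Minimal scope in this tool encodes. The spelling can differ slightly: under the default ENT_HTML401 flag, PHP writes the apostrophe as &#039; rather than &#39;, but both render identically. Older PHP versions default to ENT_COMPAT, which leaves single quotes unencoded.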
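As a rough cross-check, this Python sketch mimics htmlspecialchars with ENT_QUOTES | ENT_HTML401 using a single-pass translation table. The function name is borrowed only for illustration; PHP-specific behavior such as ENT_SUBSTITUTE handling of invalid UTF-8 is not modeled.

```python
# Single-pass mapping, so the "&" inside "&lt;" is never re-escaped.
HTMLSPECIALCHARS = str.maketrans({
    "&": "&amp;",
    '"': "&quot;",
    "'": "&#039;",  # ENT_HTML401 spelling of the apostrophe
    "<": "&lt;",
    ">": "&gt;",
})


def htmlspecialchars(s: str) -> str:
    """Hypothetical Python stand-in; like PHP's default, existing entities are re-encoded."""
    return s.translate(HTMLSPECIALCHARS)
```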
Sources and References
- WHATWG HTML Spec — Named Character References — Complete list of standard HTML5 named entities.
- MDN — HTML Entities — Reference on HTML entity syntax and usage.
- OWASP — XSS Prevention Cheat Sheet — Context-aware escaping for web security.
- W3C — Character Entity References — Historical HTML4 entity reference (still used widely).