HTML Entity Encoder

Convert special characters to HTML entities and back.

No data leaves your device.

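Common HTML entities

| Character | Entity | Numeric | Description |
| --- | --- | --- | --- |
| & | &amp; | &#38; | Ampersand |
| < | &lt; | &#60; | Less than |
| > | &gt; | &#62; | Greater than |
| " | &quot; | &#34; | Double quote |
| ' | &apos; | &#39; | Apostrophe |
| U+00A0 | &nbsp; | &#160; | Non-breaking space |
| © | &copy; | &#169; | Copyright |
| ® | &reg; | &#174; | Registered mark |
| ™ | &trade; | &#8482; | Trademark |
| € | &euro; | &#8364; | Euro sign |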

Why use HTML entities?

Characters such as <, > and & have special meaning in HTML. If you include them literally, the browser may treat them as markup rather than as text.

Frequently asked questions

What is the difference between named and numeric entities?

Named entities use descriptive names (&amp;, &copy;), while numeric entities use the character's Unicode code point.

Should I encode all special characters?

At a minimum, encode &, <, > and " in HTML content and attribute values. Modern browsers handle most other characters as literal text.

Three forms, one character

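An HTML "entity" (formally, a character reference) is an escape sequence that represents one character using a sequence of regular ASCII characters. The HTML Living Standard defines three concrete syntactic forms:

  1. Named: an ampersand, a name, and a semicolon, as in &copy;.
  2. Decimal numeric: &# followed by the code point in decimal and a semicolon, as in &#169;.
  3. Hexadecimal numeric: &#x followed by the code point in hex and a semicolon, as in &#xA9;.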

All three forms are interchangeable when rendered: &copy;, &#169;, and &#xA9; all produce the visible character © because they all resolve to Unicode code point U+00A9. The choice between them is a matter of source-code readability, not browser behaviour. Hex tends to match published Unicode charts (which use U+XXXX notation), so &#x2665; is closer to the official notation U+2665 than &#9829;. Numeric references work for any Unicode character, including astral-plane emoji: &#x1F600; renders as 😀 (U+1F600 GRINNING FACE).
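As a quick check of that equivalence, the sketch below decodes each form with html.unescape from Python's standard library. Python is used purely for illustration, and the variable names are assumptions rather than part of this tool.

```python
import html

# The three reference forms for the copyright sign.
forms = ["&copy;", "&#169;", "&#xA9;"]

# Decode each form back to the character it stands for.
decoded = [html.unescape(form) for form in forms]

# Every form should resolve to the same code point, U+00A9.
all_same = all(ch == "\u00a9" for ch in decoded)

# Numeric references also reach code points above U+FFFF.
grinning = html.unescape("&#x1F600;")
```

If the forms are truly interchangeable, all_same is True and grinning is the single code point U+1F600.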

Why entities exist

Three distinct historical and practical reasons:

  1. To escape characters that have syntactic meaning in HTML. The parser uses certain ASCII characters as control symbols. < opens a tag, > closes one, & introduces a reference, and the quote characters delimit attribute values. If you want any of those to appear as literal text, you must escape them.
  2. To represent characters not available in the document encoding. Before UTF-8 became universal on the web (it only crossed the 50% mark around 2010), most HTML was served as US-ASCII, ISO-8859-1, or Windows-1252. Those single-byte encodings cover only a small repertoire: ≈ and α have no byte in any of them, € exists only in Windows-1252, and © is missing from US-ASCII. Writing &copy;, &euro;, or &#8776; was the only portable way to reach those code points.
  3. To signal author intent for invisible or ambiguous characters. Even on a UTF-8 page, a literal non-breaking space (U+00A0) is visually identical to a normal space; writing &nbsp; makes the intent obvious to anyone reading the source.
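The second reason can be reproduced today. The sketch below is an illustration only, with an invented sample string. It encodes text to US-ASCII using Python's built-in xmlcharrefreplace error handler, which substitutes a decimal character reference for every character the encoding cannot express.

```python
import html

text = "© 2024, price ≈ 5 €, angle α"

# Encode to US-ASCII. Characters with no ASCII byte are replaced by
# decimal numeric character references instead of raising an error.
ascii_html = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

# Decoding the references recovers the original text.
round_trip = html.unescape(ascii_html)
```

Note that this handler only rescues unencodable characters; it does not escape <, > or &, so it is not a substitute for HTML escaping.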

The W3C now recommends using literal Unicode characters where possible rather than entities, "for accessibility and readability." Entities remain useful for the five required escapes, plus genuinely invisible or ambiguous characters.

The Big Five

The five characters you absolutely must escape when inserting untrusted content into HTML are <, >, &, ", and '. OWASP's Cross-Site Scripting Prevention Cheat Sheet enumerates them as the minimum required escape set:

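| Char | Named | Decimal | Hex |
| --- | --- | --- | --- |
| < | &lt; | &#60; | &#x3C; |
| > | &gt; | &#62; | &#x3E; |
| & | &amp; | &#38; | &#x26; |
| " | &quot; | &#34; | &#x22; |
| ' | &apos; (or &#39;) | &#39; | &#x27; |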

The rule of thumb: whenever you place untrusted text into HTML output, escape these five characters first. Failure to do so is the root cause of the overwhelming majority of stored and reflected XSS vulnerabilities.
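A minimal sketch of that rule, assuming a hand-written helper (escape_big_five is a hypothetical name, not a library function). It uses str.translate so each character is replaced in a single pass, which avoids re-escaping the ampersands the function itself emits. Production code would normally call a maintained library instead.

```python
# Translation table for the five characters. The apostrophe maps to &#39;
# rather than &apos; for compatibility with older parsers.
_BIG_FIVE = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
})


def escape_big_five(text: str) -> str:
    """Escape &, <, >, " and ' in one pass (hypothetical helper)."""
    return text.translate(_BIG_FIVE)


payload = '<img src=x onerror="alert(\'xss\')">'
escaped = escape_big_five(payload)
```

After escaping, the payload contains no character the HTML parser treats as markup, so it is displayed as text rather than interpreted.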

The apostrophe trap

&apos; is not part of HTML 4; it was originally defined only by XML 1.0 and inherited into XHTML 1.0. Internet Explorer prior to version 9 (released 2011) did not recognise it in HTML and displayed the literal text &apos;. HTML5 added the entity, so it is now safe in every modern browser, but for maximum cross-browser, cross-spec compatibility OWASP and most enterprise sanitisation libraries still recommend emitting &#39; instead of &apos; when escaping single quotes, particularly in security-critical code.

When to encode (and when not to)

The encoding decision depends on where the text is going to land in the output, not what it contains. This is the single most-misunderstood point in HTML security. OWASP's guidance distinguishes several output contexts, among them HTML element content, HTML attribute values, JavaScript, CSS, and URLs.

Each context has its own escape rules. Mixing them is itself a vulnerability. For example, percent-encoding a < to %3C does not protect against XSS in HTML element context, where %3C is just the literal text %3C.
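To make the difference between contexts concrete, the sketch below encodes the same string for two of them using only Python's standard library. The sample payload is an assumption chosen for illustration.

```python
import html
from urllib.parse import quote

untrusted = '<script>alert("x")</script>'

# HTML element or attribute context: entity-encode.
for_html = html.escape(untrusted, quote=True)

# URL component context: percent-encode every reserved character.
for_url = quote(untrusted, safe="")

# An HTML parser does not decode percent escapes, so the URL-encoded
# form passes through entity decoding unchanged.
url_form_in_html = html.unescape(for_url)
```

The two outputs share nothing beyond the letters, which is why an encoder built for one context cannot stand in for the other.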

Avoid double-encoding. A common bug is to escape data when it's read into the system, again when it's stored, again when it's read out, and again when it's rendered. The result: the user typed 5 < 10, the database stores 5 &amp;lt; 10, the page renders 5 &lt; 10 instead of the original. The discipline is: store raw Unicode, encode once, at the moment of output, for the specific context.
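The double-encoding bug is easy to reproduce. This sketch uses html.escape and html.unescape from Python's standard library, with html.unescape standing in for the single decoding pass a browser performs when it renders the page.

```python
import html

raw = "5 < 10"

once = html.escape(raw)    # "5 &lt; 10"
twice = html.escape(once)  # "5 &amp;lt; 10"

# What the reader sees after the browser decodes each value once.
shown_once = html.unescape(once)    # "5 < 10"
shown_twice = html.unescape(twice)  # "5 &lt; 10"
```

Only the value that was encoded exactly once displays as the user typed it.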

HTML encoding vs URL encoding

Two different escape systems for two different contexts, conflated all the time:

| | HTML entity | URL / percent |
| --- | --- | --- |
| Standard | HTML Living Standard | RFC 3986 |
| Format | &name;, &#NN; or &#xHH; | %HH (hex byte) |
| Context | HTML markup, element bodies and attributes | URLs, query strings, form-encoded request bodies |
| Space | &nbsp; (non-breaking only; a plain space needs no escape) | %20 or + |
| JS function | none (parser handles) | encodeURIComponent() / encodeURI() |

A URL inside an HTML attribute value gets both escapes layered: percent-encoding for URL-illegal characters first, then HTML encoding for any & < > " in the resulting URL. This is why query-string ampersands inside an href attribute become &amp; in the HTML serialisation.
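The layering can be sketched in two steps with Python's standard library. The host example.com and the parameter names are placeholders chosen for illustration.

```python
import html
from urllib.parse import urlencode

params = {"q": "cats & dogs", "lang": "en"}

# Step 1: percent-encode the query values for the URL context.
url = "https://example.com/search?" + urlencode(params)

# Step 2: entity-encode the finished URL for the HTML attribute context.
anchor = f'<a href="{html.escape(url, quote=True)}">Search</a>'
```

The ampersand inside the search term becomes %26 in step 1, while the ampersand separating the two parameters becomes &amp; in step 2. The browser reverses the layers in the opposite order.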

A short history

HTML 2.0 (RFC 1866, 1995) inherited SGML's entity mechanism with about 50 named entities for ISO Latin 1. HTML 3.2 (W3C, January 1997) added the mathematical and symbol entities. HTML 4.01 (W3C, December 1999) finalised three entity sets (Latin-1, Special, and Symbol) totalling 252 named entities, which is the source of the "252" figure still seen in older tutorials. HTML5 / WHATWG (Living Standard, ongoing) absorbed and dramatically expanded the table to over 2,000 entries, primarily to cover MathML and a broader Unicode set. XML 1.0 (1998) defines its own minimal set of just the Big Five (&lt; &gt; &amp; &quot; &apos;); that minimal set is the origin of &apos;.

More questions

In modern code, what should I actually use?

Production code generally doesn't hand-roll entity encoding; it calls a library: DOMPurify for client-side HTML sanitisation, html.escape() in Python's standard library, htmlspecialchars() in PHP, html/template in Go (auto-escape on by default), and the OWASP Java Encoder for Java. In React, writing <div>{userInput}</div> escapes automatically; the escape hatch dangerouslySetInnerHTML is named to discourage casual use. A standalone encoder like this one is useful as a sanity-check and debugging tool, not a replacement for those libraries.
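As one example of the library route, here is how Python's html.escape behaves. The input string is invented for illustration; the function and its quote parameter are part of the standard library.

```python
import html

user_input = "Tom & Jerry's <b>\"best\"</b>"

# quote=True is the default: both quote characters are escaped as well,
# with the apostrophe emitted as the numeric reference &#x27;.
safe = html.escape(user_input)

# quote=False escapes only &, < and >, which is enough for element
# content but not for attribute values.
body_only = html.escape(user_input, quote=False)

# Decoding restores the original string.
restored = html.unescape(safe)
```

Note that the library emits a numeric reference for the apostrophe rather than &apos;.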

What about tags inside email templates?

Older email clients (Outlook in particular) interpret unencoded & as a malformed attribute and may strip surrounding markup. HTML email developers learn to entity-encode every special character defensively. The same applies to forum BBCode-style systems that rewrite content before storing it; round-trips can introduce unexpected literal entities.

What's textContent vs innerHTML in JavaScript?

The single most important XSS-prevention rule in vanilla JavaScript: use element.textContent = userInput rather than element.innerHTML = userInput. Setting textContent writes the string as literal text, so no escaping is needed. Setting innerHTML parses the string as HTML. A <script> element inserted this way does not run, but event-handler attributes such as onerror on an <img> do, so attacker-controlled markup can still execute code. If markup is genuinely required, use a library like DOMPurify to sanitise first.

Can the encoder handle emoji?

Yes, via numeric references. There are no named entities for emoji; they all use numeric form. &#x1F600; renders as 😀, and &#x2764;&#xFE0F; as the red heart ❤️ (heart code point plus emoji presentation selector). The browser handles the implicit UTF-16 surrogate-pair conversion internally; you should not write the surrogate halves manually.
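A sketch of how such references can be produced, assuming a hypothetical helper named to_hex_references. Python strings iterate by code point, so an astral-plane emoji yields one reference rather than two surrogate halves. The helper does not escape &, < or >; it only rewrites non-ASCII characters.

```python
import html


def to_hex_references(text: str) -> str:
    """Replace each non-ASCII code point with a hex character reference."""
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):X};"
        for ch in text
    )


grinning = "\U0001F600"      # one code point above U+FFFF
red_heart = "\u2764\uFE0F"   # heart plus emoji presentation selector

encoded_face = to_hex_references(grinning)    # "&#x1F600;"
encoded_heart = to_hex_references(red_heart)  # "&#x2764;&#xFE0F;"

# Decoding the references gives back the original characters.
decoded_face = html.unescape(encoded_face)
```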

Does anything get sent to a server?

No. Encoding and decoding are pure-function string transforms running entirely in your browser via JavaScript. Nothing about your input is uploaded; the page works offline once it's loaded. This matters because cloud-based encoders that round-trip your test payload can themselves become an XSS vector if the testing site is compromised.
