Is HTML Entity Encoder & Decoder free to use?

Yes, HTML Entity Encoder & Decoder is completely free with no sign-up required. All processing happens in your browser for maximum privacy.

Is my data safe when using HTML Entity Encoder & Decoder?

Absolutely. HTML Entity Encoder & Decoder runs 100% in your browser. No files are uploaded to any server, and your data never leaves your device.

Does HTML Entity Encoder & Decoder work with large files?

Yes, since it runs in your browser, it can handle reasonably large inputs. Performance depends on your device's capabilities.

Which characters must be encoded as HTML entities?

At minimum, you must encode ampersands (&), less-than signs (<), greater-than signs (>), and double quotes (") when they appear in HTML content or attribute values to prevent rendering errors and XSS vulnerabilities.

Is there a file size limit?

There is no hard server limit since all processing happens in your browser. However, very large files (over 50 MB) may be slower depending on your device memory and processor speed.

HTML Entity Encoder

Convert special characters to HTML entities and back.

No data leaves your device

Common HTML Entities

Character	Entity	Numeric	Description
&	&amp;	&#38;	Ampersand
<	&lt;	&#60;	Less than
>	&gt;	&#62;	Greater than
"	&quot;	&#34;	Double quote
'	&apos;	&#39;	Apostrophe
 	&nbsp;	&#160;	Non-breaking space
©	&copy;	&#169;	Copyright
®	&reg;	&#174;	Registered trademark
™	&trade;	&#8482;	Trademark
€	&euro;	&#8364;	Euro sign

Why Use HTML Entities?

Characters such as <, > and & have special meaning in HTML. If you include them literally, the browser may treat them as markup instead of displaying them as text.

Frequently Asked Questions

What is the difference between named and numeric entities?

Named entities use descriptive names (&amp;, &copy;), while numeric entities use the Unicode code point (&#38;, &#169;).

Should I encode all special characters?

At minimum, you should encode &, <, > and " in HTML content and attribute values. On a UTF-8 page, modern browsers handle most other characters when they are written literally.

Three Forms, One Character

An HTML «entity», formally a character reference, is an escape sequence that represents a character using a sequence of ordinary ASCII characters. The HTML Living Standard defines three concrete syntactic forms:

Named: & followed by a name from the WHATWG named-character-reference table, terminated by a semicolon. Example: &copy; for ©.
Decimal numeric: &# followed by the Unicode code point in base 10, terminated by a semicolon. Example: &#169;.
Hexadecimal numeric: &#x followed by the code point in base 16, terminated by a semicolon. Example: &#xA9;.

When rendered, the three forms are interchangeable: &copy;, &#169;, and &#xA9; all produce the visible character © because they all resolve to Unicode code point U+00A9. The choice between them is a matter of source-code readability, not browser behaviour. Hex matches the published Unicode charts (which use U+XXXX notation), so &#x2665; is closer to the official notation U+2665 than &#9829; is. Numeric references work for any Unicode character, including astral-plane emoji: &#x1F600; renders as 😀 (U+1F600 GRINNING FACE).
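
As a quick check of that equivalence, here is a minimal Python sketch that decodes all three forms with the standard library's html.unescape(). Python stands in for the browser's parser here, and the sample strings are only illustrative:

```python
import html

# Three spellings of the same character reference (U+00A9 COPYRIGHT SIGN).
forms = ["&copy;", "&#169;", "&#xA9;"]
decoded = [html.unescape(ref) for ref in forms]
print(decoded)  # ['©', '©', '©']

# The hex form uses the same digits as U+2665; decimal is that number in base 10.
heart_hex = html.unescape("&#x2665;")
heart_dec = html.unescape("&#9829;")

# Astral-plane characters are reachable through numeric references too.
grinning = html.unescape("&#x1F600;")
print(heart_hex, heart_dec, grinning)
```

All three entries of decoded are the same one-character string, which is why the choice of form only affects how the source reads.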

Why Entities Exist

Three distinct historical and practical reasons:

To escape characters that have syntactic meaning in HTML. The parser uses certain ASCII characters as control symbols: < opens a tag, > closes it, & introduces a reference, and quote characters delimit attribute values. If you want any of them to appear as literal text, you have to escape them.
To represent characters that are not available in the document encoding. Before UTF-8 became universal on the web (it crossed the 50% mark around 2010), most HTML was served as US-ASCII, ISO-8859-1, or Windows-1252. In those single-byte encodings, characters such as ©, €, ≈, or α often had no byte representation at all (US-ASCII has none of the four; ISO-8859-1 has only ©). Writing a reference such as &copy;, &euro;, or &asymp; was the only way to reach those code points.
To signal author intent for invisible or ambiguous characters. Even on a UTF-8 page, a literal non-breaking space (U+00A0) is visually identical to a normal space; writing &nbsp; makes the intent obvious to anyone reading the source.

The W3C now recommends using literal Unicode characters instead of entities wherever possible, «for accessibility and readability.» Entities remain useful for the five required escapes and for genuinely invisible or ambiguous characters.

The Big Five

There are five characters you absolutely must escape when inserting untrusted content into HTML: <, >, &, ", and '. OWASP's Cross-Site Scripting Prevention Cheat Sheet enumerates them as the minimum required escape set:

Char	Named	Decimal	Hex
<	`&lt;`	`&#60;`	`&#x3C;`
>	`&gt;`	`&#62;`	`&#x3E;`
&	`&amp;`	`&#38;`	`&#x26;`
"	`&quot;`	`&#34;`	`&#x22;`
'	`&apos;` / `&#39;`	`&#39;`	`&#x27;`

Rule of thumb: whenever you place untrusted text in HTML output, escape these five characters first. Failing to do so is the root cause of the overwhelming majority of stored and reflected XSS vulnerabilities.
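
A minimal sketch of that rule in Python, using the standard library's html.escape(). The untrusted string is a made-up example, and a real application would normally rely on its template engine's auto-escaping rather than calling this by hand:

```python
import html

# Hypothetical untrusted input containing four of the five characters.
untrusted = '<a href="x">Tom & Jerry</a>'

# quote=True (the default) also escapes " and ', so all five are covered.
safe = html.escape(untrusted, quote=True)
print(safe)  # &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;

# Decoding restores the original text exactly, so nothing is lost.
round_trip = html.unescape(safe)
```

The ampersand is replaced first inside html.escape(), which is what keeps the & of the freshly written &lt; from being escaped a second time.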

Apostrophe Trap

&apos; is not part of HTML 4; it was originally defined only by XML 1.0 and inherited by XHTML 1.0. Internet Explorer before version 9 (released 2011) refused to render it as ' and displayed the literal text &apos; instead. The entity was specifically added in HTML5 and is now safe in every modern browser, but for maximum cross-browser, cross-spec compatibility OWASP and most enterprise sanitisation libraries recommend emitting a numeric reference such as &#39; (or its hex equivalent &#x27;) instead of &apos; when escaping single quotes, particularly in security-critical code.
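
Python's standard library is one example of a library that takes the numeric route. The sketch below only illustrates that behaviour; the input string is arbitrary:

```python
import html

# html.escape() writes the apostrophe as a numeric reference, not &apos;.
escaped = html.escape("it's")
print(escaped)  # it&#x27;s

# An HTML5-aware decoder accepts all three spellings of the apostrophe.
apostrophes = [html.unescape(ref) for ref in ("&apos;", "&#39;", "&#x27;")]
print(apostrophes)  # ["'", "'", "'"]
```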

When to Encode and When Not To

The encoding decision depends on where the text will land in the output, not on what it contains. This is the most misunderstood point in HTML security. OWASP's guidance distinguishes the contexts:

HTML element content: escape < > & " (and ' to be safe).
HTML attribute values: the same set plus the quote character; always use quoted attributes.
JavaScript context: use JavaScript escapes, not HTML escapes: \xHH or \uHHHH.
CSS context: use CSS escapes: \HH followed by a space.
URL / URI parameter context: use percent-encoding (%HH), not HTML encoding.

Each context has its own escape rules, and mixing them up is itself a vulnerability. For example, percent-encoding < to %3C is the wrong tool in an HTML element context, where %3C is simply the literal text %3C; an escape scheme only means something to the parser it was designed for.
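
The contrast between contexts can be sketched in Python as follows. html.escape() and urllib.parse.quote() are standard-library calls; js_string_escape() is a hypothetical helper written only for this illustration (a real project would use a vetted encoder), and the CSS context is left out for brevity:

```python
import html
from urllib.parse import quote


def js_string_escape(text: str) -> str:
    """Hypothetical helper: keep ASCII letters and digits, write everything
    else as \\uHHHH escapes (one escape per UTF-16 code unit)."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalnum():
            out.append(ch)
            continue
        units = ch.encode("utf-16-be")  # astral characters yield two code units
        for i in range(0, len(units), 2):
            out.append(f"\\u{int.from_bytes(units[i:i + 2], 'big'):04X}")
    return "".join(out)


payload = "<b>"

as_html = html.escape(payload)     # HTML element or attribute context
as_url = quote(payload, safe="")   # URL parameter context
as_js = js_string_escape(payload)  # inside a JavaScript string literal

print(as_html)  # &lt;b&gt;
print(as_url)   # %3Cb%3E
print(as_js)    # \u003Cb\u003E

# Percent-encoded text means nothing special to an HTML parser:
# HTML-escaping it changes nothing, and a browser would show it verbatim.
html_view_of_url_form = html.escape(as_url)
```

The same three-character payload ends up with three different spellings, and none of them is interchangeable with the others.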

Avoid double-encoding. A common bug is to escape data when it is read into the system, then again when it is stored, again when it is read out, and again when it is rendered. The result: the user typed 5 < 10, the database stores 5 &lt; 10, output escaping turns that into 5 &amp;lt; 10, and the page displays the literal text 5 &lt; 10 instead of the original. The discipline is: store raw Unicode and encode once, at the moment of output, for the specific context.
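
The bug is easy to reproduce. In this small Python sketch, html.unescape() plays the role of the browser, which decodes exactly one level when it renders:

```python
import html

raw = "5 < 10"

once = html.escape(raw)    # correct: escaped a single time, at output
twice = html.escape(once)  # bug: already-escaped text is escaped again

print(once)   # 5 &lt; 10
print(twice)  # 5 &amp;lt; 10

# What the reader would see in each case after the browser decodes one level.
shown_once = html.unescape(once)
shown_twice = html.unescape(twice)
print(shown_once)   # 5 < 10
print(shown_twice)  # 5 &lt; 10
```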

HTML Encoding vs URL Encoding

Two different escape systems for two different contexts, constantly conflated:

	HTML entity	URL / percent
Standard	HTML Living Standard	RFC 3986
Format	`&name;` or `&#NN;`	`%HH` (hex byte)
Context	HTML markup, element bodies and attributes	URLs, query strings, form-encoded request bodies
Space	`&nbsp;` (non-breaking only; a plain space is never encoded)	`%20` or `+`
JS function	- (the parser handles it)	`encodeURIComponent()` / `encodeURI()`

A URL inside an HTML attribute value gets both escapes, layered: first percent-encoding for URL-illegal characters, then HTML encoding for any & < > " in the resulting URL. This is why query-string ampersands inside an href attribute become &amp; in the HTML serialisation.
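
The two layers can be sketched in Python like this. The example.com URL and the parameter names are placeholders, not anything this tool uses:

```python
import html
from urllib.parse import urlencode

# Hypothetical query parameters; the value contains a space and an ampersand.
params = {"q": "fish & chips", "lang": "en"}

# Layer 1: percent-encode the query string (urlencode writes spaces as +).
url = "https://example.com/search?" + urlencode(params)
print(url)  # https://example.com/search?q=fish+%26+chips&lang=en

# Layer 2: HTML-encode the finished URL before placing it in a quoted attribute.
link = f'<a href="{html.escape(url)}">search</a>'
print(link)
```

The ampersand that belongs to the data becomes %26 in layer 1, while the ampersand that separates the parameters survives layer 1 and becomes &amp; in layer 2. The browser undoes the layers in the opposite order.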

A Brief History

HTML 2.0 (RFC 1866, 1995) inherited SGML's entity mechanism along with roughly 50 named entities for ISO Latin 1. HTML 3.2 (W3C, January 1997) added mathematical and symbol entities. HTML 4.01 (W3C, December 1999) finalised three entity sets (Latin-1, Special, and Symbol) with a total of 252 named entities, which is the source of the «252» figure still seen in older tutorials. HTML5 / WHATWG (Living Standard, ongoing) absorbed the table and expanded it dramatically to more than 2,000 entries, primarily to cover MathML and a broader Unicode set. XML 1.0 (1998) defines only its own minimal set, the Big Five (&lt; &gt; &amp; &quot; &apos;); that minimal set is the origin of &apos;.

More Questions

In modern code, what should I actually use?

Production code generally does not hand-roll entity encoding; it calls a library. DOMPurify for client-side HTML sanitisation. html.escape() in Python's standard library. htmlspecialchars() in PHP. html/template in Go (auto-escaping is on by default). The OWASP Java Encoder for Java. In React, writing <div>{userInput}</div> escapes automatically; the escape hatch dangerouslySetInnerHTML is named to discourage casual use. A standalone encoder like this one is useful as a sanity-check and debugging tool, not as a replacement for those libraries.

What about tags inside email templates?

Older email clients (Outlook in particular) can interpret an unencoded & as a malformed attribute and strip the surrounding markup. HTML email developers learn to entity-encode every special character defensively. The same applies to forum BBCode-style systems that rewrite content before storing it; round-trips can introduce unexpected literal entities.

What is textContent vs innerHTML in JavaScript?

The most important XSS-prevention rule in vanilla JavaScript: use element.textContent = userInput instead of element.innerHTML = userInput. Setting textContent writes the string as literal text, and the browser handles all escaping internally. Setting innerHTML parses the string as HTML, so any markup in it becomes live DOM: event-handler attributes (for example onerror on an injected <img>) will run, even though <script> elements inserted this way are not themselves executed. If markup is genuinely required, sanitise it first with a library such as DOMPurify.

Can the encoder handle emoji?

Yes, via numeric references. There are no named entities for emoji; all of them use the numeric form. &#x1F600; renders as 😀, and &#x2764;&#xFE0F; renders as the red heart ❤️ (the heart code point plus the emoji presentation selector). The browser handles UTF-16 surrogate-pair conversion internally; surrogate halves should not be written by hand.
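
As a rough illustration of how such references can be produced, here is a Python sketch. to_numeric_refs() is a hypothetical helper for this example, not this tool's actual implementation:

```python
import html


def to_numeric_refs(text: str) -> str:
    """Hypothetical helper: replace each non-ASCII character with a
    hexadecimal character reference, one reference per code point."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)


grinning = "\U0001F600"     # GRINNING FACE, a single astral code point
red_heart = "\u2764\uFE0F"  # heart plus emoji presentation selector

grinning_ref = to_numeric_refs(grinning)
red_heart_ref = to_numeric_refs(red_heart)
print(grinning_ref)   # &#x1F600;
print(red_heart_ref)  # &#x2764;&#xFE0F;

# Decoding gives back the original strings.
decoded_heart = html.unescape(red_heart_ref)
```

Because the helper works on whole code points, the output never contains surrogate halves; the grinning face is one reference, and the red heart is two because it really is two code points.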

Is anything sent to a server?

No. Encoding and decoding are pure-function string transforms that run entirely in your browser via JavaScript. Nothing about your input is uploaded, and once the page has loaded it works offline. This matters because cloud-based encoders that round-trip your test payload can themselves become an XSS vector if the testing site is compromised.