Free character map
Browse Unicode characters by category, search by name or code point, and copy them to the clipboard.
How to use
- Click a category tab to display the characters in that group.
- Click a character to see its details and copy options.
- Use the search bar to find characters by name (e.g. "heart") or hex code (e.g. "2665").
- Click Copy character to copy the selected character to the clipboard.
Frequently asked questions
What is a Unicode code point?
A Unicode code point is a unique number assigned to each character in the Unicode standard. It is written in hexadecimal, usually prefixed with U+ (e.g. U+2665 for ♥).
What is an HTML entity?
An HTML entity is a special code that represents a character in HTML. For example, &hearts; represents ♥. Entities are useful when you cannot type a character directly.
What is the CSS code?
The CSS code uses backslash (\) notation to insert a character from its Unicode code point in style sheets. For example, .heart::before { content: "\2665"; } inserts ♥.
A short history of Unicode
Before Unicode, every region had its own incompatible character encoding: ASCII for English, the ISO 8859 family for European languages (8859-1 Latin-1, 8859-5 Cyrillic, 8859-6 Arabic), Windows code pages 1252 / 1251 / 1253–1258, and multibyte sets for East Asian languages (Shift-JIS for Japanese, Big5 for Traditional Chinese, GB2312 for Simplified Chinese, EUC-KR for Korean). Mismatched encodings produced garbled text known by the Japanese term mojibake (文字化け, "character transformation"): opening a Japanese page in the wrong encoding gave you rows of question marks or random Latin-1 letters.
The work began in 1987 at Xerox. Joe Becker, with Lee Collins and Mark Davis at Apple, started investigating a single universal character set that could replace the patchwork. Becker's August 1988 draft proposal, "Unicode 88," explained: "the name 'Unicode' is intended to suggest a unique, unified, universal encoding." The Unicode Consortium was incorporated in January 1991 and shipped Unicode 1.0 in October that year with about 7,100 characters across 24 scripts.
As of Unicode 17.0 (released 9 September 2025) the standard contains 159,801 characters across 172 scripts, out of 1,112,064 valid code points. Unicode has therefore assigned roughly 14% of its possible space and has decades of headroom. Major milestones: Unicode 6.0 (2010) was the first version to formally encode emoji (722 of them, taken from the Japanese carriers); Unicode 17.0 added four new scripts (Sidetic, Tolong Siki, Beria Erfe, Tai Yo) and pushed the total CJK ideograph count over 100,000.
Code points, planes, and encodings
A code point is just a number, written in hexadecimal with a U+ prefix, like U+2665 for ♥. Code points are grouped into 17 planes of 65,536 code points each. Almost everything you've ever read lives on Plane 0, the Basic Multilingual Plane (BMP, U+0000 to U+FFFF). Plane 1 (the Supplementary Multilingual Plane) holds historical scripts (Linear B, Egyptian hieroglyphs, Cuneiform), musical notation, and almost all emoji, including the skin-tone emoji modifiers. Planes 2 and 3 are CJK ideograph extensions. Planes 4–13 are unassigned, reserved for the future. Plane 14 carries tag characters and supplementary variation selectors. Planes 15 and 16 are private-use areas where fonts and apps assign their own meanings.
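The plane arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the names plane_of and valid_code_points are made up for illustration, and the only assumption beyond the text is that the 2,048 surrogate code points (U+D800–U+DFFF) are what separates the full code space from the count of valid code points.

```python
PLANE_SIZE = 0x10000                 # 65,536 code points per plane
PLANE_COUNT = 17
SURROGATES = 0xDFFF - 0xD800 + 1     # 2,048 code points reserved for UTF-16


def plane_of(char: str) -> int:
    """Return the plane number (0-16) that a single character lives on."""
    return ord(char) // PLANE_SIZE


# Valid code points = the whole code space minus the surrogate range.
valid_code_points = PLANE_COUNT * PLANE_SIZE - SURROGATES

print(plane_of("♥"))                 # U+2665  -> 0 (BMP)
print(plane_of("😀"))                # U+1F600 -> 1 (SMP)
print(f"{valid_code_points:,}")      # 1,112,064
```

Integer division by 0x10000 is the same as reading the hex digits above the last four, which is why U+1F600 is on Plane 1 and U+10FFFF is on Plane 16.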
A code point is just a number; an encoding is how that number gets stored as bytes. Unicode defines three:
- UTF-8: variable width, 1 to 4 bytes per character. Designed by Ken Thompson and Rob Pike at Bell Labs in 1992 (sketched on a New Jersey diner placemat). The first 128 code points (ASCII) take exactly 1 byte with the same binary value as ASCII, so a pure-ASCII file is already a valid UTF-8 file. As of January 2026, UTF-8 is used by roughly 98.9% of websites; it is the WHATWG-recommended encoding and the default for new text protocols.
- UTF-16: variable width, 2 or 4 bytes. BMP characters take 2 bytes; characters in supplementary planes take 4 bytes via surrogate pairs (a high surrogate U+D800–U+DBFF plus a low surrogate U+DC00–U+DFFF). Used internally by Windows APIs, Java, JavaScript (a string's .length counts UTF-16 code units, which is why an emoji often "counts as 2"), and Qt. Less than 0.004% of public web pages use it as transport.
- UTF-32: fixed width, 4 bytes per code point. Simple to index but space-inefficient. Used internally by some Unix runtimes for direct code-point indexing; rare on disk or wire.
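To make the byte counts concrete, here is a small Python sketch that encodes one character in each of the three forms and derives a UTF-16 surrogate pair by hand. The helper names encoded_sizes and surrogate_pair are illustrative, not part of any library; the "-le" codec variants are used only so that Python does not prepend a byte order mark to the output.

```python
def encoded_sizes(char: str) -> dict[str, int]:
    """Byte length of one character in each Unicode encoding form."""
    return {
        "utf-8": len(char.encode("utf-8")),
        "utf-16": len(char.encode("utf-16-le")),   # -le: no BOM prefix
        "utf-32": len(char.encode("utf-32-le")),
    }


def surrogate_pair(char: str) -> tuple[int, int]:
    """Split a supplementary-plane character into its UTF-16 surrogates."""
    offset = ord(char) - 0x10000
    if offset < 0:
        raise ValueError("BMP characters need no surrogate pair")
    high = 0xD800 + (offset >> 10)     # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits of the offset
    return high, low


for ch in ("A", "é", "♥", "😀"):
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

high, low = surrogate_pair("😀")
print(f"{high:04X} {low:04X}")         # D83D DE00
```

The two surrogates for 😀 are the two code units that a UTF-16 based string length counts, which is where the "emoji counts as 2" effect comes from.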
The 25 invisible whitespace characters
Unicode formally tags exactly 25 characters with the White_Space=yes property: regular space (U+0020), tab, line feed, carriage return, no-break space (U+00A0, the famous one that looks identical to a regular space but won't break across lines), the typographic widths in U+2000–U+200A, the line / paragraph separators (U+2028 / U+2029), the narrow no-break space common in French typography (U+202F), the medium mathematical space (U+205F), and the full-width ideographic space (U+3000) used in CJK text.
Several characters look invisible but are not classified as whitespace and behave differently from a regular space:
- U+200B Zero-Width Space: allows a line break with no visible gap; not whitespace by Unicode classification.
- U+200D Zero-Width Joiner: the glue inside multi-character emoji like family or profession sequences.
- U+200C Zero-Width Non-Joiner: controls ligature joining.
- U+00AD Soft Hyphen: invisible until the renderer breaks the line.
- U+FEFF Byte Order Mark: at the start of a file it signals the encoding and, for UTF-16 and UTF-32, the byte order; in the middle of text it is an invisible zero-width no-break space. Excel's UTF-8 CSV exports prepend one, which often shows up in downstream tools as an unexpected leading character on the first column header.
These invisible characters are routinely the cause of "why won't this string match?" debugging sessions. Paste any character into a character map's search and it will tell you the actual code point, so you can confirm whether you're looking at a smart quote masquerading as a straight one, or an NBSP where a regular space should be.
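The same check can be scripted. The sketch below is one possible approach, not how this tool works internally: it lists the 25 White_Space code points explicitly (so the result does not depend on the Unicode tables shipped with the interpreter) and rewrites every invisible character except the plain space as a visible tag. The function name reveal and the tag format are invented for the example.

```python
# The 25 White_Space=yes code points, listed explicitly.
WHITE_SPACE = (
    set(range(0x0009, 0x000E))                  # tab, LF, VT, FF, CR
    | {0x0020, 0x0085, 0x00A0, 0x1680}
    | set(range(0x2000, 0x200B))                # U+2000-U+200A widths
    | {0x2028, 0x2029, 0x202F, 0x205F, 0x3000}
)

# Invisible characters that are NOT classified as whitespace.
INVISIBLE = {0x200B, 0x200C, 0x200D, 0x00AD, 0xFEFF}


def reveal(text: str) -> str:
    """Replace invisible characters (except U+0020) with <U+XXXX> tags."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp in INVISIBLE or (cp in WHITE_SPACE and cp != 0x20):
            out.append(f"<U+{cp:04X}>")
        else:
            out.append(ch)
    return "".join(out)


print(len(WHITE_SPACE))                 # 25
print(reveal("100\u00a0km"))            # 100<U+00A0>km
print(reveal("\ufeffname,age"))         # <U+FEFF>name,age
```

Running a suspicious string through a function like this before comparing it usually shows at once why two values that look equal are not.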
Useful character ranges
| Block | Range | Examples |
|---|---|---|
| Latin-1 Supplement | U+0080–U+00FF | à ñ ü © ® ¥ § ° ¶ |
| Greek | U+0370–U+03FF | α β γ π Σ Ω |
| Cyrillic | U+0400–U+04FF | Russian / Ukrainian / Bulgarian etc. |
| General Punctuation | U+2000–U+206F | — – … “ ” ‘ ’ • † NNBSP ZWSP |
| Currency Symbols | U+20A0–U+20CF | € ₩ ₪ ₫ ₱ ₹ ₽ ₿ |
| Letterlike Symbols | U+2100–U+214F | ™ ℠ № ℃ ℉ ℗ |
| Arrows | U+2190–U+21FF | ← → ↑ ↓ ↔ ⇒ ⇐ |
| Mathematical Operators | U+2200–U+22FF | ∑ ∫ ∞ √ ≠ ≤ ≥ ≈ ∂ ∇ ∈ ∪ ∩ |
| Box Drawing | U+2500–U+257F | ─ │ ┌ ┐ └ ┘ ├ ┤ ┬ ┴ ┼ ═ ║ ╔ ╗ |
| Math Alphanumerics | U+1D400–U+1D7FF | "Fancy text" generators (𝓗𝓮𝓵𝓵𝓸) draw from here. |
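The table can double as a lookup. The following Python sketch copies the ranges from the table into a list and returns the block a character falls in; BLOCKS and block_of are hypothetical names, and the list covers only the rows shown here, not the full set of Unicode blocks.

```python
# (start, end, name) triples copied from the table above.
BLOCKS = [
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x2000, 0x206F, "General Punctuation"),
    (0x20A0, 0x20CF, "Currency Symbols"),
    (0x2100, 0x214F, "Letterlike Symbols"),
    (0x2190, 0x21FF, "Arrows"),
    (0x2200, 0x22FF, "Mathematical Operators"),
    (0x2500, 0x257F, "Box Drawing"),
    (0x1D400, 0x1D7FF, "Math Alphanumerics"),
]


def block_of(char: str):
    """Return the table's block name for a character, or None."""
    cp = ord(char)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return None


for ch in ("€", "π", "∑", "┼", "A"):
    print(f"U+{ord(ch):04X}", block_of(ch))
```

A plain "A" returns None because Basic Latin is not one of the rows listed; a complete implementation would load the full block list from the Unicode Character Database instead.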
Special characters in everyday writing
For the "I just need to type one symbol" use case, here is a quick reference to the characters this tool exists to deliver in two clicks:
- Em dash — U+2014 (&mdash;): sentence-level break.
- En dash – U+2013 (&ndash;): ranges (1950–1975) and pairings (Boston–Hartford).
- Ellipsis … U+2026 (&hellip;): three dots as a single character.
- Smart quotes: opening “ U+201C, closing ” U+201D, opening ‘ U+2018, closing ’ U+2019.
- Non-breaking space U+00A0 (&nbsp;): keeps "100 km" together.
- Copyright © U+00A9, Registered ® U+00AE, Trademark ™ U+2122.
- Section § U+00A7, Pilcrow ¶ U+00B6, Degree ° U+00B0.
- Multiplication × U+00D7, Division ÷ U+00F7: neither is the letter x or a slash.
When you'd reach for a character map
- Typing accented letters without the right keyboard layout: résumé, jalapeño, fiancée, naïve.
- Math and science: pasting ∑, ∫, ≠, π, ±, ∞, μ, Ω into a doc without launching the equation editor.
- Currency: the symbol you need is rarely on your keyboard. Euro €, yen ¥, peso ₱, rupee ₹.
- Punctuation in legal and academic writing: em dashes, smart quotes, the section sign §, the dagger †.
- Fancy display text for social-media bios and branding: Mathematical Alphanumeric Symbols (U+1D400–U+1D7FF) let you stylise text without using an image.
- CLI and TUI design: Box Drawing characters for ASCII-art borders, ncurses programs, and README diagrams.
- Debugging encoding bugs: paste a character to see its actual code point and confirm whether you've got a smart quote masquerading as a straight one.
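As an illustration of the "fancy text" item in the list above, the sketch below maps ASCII letters and digits onto the bold style of the Mathematical Alphanumeric Symbols block by adding a fixed offset. The function name to_math_bold is made up for the example, and only the bold style is handled because its letters are contiguous; some other styles (italic, script) have gaps where a letter was already encoded in Letterlike Symbols, so a simple offset would not be enough for them.

```python
BOLD_UPPER = 0x1D400   # MATHEMATICAL BOLD CAPITAL A
BOLD_LOWER = 0x1D41A   # MATHEMATICAL BOLD SMALL A
BOLD_DIGIT = 0x1D7CE   # MATHEMATICAL BOLD DIGIT ZERO


def to_math_bold(text: str) -> str:
    """Map ASCII letters and digits to their mathematical bold forms."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_UPPER + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_LOWER + ord(ch) - ord("a")))
        elif "0" <= ch <= "9":
            out.append(chr(BOLD_DIGIT + ord(ch) - ord("0")))
        else:
            out.append(ch)          # leave everything else untouched
    return "".join(out)


styled = to_math_bold("Hello 42")
print(styled)
print([f"U+{ord(c):04X}" for c in styled])
```

Every styled letter is a supplementary-plane character, so it costs 4 bytes in UTF-8 and counts as two code units in UTF-16 based string lengths.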
Security: the homograph problem
Many Unicode characters look identical across scripts. The Cyrillic lowercase "а" (U+0430) is visually indistinguishable from the Latin "a" (U+0061). Attackers register internationalised domain names that look like legitimate ones (for instance an "apple.com" with a Cyrillic а in place of the Latin a) and use them for phishing. A 2017 attack used the lookalike domain adoḅe.com, with the dotted-below ḅ (U+1E05) in place of b, to deliver malware. Modern browsers mitigate this with restrictive script-mixing rules, falling back to the ASCII Punycode form (xn--…) when a domain mixes scripts; Safari is particularly conservative. The same lookalike property that makes Unicode rich for human writing makes it dangerous in domains, and a character map is one way to confirm the actual code point of every character at a glance.
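A rough version of the script-mixing check can be sketched in Python. This is a heuristic for illustration only and not what browsers actually implement: it guesses each letter's script from the first word of its Unicode character name, whereas a real implementation would use the Unicode Script property. The helper names scripts_in and looks_mixed are hypothetical; the Punycode form comes from Python's built-in idna codec.

```python
import unicodedata


def scripts_in(label: str) -> set[str]:
    """Crude script guess: first word of each letter's Unicode name."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            name = unicodedata.name(ch, "UNKNOWN")
            scripts.add(name.split()[0])      # e.g. LATIN, CYRILLIC
    return scripts


def looks_mixed(label: str) -> bool:
    """True when the letters of a label come from more than one script."""
    return len(scripts_in(label)) > 1


spoof = "\u0430pple"                          # Cyrillic а + Latin "pple"
print(sorted(scripts_in("apple")))            # ['LATIN']
print(sorted(scripts_in(spoof)))              # ['CYRILLIC', 'LATIN']
print(looks_mixed(spoof))                     # True
print(spoof.encode("idna"))                   # ASCII form, starts with xn--
```

Note the limit of this kind of check: a label such as adoḅe contains only Latin-script letters, so script mixing alone does not flag it, and dedicated confusable-character data is needed to catch it.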
More questions
What's the difference between a character and a glyph?
A character is the abstract unit Unicode encodes: the letter A, for example, regardless of typeface. A glyph is the specific drawing of that character in a particular font: A in Helvetica, A in Garamond, A in Comic Sans are all the same character but three different glyphs. Unicode encodes characters; fonts ship glyphs.
Why did Unicode 1.0 have about 7,000 characters when 17.0 has about 160,000?
Unicode 1.0 covered 24 scripts, which was most of the world's living writing systems then in regular computing use. The growth since has come from three places: hugely expanded CJK ideograph coverage (pulling in historical Chinese characters and rare regional variants; Extension J added 4,298 in version 17.0 alone), formal encoding of historical scripts (Linear B, Cuneiform, Egyptian hieroglyphs, Phoenician), and standardisation of emoji from 2010 onward.
What's an HTML entity?
A way to encode a character inside HTML using a special escape syntax. There are named entities for common characters (&copy; for ©, &mdash; for —) and numeric entities for any code point (&#9829; or &#x2665; for ♥). They're useful when typing the character directly is awkward, say in source code with mixed encodings, or in a system that strips non-ASCII.
What about CSS escapes?
CSS uses backslash plus the hex code point: .heart::before { content: "\2665"; } inserts ♥. Useful inside ::before / ::after generated content, in CSS counter styles, and in any place where the source file's encoding can't be relied on.
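All of these notations are derived from the same number. The Python sketch below builds the U+ form, both numeric HTML entities, and the CSS escape from a single character; the function name notations and the dictionary keys are invented for the example, and html.unescape from the standard library is used only to confirm that the entities round-trip.

```python
import html


def notations(char: str) -> dict[str, str]:
    """Return the common written forms of one character's code point."""
    cp = ord(char)
    return {
        "code_point": f"U+{cp:04X}",
        "html_decimal": f"&#{cp};",
        "html_hex": f"&#x{cp:X};",
        "css": f"\\{cp:X}",           # a single backslash plus hex digits
    }


info = notations("♥")
print(info["code_point"])             # U+2665
print(info["html_decimal"])           # &#9829;
print(info["html_hex"])               # &#x2665;
print(info["css"])                    # \2665

# Numeric and named entities decode back to the same character.
print(html.unescape(info["html_hex"]) == "♥")    # True
print(html.unescape("&hearts;") == "♥")          # True
```

One caveat for the CSS form: when the escape is followed directly by another hex digit or a space, the stylesheet needs a terminating space after the escape or a zero-padded six-digit form, which this sketch does not add.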
Does anything get sent to a server?
No. The character data is bundled with the page; the search and category filtering run locally in JavaScript; Copy uses the browser's Clipboard API. Nothing leaves your device, and the page works offline once it's loaded.