Free URL Parser & Decoder

Parse any URL into its components: protocol, host, port, path, query parameters, and fragment.

URL Anatomy: Seven Components, One Long History

A URL breaks down into seven conceptual parts: scheme://userinfo@host:port/path?query#fragment. The scheme tells the client which protocol to use (https, http, ftp, mailto, file, data) and is required in every absolute URL. The userinfo component (username:password@) is rare in modern use; browsers generally strip it from displayed URLs because it has been a phishing vector since the 1990s. The host is the network location: a registered domain name, an IP address (IPv4 dotted-quad or IPv6 in square brackets), or a special name like localhost. The port is the TCP or UDP port; when omitted, the scheme's default applies (80 for HTTP, 443 for HTTPS). The path is the slash-separated hierarchy that identifies the resource within the host. The query string (everything after the ? and before any #) carries data, by convention key-value pairs separated by &, used for filtering, pagination, tracking, and form submission. The fragment (everything after #) is the only part of the URL that is never sent to the server; it is processed entirely client-side by the browser to scroll to a specific section or, in single-page apps, to indicate route state.
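As a rough illustration of the anatomy described above, the sketch below splits a URL with the standard URL constructor. The URL itself (user, secret, example.com, the path and parameters) is made up for the example, and the property names are those of the WHATWG URL API, which differ slightly from the RFC terms.

```javascript
// Hypothetical URL; every name in it is invented for illustration.
const url = new URL(
  "https://user:secret@example.com:8443/docs/page?lang=en&page=2#intro"
);

const parts = {
  scheme: url.protocol, // the API keeps the trailing colon: "https:"
  userinfo: `${url.username}:${url.password}`,
  host: url.hostname, // hostname only; url.host would also include the port
  port: url.port,
  path: url.pathname,
  query: url.search, // includes the leading "?"
  fragment: url.hash, // includes the leading "#"
};
console.log(parts);

// When the port equals the scheme's default, the API reports an empty string.
const defaultPort = new URL("https://example.com:443/").port;
console.log(JSON.stringify(defaultPort)); // ""
```

Note that the API reports the scheme with its colon and the query and fragment with their leading delimiters, so code that compares against bare values needs to strip them.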

The query string format itself has a fork: the generic ?key=value&key2=value2 form with values percent-encoded per RFC 3986, versus the older application/x-www-form-urlencoded convention, where + means a space (originally for HTML form submissions). Most parsers handle both, but decoding is asymmetric: %20 always decodes to a space, whereas + decodes to a space only inside a form-encoded query string, never inside a path. Confusing the two is one of the most common URL-parsing bugs in the wild.
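The asymmetry can be seen with a few standard calls. This is a minimal sketch; the sample strings and example.com are placeholders.

```javascript
// decodeURIComponent follows plain percent-encoding: "+" is left alone.
const fromPercent = decodeURIComponent("a%20b"); // "a b"
const fromPlus = decodeURIComponent("a+b"); // "a+b"

// URLSearchParams follows the form-encoded convention: "+" and "%20"
// both decode to a space inside a query string.
const params = new URLSearchParams("q=a+b&r=c%20d");
const q = params.get("q"); // "a b"
const r = params.get("r"); // "c d"

// In a path, "+" is an ordinary character and stays a plus.
const path = new URL("https://example.com/a+b").pathname; // "/a+b"

console.log({ fromPercent, fromPlus, q, r, path });
```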

A Short History of the URL

The URL (originally "Universal Document Identifier," then "Universal Resource Locator") was invented by Tim Berners-Lee between his March 1989 "Information Management: A Proposal" memo at CERN (the one his boss Mike Sendall annotated "Vague but exciting") and the first publicly browsable web pages of August 1991. The canonical first URL was http://info.cern.ch/hypertext/WWW/TheProject.html, posted 6 August 1991. The 1992 IETF discussions renamed UDIs to URLs to dodge a vocabulary fight. RFC 1738 ("Uniform Resource Locators"), authored by Berners-Lee, Masinter and McCahill, was published in December 1994 as the first formal URL syntax. RFC 2396 followed in August 1998, generalising URLs into the broader URI concept. The current canonical spec is RFC 3986 ("URI Generic Syntax"), published January 2005 and edited by Berners-Lee, Roy Fielding and Larry Masinter. It is STD 66, an Internet Standard, the IETF's highest maturity tier, and it is what every URL parser nominally targets. In practice modern browsers diverge from RFC 3986 in numerous edge cases, which is why the WHATWG maintains a separate URL Living Standard at url.spec.whatwg.org. The WHATWG spec explicitly aims to obsolete RFC 3986 and RFC 3987 over time, and the two still differ on things like trailing whitespace handling, percent-encoding sets, and Unicode normalisation.

Unreserved, Reserved, and Percent-Encoded Characters

RFC 3986 §2.3 defines the unreserved characters, the only characters guaranteed safe in any URI component without percent-encoding: A-Z, a-z, 0-9, hyphen (-), period (.), underscore (_), and tilde (~), 66 characters in total. Everything else is either a reserved character with structural meaning in some component (gen-delims (:/?#[]@) and sub-delims (!$&'()*+,;=)) or "other", which must be percent-encoded if it appears in a URI. Percent-encoding (RFC 3986 §2.1) takes the byte sequence of a character (in UTF-8 unless the scheme says otherwise) and replaces each byte with %HH, where HH is the byte's two-digit hex value. So a UTF-8-encoded é (bytes 0xC3 0xA9) becomes %C3%A9, and the Russian word привет becomes %D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82: two bytes per character, so twelve %XX triplets and 36 characters of URL for six Cyrillic letters.
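The byte-by-byte rule can be sketched in a few lines. The percentEncode helper below is hypothetical and written only for this illustration; it is stricter than the built-in encodeURIComponent, which additionally leaves ! ' ( ) * unescaped.

```javascript
const encoder = new TextEncoder(); // always produces UTF-8 bytes

// Hypothetical helper: keep RFC 3986 unreserved characters, and replace
// every other UTF-8 byte with %HH (uppercase hex).
function percentEncode(text) {
  return Array.from(encoder.encode(text), (byte) => {
    const ch = String.fromCharCode(byte);
    if (/^[A-Za-z0-9._~-]$/.test(ch)) return ch;
    return "%" + byte.toString(16).toUpperCase().padStart(2, "0");
  }).join("");
}

const eAcute = percentEncode("é"); // "%C3%A9"
const privet = percentEncode("привет");
const triplets = privet.length / 3; // each %HH triplet is 3 characters

console.log(eAcute, privet, triplets);
```

For text without ! ' ( ) * the result matches encodeURIComponent, which is what production code would normally call.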

Browsers display percent-encoded paths in two ways: most modern browsers (Chrome, Firefox, Safari) decode and render the original Unicode glyphs in the address bar when the encoding is valid UTF-8, but copy the literal percent-encoded form when the user copies the URL. Older browsers and many web logs show only the percent-encoded form, which is why "pretty Unicode URLs" can be misleading: they look beautiful in the address bar and ugly in any text where they're shared. RFC 3987 ("Internationalized Resource Identifiers", IRIs), published January 2005, formalised Unicode URLs in their non-encoded form. Punycode (RFC 3492, March 2003) defines how internationalised domain names get encoded into ASCII for DNS, label by label: the Chinese top-level label 中国 becomes xn--fiqs8s, so example.中国 resolves at the DNS level as example.xn--fiqs8s. The canonical demonstration is Wikipedia's IRI URLs: https://ja.wikipedia.org/wiki/東京 works in any modern browser even though the underlying request encodes the path as /wiki/%E6%9D%B1%E4%BA%AC.
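Both conversions can be observed through the URL constructor, which percent-encodes non-ASCII path characters and converts internationalised host labels to their ASCII form. This is a small sketch assuming a runtime with a WHATWG-conformant URL class (a current browser or Node.js); nothing is fetched.

```javascript
// Non-ASCII path characters are percent-encoded as UTF-8 bytes.
const wiki = new URL("https://ja.wikipedia.org/wiki/東京");
const encodedPath = wiki.pathname; // "/wiki/%E6%9D%B1%E4%BA%AC"
const displayPath = decodeURIComponent(encodedPath); // "/wiki/東京"

// Internationalised host labels are converted to Punycode, label by label.
const idn = new URL("https://example.中国/");
const asciiHost = idn.hostname; // "example.xn--fiqs8s"

console.log({ encodedPath, displayPath, asciiHost });
```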

The WHATWG URL Standard: What Browsers Actually Do

The IETF's RFC 3986 says one thing; browsers do something slightly different. The WHATWG (the browser vendors' standards body) maintains a separate URL Living Standard at url.spec.whatwg.org describing the algorithmic state machine browsers actually run, including handling of leading and trailing whitespace, control characters, percent-encoding sets that vary by component, and Unicode normalisation. The WHATWG spec is what the browser URL constructor (new URL(input)) implements, and what Node.js, Deno and Bun all converged on for their built-in URL parsing. The Ada URL parser, written in C++ by Yagiz Nizipli, Daniel Lemire and others, is the WHATWG-conformant parser that has powered the URL class in Node.js since Node.js 18.16.0 (April 2023), replacing Node's earlier in-tree implementation; the legacy url.parse() API is a separate, non-WHATWG code path. Ada is reported to be measurably faster than the implementation it replaced and is widely used for high-performance URL parsing. RFC 3986 and the WHATWG spec are still not fully reconciled, and historical divergence still shows up in legacy code paths and older runtime versions.
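A few of the normalisations the WHATWG algorithm performs can be seen directly. The inputs below are contrived for illustration; a strict RFC 3986 parser would typically reject or preserve them rather than repair them.

```javascript
// Surrounding whitespace is stripped, scheme and host are lowercased,
// the default port is dropped, and dot segments are resolved.
const messy = "  HTTPS://Example.COM:443/a/./b/../c  ";
const normalised = new URL(messy).href; // "https://example.com/a/c"

// For special schemes such as https, backslashes are treated as slashes.
// (The JavaScript literal below is the text https:\\example.com\x)
const backslashes = new URL("https:\\\\example.com\\x").href;

console.log(normalised, backslashes);
```

This leniency is what makes pasted URLs "just work" in browsers, and also why two parsers that follow different specs can disagree about which host a URL points to.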

The Query String, and the URLSearchParams API

The query string is technically just "everything after the ? and before the #"; RFC 3986 doesn't define how to interpret it. The ?key=value&key=value form with & separators is convention, not requirement. In practice, two query string formats dominate: application/x-www-form-urlencoded (the default HTML form submission format, where + means a space) and the generic URI query convention (where a space is always %20). The browser's URLSearchParams API (part of the WHATWG URL Living Standard) handles both formats transparently for parsing and emits the form-encoded variant when stringifying. Repeated keys are legal: ?tag=red&tag=blue&tag=green is valid, and URLSearchParams.getAll('tag') returns ['red', 'blue', 'green']. Web frameworks handle the repeated-key case differently: Express collects repeated keys into arrays, while PHP and Rails keep only the last value unless the key uses the name[] bracket convention. This is a constant source of cross-framework bugs in API integrations.
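The repeated-key and stringifying behaviour can be sketched as follows; the tag and q parameter names are just examples.

```javascript
// Parsing: repeated keys are preserved in order.
const parsed = new URLSearchParams("tag=red&tag=blue&tag=green");
const allTags = parsed.getAll("tag"); // ["red", "blue", "green"]
const firstTag = parsed.get("tag"); // get() returns only the first: "red"

// Stringifying: output is form-encoded, so a space becomes "+".
const built = new URLSearchParams();
built.append("q", "url parser");
built.append("tag", "red");
built.append("tag", "blue");
const serialised = built.toString(); // "q=url+parser&tag=red&tag=blue"

console.log(allTags, firstTag, serialised);
```

Using getAll() whenever a key may repeat avoids silently dropping values, which is the client-side counterpart of the framework differences described above.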

Common URL-Parsing Gotchas

Common Use Cases

Privacy: URLs Carry Real Secrets

URLs are not generally treated as secret, but they often carry data that is. OAuth callback URLs include authorisation codes or, in the implicit flow, access tokens. Magic-link login URLs include single-use authentication tokens. Password-reset links include reset tokens. Internal API URLs include internal hostnames and routing paths that reveal infrastructure. Even ordinary application URLs reveal user behaviour through query parameters: search terms, filter selections, profile IDs, session identifiers. The Referer header leaks the previous URL to every linked-to site; this is mitigated by the Referrer-Policy header, introduced as a W3C Candidate Recommendation in 2017 (browser defaults still vary). URLs end up in server access logs, in browser history, in browser bookmarks, in CDN logs, in analytics pipelines, and in chat-app link previews. A server-side URL parser sees every URL pasted into it; a browser-only parser doesn't. For internal API URLs, OAuth callbacks with tokens, password-reset links, or any URL you wouldn't want copied onto a stranger's hard drive, a browser-only parser is the right architecture. Verify in DevTools' Network tab while you parse, or take the page offline (airplane mode) after it loads.

Frequently Asked Questions

What parts does a URL contain?

Seven conceptual parts: scheme (https, http, ftp, mailto), userinfo (rare in modern use, mostly stripped by browsers as a phishing mitigation), host (domain or IP), port (defaults to 80 for HTTP, 443 for HTTPS), path (slash-separated hierarchy), query (key-value pairs after ?), and fragment (after #, never sent to the server). The full grammar is in RFC 3986 §3 (January 2005, STD 66) and the WHATWG URL Living Standard.

How do I decode URL-encoded characters?

Percent-encoding replaces unsafe characters with a % followed by the byte's hex code: a space is %20, a colon is %3A, a forward slash is %2F, an ampersand is %26, the at-sign is %40. UTF-8 multi-byte characters are encoded byte-by-byte, so é becomes %C3%A9 (two bytes). The parser automatically decodes all percent-encoded characters in the displayed output. The standard JavaScript functions are encodeURIComponent() for encoding individual values and decodeURIComponent() for decoding.
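One practical detail is that decodeURIComponent() throws a URIError on malformed input, such as a stray % or an invalid UTF-8 sequence. The safeDecode wrapper below is a hypothetical helper for this illustration, not part of any library; returning the input unchanged on failure is just one possible policy.

```javascript
// Hypothetical helper: decode if possible, otherwise return the input as-is.
function safeDecode(text) {
  try {
    return decodeURIComponent(text);
  } catch (err) {
    if (err instanceof URIError) return text; // malformed percent-encoding
    throw err;
  }
}

const punctuation = safeDecode("%3A%2F%26%40%20"); // ":/&@ "
const accented = safeDecode("caf%C3%A9"); // "café"
const malformed = safeDecode("100%"); // stray "%": returned unchanged

console.log({ punctuation, accented, malformed });
```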

What is a URL fragment?

The fragment (everything after #) is the only part of the URL that's processed entirely client-side; it's never sent to the web server in HTTP requests. Its original purpose was to scroll the browser to an anchor element with that ID. Modern uses include single-page application route state (#/dashboard/profile), OAuth implicit-flow tokens (now discouraged in favour of authorisation code with PKCE), and PDF page navigation (file.pdf#page=5). Because fragments don't reach the server, they're a place to stash values that shouldn't appear in server logs.
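To make the split concrete, the sketch below separates what an HTTP client would put on the request line from what stays in the browser. The URL is a made-up example, and parsing the fragment as key-value pairs is only a convention some applications follow.

```javascript
const url = new URL("https://example.com/file.pdf?v=2#page=5");

// What goes on the HTTP request line: path plus query, no fragment.
const requestTarget = url.pathname + url.search; // "/file.pdf?v=2"

// What stays client-side: the fragment, without its leading "#".
const fragment = url.hash.slice(1); // "page=5"

// Some fragments use key=value syntax and can be read like a query string.
const page = new URLSearchParams(fragment).get("page"); // "5"

console.log({ requestTarget, fragment, page });
```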

Why does + sometimes mean a space and sometimes mean +?

Two encoding conventions exist. application/x-www-form-urlencoded (the default HTML form submission format) encodes spaces as +; standard percent-encoding (per RFC 3986) encodes spaces as %20. Both are valid in query strings; only %20 is valid in paths and fragments. URLSearchParams handles both transparently. The cross-context bug arises when code uses encodeURIComponent (which encodes space as %20) for query parameters that the server expects in form-encoded form, or vice versa.
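On the encoding side, the difference shows up as soon as a value contains both a space and a literal plus sign. This is a minimal sketch with an invented expr parameter.

```javascript
const value = "1 + 1 = 2";

// Plain percent-encoding: space -> %20, "+" -> %2B.
const percentEncoded = encodeURIComponent(value); // "1%20%2B%201%20%3D%202"

// Form encoding: space -> "+", literal "+" -> %2B.
const formEncoded = new URLSearchParams({ expr: value }).toString();
// "expr=1+%2B+1+%3D+2"

// The bug: a literal "+" placed in a query without encoding is read
// back as a space by a form-decoding parser.
const lostPlus = new URLSearchParams("expr=1+1").get("expr"); // "1 1"

console.log({ percentEncoded, formEncoded, lostPlus });
```

Either output decodes correctly with URLSearchParams; the data loss only happens when a literal + is left unencoded.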

Does it handle relative URLs?

The parser expects a full URL with a scheme. For a relative path like /api/users, prepend an origin such as https://example.com to get https://example.com/api/users, then parse that. Relative-URL resolution (resolving against a base URL the way the browser does for href attributes) is on the roadmap. In the meantime, the WHATWG URL constructor's two-argument form (new URL(relative, base)) handles this and is what production code should use.
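The two-argument form resolves a reference against a base much as a browser resolves an href. The base URLs below are placeholders chosen for the example.

```javascript
// Absolute path: replaces the base's whole path.
const fromRoot = new URL("/api/users", "https://example.com/app/page").href;
// "https://example.com/api/users"

// Dot segments: resolved relative to the base's directory.
const upOne = new URL("../x", "https://example.com/a/b/c").href;
// "https://example.com/a/x"

// Query-only reference: keeps the base path, replaces the query.
const queryOnly = new URL("?q=1", "https://example.com/a/b").href;
// "https://example.com/a/b?q=1"

// Without a base, a relative reference is rejected with a TypeError.
let rejected = false;
try {
  new URL("/api/users");
} catch (err) {
  rejected = err instanceof TypeError;
}

console.log({ fromRoot, upOne, queryOnly, rejected });
```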

Are my URLs sent anywhere?

No. Parsing runs entirely in your browser via the WHATWG URL constructor, so the URL you paste never leaves your device. Verify in DevTools' Network tab while you click Parse, or take the page offline (airplane mode) after it loads. Safe for OAuth callback URLs containing access tokens, password-reset links containing single-use tokens, internal API URLs that reveal infrastructure, or any URL you wouldn't want copied onto a stranger's hard drive.
