Does HTML to Markdown Converter work with large files?

Yes, since it runs in your browser, it can handle reasonably large inputs. Performance depends on your device's capabilities.

Can I use this tool on mobile devices?

Yes, this tool works on any device with a modern browser including phones and tablets.

Is there a character or line limit?

There is no hard limit. The tool can handle files with tens of thousands of lines, though very large inputs may slow down depending on your browser and device.

Free HTML to Markdown Converter

Convert HTML code to clean Markdown syntax.

Supported HTML Elements

Headings: <h1> through <h6> → # through ######

Emphasis: <strong>, <em>, <del> → **bold**, *italic*, ~~strikethrough~~

Links: <a href> → [text](url)

Images: <img> → ![alt](src)

Code: <code>, <pre> → inline and fenced code blocks

Lists: <ul>, <ol> → - items, 1. items

Tables: <table> → Markdown table syntax

Other: <blockquote>, <hr>, <br> → > quoted lines, ---, line breaks

What HTML-to-Markdown Conversion Actually Does

An HTML-to-Markdown converter parses an HTML fragment, walks the resulting DOM tree, and emits Markdown syntax for each element it recognises. <h1> becomes #; <strong> becomes **bold**; <a href="..."> becomes [text](url); <ul><li> becomes a bulleted list. The tool runs entirely in your browser via JavaScript: paste HTML on the left, click Convert to Markdown, and the formatted output appears on the right. There is no upload, no server round-trip and no telemetry. To verify, watch DevTools' Network tab while you click Convert, or take the page offline (airplane mode) after it loads; the converter still works. This implementation uses the browser's built-in DOMParser to read the HTML, then a small recursive walker emits the Markdown for each node. It's a hand-written converter of about 150 lines rather than a wrapper around Turndown, which means it covers the common case cleanly but does not match Turndown's full configurability.
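
The parse-then-walk approach described above can be sketched in a few dozen lines. This is a minimal illustration in Python, not the tool's JavaScript source: it assumes well-nested input, uses the standard library's html.parser in place of the browser's DOMParser, and the names Node, TreeBuilder, walk and html_to_markdown are hypothetical.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img"}  # elements that never have a closing tag


class Node:
    """Minimal stand-in for a DOM node: a tag name, attributes, and children."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []  # Node instances or plain str text


class TreeBuilder(HTMLParser):
    """Builds a Node tree from an HTML fragment (assumes well-nested input)."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1 and self.stack[-1].tag == tag:
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def walk(node):
    """Recursively emit Markdown for a node and its children."""
    if isinstance(node, str):
        return node
    inner = "".join(walk(child) for child in node.children)
    if node.tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
        return "#" * int(node.tag[1]) + " " + inner.strip() + "\n\n"
    if node.tag == "p":
        return inner.strip() + "\n\n"
    if node.tag in ("strong", "b"):
        return "**" + inner + "**"
    if node.tag in ("em", "i"):
        return "*" + inner + "*"
    if node.tag == "a":
        return "[" + inner + "](" + (node.attrs.get("href") or "") + ")"
    return inner  # unrecognised tag: keep the children, drop the tag


def html_to_markdown(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return walk(builder.root).strip()


print(html_to_markdown("<h1>Title</h1><p>Some <strong>bold</strong> text.</p>"))
```

The final fallthrough in walk is what makes an unknown element such as <span> disappear while its text survives. Lists, code, tables and blockquotes are left out of the sketch to keep it short.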

When You Actually Need This Conversion

The reverse direction (Markdown to HTML) is the famous one: every static-site generator and writing tool does it. The forward direction (HTML to Markdown) is less obvious but increasingly common because the writing-tool ecosystem has polarised. HTML is the ambient format of the web (every CMS, newsletter platform, CRM template and old static site emits or stores HTML); Markdown is the native format of nearly every modern documentation, note-taking, and source-controlled-content workflow that has appeared since around 2014. Four real-world workflows generate this conversion need.

Article scraping for note-taking apps. Saving a web article into Obsidian, Logseq, Notion, or Roam. Obsidian and Logseq store notes as Markdown files on disk; Notion and Roam accept Markdown on import. Obsidian's official Web Clipper (released 2024) and the popular MarkDownload browser extension follow this pattern: extract the article DOM with Mozilla's Readability library, convert to Markdown, save to disk. When you can't reach the underlying HTML through an extension (paywalled content, embedded readers, content arriving by email), copy the rendered content, paste the resulting HTML into a converter, and save the Markdown.
CMS migration to a Markdown-first SSG. Moving documentation from WordPress, Confluence, Drupal, MediaWiki, or Movable Type to MkDocs (Tom Christie, Python), Docusaurus (Meta, React), Hugo (Steve Francia, Go), Jekyll (Tom Preston-Werner, Ruby, still the engine behind GitHub Pages), Eleventy (Zach Leatherman, Node), VitePress (Evan You, Vue), or Sphinx with MyST. All of these take Markdown as source, so existing CMS content has to be exported as HTML and converted en masse.
WYSIWYG editor output to portable Markdown. A writer uses Google Docs, Word, Apple Notes, Evernote or a CMS rich-text editor, and the clipboard payload is HTML. They want clean Markdown to commit to Git, paste into a Markdown editor, send to a developer, or include in a docs site. Microsoft Word's "Save as Web Page" produces notoriously dense HTML (full of XML namespaces, mso-prefixed CSS, Office-specific tags). A converter that ignores anything it doesn't understand and emits clean Markdown is exactly the right cleanup tool.
Email content for archival. Newsletter platforms (Substack, Kit/ConvertKit, Beehiiv, Mailchimp) all send formatted HTML. A reader who stores reference material in Markdown can view-source on the email body, paste the HTML into a converter and file the result into their notes.

The Reference Implementation: Turndown and Its Family

Turndown (Dom Christie) is the dominant JavaScript HTML-to-Markdown library. It started as to-markdown in 2012, was renamed Turndown in 2017 to disambiguate from forks, and is published as turndown on npm under the MIT licence. Its design is rule-based: each rule has a filter (which DOM nodes the rule fires on) and a replacement (a function that produces the Markdown). The constructor accepts options for heading style (atx # vs setext underlines), bullet marker (-, + or *), code block style (indented vs fenced), emphasis delimiter (* vs _), strong delimiter (** vs __), link style (inlined vs referenced) and so on. Tables, strikethrough, task list items and highlighted code blocks live in the separate turndown-plugin-gfm package.

markdownify (Matthew Tretter) is the equivalent in Python, widely used in scraping pipelines, Jupyter notebook conversion, LangChain document loaders, and LLM dataset preparation. html2text (originally by Aaron Swartz, who also collaborated with John Gruber on the original Markdown design in 2004) is the older Python option, still in use in legacy email pipelines but largely superseded. html-to-markdown (Johannes Kaufmann) is a Go port of Turndown popular for self-contained scraping binaries. Pandoc (John MacFarlane, who chairs the CommonMark project) is the universal document converter: it handles tables with merged cells via grid tables, math, citations, footnotes and definition lists, and converts between dozens of formats. Pandoc is the most featureful HTML-to-Markdown tool available, but it's a 60+ MB Haskell binary that has to be installed; it does not run in a browser.
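
To make the filter-plus-replacement idea concrete, here is a small Python sketch of a rule-based renderer. It illustrates the design pattern only and is not Turndown's API: Rule, make_rules, render and the two delimiter parameters are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """One conversion rule: which tags it fires on and how it renders them."""
    filter: Callable[[str], bool]                      # tag name -> does the rule apply?
    replacement: Callable[[str, Dict[str, str]], str]  # (inner Markdown, attrs) -> Markdown


def make_rules(strong_delimiter: str = "**", em_delimiter: str = "*") -> List[Rule]:
    """Build the rule list; the keyword arguments play the role of constructor options."""
    return [
        Rule(lambda tag: tag in ("strong", "b"),
             lambda inner, attrs: strong_delimiter + inner + strong_delimiter),
        Rule(lambda tag: tag in ("em", "i"),
             lambda inner, attrs: em_delimiter + inner + em_delimiter),
        Rule(lambda tag: tag == "a",
             lambda inner, attrs: "[" + inner + "](" + attrs.get("href", "") + ")"),
    ]


def render(tag: str, inner: str, attrs: Dict[str, str], rules: List[Rule]) -> str:
    """Apply the first rule whose filter matches; otherwise keep the inner text."""
    for rule in rules:
        if rule.filter(tag):
            return rule.replacement(inner, attrs)
    return inner


rules = make_rules(strong_delimiter="__")
print(render("b", "bold", {}, rules))
print(render("a", "docs", {"href": "https://example.com"}, rules))
```

Because every mapping is data rather than a branch in one large function, a caller can append or replace rules without touching the renderer, which is the main appeal of the rule-based design.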

The Fundamental Trade-off: HTML Is Richer Than Markdown

Every HTML-to-Markdown conversion is necessarily lossy because the source format expresses things the destination cannot. Inline styles (<span style="color:red">) have no Markdown grammar: Markdown's emphasis vocabulary is strictly bold, italic, strikethrough and code, with no syntax for arbitrary colour, font or size. CSS classes (<div class="alert">) carry meaning to a stylesheet but none to Markdown. Custom data attributes (data-track-event="...") are part of the page's JavaScript contract, not the document. Tables with merged cells (colspan, rowspan) cannot be expressed in GFM pipe tables. Embedded media (<video>, <audio>, <iframe>) and form controls have no Markdown equivalent. The same goes for <details><summary> collapsibles, <figure><figcaption>, <ruby> annotations for CJK pronunciation, microdata and microformats: none survive the conversion. For each unsupported construct, the converter author chooses one of three strategies: translate to a Markdown approximation that loses some information, pass through as raw HTML embedded inside the Markdown (Markdown allows this by spec; CommonMark sections 4.6 and 6.6 cover it), or drop entirely. This implementation chooses "translate where there's a clear mapping, otherwise transparent-wrap (render the children, drop the tag)", a predictable, easy-to-reason-about default that handles the common case at the cost of advanced configurability.

The Canonical Mappings

Headings: <h1>–<h6> map to # through ###### (atx style). The older setext form (=== and --- underlines) is also valid for h1 and h2 but rarely used in 2026.
Paragraphs: <p> becomes plain text with surrounding blank lines. A paragraph break in Markdown is one or more blank lines.
Emphasis: <strong> and <b> become **bold**. <em> and <i> become *italic*. <del> and <s> become ~~strikethrough~~ (a GFM extension; not in CommonMark proper).
Links: <a href="url">text</a> becomes [text](url). The reference-link form ([text][1] with [1]: url at the bottom) is also valid Markdown.
Images: <img src="url" alt="text"> becomes ![text](url). The exclamation mark distinguishes images from links.
Code: <code> becomes inline backtick spans. <pre><code> becomes a triple-backtick fenced block. This implementation correctly handles the CommonMark requirement that inline code spans use a longer fence when the content contains backticks (matching the spec rule from CommonMark §6.1).
Lists: <ul> becomes - bulleted lines; <ol> becomes 1., 2., ... numbered lines. CommonMark accepts any starting number; renderers normalise.
Blockquotes: <blockquote> prefixes each line of the children with >.
Horizontal rules: <hr> becomes --- on its own line. *** and ___ are also valid.
Line breaks: <br> becomes a newline. Note that CommonMark treats a bare newline inside a paragraph as a soft break; a hard break needs two trailing spaces or a backslash at the end of the line.
Tables: <table> becomes a GFM pipe table: header row, delimiter row of ---, body rows. This is a GFM extension, not part of CommonMark core.
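
The inline-code rule in the list above is the least obvious of these mappings, so here is a hedged Python sketch of one way to pick the fence. It is not the tool's source: inline_code is a hypothetical name, and choosing "longest run plus one" is just a simple way to guarantee the fence length differs from every backtick run in the content.

```python
import re


def inline_code(text: str) -> str:
    """Wrap text in a code span whose fence is longer than any backtick run inside it."""
    runs = re.findall(r"`+", text)
    longest = max((len(run) for run in runs), default=0)
    fence = "`" * (longest + 1)
    # Pad with one space when the content starts or ends with a backtick, so
    # those backticks are not read as part of the fence. CommonMark strips one
    # leading and one trailing space from a code span when both are present.
    pad = " " if text.startswith("`") or text.endswith("`") else ""
    return fence + pad + text + pad + fence


print(inline_code("plain"))
print(inline_code("a`b"))
print(inline_code("`x`"))
```

Content that already begins and ends with a space is not handled specially here; a stricter version would add padding in that case too so the original spaces survive rendering.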

Honest Scope: What This Tool Does and Doesn't Do

Three honest limitations to know about. (1) Inline styles and CSS classes are dropped. A <span style="color:red"> becomes unstyled text; a <p class="lede"> loses its class. There's no Markdown grammar for arbitrary inline styling. (2) Tables with merged cells flatten. GFM pipe tables have no syntax for colspan or rowspan; merged-cell information is silently dropped. For complex tables, keep the source as HTML inside the Markdown (CommonMark allows embedded HTML) or use Pandoc for grid-table output. (3) Code blocks emit without language hints. If your HTML contains <pre><code class="language-js">, the language class is currently dropped and the output is a fenced block with no language tag. You can manually add the language identifier after the opening backticks if your destination renderer supports syntax highlighting. The bigger caveat: if you paste the full HTML of a web page (from "View Page Source"), <script> and <style> contents will be emitted as plain text, which is almost certainly not what you want. The fix is to paste only the article content, to copy from the rendered view (which strips scripts and styles automatically), or to sanitise the HTML through DOMPurify or similar before conversion.
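
For limitation (3), the sketch below shows how a converter could carry a language hint across, assuming the common language-xxx class convention. This is not what the tool currently does (it drops the class); fenced_block is a hypothetical helper written in Python for illustration.

```python
import re


def fenced_block(code: str, class_attr: str = "") -> str:
    """Emit a fenced code block, using a language-xxx class as the info string."""
    language = ""
    for name in class_attr.split():
        if name.startswith("language-"):
            language = name[len("language-"):]
            break
    # The fence must be at least three backticks and longer than any run
    # of backticks that appears inside the code itself.
    longest = max((len(run) for run in re.findall(r"`+", code)), default=0)
    fence = "`" * max(3, longest + 1)
    return fence + language + "\n" + code.rstrip("\n") + "\n" + fence


print(fenced_block("let x = 1;\n", "hljs language-js"))
```

When no language-xxx class is present the info string stays empty, which matches the plain fenced block the tool emits today.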

Markdown's Shape in 2026

Markdown turned twenty-two in 2026. John Gruber published the original Perl script in 2004 with Aaron Swartz as the design collaborator. Tables were deliberately omitted from Gruber's original; the pipe-table syntax familiar to most readers today comes from later dialects, most prominently GitHub Flavored Markdown. CommonMark, the rigorous specification effort organised by Jeff Atwood and John MacFarlane in 2014, is now at version 0.31.2 (28 January 2024) and is the dialect most modern parsers target. GitHub Flavored Markdown (GFM, formalised in version 0.29-gfm on 6 April 2019) is the CommonMark superset that adds tables, task lists, strikethrough, autolinks, and disallowed-raw-HTML rules. GFM is the dialect most users actually see on the web because of GitHub's scale. Markdown is now the native format of essentially every developer documentation ecosystem, and HTML remains the universal output format of the web. The conversion from HTML to Markdown is less famous than the inverse, and exists for the moment when you need it in a hurry, in a browser, with no installation and no data leaving your device.

Privacy: Why Browser-Only Matters Here

HTML pasted into a converter often contains traces of the original source: internal CMS markup, draft content not yet published, customer data inside email templates, link URLs that reveal internal site structure, image references that point to private asset servers. Server-side converters upload all of it to a third-party service. This tool runs entirely in your browser via JavaScript, so the HTML you paste never crosses the network. To verify, watch DevTools' Network tab while you click Convert, or take the page offline (airplane mode) after it loads; the converter still works. Safe for unpublished drafts, customer email templates, internal documentation extracts, or any HTML you wouldn't want copied onto a stranger's hard drive.

Frequently Asked Questions

Does this work with large files?

Yes, because conversion runs in your browser, the practical ceiling is your device's available memory. Tens of thousands of lines convert in well under a second on a modern laptop. Very large inputs (millions of nodes) may briefly freeze the tab while the DOM walker recurses. For batch conversion of an entire CMS export, a script using Turndown in Node or markdownify in Python is the better tool.

What happens to inline styles and CSS classes?

Dropped entirely. Markdown's emphasis grammar covers bold, italic, strikethrough and code; there's no syntax for arbitrary colour, font, size, or class-driven styling. If visual styling matters in your output, either keep the original as HTML or use a richer destination format like AsciiDoc, reStructuredText, or MDX (Markdown plus JSX components, used by Docusaurus). For the article-archival and CMS-migration use cases this tool is built for, dropping styles is the correct behaviour: Markdown's whole point is to strip the visual noise and keep only the structure.

Does this tool work offline?

Yes. Once the page has loaded, conversion runs entirely in JavaScript inside your browser, with no network calls during conversion. Verify in DevTools' Network tab while you click Convert, or take the device offline (airplane mode) after the page loads; the tool still works.

Is this Turndown?

No. Turndown (Dom Christie's library) is the reference implementation in JavaScript and the obvious tool to reach for in a Node project, but it's a substantial dependency with full configurability for heading style, bullet markers, link style, code block style and so on. This in-browser tool is a smaller hand-written DOM walker (about 150 lines) that targets the common case (headings, paragraphs, emphasis, links, images, lists, blockquotes, fenced code, basic tables) without the configuration surface. For the workflows this site is built for (one-off conversions in a browser, no installation), the smaller implementation is the right shape; for production scraping pipelines that need configurable rules, Turndown remains the right choice.

How are tables handled?

As GFM pipe tables: a header row, a delimiter row of dashes, and one body row per <tr>. Pipe tables are flat: they cannot represent colspan, rowspan, multi-line cell content, lists inside cells, or per-cell alignment. If your HTML table uses any of those features, this converter emits a degraded pipe table that loses the extra structure. For complex tables, there are two practical options: (a) keep the table as raw HTML inside the Markdown (CommonMark allows embedded HTML) and trust that your destination renderer will pass it through; (b) use Pandoc with grid-table output, which can express merged cells.
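
The header, delimiter and body layout can be sketched as follows. This is an illustrative Python helper, not the tool's code: pipe_table is a hypothetical name, cell text is assumed to be already converted to inline Markdown, and short rows are padded with empty cells as one possible way to keep the column count consistent.

```python
from typing import List


def pipe_table(header: List[str], rows: List[List[str]]) -> str:
    """Render a header row and body rows as a GFM pipe table."""

    def cell(text: str) -> str:
        # A literal pipe would end the cell early and a newline would end
        # the row, so collapse whitespace and escape pipes.
        return " ".join(text.split()).replace("|", "\\|")

    width = len(header)
    lines = [
        "| " + " | ".join(cell(c) for c in header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in rows:
        padded = (list(row) + [""] * width)[:width]  # pad or trim to header width
        lines.append("| " + " | ".join(cell(c) for c in padded) + " |")
    return "\n".join(lines)


print(pipe_table(["Name", "Role"], [["Ada", "Engineer"], ["Bob", "Writer"]]))
```

Collapsing newlines inside a cell is exactly the kind of flattening described above: multi-line cell content has nowhere to go in a pipe table.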

Can I paste the full HTML of a web page?

You can, but you probably shouldn't. The full source of a modern web page contains <script> tags with JavaScript code, <style> blocks with CSS, tracking pixels, ad markup, and CMS template comments. This converter doesn't strip script and style content explicitly, so all of that ends up as plain text in your Markdown output. The clean approach: select only the article element in DevTools (right-click the article, "Inspect", then right-click the matching node in the Elements panel and "Copy outerHTML"), or use a content-extraction step (Mozilla's Readability library, or its packaged form in Firefox Reader View) before pasting. For browser-extension workflows that handle the extraction step automatically, see Obsidian Web Clipper or MarkDownload.
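
If you do need to start from full page source, a pre-processing pass can remove the noisy parts before pasting. The sketch below is a Python illustration using only the standard library; it is not part of the tool, ScriptStyleStripper and strip_scripts_and_styles are hypothetical names, and it is not a security sanitiser (it only removes script elements, style elements and comments).

```python
from html.parser import HTMLParser


class ScriptStyleStripper(HTMLParser):
    """Re-emits an HTML string without <script>/<style> elements or comments."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        # convert_charrefs=False so entities such as &amp; pass through unchanged
        super().__init__(convert_charrefs=False)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> have no separate end tag to track
        if self.skip_depth == 0 and tag not in self.SKIPPED:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0:
            self.parts.append("</" + tag + ">")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.skip_depth == 0:
            self.parts.append("&" + name + ";")

    def handle_charref(self, name):
        if self.skip_depth == 0:
            self.parts.append("&#" + name + ";")


def strip_scripts_and_styles(html: str) -> str:
    stripper = ScriptStyleStripper()
    stripper.feed(html)
    stripper.close()
    return "".join(stripper.parts)


page = "<h1>Hi</h1><script>alert(1)</script><style>p{color:red}</style><p>Body</p>"
print(strip_scripts_and_styles(page))
```

Comments disappear because the parser's default comment handler does nothing. Ad markup and tracking pixels are ordinary elements and would still need a content-extraction step such as Readability.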