Tar Parser Tutorial
This tutorial walks through parsing tar archives, extracting entries, and filtering files by type or path.
Prerequisites
- A Remix V3 project
- A tar archive to parse (from a URL, file, or API)
Step 1: Parse a Tar Archive from a URL
Fetch the archive and pass its body stream to parseTar, which reads the whole archive and buffers every entry into an array.
```ts
import { parseTar } from 'remix/tar-parser'

let response = await fetch('https://registry.npmjs.org/remix/-/remix-1.0.0.tgz')
if (!response.ok) throw new Error(`Download failed: ${response.status}`)

// .tgz archives are gzip-compressed, so decompress before parsing
let decompressed = response.body!.pipeThrough(new DecompressionStream('gzip'))
let entries = await parseTar(decompressed)

for (let entry of entries) {
  console.log(`${entry.header.type}: ${entry.header.name} (${entry.header.size} bytes)`)
}
```
What is DecompressionStream? Tar files are often gzip-compressed (.tar.gz or .tgz). DecompressionStream is a web-standard API that decompresses a stream on the fly. Pipe the response body through it before passing it to parseTar; for a plain, uncompressed .tar file, skip this step.
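When the file extension doesn't tell you whether an archive is compressed (a URL without one, or a user upload), you can sniff its first bytes instead. The following is a minimal sketch, assuming at least the first 512 bytes are already in memory; `sniffArchive` is a hypothetical helper, not part of remix/tar-parser.

```ts
type ArchiveKind = 'gzip' | 'tar' | 'unknown'

// Inspect the leading bytes of an archive to decide whether it needs decompressing.
function sniffArchive(bytes: Uint8Array): ArchiveKind {
  // Gzip data always begins with the magic bytes 0x1f 0x8b (RFC 1952)
  if (bytes.length >= 2 && bytes[0] === 0x1f && bytes[1] === 0x8b) return 'gzip'

  // POSIX ustar and GNU tar headers carry the ASCII magic "ustar" at offset 257;
  // very old v7-format archives lack it and fall through to 'unknown'
  let magic = new TextDecoder().decode(bytes.subarray(257, 262))
  if (magic === 'ustar') return 'tar'

  return 'unknown'
}
```

A result of 'gzip' means the stream should go through DecompressionStream('gzip') before parseTar, and 'tar' can be parsed directly. Peeking at the start of a live stream without consuming it takes extra work (for example with ReadableStream.tee()), which this sketch leaves out.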
Step 2: Stream Large Archives
For large archives, use the TarParser class to process entries one at a time without holding the entire archive in memory.
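Streaming is possible because of how tar lays out its data: each entry is one 512-byte header block followed by the entry's contents, zero-padded up to a multiple of 512 bytes, and the archive ends with two zero-filled blocks. Once a parser has read a header's size field, it knows exactly how many bytes belong to that entry and where the next header starts, so it never has to buffer earlier entries. The arithmetic is small enough to sketch; these helpers are illustrative only and are not part of remix/tar-parser.

```ts
// Tar archives are built from fixed 512-byte blocks.
const BLOCK_SIZE = 512

// Entry data is zero-padded up to the next multiple of BLOCK_SIZE.
function paddedSize(size: number): number {
  return Math.ceil(size / BLOCK_SIZE) * BLOCK_SIZE
}

// The next header begins after this entry's header block plus its padded data.
function nextHeaderOffset(headerOffset: number, size: number): number {
  return headerOffset + BLOCK_SIZE + paddedSize(size)
}
```

For example, a 10-byte file whose header starts at offset 0 occupies bytes 0 to 1023 (one header block plus one padded data block), so the next header starts at offset 1024.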
```ts
import { TarParser } from 'remix/tar-parser'

let response = await fetch('https://example.com/large-archive.tar')
if (!response.ok) throw new Error(`Download failed: ${response.status}`)
let parser = new TarParser(response.body!)

// Entries are yielded one at a time as the archive streams in
for await (let entry of parser) {
  if (entry.header.type === 'file') {
    console.log(`File: ${entry.header.name}`)
    // Buffer this entry's content (not the whole archive)
    let content = await entry.arrayBuffer()
    // ... save, transform, etc.
  }
}
```
Step 3: Filter Entries by Type
Tar entries have a type field in their header. Common types are 'file', 'directory', and 'symlink'. Filter entries to process only what you need.
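Behind these names, each tar header stores a one-character typeflag at byte offset 156. The sketch below maps the POSIX ustar typeflag values to names. 'file', 'directory', and 'symlink' match the strings used in this tutorial; the other names, and the function itself, are hypothetical and may differ from what remix/tar-parser actually reports.

```ts
// Illustrative entry-type names; only 'file', 'directory', and 'symlink'
// are taken from this tutorial, the rest are assumptions.
type EntryType =
  | 'file'
  | 'link'
  | 'symlink'
  | 'character-device'
  | 'block-device'
  | 'directory'
  | 'fifo'
  | 'unknown'

// Map a raw ustar typeflag character to an entry type name.
function entryTypeFromFlag(flag: string): EntryType {
  switch (flag) {
    case '0':
    case '\0': // pre-POSIX archives use NUL for regular files
    case '7': // "contiguous file"; most tools treat it as a regular file
      return 'file'
    case '1':
      return 'link' // hard link to an earlier entry in the archive
    case '2':
      return 'symlink'
    case '3':
      return 'character-device'
    case '4':
      return 'block-device'
    case '5':
      return 'directory'
    case '6':
      return 'fifo'
    default:
      return 'unknown' // e.g. 'x'/'g' pax extended headers or vendor extensions
  }
}
```

Under this mapping, a hard link ('1') is neither 'file' nor 'symlink', so a filter on type === 'file' skips hard-linked copies as well.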
```ts
import { parseTar } from 'remix/tar-parser'

// `stream` is a ReadableStream of tar bytes, e.g. the decompressed response body from Step 1
let entries = await parseTar(stream)

// Only regular files (skips directories, symlinks, and other entry types)
let files = entries.filter((e) => e.header.type === 'file')
// Only TypeScript source files (note: this also matches .d.ts declaration files)
let tsFiles = files.filter((e) => e.header.name.endsWith('.ts'))

for (let entry of tsFiles) {
  let source = await entry.text()
  console.log(`--- ${entry.header.name} ---`)
  console.log(source)
}
```
Step 4: Extract Files to Disk
Combine the tar parser with writeFile from remix/fs to extract files to the local filesystem.
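Entry names come from the archive, not from you, so a crafted archive can contain names like ../../.bashrc or absolute paths that point outside the extraction directory (a problem often called "tar slip"). Before writing anything, resolve each name against the output directory and reject names that escape it. The following is a minimal sketch using Node's node:path module; `safeExtractPath` is a hypothetical helper, not part of remix/tar-parser or remix/fs.

```ts
import * as path from 'node:path'

// Resolve an entry name inside outDir, throwing if it would land outside it.
function safeExtractPath(outDir: string, entryName: string): string {
  let root = path.resolve(outDir)
  let target = path.resolve(root, entryName)
  let rel = path.relative(root, target)

  // Reject the root itself, anything reached via "..", and a different drive on Windows
  let invalid =
    rel === '' || rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel)
  if (invalid) {
    throw new Error(`Refusing to extract outside ${root}: ${entryName}`)
  }
  return target
}
```

You would pass the returned path to writeFile instead of building the target from the raw entry.header.name. The extraction loop in this step also writes only 'file' entries, which avoids creating symlink entries that could point elsewhere on disk.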
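```ts
import * as path from 'node:path'
import { mkdir } from 'node:fs/promises'
import { parseTar } from 'remix/tar-parser'
import { writeFile } from 'remix/fs'

let response = await fetch('https://example.com/project.tar')
if (!response.ok) throw new Error(`Download failed: ${response.status}`)
let entries = await parseTar(response.body!)

for (let entry of entries) {
  if (entry.header.type !== 'file') continue
  // Entry names come from the archive; validate them before extracting untrusted input
  let target = path.join('./extracted', entry.header.name)
  // Make sure parent directories exist for nested entries such as "src/index.ts"
  await mkdir(path.dirname(target), { recursive: true })
  let content = await entry.arrayBuffer()
  let file = new File([content], entry.header.name)
  await writeFile(target, file)
  console.log(`Extracted: ${entry.header.name}`)
}
```
Step 5: Read Entry Metadata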
Each entry's header contains useful metadata beyond the name and type.
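mode is a plain number, so printing it in base 8 gives 644-style output. For an ls-style rwx string, and for turning a raw tar timestamp into a Date, a small sketch follows. The tar format stores mtime as whole seconds since the Unix epoch; whether entry.header.mtime is already converted to milliseconds depends on the parser, so check a known file before relying on mtimeToDate. Both helpers are hypothetical.

```ts
// Render the nine permission bits the way `ls -l` does, e.g. 0o644 -> "rw-r--r--".
// Higher bits (file type, setuid, sticky) are ignored.
function formatMode(mode: number): string {
  let out = ''
  // Walk from owner-read (bit 8, 0o400) down to other-execute (bit 0, 0o001)
  for (let bit = 8; bit >= 0; bit--) {
    out += mode & (1 << bit) ? 'rwx'.charAt((8 - bit) % 3) : '-'
  }
  return out
}

// Tar stores mtime in seconds since the epoch; JavaScript Dates count milliseconds.
function mtimeToDate(seconds: number): Date {
  return new Date(seconds * 1000)
}
```

For example, formatMode(h.mode) could stand in for the octal output in the loop below, and mtimeToDate(h.mtime) should be used only if your parser reports mtime in seconds.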
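```ts
import { parseTar } from 'remix/tar-parser'

// `stream` is a ReadableStream of tar bytes (see Step 1)
let entries = await parseTar(stream)

for (let entry of entries) {
  let h = entry.header
  console.log(`Name: ${h.name}`)
  console.log(`Type: ${h.type}`) // 'file', 'directory', 'symlink', ...
  console.log(`Size: ${h.size} bytes`)
  console.log(`Mode: ${h.mode.toString(8)}`) // Octal permissions, e.g. '644'
  // Tar stores mtime as seconds since the epoch; if dates print as January 1970,
  // the value is in seconds and must be multiplied by 1000
  console.log(`Modified: ${new Date(h.mtime)}`)
  console.log('---')
}
```
Summary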
| Concept | What You Learned |
|---|---|
| Basic parsing | parseTar(stream) buffers all entries into an array |
| Streaming | TarParser yields entries one at a time for large archives |
| Gzip handling | Pipe through DecompressionStream('gzip') for .tar.gz files |
| Filtering | Check entry.header.type and entry.header.name |
| Extracting | Combine with writeFile to save entries to disk |
| Metadata | Access name, type, size, mode, mtime on entry.header |
Next Steps
- Write extracted files with fs
- Detect MIME types of extracted files with mime
- See the API Reference for all entry properties