
Tar Parser Tutorial

This tutorial walks through parsing tar archives, extracting entries, and filtering files by type or path.

Prerequisites

  • A Remix V3 project
  • A tar archive to parse (from a URL, file, or API)

Step 1: Parse a Tar Archive from a URL

Fetch the archive, then pass the response body to parseTar. It parses the stream and buffers all entries into an array.

ts
import { parseTar } from 'remix/tar-parser'

let response = await fetch('https://registry.npmjs.org/remix/-/remix-1.0.0.tgz')

// This archive is gzipped (.tgz), so decompress the body before parsing
let decompressed = response.body!.pipeThrough(new DecompressionStream('gzip'))

let entries = await parseTar(decompressed)

for (let entry of entries) {
  console.log(`${entry.header.type}: ${entry.header.name} (${entry.header.size} bytes)`)
}

What is DecompressionStream? Tar files are often gzip-compressed (.tar.gz or .tgz). DecompressionStream is a web-standard API that decompresses a stream on the fly. Pipe the response body through it and pass the resulting stream to parseTar.
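
As a rough illustration of what DecompressionStream does, the sketch below compresses a string with CompressionStream, checks for the gzip magic bytes (0x1f 0x8b), and decompresses the result again. The helper names gzip, gunzip, and looksGzipped are made up for this example and are not part of tar-parser. It assumes a runtime that provides the web-standard Blob, Response, and Compression Streams globals (for example Node 18 or later).

```typescript
// Hypothetical helper: gzip streams start with the two magic bytes 0x1f 0x8b.
function looksGzipped(buffer: ArrayBuffer): boolean {
  let bytes = new Uint8Array(buffer)
  return bytes.length >= 2 && bytes[0] === 0x1f && bytes[1] === 0x8b
}

// Hypothetical helper: compress a string with the web-standard CompressionStream.
async function gzip(text: string): Promise<ArrayBuffer> {
  let stream = new Blob([text]).stream().pipeThrough(new CompressionStream('gzip'))
  return new Response(stream).arrayBuffer()
}

// Hypothetical helper: decompress gzip data back to a string with DecompressionStream.
async function gunzip(buffer: ArrayBuffer): Promise<string> {
  let stream = new Blob([buffer]).stream().pipeThrough(new DecompressionStream('gzip'))
  return new Response(stream).text()
}
```

A check like looksGzipped could be used to decide whether to pipe through DecompressionStream when the file extension or Content-Type is unreliable.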

Step 2: Stream Large Archives

For large archives, use the TarParser class to process entries one at a time without holding the entire archive in memory.

ts
import { TarParser } from 'remix/tar-parser'

let response = await fetch('https://example.com/large-archive.tar')
let parser = new TarParser(response.body!)

for await (let entry of parser) {
  if (entry.header.type === 'file') {
    console.log(`File: ${entry.header.name}`)

    // Process the file content
    let content = await entry.arrayBuffer()
    // ... save, transform, etc.
  }
}
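
To see why streaming keeps memory use bounded, the sketch below consumes a ReadableStream one chunk at a time and keeps only running totals. It is independent of tar-parser and says nothing about how TarParser is implemented; measureStream is a hypothetical helper name used only for illustration.

```typescript
// Hypothetical helper: read a stream chunk by chunk without buffering it.
async function measureStream(stream: ReadableStream<Uint8Array>) {
  let reader = stream.getReader()
  let totalBytes = 0
  let largestChunk = 0

  try {
    while (true) {
      let result = await reader.read()
      if (result.done) break

      // Only the current chunk is held in memory at this point.
      totalBytes += result.value.byteLength
      largestChunk = Math.max(largestChunk, result.value.byteLength)
    }
  } finally {
    reader.releaseLock()
  }

  return { totalBytes, largestChunk }
}
```

Peak memory here is proportional to the largest chunk rather than to the total size of the stream, which is the property that makes one-entry-at-a-time processing attractive for large archives.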

Step 3: Filter Entries by Type

Tar entries have a type field in their header. Common types are 'file', 'directory', and 'symlink'. Filter entries to process only what you need.

ts
import { parseTar } from 'remix/tar-parser'

// `stream` is a ReadableStream of uncompressed tar data, such as `decompressed` from Step 1
let entries = await parseTar(stream)

// Only files (skip directories and symlinks)
let files = entries.filter((e) => e.header.type === 'file')

// Only TypeScript source files
let tsFiles = files.filter((e) => e.header.name.endsWith('.ts'))

for (let entry of tsFiles) {
  let source = await entry.text()
  console.log(`--- ${entry.header.name} ---`)
  console.log(source)
}
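
Filtering by path works the same way as filtering by extension. The sketch below combines both checks in one helper. EntryLike and selectFiles are hypothetical names, and the interface only mirrors the header.name and header.type fields used in this tutorial; it is not the library's own type.

```typescript
// Assumed minimal shape, mirroring the header fields used in this tutorial.
interface EntryLike {
  header: { name: string; type: string }
}

// Hypothetical helper: keep regular files under `dir` that end with one of `extensions`.
function selectFiles<T extends EntryLike>(entries: T[], dir: string, extensions: string[]): T[] {
  // Normalize the prefix so 'src' does not also match 'src-old/...'.
  let prefix = dir.endsWith('/') ? dir : dir + '/'

  return entries.filter(
    (e) =>
      e.header.type === 'file' &&
      e.header.name.startsWith(prefix) &&
      extensions.some((ext) => e.header.name.endsWith(ext)),
  )
}
```

For example, selectFiles(entries, 'package/src', ['.ts', '.tsx']) would keep only TypeScript sources under package/src. Tarballs published to npm place their contents under a top-level package/ directory, so paths from such archives usually need that prefix.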

Step 4: Extract Files to Disk

Combine the tar parser with the fs package to extract files to the local filesystem.

ts
import { parseTar } from 'remix/tar-parser'
import { writeFile } from 'remix/fs'

let response = await fetch('https://example.com/project.tar')
let entries = await parseTar(response.body!)

for (let entry of entries) {
  if (entry.header.type !== 'file') continue

  let content = await entry.arrayBuffer()
  let file = new File([content], entry.header.name)

  await writeFile(`./extracted/${entry.header.name}`, file)
  console.log(`Extracted: ${entry.header.name}`)
}
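
Entry names come from the archive itself, so an untrusted archive can contain names such as ../../etc/passwd or absolute paths. The sketch below shows one way to guard against that using Node's path module. resolveInside is a hypothetical helper, not part of tar-parser or fs, and it assumes a Node-compatible runtime.

```typescript
import * as path from 'node:path'

// Hypothetical helper: resolve an entry name inside `targetDir`,
// throwing if the result would land outside that directory.
function resolveInside(targetDir: string, entryName: string): string {
  let root = path.resolve(targetDir)
  let resolved = path.resolve(root, entryName)

  // Absolute names and '..' segments resolve to paths that do not start with the root.
  if (!resolved.startsWith(root + path.sep)) {
    throw new Error(`Refusing to extract outside ${root}: ${entryName}`)
  }

  return resolved
}
```

In the loop above, resolveInside('./extracted', entry.header.name) could replace the template string passed to writeFile, so that a malicious name fails loudly instead of overwriting files elsewhere on disk.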

Step 5: Read Entry Metadata

Each entry's header contains useful metadata beyond the name and type.

ts
import { parseTar } from 'remix/tar-parser'

// `stream` is a ReadableStream of uncompressed tar data, such as `decompressed` from Step 1
let entries = await parseTar(stream)

for (let entry of entries) {
  let h = entry.header

  console.log(`Name:     ${h.name}`)
  console.log(`Type:     ${h.type}`)       // 'file', 'directory', 'symlink'
  console.log(`Size:     ${h.size} bytes`)
  console.log(`Mode:     ${h.mode.toString(8)}`) // Octal permissions like '644'
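  // Tar stores mtime as seconds since the Unix epoch; if h.mtime is that raw value, use new Date(h.mtime * 1000)
  console.log(`Modified: ${new Date(h.mtime)}`)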
  console.log('---')
}
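
For context on where this metadata comes from, the sketch below reads the same fields directly from a raw 512-byte ustar header block, using the field offsets from the POSIX tar format. It illustrates the file format only and is not how tar-parser is implemented; readString, readOctal, and readHeader are hypothetical names. It ignores the ustar prefix field, PAX extended headers, and GNU extensions for long names and very large sizes.

```typescript
// Read a NUL-terminated ASCII field from a header block.
function readString(block: Uint8Array, offset: number, length: number): string {
  let end = offset
  while (end < offset + length && block[end] !== 0) end++
  return new TextDecoder().decode(block.subarray(offset, end))
}

// Numeric fields are stored as octal ASCII digits, padded with spaces or NULs.
function readOctal(block: Uint8Array, offset: number, length: number): number {
  let text = readString(block, offset, length).trim()
  return text === '' ? 0 : parseInt(text, 8)
}

// Field offsets and lengths follow the ustar header layout.
function readHeader(block: Uint8Array) {
  return {
    name: readString(block, 0, 100),
    mode: readOctal(block, 100, 8),
    size: readOctal(block, 124, 12),
    mtime: readOctal(block, 136, 12), // seconds since the Unix epoch
    typeflag: readString(block, 156, 1), // '0' file, '5' directory, '2' symlink
  }
}
```

The single-character typeflag is what a parser would map to a readable type such as 'file' or 'directory'. Very old archives may use a NUL byte instead of '0' for regular files, which this sketch returns as an empty string.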

Summary

Concept         What You Learned
Basic parsing   parseTar(stream) buffers all entries into an array
Streaming       TarParser yields entries one at a time for large archives
Gzip handling   Pipe through DecompressionStream('gzip') for .tar.gz files
Filtering       Check entry.header.type and entry.header.name
Extracting      Combine with writeFile to save entries to disk
Metadata        Access name, type, size, mode, mtime on entry.header

Next Steps

  • Write extracted files with fs
  • Detect MIME types of extracted files with mime
  • See the API Reference for all entry properties

Released under the MIT License.