Tar Parser Tutorial

This tutorial walks through parsing tar archives, extracting entries, and filtering files by type or path.

Prerequisites

A Remix V3 project
A tar archive to parse (from a URL, file, or API)

Step 1: Parse a Tar Archive from a URL

Fetch the archive, then pass the response body to parseTar. parseTar does not download anything itself; it reads the stream you give it and buffers all entries into an array.

import { parseTar } from 'remix/tar-parser'

let response = await fetch('https://registry.npmjs.org/remix/-/remix-1.0.0.tgz')

// A .tgz archive is gzip-compressed, so decompress it before parsing
let decompressed = response.body!.pipeThrough(new DecompressionStream('gzip'))

let entries = await parseTar(decompressed)

for (let entry of entries) {
  console.log(`${entry.header.type}: ${entry.header.name} (${entry.header.size} bytes)`)
}

What is DecompressionStream? Tar files are often gzip-compressed (.tar.gz or .tgz). DecompressionStream is a web-standard API that decompresses a stream on the fly. Pipe the response body through it and pass the resulting stream to parseTar. An uncompressed .tar can be passed to parseTar directly.
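If you do not know in advance whether an archive is gzipped, one option is to check before piping it through DecompressionStream. The sketch below is not part of the tar parser; `isGzip` and `hasGzipExtension` are hypothetical helpers. It relies only on the fact that gzip data begins with the magic bytes 0x1f 0x8b, with a file-name check as a fallback.

```typescript
// Gzip data always begins with the two magic bytes 0x1f 0x8b.
function isGzip(bytes: Uint8Array): boolean {
  return bytes.length >= 2 && bytes[0] === 0x1f && bytes[1] === 0x8b
}

// Hypothetical fallback: guess from the file name when no bytes are available yet.
function hasGzipExtension(url: string): boolean {
  let pathname = new URL(url).pathname.toLowerCase()
  return pathname.endsWith('.tgz') || pathname.endsWith('.tar.gz')
}

console.log(isGzip(new Uint8Array([0x1f, 0x8b, 0x08]))) // true
console.log(hasGzipExtension('https://registry.npmjs.org/remix/-/remix-1.0.0.tgz')) // true
console.log(hasGzipExtension('https://example.com/large-archive.tar')) // false
```

The magic-byte check is the more reliable of the two, since a URL does not have to reflect the actual encoding of the bytes it serves.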

Step 2: Stream Large Archives

For large archives, use the TarParser class to process entries one at a time without holding the entire archive in memory.

import { TarParser } from 'remix/tar-parser'

let response = await fetch('https://example.com/large-archive.tar')
let parser = new TarParser(response.body!)

for await (let entry of parser) {
  if (entry.header.type === 'file') {
    console.log(`File: ${entry.header.name}`)

    // Process the file content
    let content = await entry.arrayBuffer()
    // ... save, transform, etc.
  }
}
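Streaming also lets you stop early. As a rough illustration, the sketch below adds up file sizes as entries arrive and aborts once a budget is exceeded, which is one way to guard against unexpectedly large archives. It runs against a mock async generator instead of a real TarParser; `checkSizeBudget`, `mockEntries`, and the `StreamedEntry` shape are assumptions made for this example, modeled on the `header` fields used in this tutorial.

```typescript
// Minimal stand-in for a streamed entry; only the fields used below.
interface StreamedEntry {
  header: { name: string; type: string; size: number }
}

// Hypothetical helper: sums file sizes as entries arrive and throws at the
// first entry that pushes the running total past `maxBytes`.
async function checkSizeBudget(
  entries: AsyncIterable<StreamedEntry>,
  maxBytes: number,
): Promise<number> {
  let total = 0
  for await (let entry of entries) {
    if (entry.header.type !== 'file') continue
    total += entry.header.size
    if (total > maxBytes) {
      throw new Error(`Archive exceeds ${maxBytes} bytes at ${entry.header.name}`)
    }
  }
  return total
}

// Mock source standing in for a real parser in this sketch.
async function* mockEntries(): AsyncGenerator<StreamedEntry> {
  yield { header: { name: 'package/', type: 'directory', size: 0 } }
  yield { header: { name: 'package/index.js', type: 'file', size: 1200 } }
  yield { header: { name: 'package/README.md', type: 'file', size: 300 } }
}

checkSizeBudget(mockEntries(), 10000).then((total) => {
  console.log(`Total: ${total} bytes`) // Total: 1500 bytes
})
```

Because the loop pulls one entry at a time, throwing inside it stops iteration before any later entries are read.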

Step 3: Filter Entries by Type

Tar entries have a type field in their header. Common types are 'file', 'directory', and 'symlink'. Filter entries to process only what you need.

import { parseTar } from 'remix/tar-parser'

// `stream` is a ReadableStream of tar bytes, such as the decompressed stream from Step 1
let entries = await parseTar(stream)

// Only files (skip directories and symlinks)
let files = entries.filter((e) => e.header.type === 'file')

// Only files whose names end in .ts (this also matches .d.ts declaration files)
let tsFiles = files.filter((e) => e.header.name.endsWith('.ts'))

for (let entry of tsFiles) {
  let source = await entry.text()
  console.log(`--- ${entry.header.name} ---`)
  console.log(source)
}
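When several conditions apply at once (type, path prefix, extension), it can help to collect them in one helper. The following is a minimal sketch, not part of the tar parser: `selectEntries` and its options are hypothetical, and `EntryLike` only mirrors the `header.name` and `header.type` fields used above. The `package/` prefix in the sample data follows the layout of npm tarballs.

```typescript
// Minimal stand-in for a parsed entry; only the fields the filter needs.
interface EntryLike {
  header: { name: string; type: string }
}

interface SelectOptions {
  type?: string // e.g. 'file'
  prefix?: string // keep only names under this path
  extensions?: string[] // keep only names ending in one of these
  excludeExtensions?: string[] // drop names ending in one of these
}

// Hypothetical helper: an entry is kept only if it passes every given condition.
function selectEntries<T extends EntryLike>(entries: T[], options: SelectOptions): T[] {
  return entries.filter((entry) => {
    let { name, type } = entry.header
    if (options.type !== undefined && type !== options.type) return false
    if (options.prefix !== undefined && !name.startsWith(options.prefix)) return false
    if (options.extensions !== undefined && !options.extensions.some((ext) => name.endsWith(ext))) return false
    if (options.excludeExtensions !== undefined && options.excludeExtensions.some((ext) => name.endsWith(ext))) return false
    return true
  })
}

let sampleEntries: EntryLike[] = [
  { header: { name: 'package/', type: 'directory' } },
  { header: { name: 'package/src/index.ts', type: 'file' } },
  { header: { name: 'package/src/types.d.ts', type: 'file' } },
  { header: { name: 'package/README.md', type: 'file' } },
  { header: { name: 'package/src/link.ts', type: 'symlink' } },
]

let sources = selectEntries(sampleEntries, {
  type: 'file',
  prefix: 'package/src/',
  extensions: ['.ts'],
  excludeExtensions: ['.d.ts'],
})

console.log(sources.map((e) => e.header.name)) // [ 'package/src/index.ts' ]
```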

Step 4: Extract Files to Disk

Combine the tar parser with the fs package to extract files to the local filesystem.

import { parseTar } from 'remix/tar-parser'
import { writeFile } from 'remix/fs'

let response = await fetch('https://example.com/project.tar')
let entries = await parseTar(response.body!)

for (let entry of entries) {
  if (entry.header.type !== 'file') continue

  let content = await entry.arrayBuffer()
  let file = new File([content], entry.header.name)

  await writeFile(`./extracted/${entry.header.name}`, file)
  console.log(`Extracted: ${entry.header.name}`)
}
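Entry names come from the archive itself, so a name such as `../../etc/passwd` or an absolute path could point outside the extraction directory. If the archive is not fully trusted, consider validating each name before writing. The sketch below assumes a Node.js runtime and uses `node:path`; `safeJoin` is a hypothetical helper, not part of the tar parser or the fs package.

```typescript
import * as path from 'node:path'

// Hypothetical helper: resolves `entryName` under `root` and throws if the
// result is not strictly inside `root`.
function safeJoin(root: string, entryName: string): string {
  let base = path.resolve(root)
  let target = path.resolve(base, entryName)
  if (!target.startsWith(base + path.sep)) {
    throw new Error(`Refusing to extract outside ${base}: ${entryName}`)
  }
  return target
}

// A normal entry resolves to a path inside the extraction directory.
console.log(safeJoin('./extracted', 'src/index.ts'))

// A traversal attempt is rejected.
try {
  safeJoin('./extracted', '../../etc/passwd')
} catch (error) {
  console.log((error as Error).message)
}
```

In the extraction loop, the result of `safeJoin('./extracted', entry.header.name)` would replace the template string passed to writeFile.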

Step 5: Read Entry Metadata

Each entry's header contains useful metadata beyond the name and type.

import { parseTar } from 'remix/tar-parser'

// `stream` is a ReadableStream of tar bytes, such as the decompressed stream from Step 1
let entries = await parseTar(stream)

for (let entry of entries) {
  let h = entry.header

  console.log(`Name:     ${h.name}`)
  console.log(`Type:     ${h.type}`)       // 'file', 'directory', 'symlink'
  console.log(`Size:     ${h.size} bytes`)
  console.log(`Mode:     ${h.mode.toString(8)}`) // Octal permissions like '644'
  // Tar headers store mtime in seconds; if h.mtime is in seconds, pass h.mtime * 1000 to Date
  console.log(`Modified: ${new Date(h.mtime)}`)
  console.log('---')
}
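To see where these values come from, it can help to look at the raw format. In the ustar layout, each entry starts with a 512-byte header block in which the numeric fields (mode, size, mtime) are stored as octal ASCII text and mtime is in seconds since the Unix epoch. The sketch below decodes a few fields from a hand-built header block. It illustrates the on-disk format only and does not show how the tar parser is implemented; `decodeHeader` and the other helper names are made up for this example.

```typescript
const decoder = new TextDecoder()
const encoder = new TextEncoder()

// Reads a NUL-terminated ASCII field from the header block.
function readString(block: Uint8Array, offset: number, length: number): string {
  let raw = decoder.decode(block.subarray(offset, offset + length))
  let end = raw.indexOf('\0')
  return end === -1 ? raw : raw.slice(0, end)
}

// Numeric fields are octal digits written as text.
function readOctal(block: Uint8Array, offset: number, length: number): number {
  let text = readString(block, offset, length).trim()
  return text === '' ? 0 : parseInt(text, 8)
}

// Field offsets follow the ustar header layout.
function decodeHeader(block: Uint8Array) {
  let flag = readString(block, 156, 1)
  let type = flag === '' || flag === '0' ? 'file' : flag === '5' ? 'directory' : flag === '2' ? 'symlink' : 'other'
  return {
    name: readString(block, 0, 100),
    mode: readOctal(block, 100, 8),
    size: readOctal(block, 124, 12),
    mtime: readOctal(block, 136, 12), // seconds since the Unix epoch
    type,
  }
}

// Build a sample header block by hand (checksum and other fields omitted).
function writeField(block: Uint8Array, offset: number, text: string): void {
  block.set(encoder.encode(text), offset)
}

let block = new Uint8Array(512)
writeField(block, 0, 'src/index.ts')
writeField(block, 100, '0000644')
writeField(block, 124, '00000000012') // octal 12 = 10 bytes
writeField(block, 136, (1700000000).toString(8))
writeField(block, 156, '0')

let header = decodeHeader(block)
console.log(header.name, header.type, header.size) // src/index.ts file 10
console.log(header.mode.toString(8)) // 644
console.log(new Date(header.mtime * 1000).toISOString())
```

This is why the mode reads naturally when printed in base 8, and why a raw mtime needs to be multiplied by 1000 before it is passed to Date.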

Summary

Concept	What You Learned
Basic parsing	`parseTar(stream)` buffers all entries into an array
Streaming	`TarParser` yields entries one at a time for large archives
Gzip handling	Pipe through `DecompressionStream('gzip')` for `.tar.gz` files
Filtering	Check `entry.header.type` and `entry.header.name`
Extracting	Combine with `writeFile` to save entries to disk
Metadata	Access `name`, `type`, `size`, `mode`, `mtime` on `entry.header`

Next Steps

Write extracted files with fs
Detect MIME types of extracted files with mime
See the API Reference for all entry properties
