Lab 1.2 — UV CLI Tool + LaTeX Docs PDF on GitHub Pages

What you’ll build: a command-line tool published via UV that anyone can run with uvx your-tool, plus a professional PDF documentation file generated with LaTeX + pandoc, deployed to GitHub Pages along with a Docusaurus HTML site.

Time: 60–90 minutes. Difficulty: ⭐⭐⭐☆☆. Ship: a live GitHub Pages URL with your docs + downloadable PDF.

What the Finished Thing Looks Like

By the end:

uvx tds-csv-YOURNAME sample.csv --top 5
# ┌──────┬────────┐
# │ City │ Count  │
# ├──────┼────────┤
# │ ...  │ ...    │
# └──────┴────────┘

And https://<username>.github.io/tds-csv-YOURNAME/ shows your documentation site with a Download PDF button.

Prerequisites

  • Completed Lab 1.1 (at least through Step 6) — you understand UV + pyproject.toml.
  • LaTeX + pandoc available locally (see latex.mdx). This is optional: the GitHub Actions workflow in Step 11 installs both on the runner.
  • GitHub Pages enabled on your account.

The Steps

Step 1 — Plan the CLI

Our CLI will be tds-csv-YOURNAME. Features:

  • Takes a CSV file path.
  • Optionally filters to the top N rows by a given column.
  • Pretty-prints as a table using rich.

The help output we are aiming for:

Usage: tds-csv [OPTIONS] FILE

  Quickly explore a CSV file.

Options:
  --top INTEGER           Show top N rows [default: 10]
  --by TEXT               Sort by column (default: first column)
  --help                  Show this message and exit.

Step 2 — Scaffold the project

uv init --app --python 3.13 tds-csv-YOURNAME
cd tds-csv-YOURNAME

We use --app (not --lib) because this is a CLI app. UV creates a single-module layout:

tds-csv-YOURNAME/
├── .gitignore
├── .python-version
├── README.md
├── main.py
└── pyproject.toml

Add dependencies:

uv add typer "rich>=13" pandas
uv add --dev pytest

Step 3 — Write the CLI

Rename main.py to cli.py and replace its contents:

"""tds-csv — quickly explore a CSV file."""

from pathlib import Path
from importlib.metadata import version as _v

import pandas as pd
import typer
from rich.console import Console
from rich.table import Table

__version__ = _v("tds-csv-YOURNAME")

app = typer.Typer(
    name="tds-csv",
    help="Quickly explore a CSV file.",
    add_completion=False,
)
console = Console()

def _render(df: pd.DataFrame, title: str) -> None:
    table = Table(title=title, show_lines=True)
    for col in df.columns:
        table.add_column(str(col), style="cyan")
    for _, row in df.iterrows():
        table.add_row(*[str(v) for v in row])
    console.print(table)

def _version_callback(value: bool) -> None:
    """Print the version and exit; runs before FILE is validated."""
    if value:
        console.print(f"tds-csv v{__version__}")
        raise typer.Exit()

@app.command()
def main(
    file: Path = typer.Argument(..., exists=True, readable=True, help="CSV file to read."),
    top: int = typer.Option(10, help="Show top N rows."),
    by: str | None = typer.Option(None, help="Sort by column (default: first column)."),
    # is_eager makes --version work without FILE, which is otherwise required.
    version: bool = typer.Option(
        False, "--version", callback=_version_callback, is_eager=True,
        help="Show version and exit.",
    ),
) -> None:
    """Render a CSV file as a pretty table."""

    df = pd.read_csv(file)
    sort_col = by or df.columns[0]
    if sort_col not in df.columns:
        console.print(f"[red]Column '{sort_col}' not in CSV[/red]")
        raise typer.Exit(code=1)
    df = df.sort_values(by=sort_col, ascending=False).head(top)
    _render(df, f"{file.name} — top {top} by {sort_col}")

if __name__ == "__main__":
    app()

Step 4 — Wire it up as a CLI entry point

Edit pyproject.toml to expose tds-csv as an entry point:

[project.scripts]
tds-csv = "cli:app"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

Full file should look like:

[project]
name = "tds-csv-YOURNAME"
version = "0.1.0"
description = "Quickly explore a CSV file from the command line."
readme = "README.md"
license = "MIT"
requires-python = ">=3.11"
authors = [{ name = "Your Name", email = "[email protected]" }]
dependencies = [
    "typer",
    "rich>=13",
    "pandas",
]

[project.scripts]
tds-csv = "cli:app"

[project.urls]
Homepage = "https://github.com/YOUR-USERNAME/tds-csv-YOURNAME"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[dependency-groups]
dev = ["pytest>=8"]

[tool.hatch.build.targets.wheel]
packages = ["."]
include = ["cli.py"]

Test it locally:

uv sync
uv run tds-csv --help

You should see the Typer help output.
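To make the "cli:app" string less magical: an installer turns each [project.scripts] entry into a small launcher that imports the module before the colon and calls the object after it. The sketch below imitates only that lookup step. The helper name resolve_entry_point is hypothetical (it is not part of uv or any packaging library), and the demo uses a standard-library target so it runs outside the project.

```python
from importlib import import_module
from typing import Any


def resolve_entry_point(spec: str) -> Any:
    """Resolve a 'module:attribute' spec to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    if not module_name or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    obj: Any = import_module(module_name)
    # The attribute part may be dotted, e.g. "pkg.mod:Class.method".
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    return obj


# Demonstrated with a standard-library target so it runs anywhere;
# inside the project you would pass "cli:app" instead.
dumps = resolve_entry_point("json:dumps")
print(dumps({"tool": "tds-csv"}))
```

Inside the project environment, resolve_entry_point("cli:app")() would start the Typer app, which is roughly what the generated tds-csv command does.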

Step 5 — Create a sample CSV and try it

cat > sample.csv <<'EOF'
city,population,state
Chennai,7088000,Tamil Nadu
Mumbai,20411000,Maharashtra
Bangalore,8443000,Karnataka
Hyderabad,6809000,Telangana
Pune,3124000,Maharashtra
Kolkata,14850000,West Bengal
Delhi,28514000,Delhi
EOF

uv run tds-csv sample.csv --top 5 --by population

You should see a nicely formatted table sorted by population.
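If you want to check the row order without eyeballing the table, here is a minimal standard-library sketch of the same sort-then-take-top-N logic. It is not the code in cli.py: top_rows is a hypothetical helper, and it approximates pandas' type inference by sorting numerically only when every value in the column parses as a number.

```python
import csv
import io
from typing import Optional

SAMPLE = """\
city,population,state
Chennai,7088000,Tamil Nadu
Mumbai,20411000,Maharashtra
Bangalore,8443000,Karnataka
Hyderabad,6809000,Telangana
Pune,3124000,Maharashtra
Kolkata,14850000,West Bengal
Delhi,28514000,Delhi
"""


def _as_number(value: Optional[str]) -> Optional[float]:
    """Return the value as a float, or None if it is not numeric."""
    try:
        return float(value)  # type: ignore[arg-type]
    except (TypeError, ValueError):
        return None


def top_rows(csv_text: str, by: Optional[str] = None, top: int = 10) -> list:
    """Return the top N rows, sorted descending by one column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return []
    sort_col = by or next(iter(rows[0]))  # default: first column
    if sort_col not in rows[0]:
        raise KeyError(f"Column '{sort_col}' not in CSV")
    numbers = [_as_number(row[sort_col]) for row in rows]
    if all(n is not None for n in numbers):
        # Numeric column: compare as numbers, not as text.
        order = sorted(range(len(rows)), key=lambda i: numbers[i], reverse=True)
    else:
        order = sorted(range(len(rows)), key=lambda i: rows[i][sort_col], reverse=True)
    return [rows[i] for i in order[:top]]


for row in top_rows(SAMPLE, by="population", top=5):
    print(f"{row['city']:<10} {row['population']:>10}")
```

For the sample data, the top five cities by population are Delhi, Mumbai, Kolkata, Bangalore, and Chennai, in that order, so the table from uv run should list the same rows.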

Step 6 — Try it as a one-shot tool (uvx)

Build the wheel and run it from an ephemeral env:

uv build

# Run without installing globally
uv tool run --from "./dist/tds_csv_YOURNAME-0.1.0-py3-none-any.whl" tds-csv sample.csv --top 3
# or the short form:
uvx --from ./dist/*.whl tds-csv sample.csv --top 3

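Later (after publishing to PyPI), anyone can run the tool with uvx. Note that uvx tds-csv-YOURNAME sample.csv looks for a command named after the package, while [project.scripts] registers the command as tds-csv. If uvx reports that the executable was not found, run uvx --from tds-csv-YOURNAME tds-csv sample.csv instead, or register a second script entry named tds-csv-YOURNAME.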
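If the explicit wheel path above does not match what uv build produced, the cause is usually name normalization. The sketch below predicts the file name of a pure-Python wheel, assuming your build backend follows the naming rule in the Python packaging specifications (lowercase the name, collapse runs of "-", "_" and "." and write them as "_"). The helper wheel_filename is hypothetical; ls dist/ remains the source of truth.

```python
import re


def wheel_filename(project_name: str, version: str) -> str:
    """Predict the file name of a pure-Python wheel for a project."""
    # Normalize as for the package index: lowercase, and runs of
    # "-", "_" and "." collapse into a single "-".
    normalized = re.sub(r"[-_.]+", "-", project_name).lower()
    # Wheel file names use "_" where the normalized name has "-".
    distribution = normalized.replace("-", "_")
    return f"{distribution}-{version}-py3-none-any.whl"


print(wheel_filename("tds-csv-YOURNAME", "0.1.0"))
```

Under that assumption the placeholder name comes out lowercased, which is why the glob form ./dist/*.whl is the more forgiving of the two commands.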

Step 7 — Write Markdown documentation

Create a docs/ folder and put your documentation in Markdown:

mkdir docs

Save the following as docs/index.md (the file that the pandoc and Docusaurus steps read later):

````markdown
---
title: "tds-csv: User Guide"
author: Your Name
date: 2026-05-10
---

# tds-csv

**tds-csv** is a tiny CLI for quickly exploring CSV files. Built for the
*Tools in Data Science* course at IIT Madras, May 2026.

## Installation

```bash
uvx tds-csv-YOURNAME --help
```

Or install globally:

```bash
uv tool install tds-csv-YOURNAME
tds-csv --help
```

## Usage

### Show the top 10 rows

```bash
tds-csv sample.csv
```

### Sort by a specific column

```bash
tds-csv sample.csv --by population --top 5
```

## How It Works

The tool:

1. Reads the CSV with pandas.read_csv.
2. Sorts by the chosen column (defaulting to the first column).
3. Takes the top N rows.
4. Renders them with rich as a Unicode table.

## Architecture

The transformation from CSV text to rendered table is a single composition:

output = Render(SortBy_col(Read(csv))[:N])

## License

MIT. See the LICENSE file.
````

Step 8 — Build the PDF with pandoc + LaTeX

Create a pandoc template for nicer PDF output:

```latex title="docs/template.tex"
\documentclass[11pt,a4paper]{article}
\usepackage[margin=1in]{geometry}
\usepackage{hyperref}
\usepackage{graphicx}
\usepackage{fancyhdr}
\usepackage{amsmath}
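\usepackage{xcolor}

% pandoc emits these for highlighted code blocks and compact lists;
% a custom template must define them or the LaTeX build fails.
$if(highlighting-macros)$
$highlighting-macros$
$endif$
\providecommand{\tightlist}{%
  \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}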

\definecolor{tdsblue}{RGB}{79,70,229}
\hypersetup{
  colorlinks=true,
  linkcolor=tdsblue,
  urlcolor=tdsblue
}

\pagestyle{fancy}
\fancyhf{}
\lhead{$title$}
\rhead{$date$}
\cfoot{\thepage}

\title{\textcolor{tdsblue}{$title$}}
\author{$author$}
\date{$date$}

\begin{document}
\maketitle
\tableofcontents
\newpage

$body$

\end{document}
```

Locally test:

pandoc docs/index.md -o docs/tds-csv.pdf \
  --template=docs/template.tex \
  --pdf-engine=xelatex \
  --toc \
  --number-sections \
  --highlight-style=tango

Open docs/tds-csv.pdf — you should have a beautifully typeset document with a cover page, TOC, and code highlighting.
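To make the template mechanics concrete: pandoc fills $title$, $author$ and $date$ from the YAML front matter of docs/index.md and puts the converted Markdown where $body$ appears. The toy function below imitates only that plain substitution. It is not pandoc's template engine (which also supports conditionals and loops), fill_template is a hypothetical name, and this toy version renders unknown names as empty text.

```python
import re

# A variable is a name wrapped in dollar signs, e.g. $title$.
_VARIABLE = re.compile(r"\$([A-Za-z][A-Za-z0-9_-]*)\$")


def fill_template(template: str, variables: dict) -> str:
    """Replace each $name$ with its value; unknown names become empty text."""
    return _VARIABLE.sub(lambda m: str(variables.get(m.group(1), "")), template)


template = r"\title{$title$} \author{$author$} \date{$date$}"
metadata = {"title": "tds-csv User Guide", "author": "Your Name", "date": "2026-05-10"}
print(fill_template(template, metadata))
```

This is why a typo in a front-matter key tends to show up as a blank title or header in the PDF rather than as an error.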

Note: if pandoc isn’t installed locally, skip this local build and let GitHub Actions do it (Step 11). The workflow installs pandoc and TeX Live on its Ubuntu runner.

Step 9 — Build a Docusaurus site for the HTML docs

You have two choices:

  • Option A (quick) — use plain HTML or mdBook. Small output, minutes to set up.
  • Option B (professional) — use Docusaurus like the TDS course itself. Takes 10 minutes but matches the course pattern.

We’ll go with Option B — Docusaurus. Initialize:

# in the repo root
npx create-docusaurus@latest site classic --typescript

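This creates a site/ folder. Move your documentation in and delete the default content with the commands below. The default config still links to /blog from the navbar and footer, so after deleting site/blog also set blog: false in the classic preset options and remove those links in site/docusaurus.config.ts; otherwise the production build can fail on broken links.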

rm -rf site/docs/* site/blog
cp docs/index.md site/docs/intro.md

Edit site/docusaurus.config.ts — set url, baseUrl, organizationName, projectName:

const SITE_URL = process.env.SITE_URL ?? 'https://YOUR-USERNAME.github.io';
const BASE_URL = process.env.BASE_URL ?? '/tds-csv-YOURNAME/';

const config = {
  title: 'tds-csv',
  tagline: 'Quickly explore any CSV',
  url: SITE_URL,
  baseUrl: BASE_URL,
  organizationName: 'YOUR-USERNAME',
  projectName: 'tds-csv-YOURNAME',
  // ... (rest of defaults)
};

Test the dev server:

cd site
npm install
npm run start        # opens http://localhost:3000

Stop the dev server (Ctrl+C) and do a production build:

npm run build        # outputs to site/build

Step 10 — Link the PDF from the site

Docusaurus serves anything under site/static/ as a top-level file. Copy the PDF there:

mkdir -p site/static/downloads
cp docs/tds-csv.pdf site/static/downloads/

Reference it in site/docs/intro.md:

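[📄 Download the full PDF manual](/downloads/tds-csv.pdf)

Step 11 — Write the GitHub Actions deploy workflow

This workflow rebuilds the PDF and redeploys the site to GitHub Pages on every push to main (or when triggered manually). GitHub only picks up workflows stored under .github/workflows/, so save it there; the file name is your choice, for example deploy.yml.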

name: Deploy Docs

on:
  push:
    branches: [main]
  workflow_dispatch:

permissions:
  contents: read
  pages: write
  id-token: write

concurrency:
  group: pages
  cancel-in-progress: true

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # --- PDF step ---
      - name: Install pandoc + texlive
        run: |
          sudo apt-get update
          sudo apt-get install -y pandoc texlive-xetex texlive-fonts-recommended texlive-latex-extra

      - name: Build PDF
        run: |
          mkdir -p site/static/downloads
          pandoc docs/index.md -o site/static/downloads/tds-csv.pdf \
            --template=docs/template.tex \
            --pdf-engine=xelatex \
            --toc \
            --number-sections \
            --highlight-style=tango

      # --- Site step ---
      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: '24'
          cache: 'npm'
          cache-dependency-path: site/package-lock.json

      - name: Install site deps
        working-directory: site
        run: npm ci

      - name: Build site
        working-directory: site
        env:
          SITE_URL: https://${{ github.repository_owner }}.github.io
          BASE_URL: /${{ github.event.repository.name }}/
        run: npm run build

      - name: Upload Pages artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: site/build

  deploy:
    needs: build
    runs-on: ubuntu-latest
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4

Step 12 — Commit and deploy

Enable Pages on the repo: Settings → Pages → Source: GitHub Actions.

git add .
git commit -m "feat: initial site + PDF docs pipeline"
git push

Watch the Actions tab. The job takes ~3 minutes (pandoc + texlive install is the slow part). When it finishes green, open the URL shown in the deploy step.

You should see:

  • A Docusaurus HTML site at https://<username>.github.io/tds-csv-YOURNAME/
  • A Download PDF link inside it that serves your rendered PDF
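Before pushing, you can sanity-check a local production build (Steps 9 and 10) with a few lines of Python. This is a hypothetical helper, not part of Docusaurus: it assumes the build output is in site/build and that the PDF was copied to site/static/downloads/, which Docusaurus copies unchanged into the build root.

```python
from pathlib import Path


def check_site_build(build_dir: Path, pdf_relpath: str = "downloads/tds-csv.pdf") -> list:
    """Return a list of problems found in a built site; empty means OK."""
    problems = []
    index = build_dir / "index.html"
    if not index.is_file():
        problems.append(f"missing {index}")
    pdf = build_dir / pdf_relpath
    if not pdf.is_file():
        problems.append(f"missing {pdf}")
    else:
        # Every PDF file starts with the marker "%PDF-".
        with pdf.open("rb") as handle:
            if handle.read(5) != b"%PDF-":
                problems.append(f"{pdf} does not look like a PDF")
    return problems


if __name__ == "__main__":
    issues = check_site_build(Path("site/build"))
    print("\n".join(issues) if issues else "site/build looks complete")
```

Run it from the repo root after npm run build; an empty problem list means the page and the PDF will both be in the artifact that gets uploaded.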
Step 13 — Speed up the Action by caching texlive (optional but nice)

Installing TeX Live takes ~90 seconds. You can cache it:

- name: Cache pandoc + texlive
  uses: actions/cache@v4
  id: cache-tex
  with:
    path: /usr/share/texlive
    key: texlive-${{ runner.os }}-v1

- name: Install pandoc + texlive
  if: steps.cache-tex.outputs.cache-hit != 'true'
  run: |
    sudo apt-get update
    sudo apt-get install -y pandoc texlive-xetex ...

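Treat this as a sketch rather than a drop-in step. apt installs pandoc and the TeX binaries outside /usr/share/texlive (for example under /usr/bin), so a cache hit that skips the install step can leave them missing, and restoring into a root-owned path can fail with permission errors. If the build breaks on a cache hit, remove the if: condition or drop the cache. Even when it works, it only helps on re-runs; the first build is unchanged.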

Step 14 — Publish the CLI to PyPI too

Same process as Lab 1.1: add a .github/workflows/release.yml that triggers on v* tags, publishes via Trusted Publishing. Once done, anyone in the world can uvx tds-csv-YOURNAME my.csv.


Troubleshooting

pandoc "File not found" for template.tex

Your working directory in the Action matters. The Build PDF step runs from the repo root, so docs/template.tex is correct. If you moved things, update the path.

LaTeX error about missing packages

The texlive-fonts-recommended texlive-latex-extra packages cover most needs. If your template uses something exotic, add more packages:

sudo apt-get install -y texlive-science texlive-pictures

Docusaurus build fails with "broken link"

Docusaurus is strict about broken links. Either fix the link or set onBrokenLinks: 'warn' in docusaurus.config.ts.

Site renders at wrong URL (404s on CSS)

Your baseUrl is wrong. For a project site at user.github.io/repo/, baseUrl must be '/repo/' — with the trailing slash.
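The rule is mechanical enough to write down. The sketch below derives the url and baseUrl pair for a site on the default github.io domain, mirroring what the workflow passes in as SITE_URL and BASE_URL. The function name pages_urls is hypothetical, and custom domains are out of scope.

```python
def pages_urls(owner: str, repo: str) -> tuple:
    """Return (url, baseUrl) for a GitHub Pages site on the default github.io domain."""
    url = f"https://{owner}.github.io"
    if repo.lower() == f"{owner}.github.io".lower():
        # User or organization site: served from the domain root.
        return url, "/"
    # Project site: served from a sub-path named after the repository,
    # with both the leading and the trailing slash.
    return url, f"/{repo}/"


print(pages_urls("YOUR-USERNAME", "tds-csv-YOURNAME"))
```

If the CSS 404s, compare the baseUrl in the built HTML against the second value returned here for your owner and repository name.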


Knowledge Check

Q1. What is the purpose of the [project.scripts] section in pyproject.toml?

  • A) It tells UV which scripts to run during the build process
  • B) It registers CLI entry points, allowing users to run your app directly from the terminal
  • C) It defines the test scripts to be executed by GitHub Actions
  • D) It lists the Python scripts that should be ignored by the formatter

Answer

B — When the package is installed, [project.scripts] creates an executable command (e.g., tds-csv = "cli:app") that calls the named Python object each time the command is run.

Q2. When running a tool with uvx tds-csv sample.csv, what happens if the tool is not installed globally?

  • A) UV returns an error and asks you to run uv tool install first
  • B) UV downloads the package, installs it in a temporary, ephemeral environment, runs it, and cleans up
  • C) UV automatically installs it permanently into your global Python environment
  • D) UV uses a web browser to run the command online

Answer

B — uvx (or uv tool run) executes CLI tools without touching your global environment. It fetches the tool, runs it in an isolated, cached environment, and exits.

Q3. Why do we use actions/upload-pages-artifact and actions/deploy-pages in the GitHub Actions workflow?

  • A) To upload the PDF document to PyPI
  • B) To store backup copies of the documentation in a secret bucket
  • C) To natively publish the built Docusaurus HTML site to GitHub Pages
  • D) To send the generated site to an external hosting provider like AWS or Vercel

Answer

C — These are the official GitHub Actions for taking a folder of static files (like the build/ folder from Docusaurus) and securely deploying it to GitHub Pages.


What You’ve Learned

  • Turning UV-managed code into an installable CLI via [project.scripts].
  • Using uvx to run tools in ephemeral environments.
  • Authoring documentation in Markdown and rendering it to a professional PDF with pandoc + custom LaTeX template.
  • Hosting a Docusaurus site on GitHub Pages with Actions.
  • Combining two build outputs (site + PDF) in a single deploy pipeline.

Write a Blog Post

  • Compare pandoc with just writing .tex by hand — pros and cons.
  • Explain the Docusaurus baseUrl gotcha.
  • Show off your deployed URL!

Next Lab

Lab 1.3 — Bash automation: daily project summary
