Lab 1.2 — UV CLI Tool + LaTeX Docs PDF on GitHub Pages
?> What you’ll build
?> A command-line tool published via UV that anyone can run with uvx your-tool, plus a professional PDF documentation file generated with LaTeX + pandoc, deployed to GitHub Pages along with a Docusaurus-style HTML site.
Time: 60–90 minutes. Difficulty: ⭐⭐⭐☆☆. Ship: a live GitHub Pages URL with your docs + downloadable PDF.
What the Finished Thing Looks Like
By the end:
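# --from is needed because the package (tds-csv-YOURNAME) and the command (tds-csv) have different names
uvx --from tds-csv-YOURNAME tds-csv sample.csv --top 5
# ┌──────┬────────┐
# │ City │ Count  │
# ├──────┼────────┤
# │ ...  │ ...    │
# └──────┴────────┘

And https://<username>.github.io/tds-csv-YOURNAME/ shows your documentation site with a Download PDF button.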
Prerequisites
- Completed Lab 1.1 (at least through Step 6) — you understand UV + pyproject.toml.
- LaTeX + pandoc available locally (see latex.mdx). This is optional: the GitHub Actions workflow in Step 11 installs both on the runner.
- GitHub Pages enabled on your account.
The Steps
Step 1 — Plan the CLI
Our CLI will be tds-csv-YOURNAME. Features:
- Takes a CSV file path.
- Optionally filters to the top N rows by a given column.
- Pretty-prints as a table using rich.

Usage: tds-csv [OPTIONS] FILE

  Quickly explore a CSV file.

Options:
  --top INTEGER  Show top N rows  [default: 10]
  --by TEXT      Sort by column (default: first column)
  --help         Show this message and exit.

Step 2 — Scaffold the project
uv init --app --python 3.13 tds-csv-YOURNAME
cd tds-csv-YOURNAME

We use --app (not --lib) because this is a CLI app. UV creates a single-module layout:
tds-csv-YOURNAME/
├── .gitignore
├── .python-version
├── README.md
├── main.py
└── pyproject.toml

Add dependencies:
uv add typer "rich>=13" pandas
uv add --dev pytest

Step 3 — Write the CLI
Rename main.py to cli.py and replace its contents:
"""tds-csv — quickly explore a CSV file."""
from importlib.metadata import version as _v
from pathlib import Path

import pandas as pd
import typer
from rich.console import Console
from rich.table import Table

__version__ = _v("tds-csv-YOURNAME")

app = typer.Typer(
    name="tds-csv",
    help="Quickly explore a CSV file.",
    add_completion=False,
)
console = Console()


def _render(df: pd.DataFrame, title: str) -> None:
    """Print the DataFrame as a rich table, one column per CSV column."""
    table = Table(title=title, show_lines=True)
    for col in df.columns:
        table.add_column(str(col), style="cyan")
    for _, row in df.iterrows():
        table.add_row(*[str(v) for v in row])
    console.print(table)


def _show_version(value: bool) -> None:
    """Eager callback: runs before FILE is validated, so --version works on its own."""
    if value:
        console.print(f"tds-csv v{__version__}")
        raise typer.Exit()


@app.command()
def main(
    file: Path = typer.Argument(..., exists=True, readable=True, help="CSV file to read."),
    top: int = typer.Option(10, help="Show top N rows."),
    by: str | None = typer.Option(None, help="Sort by column (default: first column)."),
    version: bool = typer.Option(
        False, "--version", callback=_show_version, is_eager=True,
        help="Show version and exit.",
    ),
) -> None:
    """Render a CSV file as a pretty table."""
    df = pd.read_csv(file)
    sort_col = by or df.columns[0]
    if sort_col not in df.columns:
        console.print(f"[red]Column '{sort_col}' not in CSV[/red]")
        raise typer.Exit(code=1)
    # Largest values first, then keep only the first `top` rows.
    df = df.sort_values(by=sort_col, ascending=False).head(top)
    _render(df, f"{file.name} — top {top} by {sort_col}")


if __name__ == "__main__":
    app()

Step 4 — Wire it up as a CLI entry point
Edit pyproject.toml to expose tds-csv as an entry point:
[project.scripts]
tds-csv = "cli:app"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

The full file should look like:
[project]
name = "tds-csv-YOURNAME"
version = "0.1.0"
description = "Quickly explore a CSV file from the command line."
readme = "README.md"
license = "MIT"
requires-python = ">=3.11"
authors = [{ name = "Your Name", email = "[email protected]" }]
dependencies = [
    "typer",
    "rich>=13",
    "pandas",
]

[project.scripts]
tds-csv = "cli:app"

[project.urls]
Homepage = "https://github.com/YOUR-USERNAME/tds-csv-YOURNAME"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[dependency-groups]
dev = ["pytest>=8"]

[tool.hatch.build.targets.wheel]
packages = ["."]
include = ["cli.py"]

Test it locally:
uv sync
uv run tds-csv --help

You should see the Typer help output.
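To see what a module:attribute value such as cli:app means, the sketch below resolves an entry-point string of the same form using only the standard library. The target json:dumps and the name demo-tool are stand-ins chosen for illustration (cli.py is not importable outside your project); the installed tds-csv launcher performs an equivalent lookup for cli:app and then calls the result.

```python
from importlib.metadata import EntryPoint

# Stand-in entry point. In the lab's pyproject.toml the value is "cli:app";
# "json:dumps" is an assumption used here so the sketch runs anywhere.
entry = EntryPoint(name="demo-tool", value="json:dumps", group="console_scripts")


def resolve(ep: EntryPoint):
    """Import the module before the colon and return the attribute after it."""
    return ep.load()


target = resolve(entry)
print(entry.module, entry.attr)   # the two halves parsed from the value
print(target({"ok": True}))       # call the resolved object, as a launcher would
```

The part before the colon must be importable from the installed wheel, which is why the hatch build settings have to include cli.py.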
Step 5 — Create a sample CSV and try it
cat > sample.csv <<'EOF'
city,population,state
Chennai,7088000,Tamil Nadu
Mumbai,20411000,Maharashtra
Bangalore,8443000,Karnataka
Hyderabad,6809000,Telangana
Pune,3124000,Maharashtra
Kolkata,14850000,West Bengal
Delhi,28514000,Delhi
EOF
uv run tds-csv sample.csv --top 5 --by population

You should see a nicely formatted table sorted by population.
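If you want to check which rows to expect, here is a minimal standard-library sketch of the same sort-then-take-top-N step. It is not the tool's implementation (that uses pandas); the helper name top_rows is hypothetical, and it assumes the sort column holds integers, as population does in the sample.

```python
import csv
import io

SAMPLE = """\
city,population,state
Chennai,7088000,Tamil Nadu
Mumbai,20411000,Maharashtra
Bangalore,8443000,Karnataka
Hyderabad,6809000,Telangana
Pune,3124000,Maharashtra
Kolkata,14850000,West Bengal
Delhi,28514000,Delhi
"""


def top_rows(csv_text: str, by: str, top: int) -> list[dict[str, str]]:
    """Return the `top` rows with the largest integer value in column `by`."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # pandas infers the integer dtype on its own; here we convert by hand.
    rows.sort(key=lambda row: int(row[by]), reverse=True)
    return rows[:top]


for row in top_rows(SAMPLE, by="population", top=5):
    print(f"{row['city']:<10} {int(row['population']):>12,}")
```

With the sample data, the five cities come out as Delhi, Mumbai, Kolkata, Bangalore, Chennai, in that order.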
Step 6 — Try it as a one-shot tool (uvx)
Build the wheel and run it from an ephemeral env:
uv build
# Run without installing globally
uv tool run --from "./dist/tds_csv_YOURNAME-0.1.0-py3-none-any.whl" tds-csv sample.csv --top 3
# or the short form:
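uvx --from ./dist/*.whl tds-csv sample.csv --top 3

Later (after publishing to PyPI), anyone can run uvx --from tds-csv-YOURNAME tds-csv sample.csv. The --from flag is needed because the package name (tds-csv-YOURNAME) differs from the command name (tds-csv).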
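The wheel filename in the uv tool run command packs five fields separated by dashes. The sketch below splits it apart; the helper parse_wheel_name is hypothetical and assumes the filename has no optional build tag.

```python
from typing import NamedTuple


class WheelName(NamedTuple):
    distribution: str   # project name, with dashes turned into underscores
    version: str
    python_tag: str     # "py3": any Python 3 interpreter
    abi_tag: str        # "none": no compiled ABI requirement
    platform_tag: str   # "any": not tied to an OS or CPU


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel filename without a build tag into its five parts."""
    stem = filename.removesuffix(".whl")
    parts = stem.split("-")
    if len(parts) != 5:
        raise ValueError(f"expected 5 dash-separated parts, got {len(parts)}: {filename}")
    return WheelName(*parts)


info = parse_wheel_name("tds_csv_YOURNAME-0.1.0-py3-none-any.whl")
print(info)
```

The py3-none-any suffix is what you get for a pure-Python project like this one, so the same wheel installs on any platform.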
Step 7 — Write Markdown documentation
Create a docs/ folder and put your documentation in Markdown:
mkdir docs

Save the following as docs/index.md:

---
title: tds-csv — User Guide
author: Your Name
date: 2026-05-10
---
# tds-csv
**tds-csv** is a tiny CLI for quickly exploring CSV files. Built for the
*Tools in Data Science* course at IIT Madras, May 2026.
## Installation

```bash
uvx --from tds-csv-YOURNAME tds-csv --help
```

Or install globally:

```bash
uv tool install tds-csv-YOURNAME
tds-csv --help
```

## Usage

### Show the top 10 rows

```bash
tds-csv sample.csv
```

### Sort by a specific column

```bash
tds-csv sample.csv --by population --top 5
```

## How It Works

The tool:

- Reads the CSV with pandas.read_csv.
- Sorts by the chosen column (defaulting to the first column).
- Takes the top N rows.
- Renders them with rich as a Unicode table.

## Architecture

The formula for text-to-digital transformation in our case is:

```text
output = Render(SortBy_col(Read(csv))[:N])
```

## License
MIT — see the LICENSE file.

Step 8 — Build the PDF with pandoc + LaTeX
Create a pandoc template for nicer PDF output:
```latex title="docs/template.tex"
\documentclass[11pt,a4paper]{article}
\usepackage[margin=1in]{geometry}
\usepackage{hyperref}
\usepackage{graphicx}
\usepackage{fancyhdr}
\usepackage{amsmath}
\usepackage{xcolor}
% Pandoc emits \tightlist for compact lists; define it so lists compile.
\providecommand{\tightlist}{%
  \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
% Pandoc fills this in when the document contains highlighted code blocks.
$if(highlighting-macros)$
$highlighting-macros$
$endif$
\definecolor{tdsblue}{RGB}{79,70,229}
\hypersetup{
  colorlinks=true,
  linkcolor=tdsblue,
  urlcolor=tdsblue
}
\pagestyle{fancy}
\fancyhf{}
\lhead{$title$}
\rhead{$date$}
\cfoot{\thepage}
\title{\textcolor{tdsblue}{$title$}}
\author{$author$}
\date{$date$}
\begin{document}
\maketitle
\tableofcontents
\newpage
$body$
\end{document}
```
Test locally:
pandoc docs/index.md -o docs/tds-csv.pdf \
  --template=docs/template.tex \
  --pdf-engine=xelatex \
  --toc \
  --number-sections \
  --highlight-style=tango

Open docs/tds-csv.pdf. You should have a cleanly typeset document with a title block, a table of contents, and highlighted code.
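If you would rather drive the build from Python (for example from a test or a helper script), the pandoc invocation above can be sketched as follows. The function names pandoc_command and build_pdf are hypothetical; the flags are the same ones used in the shell command, and build_pdf simply reports False when pandoc is not on PATH.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(source: Path, output: Path, template: Path) -> list[str]:
    """Build the same pandoc argument list used in the shell command."""
    return [
        "pandoc", source.as_posix(),
        "-o", output.as_posix(),
        f"--template={template.as_posix()}",
        "--pdf-engine=xelatex",
        "--toc",
        "--number-sections",
        "--highlight-style=tango",
    ]


def build_pdf(source: Path, output: Path, template: Path) -> bool:
    """Run pandoc if it is installed; return False when it is not on PATH."""
    if shutil.which("pandoc") is None:
        return False
    subprocess.run(pandoc_command(source, output, template), check=True)
    return True


cmd = pandoc_command(
    Path("docs/index.md"), Path("docs/tds-csv.pdf"), Path("docs/template.tex")
)
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems, and check=True turns a LaTeX failure into a Python exception instead of a silent bad PDF.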
?> If pandoc isn’t installed locally
?> Skip this local build and let GitHub Actions do it (Step 11). The workflow installs pandoc + texlive on its Ubuntu runner.
Step 9 — Build a Docusaurus site for the HTML docs
You have two choices:
- Option A (quick) — use plain HTML or mdBook. Small output, minutes to set up.
- Option B (professional) — use Docusaurus like the TDS course itself. Takes 10 minutes but matches the course pattern.
We’ll go with Option B — Docusaurus. Initialize:
# in the repo root
npx create-docusaurus@latest site classic --typescript

This creates a site/ folder. Move your documentation in and delete the default content:
rm -rf site/docs/* site/blog
cp docs/index.md site/docs/intro.md

Edit site/docusaurus.config.ts and set url, baseUrl, organizationName, and projectName:
const SITE_URL = process.env.SITE_URL ?? 'https://YOUR-USERNAME.github.io';
const BASE_URL = process.env.BASE_URL ?? '/tds-csv-YOURNAME/';
const config = {
  title: 'tds-csv',
  tagline: 'Quickly explore any CSV',
  url: SITE_URL,
  baseUrl: BASE_URL,
  organizationName: 'YOUR-USERNAME',
  projectName: 'tds-csv-YOURNAME',
  // ... (rest of defaults)
};

Test the dev server:
cd site
npm install
npm run start   # opens http://localhost:3000

Stop the dev server (Ctrl+C) and do a production build:
npm run build   # outputs to site/build

Step 10 — Link the PDF from the site
Docusaurus serves anything under site/static/ as a top-level file. Copy the PDF there:
mkdir -p site/static/downloads
cp docs/tds-csv.pdf site/static/downloads/

Reference it in site/docs/intro.md:
[📄 Download the full PDF manual](/downloads/tds-csv.pdf)

Step 11 — Write the GitHub Actions deploy workflow
This workflow rebuilds the PDF on every push and deploys the site to GitHub Pages.
name: Deploy Docs

on:
  push:
    branches: [main]
  workflow_dispatch:

permissions:
  contents: read
  pages: write
  id-token: write

concurrency:
  group: pages
  cancel-in-progress: true

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # --- PDF step ---
      - name: Install pandoc + texlive
        run: |
          sudo apt-get update
          sudo apt-get install -y pandoc texlive-xetex texlive-fonts-recommended texlive-latex-extra

      - name: Build PDF
        run: |
          mkdir -p site/static/downloads
          pandoc docs/index.md -o site/static/downloads/tds-csv.pdf \
            --template=docs/template.tex \
            --pdf-engine=xelatex \
            --toc \
            --number-sections \
            --highlight-style=tango

      # --- Site step ---
      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: '24'
          cache: 'npm'
          cache-dependency-path: site/package-lock.json

      - name: Install site deps
        working-directory: site
        run: npm ci

      - name: Build site
        working-directory: site
        env:
          SITE_URL: https://${{ github.repository_owner }}.github.io
          BASE_URL: /${{ github.event.repository.name }}/
        run: npm run build

      - name: Upload Pages artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: site/build

  deploy:
    needs: build
    runs-on: ubuntu-latest
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4

Step 12 — Commit and deploy
Enable Pages on the repo: Settings → Pages → Source: GitHub Actions.
git add .
git commit -m "feat: initial site + PDF docs pipeline"
git push

Watch the Actions tab. The job takes ~3 minutes (pandoc + texlive install is the slow part). When it finishes green, open the URL shown in the deploy step.
You should see:
- A Docusaurus HTML site at https://<username>.github.io/tds-csv-YOURNAME/
- A Download PDF link inside it that serves your rendered PDF
Step 13 — Speed up the Action by caching texlive (optional but nice)
Installing TeX Live takes ~90 seconds. You can cache it:
- name: Cache pandoc + texlive
  uses: actions/cache@v4
  id: cache-tex
  with:
    path: /usr/share/texlive
    key: texlive-${{ runner.os }}-v1
- name: Install pandoc + texlive
  if: steps.cache-tex.outputs.cache-hit != 'true'
  run: |
    sudo apt-get update
    sudo apt-get install -y pandoc texlive-xetex ...

This only helps on re-runs; the first build is unchanged. Because apt places the pandoc and xelatex executables outside /usr/share/texlive, confirm that a cache-hit run still finds both before relying on this.
Step 14 — Publish the CLI to PyPI too
Same process as Lab 1.1: add a .github/workflows/release.yml that triggers on v* tags and publishes via Trusted Publishing. Once done, anyone in the world can run uvx --from tds-csv-YOURNAME tds-csv my.csv.
Troubleshooting
pandoc "File not found" for template.tex
Your working directory in the Action matters. The Build PDF step runs from the repo root, so docs/template.tex is correct. If you moved things, update the path.
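One way to catch a wrong working directory before pandoc runs is a small preflight check. This is a sketch, not part of the lab's workflow: the helper missing_inputs is hypothetical, and the two paths are the ones the Build PDF step reads. The demo builds a throwaway directory so it runs anywhere.

```python
import tempfile
from pathlib import Path

# Paths the Build PDF step reads, relative to the repo root.
REQUIRED = ("docs/index.md", "docs/template.tex")


def missing_inputs(repo_root: Path, required: tuple[str, ...] = REQUIRED) -> list[str]:
    """Return the required paths that do not exist under repo_root."""
    return [rel for rel in required if not (repo_root / rel).is_file()]


# Demo: a fake repo that has index.md but no template.tex.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs").mkdir()
    (root / "docs" / "index.md").write_text("# tds-csv\n", encoding="utf-8")
    demo_missing = missing_inputs(root)

print(demo_missing)
```

In a real run you would call missing_inputs(Path.cwd()) and stop with a clear message if the list is not empty.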
LaTeX error about missing packages
The texlive-fonts-recommended and texlive-latex-extra packages cover most needs. If your template uses something exotic, add more packages:
sudo apt-get install -y texlive-science texlive-pictures

Docusaurus build fails with "broken link"
Docusaurus is strict about broken links. Either fix the link or set onBrokenLinks: 'warn' in docusaurus.config.ts.
Site renders at wrong URL (404s on CSS)
Your baseUrl is wrong. For a project site at user.github.io/repo/, baseUrl must be '/repo/' — with the trailing slash.
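The relationship between the repo name, baseUrl, and the final URL of a file under site/static/ can be sketched as below. The helper names are hypothetical; the logic mirrors the SITE_URL and BASE_URL values that the deploy workflow derives from the repository owner and name.

```python
def pages_base_url(repo_name: str) -> str:
    """Return the baseUrl for a project site: the repo name wrapped in slashes."""
    return f"/{repo_name.strip('/')}/"


def is_valid_base_url(base_url: str) -> bool:
    """A project-site baseUrl must start and end with a slash."""
    return base_url.startswith("/") and base_url.endswith("/")


def public_url(owner: str, repo_name: str, static_path: str = "") -> str:
    """Join the Pages host, the baseUrl, and a path under site/static/."""
    site_url = f"https://{owner}.github.io"
    return site_url + pages_base_url(repo_name) + static_path.lstrip("/")


print(pages_base_url("tds-csv-YOURNAME"))
print(is_valid_base_url("/tds-csv-YOURNAME"))   # missing trailing slash
print(public_url("YOUR-USERNAME", "tds-csv-YOURNAME", "downloads/tds-csv.pdf"))
```

If the CSS 404s, compare the URL the browser requested against what public_url would produce for the same file; a missing path segment usually points at baseUrl.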
Knowledge Check
Q1. What is the purpose of the [project.scripts] section in pyproject.toml?
- A) It tells UV which scripts to run during the build process
- B) It registers CLI entry points, allowing users to run your app directly from the terminal
- C) It defines the test scripts to be executed by GitHub Actions
- D) It lists the Python scripts that should be ignored by the formatter
Answer
B — [project.scripts] creates an executable command (e.g., tds-csv = "cli:app") that invokes a specific Python function when the package is installed.
Q2. When running a tool with uvx tds-csv sample.csv, what happens if the tool is not installed globally?
- A) UV returns an error and asks you to run uv tool install first
- B) UV downloads the package, installs it in a temporary, ephemeral environment, runs it, and cleans up
- C) UV automatically installs it permanently into your global Python environment
- D) UV uses a web browser to run the command online
Answer
B — uvx (or uv tool run) allows you to execute CLI tools seamlessly without cluttering your global environment. It fetches the tool, runs it in an isolated cache, and finishes.
Q3. Why do we use actions/upload-pages-artifact and actions/deploy-pages in the GitHub Actions workflow?
- A) To upload the PDF document to PyPI
- B) To store backup copies of the documentation in a secret bucket
- C) To natively publish the built Docusaurus HTML site to GitHub Pages
- D) To send the generated site to an external hosting provider like AWS or Vercel
Answer
C — These are the official GitHub Actions for taking a folder of static files (like the build/ folder from Docusaurus) and securely deploying it to GitHub Pages.
What You’ve Learned
- Turning UV-managed code into an installable CLI via [project.scripts].
- Using uvx to run tools in ephemeral environments.
- Authoring documentation in Markdown and rendering it to a professional PDF with pandoc + a custom LaTeX template.
- Hosting a Docusaurus site on GitHub Pages with Actions.
- Combining two build outputs (site + PDF) in a single deploy pipeline.
Write a Blog Post
- Compare pandoc with just writing .tex by hand: pros and cons.
- Explain the Docusaurus baseUrl gotcha.
- Show off your deployed URL!
Next Lab
Lab 1.3 — Bash automation: daily project summary