AI Coding Context Engineering#

Context engineering is the systematic approach to providing AI coding assistants with the right information at the right level of detail. Unlike general prompt engineering, code context engineering focuses specifically on code-related workflows, project structure, and development processes.

Effective context engineering transforms AI from a simple code generator into an intelligent development partner that understands your project’s architecture, conventions, and goals.

In this module, you’ll learn:

- File context: Techniques for providing relevant code snippets and files to AI tools
- Spec-first development: Using a specification document as the single source of truth for AI-driven development
- Project context: Using CLAUDE.md or AGENTS.md to give AI persistent knowledge about your codebase
- Library context: Supplying documentation for third-party libraries to improve AI understanding
- Project documentation: Generating and maintaining human and AI-readable docs for better collaboration
- llms.txt: Implementing the llms.txt standard to help LLMs understand your website or API
- Instruction templates: Using lightweight schemas to eliminate ambiguity in prompts
- Best practices and tips: Maintaining context hygiene, cost optimization, and team collaboration

File context#

The easiest way to provide context to AI is by copy-pasting or uploading the code directly. For single files, this is straightforward. For multi-file projects, you can use:

uvx files-to-prompt --cxml file1.py file2.js repo/ ...

This concatenates the files into an XML structure (the format Anthropic recommends for Claude, and one that most AI tools accept) that you can paste or pass to your AI tool.
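
To make the output format concrete, here is a minimal Python sketch that builds a similar XML wrapper by hand. It only approximates the Claude-style document layout that `--cxml` produces; the function name `to_cxml` is hypothetical, and the real tool handles more cases (directories, ignore rules, and so on).

```python
from pathlib import Path


def to_cxml(paths: list[str]) -> str:
    """Wrap each file in <document> tags, roughly like files-to-prompt --cxml."""
    parts = ["<documents>"]
    for index, path in enumerate(paths, start=1):
        text = Path(path).read_text(encoding="utf-8")
        parts += [
            f'<document index="{index}">',
            f"<source>{path}</source>",
            "<document_content>",
            text,
            "</document_content>",
            "</document>",
        ]
    parts.append("</documents>")
    return "\n".join(parts)
```

Calling `to_cxml(["file1.py", "file2.js"])` returns a single string in which each file carries its path and an index, so the AI tool can tell the files apart.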

Concatenating everything works well for small files. For a large codebase, pass only the relevant files and chunks.
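
One way to pass chunks selectively is to split a large file into line-ranged pieces and send only the relevant ones. The sketch below assumes a fixed chunk size in lines; `chunk_lines` and the default of 200 lines are illustrative choices, not part of any tool mentioned here.

```python
def chunk_lines(text: str, max_lines: int = 200) -> list[tuple[int, int, str]]:
    """Split text into (start_line, end_line, chunk) tuples, 1-indexed inclusive."""
    lines = text.splitlines()
    chunks = []
    for start in range(0, len(lines), max_lines):
        block = lines[start:start + max_lines]
        chunks.append((start + 1, start + len(block), "\n".join(block)))
    return chunks
```

Keeping the line range with each chunk lets you tell the AI tool exactly which part of the file it is looking at.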

For git repositories, you can use gitingest. For example, https://gitingest.com/sanand0/tools-in-data-science-public extracts the full contents of this course repo.

You can also run it from the command line:

uvx gitingest https://github.com/sanand0/tools-in-data-science-public

Spec-first development#

Spec-first development places a detailed specification document at the center of your AI coding workflow. Instead of giving ad-hoc instructions to AI tools, you maintain a living spec that serves as the authoritative source for project requirements, constraints, and acceptance criteria.

There is no formal standard for AI coding specs. Here are some sections you might want to include in your README.md (or a separate spec.md):

- Meta: Status, Version, Owners, Repo/Issue links, Last reviewed.
- Problem & Scope: One-paragraph problem; Goals vs Out-of-scope.
- Users & Flows: Primary personas; 1–2 happy-path flows; edge cases as bullets.
- Requirements: Functional & non-functional (latency, cost, availability, privacy). Use MUST/SHOULD/MAY precisely.
- Interfaces & Data Contracts: REST/GraphQL/event contracts (OpenAPI/GraphQL SDL/AsyncAPI); shared JSON Schemas for requests, responses, events, and errors. Prefer contract-first.
- AI-Specific Contracts: Prompt & Tool Schemas (inputs/outputs, invariants), strict parsing requirements, safety guardrails, cost budgets, fallback & timeout policy. If using tool/function calling, document the JSON schema and its strictness.
- Acceptance Examples: Executable examples that double as tests. Keep 1 scenario per behavior.
- Evaluation Plan (for LLM features): Golden/regression tests, challenge sets, rubrics, and pass/fail gates. Automate.
- Risk, Safety & Privacy: Threat model & mitigations (e.g. aligned to OWASP LLM Top 10); PII handling/redaction rules.
- Change Log & Versioning: Human-readable changes; link to related ADRs; follow a clear changelog style.
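
As an illustration of the strict parsing requirement under AI-Specific Contracts, the sketch below validates a model's JSON reply against a small hand-written contract using only the standard library. The field names (`summary`, `risk`) and the `parse_reply` helper are hypothetical; a real project might use a JSON Schema validator instead.

```python
import json

# Hypothetical contract: required keys and their expected Python types.
CONTRACT = {"summary": str, "risk": int}


def parse_reply(raw: str) -> dict:
    """Parse a model reply, rejecting missing, extra, or wrongly typed fields."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    if set(data) != set(CONTRACT):
        raise ValueError(f"expected keys {sorted(CONTRACT)}, got {sorted(data)}")
    for key, expected in CONTRACT.items():
        # bool is a subclass of int, so reject it explicitly
        if isinstance(data[key], bool) or not isinstance(data[key], expected):
            raise ValueError(f"{key} must be {expected.__name__}")
    return data
```

Failing loudly on any deviation keeps malformed replies from silently flowing into the rest of the system, and gives the fallback policy a clear trigger.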

Some things to remember:

- Treat docs as code. Version the spec, run doc linters in CI, and require spec updates in feature PRs.
- Keep it updated. Update the spec whenever a new decision or change is made. Record decisions, not debates.
- Change transparently. Maintain a CHANGELOG.md in plain language and link each entry to spec sections / architecture decision records.
- Avoid bloated specs. Focus on the most important decisions rather than the reasoning behind them.
- Avoid vagueness. Vague requirements can confuse AI. Ideally, replace them with executable examples.
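
For instance, a vague requirement such as "slugs should be URL-friendly" can be pinned down with a few input/output pairs that also run as tests. This is a sketch only: `slugify` and its rules are a hypothetical feature, not something defined in this course.

```python
import re


def slugify(title: str) -> str:
    """Lowercase the title and join alphanumeric runs with single hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


# Acceptance examples: one scenario per behavior, readable by humans and AI alike.
ACCEPTANCE = [
    ("Hello World", "hello-world"),  # spaces become hyphens
    ("  Trim   me  ", "trim-me"),    # surrounding and repeated spaces collapse
    ("C++ & Rust!", "c-rust"),       # punctuation is dropped
]

for given, expected in ACCEPTANCE:
    assert slugify(given) == expected, (given, expected)
```

Each pair answers a question the prose left open, and the loop turns the spec into a regression test.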

Project context#

Project context provides AI tools with persistent knowledge about your codebase.

- Codex uses AGENTS.md
- Gemini CLI uses GEMINI.md (see “Memory Management” in their docs)
- Claude Code uses CLAUDE.md

Claude Code and Gemini CLI provide a /init command to bootstrap these files based on your project structure and README.md. You can edit the generated file.

Each project has its own style and preferences. Most often, these files include instructions on how to build and test the project, along with its coding conventions.
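
If your team uses more than one of these tools, a quick check of which context files a repository already has can be useful. The sketch below uses the tool-to-file mapping listed earlier in this section; `find_context_files` is a hypothetical helper, and it only looks at the project root.

```python
from pathlib import Path

# Tool -> context file, as listed earlier in this section.
CONTEXT_FILES = {
    "Codex": "AGENTS.md",
    "Gemini CLI": "GEMINI.md",
    "Claude Code": "CLAUDE.md",
}


def find_context_files(root: str = ".") -> dict[str, bool]:
    """Return, per tool, whether its context file exists at the project root."""
    base = Path(root)
    return {tool: (base / name).is_file() for tool, name in CONTEXT_FILES.items()}
```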

Versioning prompts#

Some developers also version their prompts by adding them to:

- Commits. When coding with a CLI, you can manually add the prompt(s) used as part of the commit.
- Pull requests. Online coding agents include a link to the task (which includes the prompt) as part of the PR.
- PROMPTS.md. Developers are increasingly storing prompts in a PROMPTS.md file.

Commit messages#

Commit messages provide useful context about code changes, helping AI tools scan through history and understand the intent behind modifications.

Most AI coding agents generate commit messages automatically. Here’s a quick way to do it manually:

git diff --cached | llm "Generate a concise, descriptive commit message for these changes"
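
The same pipeline can be scripted. The sketch below assumes `git` and the `llm` CLI are installed and on your PATH; `trim_diff`, `suggest_commit_message`, and the 20,000-character cap are illustrative choices rather than anything those tools prescribe.

```python
import subprocess

INSTRUCTION = "Generate a concise, descriptive commit message for these changes"


def trim_diff(diff: str, max_chars: int = 20_000) -> str:
    """Truncate very large diffs so the prompt stays within a rough size cap."""
    if len(diff) > max_chars:
        diff = diff[:max_chars] + "\n[diff truncated]"
    return diff


def suggest_commit_message() -> str:
    """Pipe the staged diff to `llm`, mirroring the shell one-liner above."""
    diff = subprocess.run(
        ["git", "diff", "--cached"], capture_output=True, text=True, check=True
    ).stdout
    result = subprocess.run(
        ["llm", INSTRUCTION], input=trim_diff(diff),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Capping the diff is a cost trade-off: very large staged changes are summarized from their first part only, so review the suggested message before committing.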

Library context#

When using libraries (e.g. from PyPI or npm), providing context about the library can help AI understand it better.

You can add instructions like these:

- Use `curl -s https://pypi.org/pypi/package-name/json | jq -r .info.description` to get PyPI package docs
- Use `npm view package-name readme` to get npm package docs
- Use `curl -s https://context7.com/owner/repo/llms.txt` for https://github.com/owner/repo docs

PyPI and npm provide the first two natively. For GitHub repos, you can create an account on context7.com and add any public GitHub repos whose docs you want.
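
For scripts that cannot shell out to `curl` and `jq`, the PyPI lookup can be done in Python. This is a minimal sketch using the same `https://pypi.org/pypi/<package>/json` endpoint as the instruction above; the function names are hypothetical, and error handling (unknown packages, network failures) is left out.

```python
import json
from urllib.request import urlopen


def description_from_json(payload: str) -> str:
    """Extract info.description from a PyPI JSON API response body."""
    return json.loads(payload)["info"]["description"]


def pypi_description(package: str) -> str:
    """Fetch a package's long description (needs network access)."""
    with urlopen(f"https://pypi.org/pypi/{package}/json", timeout=10) as response:
        return description_from_json(response.read().decode("utf-8"))
```

Splitting parsing from fetching keeps the parsing step testable offline.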

Project documentation#

Human developers need to understand the codebase to contribute to a project. AI tools can help with this.

A simple approach is to use a coding agent to generate documentation from the codebase. Here are examples:

uvx files-to-prompt --cxml . | llm "Generate a comprehensive README.md for this project" > README.md
claude -p "Update README.md with project overview, setup, and usage examples"
codex exec "Update README.md with project overview, setup, and usage examples"

DeepWiki is a tool that generates documentation from GitHub repos. For example:

- https://deepwiki.com/openai/openai-python has the generated documentation for openai/openai-python
- https://deepwiki.com/sanand0/tools-in-data-science-public has the generated documentation for this Tools in Data Science course

You can replace github.com with deepwiki.com in a repo URL to generate docs for your own public repos.
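
The host swap is easy to automate if you want to link the generated docs for several repos. A small sketch, where `to_deepwiki` is a hypothetical helper:

```python
from urllib.parse import urlsplit, urlunsplit


def to_deepwiki(github_url: str) -> str:
    """Swap the github.com host for deepwiki.com, keeping the owner/repo path."""
    parts = urlsplit(github_url)
    if parts.netloc != "github.com":
        raise ValueError("expected a github.com URL")
    return urlunsplit(parts._replace(netloc="deepwiki.com"))


print(to_deepwiki("https://github.com/openai/openai-python"))
# https://deepwiki.com/openai/openai-python
```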

llms.txt#

/llms.txt is a proposed standard for providing information that helps LLMs use a website at inference time.

Several tools can generate llms.txt files.
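
To give a sense of the format, the sketch below renders a small llms.txt body following the layout described in the llms.txt proposal: an H1 title, a blockquote summary, then H2 sections of Markdown links. The site name, URLs, and the `render_llms_txt` helper are hypothetical.

```python
def render_llms_txt(
    title: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render an llms.txt body: H1 title, blockquote summary, H2 link lists."""
    lines = [f"# {title}", "", f"> {summary}"]
    for heading, links in sections.items():
        lines += ["", f"## {heading}", ""]
        lines += [f"- [{name}]({url}): {note}" for name, url, note in links]
    return "\n".join(lines) + "\n"


print(render_llms_txt(
    "Demo",  # hypothetical site
    "A demo site",
    {"Docs": [("Guide", "https://example.com/guide.md", "Getting started")]},
))
```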

Context right-sizing: Include only the essential information that AI needs for current tasks. For large context, you can:

- Break it up. OpenAI’s llms.txt is 2.8 MB, for example, so they’ve split it into llms-models-pricing.txt, llms-guides.txt, and llms-api-reference.txt.
- Use tool-calling or MCP servers to fetch additional context on demand rather than bloating every prompt.
- Compress it. Tools like llm-min.txt use LLMs to compress large llms.txt files. Other tools use prompt ablation to find the most important parts of a prompt.
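
One simple way to break up a large llms.txt is to split it on its H2 headings and serve each section separately. This is a sketch under that assumption, not a description of how OpenAI produced its files; `split_by_h2` is a hypothetical helper, and it does not special-case `## ` lines inside code fences.

```python
def split_by_h2(text: str) -> dict[str, str]:
    """Split a Markdown document into {heading: body} on '## ' lines.

    Text before the first H2 is stored under the key 'preamble'.
    """
    sections: dict[str, list[str]] = {"preamble": []}
    current = "preamble"
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}
```

Each value can then be written to its own file, or returned by a tool call when the AI asks for that section.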

Instruction templates#

Use lightweight schemas to eliminate ambiguity. For example:

## Code Review Schema

| Aspect          | Rating (1-5) | Issues | Suggestions |
| --------------- | ------------ | ------ | ----------- |
| Security        |              |        |             |
| Performance     |              |        |             |
| Maintainability |              |        |             |
| Test Coverage   |              |        |             |

## Bug Report Schema

- **Reproduction Steps**: [numbered list]
- **Expected Behavior**: [one sentence]
- **Actual Behavior**: [one sentence]
- **Environment**: [OS, browser, version]
- **Logs**: [relevant error messages]
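
A schema like this is also easy to check mechanically before a report reaches the AI tool. The sketch below assumes reports follow the Markdown layout of the Bug Report Schema above; `missing_fields` is a hypothetical helper.

```python
REQUIRED_FIELDS = [
    "Reproduction Steps",
    "Expected Behavior",
    "Actual Behavior",
    "Environment",
    "Logs",
]


def missing_fields(report: str) -> list[str]:
    """Return schema fields that are absent or left empty in a Markdown bug report."""
    found = {}
    for line in report.splitlines():
        line = line.strip()
        if line.startswith("- **") and "**:" in line:
            name, _, value = line[4:].partition("**:")
            found[name] = value.strip()
    return [field for field in REQUIRED_FIELDS if not found.get(field)]
```

An empty result means every field is filled in; otherwise the returned names tell the reporter what to add.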

Best practices and tips#

Context hygiene
- Version control all context files: Treat CLAUDE.md, spec.md, and related files as first-class code artifacts that should be versioned and reviewed.
- Keep contexts focused: Resist the urge to include everything. Each context file should serve a specific purpose and audience.
- Regular context maintenance: Review and update context files as your project evolves. Outdated context can be worse than no context.
Cost optimization
- Monitor token usage: Track how much context you’re including in each AI interaction and optimize for the highest-value information.
- Use tiered context: Start with minimal context and let AI tools request additional information as needed.
- Cache common contexts: Store frequently used project context in AI tool memories to avoid repeating large context blocks.
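
Token monitoring and tiered context can be combined in a few lines. The sketch below uses a rough estimate of 4 characters per token, which is an assumption rather than a real tokenizer, and the helper names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def pack_context(items: list[tuple[str, str]], budget: int) -> list[str]:
    """Pick items in priority order, skipping any that would exceed the token budget.

    `items` is a list of (name, text) pairs, already sorted with the
    highest-value context first.
    """
    chosen, used = [], 0
    for name, text in items:
        cost = estimate_tokens(text)
        if used + cost <= budget:
            chosen.append(name)
            used += cost
    return chosen
```

For accurate counts, swap `estimate_tokens` for the tokenizer of the model you actually use.
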
Team collaboration
- Standardize context formats: Ensure all team members use consistent file names and structures for project context.
- Share context libraries: Create reusable context templates for common project types and development patterns.
- Document context decisions: Explain why certain information is included or excluded from project context files.