Project: LLM Code Deployment
In this project, students will build an application that can build, deploy, and update an application.
- Build. The student:
  - receives and verifies a request containing an app brief,
  - uses an LLM-assisted generator to build the app,
  - deploys it to GitHub Pages,
  - then pings an evaluation API with repo details.
- Evaluate. The instructors:
  - run automated static, dynamic (Playwright), and LLM checks,
  - store and publish the results after the deadline,
  - send a second request tailored to the student’s codebase.
- Revise. The student:
  - verifies the secret,
  - updates the app based on the request,
  - re-deploys Pages,
  - then pings a second evaluation API with repo metadata.
Request
The request is a JSON object like this (the // comments are explanatory):
{
  // Student email ID
  "email": "[email protected]",
  // Student-provided secret
  "secret": "...",
  // A unique task ID.
  "task": "captcha-solver-...",
  // There will be multiple rounds per task. This is the round index
  "round": 1,
  // Pass this nonce back to the evaluation URL below
  "nonce": "ab12-...",
  // brief: mentions what the app needs to do
  "brief": "Create a captcha solver that handles ?url=https://.../image.png. Default to attached sample.",
  // checks: mention how it will be evaluated
  "checks": [
    "Repo has MIT license",
    "README.md is professional",
    "Page displays captcha URL passed at ?url=...",
    "Page displays solved captcha text within 15 seconds"
  ],
  // Send repo & commit details to the URL below
  "evaluation_url": "https://example.com/notify",
  // Attachments will be encoded as data URIs
  "attachments": [
    { "name": "sample.png", "url": "data:image/png;base64,iVBORw..." }
  ]
}
Build
Students will:
- Host an API endpoint that accepts a JSON POST sent via:
    curl https://example.com/api-endpoint \
      -H "Content-Type: application/json" \
      -d '{"brief": "...", ...}'
- Check if the `secret` matches what they had shared in the Google Form.
- Send an HTTP 200 JSON response.
- Parse the request and attachments. Use LLMs to generate a minimal app.
- Create a repo & push.
  - Use the GitHub API / CLI app with a personal access token.
  - Use a unique repo name based on `.task`.
  - Make your repo public.
  - Add an MIT `LICENSE` at the repo root.
  - Enable GitHub Pages and make it reachable (200 OK).
  - Avoid secrets in git history (trufflehog, gitleaks).
  - Write a complete `README.md` (summary, setup, usage, code explanation, license).
- Enable GitHub Pages.
- POST to `evaluation_url` (header: `Content-Type: application/json`), within 10 minutes of the request, this JSON structure:
    {
      // Copy these from the request
      "email": "...",
      "task": "captcha-solver-...",
      "round": 1,
      "nonce": "ab12-...",
      // Send these based on your GitHub repo and commit
      "repo_url": "https://github.com/user/repo",
      "commit_sha": "abc123",
      "pages_url": "https://user.github.io/repo/"
    }
- Ensure an HTTP 200 response. On error, re-submit with a 1, 2, 4, 8, … second delay.
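The retry schedule in the last step can be sketched in Python as follows. This is a minimal illustration rather than a required implementation: the function names, the `max_attempts` cap, and the choice to treat network failures as status `0` are assumptions, not part of the project spec.

```python
import json
import time
import urllib.error
import urllib.request


def post_json(url, payload, timeout=10.0):
    """POST `payload` as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # urllib raises HTTPError for 4xx/5xx responses
    except OSError:
        return 0  # assumption: network errors and timeouts count as "not 200"


def notify_with_backoff(url, payload, post=post_json, sleep=time.sleep, max_attempts=8):
    """Retry until HTTP 200, waiting 1, 2, 4, 8, ... seconds between attempts.

    Returns True on success and False if every attempt failed.
    `max_attempts=8` (hypothetical) keeps the total wait near two minutes,
    well inside the 10-minute window.
    """
    delay = 1
    for attempt in range(max_attempts):
        if post(url, payload) == 200:
            return True
        if attempt < max_attempts - 1:
            sleep(delay)
            delay *= 2
    return False
```

A caller would pass the request's `evaluation_url` and a dict holding `email`, `task`, `round`, `nonce`, `repo_url`, `commit_sha`, and `pages_url`. The `post` and `sleep` parameters exist only so the retry logic can be exercised without real network calls or real waiting.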
Revise
Students will:
- Accept a second POST request (`{"round": 2}`) to add/modify features, refactor the code, etc.
- Verify the secret.
- Send an HTTP 200 JSON response.
- Modify the repo based on the `brief` (e.g. “handle SVG images”).
  - Update `README.md` accordingly.
  - Modify code accordingly & push to redeploy GitHub Pages.
- POST to the same `evaluation_url` with `{"round": 2}`, within 10 minutes of the request.
- Ensure an HTTP 200 response.
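Both rounds start the same way: check the secret, then read the attachments. A rough Python sketch of that step is below. The function names are hypothetical, and the 403 status for a wrong secret is an assumption, since the spec only states that a valid request gets an HTTP 200 JSON response.

```python
import base64
import hmac
from urllib.parse import unquote_to_bytes


def secret_matches(payload, expected_secret):
    """Compare the request's secret with the stored one in constant time."""
    received = str(payload.get("secret", ""))
    return hmac.compare_digest(received.encode("utf-8"), expected_secret.encode("utf-8"))


def decode_data_uri(uri):
    """Split a data URI into (media_type, raw_bytes)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    header, _, data = uri[len("data:"):].partition(",")
    media_type, *params = header.split(";")
    if "base64" in params:
        return media_type or "text/plain", base64.b64decode(data)
    # Without ";base64" the data part is percent-encoded text
    return media_type or "text/plain", unquote_to_bytes(data)


def handle_request(payload, expected_secret):
    """Return (status, body); attachments are decoded only for a valid secret."""
    if not secret_matches(payload, expected_secret):
        return 403, {"error": "invalid secret"}  # assumed status code
    files = {
        item["name"]: decode_data_uri(item["url"])[1]
        for item in payload.get("attachments", [])
    }
    # A real handler would now hand `files` and the brief to the generator
    return 200, {"status": "accepted", "round": payload.get("round"), "files": sorted(files)}
```

`hmac.compare_digest` is used instead of `==` so the comparison time does not leak how many leading characters of the secret matched.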
Evaluate
Instructors will:
- Publish a Google Form where students can submit their API URLs, their `secret`, and their GitHub repo URLs.
- For each submission, create a unique task request.
- POST the request to their latest API URL.
- If the response is not HTTP 200, try up to 3 times over 3-24 hours. Then fail.
- Accept POST requests on the `evaluation_url`. Add it to a queue to evaluate and return an HTTP 200 response.
- Evaluate the repo based on the task-specific as well as common checks and log these.
  - Repo-level rule-based checks (e.g. `LICENSE` is MIT)
  - LLM-based static checks (e.g. code quality, `README.md` quality)
  - Dynamic checks (e.g. use Playwright to load your page, run and test your app)
- Save the results in a `results` table.
- For all `{"round": 1}` requests, generate and POST a unique round 2 task request (even if checks failed).
- Publish the database after the deadline.
Instructors may, at their discretion, send up to 3 such tasks.
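The "accept, queue, return 200" step on the `evaluation_url` might look roughly like this in Python. It is a sketch under assumptions: `accept_submission`, `known_tasks` (a set of `(email, task, round, nonce)` tuples for requests already sent), and the in-memory `queue.Queue` are illustrative stand-ins for whatever the instructors actually run.

```python
import queue

# Fields a student notification is expected to carry
REQUIRED_FIELDS = ("email", "task", "round", "nonce", "repo_url", "commit_sha", "pages_url")


def accept_submission(payload, known_tasks, pending):
    """Validate a notification, enqueue it for evaluation, return (status, body)."""
    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        return 400, {"reason": "missing fields: " + ", ".join(missing)}
    key = (payload["email"], payload["task"], payload["round"], payload["nonce"])
    if key not in known_tasks:
        return 400, {"reason": "no matching task for email, task, round, nonce"}
    pending.put(payload)  # evaluation happens later, outside the request
    return 200, {"status": "queued"}
```

Returning immediately after enqueueing keeps the endpoint fast, which matters because students retry until they see HTTP 200.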
Stuff below this is work in progress. Some stuff may change.
Evaluation Script
Setup:
- Download the submissions as a `submissions.csv` with `timestamp`, `email`, `endpoint`, `secret` columns.
- Set up a remote database with tables:
  - `tasks` for tasks sent. Updated by `round1.py`, `round2.py`. Fields: `timestamp`, `email`, `task`, `round`, `nonce`, `brief`, `attachments`, `checks`, `evaluation_url`, `endpoint`, `statuscode`, `secret`
  - `repos` for repos submitted. Updated by the `evaluation_url` API. Fields: `timestamp`, `email`, `task`, `round`, `nonce`, `repo_url`, `commit_sha`, `pages_url`
  - `results` for repos evaluated. Updated by `evaluate.py`. Fields: `timestamp`, `email`, `task`, `round`, `repo_url`, `commit_sha`, `pages_url`, `check`, `score`, `reason`, `logs`
- Create a series of parametrizable task templates.
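For local experimentation, the three tables could be mocked up with SQLite as below. This is only a sketch: the spec calls for a remote database, and the column types here are guesses. The field names are taken from the list above. `check` is quoted because it is an SQL keyword.

```python
import sqlite3

# Column types are assumptions; the spec only lists field names.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    timestamp TEXT, email TEXT, task TEXT, round INTEGER, nonce TEXT,
    brief TEXT, attachments TEXT, checks TEXT, evaluation_url TEXT,
    endpoint TEXT, statuscode INTEGER, secret TEXT
);
CREATE TABLE IF NOT EXISTS repos (
    timestamp TEXT, email TEXT, task TEXT, round INTEGER, nonce TEXT,
    repo_url TEXT, commit_sha TEXT, pages_url TEXT
);
CREATE TABLE IF NOT EXISTS results (
    timestamp TEXT, email TEXT, task TEXT, round INTEGER,
    repo_url TEXT, commit_sha TEXT, pages_url TEXT,
    "check" TEXT, score REAL, reason TEXT, logs TEXT
);
"""


def create_tables(connection):
    """Create the tasks, repos, and results tables if they do not exist."""
    connection.executescript(SCHEMA)


def column_names(connection, table):
    """List a table's column names in declaration order."""
    return [row[1] for row in connection.execute(f"PRAGMA table_info({table})")]
```

Calling `create_tables(sqlite3.connect(":memory:"))` gives a throwaway database that is enough to try out the scripts described next.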
Evaluation scripts:
- Create a `round1.py` script. For each `submissions.csv` row, it will:
  - Skip if the `tasks` table has a matching `email`, `secret`, `round=1`, indicating a successful round 1
  - Generate the task with fields:
    - task: `{template.id}-{hash({ brief, attachments })[:5]}`
    - round: 1
    - nonce: UUIDv7
    - brief + attachments + checks: randomly picked from task templates and parametrized with seed (email, YYYY-MM-DD-HH), expiring hourly
    - evaluation_url: #TODO
  - POST it to the `endpoint` and receive the HTTP status code
  - Log into the `tasks` table
- Create an `evaluation_url` API endpoint. This will:
  - Accept a JSON payload
  - If the `tasks` table has a matching `email`, `task`, `round`, `nonce`, insert these fields along with `repo_url`, `commit_sha`, `pages_url`, into the `repos` table and return an HTTP 200.
  - Else return an HTTP 400 with reason.
- Create an `evaluate.py` script. It will go through each row in `repos` and:
  - Check if the `repo_url` was created after task request time
  - Check if `repo_url@commit_sha` has an MIT `LICENSE` in the root folder
  - Send the `README.md` at `repo_url@commit_sha` to an LLM for document quality evaluation
  - Send the code at `repo_url@commit_sha` to an LLM for code quality evaluation
  - Use Playwright to visit `pages_url` and run a series of checks based on the templates
  - Log into the `results` table
- Create a `round2.py` script. For each row in the `repos` table, it will:
  - Skip if the `results` table has a matching `email`, `task`, `round=2`, indicating a successful round 2
  - Generate a task with the same fields as `round1.py`, except:
    - brief + attachments + checks: randomly picked from the same task template but for round 2
  - POST it to the `endpoint` and receive the HTTP status code
  - Log into the `tasks` table
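The task-generation step of `round1.py` could be sketched as below. Several details are assumptions made for illustration: SHA-256 stands in for the unspecified `hash`, `uuid.uuid4()` stands in for a UUIDv7 nonce, and the helper names and the template dict shape (`id`, `brief`, `checks`, optional `attachments`) are hypothetical.

```python
import hashlib
import json
import random
import uuid
from datetime import datetime, timezone


def hourly_seed(email, now):
    """Seed string from (email, YYYY-MM-DD-HH); it changes every hour (UTC)."""
    return f"{email}-{now.astimezone(timezone.utc):%Y-%m-%d-%H}"


def task_id(template_id, brief, attachments):
    """Build {template.id}-{first 5 hex chars of a hash of brief + attachments}."""
    blob = json.dumps({"brief": brief, "attachments": attachments}, sort_keys=True)
    digest = hashlib.sha256(blob.encode("utf-8")).hexdigest()  # assumed hash function
    return f"{template_id}-{digest[:5]}"


def make_round1_task(email, secret, templates, now, evaluation_url):
    """Pick a template deterministically for this hour and build the request."""
    seed = hourly_seed(email, now)
    template = random.Random(seed).choice(templates)  # same pick within the hour
    attachments = template.get("attachments", [])
    return {
        "email": email,
        "secret": secret,
        "task": task_id(template["id"], template["brief"], attachments),
        "round": 1,
        "nonce": str(uuid.uuid4()),  # stand-in for UUIDv7
        "brief": template["brief"],
        "checks": template["checks"],
        "evaluation_url": evaluation_url,
        "attachments": attachments,
    }
```

Seeding a private `random.Random` instance with the email and hour means a retry within the same hour regenerates the same task, while the nonce still differs per request.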
Sample task templates:
id: sum-of-sales
brief: Publish a single-page site that fetches data.csv from attachments, sums its sales column, sets the title to "Sales Summary ${seed}", displays the total inside #total-sales, and loads Bootstrap 5 from jsdelivr.
attachments:
  - name: data.csv
    url: data:text/csv;base64,${seed}
checks:
  - js: document.title === `Sales Summary ${seed}`
  - js: !!document.querySelector("link[href*='bootstrap']")
  - js: Math.abs(parseFloat(document.querySelector("#total-sales").textContent) - ${result}) < 0.01
round2:
  - brief: Add a Bootstrap table #product-sales that lists each product with its total sales and keeps #total-sales accurate after render.
    checks:
      - js: document.querySelectorAll("#product-sales tbody tr").length >= 1
      - js: >-
          (() => {
            const rows = [...document.querySelectorAll("#product-sales tbody tr td:last-child")];
            const sum = rows.reduce((acc, cell) => acc + parseFloat(cell.textContent), 0);
            return Math.abs(sum - ${result}) < 0.01;
          })()
  - brief: Introduce a currency select #currency-picker that converts the computed total using rates.json from attachments and mirrors the active currency inside #total-currency.
    attachments:
      - name: rates.json
        url: data:application/json;base64,${seed}
    checks:
      - js: !!document.querySelector("#currency-picker option[value='USD']")
      - js: !!document.querySelector("#total-currency")
  - brief: Allow filtering by region via #region-filter, update #total-sales with the filtered sum, and set data-region on that element to the active choice.
    checks:
      - js: document.querySelector("#region-filter").tagName === "SELECT"
      - js: document.querySelector("#total-sales").dataset.region !== undefined

id: markdown-to-html
brief: Publish a static page that converts input.md from attachments to HTML with marked, renders it inside #markdown-output, and loads highlight.js for code blocks.
attachments:
  - name: input.md
    url: data:text/markdown;base64,${seed}
checks:
  - js: !!document.querySelector("script[src*='marked']")
  - js: !!document.querySelector("script[src*='highlight.js']")
  - js: document.querySelector("#markdown-output").innerHTML.includes("<h")
round2:
  - brief: Add tabs #markdown-tabs that switch between rendered HTML in #markdown-output and the original Markdown in #markdown-source while keeping content in sync.
    checks:
      - js: document.querySelectorAll("#markdown-tabs button").length >= 2
      - js: document.querySelector("#markdown-source").textContent.trim().length > 0
  - brief: Support loading Markdown from a ?url= parameter when present and fall back to the attachment otherwise, showing the active source in #markdown-source-label.
    checks:
      - js: document.querySelector("#markdown-source-label").textContent.length > 0
      - js: !!document.querySelector("script").textContent.includes("fetch(")
  - brief: Display a live word count badge #markdown-word-count that updates after every render and formats numbers with Intl.NumberFormat.
    checks:
      - js: document.querySelector("#markdown-word-count").textContent.includes(",")
      - js: !!document.querySelector("script").textContent.includes("Intl.NumberFormat")

id: github-user-created
brief: Publish a Bootstrap page with form id="github-user-${seed}" that fetches a GitHub username, optionally uses ?token=, and displays the account creation date in YYYY-MM-DD UTC inside #github-created-at.
checks:
  - js: document.querySelector("#github-user-${seed}").tagName === "FORM"
  - js: document.querySelector("#github-created-at").textContent.includes("20")
  - js: !!document.querySelector("script").textContent.includes("https://api.github.com/users/")
round2:
  - brief: Show an aria-live alert #github-status that reports when a lookup starts, succeeds, or fails.
    checks:
      - js: document.querySelector("#github-status").getAttribute("aria-live") === "polite"
      - js: !!document.querySelector("script").textContent.includes("github-status")
  - brief: Display the account age in whole years inside #github-account-age alongside the creation date.
    checks:
      - js: parseInt(document.querySelector("#github-account-age").textContent, 10) >= 0
      - js: document.querySelector("#github-account-age").textContent.toLowerCase().includes("years")
  - brief: Cache the last successful lookup in localStorage under "github-user-${seed}" and repopulate the form on load.
    checks:
      - js: !!document.querySelector("script").textContent.includes("localStorage.setItem(\"github-user-${seed}\"")
      - js: !!document.querySelector("script").textContent.includes("localStorage.getItem(\"github-user-${seed}\"")
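The `${seed}` and `${result}` placeholders in these templates happen to match the syntax of Python's `string.Template`, so parametrizing a loaded template could look like the sketch below. The `render` helper is hypothetical, and it assumes the template has already been parsed into nested dicts, lists, and strings. How the real scripts substitute values is not specified.

```python
from string import Template


def render(node, values):
    """Recursively substitute ${name} placeholders in strings, lists, and dicts.

    Unknown placeholders are left untouched, and the input is not modified.
    """
    if isinstance(node, str):
        return Template(node).safe_substitute(values)
    if isinstance(node, list):
        return [render(item, values) for item in node]
    if isinstance(node, dict):
        return {key: render(value, values) for key, value in node.items()}
    return node
```

For example, `render({"js": "document.title === `Sales Summary ${seed}`"}, {"seed": "abc"})` returns a dict whose `js` value reads ``document.title === `Sales Summary abc` ``. The substitution has to happen before the check reaches the browser, because an unsubstituted `${seed}` inside backticks would be evaluated there as a JavaScript template literal.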