2025 08 04 project 2 - Q&A 2 TDS May 2025

Here’s an FAQ based on the provided TDS live tutorial:

Q1: What is the main goal of Project 2?

A1: The goal is for you to learn how to effectively use Large Language Models (LLMs) like ChatGPT or Codex as coding engines for data science tasks. We’ll be generating sample questions for a “Data Analyst Agent” and refining the prompts to make them robust and comprehensive.

Q2: What is the instructor’s general approach to tackling these problems?

A2: I embrace an iterative and “messy” process. I often start by dictating a prompt to an LLM, review its suggestions, and then meta-prompt, i.e. ask another LLM to improve my prompt. This saves time and leverages LLMs’ strengths in prompt engineering. I encourage you to interact, ask questions, and follow a similar iterative approach.

Q3: How can I access the LLM tools like Codex?

A3: Codex, available through ChatGPT, generally requires a paid account (Plus tier, $20/month, or higher). Some free alternatives like Jules (which I also use) are available if you want to experiment.

Q4: What is the strategy for generating effective LLM prompts?

A4: The key is to be clear, provide context, and define the desired output structure. I’ll demonstrate including details about the project, the content to be covered (e.g., from _sidebar.md and project-data-analyst-agent.md), requiring specific output formats (like questions.txt, optional data files), and defining evaluation criteria using a promptfoo config. Using XML tags for delineation is often recommended for better LLM understanding.
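As a sketch of that XML-tag delineation (the function and tag names here are my own illustration, not something the project mandates), a prompt can be assembled like this:

```python
def build_prompt(project_brief: str, source_docs: dict[str, str], output_spec: str) -> str:
    """Assemble an LLM prompt, using XML tags to delineate each section."""
    docs = "\n".join(
        f'<doc name="{name}">\n{text}\n</doc>' for name, text in source_docs.items()
    )
    return (
        f"<project>\n{project_brief}\n</project>\n\n"
        f"<content>\n{docs}\n</content>\n\n"
        f"<output_format>\n{output_spec}\n</output_format>"
    )

prompt = build_prompt(
    "Generate sample questions for a Data Analyst Agent.",
    {"_sidebar.md": "course outline...", "project-data-analyst-agent.md": "project spec..."},
    "Write one question per line to questions.txt; include any optional data files.",
)
```

The tags make it unambiguous to the model where the brief ends and the source material begins, which tends to reduce instruction/content confusion.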

Q5: What if the LLM’s initial output is not good enough or too narrow?

A5: This is where iteration comes in. I often find initial outputs too narrow or simple. You should provide feedback to the LLM, asking it to:

* Broaden the coverage of topics or categories.
* Increase the number of questions or examples.
* Require specific types of attachments (e.g., data, image, PDF).
* Include structured analysis (e.g., correlation, plots, LLM parsing).

Q6: What is the “bitter lesson of AI” and how does it relate to this project?

A6: The “bitter lesson of AI” suggests that in many domains, complex, human-designed expert systems are often outperformed by simpler, larger models trained on massive datasets. In this project, the lesson is to delegate tasks to LLMs. Instead of you trying to solve every detail, your job is to guide the LLM to do the work, write the code, debug it, and iterate. LLMs are rapidly improving, and this project aims to prepare you for a future where they are central to data science workflows.

Q7: The project seems very open-ended with many possible outputs. How do we judge or evaluate it if we don’t know the exact expected answers?

A7: The core of this project is to use LLMs as your primary tool. LLMs can write code, debug it, and refine their solutions. Your role shifts from writing code yourself to engineering the prompts and managing the LLM’s process. The project is designed to make you rely on LLMs to:

* Interpret the problem.
* Break it down into sub-tasks.
* Generate code.
* Use specific tools (like data processing libraries, web scrapers).
* Iterate and improve based on errors or feedback.

The evaluation will focus on how effectively you leverage the LLM to achieve the task, not necessarily on a single “correct” answer. Your ability to get a robust, high-quality, and fast solution from the LLM is what matters.

Q8: What specific technologies or libraries will be important for this project?

A8: Beyond LLMs, you’ll likely need skills in:

* Web scraping (for data from URLs).
* Data processing with Pandas.
* Data analysis and visualization.
* Working with different file types (CSV, Excel, PDF, images).
* Database processing.
* Using LLMs for structured text parsing and multimedia processing.

You’ll also be exposed to concepts like promptfoo configs for defining evaluation criteria, and methods for optimizing LLM calls (parallelization, caching).

Q9: The 3-minute time limit for LLM responses seems short, especially with complex tasks involving image processing or multiple retries. Will this be adjusted?

A9: While the current limit encourages optimization, if a majority of students consistently face issues with the final question due to time constraints, the deadline may be recalibrated after initial attempts. The focus is on robust optimization (parallelization, caching, early cutoffs for non-performing calls) to meet real-world speed demands.

Q10: Will questions.txt always specify the exact format for inputs and outputs, and how images/files should be delivered to the API?

A10: Yes, the output format and instructions on how to receive inputs (like files attached via an HTML form equivalent) will be specified in questions.txt.

Q11: How do I handle multiple sources of data (e.g., URLs, SQL, Parquet, HTML, PDF, images) in one project?

A11: You can instruct the LLM to use different libraries for different data types (e.g., httpx or requests for URLs, Pandas for CSV/Parquet, specific libraries for PDF or image processing). You can also tell the LLM to read all the provided content and figure out the best approach. The core idea is to use the LLM to generate the appropriate code for each source.
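One way to sketch that routing (the handler descriptions are placeholders; in a real agent each branch would call httpx, `pandas.read_csv`, a PDF-extraction library, and so on):

```python
from pathlib import Path

def load_source(path: str) -> str:
    """Route an input to an appropriate loading strategy based on its type."""
    if path.startswith(("http://", "https://")):
        return "fetch with httpx or requests"          # e.g. httpx.get(path).text
    handlers = {
        ".csv": "read with pandas.read_csv",
        ".parquet": "read with pandas.read_parquet",
        ".pdf": "extract text with a PDF library",
        ".png": "send to a vision-capable LLM",
        ".jpg": "send to a vision-capable LLM",
    }
    # Fall back to letting the LLM inspect the raw bytes and decide.
    return handlers.get(Path(path).suffix.lower(), "ask the LLM to inspect the raw bytes")
```

The dispatch table keeps the agent extensible: supporting a new source type is one more entry, not a rewrite.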

Q12: If I’m building a generic data analyst agent, will promptfoo be used for evaluation?

A12: Yes, the project involves building a generic data analyst agent, and promptfoo (or a similar backend script) will be used to evaluate its responses against predefined criteria.

Q13: How will the project evaluation work, given the variability of LLM outputs?

A13: The evaluation will largely use LLM rubrics. If you can “con” the LLM grader into giving you high marks, that’s acceptable! You’re welcome to do whatever it takes to achieve high scores, within ethical boundaries. Your job is to get high marks, and you can use LLMs to do that.

Q14: Will questions.txt expect multiple data sources (e.g., a CSV, an image, and a URL) within a single question?

A14: Yes, you can expect questions that require processing data from multiple sources. It could be a CSV, an image, and a URL, or even more. The complexity will be in processing these diverse inputs, not necessarily in handling massive amounts of data from a single source.

Q15: What kind of questions should I expect in the final evaluation?

A15: The questions will be roughly based on the content available in the course materials (reordered and possibly with some extra notes). They will likely involve:

* Using reveal.js for Markdown presentations.
* Creating marimo notebooks.
* Visualizing correlation matrices in Excel (via code).
* Publishing PowerPoints with morph transitions.
* Publishing GitHub repos with Seaborn visualizations and code.
* Creating Jules or Codex conversations that generate a full-fledged data visualization.

Expect questions that test your ability to handle diverse data types, perform structured analysis (e.g., correlation, plotting degrees in networks), and integrate LLM parsing with geospatial analysis. There will be a mix of very easy questions (solvable with a simple LLM call) and some harder ones requiring more complex prompt engineering and iteration.

Q16: Do I need to create a front-end UI for my project?

A16: No, you do not need to create a front-end. What you saw me demonstrating (a simple web UI) is a bonus and helpful for debugging, but not required. You can debug just as effectively using cURL requests and promptfoo. Focusing on the core project (LLM engineering) is more important than UI development.

Q17: Is it mandatory to include a Docker container for my solution?

A17: Yes, you’ll need to build a proper Docker container with the pre-installed tools. This is a core requirement.

Q18: What is the deadline for Project 2, and when will the submission link be active?

A18: The deadline for Project 2 has been extended to August 6th, 2025. The submission link will be a Google Form, and you will receive an email and Discord notification, along with an update on the website, when it is active. You can revise your submission until the deadline.

Q19: How do I ensure my API endpoint is working correctly before final submission?

A19: We will run a preliminary check on submitted URLs to ensure accessibility. If your endpoint isn’t working, you’ll be notified via email and given a chance to fix it. This is why you must be responsive to communications.

Q20: Will the evaluation process involve multiple retries or specific tests against my API?

A20: Yes, we will send multiple requests to your API endpoint. We’ll perform a small, initial “ping” test to ensure accessibility, followed by several retries of the actual evaluation requests. If your endpoint fails multiple times, you’ll be given an opportunity to fix it. The evaluation script will be general and will hit all API endpoints you submit.

Q21: How will different API keys (e.g., OpenAI vs. Gemini) impact evaluation, and can I use any open-source LLM?

A21: You can use any LLM you prefer (OpenAI, Gemini, open-source models like Llama, etc.). The output will be evaluated, and while some models might perform slightly better by default, good prompt engineering can make any model perform well. There’s a theory that models prefer their own output, but this effect is minimal.

Q22: Do I need to be concerned about LLMs hallucinating or providing confident but incorrect answers?

A22: Yes, LLMs can hallucinate. This is where your prompt engineering skills are crucial. If you’re confident your prompt is good, and the LLM still gives wrong answers, that indicates a limitation of the model, not your prompting. However, LLMs also offer powerful debugging capabilities (e.g., feeding an error message back to the LLM to get corrected code). Your job is to maximize reliability.
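The error-feedback loop described above can be sketched like this, with `call_llm` and `run_code` as stand-ins for your actual LLM call and sandboxed code execution (both names and the 3-attempt limit are illustrative):

```python
def solve_with_retries(task: str, call_llm, run_code, max_attempts: int = 3):
    """Ask the LLM for code; on failure, feed the error back and retry."""
    prompt = f"Write Python code for this task:\n{task}"
    for _ in range(max_attempts):
        code = call_llm(prompt)
        ok, output = run_code(code)          # (success flag, result or error text)
        if ok:
            return output
        # Feed the error straight back so the LLM can correct its own code.
        prompt = f"This code:\n{code}\nfailed with:\n{output}\nFix it."
    return None  # give up after max_attempts; caller supplies a fallback
```

Capping the attempts is important: it bounds both cost and wall-clock time when the model cannot recover.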

Q23: How do I handle situations where the LLM gets stuck in an infinite loop?

A23: You should implement a timeout mechanism in your code. However, if an LLM consistently gets into an infinite loop, it might indicate you’re using the wrong model for the task. Generally, newer models are less prone to this.
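A minimal synchronous timeout wrapper, assuming the risky work can run in a worker thread (note that Python cannot kill a thread, so a truly runaway worker is merely abandoned; for hard isolation against infinite loops, run the work in a subprocess instead):

```python
import concurrent.futures
import time

def run_with_timeout(fn, timeout: float, fallback=None):
    """Run fn in a worker thread; return fallback if it exceeds `timeout` seconds."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # The stuck worker keeps running in the background; we just stop waiting.
        return fallback
    finally:
        pool.shutdown(wait=False)

# Usage: a fast call succeeds, a slow one falls back.
fast = run_with_timeout(lambda: 42, 1.0)
slow = run_with_timeout(lambda: time.sleep(0.3) or "done", 0.05, fallback="(timed out)")
```

This is the piece that lets your endpoint always answer within the evaluation window, even when one sub-task stalls.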

Q24: What is the recommended strategy for handling various input files (e.g., multiple CSVs, images, PDFs, URLs) in the API endpoint?

A24: Your API endpoint needs to be designed to accept multiple files, typically via a multipart/form-data request (similar to an HTML form upload). You can use libraries like Pandas for CSVs/Parquet, specific libraries for PDFs/images, and HTTP clients for URLs. The system will explicitly mention which files to expect.
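To make the delivery format concrete, here is a stdlib-only sketch of what a two-file multipart/form-data body looks like on the wire and how it can be parsed (in practice a web framework such as FastAPI or Flask does this parsing for you; the field names and boundary are illustrative):

```python
from email.parser import BytesParser
from email.policy import default

def parse_multipart(body: bytes, boundary: str) -> dict[str, bytes]:
    """Parse a multipart/form-data body into {field name: raw content}."""
    # Prepend a Content-Type header so the stdlib MIME parser accepts the body.
    head = f'Content-Type: multipart/form-data; boundary="{boundary}"\r\n\r\n'.encode()
    msg = BytesParser(policy=default).parsebytes(head + body)
    return {
        part.get_param("name", header="content-disposition"): part.get_payload(decode=True)
        for part in msg.iter_parts()
    }

# A two-file upload (questions.txt plus a CSV), as an HTML form would send it:
body = (
    b'--B\r\nContent-Disposition: form-data; name="questions.txt"\r\n\r\n'
    b"What is the correlation between a and b?\r\n"
    b'--B\r\nContent-Disposition: form-data; name="data.csv"\r\n\r\n'
    b"a,b\n1,2\n2,4\r\n"
    b"--B--\r\n"
)
files = parse_multipart(body, "B")
```

Seeing the raw wire format helps when debugging with cURL: each attached file is just one boundary-delimited part with a `name` in its Content-Disposition header.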

Q25: What is “query rewriting,” and should I implement it in my solution?

A25: Query rewriting (or prompt rewriting) is where you take a user’s initial prompt, interpret what they really want, and rephrase it into a more effective prompt for the LLM. It’s a very powerful technique often used in real-world applications. While not strictly required for this project (as the evaluation prompts will be precise), it’s a valuable skill to learn for broader LLM-based solutions.
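The two-pass pattern can be sketched as follows (the rewrite-instruction wording and function names are my own illustration; `call_llm` stands in for your LLM client):

```python
REWRITE_INSTRUCTION = (
    "Rewrite the user's request as a precise, self-contained prompt for a "
    "data-analysis LLM. Spell out the data involved and the exact output format."
)

def rewrite_query(user_prompt: str, call_llm) -> str:
    """One extra LLM call that interprets a vague request and sharpens it."""
    return call_llm(f"{REWRITE_INSTRUCTION}\n\nUser request:\n{user_prompt}")

def answer(user_prompt: str, call_llm) -> str:
    improved = rewrite_query(user_prompt, call_llm)  # pass 1: rewrite
    return call_llm(improved)                        # pass 2: solve
```

The cost is one extra LLM call per request; the benefit is that the solving call always receives a well-specified prompt.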

Q26: Given the open-ended nature, should I focus on generating my own questions and answers?

A26: Yes, you should definitely generate your own questions and answers. The process of creating your own robust question bank, then seeing how well your program solves them, is a powerful learning experience. Don’t limit yourself to just the sample questions provided.

Q27: What is the “most ambitious project” statement referring to?

A27: This project is designed to be highly ambitious, pushing you to work on cutting-edge data science by leveraging LLMs for complex, multi-modal tasks. It prepares you for real-world scenarios where you’ll face challenges of exactly this kind.

Q28: Will the API endpoint need to handle specialized data extraction functions for different content types?

A28: Yes, the project may require specialized data extraction. The libraries covered in TDS (Pandas, specific PDF/image libraries) already provide functions for reading various content types. Your role is to use the LLM to generate code that utilizes these libraries effectively for the task. You might not need to write entirely new specialized functions, but rather integrate existing library functionalities.

Q29: How can I ensure my code is reliable when dealing with potentially inconsistent data or LLM outputs?

A29: Robustness is key. Make sure your code can return a response no matter what happens, even if it’s just a dummy answer. Implement error handling (try/catch blocks) and ensure the response adheres to the expected JSON structure. If you need to, have a “fallback” strategy to generate a valid (even if minimal) JSON response. This makes your system more reliable, which is a core aspect of “reliability engineering.”
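That fallback strategy can be as small as this (the `solve` callable and the question-keyed JSON shape are assumptions for illustration; match whatever structure questions.txt specifies):

```python
import json

def safe_answer(questions: list[str], solve) -> str:
    """Always return valid JSON, even when the real solver blows up."""
    try:
        answers = solve(questions)
    except Exception:
        # Fallback: a dummy but structurally valid answer per question.
        answers = {q: None for q in questions}
    return json.dumps(answers)
```

A syntactically valid dummy response scores better than a 500 error or malformed output, because the grader can at least parse it.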