Tools in Data Science - Jan 2025
Tools in Data Science is a practical diploma-level data science course at IIT Madras that teaches popular tools for sourcing data, transforming it, analyzing it, communicating the results as visual stories, and deploying them in production.
This course exposes you to real-life tools
Courses teach you programming and data science, from statistics to algorithms to writing Python code to building models.
But one critical subject that’s rarely covered is: what tools should I pick and how do I become proficient in them?
These tools might not help your CV much. But they will make things easier in real life. For example, at school:
- You learn from pristine datasets. But in the industry, you’ll have to scrape them yourself.
- You learn how to train models. But soon, you’ll just pick something from HuggingFace.
- You learn to write a log parser over weeks. Instead, your boss writes a sed+grep script in minutes.
“We lost the documentation on quantum mechanics. You’ll have to decode the regexes yourself.”
In this course, we’ve curated the most important tools people use in data science.
Learn them well. You’ll be a lot more productive than your peers.
This course is quite hard
Here’s students’ feedback:
- It used to be an easy course until 2024.
- Now it’s hard and covers more. Take it in your last semester if possible.
- Plan extra time. It takes more time than typical 3-credit courses.
- LLMs grade you – unpredictably.
- The ROE is hard.
Take Graded Assignment 1 to check if you’re ready for this course. Please drop this course (do it in a later term) if you score low. It’ll be too tough for you now.
Programming skills are a pre-requisite
You need a good understanding of Python, JavaScript, HTML, HTTP, Excel, and data science concepts.
But isn’t this a data science course? Yes. Good data scientists are good programmers. Data scientists don’t just analyze data or train models. They source data, clean it, transform it, visualize it, deploy it, and automate the whole process.
In some organizations, some of this work is done by others (e.g. data engineers, IT teams, etc.). But wherever you are, some of the time, you need to write code for all of this yourself.
This course teaches you tools that will make you more productive. But you do need programming to learn many of them.
We encourage learning by sharing
You CAN copy from friends. You can work in groups. You can share code. Even in projects, assignments, and exams (except the final end-term exam).
Why should you copy? Because in real life, there’s no time to re-invent the wheel. You’ll be working in teams on the shoulders of giants. It’s important to learn how to do that well.
To learn well, understand what you’re copying. If you’re short of time, prioritize.
To learn better, teach what you’ve learnt.
We cover 7 modules in 12 weeks
The content evolves with technology and feedback. Track the commit history for changes.
Released content:
- Development Tools and concepts to build models and apps. Discussion Thread
- Deployment Tools and concepts to publish what you built. Discussion Thread
- Large Language Models that make your work easier and your apps smarter. Discussion Thread
- Data Sourcing to get data from the web, files, and databases. Discussion Thread
- Data Preparation to clean up and convert the inputs to the right format. Discussion Thread
- Project 1 to build an LLM-based automation agent. Discussion Thread
Work in progress:
- Data Analysis to find surprising insights in the data.
- Data Visualization to communicate those insights as visual stories.
Evaluations are mostly open Internet
| Exam | Type | Weight | Release Date | Submission Date |
|---|---|---|---|---|
| GA: Graded assignments | Best 4 out of 7 ‡ | 15% | | |
| Graded Assignment 1 | Online open MCQ | | 30 Dec 2024 | 26 Jan 2025 |
| Graded Assignment 2 | Online open MCQ | | 3 Jan 2025 | 2 Feb 2025 |
| Graded Assignment 3 | Online open MCQ | | 15 Jan 2025 | 5 Feb 2025 |
| Graded Assignment 4 | Online open MCQ | | 31 Jan 2025 | 9 Feb 2025 |
| P1: Project 1 | Take-home open-Internet | 20% | 19 Jan 2025 | 16 Feb 2025 |
| Graded Assignment 5 | Online open MCQ | | 7 Feb 2025 | 21 Feb 2025 |
| Graded Assignment 6 | Online open MCQ | | 28 Feb 2025 | 16 Mar 2025 |
| P2: Project 2 | Take-home open-Internet | 20% | 3 Mar 2025 | 31 Mar 2025 |
| Graded Assignment 7 | Online open MCQ | | 14 Mar 2025 | 26 Mar 2025 |
| ROE: Remote Online Exam | Online open-Internet MCQ | 20% | 02 Mar 2025 13:00 | 02 Mar 2025 13:45 |
| F: Final end-term | In-person, no internet, mandatory | 25% | 13 Apr 2025 | |
Updates
- 13 Jan 2025: GA3 release date moved from 10 Jan 2025 to 15 Jan 2025 due to faculty delay. Students have till 2 Feb 2025 - more than the 10 days expected for a GA.
- 22 Jan 2025: GA2 submission date moved from 26 Jan 2025 to 2 Feb 2025. GA4 release date is moved from 24 Jan 2025 to 31 Jan 2025. This is to reduce the amount students have to learn in a short period.
- 29 Jan 2025: GA3 submission date moved from 2 Feb 2025 to 5 Feb 2025.
- 13 Feb 2025: GA5 submission date moved from 16 Feb 2025 to 21 Feb 2025.
- 15 Feb 2025: Project 1 deadline moved from 15 Feb 2025 to 16 Feb 2025.
- 26 Feb 2025:
  - Project 1 results will be released by 16 Mar 2025.
  - Graded Assignment 6 moved from 14 Feb to 28 Feb 2025. Submission date moved from 9 Mar to 16 Mar 2025.
  - Project 2 moved from 21 Feb to 3 Mar 2025. Submission date moved from 17 Mar to 31 Mar 2025.
  - Graded Assignment 7 moved from 28 Feb to 7 Mar 2025. Submission date moved from 16 Mar to 26 Mar 2025.
- 7 Mar 2025: GA7 release date moved from 7 Mar to 14 Mar 2025.
Notes
- Graded Assignment 1 checks course pre-requisites. Please drop this course (do it in a later term) if you score low. It’ll be too tough for you now.
- ‡ Graded Assignments: Best 4 out of 7. We’ll take the best 4 of your graded assignment submissions. These, combined, will have a 15% weightage.
- Remote exams are open and hard
  - You can use the Internet, WhatsApp, ChatGPT, your notes, your friends, your pets…
  - The ROE is especially hard. Read: What is the purpose of an impossible RoE exam?
- Final exam is in-person and closed book. It tests your memory. It’s easy.
- Projects test application. The projects test how well you apply what you learnt in a real-world context.
- Bonus activities may be posted on Discourse. See previous bonus activities
- Evaluations are mostly automated. This course uses pre-computed answers (for objective questions) or LLMs (for subjective questions) to evaluate your work.
  - LLMs will evaluate you differently each time. Learn to prompt them robustly to get higher marks.
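As a rough illustration of how the weights in the evaluation table and the best-4-of-7 rule could combine into a course total, here is a minimal Python sketch. It assumes every component is scored out of 100, that the best four graded assignments are averaged before the 15% weight is applied, and that missing submissions count as 0. None of this is confirmed by the page: the function names are hypothetical and the Jan 2025 Grading Document remains the authority on the real formula.

```python
# Weights taken from the evaluation table (percent of the course total).
WEIGHTS = {"GA": 15, "P1": 20, "P2": 20, "ROE": 20, "F": 25}


def best_n_average(scores, n=4):
    """Average the n highest scores.

    Assumption: a missing submission counts as 0, so fewer than n
    scores are padded with zeros before averaging.
    """
    top = sorted(scores, reverse=True)[:n]
    top += [0.0] * (n - len(top))
    return sum(top) / n


def final_score(ga_scores, p1, p2, roe, final_exam):
    """Combine component scores (each 0-100) into a weighted total out of 100."""
    components = {
        "GA": best_n_average(ga_scores),
        "P1": p1,
        "P2": p2,
        "ROE": roe,
        "F": final_exam,
    }
    return sum(WEIGHTS[name] * score for name, score in components.items()) / 100


# Hypothetical student: seven GA scores, of which only the best four count.
ga = [90, 40, 80, 0, 70, 100, 60]
print(final_score(ga, p1=70, p2=60, roe=30, final_exam=80))  # 64.75
```

In this made-up example the four best assignments (100, 90, 80, 70) average 85, so the skipped and weak assignments do not lower the total.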
Constantly check communications
Check these three links regularly to keep up with the course.
- Seek Inbox for Course Announcements. Log into seek.onlinedegree.iitm.ac.in and click on “Inbox” on the left. Check notifications daily.

- Your email for Course Announcements. Seek Inbox messages are forwarded to your email. Check daily. Check spam folders too.
- TDS Discourse: Faculty, instructors, and TAs will share updates and address queries here. Email [email protected] cc: [email protected] if you can’t access Discourse.
People who help you
- Faculty (who design the course)
- Instructors (who teach the course)
  - Carlton D’Silva, [email protected] | @carlton
  - Prasanna S, [email protected] | @iamprasna
- Teaching assistants (who help you with your doubts)
  - Jivraj Singh, [email protected] | @Jivraj | LinkedIn Profile
  - Saransh Saini, [email protected] | @Saransh_Saini | LinkedIn Profile
- Virtual TA (GPT Instructions)
Their job is to help you. Trouble them for your slightest doubts!
Course Links
- TDS Discourse - Ask questions, get help, and discuss with your peers.
- IITM BS Degree Programme - Student Handbook
- Tools in Data Science Public course home page
Jan 2025 Links
- [Jan 2025 Grading Document](https://docs.google.com/document/d/1e1l9ERBGYoS2jhKZHcTP6zZUH_NLzJv99xdcyi21Z1Y/pub)
- TDS: Course page - Jan 2025 – for students to access course content.
- TDS: Course calendar - Jan 2025
- TDS: Announcement group - Jan 2025
- TDS: Course material – Jupyter notebooks, datasets, etc.
- TDS: TA Sessions - Jan 2025 – YouTube playlist