Tools in Data Science - Jan 2025

Tools in Data Science is a practical diploma-level data science course at IIT Madras that teaches popular tools for sourcing data, transforming it, analyzing it, communicating the insights as visual stories, and deploying the results in production.

This course exposes you to real-life tools

Courses teach you programming and data science. From statistics to algorithms to writing Python code to building models.

But one critical subject that’s rarely covered is: what tools should I pick and how do I become proficient in them?

These tools might not help your CV much. But they will make things easier in real life. For example, at school:

  • You learn from pristine datasets. But in the industry, you’ll have to scrape them yourself.
  • You learn how to train models. But soon, you’ll just pick something from HuggingFace.
  • You learn to write a log parser over weeks. Instead, your boss writes a sed + grep script in minutes.

“We lost the documentation on quantum mechanics. You’ll have to decode the regexes yourself.”

In this course, we’ve curated the most important tools people use in data science.

Learn them well. You’ll be a lot more productive than your peers.

This course is quite hard

Here’s students’ feedback:

  • It used to be an easy course until 2024.
  • Now it’s hard and covers more. Take it in your last semester if possible.
  • Plan extra time. It takes more time than typical 3-credit courses.
  • LLMs grade you – unpredictably.
  • The ROE is hard.

Take Graded assignment 1 to check if you’re ready for this course. Please drop this course (do it in a later term) if you score low. It’ll be too tough for you now.

Programming skills are a pre-requisite

You need a good understanding of Python, JavaScript, HTML, HTTP, Excel, and data science concepts.

But isn’t this a data science course? Yes. Good data scientists are good programmers. Data scientists don’t just analyze data or train models. They source data, clean it, transform it, visualize it, deploy it, and automate the whole process.

In some organizations, some of this work is done by others (e.g. data engineers, IT teams, etc.). But wherever you are, some of the time, you need to write code for all of this yourself.

This course teaches you tools that will make you more productive. But you do need programming to learn many of them.

We encourage learning by sharing

You CAN copy from friends. You can work in groups. You can share code. Even in projects, assignments, and exams (except the final end-term exam).

Why should you copy? Because in real life, there’s no time to re-invent the wheel. You’ll be working in teams on the shoulders of giants. It’s important to learn how to do that well.

To learn well, understand what you’re copying. If you’re short of time, prioritize.

To learn better, teach what you’ve learnt.

We cover 7 modules in 12 weeks

The content evolves with technology and feedback. Track the commit history for changes.

Released content:

  1. Development Tools and concepts to build models and apps. Discussion Thread
  2. Deployment Tools and concepts to publish what you built. Discussion Thread
  3. Large Language Models that make your work easier and your apps smarter. Discussion Thread
  4. Data Sourcing to get data from the web, files, and databases. Discussion Thread
  5. Data Preparation to clean up and convert the inputs to the right format. Discussion Thread

Project 1 to build an LLM-based automation agent. Discussion Thread

Work in progress:

  1. Data Analysis to find surprising insights in the data.
  2. Data Visualization to communicate those insights as visual stories.

Evaluations are mostly open Internet

| Exam | Type | Weight | Release Date | Submission Date |
| --- | --- | --- | --- | --- |
| GA: Graded assignments | Best 4 out of 7 ‡ | 15% | | |
| Graded Assignment 1 | Online open MCQ | | 30 Dec 2024 | 26 Jan 2025 |
| Graded Assignment 2 | Online open MCQ | | 3 Jan 2025 | 2 Feb 2025 |
| Graded Assignment 3 | Online open MCQ | | 15 Jan 2025 | 5 Feb 2025 |
| Graded Assignment 4 | Online open MCQ | | 31 Jan 2025 | 9 Feb 2025 |
| P1: Project 1 | Take-home open-Internet | 20% | 19 Jan 2025 | 16 Feb 2025 |
| Graded Assignment 5 | Online open MCQ | | 7 Feb 2025 | 21 Feb 2025 |
| Graded Assignment 6 | Online open MCQ | | 28 Feb 2025 | 16 Mar 2025 |
| P2: Project 2 | Take-home open-Internet | 20% | 3 Mar 2025 | 31 Mar 2025 |
| Graded Assignment 7 | Online open MCQ | | 14 Mar 2025 | 26 Mar 2025 |
| ROE: Remote Online Exam | Online open-Internet MCQ | 20% | 02 Mar 2025 13:00 | 02 Mar 2025 13:45 |
| F: Final end-term | In-person, no internet, mandatory | 25% | 13 Apr 2025 | |

Updates

  • 13 Jan 2025: GA3 release date moved from 10 Jan 2025 to 15 Jan 2025 due to faculty delay. Students have till 2 Feb 2025 - more than the 10 days expected for a GA.
  • 22 Jan 2025: GA2 submission date moved from 26 Jan 2025 to 2 Feb 2025. GA4 release date moved from 24 Jan 2025 to 31 Jan 2025. This reduces how much students have to learn in a short period.
  • 29 Jan 2025: GA3 submission date moved from 2 Feb 2025 to 5 Feb 2025.
  • 13 Feb 2025: GA5 submission date moved from 16 Feb 2025 to 21 Feb 2025.
  • 15 Feb 2025: Project 1 deadline moved from 15 Feb 2025 to 16 Feb 2025.
  • 26 Feb 2025:
    • Project 1 results will be released by 16 Mar 2025.
    • Graded Assignment 6 release date moved from 14 Feb to 28 Feb 2025. Submission date moved from 9 Mar to 16 Mar 2025.
    • Project 2 release date moved from 21 Feb to 3 Mar 2025. Submission date moved from 17 Mar to 31 Mar 2025.
    • Graded Assignment 7 release date moved from 28 Feb to 7 Mar 2025. Submission date moved from 16 Mar to 26 Mar 2025.
  • 7 Mar 2025: GA7 release date moved from 7 Mar to 14 Mar 2025.
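
The 13 Jan update says GA3 students get more than the 10 days expected for a GA. A minimal sketch of that date arithmetic, using only the GA3 dates listed above, is shown here. The helper name `days_available` is hypothetical, and counting whole calendar days between release and deadline is an assumption about how the window is measured.

```python
from datetime import date

def days_available(release: date, deadline: date) -> int:
    """Whole calendar days between a release date and a submission date."""
    return (deadline - release).days

# GA3 dates taken from the updates above.
ga3_release = date(2025, 1, 15)
ga3_deadline_13_jan = date(2025, 2, 2)   # deadline as of the 13 Jan update
ga3_deadline_29_jan = date(2025, 2, 5)   # deadline after the 29 Jan update

print(days_available(ga3_release, ga3_deadline_13_jan))  # 18
print(days_available(ga3_release, ga3_deadline_29_jan))  # 21
```

Both windows exceed the 10 days mentioned in the update.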

Notes

  • Graded Assignment 1 checks course pre-requisites. Please drop this course (do it in a later term) if you score low. It’ll be too tough for you now.
  • Graded Assignments: Best 4 out of 7. We’ll take the best 4 of your graded assignment submissions. These, combined, will have a 15% weightage.
  • Remote exams are open and hard.
  • Final exam is in-person and closed book. It tests your memory. It’s easy.
  • Projects test application. The projects test how well you apply what you learnt in a real-world context.
  • Bonus activities may be posted on Discourse. See previous bonus activities.
  • Evaluations are mostly automated. This course uses pre-computed answers (for objective questions) or LLMs (for subjective questions) to evaluate you.
    • LLMs will evaluate you differently each time. Learn to prompt them robustly to get higher marks.
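
The weights in the evaluation table and the best-4-of-7 rule can be combined into a rough estimate of a course total. The sketch below is an assumption about how the pieces fit together, not the course's official formula: it assumes every component is scored out of 100, the best four GAs are averaged equally, missed GAs count as 0, and bonus activities are ignored. The function names `ga_component` and `course_total` are hypothetical.

```python
# Weights from the evaluation table (they sum to 100%).
WEIGHTS = {"GA": 0.15, "P1": 0.20, "P2": 0.20, "ROE": 0.20, "F": 0.25}

def ga_component(ga_scores, best_n=4):
    """Average of the best `best_n` GA scores (each assumed to be 0-100).

    Dividing by `best_n` means any missing submission counts as 0.
    """
    top = sorted(ga_scores, reverse=True)[:best_n]
    return sum(top) / best_n

def course_total(ga_scores, p1, p2, roe, final):
    """Weighted total out of 100 under the assumptions stated above."""
    parts = {
        "GA": ga_component(ga_scores),
        "P1": p1,
        "P2": p2,
        "ROE": roe,
        "F": final,
    }
    return sum(WEIGHTS[name] * score for name, score in parts.items())

# Illustrative (made-up) scores: two GAs skipped, five attempted.
ga_scores = [80, 0, 95, 60, 70, 0, 90]
print(ga_component(ga_scores))  # 83.75 (best four: 95, 90, 80, 70)
print(round(course_total(ga_scores, p1=70, p2=60, roe=50, final=80), 2))  # 68.56
```

Under these assumptions, skipping up to three GAs costs nothing as long as the other four are strong, which is why the best-4 rule matters for planning.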

Constantly check communications

Check these three links regularly to keep up with the course.

  1. Seek Inbox for Course Announcements. Log into seek.onlinedegree.iitm.ac.in and click on “Inbox” on the left. Check notifications daily. Portal Inbox
  2. Your email for Course Announcements. Seek Inbox messages are forwarded to your email. Check daily. Check spam folders too.
  3. TDS Discourse: Faculty, instructors, and TAs will share updates and address queries here. Email [email protected] cc: [email protected] if you can’t access Discourse.

People who help you

Their job is to help you. Trouble them for your slightest doubts!