Data Sourcing
Before you can do any kind of data science, you have to get the data. Only then can you analyze it, visualize it, narrate it, and deploy it. This module covers how you get that data.
There are three ways you can get the data.
- The first way is to download the data. Either somebody gives you a link and asks you to download it from there, or you fetch it from the internet yourself because it is a public data source.
- The second way is to query it. The data may sit in a database, or it may be available through an API or a library. In each case, you can selectively query parts of the data and stitch them together.
- The third way is to scrape it. The data is not directly available in a convenient form that you can query or download, but it does exist on a web page, or in a PDF file, a Word document, or an Excel file. It is loosely structured, and you have to figure out that structure and extract the data from it.
In this module, we will look at tools that help you download from a data source, query an API, a database, or a library, and finally scrape from different sources.
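As a rough illustration of the three approaches, here is a minimal sketch that uses only the Python standard library. It is not the tooling this module teaches; the function names, the `cities` table, and the sample rows are all made up for the example, and real sources will usually need more robust tools.

```python
import sqlite3
import urllib.request
from html.parser import HTMLParser


def download(url: str) -> bytes:
    """Way 1: download a whole file. The caller supplies the URL."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def query(conn: sqlite3.Connection, min_population: int) -> list:
    """Way 2: query only the rows you need (hypothetical `cities` table)."""
    sql = "SELECT name, population FROM cities WHERE population >= ? ORDER BY name"
    return conn.execute(sql, (min_population,)).fetchall()


class TableScraper(HTMLParser):
    """Way 3: scrape the cell text out of an HTML table, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])  # start a new row
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            self._in_cell = True
            self.rows[-1].append("")  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


if __name__ == "__main__":
    # Made-up sample data; no network access is needed for this demo.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cities (name TEXT, population INTEGER)")
    conn.executemany(
        "INSERT INTO cities VALUES (?, ?)", [("Alpha", 100), ("Beta", 5000)]
    )
    print(query(conn, 1000))

    scraper = TableScraper()
    scraper.feed("<table><tr><th>name</th></tr><tr><td>Alpha</td></tr></table>")
    print(scraper.rows)
```

The `download` function is defined but not called in the demo, since it needs a real URL. The querying and scraping parts run against an in-memory database and an inline HTML string, so the sketch works offline.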
Here are links used in the video:
