JSON

JSON (JavaScript Object Notation) is the de facto standard format for data exchange on the web and in APIs. It is human-readable text, and nearly every programming language and data tool can read and write it.

For data scientists, JSON is essential when:

  • Working with REST APIs and web services
  • Storing configuration files and metadata
  • Parsing semi-structured data from databases like MongoDB
  • Creating data visualization specifications (e.g., Vega-Lite)

Watch this comprehensive introduction to JSON (15 min):

JSON Crash Course

Key concepts to understand in JSON:

  • JSON supports only six data types: strings, numbers, booleans, null, arrays, and objects
  • You can nest data. Arrays and objects can contain any of the other data types, including other arrays and objects
  • Always validate. Ensure JSON is well-formed before using it. Common errors: trailing commas, missing quotes (keys and strings must use double quotes), and incorrect escaping
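The points above can be checked with Python's standard `json` module. The following is a minimal sketch, not a full validator: the sample strings and the helper name `is_valid_json` are made up for illustration. It shows how each of the six JSON types maps to a Python type, and how malformed input raises `json.JSONDecodeError`.

```python
import json

# Hypothetical sample document that uses all six JSON data types.
sample = (
    '{"name": "Alice", "age": 30, "active": true, "manager": null, '
    '"tags": ["a", "b"], "address": {"city": "Paris"}}'
)

parsed = json.loads(sample)

# Name of the Python type that each JSON value was converted to.
type_names = {key: type(value).__name__ for key, value in parsed.items()}
print(type_names)


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise (illustrative helper)."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Three of the common errors: trailing comma, missing quotes, bad escape.
trailing_comma = '{"name": "Alice", "age": 30,}'
unquoted_key = '{name: "Alice"}'
bad_escape = r'{"path": "C:\data"}'  # \d is not a valid JSON escape

for label, text in [
    ("trailing comma", trailing_comma),
    ("unquoted key", unquoted_key),
    ("bad escape", bad_escape),
]:
    print(label, "->", is_valid_json(text))
```

Catching `json.JSONDecodeError` only confirms that the text is well-formed. Checking that the expected keys and value types are present is a separate step.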

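JSON Lines is a format that stores multiple JSON objects in a single file, one object per line. Because each line is a complete JSON value, records can be appended and read one at a time, which makes the format useful for logging and streaming data.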
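As a minimal sketch of the JSON Lines layout, the example below writes one JSON object per line and then parses each line independently, using only the standard library. The records are invented for illustration, and the text is kept in memory rather than written to a file.

```python
import json

# Hypothetical log records, used only for illustration.
records = [
    {"event": "login", "user": "alice"},
    {"event": "logout", "user": "alice"},
]

# Serialize: one compact JSON object per line, newline-terminated.
jsonl_text = "\n".join(json.dumps(record) for record in records) + "\n"
print(jsonl_text, end="")

# Parse: each non-empty line is a complete JSON document on its own.
parsed = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
```

Because every line stands alone, a reader can process a large file line by line without loading it all at once, and a writer can append new records without rewriting what is already there.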

Tools you could use with JSON:

Common Python operations with JSON:

import json

# Parse JSON string
json_str = '{"name": "Alice", "age": 30}'
data = json.loads(json_str)

# Convert to JSON string
json_str = json.dumps(data, indent=2)

# Read JSON from file (JSON files are UTF-8, so set the encoding explicitly)
with open("data.json", encoding="utf-8") as f:
    data = json.load(f)

# Write JSON to file
with open("output.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2)

# Read JSON data into a pandas DataFrame.
# This works directly when the file holds an array of flat objects (one object per row).
import pandas as pd

df = pd.read_json("data.json")

# Read a JSON Lines file into a DataFrame.
# lines=True parses each line as a separate JSON object (one row per line).
df = pd.read_json("data.jsonl", lines=True)

Practice JSON skills with these resources: