Data Visualization with ChatGPT

ChatGPT and other Large Language Models (LLMs) can help create compelling data visualizations by:

  1. Finding and analyzing datasets
  2. Generating visualization code
  3. Improving visual design
  4. Creating data stories

Watch this two-hour workshop, Prompt to Plot, on creating data visualizations with ChatGPT.

Prerequisites

To follow this tutorial, you’ll need:

  • Gemini (free) - good for processing images and video
  • A ChatGPT Plus subscription ($20/month) - recommended for access to advanced models and coding capabilities
  • GitHub account - for publishing visualizations
  • Basic familiarity with HTML/CSS/JavaScript

Finding Datasets

When asking ChatGPT to recommend datasets, provide clear requirements:

  1. Size constraints (e.g., “around 10,000-100,000 rows”)
  2. Desired column types (text, numbers, categories)
  3. Target audience and story potential
  4. Any specific themes or domains of interest

Example prompt:

I need an interesting dataset for data visualization that:
- Has 10,000-100,000 rows
- Includes various column types (text, numbers, categories)
- Could tell an engaging story for a general audience
- Ideally covers [your preferred theme/domain]

Please search online and suggest datasets matching these criteria.
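
Once a dataset has been suggested, it helps to confirm that it actually meets the requirements before building on it. The following is a minimal sketch, assuming the data has already been parsed into an array of row objects. The names `inferColumnType` and `checkDataset`, and the cutoff of 20 distinct values for a category column, are illustrative choices rather than part of any library.

```javascript
// Hypothetical helper: classify a column as "number", "category", or "text".
// Assumption: a non-numeric column with at most `maxCategories` distinct
// values is treated as categorical; anything else is free text.
function inferColumnType(values, maxCategories = 20) {
  const present = values.filter((v) => v !== null && v !== undefined && v !== "");
  if (present.length === 0) return "empty";
  const allNumeric = present.every((v) => Number.isFinite(Number(v)));
  if (allNumeric) return "number";
  const distinct = new Set(present.map(String)).size;
  return distinct <= maxCategories ? "category" : "text";
}

// Check row count and column-type variety for rows given as an array of objects.
// Column names are taken from the first row.
function checkDataset(rows, { minRows = 10_000, maxRows = 100_000 } = {}) {
  const columns = rows.length > 0 ? Object.keys(rows[0]) : [];
  const types = {};
  for (const col of columns) {
    types[col] = inferColumnType(rows.map((r) => r[col]));
  }
  const kinds = new Set(Object.values(types));
  return {
    rowCount: rows.length,
    rowCountOk: rows.length >= minRows && rows.length <= maxRows,
    types,
    hasVariedTypes: ["number", "category", "text"].every((k) => kinds.has(k)),
  };
}
```

The returned `types` object can also be pasted into later prompts so the model knows what each column holds.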

Ideating Stories

Once you have a dataset, ask ChatGPT to suggest story ideas:

Given these columns in my dataset:
[List your columns]

Please suggest:
1. A dozen potential data stories
2. Target audience for each story
3. Why each story would be interesting
4. Initial approach for analysis
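
If you run this prompt for several datasets, filling in the column list by hand gets tedious. The sketch below assembles the prompt above from a list of columns. The function name `buildStoryPrompt` and the `{ name, type }` shape of each column are assumptions made for illustration.

```javascript
// Build the story-ideation prompt from a list of columns.
// Assumption: each column is described as { name, type }.
function buildStoryPrompt(columns) {
  const columnLines = columns.map((c) => `- ${c.name} (${c.type})`);
  return [
    "Given these columns in my dataset:",
    ...columnLines,
    "",
    "Please suggest:",
    "1. A dozen potential data stories",
    "2. Target audience for each story",
    "3. Why each story would be interesting",
    "4. Initial approach for analysis",
  ].join("\n");
}

// Example usage with made-up columns.
const examplePrompt = buildStoryPrompt([
  { name: "title", type: "text" },
  { name: "rating", type: "number" },
]);
console.log(examplePrompt);
```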

Analysis and Visualization

For the analysis phase, instruct ChatGPT to:

  1. Run statistical tests
  2. Filter out statistically insignificant results
  3. Create aesthetically pleasing visualizations
  4. Consider outlier handling

Example prompt:

Please analyze this dataset by:
1. Running relevant statistical tests
2. Removing statistically insignificant results
3. Creating beautiful visualizations (consider styling, colors, typography)
4. Handling outliers appropriately
5. Ensuring the visualization tells a clear story
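
It is worth knowing roughly what "handling outliers" and "removing insignificant results" can mean, so you can check what the model did. The sketch below shows one common choice for each: Tukey's interquartile-range fences for outliers, and a Bonferroni-corrected threshold for p-values. These are stand-in techniques, not necessarily what ChatGPT will pick, and the names `splitOutliers` and `keepSignificant` and the `{ name, pValue }` result shape are hypothetical.

```javascript
// Linear-interpolation quantile of an ascending-sorted numeric array.
function quantile(sorted, q) {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Split values into inliers and outliers using Tukey's fences:
// anything beyond k * IQR from the quartiles counts as an outlier
// (k = 1.5 is the conventional default).
function splitOutliers(values, k = 1.5) {
  const sorted = [...values].sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;
  const low = q1 - k * iqr;
  const high = q3 + k * iqr;
  return {
    inliers: values.filter((v) => v >= low && v <= high),
    outliers: values.filter((v) => v < low || v > high),
  };
}

// Keep only results whose p-value passes a Bonferroni-corrected threshold,
// i.e. alpha divided by the number of tests that were run.
// Assumption: each result looks like { name, pValue }.
function keepSignificant(results, alpha = 0.05) {
  const threshold = alpha / results.length;
  return results.filter((r) => r.pValue < threshold);
}
```

Flagging outliers rather than silently deleting them keeps the option of annotating them in the final chart, where they are often part of the story.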

Generating Web-Ready Code

When creating visualizations for web deployment, provide specific constraints:

Please create an HTML/JavaScript visualization that:
1. Works well on GitHub Pages
2. Keeps the data payload under 2MB
3. Handles outliers appropriately
4. Uses modern JavaScript
5. Follows good web performance practices
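
The 2MB constraint is easy to verify yourself. The following sketch measures the JSON size of a dataset and, if it is over budget, thins it by keeping every n-th row (systematic sampling). It assumes the data ships as a JSON array of row objects and treats 2MB as 2 * 1024 * 1024 bytes; `jsonByteLength` and `sampleToBudget` are illustrative names.

```javascript
// Size in bytes of a value once serialized as UTF-8 JSON.
function jsonByteLength(value) {
  return new TextEncoder().encode(JSON.stringify(value)).length;
}

// Keep every n-th row, increasing n until the JSON fits within maxBytes.
// Assumption: row order carries no periodic pattern, otherwise systematic
// sampling can bias the result and random sampling would be safer.
function sampleToBudget(rows, maxBytes = 2 * 1024 * 1024) {
  let step = 1;
  let sample = rows;
  while (sample.length > 1 && jsonByteLength(sample) > maxBytes) {
    step += 1;
    sample = rows.filter((_, i) => i % step === 0);
  }
  return sample;
}
```

Dropping unused columns or rounding long decimals before sampling often recovers a lot of space without losing any rows.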

Improving Visual Design

To enhance the visualization’s appearance, ask for specific improvements:

Please improve this visualization by:
1. Using a professional typography system
2. Implementing an appropriate color scheme
3. Adding proper spacing and layout
4. Including clear annotations and context
5. Making it feel like a professional publication (e.g., New York Times style)
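
As one concrete reading of "an appropriate color scheme", the sketch below assigns categories to the Okabe-Ito palette, a widely used set of eight colors designed to remain distinguishable for readers with color-vision deficiencies. This is only one reasonable option, and `assignColors` is a hypothetical helper name.

```javascript
// Okabe-Ito palette: orange, sky blue, bluish green, yellow, blue,
// vermillion, reddish purple, black.
const OKABE_ITO = [
  "#E69F00", "#56B4E9", "#009E73", "#F0E442",
  "#0072B2", "#D55E00", "#CC79A7", "#000000",
];

// Assign each distinct category a color, in order of first appearance.
// Colors repeat if there are more categories than palette entries, so
// consider grouping small categories into "Other" before calling this.
function assignColors(categories, palette = OKABE_ITO) {
  const colors = new Map();
  for (const category of categories) {
    if (!colors.has(category)) {
      colors.set(category, palette[colors.size % palette.length]);
    }
  }
  return colors;
}
```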

Example Projects

Here are some examples of data visualizations created using this approach:

  1. Books Visualization
  2. Books Analysis
  3. LLM Data Visualization
  4. Books Visual Story
  5. Books Data Exploration
  6. Coffee Reviews
  7. Story Visualization

Best Practices

  1. Iterate with the LLM: Don’t expect perfect results on the first try. Refine your prompts based on the outputs.

  2. Be Specific: Clearly specify your requirements for:

    • Visual style
    • Performance constraints
    • Target audience
    • Story elements
  3. Data Size: Consider GitHub Pages limitations when deploying, and keep the data payload small (for example, under the 2MB budget used in the web-ready prompt). You can:

    • Preprocess data to reduce size
    • Use data sampling techniques
    • Implement progressive loading
  4. Code Quality: Request modern, maintainable code:

    • Use ES modules
    • Implement responsive design
    • Follow web accessibility guidelines
    • Include error handling
  5. Documentation: Ask the LLM to include:

    • Clear code comments
    • Setup instructions
    • Data preprocessing steps
    • Deployment guide
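
To make "progressive loading" and "error handling" from the list above more concrete, here is a minimal sketch that loads a dataset split into several chunk files, hands each chunk to the chart as it arrives, and records failures instead of aborting. The chunk file layout, the function name `loadProgressively`, and the injected `fetchJson` parameter are all assumptions for illustration.

```javascript
// Load data chunk by chunk so the first rows can render before the rest arrive.
// `fetchJson` is injected so the sketch also runs outside a browser; in a page
// it could be (url) => fetch(url).then((res) => res.json()).
async function loadProgressively(chunkUrls, fetchJson, onChunk) {
  const failed = [];
  let loadedRows = 0;
  for (const url of chunkUrls) {
    try {
      const rows = await fetchJson(url);
      loadedRows += rows.length;
      onChunk(rows); // e.g. append the new points to the chart
    } catch (error) {
      // Record the failure and continue with the remaining chunks.
      failed.push({ url, message: error.message });
    }
  }
  return { loadedRows, failed };
}
```

A page using this could show a partial chart immediately and display a short notice if `failed` is non-empty, rather than leaving the reader with a blank screen.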
