
# Gousto Recipe Scraper

A Python script to scrape recipe data from Gousto's website and save it to a JSON file.

## Prerequisites

- Python 3.7+
- Chrome or Chromium browser (for Selenium)
- ChromeDriver (installed automatically by webdriver-manager)

## Setup

1. Clone this repository:
   ```bash
   git clone <repository-url>
   cd gousto-scraper
   ```

2. Create and activate a virtual environment:
   ```bash
   # On Linux/macOS
   python3 -m venv venv
   source venv/bin/activate

   # On Windows
   python -m venv venv
   .\venv\Scripts\activate
   ```

3. Install the required packages:
   ```bash
   pip install -r requirements.txt
   ```

## Usage

Run the scraper with the following command:

```bash
python scraper.py
```

This will:
1. Scrape recipe data from Gousto's website
2. Save the results to `gousto_recipes.json`

### Options

- `--use-selenium`: Use Selenium for JavaScript rendering (default: True)
- `--headless`: Run the browser in headless mode (default: True)
- `--max-pages`: Maximum number of recipe pages to scrape (default: all)
- `--output`: Output JSON file path (default: `gousto_recipes.json`)

Example:
```bash
python scraper.py --max-pages 5 --output recipes.json
```
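
How `scraper.py` parses these options is not shown in this README. The sketch below is one way the documented flags and defaults could be declared with `argparse`. The helper names `str_to_bool` and `build_parser` are hypothetical, and the idea that the boolean flags take an explicit value (for example `--headless false`) is an assumption, not confirmed behaviour of the script.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Interpret common true/false spellings (hypothetical helper)."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes", "y"}:
        return True
    if lowered in {"false", "0", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Declare the four options listed above with their documented defaults."""
    parser = argparse.ArgumentParser(description="Scrape Gousto recipes to JSON.")
    # Assumption: boolean options accept an explicit value such as "false".
    parser.add_argument("--use-selenium", type=str_to_bool, default=True)
    parser.add_argument("--headless", type=str_to_bool, default=True)
    # None stands in for "all pages" in this sketch.
    parser.add_argument("--max-pages", type=int, default=None)
    parser.add_argument("--output", default="gousto_recipes.json")
    return parser


# Parse the same arguments as the example command above.
args = build_parser().parse_args(["--max-pages", "5", "--output", "recipes.json"])
print(args.max_pages, args.output, args.use_selenium, args.headless)
# prints: 5 recipes.json True True
```

Options that are not passed keep their defaults, which is why the example command above only needs `--max-pages` and `--output`.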

## Output

The script saves the scraped data to a JSON file containing an array of recipe objects, each including:
- Title
- Description
- Ingredients
- Cooking time
- Nutritional information
- Other fields extracted from the recipe page
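
As a rough illustration of consuming the output, the sketch below loads a JSON array of recipe objects and prints a short summary of each. The key names `title` and `ingredients` are assumptions based on the field list above, since the exact keys are not documented here, and the sample record is a placeholder written to a temporary file so the snippet runs on its own. `load_recipes` is a hypothetical helper, not part of the scraper.

```python
import json
import tempfile
from pathlib import Path


def load_recipes(path):
    """Load the scraper output and check that it is a JSON array."""
    with open(path, encoding="utf-8") as handle:
        recipes = json.load(handle)
    if not isinstance(recipes, list):
        raise ValueError("expected a JSON array of recipe objects")
    return recipes


# Placeholder record for illustration only; real key names may differ.
sample = [{"title": "Example recipe", "ingredients": ["ingredient A", "ingredient B"]}]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "gousto_recipes.json"
    path.write_text(json.dumps(sample), encoding="utf-8")
    recipes = load_recipes(path)
    for recipe in recipes:
        # .get() avoids a KeyError if a record lacks a field.
        print(recipe.get("title", "<untitled>"), len(recipe.get("ingredients", [])))
```

To inspect real output, call `load_recipes("gousto_recipes.json")` (or whatever path was passed to `--output`) instead of using the temporary file.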

## Notes

- This script is for educational purposes only.
- Be respectful of Gousto's website: avoid sending many requests in a short period.
- The website's structure may change over time, which could break the scraper.
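
One common way to avoid sending requests too quickly is to enforce a minimum interval between them. The sketch below shows that idea with a small `Throttle` class. The class name and the interval value are hypothetical, and nothing here is taken from `scraper.py`, which may already handle pacing differently.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval_seconds):
        self.min_interval = min_interval_seconds
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


# A short interval keeps the demo quick; real scraping would use a larger one.
throttle = Throttle(min_interval_seconds=0.2)
start = time.monotonic()
for page in range(3):
    throttle.wait()
    # A real scraper would fetch one page here.
elapsed = time.monotonic() - start
print(f"3 iterations took {elapsed:.2f}s")
```

The first call returns immediately and each later call sleeps only for the time still owed, so slow page loads are not penalised twice.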