# Gousto Recipe Scraper
A Python script to scrape recipe data from Gousto's website and save it to a JSON file.
## Prerequisites
- Python 3.7+
- Chrome or Chromium browser (for Selenium)
- ChromeDriver (installed automatically by `webdriver-manager`)
## Setup
1. Clone this repository:

   ```bash
   git clone <repository-url>
   cd gousto-scraper
   ```

2. Create and activate a virtual environment:

   ```bash
   # On Linux/macOS
   python3 -m venv venv
   source venv/bin/activate

   # On Windows
   python -m venv venv
   .\venv\Scripts\activate
   ```

3. Install the required packages:

   ```bash
   pip install -r requirements.txt
   ```
## Usage
Run the scraper with the following command:
```bash
python scraper.py
```
This will:
1. Scrape recipe data from Gousto's website
2. Save the results to `gousto_recipes.json`
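
Step 1 presumably walks through Gousto's recipe listing one page at a time. A generator that stops at whichever comes first, a page cap such as `--max-pages` or an empty page, is one hedged way to implement that loop. `iter_recipe_pages` and `fetch_page` are hypothetical names, and treating an empty page as "no more results" is an assumption; how the real scraper paginates is not documented here.

```python
from itertools import count
from typing import Callable, Iterator, List, Optional


def iter_recipe_pages(
    fetch_page: Callable[[int], List[dict]],
    max_pages: Optional[int] = None,
) -> Iterator[List[dict]]:
    """Yield one list of recipes per page, stopping at max_pages or an empty page.

    fetch_page is a hypothetical callable mapping a 1-based page number to the
    recipes found on that page. max_pages=None means "no limit" (scrape all pages).
    """
    for page_number in count(1):
        if max_pages is not None and page_number > max_pages:
            break
        recipes = fetch_page(page_number)
        if not recipes:  # assumption: an empty page means there are no more results
            break
        yield recipes


# Usage with a stand-in fetcher that pretends only three pages exist
def fake_fetch(page: int) -> List[dict]:
    return [{"title": f"Recipe {page}"}] if page <= 3 else []


all_pages = list(iter_recipe_pages(fake_fetch))            # stops at the empty page 4
capped = list(iter_recipe_pages(fake_fetch, max_pages=2))  # stops after page 2
```

Passing the fetcher in as a parameter keeps the paging logic independent of whether pages are loaded through Selenium or plain HTTP, and lets it be tested without network access.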
### Options
- `--use-selenium`: Use Selenium for JavaScript rendering (default: True)
- `--headless`: Run the browser in headless mode (default: True)
- `--max-pages`: Maximum number of recipe pages to scrape (default: all pages)
- `--output`: Output JSON file path (default: `gousto_recipes.json`)

Example:
```bash
python scraper.py --max-pages 5 --output recipes.json
```
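
For readers extending the script, the options above could be declared with the standard library's `argparse` roughly as below. This is a sketch, not the real `scraper.py`: `build_parser` and `_str_to_bool` are hypothetical names, and having `--use-selenium` and `--headless` take an explicit value (for example `--headless false`) is an assumption. Some mechanism like this is needed because a plain on/off switch whose default is already `True` could never be turned off.

```python
import argparse


def _str_to_bool(value: str) -> bool:
    """Parse 'true'/'false'-style strings so a True default can be switched off."""
    lowered = value.strip().lower()
    if lowered in ("1", "true", "yes", "y", "on"):
        return True
    if lowered in ("0", "false", "no", "n", "off"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Scrape Gousto recipes to JSON.")
    # Assumption: boolean options take an explicit value, e.g. --headless false
    parser.add_argument("--use-selenium", type=_str_to_bool, default=True,
                        help="Use Selenium for JavaScript rendering")
    parser.add_argument("--headless", type=_str_to_bool, default=True,
                        help="Run the browser in headless mode")
    # None means "no limit": scrape every page
    parser.add_argument("--max-pages", type=int, default=None,
                        help="Maximum number of recipe pages to scrape")
    parser.add_argument("--output", default="gousto_recipes.json",
                        help="Output JSON file path")
    return parser
```

With this parser, the example command above yields `max_pages=5` and `output='recipes.json'`, while `use_selenium` and `headless` keep their `True` defaults.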
## Output
The script saves the scraped data to a JSON file containing an array of recipe objects, each including:
- Title
- Description
- Ingredients
- Cooking time
- Nutritional information
- And more
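
A file in this shape can be produced with the standard library's `json` module. In this minimal sketch, the `Recipe` dataclass, its field names, and `save_recipes` are assumptions chosen to mirror the list above, not the scraper's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class Recipe:
    # Hypothetical field names; the real scraper's JSON keys may differ.
    title: str
    description: str = ""
    ingredients: List[str] = field(default_factory=list)
    cooking_time: Optional[str] = None
    nutrition: Dict[str, str] = field(default_factory=dict)


def save_recipes(recipes: List[Recipe], path: str = "gousto_recipes.json") -> None:
    """Write recipes as a JSON array of objects, one object per recipe."""
    with open(path, "w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps non-ASCII characters (e.g. accented names) readable
        json.dump([asdict(r) for r in recipes], fh, indent=2, ensure_ascii=False)
```

`indent=2` keeps the file easy to read by eye; dropping it produces a smaller file with the same data.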
## Notes
- This script is for educational purposes only
- Be respectful of Gousto's website: avoid sending many requests in a short period
- The website structure might change over time, which could break the scraper
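
One simple way to follow the advice about request rate is to enforce a minimum gap between page loads. The `Throttle` class below is a hypothetical helper, not part of the scraper, and its 2-second default is an arbitrary assumption rather than a limit published by Gousto.

```python
import time
from typing import Optional


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval: float = 2.0) -> None:
        self.min_interval = min_interval  # seconds between requests (assumed value)
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Sleep just long enough that calls are at least min_interval apart."""
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

Calling `throttle.wait()` right before each page request (for example, before each Selenium `driver.get(url)` call) spaces requests at least `min_interval` seconds apart; the first call returns immediately.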