# Gousto Recipe Scraper

A Python script that scrapes recipe data from Gousto's website and saves it to a JSON file.

## Prerequisites

- Python 3.7+
- Chrome or Chromium browser (for Selenium)
- ChromeDriver (installed automatically by webdriver-manager)

## Setup

1. Clone this repository:
```bash
git clone <repository-url>
cd gousto-scraper
```

2. Create and activate a virtual environment:
```bash
# On Linux/macOS
python3 -m venv venv
source venv/bin/activate

# On Windows
python -m venv venv
.\venv\Scripts\activate
```

3. Install the required packages:
```bash
pip install -r requirements.txt
```

## Usage

Run the scraper with the following command:

```bash
python scraper.py
```

This will:
1. Scrape recipe data from Gousto's website
2. Save the results to `gousto_recipes.json`

### Options

- `--use-selenium`: Use Selenium for JavaScript rendering (default: True)
- `--headless`: Run the browser in headless mode (default: True)
- `--max-pages`: Maximum number of recipe pages to scrape (default: all)
- `--output`: Output JSON file path (default: `gousto_recipes.json`)

Example:
```bash
python scraper.py --max-pages 5 --output recipes.json
```

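The options listed above could be declared with `argparse` roughly as follows. This is a minimal sketch, not the actual code in `scraper.py`: the helper names `str_to_bool` and `build_parser` are hypothetical, and it assumes the boolean options accept an explicit `true`/`false` value, which this README does not specify.

```python
import argparse


def str_to_bool(value):
    """Interpret common true/false spellings (assumed convention, not confirmed)."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected true or false, got %r" % value)


def build_parser():
    """Build a parser mirroring the documented options and their defaults."""
    parser = argparse.ArgumentParser(description="Scrape Gousto recipes to JSON.")
    parser.add_argument("--use-selenium", type=str_to_bool, default=True,
                        help="use Selenium for JavaScript rendering")
    parser.add_argument("--headless", type=str_to_bool, default=True,
                        help="run the browser in headless mode")
    # None stands for "no limit", matching the documented default of all pages.
    parser.add_argument("--max-pages", type=int, default=None,
                        help="maximum number of recipe pages to scrape")
    parser.add_argument("--output", default="gousto_recipes.json",
                        help="output JSON file path")
    return parser


# Parse the same arguments as the example command above.
args = build_parser().parse_args(["--max-pages", "5", "--output", "recipes.json"])
print(args.max_pages, args.output, args.use_selenium, args.headless)
```

Using `None` for `--max-pages` keeps "scrape everything" distinct from any numeric limit.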
## Output

The script saves the scraped data to a JSON file containing an array of recipe objects, each including:
- Title
- Description
- Ingredients
- Cooking time
- Nutritional information
- And more

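Because the exact key names in the output are not listed here, a quick way to explore the file is to load it and collect the keys that actually occur. The sketch below assumes only what is stated above, namely that the file holds a JSON array of objects; `load_recipes` and `field_names` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_recipes(path):
    """Load the scraper output and check that it is a JSON array."""
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of recipes")
    # Keep only JSON objects; anything else is skipped.
    return [item for item in data if isinstance(item, dict)]


def field_names(recipes):
    """Collect every key that appears in at least one recipe object."""
    names = set()
    for recipe in recipes:
        names.update(recipe)
    return sorted(names)


# Only inspect the default output file if the scraper has already produced it.
output_path = Path("gousto_recipes.json")
if output_path.exists():
    recipes = load_recipes(output_path)
    print(len(recipes), "recipes")
    print(field_names(recipes))
```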
## Notes

- This script is for educational purposes only
- Be respectful of Gousto's website: avoid sending many requests in a short period
- The website structure may change over time, which could break the scraper
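One way to follow the request-rate advice above is to enforce a minimum delay between page fetches. The sketch below is illustrative and is not taken from `scraper.py`: the `Throttle` class and the interval value are assumptions, and the fetch itself is left as a placeholder.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        """Sleep just long enough that calls are at least min_interval apart."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Short interval so the demo finishes quickly; a real run would use seconds.
throttle = Throttle(min_interval=0.05)
for page in range(3):
    throttle.wait()
    print("would fetch page", page)  # placeholder for the actual request
```

The clock and sleep functions are injectable so the delay logic can be checked without waiting in real time.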