# Gousto Recipe Scraper

A Python script that scrapes recipe data from Gousto's website and saves it to a JSON file.

## Prerequisites

- Python 3.7+
- Chrome or Chromium browser (used by Selenium)
- ChromeDriver (installed automatically by `webdriver-manager`)
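
The prerequisites rely on Selenium driving Chrome, with `webdriver-manager` supplying ChromeDriver. What follows is a sketch of how that wiring can look, assuming Selenium 4 and the `webdriver-manager` package. It is not taken from `scraper.py`, and the helper names `chrome_arguments` and `make_driver`, as well as the window size, are hypothetical.

```python
# Hypothetical sketch: start Chrome via Selenium 4 with a ChromeDriver
# fetched by webdriver-manager. Not taken from scraper.py.
from typing import List


def chrome_arguments(headless: bool = True) -> List[str]:
    """Return Chrome command-line switches for the requested mode."""
    # Fixed viewport (assumed value) so pages don't fall back to a small-screen layout.
    args = ["--window-size=1920,1080"]
    if headless:
        # "--headless=new" selects the newer headless mode (Chrome 109+);
        # older Chrome builds use plain "--headless".
        args.append("--headless=new")
    return args


def make_driver(headless: bool = True):
    """Create a Chrome WebDriver. Imports are local so this module loads without Selenium."""
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)
    # install() downloads a matching ChromeDriver if needed and returns its path.
    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=options)
```

Calling `make_driver(headless=True)` should return a driver that can load a page with `driver.get(...)`; close it with `driver.quit()` when finished so no Chrome process is left running.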

## Setup

1. Clone this repository:

   ```bash
   git clone <repository-url>
   cd gousto-scraper
   ```

2. Create and activate a virtual environment:

   ```bash
   # On Linux/macOS
   python3 -m venv venv
   source venv/bin/activate

   # On Windows
   python -m venv venv
   .\venv\Scripts\activate
   ```

3. Install the required packages:

   ```bash
   pip install -r requirements.txt
   ```
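
Once the setup steps are done, you can confirm that the interpreter meets the Python 3.7+ requirement and that the virtual environment is active. This is a hypothetical helper (`meets_min_version`, `in_virtualenv`) that is not part of the repository. It relies on the fact that inside a `venv`, `sys.prefix` differs from `sys.base_prefix`.

```python
# Hypothetical helper, not part of this repository: check the interpreter
# version and whether a virtual environment is active.
import sys
from typing import Sequence

MIN_PYTHON = (3, 7)  # minimum version from the Prerequisites section


def meets_min_version(version: Sequence[int]) -> bool:
    """True if (major, minor) is at least MIN_PYTHON."""
    return tuple(version[:2]) >= MIN_PYTHON


def in_virtualenv() -> bool:
    """True inside a venv, where sys.prefix points at the venv rather than the base install."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Python version OK:", meets_min_version(sys.version_info))
    print("Virtual environment active:", in_virtualenv())
```

Saved under a name such as `check_env.py` (hypothetical), it can be run with `python check_env.py` from the activated environment.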

## Usage

Run the scraper with the following command:

```bash
python scraper.py
```

This will:

1. Scrape recipe data from Gousto's website
2. Save the results to `gousto_recipes.json`

### Options

- `--use-selenium` (default: `True`): Use Selenium for JavaScript rendering
- `--headless` (default: `True`): Run the browser in headless mode
- `--max-pages` (default: all): Maximum number of recipe pages to scrape
- `--output` (default: `gousto_recipes.json`): Output JSON file path

Example:

```bash
python scraper.py --max-pages 5 --output recipes.json
```
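
The options map naturally onto Python's `argparse`. This is a sketch of how such a parser could be declared, not the script's actual code. The README does not say how the two boolean flags are switched off, so the `str_to_bool` converter (which accepts values like `--headless false`) is an assumption, as are the helper names.

```python
# Hypothetical argparse setup mirroring the documented flags;
# scraper.py's real parser may differ.
import argparse
from typing import List, Optional


def str_to_bool(value: str) -> bool:
    """Accept common spellings of true/false so boolean flags can be switched off."""
    lowered = value.strip().lower()
    if lowered in {"1", "true", "yes", "y", "on"}:
        return True
    if lowered in {"0", "false", "no", "n", "off"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Scrape Gousto recipes to JSON.")
    # nargs="?" with const=True: "--headless" alone means True, "--headless false" means False.
    parser.add_argument("--use-selenium", type=str_to_bool, nargs="?", const=True,
                        default=True, help="use Selenium for JavaScript rendering")
    parser.add_argument("--headless", type=str_to_bool, nargs="?", const=True,
                        default=True, help="run the browser in headless mode")
    # None stands for "no limit", i.e. scrape all pages.
    parser.add_argument("--max-pages", type=int, default=None,
                        help="maximum number of recipe pages to scrape")
    parser.add_argument("--output", default="gousto_recipes.json",
                        help="output JSON file path")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)
```

`argparse.BooleanOptionalAction` would provide `--no-headless` automatically, but it was added in Python 3.9, newer than the 3.7 minimum listed in the prerequisites; an explicit converter keeps the sketch 3.7-compatible.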

## Output

The script saves the scraped data to a JSON file containing an array of recipe objects. Each object includes:

- Title
- Description
- Ingredients
- Cooking time
- Nutritional information
- Other recipe details
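
The README lists the fields but not their JSON keys. Assuming snake_case keys such as `title`, `cooking_time`, and `nutrition` (all hypothetical), a record could be modelled and round-tripped like this:

```python
# Hypothetical record shape for gousto_recipes.json; the real key names may differ.
import json
from dataclasses import asdict, dataclass, field, fields
from typing import Dict, List, Optional


@dataclass
class Recipe:
    title: str
    description: str = ""
    ingredients: List[str] = field(default_factory=list)
    cooking_time: Optional[str] = None  # stored as text, e.g. "30 mins" (assumed format)
    nutrition: Dict[str, str] = field(default_factory=dict)


def save_recipes(recipes: List[Recipe], path: str) -> None:
    """Write the recipes as a JSON array of objects."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump([asdict(r) for r in recipes], fh, ensure_ascii=False, indent=2)


def load_recipes(path: str) -> List[Recipe]:
    """Read the JSON array back, ignoring keys that Recipe does not define."""
    known = {f.name for f in fields(Recipe)}
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    return [Recipe(**{k: v for k, v in item.items() if k in known}) for item in data]
```

Unknown keys are dropped on load, so extra fields beyond the listed ones do not break parsing.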

## Notes

- This script is for educational purposes only.
- Be respectful of Gousto's website: don't send many requests in a short period.
- The website's structure may change over time, which could break the scraper.
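
One simple way to keep the request rate low is to enforce a minimum delay between page loads. This `Throttle` class is a hypothetical helper, and the 2-second interval in the usage note is an arbitrary example, not a limit published by Gousto.

```python
# Hypothetical helper: enforce a minimum gap between consecutive requests.
import time
from typing import Callable, Optional


class Throttle:
    """Block until at least `min_interval` seconds have passed since the previous call."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Create `throttle = Throttle(2.0)` once and call `throttle.wait()` before each `driver.get(url)`. The clock and sleep functions are injectable so the timing logic can be tested without actually waiting.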