Init commit with working script

2025-08-17 11:44:25 +01:00
commit e570dfe1dc
4 changed files with 807 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,73 @@
+# Gousto Recipe Scraper
+
+A Python script to scrape recipe data from Gousto's website and save it to a JSON file.
+
+## Prerequisites
+
+- Python 3.7+
+- Chrome or Chromium browser (for Selenium)
+- ChromeDriver (installed automatically by webdriver-manager)
+
+## Setup
+
+1. Clone this repository:
+   ```bash
+   git clone <repository-url>
+   cd gousto-scraper
+   ```
+
+2. Create and activate a virtual environment:
+   ```bash
+   # On Linux/macOS
+   python3 -m venv venv
+   source venv/bin/activate
+
+   # On Windows
+   python -m venv venv
+   .\venv\Scripts\activate
+   ```
+
+3. Install the required packages:
+   ```bash
+   pip install -r requirements.txt
+   ```
+
+## Usage
+
+Run the scraper with the following command:
+
+```bash
+python scraper.py
+```
+
+This will:
+1. Scrape recipe data from Gousto's website
+2. Save the results to `gousto_recipes.json`
+
+### Options
+
+- `--use-selenium`: Use Selenium for JavaScript rendering (default: True)
+- `--headless`: Run the browser in headless mode (default: True)
+- `--max-pages`: Maximum number of recipe pages to scrape (default: all)
+- `--output`: Output JSON file path (default: `gousto_recipes.json`)
+
+Example:
+```bash
+python scraper.py --max-pages 5 --output recipes.json
+```
+
+## Output
+
+The script saves the scraped data to a JSON file containing an array of recipe objects, each including:
+- Title
+- Description
+- Ingredients
+- Cooking time
+- Nutritional information
+- And more
+
+## Notes
+
+- This script is for educational purposes only
+- Be respectful of Gousto's website: avoid sending many requests in a short period
+- The website structure might change over time, which could break the scraper
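
The README's note about not making too many requests in a short period could be honored with a small throttling helper. The following is a minimal sketch, not code from this commit: `throttled_map`, `min_interval`, and the injectable `clock`/`sleep` parameters are hypothetical names introduced for illustration, and the 2-second default is an arbitrary assumption rather than a limit published by Gousto.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def throttled_map(
    func: Callable[[T], R],
    items: Iterable[T],
    min_interval: float = 2.0,  # assumed delay in seconds; tune as appropriate
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[R]:
    """Yield func(item) for each item, starting calls at least min_interval apart."""
    last_start: Optional[float] = None
    for item in items:
        if last_start is not None:
            # Sleep only for the part of the interval that has not yet elapsed.
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        yield func(item)
```

A scraper loop could then wrap its per-page fetch, for example `for recipe in throttled_map(fetch_recipe, urls): ...`, where `fetch_recipe` stands in for whatever function the script uses to load a single page. Passing `clock` and `sleep` as parameters keeps the helper testable without real delays.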