# **FOREDF** - **FOR**ensic **E**mail fetcher and PDF analyzer using peep**DF**

**FOREDF** is a forensic-oriented tool designed for the analysis of PDF documents within a fully Dockerized environment.
By default, all operations run under a **non-root user** to ensure safer and more controlled execution.

The tool supports two primary workflows:

1. **Email fetching** – FOREDF can fetch emails from any IMAP mailbox, including PEC (Italian certified email) accounts.
   - If emails contain PDF attachments, these are automatically downloaded.
   - Along with the PDF, the tool also extracts and saves the email metadata and headers in a `.json` file.

2. **Direct PDF analysis** – Users may also place their own PDF files inside the `pdfs/` folder.
   - As a safeguard, FOREDF always creates and works on **forensic copies** of these files to preserve integrity.

Once PDFs are available, they can be analyzed using [peepdf-3](https://github.com/digitalsleuth/peepdf-3), a Python 3–optimized fork of the original [peepdf](https://github.com/jesparza/peepdf).
Both versions offer the same functionality, but the original tool was written in Python 2, which is no longer maintained.

## Extended Functionality

FOREDF extends this Python 3 version with additional forensic features:

### 1. Embedded File Analysis (`embedded_analysis`)
When a PDF contains embedded files, the tool inspects them using several techniques:

- **Shannon Entropy Calculation**
  - Entropy is computed as $H(X) = -\sum_{i=0}^{255} p_i \log_2 p_i$, where $p_i$ is the probability of byte value $i$ in the file.
  - High entropy (close to 8) suggests compression, encryption or obfuscation.
  - Low entropy suggests predictable or plain content (e.g., text, simple images).

- **File Hash Computation** 
  - Generates **MD5, SHA1, and SHA256** hashes for each embedded file.
  - These hashes serve as digital fingerprints, enabling quick comparison against malware databases or services like **VirusTotal**.
  - If a VirusTotal API key is provided, the **MD5 hash** is automatically submitted to VirusTotal to retrieve detection statistics.

- **YARA Checks**
  - Embedded files are scanned using pre-defined [YARA](https://yara.readthedocs.io/en/stable/) rules (`rules.yar`).
  - Example rules include:
    - `contains_pe_file`: Detects Windows executables via the `MZ` header.
    - `detect_activemime`: Detects ActiveMime objects (common in malicious email attachments).
    - `detect_embedded_excel` and `detect_embedded_word`: Identify embedded Microsoft Office documents.
    - `detect_embedded_mht`: Detects archived HTML content (often used in phishing).
    - `detect_embedded_pdf`: Detects nested PDFs used for obfuscation.

- **MIME Type Verification**
  - The file’s actual MIME type is checked via its magic number.
  - This is compared against the declared file extension to flag mismatches (e.g., a file named `image.jpg` that is actually an executable).
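
The entropy, hashing, and MIME checks above can be approximated with the Python standard library alone. The sketch below is illustrative and is not FOREDF's own code: the function names and the two small lookup tables are assumptions made for this example, and a real magic-number check would rely on a complete signature database such as libmagic.

```python
import hashlib
import math
import os
from collections import Counter
from typing import Dict, Optional

# Hypothetical, deliberately tiny lookup tables for this sketch.
MAGIC_PREFIXES = {
    b"MZ": "application/x-dosexec",
    b"%PDF-": "application/pdf",
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
}
EXTENSION_TYPES = {
    ".exe": "application/x-dosexec",
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
}


def shannon_entropy(data: bytes) -> float:
    """Return the entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # p * log2(1/p) is the same term as -p * log2(p).
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def file_hashes(data: bytes) -> Dict[str, str]:
    """Return MD5, SHA1, and SHA256 hex digests of data."""
    return {name: hashlib.new(name, data).hexdigest()
            for name in ("md5", "sha1", "sha256")}


def sniff_type(data: bytes) -> Optional[str]:
    """Guess a MIME type from the leading bytes, or None if unknown."""
    for prefix, mime in MAGIC_PREFIXES.items():
        if data.startswith(prefix):
            return mime
    return None


def extension_mismatch(filename: str, data: bytes) -> bool:
    """True when the magic number contradicts the declared extension."""
    declared = EXTENSION_TYPES.get(os.path.splitext(filename)[1].lower())
    actual = sniff_type(data)
    return declared is not None and actual is not None and declared != actual
```

With these helpers, a file named `image.jpg` whose content starts with `MZ` is flagged by `extension_mismatch`, while a buffer containing every byte value exactly once reaches the maximum entropy of 8 bits per byte.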

### 2. Digital Signature Checks (`check_signatures`)
- Determines whether the PDF is **digitally signed**.
- If signed, it extracts the metadata of the signature(s) and any embedded **X.509 certificate**.
- Each signature is verified, and the tool highlights:
  - Objects properly covered by the signature(s).
  - Objects left unsigned (potential signs of tampering).
- In case of multiple signatures, a **summary table** is provided, showing which signer covered which objects, and highlighting suspicious unsigned objects.
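
One common way to spot content added after signing is to inspect each signature's `/ByteRange` array, which lists the byte spans the signature covers. The sketch below is a byte-level approximation under that assumption; it does not reproduce the object-level mapping reported by `check_signatures`, it does not validate the signature cryptographically, and the function name is hypothetical.

```python
import re

# /ByteRange [offset1 length1 offset2 length2] marks the two byte spans that a
# signature covers; the gap between them holds the signature value itself.
BYTE_RANGE_RE = re.compile(
    rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]"
)


def signature_coverage(pdf_bytes: bytes) -> list:
    """Return one dict per /ByteRange entry found in the raw PDF bytes."""
    results = []
    for match in BYTE_RANGE_RE.finditer(pdf_bytes):
        off1, len1, off2, len2 = (int(group) for group in match.groups())
        signed_end = off2 + len2
        results.append({
            "byte_range": (off1, len1, off2, len2),
            "signed_end": signed_end,
            # Bytes located after the signed span, for example data appended
            # by a later incremental update.
            "unsigned_trailing_bytes": max(0, len(pdf_bytes) - signed_end),
        })
    return results
```

A non-zero `unsigned_trailing_bytes` value is not proof of tampering, since a later legitimate signature may cover those bytes, but it marks a region worth inspecting.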

## Reporting

After analysis, FOREDF generates an **automatic forensic report** in `.txt` format, which includes:

- **Email headers and metadata** (from the `.json` files created during fetching) with related hashes.
- **Forensic copy hash verification** (ensuring the PDF was not modified during analysis).
- **Full peepdf analysis log**.

The plain-text format keeps the report readable and easy to include in larger forensic reports.

Additionally, for each report generated:
- **MD5** and **SHA256** hashes of the `.txt` report itself are printed.
- Recording these hashes lets anyone verify later that the report has not been altered since it was generated.
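
To validate a report later, its hash can be recomputed and compared with the value printed at generation time. This is a minimal sketch using only the standard library; the function names are hypothetical and are not part of FOREDF.

```python
import hashlib
import hmac


def hash_file(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks so large reports need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def report_matches(path, recorded_hex, algorithm="sha256"):
    """Compare the current hash of a report with a previously recorded one."""
    expected = recorded_hex.strip().lower()
    return hmac.compare_digest(hash_file(path, algorithm), expected)
```

Calling `report_matches("report.txt", recorded_sha256)` returns `True` only if the file is byte-for-byte identical to the one that was hashed originally; the file name here is a placeholder.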


## 1. Setup

1. **Clone the repository**

```bash
git clone https://github.com/emfourem/foredf.git
cd foredf
```

2. **Copy the template environment file**

```bash
cp template-env .env
```

3. **Edit `.env` with your credentials**

Open the `.env` file in the project root:

```bash
nano .env
```

Example `.env` contents:

```ini
IMAP_HOST=imap.example.com
IMAP_USER=your_email@example.com
IMAP_PASS=your_app_password   # Application-specific password
```

> ⚠️ **Important:** The `.env` file contains sensitive data and must **never be committed** to version control.

---

**Gmail Specific Configuration**

If you are using Gmail, you need to configure your account as follows:

1. **Enable IMAP** in Gmail settings:
   *Go to* `Settings → Forwarding and POP/IMAP → Enable IMAP`

2. **Enable 2-Step Verification**:
   https://myaccount.google.com/security

3. **Generate an App Password**:
   - Navigate to https://myaccount.google.com/apppasswords
   - Choose **Mail** as the app and **Other (e.g., PythonIMAP)** as the device
   - Copy the generated 16-character app password

4. **Update `.env`** with the app password:

   ```ini
   IMAP_HOST=imap.gmail.com
   IMAP_USER=your_email@gmail.com
   IMAP_PASS=xxxx xxxx xxxx xxxx   # 16-character Gmail app password
   ```
---

4. **Create a folder named `pdfs`** to hold both the PDFs you want to analyze and the PDFs downloaded automatically:

```bash
mkdir pdfs
```

5. **Build the Docker container**:

```bash
docker compose build
# or
docker-compose build
```

6. **Run the Docker container**:

```bash
docker compose run --rm foredf
# or
docker-compose run --rm foredf
```

7. The container automatically loads environment variables from `.env`.
   You can now execute scripts inside the container.
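
Inside the container, a script can read the IMAP settings straight from the environment. The sketch below shows one way to do this and to fail early when a variable is missing; the `ImapSettings` class and `load_imap_settings` function are hypothetical names, not FOREDF's actual code.

```python
import os
from dataclasses import dataclass

REQUIRED_KEYS = ("IMAP_HOST", "IMAP_USER", "IMAP_PASS")


@dataclass(frozen=True)
class ImapSettings:
    host: str
    user: str
    password: str


def load_imap_settings(env=None):
    """Build settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return ImapSettings(env["IMAP_HOST"], env["IMAP_USER"], env["IMAP_PASS"])
```

The resulting settings could then be passed to the standard library, for example `imaplib.IMAP4_SSL(settings.host)` followed by `login(settings.user, settings.password)`.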

---

## 2. Running the Scripts

### Email Fetcher

The `fetch_email` command wraps the `fetch_email.py` script to download and process emails.

- PDFs are saved in the folder specified in `OUTPUT_DIR`.
- JSON files (headers + metadata) are saved in the folder specified in `FORENSIC_DIR`.
- Nested `.eml` and PEC `.p7m` attachments are automatically processed and extracted.
- **Forensic Safety:** All downloaded PDFs and JSON metadata are **saved in read-only mode** to prevent accidental modification.
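
The behavior described above can be sketched with Python's `email` package. This is an illustration rather than the code in `fetch_email.py`: the function name and the JSON layout are assumptions, nested `.eml` and `.p7m` handling is omitted, and the message must have been parsed with `policy=email.policy.default` so that `iter_attachments()` is available.

```python
import json
import os
from email.message import EmailMessage


def save_pdf_attachments(message, output_dir, forensic_dir):
    """Save PDF attachments and a JSON copy of the headers, both read-only."""
    os.makedirs(output_dir, exist_ok=True)
    os.makedirs(forensic_dir, exist_ok=True)
    saved = []
    for part in message.iter_attachments():
        if part.get_content_type() != "application/pdf":
            continue
        # basename() drops any directory components from the declared name.
        name = os.path.basename(part.get_filename() or "attachment.pdf")
        pdf_path = os.path.join(output_dir, name)
        with open(pdf_path, "wb") as handle:
            handle.write(part.get_payload(decode=True))
        json_path = os.path.join(forensic_dir, name + ".json")
        headers = [[key, str(value)] for key, value in message.items()]
        with open(json_path, "w", encoding="utf-8") as handle:
            json.dump({"headers": headers}, handle, indent=2)
        for path in (pdf_path, json_path):
            os.chmod(path, 0o444)  # read-only for owner, group, and others
        saved.append(pdf_path)
    return saved
```

Storing headers as a list of pairs rather than a dictionary preserves repeated headers such as `Received`, which matter when reconstructing a message's delivery path.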

#### Usage

```bash
fetch_email [options]
```

#### Options

- `-s, --subject`
  Filter emails by subject (case-insensitive).

- `-f, --from`
  Filter emails by sender email address.

- `-n <int>`
  Number of most recent emails to process (default: `10`).

#### Examples

```bash
# Fetch the last 5 emails containing "bill" in the subject
fetch_email --subject "bill" -n 5

# Fetch emails from a specific PEC sender (default limit: the last 10)
fetch_email --from "pec@domain.it"

# Fetch the last 3 emails from a sender with subject filter
fetch_email -f "pec@domain.it" -s "bill" -n 3
```
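
For readers who want to see how the documented options fit together, they map naturally onto `argparse`. The sketch below is an assumption about the shape of such a parser, not the actual contents of `fetch_email.py`; the `sender` and `count` attribute names and the `subject_matches` helper are hypothetical.

```python
import argparse


def build_parser():
    """Parser mirroring the documented fetch_email options."""
    parser = argparse.ArgumentParser(
        prog="fetch_email",
        description="Download PDF attachments and header metadata from a mailbox.",
    )
    parser.add_argument("-s", "--subject",
                        help="filter emails by subject (case-insensitive)")
    # "from" is a Python keyword, so the value is stored as args.sender.
    parser.add_argument("-f", "--from", dest="sender",
                        help="filter emails by sender email address")
    parser.add_argument("-n", dest="count", type=int, default=10,
                        help="number of most recent emails to process")
    return parser


def subject_matches(subject, needle):
    """Case-insensitive substring match; no filter matches everything."""
    return needle is None or needle.casefold() in subject.casefold()
```

Parsing `["-f", "pec@domain.it", "-s", "bill", "-n", "3"]` with this parser yields `sender="pec@domain.it"`, `subject="bill"`, and `count=3`.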


### PDF Analyzer (peepdf)

The `peepdf` command wraps the `peepdf.py` script to perform forensic PDF analysis.

```bash
peepdf [-h] | <pdf_filename>
```

#### Features

- Creates a **forensic copy** of the PDF (files in `pdfs/` are copied into `pdfs/forensic_copy/` before analysis).
- Supports only two modes:
  - `-h` → show help.  
  - `<pdf_filename>` → open PDF in **interactive console** mode (`-i`).  
- Computes and stores **MD5** and **SHA256** hashes of the forensic copy.  
  - Hashes are verified again after the analysis to ensure no modification occurs.  
- Captures the entire **analysis log** for later reporting.  
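
The copy-then-verify workflow listed above can be sketched in a few lines. The function names are hypothetical and the code is a simplified illustration, not the wrapper's actual implementation.

```python
import hashlib
import os
import shutil


def compute_digests(path):
    """Return the MD5 and SHA256 hex digests of a file."""
    with open(path, "rb") as handle:
        data = handle.read()
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def make_forensic_copy(pdf_path, copy_dir_name="forensic_copy"):
    """Copy a PDF into a sibling forensic_copy/ folder and hash the copy."""
    source_dir = os.path.dirname(os.path.abspath(pdf_path))
    copy_dir = os.path.join(source_dir, copy_dir_name)
    os.makedirs(copy_dir, exist_ok=True)
    copy_path = os.path.join(copy_dir, os.path.basename(pdf_path))
    shutil.copy2(pdf_path, copy_path)  # copy2 also preserves timestamps
    return copy_path, compute_digests(copy_path)


def verify_unchanged(copy_path, recorded_digests):
    """Re-hash the copy after analysis and compare with the recorded values."""
    return compute_digests(copy_path) == recorded_digests
```

Hashing the copy before the analysis and again afterwards makes any modification during the session detectable, because changing even a single byte changes both digests.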

#### Interactive Console

Once inside the **PPDF interactive console**, you can:  

- Type `help` to list available commands.  
- Type `help <command>` to display detailed help for a specific command.  

#### VirusTotal Integration

If you have a **VirusTotal API Key**, you can use the `vtcheck` command inside the interactive console.  
First, set your key in the console:

```bash
set vt_key YOUR_API_KEY
```

Then run:

```bash
vtcheck
```

> ⚠️ Only the **MD5 hash** of the file is sent to VirusTotal; the PDF content itself is never uploaded.
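
For context, a hash-only lookup against the public VirusTotal API v3 can be sketched as follows. This is not necessarily how `vtcheck` is implemented, and the function names are hypothetical; the endpoint, the `x-apikey` header, and the `last_analysis_stats` field belong to the v3 API.

```python
import json
import urllib.request

VT_FILE_URL = "https://www.virustotal.com/api/v3/files/{file_hash}"


def build_vt_request(file_hash, api_key):
    """Build a GET request that carries only the hash and the API key."""
    return urllib.request.Request(
        VT_FILE_URL.format(file_hash=file_hash),
        headers={"x-apikey": api_key, "Accept": "application/json"},
    )


def detection_stats(report):
    """Extract the per-verdict engine counts from a v3 file report."""
    return report["data"]["attributes"]["last_analysis_stats"]


def lookup_hash(file_hash, api_key, timeout=30):
    """Query VirusTotal; an unknown hash raises urllib.error.HTTPError (404)."""
    request = build_vt_request(file_hash, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return detection_stats(json.load(response))
```

Because the request contains only a digest, a hash that VirusTotal has never seen returns no report at all, which is different from a report with zero detections.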

#### Example

```bash
peepdf "Welcome Letter.pdf"
```


---

## Notes

- Keep `template-env` in the repository as a **safe reference template**.  
- The `.env` file is **local-only** and must remain in `.gitignore` to protect credentials.  
- Bind mounts in `docker-compose.yml` ensure that your local `pdfs/` folder is accessible inside the container.  
- Wrapper scripts (`fetch_email` and `peepdf`) let you run commands from anywhere without manually changing directories.  
- FOREDF enforces **forensic safety inside the container** by making downloaded files read-only. The bind mount still exposes these files to the host, so **do not modify them from the host**. Only add new files to `pdfs/` deliberately; the tool will process and preserve them.

## License

This project is distributed under the terms of the
[GNU General Public License v3.0](https://www.gnu.org/licenses/gpl-3.0.html).
See the [LICENSE](./LICENSE) file for details.

## Credits

This project integrates and further extends functionality from
[peepdf](https://github.com/jesparza/peepdf) by **Jose Miguel Esparza**, and
its enhanced Python 3 version [peepdf-3](https://github.com/digitalsleuth/peepdf-3) by **Digital Sleuth**.
Both projects are licensed under the **GNU General Public License v3.0 (GPLv3)**.
