My paperless setup

There’s just too much paper in my life.

Papers from doctor appointments five years ago. Old receipts. Warranties.

None of it is searchable, and my folders just keep growing.

I don’t want to put all this data into Google Drive or some other non-encrypted provider. Ideally, I also don’t want to pay.

Luckily, both are possible. Here’s my setup.

0. Hardware

1. Paperless

The docs for paperless-ngx are well written and easy to read. I basically just followed their setup tutorial.

I run Paperless as a Docker container, using the stock Docker Compose file taken almost directly from the docs.

My local directory /home/dt/paperless-ngx/consume is used to get scans into Paperless. Whatever is dropped there is consumed and added to the Paperless database.
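If other scripts also drop files into the consume directory, one defensive pattern is to write the file somewhere else first and then move it in, so the watcher never sees a half-written file. The sketch below is only that, a sketch: the function name drop_into_consume and the staging directory are hypothetical, it assumes the staging directory is on the same filesystem as the consume directory (so mv is a plain rename), and Paperless may well cope with slow writes on its own.

```bash
#!/bin/bash

# Hypothetical helper, not part of the original setup.
# Copies a file into a staging directory first, then renames it into the
# consume directory. Assumes both directories are on the same filesystem,
# where mv is a rename rather than a second copy.
drop_into_consume() {
  local src="$1" staging_dir="$2" consume_dir="$3"
  local name
  name="$(basename "$src")"
  cp -- "$src" "$staging_dir/$name" &&
    mv -- "$staging_dir/$name" "$consume_dir/$name"
}
```

Usage would look like drop_into_consume invoice.pdf /home/dt/paperless-ngx/staging /home/dt/paperless-ngx/consume, where the staging path is made up for illustration.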

2. Scanning script

I couldn’t figure out an easy way to set up the “scan-to-shared-folder” function. Even in HP’s web UI, they say that it cannot be enabled. Given that it’s HP, I’m guessing it’s intentional and I should probably pay them $500 a month just to be able to connect via Samba.

No matter, there are other tools.

I basically just installed hplip and sane, then ran the hp-setup command.
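Package names differ between distributions, so rather than guess an install command, here is a small sketch that only checks whether the commands this setup relies on are on the $PATH. The function name missing_tools is made up for illustration.

```bash
#!/bin/bash

# Print the name of every command in the argument list that cannot be found
# on $PATH. Prints nothing when all of them are available.
missing_tools() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Example: missing_tools hp-setup hp-scan img2pdf
```

If the example prints nothing, everything the scanning script needs is installed.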

After some negotiations with the scanner around the flags, I managed to get scans going with this command:

hp-scan --device="hpaio:/net/HP_OfficeJet_Pro_8020_series?ip=192.168.8.175" --mode=gray --res=300 --size=a4 --output=/tmp/test-scan.png

This works for single-page scans. I also added duplex and multi-page options, which use img2pdf to merge the scanned pages into a single PDF.

Here’s the full script:
#!/bin/bash

# Scanner URI as set up through hplip, and the directory Paperless watches for new files.
DEVICE="hpaio:/net/HP_OfficeJet_Pro_8020_series?ip=192.168.8.175"
CONSUME="/home/dt/paperless-ngx/consume"

usage() {
  echo "Usage: scan <mode>"
  echo ""
  echo "Modes:"
  echo "  single   Scan one side, save to Paperless consume"
  echo "  duplex   Scan two sides, merge to PDF, save to Paperless consume"
  echo "  multi    Scan arbitrary pages one by one, merge to PDF when done"
  echo ""
  echo "Examples:"
  echo "  scan single"
  echo "  scan duplex"
  echo "  scan multi"
}

# Scan one A4 page in grayscale at 300 dpi and write it to the given path.
scan_page() {
  local output="$1"
  hp-scan --device="$DEVICE" --mode=gray --res=300 --size=a4 --output="$output"
}

case "$1" in
  single)
    OUTPUT="$CONSUME/scan-$(date +%Y%m%d-%H%M%S).png"
    scan_page "$OUTPUT"
    echo "Saved: $OUTPUT"
    ;;
  duplex)
    BASENAME="/tmp/scan-$(date +%Y%m%d-%H%M%S)"
    OUTPUT="$CONSUME/scan-$(date +%Y%m%d-%H%M%S).pdf"
    echo "Place pages in ADF or on flatbed. Press enter to scan side 1..."
    read
    scan_page "${BASENAME}-p1.png"
    echo "Flip the stack. Press enter to scan side 2..."
    read
    scan_page "${BASENAME}-p2.png"
    img2pdf "${BASENAME}-p1.png" "${BASENAME}-p2.png" -o "$OUTPUT"
    rm "${BASENAME}-p1.png" "${BASENAME}-p2.png"
    echo "Saved: $OUTPUT"
    ;;
  multi)
    BASENAME="/tmp/scan-$(date +%Y%m%d-%H%M%S)"
    OUTPUT="$CONSUME/scan-$(date +%Y%m%d-%H%M%S).pdf"
    PAGES=()
    PAGE=1
    while true; do
      echo "Page $PAGE: place on scanner, press enter to scan (or type 'done' to finish)..."
      read INPUT
      if [[ "$INPUT" == "done" ]]; then
        break
      fi
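      PAGEFILE="${BASENAME}-p${PAGE}.png"
      # Only keep the page if the scan succeeded; otherwise retry the same page number.
      if ! scan_page "$PAGEFILE"; then
        echo "Scan failed, try page $PAGE again."
        continue
      fi
      PAGES+=("$PAGEFILE")
      echo "Page $PAGE scanned."
      ((PAGE++))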
    done
    if [[ ${#PAGES[@]} -eq 0 ]]; then
      echo "No pages scanned, aborting."
      exit 1
    fi
    img2pdf "${PAGES[@]}" -o "$OUTPUT"
    rm "${PAGES[@]}"
    echo "Saved: $OUTPUT (${#PAGES[@]} pages)"
    ;;
  *)
    usage
    ;;
esac
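
The page images in /tmp are removed only at the end of a run, so a run that is interrupted partway can leave them behind. One way to tidy that up, sketched here with a hypothetical function called make_scratch_dir, is to create a scratch directory with mktemp -d and register an EXIT trap that deletes it. This is not what the script above does, just an option.

```bash
#!/bin/bash

# Hypothetical helper: create a private scratch directory and remove it
# automatically when the shell exits. Sets the global SCRATCH_DIR.
# Note: this replaces any EXIT trap that was already set.
make_scratch_dir() {
  SCRATCH_DIR="$(mktemp -d)" || return 1
  trap 'rm -rf -- "$SCRATCH_DIR"' EXIT
}

# Example: make_scratch_dir && scan_page "$SCRATCH_DIR/p1.png"
```

With this in place, the page files could live under $SCRATCH_DIR instead of directly in /tmp, and the explicit rm calls would become optional.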

I saved this script as scan in a directory on my $PATH, so I can just type scan from anywhere.
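For anyone repeating this, putting a script on the $PATH can be as simple as copying it into a bin directory and marking it executable. A minimal sketch, where the function name install_command, the source file name scan.sh, and the target directory are all assumptions rather than what I actually used:

```bash
#!/bin/bash

# Copy a script into a bin directory under the given command name and make
# it executable. The directory must be listed in $PATH for the command to
# be found by name.
install_command() {
  local script="$1" bin_dir="$2" name="$3"
  mkdir -p -- "$bin_dir" &&
    cp -- "$script" "$bin_dir/$name" &&
    chmod 755 "$bin_dir/$name"
}

# Example (paths are assumptions): install_command ./scan.sh "$HOME/.local/bin" scan
```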

3. Final workflow

  1. If it’s a single-sided scan:
    1. I go to the scanner and place the page to scan
    2. On my computer, I type scan single
    3. The scanner scans and the image is saved to the consume directory
    4. Paperless grabs that image and adds it
  2. If it’s a duplex or multi-page scan:
    1. I go to the scanner and place the page to scan
    2. On my computer, I type scan duplex or scan multi, and press Enter
    3. When a scan is done, I am prompted to flip the page or place the next one, then press Enter
    4. When I am done (in the case of multi), instead of pressing Enter, I type done and press Enter
    5. The scanned images are transformed into a single PDF file and saved in the Paperless consume directory

4. Next?

This is all cool and good and sufficient. But I am annoyed that I cannot simply go to the scanner and press the button on the scanner to achieve the same goal.

So I’ll look into that.

But for now, it works great!