My paperless setup
There’s just too much paper in my life.
Papers from doctor appointments five years ago. Old receipts. Warranties.
None of it is searchable, and my folders just keep growing.
I don’t want to put all this data into Google Drive or some other non-encrypted provider. Ideally, I also don’t want to pay.
Luckily, it’s possible to have both: everything stays local, and the software is free. Here’s my setup.
0. Hardware
- Machine running paperless: a Thinkpad X220 running Arch.
- An HP OfficeJet Pro 8022e printer/scanner.
- All connectivity is on the local area network; no scanned data leaves it.
1. Paperless
The docs for paperless-ngx are well written and easy to read.
I basically just followed their setup tutorial.
I run Paperless as a Docker container. I use the most generic Docker Compose file, basically directly from the docs.
My local directory /home/dt/paperless-ngx/consume is used to get scans into Paperless.
Whatever is dropped there is consumed and added to the Paperless database.
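A quick way to confirm that the consume directory is wired up is to drop a file in and wait for it to disappear. This is only a sketch: `wait_for_consume` is a hypothetical helper, and it assumes Paperless removes a file from the consume directory once it has been processed.

```bash
# Hypothetical helper: block until a file has left the consume directory.
# Assumption: Paperless removes files from consume after processing them.
wait_for_consume() {
    local file="$1"
    local timeout="${2:-120}"   # seconds to wait before giving up
    local waited=0
    while [ -e "$file" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "Still waiting after ${timeout}s: $file" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "Consumed: $file"
}
```

For example, after copying a test PDF into the directory: `wait_for_consume /home/dt/paperless-ngx/consume/test.pdf 60`.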
2. Scanning script
I couldn’t figure out an easy way to set up the “scan-to-shared-folder” function. Even in HP’s web UI, they say that it cannot be enabled. Given that it’s HP, I’m guessing it’s intentional and I should probably pay them $500 a month just to be able to connect via Samba.
No matter, there are other tools.
I installed hplip and sane, then ran the hp-setup command.
After some negotiations with the scanner around the flags, I managed to get scans going with this command:
hp-scan --device="hpaio:/net/HP_OfficeJet_Pro_8020_series?ip=192.168.8.175" --mode=gray --res=300 --size=a4 --output=/tmp/test-scan.png
This works for single-page scans.
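The --device value is the least obvious part of that command. For a network scanner it appears to follow the pattern hpaio:/net/MODEL?ip=ADDRESS, so it can be assembled from two pieces. The sketch below assumes that pattern; `build_device_uri` is a hypothetical helper, not part of HPLIP, and the model string has to match what HPLIP reports for the device (running `scanimage -L` lists the detected scanners).

```bash
# Hypothetical helper: assemble the hpaio URI for a network scanner.
# Assumption: the model string matches what HPLIP reports for the device.
build_device_uri() {
    local model="$1"
    local ip="$2"
    printf 'hpaio:/net/%s?ip=%s\n' "$model" "$ip"
}

# Values taken from the command above.
DEVICE="$(build_device_uri "HP_OfficeJet_Pro_8020_series" "192.168.8.175")"
echo "$DEVICE"
```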
I also added duplex and multi-page modes, which use img2pdf to merge the scanned pages into a single PDF.
The full script:
#!/bin/bash
# Scan pages with hp-scan and drop the result into the Paperless consume directory.
set -euo pipefail

DEVICE="hpaio:/net/HP_OfficeJet_Pro_8020_series?ip=192.168.8.175"
CONSUME="/home/dt/paperless-ngx/consume"
# One timestamp per run, shared by the temporary files and the final output.
STAMP="$(date +%Y%m%d-%H%M%S)"

usage() {
    echo "Usage: scan <mode>"
    echo ""
    echo "Modes:"
    echo "  single   Scan one side, save to Paperless consume"
    echo "  duplex   Scan two sides, merge to PDF, save to Paperless consume"
    echo "  multi    Scan arbitrary pages one by one, merge to PDF when done"
    echo ""
    echo "Examples:"
    echo "  scan single"
    echo "  scan duplex"
    echo "  scan multi"
}

# Scan one grayscale A4 page at 300 dpi into the given file.
scan_page() {
    local output="$1"
    hp-scan --device="$DEVICE" --mode=gray --res=300 --size=a4 --output="$output"
}

case "${1:-}" in
    single)
        OUTPUT="$CONSUME/scan-$STAMP.png"
        scan_page "$OUTPUT"
        echo "Saved: $OUTPUT"
        ;;
    duplex)
        BASENAME="/tmp/scan-$STAMP"
        OUTPUT="$CONSUME/scan-$STAMP.pdf"
        echo "Place pages in ADF or on flatbed. Press enter to scan side 1..."
        read -r
        scan_page "${BASENAME}-p1.png"
        echo "Flip the stack. Press enter to scan side 2..."
        read -r
        scan_page "${BASENAME}-p2.png"
        # Merge both sides into one PDF, then clean up the temporary images.
        img2pdf "${BASENAME}-p1.png" "${BASENAME}-p2.png" -o "$OUTPUT"
        rm "${BASENAME}-p1.png" "${BASENAME}-p2.png"
        echo "Saved: $OUTPUT"
        ;;
    multi)
        BASENAME="/tmp/scan-$STAMP"
        OUTPUT="$CONSUME/scan-$STAMP.pdf"
        PAGES=()
        PAGE=1
        while true; do
            echo "Page $PAGE: place on scanner, press enter to scan (or type 'done' to finish)..."
            read -r INPUT
            if [[ "$INPUT" == "done" ]]; then
                break
            fi
            PAGEFILE="${BASENAME}-p${PAGE}.png"
            scan_page "$PAGEFILE"
            PAGES+=("$PAGEFILE")
            echo "Page $PAGE scanned."
            PAGE=$((PAGE + 1))
        done
        if [[ ${#PAGES[@]} -eq 0 ]]; then
            echo "No pages scanned, aborting."
            exit 1
        fi
        # Merge all pages in scan order, then clean up the temporary images.
        img2pdf "${PAGES[@]}" -o "$OUTPUT"
        rm "${PAGES[@]}"
        echo "Saved: $OUTPUT (${#PAGES[@]} pages)"
        ;;
    *)
        # Missing or unknown mode: show help and signal an error.
        usage
        exit 1
        ;;
esac
This script is saved as scan in a directory on my $PATH, so I can just type scan from anywhere.
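Since the script depends on two external tools, hp-scan and img2pdf, a small check can catch a missing one before a scan starts. A minimal sketch, where `require_cmds` is a hypothetical helper and not part of the script above:

```bash
# Hypothetical helper: report any commands that are not on $PATH.
# Returns 0 if all are present, 1 if at least one is missing.
require_cmds() {
    local missing=0
    local cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "Missing command: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible use near the top of the scan script:
# require_cmds hp-scan img2pdf || exit 1
```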
3. Final workflow
- If it’s a single-sided scan:
  - I go to the scanner and place the page to scan
  - On my computer, I type scan single
  - The scanner scans and the image is saved to the consume directory
  - Paperless grabs that image and adds it
- If it’s a duplex or multi-side scan:
  - I go to the scanner and place the page to scan
  - On my computer, I type scan duplex or scan multi, and press Enter
  - When the scanning is done, I am prompted to turn it to the next side and press Enter
  - When I am done (in the case of multi), instead of pressing Enter, I type done and press Enter
  - The scanned images are transformed into a single PDF file and saved in the Paperless consume directory
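The duplex flow above merges exactly two images. If a whole stack were scanned front sides first and then flipped, the back sides would arrive in reverse order and would need interleaving before the merge. The sketch below shows one way to compute that order. It is not something my script does: `interleave_pages` is a hypothetical helper, and it assumes each side ends up in its own file.

```bash
# Hypothetical helper: print page files in reading order.
# Usage: interleave_pages N front_1 ... front_N back_N ... back_1
# Assumption: backs were scanned after flipping the stack, so the back of
# the last sheet comes first.
interleave_pages() {
    local n="$1"
    shift
    local files=("$@")
    if (( ${#files[@]} != 2 * n )); then
        echo "Expected $((2 * n)) files, got ${#files[@]}" >&2
        return 1
    fi
    local i
    for (( i = 0; i < n; i++ )); do
        echo "${files[i]}"               # front of sheet i+1
        echo "${files[2 * n - 1 - i]}"   # its back, counted from the end
    done
}
```

The ordered list could then be handed to img2pdf, for example with `mapfile -t ORDERED < <(interleave_pages 2 f1.png f2.png b2.png b1.png)` followed by `img2pdf "${ORDERED[@]}" -o out.pdf`.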
4. Next?
This is all cool and good and sufficient. But I am annoyed that I cannot simply go to the scanner and press the button on the scanner to achieve the same goal.
So I’ll look into that.
But for now, it works great!