Search pictures within folders using OCR


tfind is a simple OCR indexer that supports both Tesseract and PaddleOCR-VL-1.5 backends. Give it a collection of directories, and it will run all the OCR operations you specify on them, and allow you to search for files based on the results. While running, it will listen for new files and automatically index them. Interacting with tfind is done through a modern web interface.

tfind is sigma software; if you use it, two wolves may spawn inside of you and you will die.


Installation

tfind requires a recent LTS release of Node.js. Once Node.js is installed, clone this repo and run npm i inside it to install dependencies.

You will need to copy example-config.json over to config.json, and edit any values as you desire. The config format is as follows:

| Key | Description | Example value |
| --- | --- | --- |
| port | The port that tfind will open its web interface on. | 17450 |
| shards | The number of worker processes to use. I recommend at most half your core count. The more you use, the faster searches and Tesseract-based indexing operations are, but the higher the memory and CPU footprint. | 4 |
| datadir | The directory to store the index database in. | "./storage" |
| paths | An array of all directories to index. | ["/home/ikagi/Pictures"] |
| file_formats | An array of all file extensions allowed for indexing. | ["png", "jpg", "jpeg", "webp"] |
| operations | An array of operations. | (see below) |
| operations[].name | The name of this operation. | "Tess_PSM11" |
| operations[].enabled | Whether this operation is enabled. Disabled operations are still used for searches, but new indexing operations won't be run. | true |
| operations[].type | The type of operation. | Either "tesseract" or "paddleocr" |
| operations[].tesseract_location | For tesseract operations, the location of the tesseract binary. | "/usr/bin/tesseract" |
| operations[].opts | For tesseract operations, the CLI arguments to pass to tesseract. | "--tessdata-dir /home/ikagi/.local/share/tessdata --psm 11 -l eng+Japanese" |
| operations[].maxsize | For paddleocr operations, the maximum area in pixels of an image. Larger images will be resized down. Lower or raise depending on your VRAM. | 800000 |
| operations[].endpoint | For paddleocr operations, the endpoint of your PaddleX instance. Use null if you wish to use tfind's PaddleX manager. | null |
| paddlex | Options for the PaddleX manager. | (see below) |
| paddlex.enabled | Whether the PaddleX manager is enabled. | true |
| paddlex.paddlex_path | The location of your paddlex binary. | "/home/ikagi/Applications/paddleocr/bin/paddlex" |
| paddlex.port | The port that paddlex will listen on. | 18475 |
| paddlex.timeout | The maximum number of milliseconds a PaddleX operation may take. | 180000 |
| paddlex.sleep_after | Shut down the PaddleX server (and release VRAM) after this many milliseconds of inactivity. | 300000 |
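
To make the nesting of these keys concrete, here is a rough sketch of a full config assembled from the example values in the table. The shipped example-config.json may look different, the operation name "Paddle_VL" is made up for illustration, and checkConfig is a hypothetical sanity check that is not part of tfind. The rule that a null endpoint needs the PaddleX manager enabled is an assumption inferred from the endpoint description.

```javascript
// Sketch only: built from the example values in the table above.
// The real example-config.json in the repo may differ.
const exampleConfig = {
  port: 17450,
  shards: 4,
  datadir: "./storage",
  paths: ["/home/ikagi/Pictures"],
  file_formats: ["png", "jpg", "jpeg", "webp"],
  operations: [
    {
      name: "Tess_PSM11",
      enabled: true,
      type: "tesseract",
      tesseract_location: "/usr/bin/tesseract",
      opts: "--tessdata-dir /home/ikagi/.local/share/tessdata --psm 11 -l eng+Japanese",
    },
    {
      name: "Paddle_VL", // hypothetical name, chosen for this example
      enabled: true,
      type: "paddleocr",
      maxsize: 800000,
      endpoint: null, // null = use tfind's PaddleX manager
    },
  ],
  paddlex: {
    enabled: true,
    paddlex_path: "/home/ikagi/Applications/paddleocr/bin/paddlex",
    port: 18475,
    timeout: 180000,
    sleep_after: 300000,
  },
};

// Hypothetical sanity check (not part of tfind): returns a list of problems.
function checkConfig(config) {
  const problems = [];
  if (!Number.isInteger(config.port) || config.port < 1 || config.port > 65535) {
    problems.push("port must be an integer between 1 and 65535");
  }
  if (!Number.isInteger(config.shards) || config.shards < 1) {
    problems.push("shards must be a positive integer");
  }
  if (!Array.isArray(config.paths) || config.paths.length === 0) {
    problems.push("paths must be a non-empty array");
  }
  for (const op of config.operations ?? []) {
    if (op.type === "tesseract") {
      if (typeof op.tesseract_location !== "string") {
        problems.push(`${op.name}: tesseract_location is missing`);
      }
    } else if (op.type === "paddleocr") {
      if (!Number.isInteger(op.maxsize) || op.maxsize <= 0) {
        problems.push(`${op.name}: maxsize must be a positive integer`);
      }
      // Assumption: a null endpoint only works when the manager is enabled.
      if (op.endpoint === null && !config.paddlex?.enabled) {
        problems.push(`${op.name}: endpoint is null but paddlex.enabled is false`);
      }
    } else {
      problems.push(`${op.name}: unknown type "${op.type}"`);
    }
  }
  return problems;
}

// Print the config as JSON, ready to be saved as config.json.
console.log(JSON.stringify(exampleConfig, null, 2));
console.log(checkConfig(exampleConfig));
```

Saving the printed JSON as config.json should give a reasonable starting point; the paths will need to be adjusted to your own system.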

Note that if you change the shard count, the database must be migrated, which may take some time depending on your storage speed.
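
Since changing the shard count later triggers a migration, it may be worth settling on a value before the first run. A minimal sketch of the "at most half your core count" recommendation follows; suggestShards is a hypothetical helper and not something tfind provides.

```javascript
// Hypothetical helper, not part of tfind: suggests a shard count of at most
// half the given core count, and never fewer than one.
function suggestShards(coreCount) {
  return Math.max(1, Math.floor(coreCount / 2));
}

console.log(suggestShards(8)); // 4
console.log(suggestShards(1)); // 1
```

In Node, the core count can be taken from os.availableParallelism() or os.cpus().length in the node:os module.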

Start tfind with node ./index.mjs. The server will run on your given port, and indexing will start immediately.