Search pictures within folders using OCR


tfind is a simple OCR indexer that supports both Tesseract and PaddleOCR-VL-1.5 backends. Give it a collection of directories, and it will run all the OCR operations you specify on them and let you search for files based on the results. While running, it watches for new files and indexes them automatically. You interact with tfind through a modern web interface that lets you monitor indexing status and search the index.

tfind is sigma software; if you use it, two wolves may spawn inside of you and you will die.


Installation

tfind requires a recent LTS release of Node.js. Once Node.js is installed, clone this repo and run npm i inside it to install dependencies.

Copy example-config.json to config.json and edit the values as needed. The config format is as follows:

| Key | Description | Example value |
| --- | --- | --- |
| port | The port that tfind will open its web interface on. | 17450 |
| datadir | The directory to store the index database in. | "./storage" |
| paths | An array of all directories to index. | ["/home/ikagi/Pictures"] |
| file_formats | An array of all file extensions to allow indexing. | ["png", "jpg", "jpeg", "webp"] |
| operations | An array of operations. | (see below) |
| operations[].name | The name of this operation. | "Tess_PSM11" |
| operations[].enabled | Whether this operation is enabled. Disabled operations are still used for searches, but new indexing operations won't be run. | true |
| operations[].type | The type of operation. | Either "tesseract" or "paddleocr" |
| operations[].tesseract_location | For tesseract operations, the location of the tesseract binary. | "/usr/bin/tesseract" |
| operations[].opts | For tesseract operations, the CLI arguments to pass to tesseract. | "--tessdata-dir /home/ikagi/.local/share/tessdata --psm 11 -l eng+Japanese" |
| operations[].maxsize | For paddleocr operations, the maximum area of an image in pixels. Larger images will be scaled down. Lower or raise this depending on your VRAM. | 800000 |
| operations[].endpoint | For paddleocr operations, the endpoint of your PaddleX instance. Use null if you wish to use tfind's PaddleX manager. | null |
| operations[].threads | The number of simultaneous operations allowed. | 4 |
| operations[].schedule | An array of schedule definitions, which allow changing the number of threads on a schedule (i.e. to only run some operations while you're asleep). | (see below) |
| operations[].schedule[].at | The time at which this schedule entry takes effect, in crontab syntax. | * 20 * * * |
| operations[].schedule[].threads | The number of simultaneous operations allowed from this time onward. 0 will pause the queue. | 0 |
| paddlex | Options for the PaddleX manager. | (see below) |
| paddlex.enabled | Whether the PaddleX manager is enabled. | true |
| paddlex.paddlex_path | The location of your paddlex binary. | "/home/ikagi/Applications/paddleocr/bin/paddlex" |
| paddlex.port | The port that paddlex will listen on. | 18475 |
| paddlex.timeout | The maximum number of milliseconds a PaddleX operation may take. | 180000 |
| paddlex.sleep_after | Shut down the PaddleX server (and release VRAM) after this many milliseconds of inactivity. | 300000 |
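
To show how the keys above fit together, here is a minimal sketch that assembles a complete config from the example values in the table and prints it as JSON. The second operation's name and threads value are made up for illustration, and checkConfig is a hypothetical helper, not something tfind ships; it only mirrors the constraints the table describes. The shipped example-config.json remains the authoritative starting point.

```javascript
// Example values are taken from the table above unless marked otherwise.
const config = {
  port: 17450,
  datadir: "./storage",
  paths: ["/home/ikagi/Pictures"],
  file_formats: ["png", "jpg", "jpeg", "webp"],
  operations: [
    {
      name: "Tess_PSM11",
      enabled: true,
      type: "tesseract",
      tesseract_location: "/usr/bin/tesseract",
      opts: "--tessdata-dir /home/ikagi/.local/share/tessdata --psm 11 -l eng+Japanese",
      threads: 4,
      // Schedule entry from the table: threads drop to 0 (queue paused).
      schedule: [{ at: "* 20 * * *", threads: 0 }],
    },
    {
      name: "Paddle_Default", // hypothetical name, chosen for this example
      enabled: true,
      type: "paddleocr",
      maxsize: 800000,
      endpoint: null, // null means: use tfind's PaddleX manager
      threads: 1, // assumed value, tune for your hardware
    },
  ],
  paddlex: {
    enabled: true,
    paddlex_path: "/home/ikagi/Applications/paddleocr/bin/paddlex",
    port: 18475,
    timeout: 180000,
    sleep_after: 300000,
  },
};

// Hypothetical sanity check (not part of tfind): returns a list of problems.
function checkConfig(cfg) {
  const problems = [];
  if (!Number.isInteger(cfg.port)) problems.push("port must be an integer");
  if (typeof cfg.datadir !== "string") problems.push("datadir must be a string");
  if (!Array.isArray(cfg.paths) || cfg.paths.length === 0) {
    problems.push("paths must be a non-empty array");
  }
  if (!Array.isArray(cfg.file_formats)) problems.push("file_formats must be an array");
  if (!Array.isArray(cfg.operations)) {
    problems.push("operations must be an array");
    return problems;
  }
  for (const op of cfg.operations) {
    if (op.type !== "tesseract" && op.type !== "paddleocr") {
      problems.push(`${op.name}: type must be "tesseract" or "paddleocr"`);
    }
    if (op.type === "tesseract" && typeof op.tesseract_location !== "string") {
      problems.push(`${op.name}: tesseract_location is missing`);
    }
    if (op.type === "paddleocr" && !Number.isInteger(op.maxsize)) {
      problems.push(`${op.name}: maxsize must be an integer`);
    }
    if (!Number.isInteger(op.threads) || op.threads < 0) {
      problems.push(`${op.name}: threads must be a non-negative integer`);
    }
  }
  return problems;
}

const problems = checkConfig(config);
if (problems.length > 0) {
  console.error(problems.join("\n"));
} else {
  // The printed JSON has the same shape as a config.json.
  console.log(JSON.stringify(config, null, 2));
}
```

Writing the config as a JavaScript object first makes it easy to catch a misspelled key or a wrong type before tfind ever reads the file.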

Start tfind with node ./index.mjs. The server will run on your given port, and indexing will start immediately.
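
When tuning the maxsize option from the config table for your VRAM, it can help to see what an area limit means for a concrete image. The sketch below is an assumption about how such a limit could be applied (uniform scaling that preserves the aspect ratio); fitWithinArea is a hypothetical helper, and tfind's actual resizing logic may differ.

```javascript
// Hypothetical helper: shrink (width, height) so that width * height <= maxArea
// while keeping the aspect ratio. Images already within the limit are unchanged.
function fitWithinArea(width, height, maxArea) {
  const area = width * height;
  if (area <= maxArea) return { width, height };
  // Area scales with the square of the linear factor, hence the square root.
  const scale = Math.sqrt(maxArea / area);
  return {
    width: Math.max(1, Math.floor(width * scale)),
    height: Math.max(1, Math.floor(height * scale)),
  };
}

// A 1920x1080 screenshot against the example maxsize of 800000 pixels.
console.log(fitWithinArea(1920, 1080, 800000));
// A small image stays as it is.
console.log(fitWithinArea(640, 480, 800000));
```

Rounding down guarantees the result never exceeds the limit, at the cost of possibly landing a few pixels under it.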