WinForms crustware for creating a searchable index of images by text content

OCRvaries

Quick and dirty C# WinForms application that utilises Tesseract to create a searchable index of text content for images in a specified folder. Created in lieu of tfind, then uploaded after tfind.

If tfind is the real deal, then OCRvaries is the guy who comes over to fix your pipes when your mate says he knows a guy. It's very "black-box"; that is to say, I botched it together to work one way and you will use it that way and like it.

Requires Tesseract trained models (though only the English one really) and expects them to be present in a "tessdata" folder within the working directory. A symlink would work too, I imagine. Find the models at https://github.com/tesseract-ocr/tessdata/.
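If you want to check the models are where the program expects them before launching, something like the following would do. It is a rough sketch, not part of OCRvaries: the `missing_models` helper is made up for illustration, and it assumes the English model is the stock `eng.traineddata` file from the tessdata repository.

```python
from pathlib import Path


def missing_models(workdir=".", languages=("eng",)):
    """Return the model file names that are absent from <workdir>/tessdata.

    Hypothetical helper for illustration; OCRvaries itself does not ship this.
    """
    tessdata = Path(workdir) / "tessdata"
    # is_file() follows symlinks, so a symlinked tessdata folder or model counts.
    return [
        f"{lang}.traineddata"
        for lang in languages
        if not (tessdata / f"{lang}.traineddata").is_file()
    ]
```

Calling `missing_models()` from the folder you launch the program in returns an empty list when everything is in place, or the names of the files you still need to fetch.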

Constraints

  • Probably only works on Windows.
  • WebP files are not supported.
  • Will consume the entire CPU whilst updating the index whether you like it or not.
  • Items are indexed (each!) by absolute path, so if you move the folder you'll need to tinker inside the SQLite database yourself if you don't want to spend hours re-indexing it. A simple UPDATE ... WHERE query should do the trick.
  • The index is final and you cannot force a rescan within the program (yet).
  • Deleted files will retain their index entries and must be manually cleared from the database.
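The two database chores above (moved folder, deleted files) could be scripted roughly like this. Heavy caveat: the README does not document the schema, so the `images` table and `path` column below are placeholders I made up. Open the database, look at the real names, and pass them in. Back the file up first.

```python
import os
import sqlite3


def rebase_paths(conn, old_root, new_root, table="images", column="path"):
    """Swap the old_root prefix for new_root on every stored path.

    `table` and `column` are assumed names, not the real OCRvaries schema.
    Include the trailing separator in both roots so that "C:\\old\\" does
    not also catch "C:\\oldstuff\\".
    """
    cur = conn.execute(
        f"UPDATE {table} SET {column} = ? || substr({column}, ?) "
        f"WHERE substr({column}, 1, ?) = ?",
        (new_root, len(old_root) + 1, len(old_root), old_root),
    )
    conn.commit()
    return cur.rowcount  # number of rows rewritten


def prune_missing(conn, table="images", column="path"):
    """Delete index rows whose file no longer exists on disk."""
    rows = conn.execute(f"SELECT {column} FROM {table}").fetchall()
    gone = [(p,) for (p,) in rows if not os.path.exists(p)]
    conn.executemany(f"DELETE FROM {table} WHERE {column} = ?", gone)
    conn.commit()
    return len(gone)  # number of rows removed
```

Usage would be along the lines of `conn = sqlite3.connect("path/to/the/index.db")` followed by `rebase_paths(conn, "C:\\old\\", "E:\\new\\")`. Run `prune_missing` only after rebasing, otherwise every moved file looks deleted.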

Usage

Once the Tesseract models are obtained per above, launch the program and then File->Open Folder. It will scan recursively whether you like it or not. The operation could take several hours depending on CPU horsepower and size of the folder. Once complete, it will poop out a list which you can then laggily search through using the box provided.

Tesseract is not perfect, and even on clear text it is prone to mistakes such as confusing lower case L with upper case i, or rn with m, and occasionally just puking up complete garbage. The search function right now is a plain and simple substring search (case-insensitive), so bear that in mind. It will also pick up on any instance of your letter sequence, so for example, when you want to search "ppy", prefix and/or suffix it with spaces so you don't get every match for "happy" and "Flappyzor" as well.
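To make the space-padding trick concrete, here is a tiny sketch of a case-insensitive substring test. It assumes the search behaves exactly as described above; the `matches` function is illustrative only and is not lifted from the program.

```python
def matches(query, text):
    """Case-insensitive substring test (assumed search behaviour)."""
    return query.lower() in text.lower()


# Bare query: hits anything containing the letters.
bare_hit = matches("ppy", "So happy today")        # True

# Padded query: only hits when the letters stand alone between spaces.
padded_miss = matches(" ppy ", "So happy today")   # False
padded_hit = matches(" ppy ", "the PPY format")    # True

# Side effect of padding: a word at the very start or end of the text,
# or next to punctuation, has no surrounding space and is missed.
edge_miss = matches(" ppy ", "ppy format")         # False
```

If the padded query comes up empty, it is worth trying it with a space on one side only before giving up.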