extract text from any pdf or image · searchable pdf with an invisible text layer · runs locally · tesseract.js
first use of a language downloads its trained data (~10–15 MB) from the tesseract cdn · subsequent runs use the cached copy
render scale for each pdf page · higher = sharper text, more memory
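
The memory cost of a higher render scale can be estimated with a little arithmetic. The sketch below assumes each page is rendered to an RGBA bitmap (4 bytes per pixel) and that page dimensions are given in PDF points; `estimateBitmapBytes` is a hypothetical helper for illustration, not part of tesseract.js or of this tool.

```typescript
// Rough estimate of the memory one rendered page bitmap needs.
// Assumption: RGBA bitmap, 4 bytes per pixel; page size is in PDF points.
function estimateBitmapBytes(widthPt: number, heightPt: number, scale: number): number {
  const widthPx = Math.ceil(widthPt * scale);
  const heightPx = Math.ceil(heightPt * scale);
  return widthPx * heightPx * 4;
}

// US Letter page (612 x 792 pt) at a few illustrative scale settings.
for (const scale of [1, 2, 3]) {
  const mib = estimateBitmapBytes(612, 792, scale) / (1024 * 1024);
  console.log(`scale ${scale}: ~${mib.toFixed(1)} MiB per page`);
}
```

Because both width and height are multiplied by the scale, memory grows with the square of it: doubling the scale roughly quadruples the bitmap size.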