Reading text from a PDF is quite an easy task with Python, provided the PDF is "readable", e.g. made from a word processor rather than scanned as an image. The first thing to do is to install Tika and check that Java is available:
pip install tika
conda install -c conda-forge tika  # as alternative
java --version  # checks the installed Java version in the command prompt
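With the installation done, the reading step can be sketched roughly as follows. This is a minimal sketch, not necessarily the exact code used in this post: it assumes the `tika` package's `parser.from_file` call, which returns a dictionary with `"content"` and `"metadata"` keys. The helper names `clean_text` and `read_pdf` and the file name `example.pdf` are hypothetical and only serve the illustration.

```python
def clean_text(raw):
    """Strip each line and drop the empty ones from extracted text."""
    if raw is None:  # "content" may be None when no text layer was found
        return ""
    lines = (line.strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def read_pdf(path):
    """Return the text of a PDF via tika (needs a Java runtime when called)."""
    # Imported here so clean_text can be used even without tika installed.
    from tika import parser

    parsed = parser.from_file(path)  # dict with "content" and "metadata"
    return clean_text(parsed.get("content"))


# Usage (hypothetical file name):
# text = read_pdf("example.pdf")
# print(text)
```

The first call to `read_pdf` may take a while, since the library starts a local Tika server in the background. An empty result usually suggests the PDF has no text layer, for example a scanned document.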
Having this, the…