Binary package “libtika-java” in Ubuntu Oracular

Apache Tika - content analysis toolkit

 The Apache Tika toolkit detects and extracts metadata and text content
 from various documents (PPT, CSV, PDF, MP3, HTML and more) using existing
 parser libraries. Tika unifies these parsers under a single interface to
 allow you to easily parse over a thousand different file types. Tika is
 useful for search engine indexing, content analysis, translation, and much
 more.
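
 The idea of putting several format-specific parsers behind one interface
 can be sketched in plain Java. This is only an illustration of the design
 described above, not Tika's actual API: the names ParserFacadeSketch,
 TextParser, detect and extract are hypothetical, detection is reduced to a
 file-extension check, and the three "parsers" are toy lambdas.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only: none of these names come from Apache Tika.
final class ParserFacadeSketch {

    /** One common contract that every format-specific parser implements. */
    interface TextParser {
        String extractText(byte[] content);
    }

    /** Maps a detected type label to the parser that handles it. */
    private final Map<String, TextParser> parsers = new HashMap<>();

    ParserFacadeSketch() {
        // Plain text: decode the bytes as UTF-8.
        parsers.put("text/plain",
                content -> new String(content, StandardCharsets.UTF_8));
        // CSV: replace commas with spaces so the cells read as words.
        parsers.put("text/csv",
                content -> new String(content, StandardCharsets.UTF_8).replace(',', ' '));
        // HTML: naive tag stripping, good enough for a sketch only.
        parsers.put("text/html",
                content -> new String(content, StandardCharsets.UTF_8)
                        .replaceAll("<[^>]*>", "")
                        .trim());
    }

    /** Toy type detection based on the file extension (assumption of this sketch). */
    static String detect(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".csv")) {
            return "text/csv";
        }
        if (lower.endsWith(".html") || lower.endsWith(".htm")) {
            return "text/html";
        }
        return "text/plain";
    }

    /** Single entry point: detect the type, then delegate to the matching parser. */
    String extract(String fileName, byte[] content) {
        TextParser parser = parsers.get(detect(fileName));
        return parser.extractText(content);
    }

    public static void main(String[] args) {
        ParserFacadeSketch facade = new ParserFacadeSketch();
        byte[] html = "<p>Hello <b>world</b></p>".getBytes(StandardCharsets.UTF_8);
        System.out.println(facade.extract("page.html", html));
    }
}
```

 The caller only ever sees extract(); supporting another format means
 registering one more TextParser, which is the benefit a unified parser
 interface is meant to provide.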