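Tika File Formats
File Formats Supported by Tika
The following table lists the file formats that Tika supports, together with the package and the parser classes that handle each one.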
| File format | Package / library | Class in Tika |
| --- | --- | --- |
| XML | org.apache.tika.parser.xml | XMLParser |
| HTML | org.apache.tika.parser.html (uses the TagSoup library) | HtmlParser |
| Microsoft Office: OLE2 compound documents (up to 2007) and OOXML (2007 onwards) | org.apache.tika.parser.microsoft and org.apache.tika.parser.microsoft.ooxml (use the Apache POI library) | OfficeParser (OLE2), OOXMLParser (OOXML) |
| OpenDocument Format (OpenOffice) | org.apache.tika.parser.odf | OpenDocumentParser |
| Portable Document Format (PDF) | org.apache.tika.parser.pdf (uses the Apache PDFBox library) | PDFParser |
| Electronic Publication Format (digital books) | org.apache.tika.parser.epub | EpubParser |
| Rich Text Format (RTF) | org.apache.tika.parser.rtf | RTFParser |
| Compression and packaging formats | org.apache.tika.parser.pkg (uses the Apache Commons Compress library) | PackageParser, CompressorParser, and their subclasses |
| Text format | org.apache.tika.parser.txt | TXTParser |
| Feed and syndication formats | org.apache.tika.parser.feed | FeedParser |
| Audio formats | org.apache.tika.parser.audio and org.apache.tika.parser.mp3 | AudioParser, MidiParser, Mp3Parser (for MP3) |
| Image formats | org.apache.tika.parser.jpeg | JpegParser (for JPEG images) |
| Video formats | org.apache.tika.parser.mp4 and org.apache.tika.parser.video (the latter uses a simple internal algorithm to parse Flash video) | MP4Parser, FLVParser |
| Java class files and JAR files | org.apache.tika.parser.asm | ClassParser, CompressorParser |
| Mbox format (email messages) | org.apache.tika.parser.mbox | MboxParser |
| CAD formats | org.apache.tika.parser.dwg | DWGParser |
| Font formats | org.apache.tika.parser.font | TrueTypeParser |
| Executable programs and libraries | org.apache.tika.parser.executable | ExecutableParser |
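
The package and class names listed above combine into fully qualified class names such as org.apache.tika.parser.pdf.PDFParser. As a minimal sketch, the lookup below maps a few format labels to those names and checks by reflection whether each class is present on the classpath. The class TikaParserTable, its methods, and the short format keys ("xml", "pdf", and so on) are hypothetical names chosen for illustration; they are not part of Tika. The sketch uses only the Java standard library, so it runs without Tika installed, in which case every parser is simply reported as missing.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper (not part of Tika): format label -> parser class name. */
final class TikaParserTable {

    // Illustrative subset of the table; the keys are assumed labels, not Tika identifiers.
    private static final Map<String, String> PARSERS = new LinkedHashMap<>();
    static {
        PARSERS.put("xml", "org.apache.tika.parser.xml.XMLParser");
        PARSERS.put("html", "org.apache.tika.parser.html.HtmlParser");
        PARSERS.put("pdf", "org.apache.tika.parser.pdf.PDFParser");
        PARSERS.put("epub", "org.apache.tika.parser.epub.EpubParser");
        PARSERS.put("rtf", "org.apache.tika.parser.rtf.RTFParser");
        PARSERS.put("txt", "org.apache.tika.parser.txt.TXTParser");
    }

    private TikaParserTable() {
    }

    /** Returns the fully qualified parser class name for a label, ignoring case. */
    static Optional<String> parserFor(String format) {
        if (format == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(PARSERS.get(format.toLowerCase(Locale.ROOT)));
    }

    /** Returns true if the named class can be loaded; the class is not initialized. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, TikaParserTable.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Print each known parser and whether the corresponding Tika module is available.
        for (Map.Entry<String, String> entry : PARSERS.entrySet()) {
            String status = isOnClasspath(entry.getValue()) ? "available" : "missing";
            System.out.printf("%-5s %-45s %s%n", entry.getKey(), entry.getValue(), status);
        }
    }
}
```

Running main with the Tika parser modules on the classpath should report the listed classes as available, which is a quick way to confirm that the module for a given format has been included in a build.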