Processing Word Documents
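To read a Word document in Python, you can use the docx module, which is provided by the python-docx package. First install the package as shown below, then write a program that uses the module's functions to read the whole file paragraph by paragraph.
Use the following command to install the module into your Python environment. Note that the package name on PyPI is python-docx, even though it is imported as docx; the separate package named docx is an older, unrelated distribution.
pip install python-docx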
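After installing, it can be useful to confirm that the module is visible to the interpreter you plan to run. The sketch below is one possible check using only the standard library; the helper name module_available is hypothetical and not part of python-docx.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called name can be found for import."""
    return importlib.util.find_spec(name) is not None


# "docx" is the import name of the python-docx package.
if module_available("docx"):
    print("docx module found")
else:
    print("docx module missing; try: pip install python-docx")
```

This only checks that a module named docx can be located; it does not tell you which distribution provided it.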
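The example below reads the contents of a Word document by appending the text of each paragraph to a list and then printing all of the paragraph text joined together.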
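import docx

def readtxt(filename):
    # Open the document and collect the text of each paragraph
    doc = docx.Document(filename)
    fullText = []
    for para in doc.paragraphs:
        fullText.append(para.text)
    # Join the paragraphs into a single newline-separated string
    return '\n'.join(fullText)

# Raw string so the backslash in the path is kept literally
print(readtxt(r'path\yiibaispoint.docx'))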
Running the program above produces the following output:
Yiibai Point originated from the idea that there exists a class of readers who respond
better to online content and prefer to learn new skills at their own pace from the comforts
of their drawing rooms.
The journey commenced with a single tutorial on HTML in 2006 and elated by the response it generated,
we worked our way to adding fresh tutorials to our repository which now proudly flaunts
a wealth of tutorials and allied articles on topics ranging from programming languages
to web designing to academics and much more.
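Word files often contain empty paragraphs used for spacing, and these come through as blank lines when every paragraph is joined. The following is a minimal sketch of a variant that skips them. The names non_empty_texts and read_docx_text are hypothetical, and the docx import is done inside the function only so that the filtering helper can be tried without the package or a file.

```python
from types import SimpleNamespace


def non_empty_texts(paragraphs):
    """Return the stripped text of every paragraph that is not blank."""
    return [p.text.strip() for p in paragraphs if p.text.strip()]


def read_docx_text(filename):
    """Read a .docx file and join its non-blank paragraphs with newlines."""
    import docx  # provided by the python-docx package; imported lazily

    return "\n".join(non_empty_texts(docx.Document(filename).paragraphs))


# Stand-in objects with a .text attribute, used here instead of a real file.
sample = [
    SimpleNamespace(text="First"),
    SimpleNamespace(text="  "),
    SimpleNamespace(text="Second"),
]
print(non_empty_texts(sample))  # prints ['First', 'Second']
```

With a real document, read_docx_text(r'path\yiibaispoint.docx') would be called in place of readtxt.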
Reading Individual Paragraphs
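The paragraphs attribute returns the document's paragraphs as a list, so a specific paragraph can be read from a Word document by index. The example below reads only the second paragraph of text from the Word document. Note that the list is zero-based, so the index 2 used in the code is the third entry in the list; empty paragraphs count as entries too.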
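import docx

doc = docx.Document(r'path\Yiibaispoint.docx')
# Number of paragraphs in the document
print(len(doc.paragraphs))
# Indexing is zero-based, so index 2 selects the third entry in the list
print(doc.paragraphs[2].text)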
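Running the program above prints the paragraph count followed by the text of the selected paragraph. The paragraph text is shown below: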
The journey commenced with a single tutorial on HTML in 2006 and elated by the response
it generated, we worked our way to adding fresh tutorials to our repository
which now proudly flaunts a wealth of tutorials and allied articles on topics
ranging from programming languages to web designing to academics and much more.
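Indexing past the end of the paragraph list raises an IndexError, which is easy to hit when the number of paragraphs in a file is not known in advance. The sketch below shows one way to guard the lookup; paragraph_text is a hypothetical helper, and the stand-in objects only mimic the .text attribute of real paragraph objects.

```python
from types import SimpleNamespace


def paragraph_text(paragraphs, index, default=None):
    """Return paragraphs[index].text, or default if index is out of range."""
    if -len(paragraphs) <= index < len(paragraphs):
        return paragraphs[index].text
    return default


# Stand-in objects with a .text attribute, used here instead of a real file.
sample = [
    SimpleNamespace(text="Title"),
    SimpleNamespace(text=""),
    SimpleNamespace(text="Body"),
]
print(paragraph_text(sample, 2))             # prints Body
print(paragraph_text(sample, 5, "missing"))  # prints missing
```

With a real document, the first argument would be doc.paragraphs from the example above.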