Parsing XML in Ruby (REXML)

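XML is an extensible markup language, similar to HTML. It lets programmers build applications whose data can be read by other applications, regardless of the operating system or development language in use.

It can store small to medium amounts of data without requiring any SQL-based technology on the back end.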

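REXML is a pure-Ruby XML processor. Its Document class represents a complete XML document, including processing instructions (PIs), the doctype, and so on. A document has a single child element, which can be accessed with root(). If you want a document you create to carry an XML declaration, you must add one yourself; REXML does not write a default declaration for you.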
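The point about the XML declaration can be illustrated with a minimal sketch. The helper name build_doc_with_declaration, the element name "root" and its text are assumptions made for this example only.

```ruby
require "rexml/document"

# Build a small document and add the XML declaration explicitly.
# The element name "root" and its text are illustrative only.
def build_doc_with_declaration
  doc = REXML::Document.new
  doc << REXML::XMLDecl.new("1.0", "UTF-8")   # version, encoding
  root = doc.add_element("root")
  root.text = "Hello"
  doc
end

doc = build_doc_with_declaration
puts doc.to_s         # serialized XML, with the declaration first
puts doc.root.name    # the single root element, reached through root()

# A document parsed from text without a declaration is written back without one.
puts REXML::Document.new("<a/>").to_s
```

Without the XMLDecl line, the serialized output would begin directly with the root element.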

REXML was inspired by the Electric XML library for Java. Its API is compact and easy to use, and it follows Ruby conventions for method naming and code flow.

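It supports both tree parsing and stream parsing. Stream parsing is about 1.5 times faster than tree parsing. However, some features (such as XPath) are not available in stream parsing.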
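As a rough illustration of what tree parsing offers, the sketch below runs an XPath query against a fully loaded document. The sample XML, the constant SAMPLE_XML and the helper city_of are hypothetical names introduced for this example.

```ruby
require "rexml/document"

# Hypothetical sample data used only for this sketch.
SAMPLE_XML = "<info><name>Maxsu</name><city>Haikou</city></info>"

# Tree parsing loads the whole document into memory,
# so XPath queries can be run against it.
def city_of(xml)
  doc = REXML::Document.new(xml)
  node = REXML::XPath.first(doc, "//city")   # first matching element, or nil
  node && node.text
end

puts city_of(SAMPLE_XML)
```

A stream listener only sees events one at a time, so a query like this would have to be replaced by hand-written state tracking.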

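REXML features:

It is written 100% in Ruby.
It contains fewer than 2000 lines of code, which keeps it lightweight.
Its methods and classes are easy to understand.
It ships with the Ruby installation and needs no separate install (since Ruby 3.0 it is a bundled gem, so Bundler-managed projects may need to list rexml in the Gemfile).
It can be used for both DOM-like and SAX-like parsing.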

Parsing XML and Accessing Elements

Start by parsing an XML document. Here is some sample code:

require "rexml/document"  
file = File.new( "trial-1.xml" )  
doc = REXML::Document.new file

In the code above, the third line parses the supplied file.

Example

require 'rexml/document'
# file: rexml-example.rb

include REXML

file = File.new("trial-1.xml")
doc = Document.new(file)
puts doc   # print the parsed document

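In the code above, the require statement loads the REXML library. Including the REXML module means qualified names such as REXML::Document are not needed. The file trial-1.xml is then opened and parsed, and the resulting document is printed to the screen.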

F:\worksp\ruby>ruby rexml-example.rb
<?xml version='1.0' encoding='UTF-8'?>
<root>
        Hello, this is first REXML use.
</root>

F:\worksp\ruby>

The Document.new method takes an IO, String, or Document object as its argument. This argument specifies the source from which the XML document is read.

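If the constructor is given a Document, that document's context and element attributes are cloned into the new Document object. If it is given a String, the string is expected to contain an XML document.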
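The String and IO cases can be sketched as follows. StringIO stands in for a File here so that the example needs no file on disk; the constant XML_TEXT and the helper root_text_from are names assumed for this illustration.

```ruby
require "rexml/document"
require "stringio"

XML_TEXT = "<root>Hello, this is first REXML use.</root>"

# Parse whatever source is given and return the text of the root element.
# The source may be a String or an IO-like object.
def root_text_from(source)
  REXML::Document.new(source).root.text
end

puts root_text_from(XML_TEXT)                 # String source
puts root_text_from(StringIO.new(XML_TEXT))   # IO source
```

Both calls should yield the same text, since only the way the XML reaches the parser differs.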

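XML and Here Documents

A here document is a way to specify a block of text that preserves its line breaks and whitespace, delimited by an identifying token.

A here document is built with the "<<" operator followed by a token string.

In Ruby, there must be no space between "<<" and the token string.

Example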

#!/usr/bin/env ruby
# file: rexml-heredoc.rb

require 'rexml/document'
include REXML

# The closing XML token must stand alone on its line, with no trailing spaces.
info = <<XML
<info>
 <name>Maxsu</name>
 <street>人民大道</street>
 <city>海口</city>
 <contact>9854126575</contact>
 <country>中國</country>
</info>
XML

document = Document.new( info )
puts document

Running the code above produces the following output:

F:\worksp\ruby>ruby rexml-heredoc.rb
<info>
 <name>Maxsu</name>
 <street>人民大道</street>
 <city>海口</city>
 <contact>9854126575</contact>
 <country>中國</country>
</info>

F:\worksp\ruby>

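Here, a here document supplies the XML text assigned to info. Every character between <<XML and the closing XML token, including line breaks, is part of info.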

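The XML parsing examples that follow read an XML file as input. The file's contents are not reproduced here; judging from the queries in the scripts, its root element is <collection> with a shelf attribute, and it contains <clothing> elements that carry a title attribute and have <type> and <description> children.

The following script reads that file and prints the shelf, titles, types and descriptions: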

#!/usr/bin/ruby -w   

require 'rexml/document'
# file : rexml-newxml.rb

include REXML   
xmlfile = File.new("trial-2.xml")   
xmldoc = Document.new(xmlfile)   

# Now get the root element   
root = xmldoc.root   
puts "Root element : " + root.attributes["shelf"]   

# This will output all the cloth titles.   
xmldoc.elements.each("collection/clothing"){   
   |e| puts "cloth Title : " + e.attributes["title"]   
}   

# This will output all the cloth types.   
xmldoc.elements.each("collection/clothing/type") {   
   |e| puts "cloth Type : " + e.text   
}   

# This will output all the cloth description.   
xmldoc.elements.each("collection/clothing/description") {   
   |e| puts "cloth Description : " + e.text   
}
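Because the script depends on the shape of its input, the sketch below pairs the same kind of queries with a small inline document. The element and attribute names match the script above, but every value, the constant CLOTHING_XML and the helper clothing_summary are invented for illustration and are not the original input file.

```ruby
require "rexml/document"

# Hypothetical input: the structure follows the queries used above,
# the values are made up.
CLOTHING_XML = <<~XML
  <collection shelf="New Arrivals">
    <clothing title="Jacket">
      <type>Outerwear</type>
      <description>A warm winter jacket.</description>
    </clothing>
    <clothing title="T-shirt">
      <type>Casual</type>
      <description>A plain cotton shirt.</description>
    </clothing>
  </collection>
XML

# Gather the shelf attribute, all titles and all types into a Hash.
def clothing_summary(xml)
  doc = REXML::Document.new(xml)
  titles = []
  doc.elements.each("collection/clothing") { |e| titles << e.attributes["title"] }
  types = []
  doc.elements.each("collection/clothing/type") { |e| types << e.text }
  { shelf: doc.root.attributes["shelf"], titles: titles, types: types }
end

p clothing_summary(CLOTHING_XML)
```

Returning a Hash rather than printing inside the loops keeps the queries easy to check; the original script prints each value as it is found.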

DOM-like Parsing of XML in Ruby

This example parses XML data as a tree. The XML input file mentioned above is used, here under the name trial.xml.

#!/usr/bin/ruby -w   

require 'rexml/document'   
include REXML   

xmlfile = File.new("trial.xml")   
xmldoc = Document.new(xmlfile)   

# Now get the root element   
root = xmldoc.root   
puts "Root element : " + root.attributes["shelf"]   

# This will output all the cloth titles.   
xmldoc.elements.each("collection/clothing"){   
   |e| puts "cloth Title : " + e.attributes["title"]   
}   

# This will output all the cloth types.   
xmldoc.elements.each("collection/clothing/type") {   
   |e| puts "cloth Type : " + e.text   
}   

# This will output all the cloth description.   
xmldoc.elements.each("collection/clothing/description") {   
   |e| puts "cloth Description : " + e.text   
}

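SAX-like Parsing of XML in Ruby

This example parses XML data as a stream, using the file trial.xml as input. It defines a listener class whose methods serve as the callback targets of the parser.

SAX-like parsing is not recommended for small files.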

#!/usr/bin/ruby -w   

require 'rexml/document'   
require 'rexml/streamlistener'   
include REXML   

class MyListener   
  include REXML::StreamListener   
  def tag_start(*args)   
    puts "tag_start: #{args.map {|x| x.inspect}.join(', ')}"   
  end   

  def text(data)   
    return if data =~ /\A\s*\z/   # skip whitespace-only text
    abbrev = data[0..40] + (data.length > 40 ? "..." : "")   
    puts "  text   :   #{abbrev.inspect}"   
  end   
end   

list = MyListener.new   
xmlfile = File.new("trial.xml")   
Document.parse_stream(xmlfile, list)
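A variation on the listener idea, offered as a sketch rather than as part of the original tutorial: instead of printing, the listener below records what it sees so the result can be inspected afterwards. The class name TagCollector and the helper collect_tags are hypothetical, and the XML is passed as a string so no file is needed.

```ruby
require "rexml/document"
require "rexml/streamlistener"

# Records element names in the order their start tags arrive
# and counts how many end tags were seen.
class TagCollector
  include REXML::StreamListener
  attr_reader :names, :closed

  def initialize
    @names = []
    @closed = 0
  end

  # Called by the parser for each start tag.
  def tag_start(name, attrs)
    @names << name
  end

  # Called by the parser for each end tag (including self-closing elements).
  def tag_end(name)
    @closed += 1
  end
end

# Stream-parse the given XML and return the listener with its collected data.
def collect_tags(xml)
  listener = TagCollector.new
  REXML::Document.parse_stream(xml, listener)
  listener
end

result = collect_tags("<a><b>x</b><c/></a>")
p result.names
p result.closed
```

Callbacks that the listener does not override fall back to the empty defaults supplied by REXML::StreamListener.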
