How to tell Nokogiri, when parsing a document, not to convert it to a different encoding (in my case, not to convert &pound; to anything else)



How do I tell Nokogiri not to convert a document to a different encoding, in my case not to convert &pound; to anything else?


I have a file containing:


<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body>
<span>&pound;</span>
</body>
</html>

I parse it with Nokogiri:


d = Nokogiri::HTML.parse(open('/tmp/in.html', 'r'))

If I print document "d" I get:


<!DOCTYPE

Related to: How to tell Nokogiri, when parsing a document, not to convert it to a different encoding (in my case, not to convert &pound; to anything else)
Parsing HTML with a weird encoding with Nokogiri

I can't use XPath because the encoding gets weird. I hoped you could help me out of this trouble.


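require "nokogiri"  # the gem name is lowercase; "Nokogiri" fails on case-sensitive file systems
require "open-uri"
link = "http://www.arla.dk/Services/SearchService.asmx/RecipeResult?q=allRecipe&paging=6&include=&exclude=&area=recipeSearch&languageBranch=da"
doc = Nokogiri::HTML(URI.open(link))  # Kernel#open no longer accepts URLs as of Ruby 3.0
doc.xpath("//h2")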

The xpath method returns an empty array. It looks like the document has not been parsed correctly. I think this is because the file being parsed contains encoded characters:


&lt;strong&gt;Frokost til 8&lt;/strong&gt;
Convert a Nokogiri document to a Ruby Hash

Is there an easy way to convert a Nokogiri XML document to a Hash?


Something like Rails' Hash.from_xml.


C# are they the same: Encoding.UTF8.GetBytes & Convert.FromBase64String?

I am confused by the encoding stuff. Are Encoding.UTF8.GetBytes and Convert.FromBase64String the same?


How can Nokogiri extract the Charset encoding of a scraped HTML document?

I found a snippet that works with PHP Simple HTML DOM Parser.


$el=$html->find('meta[http-equiv=Content-Type]',0);
$fullvalue = $el->content;
preg_match('/charset=(.+)/', $fullvalue, $matches);
echo $matches[1];

Can somebody help me convert this to Ruby and Nokogiri?
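
A rough Ruby counterpart to the PHP snippet might look like the sketch below. The helper names `charset_from_content_type` and `charset_of_html` are hypothetical, chosen only for illustration. The first mirrors the preg_match step on the content attribute, and the second mirrors the find() step with a Nokogiri CSS lookup. It assumes the page declares its charset in a meta tag with http-equiv="Content-Type" spelled exactly that way.

```ruby
# Hypothetical helper: port of the preg_match step. Pulls the charset out
# of a Content-Type value and returns nil when no charset is present.
def charset_from_content_type(content)
  content.to_s[/charset=\s*["']?([^\s;"']+)/i, 1]
end

# Hypothetical helper: port of the find() step using Nokogiri. The gem is
# required lazily so the string helper above can be used on its own.
def charset_of_html(html)
  require "nokogiri"
  meta = Nokogiri::HTML(html).at_css('meta[http-equiv="Content-Type"]')
  meta && charset_from_content_type(meta["content"])
end
```

Nokogiri's HTML documents also expose a `meta_encoding` method, which may make the manual lookup unnecessary.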


Parsing the XML document using Nokogiri

I am new to Nokogiri and am trying to parse an RSS feed from Digital Trends. I am unable to get the attributes; for example, I need the URL of the image inside the <enclosure> tag. How can I do this?


<item>
<title> Xbox One returns to Best Buy with five new holiday bundles</title>
<link> http://www.digitaltrends.com/gaming/xbox-one-returns-best-buy-five-new-holiday-bundles/</link>
<pubDate>Thu, 12 Dec 2013 23:59:20 +0000</pubDate>
