Full URLs of images of a given page on Wikipedia (only those I see on the page)



I'd like to extract the full URLs of all the images that appear on the "Google" page on Wikipedia.


I have tried with:


http://en.wikipedia.org/w/api.php?action=query&titles=Google&generator=images&gimlimit=10&prop=imageinfo&iiprop=url|dimensions|mime&format=json

but this way I also get images that are not related to Google, such as:


http://upload.wikimedia.org/wikipedia/en/a/a4/Flag_of_the_United_States.svg
http://upload.wikimedia.org/wikipedia/en/4/4a/Commons-logo.svg
http://upload.wikimedia.org/wikipedia/commons/f/fe/Cr
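
One workaround, as a sketch rather than a definitive answer: generator=images also returns every icon pulled in by templates (flags, the Commons logo, and so on), so a common heuristic is to fetch the article's own wikitext, keep only the files referenced there literally, and then ask prop=imageinfo for their full URLs. The C# below assumes that filtering heuristic and the standard MediaWiki API endpoints; the class and variable names are illustrative.

// Sketch only: keep just the files that appear literally in the article's
// wikitext, then resolve each File: title to its upload.wikimedia.org URL.
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

class WikiImageUrls
{
    static readonly HttpClient Http = new HttpClient();

    static async Task Main()
    {
        const string api = "https://en.wikipedia.org/w/api.php";

        // 1. Fetch the article's raw wikitext.
        string parseJson = await Http.GetStringAsync(
            api + "?action=parse&page=Google&prop=wikitext&formatversion=2&format=json");
        using var parsed = JsonDocument.Parse(parseJson);
        string wikitext = parsed.RootElement.GetProperty("parse")
                                            .GetProperty("wikitext").GetString();

        // 2. Collect file names referenced directly in the wikitext
        //    ([[File:...]] or [[Image:...]]); icons pulled in by templates
        //    normally do not appear here.
        var titles = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match m in Regex.Matches(wikitext, @"\[\[(?:File|Image):([^\]\|]+)"))
            titles.Add("File:" + m.Groups[1].Value.Trim());

        // 3. Resolve each file title to its full URL via prop=imageinfo.
        foreach (string title in titles)
        {
            string infoJson = await Http.GetStringAsync(
                api + "?action=query&prop=imageinfo&iiprop=url&formatversion=2&format=json" +
                "&titles=" + Uri.EscapeDataString(title));
            using var doc = JsonDocument.Parse(infoJson);
            foreach (var page in doc.RootElement.GetProperty("query")
                                                .GetProperty("pages").EnumerateArray())
            {
                if (page.TryGetProperty("imageinfo", out var info))
                    Console.WriteLine(info[0].GetProperty("url").GetString());
            }
        }
    }
}

This will miss images that are added only through templates but are still article-specific; how aggressive the filtering should be depends on what "images I see on the page" means for your use case.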

Related to: Full URLs of images of a given page on Wikipedia (only those I see on the page)
Links in master page with tilde URLs give 404 depending on the page
Web Design

Here's an odd one.


I have a master page with links to other pages in the site. Those links use tilde paths (like "~/dir1/page2.aspx"). On most of the pages that use this master page there is no problem, but on a few pages the links are very wrong: the tilde is kept as part of the link, so they become "http://server.domain.com/~/dir1/page2.aspx". It's as if the tilde is treated as a literal under certain circumstances.


Any suggestions?


Thanks!


J.Ja
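
For context, ASP.NET expands "~" only on the server, e.g. inside controls marked runat="server" or through Control.ResolveUrl; a plain <a href="~/dir1/page2.aspx"> in master page markup goes to the browser untouched, which matches the literal http://server.domain.com/~/dir1/page2.aspx links described above. A minimal code-behind sketch, with control and placeholder names that are illustrative rather than taken from the post:

// Sketch: build the link server-side so "~" is resolved explicitly.
using System;
using System.Web.UI;
using System.Web.UI.HtmlControls;

public partial class SiteMaster : MasterPage
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // ResolveUrl turns the app-relative path into a correct URL
        // regardless of which content page is being rendered.
        var link = new HtmlAnchor
        {
            HRef = ResolveUrl("~/dir1/page2.aspx"),
            InnerText = "Page 2"
        };
        navPlaceholder.Controls.Add(link);   // hypothetical PlaceHolder in the master page
    }
}

Equivalently, keeping the anchors in markup but adding runat="server" (or using asp:HyperLink) lets ASP.NET do the same expansion.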


How can I avoid part of a 13-column layout being split onto the next page, so that every page shows full values without spilling onto the next page?
Web Design


How can I avoid part of a 13-column layout being split onto the next page? I want every page to show full values without splitting onto the next page.


How to get all URLs in a Wikipedia page
Web Design

It seems that the Wikipedia API's definition of a "link" is different from a URL. I'm trying to use the API to return all the URLs on a specific wiki page.


I have been playing around with a query that I found on this page, under generators and redirects.
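
In API terms, prop=links lists internal wiki links (as page titles) while prop=extlinks lists the external URLs embedded in the page, which may be the mismatch here. A minimal C# sketch, assuming "all the URLs" means external links (swap in prop=links for internal ones); names are illustrative:

// Sketch: print every external URL recorded for the "Google" article.
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

class WikiExternalLinks
{
    static async Task Main()
    {
        var http = new HttpClient();
        string json = await http.GetStringAsync(
            "https://en.wikipedia.org/w/api.php?action=query&titles=Google" +
            "&prop=extlinks&ellimit=max&formatversion=2&format=json");

        using var doc = JsonDocument.Parse(json);
        foreach (var page in doc.RootElement.GetProperty("query")
                                            .GetProperty("pages").EnumerateArray())
        {
            if (!page.TryGetProperty("extlinks", out var links)) continue;
            foreach (var link in links.EnumerateArray())
                Console.WriteLine(link.GetProperty("url").GetString());
        }
    }
}

Pages with many external links need continuation handling (the API returns a "continue" block); that is omitted here to keep the sketch short.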


Determine if a URL is in the header/footer of a web page given URL, page DOM, parent URL and other page URLs
Web Design

Given a URL, the URL of the webpage that first URL appears on, the DOM of that webpage, and a list of the other URLs on the webpage, how can I reliably determine whether the URL is in the header or footer of the page, or in neither?


I'm using C#/.NET.


I know that no solution will be perfect, since webpages are not semantically marked up and some websites/pages deliberately obfuscate their markup, but I would like to build logic that works for, say, 75% of webpages.


Also, are there other pieces of information that would be helpful to determine the location of the URL in the page?
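
A rough sketch of one such heuristic, assuming HtmlAgilityPack for DOM access: treat a link as header/footer content if any ancestor element is <header>, <footer>, <nav> or <menu>, or carries an id/class containing one of those words. The keyword list and the exact match on href are assumptions; relative hrefs would first need resolving against the page URL you already have.

// Heuristic sketch, not a complete solution.
using System;
using System.Linq;
using HtmlAgilityPack;

static class LinkLocator
{
    static readonly string[] Keywords = { "header", "footer", "nav", "menu" };

    // Returns true if the given (absolute) URL appears inside a
    // header/footer-like region of the supplied HTML document.
    public static bool IsInHeaderOrFooter(string html, string url)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null) return false;

        foreach (var a in anchors)
        {
            // Note: resolve relative hrefs to absolute URLs before comparing.
            string href = a.GetAttributeValue("href", "");
            if (!string.Equals(href, url, StringComparison.OrdinalIgnoreCase))
                continue;

            bool flagged = a.Ancestors().Any(node =>
                Keywords.Contains(node.Name) ||
                Keywords.Any(k =>
                    node.GetAttributeValue("id", "")
                        .IndexOf(k, StringComparison.OrdinalIgnoreCase) >= 0 ||
                    node.GetAttributeValue("class", "")
                        .IndexOf(k, StringComparison.OrdinalIgnoreCase) >= 0));
            if (flagged) return true;
        }
        return false;
    }
}

Other signals that could feed the same decision: how many of the page's links share the same container, whether the link also appears on sibling pages of the site, and the element's rendered position if a headless browser is available.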


Wikipedia page-to-page links by pageid
Web Design

What?:
I'm trying to get a page-to-page link map (matrix) of Wikipedia pages by page_id, in the following format:


from1 to1 to2 to3 ...
from2 to1 to2 to3 ...
...

Why?:
I'm looking for a data set (pages from Wikipedia) to try out PageRank.


Problem:
At dumps.wikimedia.org it is possible to download pages-articles.xml, which is XML in this kind of format:


<page>
  <title>...</title>
  <id>...</id>  <!-- pageid -->
  <text>...</text>
</page>

that I w
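
One way to get from pages-articles.xml to a pageid-keyed link list, sketched under the assumption that the title-to-id mapping is built in a first pass (the page.sql/pagelinks.sql dumps are an alternative source): stream the XML twice, first recording title -> pageid, then extracting [[wikilink]] targets from each page's text and resolving them to ids. Redirects, namespaces and title normalisation are ignored here, and for the full English dump the in-memory dictionary would be very large.

// Sketch: emit "fromId toId toId ..." lines from pages-articles.xml.
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Linq;

class LinkMatrix
{
    static void Main(string[] args)
    {
        string dump = args[0];   // path to pages-articles.xml
        var idByTitle = new Dictionary<string, long>();

        // Pass 1: remember every page's title -> pageid.
        ForEachPage(dump, (id, title, text) => idByTitle[title] = id);

        // Pass 2: resolve [[wikilink]] targets to pageids and print one
        // "from to1 to2 to3 ..." line per page.
        ForEachPage(dump, (id, title, text) =>
        {
            var targets = new List<long>();
            foreach (Match m in Regex.Matches(text, @"\[\[([^\]\|#]+)"))
                if (idByTitle.TryGetValue(m.Groups[1].Value.Trim(), out long to))
                    targets.Add(to);
            Console.WriteLine(id + " " + string.Join(" ", targets));
        });
    }

    // Streams the dump one <page> at a time; the <id> directly under <page>
    // is the pageid (the <id> inside <revision> is a revision id).
    static void ForEachPage(string path, Action<long, string, string> handle)
    {
        using var reader = XmlReader.Create(path);
        while (reader.ReadToFollowing("page"))
        {
            var page = (XElement)XNode.ReadFrom(reader);
            XNamespace ns = page.Name.Namespace;
            long id = (long)page.Element(ns + "id");
            string title = (string)page.Element(ns + "title");
            string text = (string)page.Element(ns + "revision")?.Element(ns + "text") ?? "";
            handle(id, title, text);
        }
    }
}

For experimenting with PageRank, running this against the much smaller Simple English dump first keeps the memory footprint manageable.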

