Scraping img src attributes with DOM XPath

By : AlexG
Date : May 03 2020, 08:53 AM

There are images I want to scrape, using DOM XPath as the scraping tool. But XPath can't find the src attributes, although I can see them in the source code of the website.

Normally I should find the image's src attribute, but XPath returns an empty value.

$html = pageContent($link."photo");
$path = new \DOMXPath($html);
$route = $path->query("//ul[@class='categoryBox']//li[@class='photoList_item']/a/img");
foreach($route as $val){
    $images[] = trim($val->getAttribute("src"));
}

The website is https://hana-yume.net/174/photo/ where you can check the path.

What are the possible reasons?

In case you need it, here is the pageContent() function:

function pageContent(string $url): \DOMDocument
{
    // Cache the raw HTML so each URL is fetched only once.
    $html = cache()->rememberForever($url, function () use ($url) {
        $opts = array(
            "http" => array(
                "header" => "Content-Type: text/html; charset=utf-8"
            )
        );
        $context = stream_context_create($opts);
        $result = @file_get_contents($url, false, $context);
        return $result;
    });

    // Convert the page to HTML entities so DOMDocument handles the Japanese text.
    $parser = new \DOMDocument();
    $parser->loadHTML($html = mb_convert_encoding($html, "HTML-ENTITIES", "ASCII, JIS, UTF-8, EUC-JP, SJIS"));
    return $parser;
}

Answer :

You need to target it in another way.

If you examine the markup carefully:

<a data-lightbox="tile10" href="/uploads/hall_photo/174/1/0/main_0.jpg?1566895565" onClick="ga('send', 'event', 'kanto', 'hall/photo', 'photo/1_0_main0_174', 1, {nonInteraction: true});">
    <img alt="アニヴェルセル 柏 挙式会場" width="750" height="330" class="lazy" data-original="/uploads/hall_photo/174/1/0/main_0_s.jpg?1566895565" />
    <noscript><img alt="アニヴェルセル 柏 挙式会場" width="750" height="330" src="/uploads/hall_photo/174/1/0/main_0_s.jpg?1566895565" /></noscript>
</a>

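The <img> tag is lazy-loaded: in the static HTML it has no src attribute, and JS only fills src in after the page loads in a browser. DOMXPath works on the static HTML alone, so getAttribute("src") returns an empty string. But as you can see, the source URL is still there, in the data-original attribute.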
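
To check this yourself, here is a minimal sketch that loads a trimmed copy of the markup above into DOMDocument and reads both attributes. The variable names are only illustrative, and the alt, width and height attributes are left out for brevity.

```php
<?php
// Markup trimmed from the page source quoted above.
$markup = '<a href="/uploads/hall_photo/174/1/0/main_0.jpg?1566895565">'
    . '<img class="lazy" data-original="/uploads/hall_photo/174/1/0/main_0_s.jpg?1566895565" />'
    . '</a>';

$doc = new DOMDocument();
$doc->loadHTML($markup);

$xpath = new DOMXPath($doc);
$img = $xpath->query('//a/img')->item(0);

// getAttribute() returns an empty string when the attribute does not exist,
// so hasAttribute() is the reliable way to tell "missing" from "empty".
$hasSrc = $img->hasAttribute('src');
$src = $img->getAttribute('src');
$original = $img->getAttribute('data-original');

var_dump($hasSrc);        // bool(false)
var_dump($src);           // string(0) ""
echo $original, PHP_EOL;  // /uploads/hall_photo/174/1/0/main_0_s.jpg?1566895565
```

The element is found, but it has no src, which matches the empty result in the question.
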

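So just target the data attribute instead. The query below also matches the li class with contains(), which still works if the element carries more than one class: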

$html = pageContent('https://hana-yume.net/174/photo/');
$path = new \DOMXPath($html);
$images = [];
$route = $path->query("//ul[@class='categoryBox']//li[contains(@class, 'photoList_item')]/a/img");
foreach($route as $val){
    $images[] = trim($val->getAttribute('data-original'));
}
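
If the scraper also has to handle pages where some images are not lazy-loaded, one option is to fall back from data-original to src, and to prefix the site root because the paths here are root-relative. This is only a sketch and goes beyond what the question asked: imageUrl() is a hypothetical helper, the sample markup is made up for the demonstration, and it assumes data-original is the lazy-load attribute, as on this page.

```php
<?php
// Hypothetical helper: prefer the lazy-load attribute, fall back to src.
function imageUrl(DOMElement $img, string $base): ?string
{
    foreach (['data-original', 'src'] as $attr) {
        $value = trim($img->getAttribute($attr));
        if ($value === '') {
            continue; // attribute missing or empty, try the next one
        }
        // Root-relative path ("/uploads/..."): prefix the site root.
        if ($value[0] === '/' && substr($value, 0, 2) !== '//') {
            return rtrim($base, '/') . $value;
        }
        return $value;
    }
    return null; // neither attribute carries a URL
}

// Made-up sample markup: one lazy image, one plain image, one without a URL.
$markup = '<ul class="categoryBox"><li class="photoList_item">'
    . '<a href="#"><img class="lazy" data-original="/uploads/a_s.jpg" /></a>'
    . '<a href="#"><img src="/uploads/b_s.jpg" /></a>'
    . '<a href="#"><img alt="no url" /></a>'
    . '</li></ul>';

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

$images = [];
$query = "//ul[@class='categoryBox']//li[contains(@class, 'photoList_item')]/a/img";
foreach ($xpath->query($query) as $img) {
    $url = imageUrl($img, 'https://hana-yume.net');
    if ($url !== null) {
        $images[] = $url;
    }
}

print_r($images);
```

Images with neither attribute are skipped, so $images never collects empty strings.
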
