Recursive PHP function returns "Allowed memory size exhausted"

I'm working on a small DOM project to collect a site's dynamic and static links, but my function takes a long time to run and then fails with this error:



allowed memory size of bytes exhausted


This is my PHP code:



public $domain_name = 'www.example.com';

public function dynamic_url2($url = "http://www.example.com")
{
    $pages = array();
    $html = file_get_html($url);
    foreach ($html->find('a') as $page) {
        if (valid_url($page->href)) {
            $parse_page = parse_url($page->href);
            if ($parse_page['host'] == $this->domain_name) {
                if (!in_array($page->href, $pages)) {
                    $pages[] = $page->href;
                    if (array_key_exists('query', $parse_page)) {
                        echo 'contain dynamic parameters : ' . $page->href . '<br>';
                    } else {
                        echo 'not dynamic : ' . $page->href . '<br>';
                    }
                }
                return $this->dynamic_url2($page->href);
            }
        }
    }
}






Is my function correct? How can I optimize it?

Thanks.










  • I want to check all of the website's links (my function works like a spider). Starting from the index page, I open page2 and follow all of its links, then go to page3 and follow all of its links, and so on.
    – Amine Bouhaddi
    Nov 10 at 3:29











  • I see. So you are experiencing infinite recursion because you are not passing the $pages data to the next function call -- you are only passing the new URL. So if you "crawl" two pages that refer to each other -- bonk -- you are stuck in an infinite loop. You must make $pages available to subsequent calls.
    – mickmackusa
    Nov 10 at 3:33











  • How can I do it? Can you give me an example, please? Thanks, brother.
    – Amine Bouhaddi
    Nov 10 at 3:34










  • Some people post their ideas straight away. I don't post an answer until I have tested it first. This is why I am slow to answer and I rarely enjoy the spoils of quick answer upvotes. I'll set up a test for myself and if you aren't already helped, I'll post something.
    – mickmackusa
    Nov 10 at 3:35










  • OK, thank you so much ;)
    – Amine Bouhaddi
    Nov 10 at 3:39
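
The point made in the comments above (every recursive call needs to see the same list of visited URLs) can be tried without any network access. This is only a minimal sketch: the `$links` array is a hypothetical stand-in for real pages, and `crawl` is an illustrative name, not the asker's method.

```php
<?php
// Hypothetical link graph: page => links found on that page.
// '/a' and '/b' link to each other, which is the situation that recursed forever.
$links = [
    '/'      => ['/a', '/b'],
    '/a'     => ['/b', '/a?x=1'],
    '/b'     => ['/a'],
    '/a?x=1' => [],
];

/**
 * Depth-first crawl over the in-memory graph.
 * $visited is taken by reference, so all nested calls share one list.
 */
function crawl(string $url, array $links, array &$visited = []): array
{
    $visited[$url] = true;                 // mark before descending
    foreach ($links[$url] ?? [] as $next) {
        if (!isset($visited[$next])) {     // skip pages already seen
            crawl($next, $links, $visited);
        }
    }
    return array_keys($visited);
}

$result = crawl('/', $links);
print_r($result);
```

Using the URL as the array key makes the "already seen?" check a single `isset()` lookup rather than an `in_array()` scan over everything collected so far.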














Tags: php, dom, simple-html-dom

edited Nov 10 at 3:29 by mickmackusa
asked Nov 10 at 3:06 by Amine Bouhaddi

1 Answer
Apart from some minor adjustments that I made while testing, you only need to make $pages modifiable (via &$pages in the function declaration) and pass the $pages array with every recursive call.



public $domain_name = 'https://www.example.html';

public function dynamic_url2($url, &$pages = [])
{
    //echo "<div>Crawling $url</div>";
    $dom = new DOMDocument;
    libxml_use_internal_errors(true); // for malformed html warning suppression
    $dom->loadHTML(file_get_contents($url)); // this doesn't account for relative urls
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query("//a") as $a) {
        $href = $a->getAttribute('href');
        //echo "<div>Found $href @ $url</div>";
        if (valid_url($href)) {
            $parsed = parse_url($href);
            // note: parse_url()'s 'host' never includes the scheme, so this
            // comparison only matches when $domain_name holds a bare host name
            if ($parsed['host'] == $this->domain_name && !in_array($href, $pages)) {
                $pages[] = $href;
                //echo "<div>$href is " , (array_key_exists('query', $parsed) ? '' : 'not ') , 'dynamic</div>';
                $this->dynamic_url2($href, $pages);
            } else {
                //echo "<div>Ignored url: $href</div>";
            }
        } else {
            //echo "<div>Invalid url: $href</div>";
        }
    }
    return $pages;
}

var_export($this->dynamic_url2($this->domain_name));
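
A recursive crawl keeps one parsed document alive for every level of nesting, so memory can still grow with crawl depth on a large site even once the loop is gone. As a hedged alternative (not part of the answer above), the same crawl can be written as a loop over a queue. In this sketch `crawl_iterative`, the `$fetchLinks` callback, and the `$maxPages` cap are all hypothetical; in real use the callback would download a page and return its same-host hrefs.

```php
<?php
/**
 * Breadth-first crawl using an explicit queue instead of recursion.
 * $fetchLinks: hypothetical callback that returns the links found on a URL.
 * $maxPages:   upper bound on how many distinct URLs are collected.
 */
function crawl_iterative(string $start, callable $fetchLinks, int $maxPages = 1000): array
{
    $visited = [$start => true];   // URLs already queued or processed
    $queue = new SplQueue();
    $queue->enqueue($start);

    while (!$queue->isEmpty()) {
        $url = $queue->dequeue();
        foreach ($fetchLinks($url) as $next) {
            if (!isset($visited[$next]) && count($visited) < $maxPages) {
                $visited[$next] = true;
                $queue->enqueue($next);
            }
        }
    }
    return array_keys($visited);
}

// Toy data in place of real HTTP requests.
$site = [
    'http://www.example.com/'  => ['http://www.example.com/a', 'http://www.example.com/b'],
    'http://www.example.com/a' => ['http://www.example.com/b', 'http://www.example.com/a?id=1'],
    'http://www.example.com/b' => ['http://www.example.com/'],
];
$found = crawl_iterative('http://www.example.com/', function (string $url) use ($site): array {
    return $site[$url] ?? [];
});

foreach ($found as $url) {
    // A URL with a query string is treated as "dynamic", as in the question.
    $kind = parse_url($url, PHP_URL_QUERY) !== null ? 'dynamic' : 'not dynamic';
    echo "$kind: $url\n";
}
```

Only one page is being processed at any moment, and the cap gives the crawl a definite stopping point even if the site generates an unbounded number of distinct URLs.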





  • Is valid_url Drupal's function: api.drupal.org/api/drupal/includes%21common.inc/function/… ? I suppose there should also be a check that the headers of new URLs give a good response so that my $dom->loadHTML() call doesn't bark. (stackoverflow.com/a/2280413/2943403)
    – mickmackusa
    Nov 10 at 4:56










  • Okay, I just spent far too long in the rabbit hole while correcting relative and absolute paths based on the parent URL using regex and such. I'm not going to do that -- it goes way too far for this question. Suffice to say, you can eliminate infinite recursion by implementing the &$pages technique that I have described. I'm done.
    – mickmackusa
    Nov 10 at 6:41
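
For readers who do want to handle the relative links mentioned in the comments, the following is a rough sketch of resolving an href against the URL of the page it was found on, using `parse_url()` rather than regex. It is not the commenter's code: `resolve_url` is a hypothetical helper, and it deliberately ignores ports, user info, and the base URL's query string.

```php
<?php
/**
 * Resolve $href against the absolute http(s) URL $base.
 * Simplified sketch: ports, user info and the base's query string are ignored.
 */
function resolve_url(string $base, string $href): string
{
    if (is_string(parse_url($href, PHP_URL_SCHEME))) {
        return $href;                               // already absolute
    }
    $b = parse_url($base);
    if (strpos($href, '//') === 0) {
        return $b['scheme'] . ':' . $href;          // protocol-relative
    }
    $origin   = $b['scheme'] . '://' . $b['host'];
    $basePath = $b['path'] ?? '/';
    if ($href === '' || $href[0] === '?' || $href[0] === '#') {
        return $origin . $basePath . $href;         // same page, new query or fragment
    }
    // Root-relative hrefs replace the whole path; others replace the last segment.
    $path = $href[0] === '/'
        ? $href
        : substr($basePath, 0, strrpos($basePath, '/') + 1) . $href;

    // Split off any query string or fragment before normalising the path.
    $cut    = strcspn($path, '?#');
    $suffix = substr($path, $cut);
    $path   = substr($path, 0, $cut);

    // Collapse "." and ".." segments.
    $out = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            array_pop($out);
        } elseif ($segment !== '.') {
            $out[] = $segment;
        }
    }
    return $origin . '/' . ltrim(implode('/', $out), '/') . $suffix;
}

$base = 'http://www.example.com/dir/page.html';
foreach (['other.html', '/root.php?id=3', '../up.html', '//cdn.example.com/lib.js'] as $href) {
    echo $href, ' => ', resolve_url($base, $href), "\n";
}
```

Each href would be resolved this way before the host comparison, so that links such as `../up.html` are neither rejected as invalid nor fetched relative to the wrong location.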










edited Nov 10 at 5:14
answered Nov 10 at 4:43 by mickmackusa