Recursive PHP function returns "Allowed memory size exhausted"
I'm working on a small DOM project to collect dynamic links and static links, but my function takes a long time to execute and fails with this error:

allowed memory size of bytes exhausted

This is my PHP code:
public $domain_name = 'www.example.com';

public function dynamic_url2($url = "http://www.example.com")
{
    $pages = array();
    $html = file_get_html($url);
    foreach ($html->find('a') as $page) {
        if (valid_url($page->href)) {
            $parse_page = parse_url($page->href);
            if ($parse_page['host'] == $this->domain_name) {
                if (!in_array($page->href, $pages)) {
                    $pages[] = $page->href;
                    if (array_key_exists('query', $parse_page)) {
                        echo 'contain dynamic parameters : ' . $page->href . '<br>';
                    } else {
                        echo 'not dynamic : ' . $page->href . '<br>';
                    }
                    return $this->dynamic_url2($page->href);
                }
            }
        }
    }
}
Is my function correct? How can I optimize it?
Thanks
php dom simple-html-dom
I want to check all the website links (my function is like a spider). Starting from the index page, I open page2 and all of its links, then page3 and all of its links, and so on, until every link on the site has been checked.
– Amine Bouhaddi
Nov 10 at 3:29
I see. So you are experiencing infinite recursion because you are not passing the $pages data to the next function call -- you are only passing the new url. So if you "crawl" two pages that refer to each other -- bonk -- you are stuck in an indefinite loop. You must make $pages available to subsequent calls.
– mickmackusa
Nov 10 at 3:33
How can I do it? Can you give me an example please? Thanks, brother.
– Amine Bouhaddi
Nov 10 at 3:34
Some people post their ideas straight away. I don't post an answer until I have tested it first. This is why I am slow to answer and I rarely enjoy the spoils of quick answer upvotes. I'll set up a test for myself and if you aren't already helped, I'll post something.
– mickmackusa
Nov 10 at 3:35
ok thank you so much ;)
– Amine Bouhaddi
Nov 10 at 3:39
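To illustrate the fix discussed in the comments above, here is a minimal sketch that uses a mock in-memory link graph instead of real HTTP fetches (the page names and links are made up for illustration). page1 and page2 link to each other, which is exactly the cycle that traps the original function in infinite recursion; sharing one visited array by reference across every recursive call breaks it:

```php
<?php
// Mock in-memory "site": each key is a page, each value the links on it.
// (Made-up data for illustration; the real code fetches pages over HTTP.)
$site = [
    'page1' => ['page2', 'page3'],
    'page2' => ['page1'],   // page1 <-> page2 cycle: the infinite-recursion trap
    'page3' => [],
];

// The same $pages array is shared (by reference) across every recursive call,
// so a page that was already visited is never crawled again.
function crawl(string $url, array $site, array &$pages = []): array
{
    if (in_array($url, $pages)) {
        return $pages;               // already visited: stop the recursion here
    }
    $pages[] = $url;
    foreach ($site[$url] as $link) {
        crawl($link, $site, $pages); // pass $pages down, not just the new url
    }
    return $pages;
}

var_export(crawl('page1', $site)); // visits each page exactly once
```

Without the `&$pages` parameter, each call would start with an empty visited list and `crawl('page1')` would bounce between page1 and page2 forever.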
asked Nov 10 at 3:06 by Amine Bouhaddi · edited Nov 10 at 3:29 by mickmackusa
1 Answer
Apart from some minor adjustments that I made while testing, you only need to make $pages modifiable (via &$pages in the function declaration) and pass the $pages array with every recursive call.
public $domain_name = 'https://www.example.html';

public function dynamic_url2($url, &$pages = [])
{
    //echo "<div>Crawling $url</div>";
    $dom = new DOMDocument;
    libxml_use_internal_errors(true); // for malformed html warning suppression
    $dom->loadHTML(file_get_contents($url)); // this doesn't account for relative urls
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query("//a") as $a) {
        $href = $a->getAttribute('href');
        //echo "<div>Found $href @ $url</div>";
        if (valid_url($href)) {
            $parsed = parse_url($href);
            if ($parsed['host'] == $this->domain_name && !in_array($href, $pages)) {
                $pages[] = $href;
                //echo "<div>$href is " , (array_key_exists('query', $parsed) ? '' : 'not ') , 'dynamic</div>';
                $this->dynamic_url2($href, $pages);
            } else {
                //echo "<div>Ignored url: $href</div>";
            }
        } else {
            //echo "<div>Invalid url: $href</div>";
        }
    }
    return $pages;
}

var_export($this->dynamic_url2($this->domain_name));
Is valid_url Drupal's function: api.drupal.org/api/drupal/includes%21common.inc/function/… ? I suppose there should also be a check that the headers of new urls give a good response so that my $dom->load() call doesn't bark. (stackoverflow.com/a/2280413/2943403)
– mickmackusa
Nov 10 at 4:56
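The header check suggested in this comment could be sketched as below. `status_is_ok` is a hypothetical helper that inspects a status line in the array shape that PHP's `get_headers()` returns; the network call itself is left out so only the decision logic is shown:

```php
<?php
// Hypothetical helper: given the array shape that get_headers() returns,
// decide whether the response is worth feeding to DOMDocument.
function status_is_ok(array $headers): bool
{
    // $headers[0] is the status line, e.g. "HTTP/1.1 200 OK"
    if (!isset($headers[0]) || !preg_match('#^HTTP/\S+\s+(\d{3})#', $headers[0], $m)) {
        return false;
    }
    $code = (int) $m[1];
    return $code >= 200 && $code < 400; // success or redirect
}
```

In the crawler this would gate the fetch, e.g. `if (status_is_ok(@get_headers($href) ?: [])) { ... }`, so a 404 or a dead host never reaches `loadHTML()`.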
Okay, I just spent far too long in the rabbit hole while correcting relative and absolute paths based on the parent url using regex and such. I'm not going to do that -- it goes way too far for this question. Suffice to say, you can eliminate infinite recursion by implementing the `&$pages` technique that I have described. I'm done.
– mickmackusa
Nov 10 at 6:41
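A rough sketch of that relative-to-absolute resolution, for anyone who needs it: `resolve_url` is a hypothetical helper that deliberately ignores edge cases such as `../` segments, fragments, and query-only links, so treat it as a starting point rather than a complete solution:

```php
<?php
// Hypothetical helper: turn a relative href into an absolute url using the
// page it was found on. Ignores edge cases such as ../ segments and fragments.
function resolve_url(string $base, string $rel): string
{
    if (parse_url($rel, PHP_URL_SCHEME) !== null) {
        return $rel;                                  // already absolute
    }
    $p      = parse_url($base);
    $scheme = $p['scheme'] ?? 'http';
    $host   = $p['host'] ?? '';
    if (strpos($rel, '//') === 0) {
        return $scheme . ':' . $rel;                  // protocol-relative
    }
    if (strpos($rel, '/') === 0) {
        return "$scheme://$host$rel";                 // root-relative
    }
    // Path-relative: keep the base url's directory, drop its last segment.
    $dir = preg_replace('#/[^/]*$#', '/', $p['path'] ?? '/');
    return "$scheme://$host$dir$rel";
}
```

Each `$href` would be passed through this before the `parse_url()['host']` comparison, so links like `page2.html` are no longer discarded as having no host.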
answered Nov 10 at 4:43 by mickmackusa · edited Nov 10 at 5:14