A colleague called me over today for some help with a memory usage issue in his PHP script. The script was performing the basic (but critical) task of importing data, pulling it in from MySQL in large chunks, then exporting it elsewhere. He was receiving the wonderful “Fatal error: Allowed memory size of XXXX bytes exhausted (tried to allocate YY bytes)” error message.
The code was following a basic flow:
10 Get stuff from the database (large array of arrays)
20 Iterate over it, doing stuff
30 Goto 10
I fixed the memory-exhaustion problem with an unset() at the end of the loop.
Take a look at this sample program:

loopOverStuff();

function loopOverStuff() {
    $var = null;
    for ($i = 0; $i < 10; $i++) {
        $var = getData();
        // Do stuff
    }
}

function getData() {
    $a = str_repeat("This string is exactly 40 characters long", 20000);
    return $a;
}
The important thing to realize is that PHP will end up needing around twice as much memory as getData()'s return value occupies. The problem is the line $var = getData(). The first time through, $var is incredibly small; it's clobbered and the return value of getData() is assigned to it. The second time through the loop, $var still holds the value from the previous iteration, so while getData() is executing you're maintaining the original data (in $var) and a whole new set (being built in getData()).
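If you want to watch the doubling happen yourself, here's a minimal sketch using memory_get_usage(); the loop bounds and output format are mine, illustrative rather than taken from memory-usage-example.php:

function getData() {
    // ~800KB of string data, as in the sample above
    return str_repeat("This string is exactly 40 characters long", 20000);
}

$var = null;
for ($i = 0; $i < 3; $i++) {
    // While getData() runs, $var still holds the previous iteration's
    // string, so two copies are alive at once: peak is ~2x the data size.
    $var = getData();
    echo "iteration $i: current=" . memory_get_usage()
        . " peak=" . memory_get_peak_usage() . "\n";
}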
Fixing this is incredibly easy:

function loopOverStuff() {
    $var = null;
    for ($i = 0; $i < 10; $i++) {
        $var = getData();
        // Do stuff
        unset($var);
    }
}
This way we avoid duplicating those values in memory on that line. To see this happen in more detail, take a look at this sample script with output: memory-usage-example.php. This isn't critical, except when it is: once loopOverStuff() completes and the function ends, the memory is released back to the rest of PHP automatically. You'll only run into problems where Other Stuff + (2 * memory needed in loop) > Memory Limit. There are better architectures available to avoid the issue entirely (like not storing everything in an array just to iterate over it later), but they're a topic for a different post. For a very simple base-case demonstration of the issue, take a look at the simple example.
Comments »
php -dxdebug.auto_trace=1 -dxdebug.trace_format=1 paul.php
You can clearly see this behaviour as well.
Thanks
The actual data sets we're dealing with are greater than 20MB; reducing peak memory usage by that much has real (positive) effects on the system.
True, using unbuffered queries and avoiding buffering within PHP itself would be a far more efficient architecture (the developer in question will be going that way in version 1.2; a sketch of that approach follows below). Reducing peak memory usage now is far from "micro".
That said, it seems like unset() vs "" or null would fit in the micro category.
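For the curious, a minimal sketch of that unbuffered approach, assuming mysqli, with hypothetical connection details, table, and helper names:

$mysqli = new mysqli('localhost', 'user', 'pass', 'db');

// MYSQLI_USE_RESULT streams rows from the server one at a time instead
// of buffering the whole result set into PHP's memory first.
$result = $mysqli->query('SELECT * FROM big_table', MYSQLI_USE_RESULT);
while ($row = $result->fetch_assoc()) {
    exportRow($row); // exportRow() stands in for the real "do stuff" step
}
$result->free();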
The article was talking about exhausting the memory allowed to the PHP script, not all the memory on a server. The fact that opcode caching uses memory outside of the PHP script is not really relevant.
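For reference, a script can inspect both that limit and its own usage; a quick sketch:

echo 'memory_limit: ' . ini_get('memory_limit') . "\n"; // the limit the fatal error refers to
echo 'current: ' . memory_get_usage() . " bytes\n";
echo 'peak:    ' . memory_get_peak_usage() . " bytes\n";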
function loopOverStuff() {
for ($i = 0; $i < 10; $i++) {
doStuff(getData());
}
}
Not only will this have the same effect as your proposed solution, but it has the added benefit of making the overall code more readable.
As a general rule of thumb, avoiding temporary variables by replacing them with function calls makes code simpler and less error-prone.
C, C++, Perl... have it; PHP doesn't. Why can't PHP have "use strict" like Perl?
Ugh, PHP has more and more of these basic flaws, e.g. why doesn't function()[2] work when function()->property does?
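For what it's worth, later versions of PHP (5.4+) added function array dereferencing; on older versions you need a temporary variable. A quick sketch, with foo() as an illustrative name:

function foo() {
    return array('a', 'b', 'c');
}

$tmp = foo();  // pre-5.4 workaround: assign to a temporary first
echo $tmp[2];  // "c"

echo foo()[2]; // works directly on PHP 5.4 and later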
I've never encountered a scenario where something like this caused any noticeable memory leakage, but this is great to know.
I am probably going to blog about this on my site. Good job!
I have tested this issue like this:

$loops = 10000000;

$startPost = microtime(true);
for ($i = 0; $i < $loops; $i++) {
    $test = 1;
}
$endPost = microtime(true);

$startPre = microtime(true);
for ($i = 0; $i < $loops; ++$i) {
    $test = 1;
}
$endPre = microtime(true);

echo $loops.' post increments needed: '.($endPost - $startPost)."\n";
echo $loops.' pre increments needed: '.($endPre - $startPre)."\n";
10000000 post increments needed: 17.445150852203
10000000 pre increments needed: 13.529909849167
This wouldn't have much influence on your problem, but it should be considered in general: use post-increment only if you really need it.
No, it's an increase of 4 seconds: when called with the true parameter, microtime() returns seconds.
But you are right, it's a kind of micro-optimisation.
But why use an operation (return the old value, then increment) in a way where the returned value is never used? The interpreter has to copy the value, increment the original, and then return the copy once the incrementation is finished.
@paul, I'm interested in the other architectures you mentioned for resolving the main problem.
greetings
chasm
(17.445150852203 - 13.529909849167) / 10000000 ≈ 0.0000004 seconds, or roughly 0.4 microseconds per iteration.
The only reason you can see the difference is because you made a loop of 10 million iterations. If you have to do any kind of work inside the loop, then it will surely dwarf the difference between post and pre increment.
You're right of course, but my point is that this only matters for a very few rather abnormal edge cases.