Friday, March 19, 2010

Memory usage in PHP

A colleague called me over today for some help with a memory usage issue in his PHP script. The script was performing the basic (but critical) task of importing data: pulling it in from MySQL in large chunks, then exporting it elsewhere. He was receiving the wonderful "Fatal error: Allowed memory size of XXXX bytes exhausted (tried to allocate YY bytes)" error message. The code was following a basic flow, and I fixed the memory exhaustion with an unset() at the end of a loop. Take a look at this sample program:

    loopOverStuff();

    function loopOverStuff()
    {
        $var = null;
        for ($i = 0; $i < 10; $i++) {
            $var = getData();
            // Do stuff
        }
    }

    function getData()
    {
        $a = str_repeat("This string is exactly 40 characters long", 20000);
        return $a;
    }

The important thing to realize is that PHP will end up needing around twice as much memory as the data returned by getData() takes. The problem is the line $var = getData(). The first time through the loop $var is incredibly small; it's clobbered and the return value of getData() is assigned to it. The second time through the loop $var still holds the value from the previous iteration, so while getData() is executing you're maintaining the original data (in $var) and a whole new set (being built in getData()). Fixing this is incredibly easy:

    function loopOverStuff()
    {
        $var = null;
        for ($i = 0; $i < 10; $i++) {
            $var = getData();
            // Do stuff
            unset($var);
        }
    }

This way we avoid the duplication in memory of those values on that line. To see this happen in more detail take a look at this sample script with output: memory-usage-example.php.

This isn't critical, except when it is. Once loopOverStuff() completes and the function ends, the memory is released back to the rest of PHP automatically. You'll only run into problems where Other Stuff + (2 * memory needed in loop) > Memory Limit.
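The doubling described above can be observed directly. The following is a minimal sketch, not the linked sample script: it assumes PHP 7 or later, and the $samples and $repeats parameters and the two loop function names are hypothetical additions used only for measurement. It samples memory_get_usage() inside getData(), at the moment the new string exists, and compares the high-water mark with and without unset().

```php
<?php
// Sketch only: $samples and $repeats are hypothetical additions for measurement.
function getData(array &$samples, int $repeats): string
{
    $a = str_repeat("This string is exactly 40 characters long", $repeats);
    // Sample while the new string exists; the caller may still hold the old one.
    $samples[] = memory_get_usage();
    return $a;
}

function loopWithoutUnset(int $repeats): int
{
    $samples = [];
    $var = null;
    for ($i = 0; $i < 3; $i++) {
        // From the second pass on, the old value is still alive during the call.
        $var = getData($samples, $repeats);
    }
    return max($samples);
}

function loopWithUnset(int $repeats): int
{
    $samples = [];
    $var = null;
    for ($i = 0; $i < 3; $i++) {
        $var = getData($samples, $repeats);
        unset($var); // release the string before the next call builds a new one
    }
    return max($samples);
}

$base = memory_get_usage();
$without = loopWithoutUnset(20000) - $base;
$with = loopWithUnset(20000) - $base;

printf("high-water mark without unset(): %d bytes\n", $without);
printf("high-water mark with unset():    %d bytes\n", $with);
```

On a typical build the first figure should come out at roughly twice the second, since each string is a little over 800 KB.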
There are better architectures available to avoid the issue entirely (like not storing everything in an array just to iterate over it later), but they're an issue for a different post.

For a very simple base-case demonstration of the issue, take a look at the simple example.

Comments
This is why learning a language like C/C++ is so useful. Those languages force you to manage memory, and that skill translates nicely to scripting languages like PHP/Python.
Not really. All you need to come up with an idea like "hey, using unset will free me some memory, and $var set in the loop exists outside the loop and between iterations!" is a bit of logical thinking and knowledge of basic PHP. C/C++ doesn't have much to do with that.
Also, using unbuffered queries with MySQL can save lots of memory. I talked about that at the 2008 MySQL Conference and on WebDevRadio. Slides and interview at http://brian.moonspot.net/2008/05/03/interview-with-webdevradio/
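For readers who have not used unbuffered queries, here is a minimal sketch of the idea with mysqli. It is not taken from the talk or slides mentioned above; exportUnbuffered() and $handleRow are hypothetical names, and an already connected mysqli object is assumed. Passing MYSQLI_USE_RESULT to mysqli::query() makes PHP fetch rows from the server one at a time instead of copying the whole result set into script memory first.

```php
<?php
// Sketch only: exportUnbuffered() and $handleRow are hypothetical names.
// $db is assumed to be an already connected mysqli instance.
function exportUnbuffered(mysqli $db, string $sql, callable $handleRow): int
{
    // MYSQLI_USE_RESULT requests an unbuffered result set.
    $result = $db->query($sql, MYSQLI_USE_RESULT);
    if ($result === false) {
        throw new RuntimeException('Query failed: ' . $db->error);
    }

    $count = 0;
    // Only the current row is held in PHP memory at any time.
    while ($row = $result->fetch_assoc()) {
        $handleRow($row);
        $count++;
    }

    // The result must be freed before the connection can run another query.
    $result->free();
    return $count;
}
```

The trade-off is that the connection stays busy until every row has been fetched or the result is freed, so other queries on the same connection have to wait.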
If you run the script with Xdebug, like:

    php -dxdebug.auto_trace=1 -dxdebug.trace_format=1 paul.php

you can clearly see this behaviour as well.
I'm just curious, as this would be a micro-optimization: would it be more efficient to set $var to an empty value instead of unsetting it?
Thanks
I'd like to disagree with calling this a micro-optimization.
The actual data sets we're dealing with are greater than 20MB; reducing peak memory usage by that amount has real (positive) effects on the system. True, using unbuffered queries and avoiding buffering within PHP itself would be a far more efficient architecture (the developer in question will be going that way in version 1.2). Reducing peak memory usage now is far from "micro". That said, it seems like unset() vs "" or null would fit in the micro category.
You are correct. By micro-optimization I meant unset() vs "". Sorry for the confusion.
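The unset() versus empty-value question can be checked with a few lines. This is a rough sketch, assuming PHP 7 or later; measureRelease() is a hypothetical helper, not something from the post. It records how much memory is held while a large string is assigned and how much is still held after either assigning null or calling unset().

```php
<?php
// Sketch only: measureRelease() is a hypothetical helper for illustration.
function measureRelease(bool $useUnset): array
{
    $before = memory_get_usage();
    $var = str_repeat('x', 1000000);       // payload of one million bytes
    $held = memory_get_usage() - $before;  // memory held while $var is set

    if ($useUnset) {
        unset($var);   // removes the variable and releases the string
    } else {
        $var = null;   // keeps the variable, releases the string
    }
    $after = memory_get_usage() - $before; // memory still held afterwards

    return ['held' => $held, 'after' => $after];
}

$withNull = measureRelease(false);
$withUnset = measureRelease(true);

printf("null:  held %d bytes, %d bytes afterwards\n", $withNull['held'], $withNull['after']);
printf("unset: held %d bytes, %d bytes afterwards\n", $withUnset['held'], $withUnset['after']);
```

Both variants release the string as soon as nothing refers to it any more; the remaining difference is only whether the variable itself still exists.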
What do you think about the xcache mod for PHP? It's a PHP accelerator, so it's saving memory because most scripts get cached!
you're talking about *opcode caching*. that doesn't solve the main problem being discussed here.
Actually, that has the opposite effect. Opcode caches use more memory as it is caching all your scripts in memory. Opcode caches are made to save CPU cycles.
Let's not confuse the guy more.
The article was talking about exhausting the memory allowed to the PHP script, not all the memory on a server. The fact that opcode caching uses memory outside of the PHP script is not really relevant.
Presumably there is something missing from this example - the "Do Stuff" comment would represent some code that makes use of the return value from `getData`. A better solution imho would be to move this code into a separate function and call it without using a temporary variable. Eg.:

    function loopOverStuff()
    {
        for ($i = 0; $i < 10; $i++) {
            doStuff(getData());
        }
    }

Not only will this have the same effect as your proposed solution, but it has the added benefit of making the overall code more readable. As a general rule of thumb, avoiding temporary variables by replacing them with function calls makes code simpler and less error-prone.
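A quick way to check the "same effect" claim is to record memory use each time getData() is entered. The sketch below assumes PHP 7 or later; the $log parameter and the body of doStuff() are hypothetical stand-ins. If a previous chunk were still alive on entry, later samples would sit one chunk (roughly 800 KB) above the first.

```php
<?php
// Sketch only: $log is a hypothetical parameter used to record memory use,
// and doStuff() is a stand-in for whatever consumes the data.
function getData(array &$log): string
{
    $log[] = memory_get_usage(); // memory in use before the new chunk is built
    return str_repeat("This string is exactly 40 characters long", 20000);
}

function doStuff(string $data): int
{
    return strlen($data);
}

function loopOverStuff(array &$log): void
{
    for ($i = 0; $i < 3; $i++) {
        // The chunk lives only as an argument and is freed when doStuff() returns.
        doStuff(getData($log));
    }
}

$log = [];
loopOverStuff($log);
$growth = max($log) - min($log);

printf("memory growth between calls to getData(): %d bytes\n", $growth);
```

The reported growth should be a few hundred bytes at most (the log array itself), which suggests no earlier chunk survives into the next call.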
PHP is optimized for web requests. Part of that optimization is lazily releasing memory in order to speed up the request. Those of us that use PHP for non-web things have to know how it works and be responsible for how we use it.
#6 is right, but in general, why doesn't ugly PHP have true block scope for variables ({} scope for every loop, condition, ...)?
C, C++, Perl... have it; PHP doesn't. Why can't PHP have "use strict" like Perl? Ugh, PHP has more and more basic things done badly, e.g. why doesn't function()[2] work when function()->property works?
You still hit the peak memory usage even when using unset though. You said this is a workaround for all the memory allocation issues PHP has. All this does is reduce memory usage every other iteration. The app will still crash....
This is a great tip.
I've never encountered a scenario where something like this caused any noticeable memory leakage, but this is great to know. I am probably going to blog about this on my site. Good job!
A little bit off-topic from the concrete problem discussed, but try to use pre-increment instead of post-increment in the for loop.
I have tested this issue like this:

    $loops = 10000000;

    $startPost = microtime(true);
    for ($i = 0; $i < $loops; $i++) {
        $test = 1;
    }
    $endPost = microtime(true);

    $startPre = microtime(true);
    for ($i = 0; $i < $loops; ++$i) {
        $test = 1;
    }
    $endPre = microtime(true);

    echo $loops . ' post increments needed: ' . ($endPost - $startPost) . PHP_EOL;
    echo $loops . ' pre increments needed: ' . ($endPre - $startPre) . PHP_EOL;

Output:

    10000000 post increments needed: 17.445150852203
    10000000 pre increments needed: 13.529909849167

This would not have much influence on your problem, but it should be considered generally: use post-increment only if really needed.
@chasm Sure, if you care about a performance increase of 0.0000003915 seconds. That's not even a micro-optimisation - That's a nano-optimisation.
@troels
No, it's an increase of 4 seconds, because with the true parameter microtime() returns seconds. But you are right, it's a kind of micro-optimisation. But why use an operation (return the old value, then increment) whose result will not be used that way? The interpreter has to copy the value, increment the original and, when the increment has finished, return the copy. @paul, I'm interested in the other architectures you mentioned for resolving the main problem. Greetings, chasm
I meant, per cycle:

    (17.445150852203 - 13.529909849167) / 10000000

The only reason you can see the difference is because you made a loop of 10 million iterations. If you have to do any kind of work inside the loop, then it will surely dwarf the difference between post- and pre-increment. You're right of course, but my point is that this only matters for a very few and rather abnormal edge cases.
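Worked through in PHP, using only the two timings quoted in this thread (the variable names are just for illustration), the per-iteration figure comes out as follows:

```php
<?php
// Timings copied from the benchmark output quoted in this thread.
$postSeconds = 17.445150852203;
$preSeconds = 13.529909849167;
$loops = 10000000;

// Total difference is about 3.9 seconds; spread over 10 million iterations.
$perIteration = ($postSeconds - $preSeconds) / $loops;

printf("%.10f seconds per iteration\n", $perIteration);
// prints "0.0000003915 seconds per iteration"
```

So both readings are consistent: roughly 4 seconds over the whole run, and roughly 0.4 microseconds per iteration.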
Using unset will not work if you have circular references in the data you try to unset (PHP 5.3 has a special INI option that should improve handling this though). So it could help for certain specific cases, but isn't always the solution for OOM errors...
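The circular reference case can be reproduced in a few lines. This is a minimal sketch, assuming PHP 7 or later with the cycle collector available (it was introduced in PHP 5.3, controlled by the zend.enable_gc setting); Node and measureCycle() are hypothetical names. It shows memory staying allocated after unset() and being released only once gc_collect_cycles() runs.

```php
<?php
// Sketch only: Node and measureCycle() are hypothetical names for illustration.
class Node
{
    public $peer = null;
    public $payload = '';
}

function measureCycle(): array
{
    gc_enable(); // make sure the cycle collector is switched on
    $before = memory_get_usage();

    $a = new Node();
    $b = new Node();
    $a->payload = str_repeat('x', 1000000);
    $a->peer = $b; // $a and $b now reference each other
    $b->peer = $a;

    unset($a, $b); // the cycle keeps both refcounts above zero
    $afterUnset = memory_get_usage() - $before;

    $collected = gc_collect_cycles(); // force a collection run
    $afterGc = memory_get_usage() - $before;

    return ['afterUnset' => $afterUnset, 'collected' => $collected, 'afterGc' => $afterGc];
}

$r = measureCycle();
printf("after unset(): %d bytes still held\n", $r['afterUnset']);
printf("after gc_collect_cycles(): %d bytes still held\n", $r['afterGc']);
```

Without the explicit call, the collector runs only when its internal buffer of candidate roots fills up, so a long-running import may hold on to such cycles for a while.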
Who Am I?

Hi, I'm Paul Reinheimer, a developer working with PHP. I wrote a book titled Professional Web APIs with PHP back in 2006, and am currently a contractor, working primarily in Biomedical Informatics. I'm working on a project to help developers called WonderProxy, which has proxies all over the world. Working on GeoIP development? Now you can finally test properly! My hobbies are cycling, photography, and travel. I frequently write about PHP or other related technologies. Feel free to subscribe to just the PHP feed if you're only interested in such topics.
This post was mentioned on Twitter by planetphp: Memory usage in PHP - Paul Reinheimer http://blog.preinheimer.com/index.php?/archives/354-Memory-usage-in-PHP.html
Tracked: Mar 19, 21:16