PHP references: the footgun that ships faster than you think

#php #memory #debugging #references #performance #footguns

PHP references are one of the few language features the PHP manual explicitly warns against using unnecessarily. The warning is warranted. I have debugged three separate production incidents caused by references, and in two of them the original developer was not aware they had introduced a reference at all.

This is not an article about why & is a code smell. It is an article about understanding exactly what it does, because you will encounter it in legacy codebases, you will occasionally need it, and you will definitely debug a bug caused by it someday.

What a reference actually is

PHP's default behaviour for strings and arrays is copy-on-write: when you assign one variable to another, they initially share the same value in memory. The copy happens only when one of them is modified. This is already quite efficient for read-heavy code.

A reference bypasses copy-on-write entirely. Two variables that are references to the same value share memory regardless of modifications. Modifying either one modifies the underlying value both point to.

$a = 'original';
$b = $a;   // copy-on-write: $b points to the same memory, but...
$b = 'modified';  // ...the copy happens here. $a is still 'original'.
var_dump($a);     // string(8) "original"

$a = 'original';
$b = &$a;  // reference: $b is an alias for the same memory location as $a
$b = 'modified';  // no copy — modifies the underlying value directly
var_dump($a);     // string(8) "modified"  ← $a changed too: $b is an alias, not a copy

The difference matters because reference behaviour in PHP is not always obvious when reading code. After assignment, a reference looks no different from a regular variable: $b looks the same in both cases. You have to scroll back to where & was introduced.
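
To make the two behaviours visible on something larger than a short string, here is a rough sketch that watches memory_get_usage() around an array assignment, a write, and a reference. The array size and variable names are my own choices for illustration, and the exact byte counts depend on the PHP version and build, so treat the numbers as indicative only.

```php
<?php
// Rough measurement only: exact byte counts vary by PHP version and build.
$a = range(1, 100000);

$before = memory_get_usage();
$b = $a;                        // plain assignment: both names share one array
$afterAssign = memory_get_usage();

$b[0] = -1;                     // first write through $b forces the real copy
$afterWrite = memory_get_usage();

$assignCost = $afterAssign - $before;      // expected to be close to zero
$writeCost  = $afterWrite - $afterAssign;  // expected to be roughly the array size

$c = &$a;                       // reference: $c and $a now name the same value
$c[0] = -2;                     // no copy: the write lands in $a as well

printf("assignment cost: %d bytes\n", $assignCost);
printf("first write cost: %d bytes\n", $writeCost);
printf("\$a[0] after writing through \$c: %d\n", $a[0]);
```

The assignment should cost almost nothing and the first write should cost about one array's worth of memory, which is copy-on-write doing its job. The write through $c costs nothing extra, but it also changes $a, which is the whole point and the whole risk.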

Incident 1: the foreach that corrupted the array

This is the most common reference bug I have seen in production codebases. It usually appears in pre-PHP 7 code that was never refactored, although the behaviour is the same in PHP 7 and 8:

$prices = [100, 200, 300, 400, 500];

foreach ($prices as &$price) {
    $price = $price * 0.9;
}
// After the loop: $prices = [90, 180, 270, 360, 450] ✓

// Some other code, three lines later, iterates the same array:
foreach ($prices as $price) {
    echo $price . "\n";
}

Expected output: 90, 180, 270, 360, 450. Actual output: 90, 180, 270, 360, 360. The last element is wrong.

After the first foreach, $price is still a reference to the last element of $prices, the value at index 4. The second foreach assigns each value to $price in turn, and every one of those assignments writes through to $prices[4]. By the fourth iteration, $prices[4] holds 360. The fifth iteration then reads $prices[4] and finds 360, not 450.

// The fix
foreach ($prices as &$price) {
    $price = $price * 0.9;
}
unset($price);  // break the reference so later writes to $price cannot reach $prices[4]

unset($price) does not destroy the last element of the array. It destroys the reference connection between $price and $prices[4]. In every codebase where I have seen this bug, unset($price) was missing. The PHP documentation explicitly mentions it. It is still missing in codebases today.
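
The following sketch reproduces the bug in a form you can run, then shows one way to avoid the reference altogether with array_map(). The $seen and $discounted variables are names I made up for the demonstration, and the arrow function assumes PHP 7.4 or later.

```php
<?php
$prices = [100, 200, 300, 400, 500];

foreach ($prices as &$price) {
    $price = $price * 0.9;
}
// No unset() here on purpose: $price is still an alias for $prices[4].

$seen = [];
foreach ($prices as $price) {   // every assignment to $price writes into $prices[4]
    $seen[] = $price;
}
unset($price);                  // clean up before doing anything else

// The last two entries of $seen are equal: the fifth value was overwritten.
var_dump($seen[3] == $seen[4]); // bool(true)

// Reference-free alternative: build a new array instead of mutating in place.
$discounted = array_map(
    static fn ($p) => $p * 0.9,
    [100, 200, 300, 400, 500]
);
```

The array_map() version has no variable left over that can alias the array, so there is nothing to forget. Whether it reads better than foreach plus unset() is a matter of taste; the point is only that the reference is rarely required.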

Incident 2: the function that silently mutated the caller's data

A data transformation pipeline had a function to normalise product data. It was called with large arrays, and someone added & to avoid copying:

// Original: safe, no side effects
function normaliseProduct(array $product): array
{
    $product['title'] = trim(strtolower($product['title']));
    $product['price'] = round($product['price'] * 100) / 100;
    return $product;
}

// "Optimised" version: unsafe
function normaliseProduct(array &$product): void
{
    $product['title'] = trim(strtolower($product['title']));
    $product['price'] = round($product['price'] * 100) / 100;
}

Calling $normalised = normaliseProduct($product) on the original returned a modified copy. On the "optimised" version the function returns void, so $normalised was null and $product was modified in place. The cached data for every product was null, and the reporting system showed nothing. Nobody noticed for two days because the main read path hit the database, not the cache.

The reference "optimisation" saved next to nothing. PHP arrays already use copy-on-write, so passing the array by value copies nothing at the call. The function writes two keys, which does trigger a copy in the by-value version, but that copy is shallow: it duplicates one product's hash table while the strings and nested arrays inside it stay shared.
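
Here is a minimal sketch of the failure mode with both variants side by side. I have renamed the functions to normalisePure() and normaliseInPlace() so they can coexist in one file; those names and the sample product are hypothetical, not taken from the incident.

```php
<?php
function normalisePure(array $product): array
{
    $product['title'] = trim(strtolower($product['title']));
    $product['price'] = round($product['price'] * 100) / 100;
    return $product;            // caller gets a new array; the argument is untouched
}

function normaliseInPlace(array &$product): void
{
    $product['title'] = trim(strtolower($product['title']));
    $product['price'] = round($product['price'] * 100) / 100;
}

$product = ['title' => '  Blue WIDGET ', 'price' => 19.999];

$copy = normalisePure($product);
$titleAfterPure = $product['title'];    // still '  Blue WIDGET '

// The call site was never updated, so it still captures the "return value".
$result = normaliseInPlace($product);   // a void function evaluates to null

var_dump($result);                      // NULL
var_dump($product['title']);            // string(11) "blue widget"
```

PHP does not raise an error when you assign the result of a void function, which is why the bug was silent. Static analysers typically flag it, so running one in CI is a reasonable safety net for this kind of signature change.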

When references are actually correct

References are appropriate in exactly two situations I have encountered. First: large data structures modified in place in a recursive algorithm. If you are traversing and modifying a deeply nested array, passing by reference avoids copying the entire structure at each recursion depth. This is a real performance problem only at meaningful scale, and I would not reach for it below 10 MB of data. Second: output parameters in C-extension-style functions, such as preg_match() with a match array.
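
Both cases can be sketched briefly. The recursive helper trimStringsInPlace() and the sample data are hypothetical, written only to show the shape of the pattern; preg_match() and its third by-reference parameter are standard PHP.

```php
<?php
// Case 1: recursive in-place modification of a nested array.
function trimStringsInPlace(array &$node): void
{
    foreach ($node as &$value) {
        if (is_array($value)) {
            trimStringsInPlace($value);   // recurse into the same memory, no copy
        } elseif (is_string($value)) {
            $value = trim($value);
        }
    }
    unset($value);                        // same rule as before: break the alias
}

$tree = [
    'name'     => ' root ',
    'children' => [
        ['name' => ' left '],
        ['name' => 'right  '],
    ],
];
trimStringsInPlace($tree);

// Case 2: an output parameter. preg_match() fills $matches by reference.
$found = preg_match('/(\d+)-(\d+)/', 'order 12-345', $matches);

var_dump($tree['children'][1]['name']);   // string(5) "right"
var_dump($matches[2]);                    // string(3) "345"
```

Note that even the legitimate recursive case needs the unset() after its foreach. The name of the function says "in place", which is the least a by-reference signature owes its callers.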

The object reference misconception

A very common misunderstanding: objects in PHP are already "passed by reference." They are not. Objects are passed by handle, a pointer to the object, not the object itself. Reassigning the handle inside a function does not affect the caller's handle. Modifying the object through the handle does.

class Counter { public int $count = 0; }

function increment(Counter $counter): void
{
    $counter->count++;       // modifies the object — caller sees this
    $counter = new Counter;  // reassigns the handle — caller does NOT see this
}

$c = new Counter;
increment($c);
var_dump($c->count);  // int(1) — the increment happened, the reassignment did not
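
For contrast, this sketch shows what changes when the parameter really is a reference. The two function names are hypothetical, and the typed property assumes PHP 7.4 or later. With &, reassigning the handle inside the function replaces the caller's object; without it, the caller keeps the original.

```php
<?php
class Counter { public int $count = 0; }

function replaceByValue(Counter $counter): void
{
    $counter = new Counter;     // only the local copy of the handle changes
    $counter->count = 99;
}

function replaceByReference(Counter &$counter): void
{
    $counter = new Counter;     // the caller's variable now holds the new object
    $counter->count = 99;
}

$c = new Counter;
$original = $c;                 // second handle to the same object

replaceByValue($c);
$sameAfterValue = ($c === $original);      // true: caller still has the original

replaceByReference($c);
$sameAfterReference = ($c === $original);  // false: $c was swapped for a new object

var_dump($sameAfterValue, $sameAfterReference, $c->count);
```

This is the practical difference between a handle and a reference: a handle lets the callee change the object, while a reference lets the callee change which object the caller's variable holds.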

What I watch for in code review

When I see & in a function signature or in a foreach, I stop and read the surrounding twenty lines carefully. Is that reference still active after the loop? Does the caller expect the function to have no side effects on the argument? Is the performance justification real, or is it premature optimisation from someone who did not read the manual on copy-on-write?

A reference in application-layer PHP code is a yellow flag, not because it is always wrong, but because the code relies on aliasing semantics that are non-obvious to the next reader. Non-obvious to the next reader is where bugs live.
