Tail Call Optimisation in Haskell and JavaScript

A tail call is a function call that is the last thing a function does before returning. That sounds like a minor implementation detail. It isn’t. Whether a runtime honours tail calls — replacing the current stack frame rather than pushing a new one — is the difference between recursion that scales linearly in stack space and recursion that uses constant space. It’s also where two languages I care about, Haskell and JavaScript, tell very different stories about the relationship between specification, implementation, and the reality of what you can actually rely on.

What Makes a Call a Tail Call

The critical question is whether any work remains after the recursive call returns. If the answer is no — if the call’s result is immediately returned without further computation — it’s in tail position.

-- NOT a tail call: multiplication happens after the recursive call returns
factorial :: Integer -> Integer
factorial 0 = 1
factorial n = n * factorial (n - 1)
--            ^-- this multiplication is still pending when factorial recurses

-- Tail call: the recursive call is the last thing that happens
factorial :: Integer -> Integer
factorial n = go 1 n
  where
    go acc 0 = acc
    go acc n = go (acc + n) (n - 1)  -- nothing left to do; result is the call itself

The accumulator pattern is the standard transformation. You thread an extra argument — the accumulated result — through the recursion so the final call can return it directly. The structure is always the same: an outer function with a clean interface and an inner go that carries the accumulator.

Without TCO, each recursive call pushes a new frame onto the call stack. A function called a million times deep creates a million stack frames. With TCO, the runtime recognises that the current frame is finished and reuses it — the recursion compiles down to a loop. O(n) stack becomes O(1).

Haskell: TCO Works, But Laziness Complicates It

GHC performs TCO reliably. A function in proper tail position will be optimised to a jump rather than a call — no stack growth. The trouble is that Haskell’s lazy evaluation can undermine the benefit even when the recursion is structurally tail-recursive.

Consider a sum over a list:

-- Structurally tail-recursive — but has a space leak
mySum :: [Integer] -> Integer
mySum xs = go 0 xs
  where
    go acc []     = acc
    go acc (x:xs) = go (acc + x) xs

This looks fine. The recursive call is in tail position. But Haskell is lazy — acc + x is not evaluated when it’s constructed; it’s stored as a thunk, an unevaluated expression to be forced later. After a million iterations, acc is not a number. It’s a chain of a million nested additions: ((((0 + 1) + 2) + 3) + ...). The stack doesn’t blow up, but the heap does. This is a space leak.

The fix is forcing strict evaluation with a bang pattern or seq:

{-# LANGUAGE BangPatterns #-}

-- Bang pattern forces evaluation of acc before the recursive call
mySum :: [Integer] -> Integer
mySum xs = go 0 xs
  where
    go acc []      = acc
    go !acc (x:xs) = go (acc + x) xs
--  ^--- the ! forces acc to be evaluated to WHNF before proceeding

The ! annotation tells GHC: evaluate this argument to weak head normal form before proceeding. The thunk chain never forms. Stack usage: O(1). Heap usage: O(1). This is what you want.

The same issue surfaces with foldl — the lazy left fold — which is almost never what you want:

-- foldl is tail recursive but lazy — builds thunks
sum1 = foldl (+) 0 [1..1000000]   -- heap explosion

-- foldl' is the strict version — forces evaluation at each step
sum2 = foldl' (+) 0 [1..1000000]  -- O(1) space

foldl' (from Data.List) is the correct default for left folds over large lists in Haskell. foldl is a trap that runs silently until it doesn’t.

GHC’s Worker/Wrapper Transformation

Beyond TCO, GHC performs a more general optimisation called worker/wrapper transformation. It splits a recursive function into a wrapper (the external interface, handles the clean API) and a worker (the internal loop, optimised for performance — often unboxed types, tight loops). This is what happens when you look at GHC Core output and see your clean recursive Haskell function has been compiled to something barely recognisable. That’s a compliment.

The practical upshot: for performance-critical recursion in Haskell, write accumulator-style with strictness annotations, and trust GHC to finish the job. Don’t reach for unsafePerformIO or manual CPS when the compiler is doing more sophisticated work than TCO anyway.

JavaScript: Specified, Then Abandoned

This is where the story gets interesting.

ES2015 (ES6) added Proper Tail Calls (PTC) to the specification. The language formally specified that calls in tail position must not grow the stack. It was a genuine addition to the standard, shipped in June 2015. The intent was clear: JavaScript should support tail-recursive programming as a first-class feature.

V8 implemented it. Then removed it.

In 2016, the V8 team pulled their PTC implementation, citing two problems: debugging tools couldn’t reconstruct meaningful stack traces (the optimised frames were gone), and no other major engine was moving toward implementation. The tail of the tail call story is that the only major JavaScript engine that implements proper tail calls today is JavaScriptCore — the engine in Safari. V8 (Chrome, Node.js) does not. SpiderMonkey (Firefox) never did.

The result: code written to rely on TCO will stack overflow in every JavaScript runtime except Safari. This is not a theoretical concern.

// In tail position per the spec — but will stack overflow in Node for large n
function factorial(n, acc = 1) {
  if (n <= 1) return acc;
  return factorial(n - 1, n * acc);
}

factorial(100000); // RangeError: Maximum call stack size exceeded (Node/Chrome)

The Trampoline Pattern

The community workaround is trampolining. Instead of making the recursive call directly, you return a function that would make the call. A driver loop — the trampoline — keeps invoking the returned functions until it gets a plain value back.

// A general trampoline driver
const trampoline = fn => (...args) => {
  let result = fn(...args);
  while (typeof result === 'function') {
    result = result();
  }
  return result;
};

The recursive function is rewritten to return a thunk (a zero-argument function) instead of making the recursive call:

// Trampolined factorial — returns a thunk instead of calling itself
const _factorial = (n, acc = 1) => {
  if (n <= 1) return acc;
  return () => _factorial(n - 1, n * acc);  // thunk, not a call
};

const factorial = trampoline(_factorial);

factorial(100000); // Works in any JS engine — stack depth never exceeds 1

The call stack stays flat. trampoline invokes one thunk at a time, sees whether the result is another thunk or a value, and loops accordingly. Stack depth is always O(1) — there’s only ever one frame on the stack at a time.

The tradeoff: trampolining allocates a closure on the heap for each iteration instead of a stack frame. For tight numeric loops, this is measurable overhead. For most application-level recursion over data structures, it’s fine.

Syntactic Tail Calls (STC) — The Abandoned Alternative

After pulling PTC, TC39 proposed an alternative: Syntactic Tail Calls, where you opt into tail call behaviour with explicit syntax — return continue factorial(n - 1, n * acc) or similar. The idea was to address the debugging concerns by making the optimisation visible and intentional. STC stalled in committee and was never adopted. The spec still formally contains PTC; no mainstream engine implements it outside Safari.

The Contrast

	Haskell	JavaScript
TCO guaranteed?	✅ GHC always	❌ Safari only
Main gotcha	Lazy thunk buildup	Engine fragmentation
Correctness fix	Bang patterns / `foldl'`	Trampolining
Spec vs reality	Aligned	Diverged dramatically
Debugging impact	Minimal	Was the reason V8 dropped it

The divergence is philosophically revealing. Haskell is a language where recursion is the control flow primitive — there are no for loops, no while. TCO isn’t a nice-to-have; it’s a prerequisite for the language being usable. GHC implements it because it must.

JavaScript evolved from imperative roots. Loops exist. Recursion is one option among several. TCO was added to ES6 in pursuit of functional programming parity, but when the implementation cost (in debuggability, in cross-engine coordination effort) was weighed against the benefit — in a language where for loops work fine — V8 made the pragmatic call to drop it. You can disagree with that decision, but it’s not irrational.

When This Actually Matters

For most JavaScript code, TCO is irrelevant. Iteration over arrays, object traversal, async chains — none of these are expressed as deep recursion in idiomatic JS. The cases where it matters:

Recursive data structure traversal where the depth is unbounded (deep JSON trees, graph traversal, parsing)
Continuation-passing style (CPS) transforms, which are inherently tail-recursive
Recursive algorithms over user-controlled input where the depth is proportional to input size

For all of these in JavaScript, reach for trampolining or an explicit stack structure over recursion. Don’t rely on the runtime.

For Haskell: reach for accumulator-style recursion with bang patterns by default. Reach for foldl' instead of foldl everywhere. Profile before assuming a thunk buildup problem — sometimes it’s not there, and sometimes it’s in an unexpected place. GHC’s -prof flag and the heap profiling mode show exactly where thunks are accumulating when a space leak isn’t obvious from reading the code.

The broader lesson is that understanding whether your runtime actually does TCO matters more than whether the specification says it should. In Haskell, the specification and the implementation agree. In JavaScript, they diverge in production — and your stack traces will tell you so.