Where do two linked lists merge?
(followups set to comp.programming)
Where do two linked lists merge?
The following question was raised in comp.lang.c: Is there any
efficient way of finding the intersection point of two singly
linked lists? That is, given two singly linked lists that meet
at some point, find the node where they meet.
It turns out that the question as stated conceals two
assumptions. The first is that the two lists terminate with a
common null link. If they do they will have a single merge
point; the two merging lists form a Y structure. However there
are two ways a linked list can terminate, either by a null link
or by a cycle.
When the two lists terminate in a cycle each list can join the
cycle at different nodes. Each is a legitimate merge node, i.e.,
we can follow A until it reaches B's merge point or we can follow
B until it reaches A's merge point. There is no single "the
merge point".
We can remove the ambiguity by adopting some criterion for the
"best" merge point, e.g., the one that minimizes the combined
lengths to the merge point, although this fails if the two merge
points are on exact opposite sides of the cycle.
The second assumption is that the two lists have any nodes in
common, i.e., that they actually merge. Algorithms that fail if
the lists are terminated by cycles are unsafe for cycles.
Algorithms that fail if the lists have no nodes in common are
unsafe for failure to merge.
I. THE SIMPLE APPROACH FOR NULL TERMINATED LISTS
What we want to find is the first node the two lists have in
common. Let A.m and B.m be the number of nodes respectively
between the start of A and the merge node and B and the merge
node. Let m be the difference between A.m and B.m. If we skip
the first m nodes of the longer list than we can compare the two
lists node by node until we get to a match, the merge node.
Thus we can find the merge point if we can find m. Since the two
lists have the same nodes in common after the merge point we have
m = A.length - B.length.
The advantage of this approach is its simplicity. However it
does require knowing the list lengths. If this information is
not available in advance, the lists must be traversed in their
entirety.
The simple algorithm is unsafe for cycles (traversing the lists
to find end points goes into an infinite loop) but it is safe for
failure to merge (the comparison loop reaches the terminating
null node.)
II. FINDING THE ENTRY POINT INTO A TERMINATING CYCLE
Finding a cycle in a linked list is a classic algorithm. Let A
be the list. Let p1 and p2 be two pointers into A. Initialize
each to point to the start of the list. In each iteration of a
loop advance p1 by one node and p2 by two nodes; escape from the
loop when p1 and p2 both point to the same node. Call the node
x; it is a node within the list cycle.
The next step is to find the length of the cycle. In each
iteration of a second loop advance p1 by one node until again
points to x. The number of iterations is the length of the
cycle. Let k be the length of the cycle.
Now initialize p1 and p2 to the start of the list. Advance p2 by
k nodes. In a loop advance p1 and p2 by one node in each
iteration; escape from the loop when p1 and p2 point to the same
node. Call the node y. Then y is the entry point into the list
cycle.
If it is not known whether the lists are terminated by a cycle or
by a null node, we can add a test in the first loop to see if p2
detects a null node.
III. AN ALGORITHM BASED ON CYCLE DETECTION
When there is a terminating cycle there are three possibilities -
(a) A and B merge before they reach the cycle, (b) they merge at
a common entry point, or (c) they enter at different points in
the cycle. To distinguish these cases we need to find the entry
points for A and B. If the entry points are the same treat the
common entry point as a list terminator and apply the simple
algorithm in part I. If they differ we have case (c) - either
entry point is a legitimate answer.
If the two lists are null terminated the cycle detection
algorithm will discover the fact; the basic algorithm of part I
can be used.
In case (c) a complete implementation checks that the two lists
actually share a common cycle. Clearly they don't if the two
cycle lengths are different. If they are the same commonality
can be done by checking that the entry point of A can be reached
from the entry point of B.
This algorithm is O(n) but it is cumbersome and relatively
expensive. However,properly implemented, it is safe against
cycles and against failure to merge.
IV. A METHOD THAT DOESN'T REQUIRE FINDING THE TERMINATOR
The basic algorithm doesn't actually require knowing the list
length; what it needs is the distance between two nodes that
match. Call that distance m. It turns out that m can be found
without searching to the ends of the lists.
To see how that this works let us first suppose that we know a
bound M for m, and that A is the longer list. We compare
the first element of B with the first M elements of A. If we
find a match we've found m (and the merge point.) If no match is
found we go to node M+1 in B and check it against nodes M+1
through 2*M in A; we keep advancing by M nodes until a match is
found. A match will be found when the merge point is passed.
The difference in positions of the match in A and B yields m.
Unfortunately we don't have a bound M for m nor do we know which
list is longer. However we can use M=1 in the first iteration,
M=2 in the second iteration, and so on. We solve the problem of
not knowing which list is longer by alternating which list is the
lead list. With a bit of bookkeeping we get the following
algorithm:
Traverse A by one step. Traverse B by three steps, checking to
see if it there is a match with the last element read from A.
Traverse A by five more steps, checking for a match with the last
element read from B. Continue, alternating lists and increasing
the steps by two each time. When a match is found this way we
know the difference of distances to the merge point, m. This
gives us an O(m^2 +D) method where D is the common distance to
the merge point.
If the lists are null terminated it is possible that one of the
lists is exhausted before the a match is found. In that case we
revert to algorithm I. If the lists are terminated by a common
cycle algorithm IV will find one of the entry points. Algorithm
IV will fail if the two lists are disjoint and are terminated by
separate cycles.
FINAL REMARKS
This problem is more a puzzle than a serious programming problem.
A 'correct' solution depends on the objective and upon context.
If the lists are known to be null terminated algorithm III is not
needed. In principle algorithm IV is more efficient than
algorithm I; however it is more complex and in practice
simplicity may trump computational efficiency. If the lists are
known to be cycle terminated algorithm I will fail. Algorithm
III is guaranteed to work but is cumbersome; algorithm IV will be
considerably more efficient, but will fail if the lists are
disjoint.
Considered as a puzzle, algorithm I is the 'correct' solution.
It is simple and elegant.
Where do two linked lists merge?
The following question was raised in comp.lang.c: Is there any
efficient way of finding the intersection point of two singly
linked lists? That is, given two singly linked lists that meet
at some point, find the node where they meet.
It turns out that the question as stated conceals two
assumptions. The first is that the two lists terminate with a
common null link. If they do they will have a single merge
point; the two merging lists form a Y structure. However there
are two ways a linked list can terminate, either by a null link
or by a cycle.
When the two lists terminate in a cycle each list can join the
cycle at different nodes. Each is a legitimate merge node, i.e.,
we can follow A until it reaches B's merge point or we can follow
B until it reaches A's merge point. There is no single "the
merge point".
We can remove the ambiguity by adopting some criterion for the
"best" merge point, e.g., the one that minimizes the combined
lengths to the merge point, although this fails if the two merge
points are on exact opposite sides of the cycle.
The second assumption is that the two lists have any nodes in
common, i.e., that they actually merge. Algorithms that fail if
the lists are terminated by cycles are unsafe for cycles.
Algorithms that fail if the lists have no nodes in common are
unsafe for failure to merge.
I. THE SIMPLE APPROACH FOR NULL TERMINATED LISTS
What we want to find is the first node the two lists have in
common. Let A.m and B.m be the number of nodes respectively
between the start of A and the merge node and B and the merge
node. Let m be the difference between A.m and B.m. If we skip
the first m nodes of the longer list than we can compare the two
lists node by node until we get to a match, the merge node.
Thus we can find the merge point if we can find m. Since the two
lists have the same nodes in common after the merge point we have
m = A.length - B.length.
The advantage of this approach is its simplicity. However it
does require knowing the list lengths. If this information is
not available in advance, the lists must be traversed in their
entirety.
The simple algorithm is unsafe for cycles (traversing the lists
to find end points goes into an infinite loop) but it is safe for
failure to merge (the comparison loop reaches the terminating
null node.)
II. FINDING THE ENTRY POINT INTO A TERMINATING CYCLE
Finding a cycle in a linked list is a classic algorithm. Let A
be the list. Let p1 and p2 be two pointers into A. Initialize
each to point to the start of the list. In each iteration of a
loop advance p1 by one node and p2 by two nodes; escape from the
loop when p1 and p2 both point to the same node. Call the node
x; it is a node within the list cycle.
The next step is to find the length of the cycle. In each
iteration of a second loop advance p1 by one node until again
points to x. The number of iterations is the length of the
cycle. Let k be the length of the cycle.
Now initialize p1 and p2 to the start of the list. Advance p2 by
k nodes. In a loop advance p1 and p2 by one node in each
iteration; escape from the loop when p1 and p2 point to the same
node. Call the node y. Then y is the entry point into the list
cycle.
If it is not known whether the lists are terminated by a cycle or
by a null node, we can add a test in the first loop to see if p2
detects a null node.
III. AN ALGORITHM BASED ON CYCLE DETECTION
When there is a terminating cycle there are three possibilities -
(a) A and B merge before they reach the cycle, (b) they merge at
a common entry point, or (c) they enter at different points in
the cycle. To distinguish these cases we need to find the entry
points for A and B. If the entry points are the same treat the
common entry point as a list terminator and apply the simple
algorithm in part I. If they differ we have case (c) - either
entry point is a legitimate answer.
If the two lists are null terminated the cycle detection
algorithm will discover the fact; the basic algorithm of part I
can be used.
In case (c) a complete implementation checks that the two lists
actually share a common cycle. Clearly they don't if the two
cycle lengths are different. If they are the same commonality
can be done by checking that the entry point of A can be reached
from the entry point of B.
This algorithm is O(n) but it is cumbersome and relatively
expensive. However,properly implemented, it is safe against
cycles and against failure to merge.
IV. A METHOD THAT DOESN'T REQUIRE FINDING THE TERMINATOR
The basic algorithm doesn't actually require knowing the list
length; what it needs is the distance between two nodes that
match. Call that distance m. It turns out that m can be found
without searching to the ends of the lists.
To see how that this works let us first suppose that we know a
bound M for m, and that A is the longer list. We compare
the first element of B with the first M elements of A. If we
find a match we've found m (and the merge point.) If no match is
found we go to node M+1 in B and check it against nodes M+1
through 2*M in A; we keep advancing by M nodes until a match is
found. A match will be found when the merge point is passed.
The difference in positions of the match in A and B yields m.
Unfortunately we don't have a bound M for m nor do we know which
list is longer. However we can use M=1 in the first iteration,
M=2 in the second iteration, and so on. We solve the problem of
not knowing which list is longer by alternating which list is the
lead list. With a bit of bookkeeping we get the following
algorithm:
Traverse A by one step. Traverse B by three steps, checking to
see if it there is a match with the last element read from A.
Traverse A by five more steps, checking for a match with the last
element read from B. Continue, alternating lists and increasing
the steps by two each time. When a match is found this way we
know the difference of distances to the merge point, m. This
gives us an O(m^2 +D) method where D is the common distance to
the merge point.
If the lists are null terminated it is possible that one of the
lists is exhausted before the a match is found. In that case we
revert to algorithm I. If the lists are terminated by a common
cycle algorithm IV will find one of the entry points. Algorithm
IV will fail if the two lists are disjoint and are terminated by
separate cycles.
FINAL REMARKS
This problem is more a puzzle than a serious programming problem.
A 'correct' solution depends on the objective and upon context.
If the lists are known to be null terminated algorithm III is not
needed. In principle algorithm IV is more efficient than
algorithm I; however it is more complex and in practice
simplicity may trump computational efficiency. If the lists are
known to be cycle terminated algorithm I will fail. Algorithm
III is guaranteed to work but is cumbersome; algorithm IV will be
considerably more efficient, but will fail if the lists are
disjoint.
Considered as a puzzle, algorithm I is the 'correct' solution.
It is simple and elegant.
No comments:
Post a Comment