1: \documentclass[11pt]{article}
2: \usepackage{url}
3: \usepackage{epsf}
4: \usepackage{epsfig}
5: \usepackage{graphicx}
6: \usepackage{amsfonts}
7: \usepackage{amsmath,amssymb}
8: \usepackage{latexsym}
9:
10: \addtolength{\textwidth}{0.2in}
11: \addtolength{\evensidemargin}{-0.1in}
12: \addtolength{\oddsidemargin}{-0.1in}
13: %\addtolength{\textheight}{1in}
14: %\addtolength{\topmargin}{-0.5in}
15:
16: \newcommand{\ABox}{
17: \raisebox{3pt}{\framebox[6pt]{\rule{6pt}{0pt}}}
18: }
19: \newenvironment{proof}{{\bf Proof:}}{\hfill\ABox}
20:
21: \newtheorem{theorem}{{\bf Theorem}}
22: \newtheorem{corollary}[theorem]{Corollary}
23: \newtheorem{lemma}[theorem]{Lemma}
24: \newtheorem{claim}[theorem]{Claim}
25: \newtheorem{proposition}[theorem]{Proposition}
26: \newtheorem{conjecture}[theorem]{Conjecture}
27: \newtheorem{definition}[theorem]{Definition}
28: \newtheorem{openquestion}[theorem]{Open question}
29:
30: \newcommand{\R}{\mathcal R}
31: \newcommand{\A}{\mathcal A}
32: \newcommand{\dsd}{\tt DSD}
33: \newcommand{\cost}{\tt cost}
34:
35: \begin{document}
36:
\title{An $O(n \log n)$-Time Algorithm for the Restriction Scaffold Assignment Problem}
38: \author{
39: Justin Colannino
40: \and Mirela Damian
41: %\and Erik Demaine
42: \and Ferran Hurtado
43: \and John Iacono
44: %\and Stefan Langerman
45: \and Henk Meijer
46: %\and Diane Souvaine
47: \and Suneeta Ramaswami
48: \and Godfried Toussaint}
49:
50: \date{}
51:
52: \maketitle
53:
54: \begin{abstract}
The {\em assignment} problem takes as input two finite point sets
$S$ and $T$ and establishes a correspondence between points in $S$
and points in $T$, such that each point in $S$ maps to exactly
one point in $T$, and each point in $T$ maps to at least one
point in $S$. In this paper we show that a minimum cost assignment,
with costs measured by the $L_1$ distance, can be computed in
$O(n \log n)$ time, where $n = |S| + |T|$, provided that the points in
$S$ and $T$ are restricted to lie on a line
(linear time, if $S$ and $T$ are presorted).
63: \end{abstract}
64:
65: \section{Introduction}
66: Consider two finite sets of points $S$ and $T$ with total
67: cardinality $n$.
68: The objective of the {\em assignment} problem is to establish a
69: correspondence between the points in $S$ and the points in $T$,
70: such that each point in $S$ corresponds to exactly one point in
71: $T$, and each point in $T$ corresponds to at least one point
72: in $S$.
The quality of this correspondence is measured by a cost function $\delta$ that
assigns a cost $\delta(s, t)$ to each assigned pair $(s, t)$.
75: The cost of an assignment is the sum of the costs of all assigned
76: pairs. The goal of the assignment problem is to find an assignment
77: of minimum cost.
78:
79: The general assignment problem is also known as the
80: {\em many-to-one assignment} problem.
The {\em one-to-one} version of the assignment problem requires
that each point in $S$ maps to exactly one point in $T$ and
each point in $T$ is mapped to by exactly one point in $S$.
84: %Such an assignment is undefined when $|S| \neq |T|$.
85: Throughout the paper, whenever we talk about the assignment
86: problem, we refer to the many-to-one version of the problem.
87:
88: The simplest version of the assignment problem assumes
89: that the points in $S$ and $T$ lie on a line and the cost function
90: is the $L_1$ metric.
91: In this setting, the one-to-one assignment problem
92: has a simple $O(n \log n)$ time solution when $|S| = |T|$:
93: first sort the points in $O(n \log n)$ time, then map the $k^{th}$ point
94: in $S$ to the $k^{th}$ point in $T$ in $O(n)$ time
95: \cite{bib:toussaintsimilarity}.
96: However, the situation $|S| < |T|$ arises in many practical
97: applications.
98: This situation was first addressed by Karp and Li~\cite{KL75},
99: who provided an $O(n \log n)$ time algorithm for the one-to-one
100: assignment problem ($O(n)$ time, if $S$ and $T$ are given in
101: sorted order).
Simpler and equally efficient solutions were later provided
in~\cite{ABKKS95, BY98, WPMK86}.
104:
Eiter and Mannila~\cite{bib:eiter97distance} studied the assignment
problem in the context of measuring the distance between two
theories expressed in a logical language. They showed that for points
in arbitrary dimensions, this problem has a polynomial time solution.
When restricted to points on a line, a minimum cost assignment
can be used to measure the similarity between musical
rhythms. In this context, Toussaint~\cite{T03} proposed the use
112: of the {\em directed swap distance} as a similarity measure.
113: If the onsets of a rhythm are represented as points on a line
114: separated by ``silence''
115: intervals, the directed swap distance between two rhythms with onset
116: sets $S$ and $T$ is precisely the cost of an optimal
117: assignment between $S$ and $T$, with underlying cost function $L_1$.
118:
The assignment problem also appears in the form of the
{\em restriction scaffold assignment} problem in computational
biology~\cite{BKSS03}.
The goal here is to establish a correspondence between sparse
experimental data and a restricted set of known structural
building blocks. Ben-Dor et al.~\cite{BKSS03} model the
restriction scaffold assignment as an assignment problem
for points on a line, and provide an $O(n \log n)$ time algorithm to
solve this problem. However, as later shown by Colannino and
Toussaint~\cite{CT05}, this algorithm does not always produce
a minimum cost assignment. Thus, the best previously known
solution to the assignment problem in one dimension
is the $O(n^2)$ algorithm presented in~\cite{CT05}.
132:
133: In this paper, we show that the assignment problem
134: with underlying cost function $L_1$ in one dimension
135: can be solved in $O(n \log n)$ time
136: ($O(n)$ if the points in $S$ and $T$ are given in sorted order).
Our algorithm is a simple extension of the $O(n \log n)$ time
algorithm of Karp and Li~\cite{KL75} which, assuming $|S| > |T|$,
finds a minimum cost {\em one-to-one} assignment between $T$ and a
subset $S' \subset S$ of size $|T|$, minimizing over all such subsets.
142: We present our algorithm in Section~\ref{sec:many-one},
143: after a few preliminary results (Section~\ref{sec:preliminaries})
144: and a close look at some properties of an optimal solution
145: (Section~\ref{sec:properties}).
146:
147: \section{Background}
148: \label{sec:definition}
149: Let $S = \{s_0, s_1, s_2, \ldots\}$ and $T = \{t_0, t_1, t_2,
150: \ldots\}$ be two finite sets of points that lie on a horizontal line,
151: with $|S|+|T|=n$ and $|S| > |T|$. For any $s \in S$ and $t \in T$,
152: the cost $\delta(s, t)$ of an assigned pair $(s, t)$ is the absolute value
153: of the difference between the $x$-coordinates of $s$ and $t$. To avoid
154: overloading the notation, we use the same symbol for a point and its
$x$-coordinate. Thus, $\delta(s, t) = |s - t|$. We assume that
$s_i < s_{i+1}$ for $0\leq i < |S|-1$ and $t_j < t_{j+1}$ for $0\leq j < |T|-1$.
157:
158: An assignment $\A$ between $S$ and $T$
159: consists of pairs of points $(s, t)$ (henceforth {\em edges}),
160: with $s \in S$ and $t \in T$, such that each point in $S$ belongs to
161: exactly one edge in $\A$, and each point in $T$ belongs to at least
162: one edge in $\A$. The cost of $\A$ is
163: \[ cost(\A) = \sum_{(s, t) \in \A} \delta(s, t) \]
164: Our goal is to find an assignment $\A$ of minimum cost.
If two points in $S \cup T$ have the same $x$-coordinate, we can
slightly shift one of them to the left or right. Since there are
finitely many assignments and the cost of each one changes by at most
$|S|$ times the size of the shift, a sufficiently small shift ensures
that every minimum cost assignment of the perturbed point set is also a
minimum cost assignment of the original point set. So we may
assume without loss of generality that all points in $S \cup T$ are
distinct.
173:
174: \subsection{Preliminaries}
175: \label{sec:preliminaries}
176: %
177: For any $s \in S$ and $t \in T$, the value $|s - t|$
178: can be expressed in a different way as follows.
179: Define a function $f_{s,t}$ to be $1$ in the interval
180: between $s$ and $t$ and $0$ at any other
181: point (see Figure~\ref{fig:intcost}).
182: Then $|s - t| = \int_{-\infty}^{+\infty} f_{s,t}(x) dx$.
183: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure Begin
184: \begin{figure}[htbp]
185: \centering
186: \includegraphics[width=0.34\linewidth]{Figures/int.cost.eps}
187: \caption{Function $f_{s,t}$. Shaded area represents the
188: cost $|s-t|$.}
189: \label{fig:intcost}
190: \end{figure}
191: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure End
192:
193: \noindent
194: The cost of an assignment $\A$ is therefore
195: \[
196: cost(\A) = \sum_{(s,t)\in \A} \int_{-\infty}^{+\infty} f_{s,t}(x) dx
197: = \int_{-\infty}^{+\infty} \sum_{(s,t) \in \A} f_{s,t}(x) dx
198: \]
199: If we define
200: \[
201: f_{\A}(x) = \sum_{(s,t) \in \A} f_{s,t}(x)
202: \]
203: then the value $f_{\A}(a)$ is simply the number of
204: edges in $\A$ pierced by the vertical line $x = a$,
205: and the cost of $\A$ is
206:
207: \begin{equation}
208: cost(\A) = \int_{-\infty}^{+\infty} f_{\A}(x) dx
209: \label{eq:cost}
210: \end{equation}
211:
212: Our definition of $f_{\A}$ is similar in nature to the {\em height}
213: function $H:\mathbb{R}\rightarrow\mathbb{Z}$ introduced by Karp and Li
214: \cite{KL75}. Informally, they define $H(a)$ at each point $a$ as the
215: difference between the number of points in $S$ and the number of
points in $T$ restricted to the interval $(-\infty, a]$ (or
equivalently, on or to the left of the vertical line $x = a$). Thus $H$
218: remains constant throughout each interval that does not contain a
219: point in $S \cup T$. Figure~\ref{fig:height} shows the stair-shaped
220: curve of $H$ for a small example.
221: %$S = \{0, 4, 6, 13, 14, 16\}$
222: %and $T = \{1, 2, 8, 10, 11, 12\}$.
223: Note that {\em up} transitions in the curve correspond to
224: points in $S$ and {\em down} transitions correspond
225: to points in $T$. We refer to the value $H(x)$ as the {\em height} of
226: $x$. Note that $H(\infty) = |S|-|T|$.
227:
228: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure Begin
229: \begin{figure}[htbp]
230: \centering
231: \includegraphics[width=0.6\linewidth]{Figures/height.eps}
232: \caption{Height function for sets
233: $S = \{0, 3, 4, 6, 13, 14, 15, 16\}$
234: and $T = \{1, 2, 8, 10, 11, 12\}$.}
235: \label{fig:height}
236: \end{figure}
237: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure End
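To make the definition of $H$ concrete, the following Python sketch
(an illustration only; the function name \texttt{heights} is ours and
not part of the paper) evaluates $H$ at every point of $S \cup T$ by a
single left-to-right sweep over the sets of Figure~\ref{fig:height},
assuming all points are distinct.

```python
def heights(S, T):
    """Map each point x of S or T to H(x): the number of points of S in
    (-inf, x] minus the number of points of T in (-inf, x].
    Assumes all points are distinct."""
    H, h = {}, 0
    for x, step in sorted([(s, 1) for s in S] + [(t, -1) for t in T]):
        h += step                  # up-step at a point of S, down-step at a point of T
        H[x] = h
    return H

S = [0, 3, 4, 6, 13, 14, 15, 16]
T = [1, 2, 8, 10, 11, 12]
H = heights(S, T)
print([H[s] for s in S])           # heights of the points of S
print(H[max(S + T)])               # equals |S| - |T| to the right of all points
```

The rightmost value printed is $|S| - |T| = 2$, in agreement with $H(\infty) = |S|-|T|$.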
238:
239:
240: \begin{lemma}
241: If $|S| = |T|$, then $\int_{-\infty}^{+\infty} |H(x)|~dx$
242: is the cost of the assignment that assigns the
243: $k^{th}$ largest element of $S$ to the $k^{th}$
244: largest element of $T$.
245: \label{lem:one}
246: \end{lemma}
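\begin{proof}
Index the points of $S$ and of $T$ in increasing order; the assignment
in question joins the $k^{th}$ point of $S$ to the $k^{th}$ point of $T$.
Fix $x$, and let $a$ and $b$ be the numbers of points of $S$ and of $T$,
respectively, in $(-\infty, x]$, so that $H(x) = a - b$. The $k^{th}$ edge
is pierced by the vertical line at $x$ exactly when one of its endpoints
lies in $(-\infty, x]$ and the other does not, that is, when
$b < k \le a$ or $a < k \le b$. Hence $f_{\A}(x) = |a - b| = |H(x)|$
for this particular assignment, and the lemma follows
from~(\ref{eq:cost}).
\end{proof}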
252:
253: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure Begin
254: \begin{figure}[htbp]
255: \centering
256: \begin{tabular}{cc}
257: \includegraphics[width=0.6\linewidth]{Figures/equal.assignment.eps} &
258: \raisebox{7ex}{(a)} \\
259: & \\
260: \includegraphics[width=0.6\linewidth]{Figures/equal.cost.eps} &
261: \raisebox{7ex}{(b)}
262: \end{tabular}
263: \caption{
264: (a) One-to-one assignment for sets
265: $S = \{0, 4, 6, 13, 14, 16\}$ and
266: $T = \{1, 2, 8, 10, 11, 12\}$
267: (b) Shaded area represents the cost of the assignment.}
268: \label{fig:equal.assign}
269: \end{figure}
270: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure End
271:
272: \noindent
273: Figure~\ref{fig:equal.assign}a shows an assignment
274: for two sets $S$ and $T$, with $|S| = |T|$.
275: The cost of this assignment is equal
276: to the area shaded in Figure~\ref{fig:equal.assign}b,
277: which is precisely the value of the integral
278: $\int_{-\infty}^{+\infty} |H(x)|~dx$.
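The following sketch checks Lemma~\ref{lem:one} numerically on the sets of
Figure~\ref{fig:equal.assign}. The helper names are hypothetical and
Python is used only for illustration: one function computes the cost of
the sorted one-to-one assignment, the other computes
$\int_{-\infty}^{+\infty} |H(x)|~dx$ as a sum over the intervals between
consecutive points, on which $H$ is constant.

```python
def sorted_matching_cost(S, T):
    """Cost of the one-to-one assignment pairing the k-th smallest points."""
    assert len(S) == len(T)
    return sum(abs(s - t) for s, t in zip(sorted(S), sorted(T)))

def abs_height_area(S, T):
    """Integral of |H(x)| dx, summed over the intervals between consecutive points.

    H is 0 left of all points and, when |S| = |T|, also right of all points,
    so only the bounded intervals contribute.
    """
    events = sorted([(s, 1) for s in S] + [(t, -1) for t in T])
    area, h = 0, 0
    for (x, step), (x_next, _) in zip(events, events[1:]):
        h += step                  # value of H on [x, x_next)
        area += abs(h) * (x_next - x)
    return area

S = [0, 4, 6, 13, 14, 16]
T = [1, 2, 8, 10, 11, 12]
print(sorted_matching_cost(S, T), abs_height_area(S, T))
```

Both quantities evaluate to 15 for this example.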
279:
280: \section{Properties of a Minimum Cost Assignment}
281: \label{sec:properties}
282: Our algorithm for computing a minimum cost assignment $\A$
283: exploits several important properties of $\A$, which
284: we discuss next.
285: A {\em crossing} is defined by a pair of
286: edges $(a, d)$ and $(b, c)$ such that $a < b$ in $S$
287: and $c < d$ in $T$.
288:
289: \begin{lemma}
290: There exists a minimum cost assignment with no crossings.
291: \label{lem:monotone}
292: \end{lemma}
293: \begin{proof}
294: Let $\A$ be a minimum cost assignment between $S$ and $T$
295: with a minimum number of crossings. If $\A$ has zero
296: crossings, the proof is finished. Otherwise, pick
297: two crossing edges $(a, d)$ and $(b, c)$ in $\A$,
298: with $a < b$ in $S$ and $c < d$ in $T$.
We show that $\A' = \A \setminus \{(a, d), (b, c)\}
\cup \{(a, c), (b, d)\}$ is an assignment with
$cost(\A') \le cost(\A)$ and fewer crossings than $\A$, contradicting
the choice of $\A$.
Clearly $\A'$ is an assignment: each point of $S$ still belongs to
exactly one edge, and $c$ and $d$ are still covered.
Viewing $\A$ as the sequence of points of $T$ assigned to the points of
$S$ in increasing order, a crossing is an inversion in this sequence, and
$\A'$ is obtained by swapping the two entries of an inverted pair, which
strictly decreases the number of inversions.
To compare the costs, we show that
$f_{\A'}(x) \le f_{\A}(x)$ at each point $x$; then
$cost(\A') \le cost(\A)$ follows immediately
from~(\ref{eq:cost}).

Since all other edges are common to $\A$ and $\A'$, it suffices to show
that the vertical line $L$ at $x$ intersects at most as many of
$(a,c)$ and $(b,d)$ as of $(a,d)$ and $(b,c)$.
Suppose that $L$ intersects $(a,c)$. Then $L$ must
also intersect either $(a, d)$ (see Figure~\ref{fig:crossing}a)
or $(b, c)$ (see Figure~\ref{fig:crossing}b) or both
(see Figure~\ref{fig:crossing}c).
Similarly, if $L$ intersects $(b,d)$, then $L$ also intersects
at least one of $(a, d)$ and $(b, c)$.
Furthermore, if $L$ intersects
both $(a, c)$ and $(b, d)$, then $L$ also intersects
both $(a, d)$ and $(b, c)$ (see Figure~\ref{fig:crossing}c).
It follows that $f_{\A'}(x) \le f_{\A}(x)$.
320: \end{proof}
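As a sanity check of Lemma~\ref{lem:monotone} (not a substitute for the
proof), the sketch below enumerates every many-to-one assignment of a tiny
instance. It assumes the points of $S$ are listed in increasing order, so
that a crossing is an inversion in the sequence of assigned points of $T$.
The function names are ours, and the exhaustive search takes exponential
time.

```python
from itertools import product

def brute_force_assignments(S, T):
    """Yield (cost, mapping) for every many-to-one assignment from S to T.

    mapping[i] is the point of T assigned to the i-th smallest point of S;
    a mapping is an assignment only if it covers every point of T.
    """
    S, T = sorted(S), sorted(T)
    for mapping in product(T, repeat=len(S)):
        if set(mapping) == set(T):
            yield sum(abs(s - t) for s, t in zip(S, mapping)), mapping

def has_crossing(mapping):
    # With S sorted, a crossing is a pair i < j with mapping[i] > mapping[j].
    return any(mapping[i] > mapping[j]
               for i in range(len(mapping)) for j in range(i + 1, len(mapping)))

S, T = [0, 3, 4, 6, 13], [1, 2, 8]
best = min(cost for cost, _ in brute_force_assignments(S, T))
crossing_free = [m for c, m in brute_force_assignments(S, T)
                 if c == best and not has_crossing(m)]
print(best, crossing_free)
```

On this instance the minimum cost is 11, attained by the crossing-free
assignment that maps $0, 3, 4, 6, 13$ to $1, 2, 2, 8, 8$.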
321:
322: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure Begin
323: \begin{figure}[htbp]
324: \centering
325: \begin{tabular}{c@{\hspace{0.06\linewidth}}
326: c@{\hspace{0.06\linewidth}}c}
327: \includegraphics[width=0.22\linewidth]{Figures/crossing1.eps} &
328: \includegraphics[width=0.25\linewidth]{Figures/crossing2.eps} &
329: \includegraphics[width=0.24\linewidth]{Figures/crossing3.eps} \\
330: (a) & (b) & (c)
331: \end{tabular}
332: \caption{
333: (a) Vertical line $L$ intersects $(a,c)$ and $(a,d)$
334: (b) $L$ intersects $(a, c)$ and $(b,c)$
335: (c) $L$ intersects
336: $(a,c)$, $(b,d)$, $(a,d)$ and $(b,c)$.}
337: \label{fig:crossing}
338: \end{figure}
339: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure End
340:
341:
342: %In an assignment $\A$, each point
343: %$s \in S$ belongs to exactly one edge in $\A$ and
344: %each $t\in T$ belongs to at least one edge in $\A$.
345: %Therefore, $\A$ can be associated with an
346: %assignment (function) $\A : S \to T$ such that
347: %$\A(s) = t$ for each $(s, t) \in \A$.
348:
349: An assignment $\A$ can also be regarded as a function
350: $\A : S \to T$ such that $\A(s) = t$ for each
351: $(s, t) \in \A$.
352: For any $t \in T$, let $\A^{-1}(t)$ denote the set of
353: elements $s \in S$ such that $\A(s) = t$.
354: For each point $s \in S$, define the
{\em nearest neighbor} $N(s)$ to be the point in
$T$ closest to $s$, i.e., $|N(s) - s|\le |t - s|$
357: for any $t\in T$. In the case of a tie, $N(s)$ is
358: arbitrarily picked from among the two candidate
359: neighbors.
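Since the points are sorted, nearest neighbors can be found by a
merge-style sweep. The sketch below shows one possible way to do this
(our choice; the paper does not prescribe a method, and the name
\texttt{nearest\_neighbors} is hypothetical). It breaks ties toward the
smaller point of $T$, which is one admissible arbitrary choice.

```python
def nearest_neighbors(S, T):
    """Return {s: N(s)} for sorted lists S and T (T nonempty).

    The index j only moves right, so the sweep takes O(|S| + |T|) time.
    Ties are broken toward the smaller point of T.
    """
    N, j = {}, 0
    for s in S:
        # Advance j to the last point of T that is <= s, if there is one.
        while j + 1 < len(T) and T[j + 1] <= s:
            j += 1
        best = T[j]
        # The only other candidate is the next point of T to the right.
        if j + 1 < len(T) and abs(T[j + 1] - s) < abs(best - s):
            best = T[j + 1]
        N[s] = best
    return N

S = [0, 3, 4, 6, 13, 14, 15, 16]
T = [1, 2, 8, 10, 11, 12]
print(nearest_neighbors(S, T))
```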
360:
361: \begin{lemma}
362: Let $\A$ be optimal and let $t \in T$ be such that
363: $\A^{-1}(t)$ contains two or more elements.
364: Then for each $s \in \A^{-1}(t)$,
365: $t$ is a nearest neighbor of $s$. Furthermore,
366: $T$ contains no points in between $s$ and $t$.
367: \label{lem:neighbor}
368: \end{lemma}
369: \begin{proof}
Assume to the contrary that there is $s \in S$ with
$\A(s) = t$ and $|\A^{-1}(t)| > 1$ such that $t$ is not a nearest
neighbor of $s$, i.e., $|s - N(s)| < |s - t|$. Refer to
Figure~\ref{fig:neighbor}.
Define a new assignment $\A'$ with $\A'(s) = N(s)$ and
$\A'(x) = \A(x)$ for $x \neq s$. Note that $\A'$
is also an assignment: since $|\A^{-1}(t)| > 1$,
the set $\A'^{-1}(t)$ still contains at least one point.
Also $cost(\A') = cost(\A) - |s-t| + |s-N(s)|$ (see
Figures~\ref{fig:neighbor}a and~\ref{fig:neighbor}b).
379: %% at any other point.
380: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure Begin
381: \begin{figure}[htbp]
382: \centering
383: \begin{tabular}{c@{\hspace{1in}}c}
384: \includegraphics[width=0.3\linewidth]{Figures/neighbor.not.eps} &
385: \includegraphics[width=0.3\linewidth]{Figures/neighbor} \\
386: (a) & (b)
387: \end{tabular}
388: \caption{
389: (a) Assignment $\A$ with $\A(s)\neq N(s)$
390: (b) Assignment $\A'$ with $\A'(s) = N(s)$}
391: \label{fig:neighbor}
392: \end{figure}
393: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure End
Since $|s-N(s)| < |s-t|$, it follows that
$cost(\A') < cost(\A)$,
contradicting the fact that $\A$ is of minimum cost.
Thus, $t$ is a nearest neighbor of $s$.

The claim that $T$ contains no points in between
$s$ and $t$ is immediate: if such a point $t_1\in T$ existed,
then $|s-t_1| < |s-t|$,
contradicting the fact that $t$ is a nearest neighbor of $s$.
403: \end{proof}
404:
405: \medskip
406: \noindent
Observe that for any subset $R \subset S$ of size $|R| = |S|-|T|$,
there is a unique one-to-one assignment with no crossings
from $S\setminus R$ to $T$, namely the one that maps the $k^{th}$
smallest element of $S\setminus R$ to the $k^{th}$ smallest element of
$T$; it is a minimum cost one-to-one assignment from $S\setminus R$ to $T$.
Let $\A_{S\setminus R}$ denote the
edges of this assignment, and define a new
assignment $\A_R: S\rightarrow T$ as follows:
412: \begin{equation}
413: \A_R(x) =
414: \begin{cases} N(x) & \text{if $x\in R$,}\\
415: y & \text{if $x\in S\setminus R$ and $(x,y)\in \A_{S\setminus R}$}
416: \end{cases}
417: \label{eq:ar}
418: \end{equation}
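Definition~(\ref{eq:ar}) translates directly into code. In the sketch
below, which assumes $R \subset S$ with $|R| = |S|-|T|$ and uses the
hypothetical name \texttt{assignment\_from\_R}, points of $R$ go to a
nearest neighbor and $S \setminus R$ is matched to $T$ in sorted order.

```python
def assignment_from_R(S, T, R):
    """Return A_R as a dict mapping each point of S to a point of T.

    Points of R go to a nearest neighbor in T (ties toward the smaller point,
    since min() keeps the first minimum of sorted T); the other |T| points of
    S are matched to T in sorted order, the crossing-free one-to-one assignment.
    """
    S, T, R = sorted(S), sorted(T), set(R)
    rest = [s for s in S if s not in R]
    assert R <= set(S) and len(rest) == len(T), "need R in S and |R| = |S| - |T|"
    A = {r: min(T, key=lambda t: abs(t - r)) for r in R}
    A.update(zip(rest, T))
    return A

S = [0, 3, 4, 6, 13, 14, 15, 16]
T = [1, 2, 8, 10, 11, 12]
A = assignment_from_R(S, T, {4, 16})
print(A, sum(abs(s - t) for s, t in A.items()))
```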
419:
420: \vspace*{0.1in}
\noindent Lemma~\ref{lem:neighbor} implies that there always exists
a subset $R$ such that $\A_R$ defines a minimum cost assignment
from $S$ to $T$. Furthermore, $R$ can be chosen to satisfy a special
height condition, stated in the lemma below.
425:
426: \begin{lemma}
427: %% There exists a minimum cost assignment $\A_R$ with the property that
428: %% the $k^{th}$ smallest element of $R$ has height $k$.
429: There exists a subset $R\subset S$ with $|R|=|S|-|T|$ such that $\A_R$
430: defines a minimum cost assignment from $S$ to $T$, and the
431: $k^{th}$ smallest element of $R$ has height $k$.
432: \label{lem:height}
433: \end{lemma}
434:
435: \begin{proof}
Let $\A:S\rightarrow T$ define a minimum cost assignment with no
crossings, which exists by Lemma~\ref{lem:monotone}. We prove the
existence of $\A_R$ by constructing a set $R \subset S$
with the properties stated in this lemma. Initially $R$ is empty.
If $|\A^{-1}(t)| = 1$ for all $t\in T$, then $R$ is empty and
the proof is finished.
441: Otherwise, we process points $t \in T$ for
442: which $\A^{-1}(t)$ has two or more elements.
443: For each such point we consider two cases,
444: as depicted in Figure~\ref{fig:lemma4cases}.
445: If all points in $\A^{-1}(t)$ are less than $t$, then we
446: add to $R$ all but the largest (rightmost) point in $\A^{-1}(t)$
447: (see Figure~\ref{fig:lemma4cases}a).
448: Otherwise, we add to $R$ all points in $\A^{-1}(t)$
449: except for the smallest (leftmost) point greater than $t$
450: (see Figure~\ref{fig:lemma4cases}b).
451:
452: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure Begin
453: \begin{figure}[htbp]
454: \centering
455: \includegraphics[width=0.7\linewidth]{Figures/lemma4cases.eps}
456: \caption{
457: (a) All points in $\A^{-1}(t)$ are less than $t$.
458: (b) Some points in $\A^{-1}(t)$ are greater than $t$.}
459: \label{fig:lemma4cases}
460: \end{figure}
461: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure End
462:
463: %% Note that the cost of $\A_{R}$ is minimum, as the assignment has
464: %% not changed.
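We now define $\A_R$ as in~(\ref{eq:ar}). By construction, $\A$ maps
the points of $S \setminus R$ to distinct points of $T$, so the
restriction of $\A$ to $S\setminus R$ is a one-to-one assignment with no
crossings, which is exactly $\A_{S\setminus R}$. Each $r \in R$ is mapped
by $\A$ to a point $t$ with $|\A^{-1}(t)| \ge 2$, which is a nearest
neighbor of $r$ by Lemma~\ref{lem:neighbor}, so
$|r - N(r)| = |r - \A(r)|$. Hence $cost(\A_R) = cost(\A)$, and $\A_R$
is a minimum cost many-to-one assignment from $S$ to $T$.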
468:
469: It remains to show that
470: the $k^{th}$ smallest element of $R$ has
471: height $k$. To see this, first consider
472: the smallest element of a nonempty set $\A^{-1}(t)\cap R$. Call this
473: element $r$ and suppose it
474: is the $k^{th}$ smallest element of $R$. It follows then that (i)
475: $R$ contains $k-1$ points less than $r$, and
476: (ii) $T$ and $S \setminus R$ contain an equal number of
477: elements less than $r$. This latter claim follows from
478: Lemma~\ref{lem:neighbor}, which tells us that $T$
479: contains no elements in between $r$ and $t$, and the following
observation: the way in which we have selected $R$
ensures that if $t$ lies to the left of $r$ (i.e., $t<r$), the
point of $S\setminus R$ assigned to $t$ lies to the
left of $r$, and if $t$ lies to the right of $r$ ($t>r$), the point
of $S\setminus R$ assigned to $t$ lies to the right of $r$.
Counting the points of $S \cup T$ in $(-\infty, r]$, facts (i) and (ii),
together with $r$ itself, give $H(r) = (k-1) + 1 = k$.
486:
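We now show that the remaining points of $R \cap \A^{-1}(t)$
have height values $k+1, k+2, \ldots$, in order from
smallest to largest.
By Lemma~\ref{lem:neighbor}, $T$ contains no points in between
$s$ and $t$, for each $s \in \A^{-1}(t)$; and since $\A$ has no
crossings, every point of $S$ lying between two points of $\A^{-1}(t)$
also belongs to $\A^{-1}(t)$. Hence, as we sweep rightward from $r$
through the remaining points of $\A^{-1}(t)$, $H$ increases by one at
each of them. If $t$ lies among them, $H$ decreases by one at $t$, but in
that case the point of $\A^{-1}(t)$ immediately to the right of $t$ is
the one excluded from $R$, and it restores the height. Thus
the points in $R \cap \A^{-1}(t)$ have consecutive, increasing
height values. It follows that the height of
the $k^{th}$ smallest element of $R$ is $k$.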
495: \end{proof}
496:
497: \medskip
498: \noindent
499: Let $H_R$ represent the height function restricted to
500: sets $S \setminus R$ and $T$. This means that for each $x$,
501: $H_R(x)$ is the
502: %% absolute value of the
503: difference between the number of points in $S \setminus R$ and the
504: number of points in $T$ restricted to the interval $(-\infty, x]$.
505:
506: \begin{lemma}
507: The cost of
508: %% an optimal
509: assignment $\A_R$ is
510: \begin{equation}
511: \sum_{r \in R} |r - N(r)| + \int_{-\infty}^{+\infty} |H_R(x)|dx
512: \label{eq:removecost}
513: \end{equation}
514: \label{lem:optcost}
515: \end{lemma}
516: \begin{proof}
By Lemma~\ref{lem:one}, applied to the sets $S \setminus R$ and $T$
(which have equal size), the contribution of
$S \setminus R$ to the cost of $\A_R$ is
$\int_{-\infty}^{+\infty} |H_R(x)|dx$.
Since each point in $R$ maps to its nearest neighbor, the
contribution of $R$ to the cost of $\A_R$ is
$\sum_{r \in R} |r - N(r)|$. Adding the two contributions proves the
lemma.
524: \end{proof}
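The following sketch checks~(\ref{eq:removecost}) numerically. For a few
subsets $R$ (which need not satisfy any height condition, since
Lemma~\ref{lem:optcost} holds for every $R$ of size $|S|-|T|$), it
compares the formula with the cost of $\A_R$ computed edge by edge.
Function names are illustrative only.

```python
def cost_via_lemma(S, T, R):
    """Sum of |r - N(r)| over R plus the integral of |H_R(x)| dx."""
    R = set(R)
    nn_part = sum(min(abs(r - t) for t in T) for r in R)
    # H_R is the height function of S minus R and T; it is constant between
    # consecutive points and zero outside them, since |S minus R| = |T|.
    events = sorted([(s, 1) for s in S if s not in R] + [(t, -1) for t in T])
    area, h = 0, 0
    for (x, step), (x_next, _) in zip(events, events[1:]):
        h += step
        area += abs(h) * (x_next - x)
    return nn_part + area

def cost_direct(S, T, R):
    """Cost of A_R edge by edge: R to nearest neighbors, the rest in sorted order."""
    R = set(R)
    rest = sorted(s for s in S if s not in R)
    return (sum(min(abs(r - t) for t in T) for r in R)
            + sum(abs(s - t) for s, t in zip(rest, sorted(T))))

S = [0, 3, 4, 6, 13, 14, 15, 16]
T = [1, 2, 8, 10, 11, 12]
for R in ({4, 16}, {0, 6}, {3, 13}):
    print(sorted(R), cost_via_lemma(S, T, R), cost_direct(S, T, R))
```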
525:
526: \begin{theorem}
527: Let $R \subset S$ be a subset of size $|R| = |S| - |T|$ with
528: two properties:
529: \begin{enumerate}
530: \item[i.] The $k^{th}$ smallest element of $R$ has height $k$.
\item[ii.] $R$ minimizes the quantity from (\ref{eq:removecost}) among all subsets of $S$ of size $|S|-|T|$ that satisfy {\em (i)}.
532: \end{enumerate}
533: Then $\A_R$ defines a minimum cost assignment from $S$ to
534: $T$.
535: %%assignment.
536: \label{thm:properties}
537: \end{theorem}
538: \begin{proof}
539: %% Use Lemma~\ref{lem:height} to justify (1). Use
540: %% Lemma~\ref{lem:optcost} to justify (2).
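By Lemma~\ref{lem:height}, there exists a set $R^*$ that
satisfies {\em (i)} and for which $\A_{R^*}$ is a minimum cost
assignment. By Lemma~\ref{lem:optcost}, for every $R$ the cost of
$\A_R$ equals the quantity~(\ref{eq:removecost}). Since $R$ satisfies
{\em (ii)}, we have $cost(\A_R) \le cost(\A_{R^*})$, so $\A_R$ is a
minimum cost assignment from $S$ to $T$.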
545: \end{proof}
546:
547: \section{Computing a Minimum Cost Assignment}
548: \label{sec:many-one}
Theorem~\ref{thm:properties} characterizes a set $R$
that yields a minimum cost assignment $\A_R$. We now turn to the
problem of efficiently determining such a set. With this goal in mind,
we introduce the following notation.
553: For any point $x$ and any integer $k$, define the
554: {\em relative height} of $x$ with respect to $k$ as
555: \[ h^{k}(x) = \left\{
556: \begin{tabular}{ll}
557: $1$, & if $H(x) \ge k$ \\
558: $-1$, & if $H(x) < k$
559: \end{tabular}
560: \right.
561: \]
562: \noindent
Observe that when a point $s$ is removed from $S$,
$H(x)$ decreases by 1 for all $x \ge s$. Suppose
that $H(s) = k$, that the $k-1$ smaller elements of $R$ have already
been removed, and let $m$ be the largest point in $S \cup T$.
The current height at each $x \ge s$ is then $H(x) - (k-1)$, so removing
$s$ decreases its absolute value by one where $H(x) \ge k$ and increases
it by one where $H(x) < k$. Hence the removal of $s$ causes the area under
the absolute value of the height
function between $s$ and $m$ to decrease by the quantity $\int_s^{m}
h^{k}(x) dx$. We use this observation to define the {\em profit} of
removing $s$ from $S$ and placing it in $R$ (recall that $\A_R$
assigns each item in $R$ to its nearest neighbor), as follows:
571: \begin{equation}
572: P(s) = \int_s^{m} h^{k}(x) dx - |s - N(s)|
573: \label{eq:profdef}
574: \end{equation}
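For illustration, the sketch below evaluates~(\ref{eq:profdef})
directly, integrating $h^k$ one constant piece at a time, with $k = H(s)$
and $m$ the largest point of $S \cup T$. It takes quadratic time and is
meant only as a reference (the name \texttt{profits} is ours). For the
sets of Figure~\ref{fig:height} it gives $P(4) = 2 - |4 - 2| = 0$.

```python
def profits(S, T):
    """Return {s: P(s)} from the profit definition, with k = H(s) and m the
    largest point of S and T.  Direct, quadratic-time evaluation."""
    S_set = set(S)
    events = sorted([(s, 1) for s in S] + [(t, -1) for t in T])
    xs, H, h = [], [], 0
    for x, step in events:
        h += step
        xs.append(x)
        H.append(h)                # H(y) = H[i] for y in [xs[i], xs[i+1])
    P = {}
    for i, s in enumerate(xs):
        if s not in S_set:
            continue
        k = H[i]                   # height of s
        # Integral of h^k over [s, m], one constant piece at a time.
        integral = sum((1 if H[j] >= k else -1) * (xs[j + 1] - xs[j])
                       for j in range(i, len(xs) - 1))
        P[s] = integral - min(abs(s - t) for t in T)
    return P

S = [0, 3, 4, 6, 13, 14, 15, 16]
T = [1, 2, 8, 10, 11, 12]
print(profits(S, T))
```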
575:
576: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure Begin
577: \begin{figure}[htbp]
578: \centering
579: \includegraphics[width=0.6\linewidth]{Figures/before.remove.eps}
580: \caption{A depiction of the integral $\int_s^{m}h^{k}(x) dx$ for $s =
581: 4$. The integral represents the effect of excluding $4$ from the
582: one-to-one assignment from $S$ to $T$.}
583: \label{fig:relheight}
584: \end{figure}
585: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure End
586:
587: The profit function quantifies the benefit of placing $s$ in $R$, the
588: goal being to minimize the cost of the assignment defined by $\A_R$.
589: The integral term in~(\ref{eq:profdef}) represents the effect of
590: excluding $s$ from the one-to-one assignment from $S\setminus R$ to $T$,
591: as depicted in Figure \ref{fig:relheight}.
592: The term $|s - N(s)|$ in~(\ref{eq:profdef}) represents the cost
593: of assigning $s$ to its nearest neighbor.
We minimize the cost of the assignment defined by $\A_R$ by
choosing, at each height, an item $s$ that maximizes $P(s)$.
596: This is formalized in the following lemma.
597:
598: \begin{lemma}
Let $R \subset S$ be a set with elements
$r_1 < r_2 < \ldots < r_{|S|-|T|}$ such that $H(r_k) = k$
and $r_k$ maximizes $P(s)$ among all points $s\in S$ of
height $k$. Then, among all subsets of $S$ of size $|S|-|T|$ whose
$k^{th}$ smallest element has height $k$, $R$ minimizes
603: \[ \sum_{r \in R} |r - N(r)| +
604: \int_{-\infty}^{+\infty} |H_R(x)|dx\]
605: \label{lem:cost.optimal}
606: \end{lemma}
607: \begin{proof}
Karp and Li~\cite{KL75} proved that any set $R$ of size $|S|-|T|$ whose
$k^{th}$ smallest element has height $k$ satisfies the
equality
\[\int_{-\infty}^{+\infty} |H_R(x)|dx =
\int_{-\infty}^m |H(x)|dx -
\sum_{r \in R} \int_r^m h^{k}(x) dx\]
where $m$ is the largest point in $S \cup T$ and, in each term of the
sum, $k = H(r)$.
Adding $\sum_{r \in R} |r - N(r)|$ to both sides of
the equality yields
\[ \sum_{r \in R} |r - N(r)|
+ \int_{-\infty}^{+\infty} |H_R(x)|dx
=
\sum_{r \in R} |r - N(r)|
+ \int_{-\infty}^m |H(x)|dx
- \sum_{r \in R} \int_r^m h^{k}(x) dx \]
By~(\ref{eq:profdef}), this is equivalent to
\[\sum_{r \in R} |r - N(r)|
+ \int_{-\infty}^{+\infty} |H_R(x)|dx
=
\int_{-\infty}^m |H(x)|dx - \sum_{r \in R} P(r) \]
The first term on the right-hand side does not depend on $R$. Every set
satisfying the height condition contains exactly one element of each
height $k$, and $P(r_k)$ is maximal among the points of height $k$;
hence $R$ maximizes $\sum_{r \in R} P(r)$ over all such sets, which in
turn minimizes
631: \[\sum_{r \in R} |r - N(r)|
632: + \int_{-\infty}^{+\infty} |H_R(x)|dx\]
633: as required (refer to Lemma~\ref{lem:optcost}).
634: \end{proof}
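The equality of Karp and Li used in this proof can be checked
numerically on a small instance. The sketch below (helper names are ours)
enumerates every $R$ whose $k^{th}$ smallest element has height $k$ and
compares both sides, with $m$ the largest point of $S \cup T$.

```python
from itertools import combinations

def height_steps(S, T):
    """Sorted points xs and heights H, with H(y) = H[i] on [xs[i], xs[i+1])."""
    events = sorted([(s, 1) for s in S] + [(t, -1) for t in T])
    xs, H, h = [], [], 0
    for x, step in events:
        h += step
        xs.append(x)
        H.append(h)
    return xs, H

def integrate(xs, values):
    # Integral of the step function equal to values[i] on [xs[i], xs[i+1]).
    return sum(v * (b - a) for v, a, b in zip(values, xs, xs[1:]))

S = [0, 3, 4, 6, 13, 14, 15, 16]
T = [1, 2, 8, 10, 11, 12]
xs, H = height_steps(S, T)               # m = xs[-1], the largest point of S and T
checked = []
for R in combinations(sorted(S), len(S) - len(T)):
    if any(H[xs.index(r)] != k for k, r in enumerate(R, start=1)):
        continue                          # keep R only if its k-th smallest element has height k
    xs_R, H_R = height_steps([s for s in S if s not in R], T)
    lhs = integrate(xs_R, [abs(v) for v in H_R])    # |H_R| vanishes outside [xs_R[0], xs_R[-1]]
    rhs = integrate(xs, [abs(v) for v in H])
    for k, r in enumerate(R, start=1):
        i = xs.index(r)
        rhs -= integrate(xs[i:], [1 if v >= k else -1 for v in H[i:]])
    checked.append((R, lhs, rhs))
print(checked)
```

On the sets of Figure~\ref{fig:height}, five subsets satisfy the height
condition, and both sides agree for each of them.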
635:
The following algorithm uses the preceding lemma to determine
the optimal set $R$, and then computes the minimum cost
assignment.
639:
640: \subsection{The Assignment Algorithm}
641: Initially $R$ is the empty set.
642: \begin{itemize}
643: \item[1.] Sort $S$ and $T$.
644: \item[2.] Calculate $H(x)$ for each $x \in S \cup T$. In between
645: consecutive points, $H$ is constant.
646: \item[3.] Calculate $P(s)$ for each $s \in S$.
\item[4.] For $k = 1, 2, \ldots, |S|-|T|$
\begin{itemize}
\item[4.1] Among all points of $S$ of height $k$, find the leftmost
point $r_k$ that maximizes $P$.
651: \item[4.2] Add $r_k$ to $R$.
652: \end{itemize}
653: %\item[5.] Compute the minimum cost assignment $\A$ from
654: % $S \setminus R$ to $T$.
655: \item[5.] Return $\A_R$.
656: \end{itemize}
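The steps above can be sketched in Python as follows. This is an
illustration under stated assumptions (distinct points,
$|S| > |T| \ge 1$), not the authors' implementation: for readability,
step~3 evaluates each profit by direct integration in quadratic time
rather than by the linear-time method of the complexity analysis, and
nearest-neighbor ties are broken toward the smaller point of $T$. The
name \texttt{min\_cost\_assignment} is hypothetical.

```python
def min_cost_assignment(S, T):
    """Illustrative sketch of steps 1-5; assumes distinct points and |S| > |T| >= 1.

    Returns (cost, A) where A maps each point of S to a point of T.
    Profits are integrated directly (quadratic time) for readability.
    """
    S, T = sorted(S), sorted(T)                               # step 1
    K = len(S) - len(T)
    events = sorted([(s, 1) for s in S] + [(t, -1) for t in T])
    xs, H, h = [], [], 0
    for x, step in events:                                    # step 2
        h += step
        xs.append(x)
        H.append(h)                # H(y) = H[i] for y in [xs[i], xs[i+1])

    def nn(x):                     # a nearest neighbor in T; ties go to the smaller point
        return min(T, key=lambda t: abs(t - x))

    best = {}                      # height k -> (largest profit so far, its point)
    for i, (x, step) in enumerate(events):                    # steps 3 and 4
        k = H[i]
        if step != 1 or not 1 <= k <= K:
            continue
        integral = sum((1 if H[j] >= k else -1) * (xs[j + 1] - xs[j])
                       for j in range(i, len(xs) - 1))
        p = integral - abs(x - nn(x))
        if k not in best or p > best[k][0]:   # strict '>' keeps the leftmost maximizer
            best[k] = (p, x)

    R = {x for _, x in best.values()}
    A = {r: nn(r) for r in R}                                 # step 5: build A_R
    A.update(zip([s for s in S if s not in R], T))
    return sum(abs(s - t) for s, t in A.items()), A

S = [0, 3, 4, 6, 13, 14, 15, 16]
T = [1, 2, 8, 10, 11, 12]
print(min_cost_assignment(S, T))
```

On the sets of Figure~\ref{fig:height} it selects $R = \{4, 16\}$ and
returns an assignment of cost 19.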
657:
658: \begin{lemma}
659: The assignment algorithm computes a minimum cost assignment
660: from $S$ to $T$.
661: \end{lemma}
662: \begin{proof}
663: Let $r_k$ be the element of $R$ of height $k$ returned by the
664: algorithm. If we show that $r_1 < r_2 < \ldots < r_{|S|-|T|}$, then it
665: follows by Lemma~\ref{lem:cost.optimal} that $\A_R$ is a minimum cost
666: assignment. We prove below, by contradiction, that indeed $r_1
667: < r_2 < \ldots < r_{|S|-|T|}$.
668: %% So in proving that $A_R$ is optimal, it suffices to show that $r_1
669: %% < r_2 < \ldots < r_{|S|-|T|}$.
670:
Let $m$ be the largest point in $S \cup T$.
Assume that there exists some $k$ with $1\leq k\leq |S|-|T|-1$ for which
the algorithm returns $r_k$ and $r_{k+1}$, with $r_k > r_{k+1}$. Let
$s_k$ be the maximal element at height $k$ in $S \setminus R$ which is
less than $r_{k+1}$. Such an $s_k$ must exist, since $H$ starts at $0$,
changes by one at each point, and reaches $k+1$ at $r_{k+1}$.
Similarly, let $s_{k+1}$ be the minimal element
at height $k+1$ in $S \setminus R$ which is greater than $r_k$. Such
an $s_{k+1}$ must exist since $H(r_k) = k$ and the height at $\infty$ is
$H(\infty) = |S|-|T| \ge k+1$. Refer to Figure~\ref{fig:order}.
680: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure Begin
681: \begin{figure}[htbp]
682: \centering
683: \includegraphics[width=0.38\linewidth]{Figures/order.eps}
684: \caption{$s_k (s_{k+1})$ is the closest point at height $k (k+1)$ to
685: the left (right) of $r_{k+1} (r_k)$.}
686: \label{fig:order}
687: \end{figure}
688: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure End
689:
690: Since
691: $H(r_{k+1}) = H(s_{k+1})$ and $r_{k+1} < s_{k+1}$, we have that
692: \[\int_{r_{k+1}}^{m}h^{k+1}(x)dx =
693: \int_{r_{k+1}}^{s_{k+1}}h^{k+1}(x)dx +
694: \int_{s_{k+1}}^{m}h^{k+1}(x)dx\]
695: From this and equation~(\ref{eq:profdef}), we can derive the
696: following relation between the profit functions of $r_{k+1}$ and
697: $s_{k+1}$:
698: \begin{equation}
699: P(r_{k+1}) = P(s_{k+1}) + \int_{r_{k+1}}^{s_{k+1}} h^{k+1}(x) dx -
700: |r_{k+1} - N(r_{k+1})| +
701: |s_{k+1} - N(s_{k+1})|
702: \label{eq:profit1}
703: \end{equation}
704: Note that equality (\ref{eq:profit1}) is the result of breaking up the
705: integral corresponding to $P(r_{k+1})$ into two parts, and taking
706: into account the distance from each element to its nearest neighbor.
707: Similarly, we can derive the following relation
708: between $P(r_k)$ and $P(s_k)$:
709: \begin{equation}
710: P(s_k) = P(r_k) + \int_{s_{k}}^{r_{k}} h^{k}(x) dx -
711: |s_{k} - N(s_{k})| + |r_{k} - N(r_{k})|
712: \label{eq:profit2}
713: \end{equation}
Since $N(s_k)$ is a nearest neighbor of $s_k$, we have
$|s_k - N(s_k)| \le |s_k - N(r_{k+1})|$, and by the triangle inequality
\[ |s_k - N(s_k)| \leq |r_{k+1} - N(r_{k+1})| +
|s_k - r_{k+1}| \]
Moreover, $H(x) \ge k$ on the interval $(s_k, r_{k+1})$, since otherwise
some point of $S$ between $s_k$ and $r_{k+1}$ would have height $k$,
contradicting the choice of $s_k$. Thus $h^k(x) = 1$ on this interval,
which allows us to rewrite the previous inequality as:
720: \begin{equation}
721: |s_k - N(s_k)| \leq |r_{k+1} - N(r_{k+1})| +
722: \int_{s_k}^{r_{k+1}}h^k(x) dx
723: \label{eq:profit3}
724: \end{equation}
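Symmetrically, $H(x) \le k$ on the interval $(r_k, s_{k+1})$ by the
choice of $s_{k+1}$, so $h^{k+1}(x) = -1$ there. Combining this with
$|s_{k+1} - N(s_{k+1})| \le |s_{k+1} - r_k| + |r_k - N(r_k)|$ gives the
following relationship between
nearest neighbors of $r_k$ and $s_{k+1}$: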
727: \begin{equation}
728: |r_k - N(r_k)| \geq |s_{k+1} - N(s_{k+1})| +
729: \int_{r_k}^{s_{k+1}}h^{k+1}(x) dx
730: \label{eq:profit4}
731: \end{equation}
Finally, since $h^{k+1}(x) \le h^{k}(x)$ for every $x$, on the interval $(r_{k+1}, r_k)$ we have
733: \begin{equation}
734: \int_{r_{k+1}}^{r_{k}}h^{k+1}(x) dx \leq
735: \int_{r_{k+1}}^{r_{k}}h^{k}(x) dx
736: \label{eq:profit5}
737: \end{equation}
For $j \in \{k, k+1\}$, let $M_j = |s_{j} - N(s_{j})| - |r_{j} - N(r_{j})|$.
Adding inequalities (\ref{eq:profit3}),
(\ref{eq:profit4}) and (\ref{eq:profit5}) and rearranging yields
741: \[\int_{s_{k}}^{r_{k}} h^{k}(x) dx - M_k
742: \geq \int_{r_{k+1}}^{s_{k+1}} h^{k+1}(x) dx + M_{k+1} \]
743: This along with (\ref{eq:profit1}) and (\ref{eq:profit2})
744: implies that
745: \[ P(s_k) - P(r_k) \geq P(r_{k+1}) - P(s_{k+1})\]
746:
747: Since $r_{k+1}$ was picked by the assignment algorithm, we have that
$P(r_{k+1})\geq P(s_{k+1})$. This implies that $P(s_k)\geq P(r_k)$.
But $s_k$ has height $k$ and lies to the left of $r_k$, and the
algorithm picks the leftmost maximizer of $P$ at height $k$, so it would
not have picked $r_k$, a contradiction.
751: %% if $P(r_{k+1}) \geq P(s_{k+1})$, then $P(s_{k}) \geq P(r_{k})$. But
752: %% then the assignment algorithm would have added $s_k$ to $\A$ instead of
753: %% $r_k$, a contradiction.
754: \end{proof}
755:
756: \subsection{Complexity Analysis}
Sorting in step 1 takes $O(n \log n)$ time. All other steps run in
$O(n)$ time. The only steps for which this is not obvious are steps 2
and 3, which compute $H(x)$ and $P(x)$, respectively.
$H(s)$ can be computed for all $s \in S$ by a single sweep of the
sorted points in $S \cup T$, adding one whenever we encounter an
element of $S$ and subtracting one whenever we encounter an element
of $T$.
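The following Python fragment is a minimal sketch of this sweep, not
the authors' implementation. It assumes that $S$ and $T$ are given as
sorted lists of distinct numbers, and it adopts the hypothetical
convention that $H(s)$ is the running count immediately after $s$
itself has been counted; the function name {\tt heights} is
illustrative only. The standard-library function {\tt heapq.merge}
combines the two sorted lists in a single linear pass.
\begin{verbatim}
from heapq import merge


def heights(S, T):
    """Return a dict mapping each s in S to H(s).

    Sketch only. Assumes S and T are sorted lists of distinct numbers and
    uses the hypothetical convention that H(s) is the running count
    (+1 per point of S, -1 per point of T) right after s is counted.
    """
    H = {}
    height = 0
    # merge() walks the two sorted inputs in one linear pass
    for x, step in merge(((s, 1) for s in S), ((t, -1) for t in T)):
        height += step          # +1 for a point of S, -1 for a point of T
        if step == 1:
            H[x] = height
    return H


print(heights([1, 2, 5], [3, 4]))   # {1: 1, 2: 2, 5: 1}
\end{verbatim}
If the inputs are not presorted, sorting them first costs
$O(n \log n)$ time, which is the cost of step 1.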
764:
Since the nearest neighbors of all elements of $S$ can easily be computed in
linear time, to show that the profit function can be computed for all
elements of $S$ in linear time it suffices to consider the integral
of the relative height function $h^k$. For the points of $S$ at
height $k$, this integral can be computed in linear time by a sweep
from right to left.
For the rightmost element $s_r$ of $S$ at height $k$, we have
$\int_{s_r}^{m}h^{k}(x)dx = |s_r - m|$, where $m$ is the largest
point in $S$. Now suppose that we know $\int_{s}^{m}h^{k}(x)dx$ for some
element $s \in S$ at height $k$.
Let $s'$ be the largest element of $S$ at height $k$ with $s' < s$,
and let $t$ be the largest element of $T$ at height $k$ with $t < s$.
Note that, by continuity, $t$ exists and must be greater than $s'$.
Also note that $h^k(x)$ is positive for
all $s'\leq x\leq t$, and negative for all $t<x<s$.
Thus we can derive the following equation:
784: \begin{equation}
785: \int_{s'}^{m} h^k(x)dx = \int_{s}^{m} h^k(x)dx + |s' - t| - |t - s|
786: \end{equation}
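Because the points at height $k$ are visited from right to left, $t$
can be located by a pointer that only moves left, so this value can be
computed in constant amortized time for each $s' \in S$.
Summing over all heights $k$, we can compute $P(s)$ for all $s \in S$
in linear time.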
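The right-to-left sweep for a single height $k$ can be sketched in
Python as follows. The sketch only transcribes the base case and the
recurrence above; it assumes that the points of $S$ and of $T$ at
height $k$ have already been collected into sorted lists, and the name
{\tt integrals\_at\_height} is hypothetical. A single pointer into the
points of $T$ moves only to the left, so the loop runs in time linear
in the number of points at height $k$.
\begin{verbatim}
def integrals_at_height(s_pts, t_pts, m):
    """Return a dict mapping each s in s_pts to the integral of h^k on [s, m].

    Sketch only. s_pts: sorted points of S at height k (non-empty);
    t_pts: sorted points of T at height k; m: the largest point of S.
    """
    result = {}
    s_r = s_pts[-1]
    result[s_r] = abs(s_r - m)            # base case: rightmost point
    j = len(t_pts) - 1                    # pointer into t_pts; moves left only
    for i in range(len(s_pts) - 1, 0, -1):
        s, s_prev = s_pts[i], s_pts[i - 1]
        while j >= 0 and t_pts[j] >= s:   # skip points of T not left of s
            j -= 1
        assert j >= 0 and t_pts[j] > s_prev, "expected s' < t < s"
        t = t_pts[j]
        # h^k is positive on [s', t] and negative on (t, s)
        result[s_prev] = result[s] + abs(s_prev - t) - abs(t - s)
    return result


print(integrals_at_height([0, 3, 7], [2, 5], 10))   # {7: 3, 3: 3, 0: 4}
\end{verbatim}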
789:
790: It follows that the assignment algorithm runs in $O(n \log n)$
791: time. Furthermore, if $S$ and $T$ are given in sorted order,
792: the assignment algorithm runs in $O(n)$ time.
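The analysis relies on the fact that the nearest neighbors of all
elements of $S$ can easily be computed in linear time. One way to do
so is sketched below in Python; it assumes that $N(s)$ denotes a point
of $T$ closest to $s$, that $S$ and $T$ are sorted, and that $T$ is
nonempty, and the name {\tt nearest\_in\_T} is hypothetical. Because
the nearest neighbor moves monotonically to the right as $s$ increases,
a single pointer into $T$ suffices and the total work is
$O(|S| + |T|)$.
\begin{verbatim}
def nearest_in_T(S, T):
    """Return [N(s) for s in S], where N(s) is a point of T nearest to s.

    Sketch only. Assumes S and T are sorted and T is non-empty. Ties are
    broken toward the larger point, which does not change |s - N(s)|.
    """
    result = []
    j = 0                      # pointer into T; it never moves left
    for s in S:
        # advance while the next point of T is at least as close to s
        while j + 1 < len(T) and abs(T[j + 1] - s) <= abs(T[j] - s):
            j += 1
        result.append(T[j])
    return result


print(nearest_in_T([1, 4, 6, 10], [3, 5, 9]))   # [3, 5, 5, 9]
\end{verbatim}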
793:
794: \section{Conclusion}
We have shown that the one-to-one assignment algorithm in~\cite{KL75}
can be extended to produce a minimum-cost many-to-one assignment. The
algorithm runs in $O(n \log n)$ time if the input points are given in
arbitrary order, and in $O(n)$ time if the input points are presorted.
To our knowledge, this is the first solution to this assignment problem
for points on a line that achieves this time complexity.
801:
802: \begin{thebibliography}{1}
803:
804: \bibitem{ABKKS95}
805: A.~Aggarwal, A.~Bar-Noy, S.~Khuller, D.~Kravets, and B.~Schieber.
806: \newblock Efficient minimum cost matching and transportation using the
807: quadrangle inequality.
808: \newblock {\em J. Algorithms}, 19(1):116--143, 1995.
809:
810: \bibitem{BKSS03}
811: A.~Ben-Dor, R.M. Karp, B.~Schwikowski, and R.~Shamir.
812: \newblock The restriction scaffold problem.
813: \newblock {\em Journal of Computational Biology}, 10(2):385--398, 2003.
814:
815: \bibitem{BY98}
S.R.~Buss and P.N.~Yianilos.
\newblock Linear and $O(n \log n)$ time minimum-cost matching algorithms for
quasi-convex tours.
\newblock {\em SIAM J. on Computing}, 27(1):170--201, 1998.
820:
821: \bibitem{CT05}
822: J.~Colannino and G.~Toussaint.
823: \newblock An algorithm for computing the restriction scaffold assignment
824: problem in computational biology.
825: \newblock Technical Report~2, McGill University, 2005.
826:
827: \bibitem{bib:eiter97distance}
T.~Eiter and H.~Mannila.
829: \newblock Distance measures for point sets and their computation.
830: \newblock {\em Acta Informatica}, 34(2):109--133, 1997.
831:
832: \bibitem{KL75}
833: R.M. Karp and S.-Y.R. Li.
834: \newblock Two special cases of the assignment problem.
835: \newblock {\em Discrete Mathematics}, 13(46):129--142, 1975.
836:
837: \bibitem{bib:toussaintsimilarity}
G.T.~Toussaint.
839: \newblock A comparison of rhythmic similarity measures.
840: \newblock In {\em Proc. 5th International Conference on Music Information
841: Retrieval}, pages 242--245, 2004.
842:
843: \bibitem{T03}
844: G.T. Toussaint.
\newblock Classification and phylogenetic analysis of African ternary rhythm
timelines.
847: \newblock In {\em Proceedings of BRIDGES: Mathematical Connections in Art,
848: Music and Science}, pages 25--36, 2003.
849:
850: \bibitem{WPMK86}
851: M.~Werman, S.~Peleg, R.~Melter, and T.~Kong.
852: \newblock Bipartite graph matching for points on a line or a circle.
853: \newblock {\em J. Algorithms}, 7:277--284, 1986.
854:
855: \end{thebibliography}
856:
857: \end{document}
858: