cs0507013/dsd.tex
1: \documentclass[11pt]{article}
2: \usepackage{url}
3: \usepackage{epsf}
4: \usepackage{epsfig}
5: \usepackage{graphicx}
6: \usepackage{amsfonts}
7: \usepackage{amsmath,amssymb}
8: \usepackage{latexsym}
9: 
10: \addtolength{\textwidth}{0.2in}
11: \addtolength{\evensidemargin}{-0.1in}
12: \addtolength{\oddsidemargin}{-0.1in}
13: %\addtolength{\textheight}{1in}
14: %\addtolength{\topmargin}{-0.5in}
15: 
16: \newcommand{\ABox}{
17: \raisebox{3pt}{\framebox[6pt]{\rule{6pt}{0pt}}}
18: }
19: \newenvironment{proof}{{\bf Proof:}}{\hfill\ABox}
20: 
21: \newtheorem{theorem}{{\bf Theorem}}
22: \newtheorem{corollary}[theorem]{Corollary}
23: \newtheorem{lemma}[theorem]{Lemma}
24: \newtheorem{claim}[theorem]{Claim}
25: \newtheorem{proposition}[theorem]{Proposition}
26: \newtheorem{conjecture}[theorem]{Conjecture}
27: \newtheorem{definition}[theorem]{Definition}
28: \newtheorem{openquestion}[theorem]{Open question}
29: 
30: \newcommand{\R}{\mathcal R}
31: \newcommand{\A}{\mathcal A}
32: \newcommand{\dsd}{\tt DSD}
33: \newcommand{\cost}{\tt cost}
34: 
35: \begin{document}
36: 
37: \title{An $O(n \log n)$-Time Algorithm for the Restricted Scaffold Assignment}
38: \author{
39: Justin Colannino 
40: \and Mirela Damian
41: %\and Erik Demaine 
42: \and Ferran Hurtado 
43: \and John Iacono
44: %\and Stefan Langerman 
45: \and Henk Meijer 
46: %\and Diane Souvaine 
47: \and Suneeta Ramaswami 
48: \and Godfried Toussaint}
49: 
50: \date{}
51: 
52: \maketitle
53: 
54: \begin{abstract}
55: The {\em assignment} problem takes as input two finite point sets
56: $S$ and $T$ and establishes a correspondence between points in $S$ 
57: and points in $T$, such that each point in $S$ maps to exactly 
58: one point in $T$, and each point in $T$ maps to at least one 
59: point in $S$. In this paper we show that this problem has an 
60: $O(n \log n)$-time solution, provided that the points in 
61: $S$ and $T$ are restricted to lie on a line 
62: (linear time, if $S$ and $T$ are presorted). 
63: \end{abstract}
64: 
65: \section{Introduction}
66: Consider two finite sets of points $S$ and $T$ with total 
67: cardinality $n$.  
68: The objective of the {\em assignment} problem is to establish a
69: correspondence between the points in $S$ and the points in $T$,
70: such that each point in $S$ corresponds to exactly one point in 
71: $T$, and each point in $T$ corresponds to at least one point 
72: in $S$. 
73: This correspondence is measured by a cost function $\delta$ that
74: assigns a cost $\delta(s, t)$ to each assigned pair $(s, t)$. 
75: The cost of an assignment is the sum of the costs of all assigned
76: pairs. The goal of the assignment problem is to find an assignment
77: of minimum cost.
78: 
79: The general assignment problem is also known as the 
80: {\em many-to-one assignment} problem. 
81: The {\em one-to-one} version of the assignment problem requires 
82: that each point in $S$ maps to exactly one point in $T$ and 
83: each point in $T$ is the image of exactly one point in $S$.  
84: %Such an assignment is undefined when $|S| \neq |T|$.  
85: Throughout the paper, whenever we talk about the assignment
86: problem, we refer to the many-to-one version of the problem.
87: 
88: The simplest version of the assignment problem assumes
89: that the points in $S$ and $T$ lie on a line and the cost function 
90: is the $L_1$ metric.
91: In this setting, the one-to-one assignment problem 
92: has a simple $O(n \log n)$ time solution when $|S| = |T|$:
93: first sort the points in $O(n \log n)$ time, then map the $k^{th}$ point 
94: in $S$ to the $k^{th}$ point in $T$ in $O(n)$ time
95: \cite{bib:toussaintsimilarity}. 
96: However, the situation $|S| < |T|$ arises in many practical
97: applications. 
98: This situation was first addressed by Karp and Li~\cite{KL75},
99: who provided an $O(n \log n)$ time algorithm for the one-to-one 
100: assignment problem ($O(n)$ time, if $S$ and $T$ are given in 
101: sorted order). 
102: Simpler and equally efficient solutions were later provided 
103: in~\cite{ABKKS95, BY98, WPMK86}. 
104: 
105: Eiter and Mannila~\cite{bib:eiter97distance} studied the assignment
106: problem in the context of measuring the distance between two
107: theories expressed in a logical language. They showed that for points
108: in arbitrary dimensions, this problem has a polynomial time solution. 
109: When restricted to points on a line, a minimum cost assignment 
110: can be used in measuring the similarity between musical 
111: rhythms. In this context, Toussaint~\cite{T03} proposed the use 
112: of the {\em directed swap distance} as a similarity measure.  
113: If the onsets of a rhythm are represented as points on a line 
114: separated by ``silence'' 
115: intervals, the directed swap distance between two rhythms with onset 
116: sets $S$ and $T$ is precisely the cost of an optimal 
117: assignment between $S$ and $T$, with underlying cost function $L_1$.
118: 
119: The assignment problem also appears in the shape of the 
120: {\em restriction scaffold assignment} problem in computational
121: biology~\cite{BKSS03}.
122: The goal here is to establish a correspondence between sparse 
123: experimental data and a restricted set of known structural 
124: building blocks. Ben-Dor et al.~\cite{BKSS03} model the 
125: restriction scaffold assignment as an assignment problem
126: for points on a line, and provide an $O(n \log n)$ time algorithm to
127: solve this problem. However, as later shown by Colannino and 
128: Toussaint~\cite{CT05}, this algorithm does not always produce
129: a minimum cost assignment. Thus, the best existing
130: solution to the assignment problem in one dimension
131: is the $O(n^2)$ algorithm presented in~\cite{CT05}.
132: 
133: In this paper, we show that the assignment problem 
134: with underlying cost function $L_1$ in one dimension 
135: can be solved in $O(n \log n)$ time 
136: ($O(n)$ if the points in $S$ and $T$ are given in sorted order). 
137: Our algorithm is a simple extension of the $O(n \log n)$ time 
138: algorithm of Karp and
139: Li~\cite{KL75} for finding the minimum cost {\em one-to-one} 
140: assignment over $T$ and all subsets $S' \subset S$ of size $|T|$, 
141: assuming $|S| > |T|$. 
142: We present our algorithm in Section~\ref{sec:many-one}, 
143: after a few preliminary results (Section~\ref{sec:preliminaries}) 
144: and a close look at some properties of an optimal solution 
145: (Section~\ref{sec:properties}). 
146: 
147: \section{Background}
148: \label{sec:definition}
149: Let $S = \{s_0, s_1, s_2, \ldots\}$ and $T = \{t_0, t_1, t_2,
150: \ldots\}$ be two finite sets of points that lie on a horizontal line,
151: with $|S|+|T|=n$ and $|S| > |T|$.  For any $s \in S$ and $t \in T$,
152: the cost $\delta(s, t)$ of an assigned pair $(s, t)$ is the absolute value
153: of the difference between the $x$-coordinates of $s$ and $t$. To avoid
154: overloading the notation, we use the same symbol for a point and its
155: $x$-coordinate. Thus, $\delta(s, t) = |s - t|$. We assume that $s_i <
156: s_{i+1}, 0\leq i < |S|-1$ and $t_j < t_{j+1}, 0\leq j < |T|-1$.
157:  
158: An assignment $\A$ between $S$ and $T$
159: consists of pairs of points $(s, t)$ (henceforth {\em edges}), 
160: with $s \in S$ and $t \in T$, such that each point in $S$ belongs to 
161: exactly one edge in $\A$, and each point in $T$ belongs to at least 
162: one edge in $\A$. The cost of $\A$ is
163: \[ cost(\A) = \sum_{(s, t) \in \A} \delta(s, t) \] 
164: Our goal is to find an assignment $\A$ of minimum cost.
165: If two points in $S \cup T$ have the same $x$-coordinate, we can
166: slightly shift one of them to the left or right. If the minimum cost
167: assignment is unique and the change is sufficiently small, this change
168: will not affect the optimal assignment.  If there are several 
169: assignments with the same optimal cost, at least one of them 
170: will be the optimal solution of the new point set. So we may 
171: assume without loss of generality that all points in $S \cup T$ are 
172: distinct.
173: 
174: \subsection{Preliminaries}
175: \label{sec:preliminaries}
176: %
177: For any $s \in S$ and $t \in T$, the value $|s - t|$ 
178: can be expressed in a different way as follows. 
179: Define a function $f_{s,t}$ to be $1$ in the interval 
180: between $s$ and $t$ and $0$ at any other 
181: point (see Figure~\ref{fig:intcost}). 
182: Then $|s - t| = \int_{-\infty}^{+\infty} f_{s,t}(x) dx$.
183: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure Begin
184: \begin{figure}[htbp]
185: \centering
186: \includegraphics[width=0.34\linewidth]{Figures/int.cost.eps}
187: \caption{Function $f_{s,t}$. Shaded area represents the 
188: cost $|s-t|$.}
189: \label{fig:intcost}
190: \end{figure}
191: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure End
192: 
193: \noindent
194: The cost of an assignment $\A$ is therefore 
195: \[
196:   cost(\A) = \sum_{(s,t)\in \A} \int_{-\infty}^{+\infty} f_{s,t}(x) dx
197:            = \int_{-\infty}^{+\infty} \sum_{(s,t) \in \A} f_{s,t}(x) dx
198: \]
199: If we define
200: \[
201:   f_{\A}(x) = \sum_{(s,t) \in \A} f_{s,t}(x)  
202: \]
203: then the value $f_{\A}(a)$ is simply the number of 
204: edges in $\A$ pierced by the vertical line $x = a$, 
205: and the cost of $\A$ is
206: 
207: \begin{equation}
208:   cost(\A) = \int_{-\infty}^{+\infty} f_{\A}(x) dx
209: \label{eq:cost}
210: \end{equation}
211: 
212: Our definition of $f_{\A}$ is similar in nature to the {\em height}
213: function $H:\mathbb{R}\rightarrow\mathbb{Z}$ introduced by Karp and Li
214: \cite{KL75}. Informally, they define $H(a)$ at each point $a$ as the
215: difference between the number of points in $S$ and the number of
216: points in $T$ restricted to the interval $(-\infty, a]$ (or
217: equivalently, to the left of the vertical line $x = a$).  Thus $H$
218: remains constant throughout each interval that does not contain a
219: point in $S \cup T$.  Figure~\ref{fig:height} shows the stair-shaped
220: curve of $H$ for a small example.
221: %$S = \{0, 4, 6, 13, 14, 16\}$ 
222: %and $T = \{1, 2, 8, 10, 11, 12\}$.
223: Note that {\em up} transitions in the curve correspond to 
224: points in $S$ and {\em down} transitions correspond
225: to points in $T$.  We refer to the value $H(x)$ as the {\em height} of
226: $x$. Note that $H(\infty) = |S|-|T|$.
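
\medskip
\noindent
As an illustrative hand computation (a worked example added here, using the
sets of Figure~\ref{fig:height}), sweeping the points of $S \cup T$ from left
to right, adding $1$ at each point of $S$ and subtracting $1$ at each point of
$T$, gives the following heights:
\[
\begin{array}{c|cccccccccccccc}
x    & 0 & 1 & 2  & 3 & 4 & 6 & 8 & 10 & 11 & 12 & 13 & 14 & 15 & 16 \\ \hline
H(x) & 1 & 0 & -1 & 0 & 1 & 2 & 1 & 0  & -1 & -2 & -1 & 0  & 1  & 2
\end{array}
\]
The last entry is $H(16) = 2 = |S| - |T|$, in agreement with the value of
$H(\infty)$ noted above.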
227: 
228: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure Begin
229: \begin{figure}[htbp]
230: \centering
231: \includegraphics[width=0.6\linewidth]{Figures/height.eps}
232: \caption{Height function for sets 
233: $S = \{0, 3, 4, 6, 13, 14, 15, 16\}$ 
234: and $T = \{1, 2, 8, 10, 11, 12\}$.}
235: \label{fig:height}
236: \end{figure}
237: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure End
238: 
239: 
240: \begin{lemma}
241: If $|S| = |T|$, then $\int_{-\infty}^{+\infty} |H(x)|~dx$ 
242: is the cost of the assignment that assigns the
243: $k^{th}$ largest element of $S$ to the $k^{th}$ 
244: largest element of $T$.
245: \label{lem:one}
246: \end{lemma}
247: \begin{proof}
248: Follows immediately from (\ref{eq:cost}) and the fact that,
249: for this particular assignment, $f_{\A}(x) = |H(x)|$ at each 
250: point $x$. 
251: \end{proof}
252: 
253: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure Begin
254: \begin{figure}[htbp]
255: \centering
256: \begin{tabular}{cc}
257: \includegraphics[width=0.6\linewidth]{Figures/equal.assignment.eps} & 
258: \raisebox{7ex}{(a)} \\
259:  & \\
260: \includegraphics[width=0.6\linewidth]{Figures/equal.cost.eps} & 
261: \raisebox{7ex}{(b)}
262: \end{tabular}
263: \caption{
264: (a) One-to-one assignment for sets
265: $S = \{0, 4, 6, 13, 14, 16\}$ and 
266: $T = \{1, 2, 8, 10, 11, 12\}$
267: (b) Shaded area represents the cost of the assignment.}
268: \label{fig:equal.assign}
269: \end{figure}
270: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure End
271: 
272: \noindent
273: Figure~\ref{fig:equal.assign}a shows an assignment 
274: for two sets $S$ and $T$, with $|S| = |T|$. 
275: The cost of this assignment is equal
276: to the area shaded in Figure~\ref{fig:equal.assign}b,
277: which is precisely the value of the integral 
278: $\int_{-\infty}^{+\infty} |H(x)|~dx$. 
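
\medskip
\noindent
As a hand check of Lemma~\ref{lem:one} on these two sets (a worked example,
not part of the proof), the assignment of the lemma matches $0, 4, 6, 13, 14,
16$ with $1, 2, 8, 10, 11, 12$ in sorted order, so its cost is
\[ |0-1| + |4-2| + |6-8| + |13-10| + |14-11| + |16-12| = 1+2+2+3+3+4 = 15. \]
The height function takes the values $1, 0, -1, 0, 1, 0, -1, -2, -3, -2, -1, 0$
at the points $0, 1, 2, 4, 6, 8, 10, 11, 12, 13, 14, 16$, and is constant in
between. Summing $|H|$ times the interval length over the intervals
$[0,1)$, $[2,4)$, $[6,8)$, $[10,11)$, $[11,12)$, $[12,13)$, $[13,14)$ and
$[14,16)$, where $H$ is nonzero, gives
\[ \int_{-\infty}^{+\infty} |H(x)|~dx
   = 1\cdot 1 + 1\cdot 2 + 1\cdot 2 + 1\cdot 1 + 2\cdot 1 + 3\cdot 1
     + 2\cdot 1 + 1\cdot 2 = 15, \]
which equals the cost of the assignment.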
279: 
280: \section{Properties of a Minimum Cost Assignment}
281: \label{sec:properties}
282: Our algorithm for computing a minimum cost assignment $\A$
283: exploits several important properties of $\A$, which 
284: we discuss next.
285: A {\em crossing} is defined by a pair of 
286: edges $(a, d)$ and $(b, c)$ such that $a < b$ in $S$ 
287: and $c < d$ in $T$. 
288: 
289: \begin{lemma}
290: There exists a minimum cost assignment with no crossings.
291: \label{lem:monotone}
292: \end{lemma}
293: \begin{proof}
294: Let $\A$ be a minimum cost assignment between $S$ and $T$ 
295: with a minimum number of crossings. If $\A$ has zero 
296: crossings, the proof is finished. Otherwise, pick 
297: two crossing edges $(a, d)$ and $(b, c)$ in $\A$, 
298: with $a < b$ in $S$ and $c < d$ in $T$.
299: We show that $\A' = (\A \setminus \{(a, d), (b, c)\}) 
300: \cup \{(a, c), (b, d)\}$ is an assignment with 
301: $cost(\A') \le cost(\A)$; since $\A'$ has fewer crossings than $\A$, this is a contradiction.
302: In particular, we show that 
303: $f_{\A'}(x) \le f_{\A}(x)$ at each point $x$; then
304: $cost(\A') \le cost(\A)$ follows immediately
305: from~(\ref{eq:cost}). 
306: 
307: First note that $f_{\A'}(x) \le f_{\A}(x)$ is true for 
308: any $x$ such that the vertical line $L$ at $x$ intersects 
309: neither of $(a,d)$ and $(b,c)$. 
310: Suppose now that $L$ intersects $(a,c)$. Then $L$ must 
311: also intersect either $(a, d)$ (see Figure~\ref{fig:crossing}a) 
312: or $(b, c)$ (see Figure~\ref{fig:crossing}b) or both
313: (see Figure~\ref{fig:crossing}c).
314: Similarly, if $L$ intersects $(b,d)$, then $L$ also intersects
315: at least one of $(a, d)$ and $(b, c)$. 
316: Furthermore, if $L$ intersects
317: both $(a, c)$ and $(b, d)$, then $L$ also intersects
318: both $(a, d)$ and $(b, c)$ (see Figure~\ref{fig:crossing}c).  
319: It follows that $f_{\A'}(x) \le f_{\A}(x)$.
320: \end{proof}
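
\medskip
\noindent
As a small numerical illustration of this exchange (the coordinates below are
hypothetical and chosen only for this example), let $a = 0$ and $b = 5$ be
points of $S$, and let $c = 1$ and $d = 6$ be points of $T$. The crossing edges
$(a,d)$ and $(b,c)$ cost $|0-6| + |5-1| = 10$, whereas the uncrossed edges
$(a,c)$ and $(b,d)$ cost $|0-1| + |5-6| = 2$. Pointwise, the crossing edges
contribute
\[
f_{a,d}(x) + f_{b,c}(x) =
\begin{cases}
1 & \text{if $0 < x < 1$ or $5 < x < 6$,}\\
2 & \text{if $1 < x < 5$,}
\end{cases}
\]
while the uncrossed edges contribute
\[
f_{a,c}(x) + f_{b,d}(x) =
\begin{cases}
1 & \text{if $0 < x < 1$ or $5 < x < 6$,}\\
0 & \text{if $1 < x < 5$.}
\end{cases}
\]
Both sums vanish outside $[0,6]$, so the contribution after the exchange
never exceeds the contribution before it.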
321: 
322: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure Begin
323: \begin{figure}[htbp]
324: \centering
325: \begin{tabular}{c@{\hspace{0.06\linewidth}}
326:                 c@{\hspace{0.06\linewidth}}c}
327: \includegraphics[width=0.22\linewidth]{Figures/crossing1.eps} & 
328: \includegraphics[width=0.25\linewidth]{Figures/crossing2.eps} & 
329: \includegraphics[width=0.24\linewidth]{Figures/crossing3.eps} \\
330: (a) & (b) & (c)
331: \end{tabular}
332: \caption{
333: (a) Vertical line $L$ intersects $(a,c)$ and $(a,d)$ 
334: (b) $L$ intersects $(a, c)$ and $(b,c)$
335: (c) $L$ intersects 
336: $(a,c)$, $(b,d)$, $(a,d)$ and $(b,c)$.}
337: \label{fig:crossing}
338: \end{figure}
339: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure End
340: 
341: 
342: %In an assignment $\A$, each point 
343: %$s \in S$ belongs to exactly one edge in $\A$ and 
344: %each $t\in T$ belongs to at least one edge in $\A$.
345: %Therefore, $\A$ can be associated with an 
346: %assignment (function) $\A : S \to T$ such that 
347: %$\A(s) = t$ for each $(s, t) \in \A$. 
348: 
349: An assignment $\A$ can also be regarded as a function 
350: $\A : S \to T$ such that $\A(s) = t$ for each 
351: $(s, t) \in \A$. 
352: For any $t \in T$, let $\A^{-1}(t)$ denote the set of
353: elements $s \in S$ such that $\A(s) = t$. 
354: For each point $s \in S$, define the 
355: {\em nearest neighbor} $N(s)$ to be the point in 
356: $T$ closest to $s$, i.e., $|N(s) - s|\le |t - s|$
357: for any $t\in T$. In the case of a tie, $N(s)$ is
358: arbitrarily picked from among the two candidate
359: neighbors.
360: 
361: \begin{lemma}
362: Let $\A$ be optimal and let $t \in T$ be such that
363: $\A^{-1}(t)$ contains two or more elements. 
364: Then for each $s \in \A^{-1}(t)$, 
365: $t$ is a nearest neighbor of $s$. Furthermore, 
366: $T$ contains no points in between $s$ and $t$. 
367: \label{lem:neighbor}
368: \end{lemma}
369: \begin{proof}
370: Assume to the contrary that there is $s \in S$ with
371: $\A(s) = t, |\A^{-1}(t)| > 1,$ and $N(s) \neq t$. Refer to
372: Figure~\ref{fig:neighbor}. 
373: Define a new assignment $\A'$ with $\A'(s) = N(s)$ and 
374: $\A'(x) = \A(x)$ for $x \neq s$. Note that $\A'$ 
375: is also an assignment, since
376: ${\A'}^{-1}(t)$ still contains at least one point. 
377: Also $cost(\A') = cost(\A) - |s-t| + |s-N(s)|$ (see
378: Figures~\ref{fig:neighbor}a and~\ref{fig:neighbor}b). 
379: %% at any other point. 
380: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure Begin
381: \begin{figure}[htbp]
382: \centering
383: \begin{tabular}{c@{\hspace{1in}}c}
384: \includegraphics[width=0.3\linewidth]{Figures/neighbor.not.eps} &
385: \includegraphics[width=0.3\linewidth]{Figures/neighbor} \\
386: (a) & (b)
387: \end{tabular}
388: \caption{
389: (a) Assignment $\A$ with $\A(s)\neq N(s)$
390: (b) Assignment $\A'$ with $\A'(s) = N(s)$}
391: \label{fig:neighbor}
392: \end{figure}
393: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure End
394: Since $|s-N(s)| < |s-t|$, it follows that 
395: $cost(\A') < cost(\A)$, 
396: contradicting the fact that $\A$ is of minimum cost.
397: Thus, $t$ is a nearest neighbor of $s$. 
398: 
399: The claim that $T$ contains no points in between 
400: $s$ and $t$ is immediate: if such a point $t_1\in T$ existed,
401: then $|s-t_1| < |s-t|$, 
402: contradicting the fact that $N(s) = t$.
403: \end{proof}
404: 
405: \medskip
406: \noindent
407: Observe that for any subset $R \subset S$ of size $|R| = |S|-|T|$,
408: there is a unique minimum cost assignment (with no crossings) 
409: from $S\setminus R$ to $T$. Let $\A_{S\setminus R}$ denote the 
410: edges of such an assignment, and define a new
411: assignment $\A_R: S\rightarrow T$ as follows:
412: \begin{equation}
413: \A_R(x) =
414: \begin{cases} N(x) & \text{if $x\in R$,}\\
415: y & \text{if $x\in S\setminus R$ and $(x,y)\in \A_{S\setminus R}$}
416: \end{cases}
417: \label{eq:ar}
418: \end{equation}
419: 
420: \vspace*{0.1in}
421: \noindent Lemma~\ref{lem:neighbor} implies that there always exists 
422: a subset $R$ such that $\A_R$ defines a minimum cost assignment 
423: from $S$ to $T$. Furthermore, $R$ satisfies a special height 
424: condition, stated in the lemma below. 
425: 
426: \begin{lemma}
427: %% There exists a minimum cost assignment $\A_R$ with the property that 
428: %% the $k^{th}$ smallest element of $R$ has height $k$.
429: There exists a subset $R\subset S$ with $|R|=|S|-|T|$ such that $\A_R$
430: defines a minimum cost assignment from $S$ to $T$, and the
431: $k^{th}$ smallest element of $R$ has height $k$.
432: \label{lem:height}
433: \end{lemma}
434: 
435: \begin{proof}
436: Let $\A:S\rightarrow T$ define a minimum cost assignment. We prove the 
437: existence of $\A_R$ by constructing a set $R \subset S$ 
438: with the properties stated in this lemma. Initially $R$ is empty.  
439: If $|\A^{-1}(t)| = 1$ for all $t\in T$, then $R$ is empty and 
440: the proof is finished.
441: Otherwise, we process points $t \in T$ for
442: which $\A^{-1}(t)$ has two or more elements. 
443: For each such point we consider two cases, 
444: as depicted in Figure~\ref{fig:lemma4cases}.  
445: If all points in $\A^{-1}(t)$ are less than $t$, then we 
446: add to $R$ all but the largest (rightmost) point in $\A^{-1}(t)$
447: (see Figure~\ref{fig:lemma4cases}a).
448: Otherwise, we add to $R$ all points in $\A^{-1}(t)$  
449: except for the smallest (leftmost) point greater than $t$
450: (see Figure~\ref{fig:lemma4cases}b).
451: 
452: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure Begin
453: \begin{figure}[htbp]
454: \centering
455: \includegraphics[width=0.7\linewidth]{Figures/lemma4cases.eps} 
456: \caption{
457: (a) All points in $\A^{-1}(t)$ are less than $t$.
458: (b) Some points in $\A^{-1}(t)$ are greater than $t$.}
459: \label{fig:lemma4cases}
460: \end{figure}
461: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure End
462: 
463: %% Note that the cost of $\A_{R}$ is minimum, as the assignment has
464: %% not changed.   
465: We now define $\A_R$ as in~(\ref{eq:ar}). Since $\A_R$
466: is identical to $\A$, $\A_R$ is a minimum
467: cost many-to-one assignment from $S$ to $T$.
468: 
469: It remains to show that 
470: the $k^{th}$ smallest element of $R$ has 
471: height $k$. To see this, first consider 
472: the smallest element of a nonempty set $\A^{-1}(t)\cap R$. Call this
473: element $r$ and suppose it 
474: is the $k^{th}$ smallest element of $R$. It follows then that (i)
475: $R$ contains  $k-1$ points less than $r$, and 
476: (ii) $T$ and $S \setminus R$ contain an equal number of 
477: elements less than $r$. This latter claim follows from 
478: Lemma~\ref{lem:neighbor}, which tells us that $T$ 
479: contains no elements in between $r$ and $t$, and the following
480: observation: the way in which we have selected $R$
481: ensures that if $t$ lies to the left of $r$ (i.e., $t<r$), the
482: assigned item for $t$ in $S \setminus R$ lies to the 
483: left of $r$, and if $t$ lies to the right of $r$ ($t>r$), the assigned
484: item for $t$ in $S \setminus R$ lies to the right of $r$.
485: These together imply that $H(r) = k$.  
486: 
487: We now show that the points in $\A^{-1}(t) \setminus \{r\}$ 
488: have height values $k+1, k+2, \ldots$, in order from 
489: smallest to largest.
490: By Lemma~\ref{lem:neighbor}, $T$ contains no points in between
491: $s$ and $t$, for each $s \in \A^{-1}(t)$. Then 
492: the points in $R \cap \A^{-1}(t)$ have incrementally 
493: increasing height values. It follows that the height of 
494: the $k^{th}$ smallest element of $R$ is $k$. 
495: \end{proof}
496: 
497: \medskip
498: \noindent
499: Let $H_R$ represent the height function restricted to
500: sets $S \setminus R$ and $T$. This means that for each $x$,
501: $H_R(x)$ is the 
502: %% absolute value of the 
503: difference between the number of points in $S \setminus R$ and the
504: number of points in $T$ restricted to the interval $(-\infty, x]$.
505: 
506: \begin{lemma}
507: The cost of 
508: %% an optimal 
509: assignment $\A_R$ is
510: \begin{equation} 
511: \sum_{r \in R} |r - N(r)| + \int_{-\infty}^{+\infty} |H_R(x)|dx 
512: \label{eq:removecost}
513: \end{equation}
514: \label{lem:optcost}
515: \end{lemma}
516: \begin{proof}
517: By Lemma~\ref{lem:one} we have that the contribution of 
518: $S \setminus R$ to the cost of $\A_R$ is 
519: $\int_{-\infty}^{+\infty} |H_R(x)|dx$. 
520: Since each point in $R$ maps to its nearest neighbor, the 
521: contribution of $R$ to the cost of $\A_R$ is 
522: $\sum_{r \in R} |r - N(r)|$. These together conclude the 
523: lemma.
524: \end{proof}
525: 
526: \begin{theorem}
527: Let $R \subset S$ be a subset of size $|R| = |S| - |T|$ with 
528: two properties:
529: \begin{enumerate}
530: \item[i.] The $k^{th}$ smallest element of $R$ has height $k$.
531: \item[ii.] $R$ minimizes the quantity from (\ref{eq:removecost}).
532: \end{enumerate}
533: Then $\A_R$ defines a minimum cost assignment from $S$ to
534: $T$.
535: %%assignment. 
536: \label{thm:properties}
537: \end{theorem}
538: \begin{proof}
539: %% Use Lemma~\ref{lem:height} to justify (1). Use
540: %% Lemma~\ref{lem:optcost} to justify (2).  
541: By Lemma~\ref{lem:height}, there exists a set $R^*$ satisfying {\em (i)} for which
542: $\A_{R^*}$ is a minimum cost assignment. By Lemma~\ref{lem:optcost} and {\em (ii)},
543: $cost(\A_R) \le cost(\A_{R^*})$. It follows that $\A_R$ is a minimum 
544: cost assignment from $S$ to $T$.
545: \end{proof}
546: 
547: \section{Computing a Minimum Cost Assignment}
548: \label{sec:many-one}
549: Theorem~\ref{thm:properties} gives an exact description of the set $R$
550: that yields a minimum cost assignment $\A_R$.  We now turn to the
551: problem of efficiently determining this set. With this goal in mind,
552: we introduce the following notation. 
553: For any point $x$ and any integer $k$, define the 
554: {\em relative height} of $x$ with respect to $k$ as 
555: \[ h^{k}(x) = \left\{ 
556: \begin{tabular}{ll}
557: $1$, & if $H(x) \ge k$ \\
558: $-1$, & if $H(x) < k$
559: \end{tabular}
560: \right.
561: \]
562: \noindent
563: Observe that when a point $s$ is removed from $S$, 
564: $H(x)$ decreases by 1 for all $x > s$. Suppose
565: that $H(s) = k$, and let $m$ be the largest point in $S \cup T$.
566: The removal of $s$ causes the area under the height
567: function between $s$ and $m$ to decrease by the quantity $\int_s^{m}
568: h^{k}(x) dx$. We use this observation to define the {\em profit} of
569: removing $s$ from $S$ and placing it in $R$ (recall that $\A_R$
570: assigns each item in $R$ to its nearest neighbor), as follows: 
571: \begin{equation}
572: P(s) = \int_s^{m} h^{k}(x) dx - |s - N(s)|
573: \label{eq:profdef}
574: \end{equation}
575: 
576: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure Begin
577: \begin{figure}[htbp]
578: \centering
579: \includegraphics[width=0.6\linewidth]{Figures/before.remove.eps}
580: \caption{A depiction of the integral $\int_s^{m}h^{k}(x) dx$ for $s =
581: 4$.  The integral represents the effect of excluding $4$ from the
582: one-to-one assignment from $S$ to $T$.} 
583: \label{fig:relheight}
584: \end{figure}
585: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure End
586: 
587: The profit function quantifies the benefit of placing $s$ in $R$, the
588: goal being to minimize the cost of the assignment defined by $\A_R$. 
589: The integral term in~(\ref{eq:profdef}) represents the effect of 
590: excluding $s$ from the one-to-one assignment from $S\setminus R$ to $T$, 
591: as depicted in Figure \ref{fig:relheight}.  
592: The term $|s - N(s)|$ in~(\ref{eq:profdef}) represents the cost 
593: of assigning $s$ to its nearest neighbor.  
594: We minimize the cost of the assignment defined by $\A_R$ by
595: choosing items $s$ that maximize $P(s)$.
596: This is formalized in the following lemma.
597: 
598: \begin{lemma}
599: Let $R \subset S$ be a set with elements 
600: $r_1 < r_2 < \ldots < r_{|S|-|T|}$ such that $H(r_k) = k$ 
601: and $r_k$ maximizes $P(s)$ among all points $s\in S$ of 
602: height $k$. Then $R$ minimizes
603: \[ \sum_{r \in R} |r - N(r)| + 
604: \int_{-\infty}^{+\infty} |H_R(x)|dx\]
605: \label{lem:cost.optimal}
606: \end{lemma}
607: \begin{proof}
608: Karp and Li~\cite{KL75} proved that any set $R$ of size $|S|-|T|$ whose 
609: $k^{th}$ smallest element has height $k$ satisfies the 
610: equality
611: \[\int_{-\infty}^{+\infty} |H_R(x)|dx = 
612: \int_{0}^m |H(x)|dx - 
613: \sum_{r \in R} \int_r^m h^{k}(x) dx\]
614: Adding the cost contribution of $R$ to both sides of 
615: the equality yields 
616: \[ \sum_{r \in R} |r - N(r)| 
617: + \int_{-\infty}^{+\infty} |H_R(x)|dx 
618: = 
619: \sum_{r \in R} |r - N(r)| 
620: + \int_{0}^m |H(x)|dx 
621: - \sum_{r \in R} \int_r^m h^{k}(x) dx \]
622: This is equivalent to
623: \[\sum_{r \in R} |r - N(r)| 
624: + \int_{-\infty}^{+\infty} |H_R(x)|dx 
625: = 
626: \int_{0}^m |H(x)|dx - \sum_{r \in R} P(r) \]
627: Since  $P(r_k)$ is maximized at each height $k$ and 
628: there is only one element in $R$ at each height, we have 
629: that $R$ maximizes $\sum_{r \in R} P(r)$, which in turn 
630: minimizes 
631:    \[\sum_{r \in R} |r - N(r)| 
632:      + \int_{-\infty}^{+\infty} |H_R(x)|dx\]
633: as required (refer to Lemma~\ref{lem:optcost}).
634: \end{proof}
635: 
636: The following algorithm uses the preceding lemma to determine
637: the optimal set $R$, and then compute the minimum cost 
638: assignment. 
639: 
640: \subsection{The Assignment Algorithm}
641: Initially $R$ is the empty set.
642: \begin{itemize}
643: \item[1.] Sort $S$ and $T$.
644: \item[2.] Calculate $H(x)$ for each $x \in S \cup T$. In between
645: consecutive points, $H$ is constant.
646: \item[3.] Calculate $P(s)$ for each $s \in S$.
647: \item[4.] For $k = 1, 2, \ldots, |S|-|T|$
648: \begin{itemize}
649: \item[4.1] Find the leftmost point $r_k$ of height $k$ 
650:            that maximizes $P(r_k)$.
651: \item[4.2] Add $r_k$ to $R$.  
652: \end{itemize}
653: %\item[5.] Compute the minimum cost assignment $\A$ from 
654: %          $S \setminus R$ to $T$.
655: \item[5.] Return $\A_R$.
656: \end{itemize}
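
\medskip
\noindent
As a worked trace of the steps above (a hand computation added for
illustration, on the sets of Figure~\ref{fig:height}), take
$S = \{0, 3, 4, 6, 13, 14, 15, 16\}$ and $T = \{1, 2, 8, 10, 11, 12\}$, so
that $|S| - |T| = 2$ and $m = 16$. The points of $S$ at height $1$ are
$0$, $4$ and $15$, and those at height $2$ are $6$ and $16$. Evaluating
(\ref{eq:profdef}) for these candidates gives
\[
\begin{array}{c|c|c|c|c}
s & H(s) & \int_s^m h^{H(s)}(x)\,dx & |s - N(s)| & P(s) \\ \hline
0  & 1 & 0  & 1 & -1 \\
4  & 1 & 2  & 2 & 0  \\
15 & 1 & 1  & 3 & -2 \\
6  & 2 & -6 & 2 & -8 \\
16 & 2 & 0  & 4 & -4
\end{array}
\]
Step 4 therefore selects $r_1 = 4$ and $r_2 = 16$, so $R = \{4, 16\}$.
The assignment $\A_R$ maps $4$ to $2$ and $16$ to $12$, and matches
$0, 3, 6, 13, 14, 15$ with $1, 2, 8, 10, 11, 12$ in sorted order, for a total
cost of $(2 + 4) + (1 + 1 + 2 + 3 + 3 + 3) = 19$. This agrees with the identity
in the proof of Lemma~\ref{lem:cost.optimal}: here
$\int_{0}^{m} |H(x)|dx = 15$ and $P(4) + P(16) = -4$, and $15 - (-4) = 19$.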
657: 
658: \begin{lemma}
659: The assignment algorithm computes a minimum cost assignment 
660: from $S$ to $T$. 
661: \end{lemma}
662: \begin{proof}
663: Let $r_k$ be the element of $R$ of height $k$ returned by the
664: algorithm. If we show that $r_1 < r_2 < \ldots < r_{|S|-|T|}$, then it
665: follows by Lemma~\ref{lem:cost.optimal} that $\A_R$ is a minimum cost
666: assignment. We prove below, by contradiction, that indeed $r_1
667: < r_2 < \ldots < r_{|S|-|T|}$.
668: %% So in proving that $A_R$ is optimal, it suffices to show that $r_1
669: %% < r_2 < \ldots < r_{|S|-|T|}$.  
670: 
671: Let $m$ be the largest point in $S \cup T$. 
672: Assume that there exists some $k (1\leq k\leq |S|-|T|-1)$ for which
673: the algorithm returns $r_k$ and $r_{k+1}$, with $r_k > r_{k+1}$.  Let
674: $s_k$ be the maximal element at height $k$ in $S \setminus R$ which is
675: less than $r_{k+1}$.  By continuity, such an $s_k$ must
676: exist. Similarly, let $s_{k+1}$ be the minimal element 
677: at height $k+1$ in $S \setminus R$ which is greater than $r_k$.  Such
678: an $s_{k+1}$ must exist since the height at $\infty$ is 
679: $H(\infty) = |S|-|T|$. Refer to Figure~\ref{fig:order}.
680: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure Begin
681: \begin{figure}[htbp]
682: \centering
683: \includegraphics[width=0.38\linewidth]{Figures/order.eps} 
684: \caption{$s_k (s_{k+1})$ is the closest point at height $k (k+1)$ to
685: the left (right) of $r_{k+1} (r_k)$.}
686: \label{fig:order}
687: \end{figure}
688: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%Figure End
689: 
690: Since
691: $H(r_{k+1}) = H(s_{k+1})$ and $r_{k+1} < s_{k+1}$, we have that
692: \[\int_{r_{k+1}}^{m}h^{k+1}(x)dx =
693: \int_{r_{k+1}}^{s_{k+1}}h^{k+1}(x)dx +
694:  \int_{s_{k+1}}^{m}h^{k+1}(x)dx\]
695: From this and equation~(\ref{eq:profdef}), we can derive the
696: following relation between the profit functions of $r_{k+1}$ and
697: $s_{k+1}$: 
698: \begin{equation}
699:    P(r_{k+1}) = P(s_{k+1}) + \int_{r_{k+1}}^{s_{k+1}} h^{k+1}(x) dx - 
700:                 |r_{k+1} - N(r_{k+1})| + 
701:                 |s_{k+1} - N(s_{k+1})|
702: \label{eq:profit1}
703: \end{equation}
704: Note that equality (\ref{eq:profit1}) is the result of breaking up the 
705: integral corresponding to $P(r_{k+1})$ into two parts, and taking 
706: into account the distance from each element to its nearest neighbor. 
707: Similarly, we can derive the following relation
708: between $P(r_k)$ and $P(s_k)$:
709: \begin{equation}
710:    P(s_k) =  P(r_k) + \int_{s_{k}}^{r_{k}} h^{k}(x) dx - 
711:              |s_{k} - N(s_{k})| + |r_{k} - N(r_{k})|
712: \label{eq:profit2}
713: \end{equation} 
714: The nearest neighbor of $s_k$ cannot be farther than
715: $N(r_{k+1})$. This translates into: 
716: \[ |s_k - N(s_k)| \leq |r_{k+1} - N(r_{k+1})| + 
717: |s_k - r_{k+1}| \] 
718: Also note that $h^k(x)$ is positive on the interval $(s_k, r_{k+1})$, 
719: which allows us to rewrite the previous inequality as: 
720: \begin{equation}
721: |s_k - N(s_k)| \leq |r_{k+1} - N(r_{k+1})| + 
722: \int_{s_k}^{r_{k+1}}h^k(x) dx
723: \label{eq:profit3}
724: \end{equation}
725: Similar arguments lead to the following relationship between 
726: nearest neighbors of $r_k$ and $s_{k+1}$:
727: \begin{equation}
728: |r_k - N(r_k)| \geq |s_{k+1} - N(s_{k+1})| + 
729: \int_{r_k}^{s_{k+1}}h^{k+1}(x) dx
730: \label{eq:profit4}
731: \end{equation}
732: Finally, on the interval $(r_{k+1}, r_k)$ note that 
733: \begin{equation}
734: \int_{r_{k+1}}^{r_{k}}h^{k+1}(x) dx \leq 
735: \int_{r_{k+1}}^{r_{k}}h^{k}(x) dx 
736: \label{eq:profit5}
737: \end{equation}
738: Let $M_k = |s_{k} - N(s_{k})| - |r_{k} - N(r_{k})|$. 
739: Simple arithmetic that involves inequalities (\ref{eq:profit3}), 
740: (\ref{eq:profit4}) and (\ref{eq:profit5}) yields
741: \[\int_{s_{k}}^{r_{k}} h^{k}(x) dx - M_k 
742: \geq \int_{r_{k+1}}^{s_{k+1}} h^{k+1}(x) dx + M_{k+1} \]
743: This along with (\ref{eq:profit1}) and (\ref{eq:profit2}) 
744: implies that 
745: \[ P(s_k) - P(r_k) \geq P(r_{k+1}) - P(s_{k+1})\]
746: 
747: Since $r_{k+1}$ was picked by the assignment algorithm, we have that
748: $P(r_{k+1})\geq P(s_{k+1})$. This implies that $P(s_k)\geq P(r_k)$,
749: but since $s_k$ lies to the left of $r_k$, the assignment algorithm would
750: have picked $s_k$ instead of $r_k$, a contradiction.
751: %% if $P(r_{k+1}) \geq P(s_{k+1})$, then $P(s_{k}) \geq P(r_{k})$. But
752: %% then the assignment algorithm would have added  $s_k$ to $\A$ instead of
753: %% $r_k$, a contradiction. 
754: \end{proof}
755: 
756: \subsection{Complexity Analysis}
757: Sorting in step 1 takes $O(n \log n)$ time.  All other steps run in
758: $O(n)$ time.  The only steps where this is not obvious are steps 2 and
759: 3, which involve computing $H(x)$ and $P(x)$, respectively.  
760: $H(x)$ can be computed
761: for all $s \in S$ by conducting a sweep of the sorted points in $S
762: \cup T$, adding one when we encounter an element of $S$ and
763: subtracting one when we encounter an element of $T$.
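
As a small illustration of this sweep (the point labels, and the convention
of recording the counter immediately after each point is processed, are
assumptions made only for this example), suppose the sorted order of
$S \cup T$ is $u_1, u_2, v_1, u_3, v_2$, with $u_i \in S$ and $v_j \in T$.
Starting from a counter equal to $0$, the sweep proceeds as follows:
\[
\begin{array}{l|ccccc}
\mbox{point}   & u_1 & u_2 & v_1 & u_3 & v_2 \\ \hline
\mbox{update}  & +1  & +1  & -1  & +1  & -1  \\
\mbox{counter} & 1   & 2   & 1   & 2   & 1
\end{array}
\]
Each point triggers a single constant-time update, so the whole sweep
takes $O(n)$ time once the points are sorted.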
764: 
765: All nearest neighbors of the elements of $S$ can easily be computed in 
766: linear time. Hence, to show that the profit function can be computed for all 
767: elements of $S$ in linear time, it suffices to consider 
768: the integral of the relative height function $h^k$.  This 
769: integral can be computed in linear time for all points in $S$ at 
770: height $k$ in a sweep from right to left.  
771: For the rightmost element $s_r$ of $S$ at height $k$ we have
772: %% has a relative height function equal to 
773: $\int_{s_r}^{m}h^{k}(x)dx = |s_r - m|$, where $m$ is the largest 
774: point in $S$.  Suppose that we know $\int_{s}^{m}h^{k}(x)dx$ for some
775: item $s$ at height $k$. 
776: %% the relative height of some element $s$ at height $k$.  
777: Let $s'$ be the largest element of $S$ smaller than $s$ that is 
778: also at height $k$, and let $t$ be the largest element 
779: of $T$ smaller than $s$ at height $k$.  Note that by continuity, $t$ exists and
780: must be greater than  $s'$.  Also note that $h^k(x)$ is positive for
781: all $s'\leq x\leq t$, and $h^k(x)$ is negative for all $t<x<s$.
782: %% over the interval $(s',t)$ and negative over the interval $(t,s)$. 
783: Thus we can derive the following equation:
784: \begin{equation}
785: \int_{s'}^{m} h^k(x)dx = \int_{s}^{m} h^k(x)dx + |s' - t| - |t - s|
786: \end{equation}
787: This value can be computed in constant time for each $s' \in S$.  
788: Thus we can compute $P(s)$ for all $s \in S$ in linear time.
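As a purely illustrative instance of this update (the numerical values are
hypothetical and not taken from any input discussed here), let $s' = 2$,
$t = 7$, $s = 10$, and suppose the sweep has already obtained
$\int_{s}^{m} h^k(x)dx = 4$. Then
\[
\int_{s'}^{m} h^k(x)dx = 4 + |2 - 7| - |7 - 10| = 4 + 5 - 3 = 6 ,
\]
which requires only the stored value for $s$ and the positions of $s'$,
$t$ and $s$.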
789: 
790: It follows that the assignment algorithm runs in $O(n \log n)$
791: time. Furthermore, if $S$ and $T$ are given in sorted order, 
792: the assignment algorithm runs in $O(n)$ time. 
793: 
794: \section{Conclusion}
795: We have shown that the one-to-one assignment algorithm in~\cite{KL75}
796: can be extended to produce a minimum cost many-to-one assignment. The
797: algorithm runs in $O(n \log n)$ time if the input points are given in 
798: arbitrary order, and in $O(n)$ time if the input points are presorted. 
799: To our knowledge, this is the first solution to the assignment problem 
800: that achieves this time complexity. 
801: 
802: \begin{thebibliography}{1}
803: 
804: \bibitem{ABKKS95}
805: A.~Aggarwal, A.~Bar-Noy, S.~Khuller, D.~Kravets, and B.~Schieber.
806: \newblock Efficient minimum cost matching and transportation using the
807:   quadrangle inequality.
808: \newblock {\em J. Algorithms}, 19(1):116--143, 1995.
809: 
810: \bibitem{BKSS03}
811: A.~Ben-Dor, R.M. Karp, B.~Schwikowski, and R.~Shamir.
812: \newblock The restriction scaffold problem.
813: \newblock {\em Journal of Computational Biology}, 10(2):385--398, 2003.
814: 
815: \bibitem{BY98}
816: S.R. Buss and P.N. Yianilos.
817: \newblock Linear and $O(n \log n)$ time minimum-cost matching algorithms for
818:   quasi-convex tours.
819: \newblock {\em SIAM J. on Computing}, 27(1):170--201, 1998.
820: 
821: \bibitem{CT05}
822: J.~Colannino and G.~Toussaint.
823: \newblock An algorithm for computing the restriction scaffold assignment
824:   problem in computational biology.
825: \newblock Technical Report~2, McGill University, 2005.
826: 
827: \bibitem{bib:eiter97distance}
828: Thomas Eiter and Heikki Mannila.
829: \newblock Distance measures for point sets and their computation.
830: \newblock {\em Acta Informatica}, 34(2):109--133, 1997.
831: 
832: \bibitem{KL75}
833: R.M. Karp and S.-Y.R. Li.
834: \newblock Two special cases of the assignment problem.
835: \newblock {\em Discrete Mathematics}, 13(46):129--142, 1975.
836: 
837: \bibitem{bib:toussaintsimilarity}
838: Godfried Toussaint.
839: \newblock A comparison of rhythmic similarity measures.
840: \newblock In {\em Proc. 5th International Conference on Music Information
841:   Retrieval}, pages 242--245, 2004.
842: 
843: \bibitem{T03}
844: G.T. Toussaint.
845: \newblock Classification and phylogenetic analysis of African ternary rhythm
846:   timelines.
847: \newblock In {\em Proceedings of BRIDGES: Mathematical Connections in Art,
848:   Music and Science}, pages 25--36, 2003.
849: 
850: \bibitem{WPMK86}
851: M.~Werman, S.~Peleg, R.~Melter, and T.~Kong.
852: \newblock Bipartite graph matching for points on a line or a circle.
853: \newblock {\em J. Algorithms}, 7:277--284, 1986.
854: 
855: \end{thebibliography}
856: 
857: \end{document}
858: