
\begin{abstract}%   <- trailing '%' for backward compatibility of .sty file

This paper describes how to convert a machine learning problem into a series of map-reduce tasks.
We study the logistic regression algorithm, which assumes that samples are independent
and assigns each sample a probability. The parameters are obtained by maximizing the product of all sample probabilities.
The rapid growth of training data poses challenges for machine learning methods: training sets are now so large that they
can only be stored in a distributed file system and processed by map-reduce style programs. The main step of logistic regression
is inference. In the map-reduce spirit, inference for each sample is performed by a separate map procedure, but this presupposes
that the map procedure holds the parameters for all features in the sample. In this paper, we propose Distributed
Parameter Map-Reduce, in which not only the samples but also the parameters are distributed across the nodes of the distributed file system.
Through a series of map-reduce tasks, we assign each sample the parameters for its features, perform inference for the sample, and
update the parameters of the model. These steps are executed iteratively until convergence. We test the proposed algorithm
in a production Hadoop environment. Experiments show that the speedup of the algorithm grows linearly with
the number of cluster nodes.

\end{abstract}
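
For reference, the objective and the per-sample inference described in the abstract can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the paper: $N$ samples with binary labels $y_i \in \{0,1\}$, feature values $x_{ij}$, parameters $w_j$, and $F_i$ the set of features that are nonzero in sample $i$.
\[
  p_i = \sigma\Bigl(\sum_{j \in F_i} w_j x_{ij}\Bigr),
  \qquad
  \sigma(z) = \frac{1}{1 + e^{-z}},
\]
\[
  L(w) = \prod_{i=1}^{N} p_i^{\,y_i} \, (1 - p_i)^{1 - y_i}.
\]
Under this sketch, computing $p_i$ requires only the parameters $w_j$ with $j \in F_i$, which is why each sample must be paired with the parameters for its own features before the inference step can run in a map procedure.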
