\begin{abstract}%   <- trailing '%' for backward compatibility of .sty file
This paper describes how to convert a machine learning problem into a series of map-reduce tasks.
We study the logistic regression algorithm, which assumes that samples are independent
and assigns each sample a probability. Parameters are obtained by maximizing the product of all sample probabilities.
The rapid growth of training data poses challenges for machine learning methods: training samples are now so numerous that they
can only be stored in a distributed file system and processed by map-reduce style programs. The main step of logistic regression
is inference. In the map-reduce spirit, inference for each sample is performed by a separate map procedure. However, inference
requires that the map procedure hold the parameters for all features in the sample. In this paper, we propose Distributed
Parameter Map-Reduce, in which not only the samples but also the parameters are distributed across the nodes of the distributed file system.
Through a series of map-reduce tasks, we assign each sample the parameters for its features, perform inference on the sample, and
update the parameters of the model. These steps are executed iteratively until convergence. We test the proposed algorithm
in a production Hadoop environment. Experiments show that the speedup of the algorithm grows linearly with
the number of cluster nodes.
\end{abstract}