\begin{abstract}
We study whether a depth-two neural network can learn another
depth-two network using gradient descent.
% We study the problem of learning a depth two neural network with
% another
% randomly initialized
% depth two network using gradient descent.
% We study the problem of learning a function using a neural network
% of a certain depth and width assuming it can be represented using
% such a network.
Assuming a linear output node,
% the output node of the network
% is linear,
we show that
% We show that for networks of depth two with certain
% simplifying assumptions
the question of whether gradient descent converges to the
target function is equivalent to the following question in
electrodynamics:
Given $k$ fixed protons in $\rea^d$ and $k$ electrons,
% initialized at random positions
% with the electrons moving due to
% under the influence of the
%electrical
each moving under the attractive force from the protons and the repulsive
force from the remaining electrons,
%. The question of convergence, then, is
do all the electrons match up with
%to all the
the protons at equilibrium, up to a permutation?
Under the standard electrical
force, this follows from Earnshaw's classic theorem. In our setting,
the force is
% If the force function between a pair of
% charges is not given by the standard electrical force of $1/r^2$
% (where $r$ is the distance between unit charges), but by another
% function that is
determined by the activation function and the
input distribution.
Building on this equivalence, we prove the
existence of an activation function such that
% the corresponding
gradient descent learns
% dynamics
% result in learning
at least one of the
hidden nodes in the target network.
Iterating, we show that gradient
descent can be used to learn the entire network one node at a time.
\end{abstract}