\begin{abstract}
The history of deep learning has shown that 
human-designed, problem-specific networks can greatly 
improve on the classification performance of general neural models.
In most practical cases, however, choosing the optimal 
architecture for a given task remains a challenging problem. 
Recent architecture-search methods 
can automatically build neural models with 
strong performance but do not 
fully account for 
the interaction between neural architecture and weights.

%
This work investigates 
the problem of disentangling 
the roles of the neural structure and its edge weights 
by showing that 
well-trained architectures may not 
need any link-specific fine-tuning of the weights. 
We compare the performance of such weight-free 
networks (in our case, binary networks with 
$\{0, 1\}$-valued weights) with that of random, 
weight-agnostic, pruned, and 
standard fully connected networks.
%
To find the optimal weight-agnostic network, we use 
a novel and computationally efficient method that translates 
the hard architecture-search problem into a feasible 
optimization problem.
%
More specifically, we view the optimal task-specific architecture 
as the optimal configuration of a binary 
network with $\{0, 1\}$-valued 
weights, which can be found through an approximate gradient 
descent strategy. 
%
Theoretical convergence guarantees for the proposed algorithm are 
obtained by bounding the error in the gradient approximation, and 
its practical performance is evaluated 
on two real-world data sets.
%
To measure the structural similarities between different 
architectures, we use a novel spectral 
approach that allows us to highlight the intrinsic differences between real-valued networks and weight-free architectures.

\end{abstract}