\begin{abstract}
The history of deep learning has shown that
human-designed, problem-specific networks can greatly
improve classification performance over general neural models.
In most practical cases, however, choosing the optimal
architecture for a given task remains a challenging problem.
Recent architecture-search methods
can automatically build neural models with
strong performance but fail
to fully account for
the interaction between neural architecture and weights.

%
This work investigates
the problem of disentangling
the roles of the neural structure and its edge weights
by showing that
well-trained architectures may not
need any link-specific fine-tuning of the weights.
We compare the performance of such weight-free
networks (in our case, binary networks with
$\{0, 1\}$-valued weights) with that of random,
weight-agnostic, pruned, and
standard fully connected networks.
%
To find the optimal weight-agnostic network, we use
a novel and computationally efficient method that translates
the hard architecture-search problem into a tractable
optimization problem.
%
More specifically, we view the optimal task-specific architecture
as the optimal configuration of a binary
network with $\{0, 1\}$-valued
weights, which can be found through an approximate gradient-descent
strategy.
%
Theoretical convergence guarantees for the proposed algorithm are
obtained by bounding the error in the gradient approximation, and
its practical performance is evaluated
on two real-world data sets.
%
To measure the structural similarity between different
architectures, we use a novel spectral
approach that highlights the intrinsic differences
between real-valued networks and weight-free architectures.
\end{abstract}