Ablation study w.r.t. the Tikhonov regularization strength hyperparameter λ. Displayed is the element-wise ranking accuracy (the fraction of individual element ranks correctly identified), averaged over 10 seeds, with each individual seed additionally shown at low opacity in the background. Left: NeuralSort. Right: SoftSort. Both for n = 5. Newton Losses, in both the Hessian and the Fisher variant, significantly improve over the baseline across (and beyond) 6 orders of magnitude of variation of the hyperparameter λ. Note the logarithmic horizontal axis.
When training neural networks with custom objectives, such as ranking losses and shortest-path losses, a common problem is that they are, per se, non-differentiable. A popular approach is to continuously relax the objectives to provide gradients, which enables learning. However, such differentiable relaxations are often non-convex and can exhibit vanishing and exploding gradients, making them (already in isolation) hard to optimize. Here, the loss function poses the bottleneck when training a deep neural network. We present Newton Losses, a method for improving the performance of existing hard-to-optimize losses by exploiting their second-order information via their empirical Fisher and Hessian matrices. Instead of training the neural network with second-order techniques, we only utilize the loss function's second-order information to replace it with a Newton Loss, while training the network with gradient descent. This makes our method computationally efficient. We apply Newton Losses to eight differentiable algorithms for sorting and shortest paths, achieving significant improvements for less-optimized differentiable algorithms, and consistent improvements even for well-optimized differentiable algorithms.
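To make the mechanism concrete, below is a minimal PyTorch sketch of how a Hessian-variant Newton Loss could look, based only on the description above: take one Tikhonov-regularized Newton step on the loss with respect to the network's output, then train the network by plain gradient descent toward the resulting target. The function name newton_loss, the single-vector output assumption, the toy stand-in loss, and the default value of lam (the λ ablated in the figure) are illustrative assumptions, not the paper's reference implementation.

import torch

def newton_loss(loss_fn, z, lam=0.1):
    """Illustrative sketch of a Hessian-variant Newton Loss.

    loss_fn: the hard-to-optimize relaxed loss, mapping the network
             output z (a 1-D tensor here, for simplicity) to a scalar.
    lam:     Tikhonov regularization strength (the λ ablated above).
    """
    # Differentiate the loss w.r.t. its *input* on a detached copy,
    # so the Newton step itself carries no gradient into the network.
    z_det = z.detach().requires_grad_(True)
    grad = torch.autograd.grad(loss_fn(z_det), z_det)[0]           # ∇ℓ(z)
    hess = torch.autograd.functional.hessian(loss_fn, z.detach())  # ∇²ℓ(z)
    # Tikhonov-regularized Newton step: z* = z − (∇²ℓ(z) + λI)⁻¹ ∇ℓ(z).
    eye = torch.eye(z.numel(), dtype=z.dtype)
    z_star = (z_det - torch.linalg.solve(hess + lam * eye, grad)).detach()
    # The network is trained with ordinary gradient descent on this
    # quadratic surrogate, which pulls its output toward the target z*.
    return 0.5 * ((z - z_star) ** 2).sum()

# Toy check with a smooth stand-in loss; a real use case would plug in
# a relaxed sorting or shortest-path loss instead:
z = torch.randn(5, requires_grad=True)          # e.g. scores for n = 5
loss = newton_loss(lambda v: (v ** 4).sum(), z)
loss.backward()
print(z.grad)

The Fisher variant described in the abstract would substitute the empirical Fisher matrix for the Hessian in the linear solve; in either case, only the loss is treated with second-order information, while the network itself sees just the quadratic surrogate.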
@inproceedings{Petersen2024NewtonLossesUsing,
author = {F. Petersen and C. Borgelt and T. Sutter and H. Kuehne and O. Deussen and S. Ermon},
booktitle = {Advances in Neural Information Processing Systems},
doi = {10.52202/079017-0821},
editor = {A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang},
pages = {26113--26135},
publisher = {Curran Associates, Inc.},
title = {Newton Losses: Using Curvature Information for Learning with Differentiable Algorithms},
url = {https://proceedings.neurips.cc/paper_files/paper/2024/file/2e102d937d094b7211c4d32ce1f1126c-Paper-Conference.pdf},
volume = {37},
year = {2024}
}