L12.4 Adam: Combining Adaptive Learning Rates and Momentum