The step size, often referred to as the learning rate, plays a pivotal role in optimizing the efficiency of the stochastic gradient descent (SGD) algorithm. In recent times, multiple step size strategies have emerged for enhancing SGD performance. However, a significant challenge associated with these step sizes is related to their probability distribution, denoted as ηt/ΣTt=1ηt .The step size, often referred to as the learning rate, plays a pivotal role in optimizing the efficiency of the stochastic gradient descent (SGD) algorithm. In recent times, multiple step size strategies have emerged for enhancing SGD performance. However, a significant challenge associated with these step sizes is related to their probability distribution, denoted as ηt/ΣTt=1ηt .[#item_full_content]