Use default value of initial_scale_power if FP16 scaling params not provided #4986

Closed
5 changes: 3 additions & 2 deletions deepspeed/runtime/fp16/loss_scaler.py
@@ -23,6 +23,7 @@

 import torch
 from deepspeed import comm as dist
+from deepspeed.runtime.constants import FP16_INITIAL_SCALE_POWER_DEFAULT
 from deepspeed.utils import logger

 INITIAL_LOSS_SCALE = 'init_scale'
@@ -109,14 +110,14 @@ class DynamicLossScaler(LossScalerBase):
     always using the highest loss scale possible without incurring overflow.

     Args:
-        init_scale (float, optional, default=2**32): Initial loss scale attempted by :class:`DynamicLossScaler.`
+        init_scale (float, optional, default=2**16): Initial loss scale attempted by :class:`DynamicLossScaler.`
         scale_factor (float, optional, default=2.0): Factor used when adjusting the loss scale. If an overflow is encountered, the loss scale is readjusted to loss scale/``scale_factor``. If ``scale_window`` consecutive iterations take place without an overflow, the loss scale is readjusted to loss_scale*``scale_factor``.
         scale_window (int, optional, default=1000): Number of consecutive iterations without an overflow to wait before increasing the loss scale.
         consecutive_hysteresis (bool, optional, default=False): Whether to refill hysteresis if we reach an iteration that doesn't overflow
     """

     def __init__(self,
-                 init_scale=2**32,
+                 init_scale=2**FP16_INITIAL_SCALE_POWER_DEFAULT,
                  scale_factor=2.,
                  scale_window=1000,
                  min_scale=1,
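For readers skimming the diff, the scaling rule described in the docstring can be sketched roughly as follows. This is a simplified illustration, not DeepSpeed's `DynamicLossScaler`: the class name `ToyDynamicLossScaler`, the constant `INITIAL_SCALE_POWER_DEFAULT`, and the `update_scale` signature are hypothetical, the value 16 is taken from the updated docstring (`2**16`), and hysteresis handling is left out entirely.

```python
# Illustrative sketch only; not DeepSpeed's DynamicLossScaler.
# Assumption: 16 mirrors FP16_INITIAL_SCALE_POWER_DEFAULT, per the docstring's 2**16.
INITIAL_SCALE_POWER_DEFAULT = 16


class ToyDynamicLossScaler:
    """Hypothetical, hysteresis-free version of the rule in the docstring."""

    def __init__(self,
                 init_scale=2**INITIAL_SCALE_POWER_DEFAULT,
                 scale_factor=2.0,
                 scale_window=1000,
                 min_scale=1):
        self.cur_scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window
        self.min_scale = min_scale
        self.good_steps = 0  # consecutive iterations without overflow

    def update_scale(self, overflow):
        if overflow:
            # Overflow: shrink the scale (never below min_scale) and restart the window.
            self.cur_scale = max(self.cur_scale / self.scale_factor, self.min_scale)
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps == self.scale_window:
                # A full window without overflow: grow the scale and restart the window.
                self.cur_scale *= self.scale_factor
                self.good_steps = 0
        return self.cur_scale


# One overflow followed by a full (shortened) window of clean iterations.
scaler = ToyDynamicLossScaler(scale_window=3)
history = [scaler.update_scale(overflow) for overflow in (True, False, False, False)]
```

Under these assumptions the scale starts at `2**16`, halves on the overflow, and returns to `2**16` once three clean iterations have passed.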