# Optimization

The `.optimization` module provides:

- an optimizer with fixed (decoupled) weight decay that can be used to fine-tune models,
- several schedules in the form of schedule objects that inherit from `_LRSchedule`, and
- a gradient accumulation class to accumulate the gradients of multiple batches.
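
The optimizer and a schedule are typically combined in a standard PyTorch training loop. A minimal sketch, using a toy `torch.nn.Linear` model and placeholder hyperparameters rather than a real Transformer fine-tuning setup:

```python
import torch
from transformers import AdamW, get_linear_schedule_with_warmup

# Toy model and data, standing in for a Transformer model and a real DataLoader.
model = torch.nn.Linear(10, 2)
inputs = torch.randn(32, 10)
labels = torch.randint(0, 2, (32,))

num_training_steps = 100
optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=10, num_training_steps=num_training_steps
)

for step in range(num_training_steps):
    loss = torch.nn.functional.cross_entropy(model(inputs), labels)
    loss.backward()
    optimizer.step()
    scheduler.step()        # advance the learning rate once per optimizer step
    optimizer.zero_grad()
```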

## AdamW (PyTorch)

[[autodoc]] AdamW
## AdaFactor (PyTorch)

[[autodoc]] Adafactor
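
One commonly used configuration is to turn off Adafactor's built-in relative-step and warmup-init heuristics and supply an explicit learning rate instead. A minimal sketch with a toy model and placeholder values:

```python
import torch
from transformers import Adafactor

model = torch.nn.Linear(10, 2)  # toy model for illustration

# Disable the internal relative-step / warmup-init heuristics and
# provide an explicit learning rate instead.
optimizer = Adafactor(
    model.parameters(),
    lr=1e-3,
    scale_parameter=False,
    relative_step=False,
    warmup_init=False,
    weight_decay=0.0,
)

loss = model(torch.randn(8, 10)).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```
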
## AdamWeightDecay (TensorFlow)

[[autodoc]] AdamWeightDecay

[[autodoc]] create_optimizer
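
`create_optimizer` bundles an `AdamWeightDecay` instance together with a warmup-then-decay learning rate schedule. A minimal sketch with placeholder step counts and a toy Keras model:

```python
import tensorflow as tf
from transformers import create_optimizer

# Placeholder training sizes; in practice these come from your dataset and epoch count.
num_train_steps = 1000
optimizer, lr_schedule = create_optimizer(
    init_lr=2e-5,
    num_train_steps=num_train_steps,
    num_warmup_steps=100,
    weight_decay_rate=0.01,
)

model = tf.keras.Sequential([tf.keras.layers.Dense(2)])  # toy model for illustration
model.compile(
    optimizer=optimizer,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
```
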
## Schedules

### Learning Rate Schedules (PyTorch)
[[autodoc]] SchedulerType
[[autodoc]] get_scheduler
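
`get_scheduler` is a unified entry point that looks up one of the schedule helpers below by name (or by `SchedulerType` member). A minimal sketch with a toy model and placeholder step counts:

```python
import torch
from transformers import get_scheduler

model = torch.nn.Linear(10, 2)  # toy model for illustration
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# "linear" selects get_linear_schedule_with_warmup; other names such as
# "cosine" or "constant_with_warmup" select the corresponding helpers below.
scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=100,
    num_training_steps=1000,
)
```
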
[[autodoc]] get_constant_schedule
[[autodoc]] get_constant_schedule_with_warmup

[[autodoc]] get_cosine_schedule_with_warmup

[[autodoc]] get_cosine_with_hard_restarts_schedule_with_warmup

[[autodoc]] get_linear_schedule_with_warmup

[[autodoc]] get_polynomial_decay_schedule_with_warmup
[[autodoc]] get_inverse_sqrt_schedule
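
Each of the helpers above takes an optimizer and returns a PyTorch learning-rate scheduler that should be stepped once per optimizer update. A minimal sketch with the cosine variant (toy model, placeholder step counts):

```python
import torch
from transformers import get_cosine_schedule_with_warmup

model = torch.nn.Linear(10, 2)  # toy model for illustration
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Linear warmup for 50 steps, then a cosine decay over the remaining steps.
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=50, num_training_steps=500
)

for step in range(500):
    # ... forward / backward pass would go here ...
    optimizer.step()
    scheduler.step()   # one scheduler step per optimizer step
```
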
### Warmup (TensorFlow)

[[autodoc]] WarmUp
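
`WarmUp` is a Keras learning-rate schedule that ramps the rate up (linearly by default) and then hands off to a wrapped decay schedule. A minimal sketch with placeholder values:

```python
import tensorflow as tf
from transformers import WarmUp

# Warm up to 2e-5 over the first 100 steps, then follow a polynomial decay
# (all values are placeholders).
decay = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=2e-5, decay_steps=900, end_learning_rate=0.0
)
lr_schedule = WarmUp(
    initial_learning_rate=2e-5,
    decay_schedule_fn=decay,
    warmup_steps=100,
)

optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
```
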
## Gradient Strategies

### GradientAccumulator (TensorFlow)

[[autodoc]] GradientAccumulator
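
A minimal sketch of gradient accumulation in a custom TensorFlow training loop; the toy model, random data, and accumulation window of 4 steps are placeholders:

```python
import tensorflow as tf
from transformers import GradientAccumulator, create_optimizer

model = tf.keras.Sequential([tf.keras.layers.Dense(2)])  # toy model for illustration
model.build(input_shape=(None, 10))
optimizer, _ = create_optimizer(init_lr=2e-5, num_train_steps=100, num_warmup_steps=10)

accumulator = GradientAccumulator()
accumulation_steps = 4

for step in range(100):
    x = tf.random.normal((8, 10))
    y = tf.random.uniform((8,), maxval=2, dtype=tf.int32)
    with tf.GradientTape() as tape:
        logits = model(x, training=True)
        loss = tf.reduce_mean(
            tf.keras.losses.sparse_categorical_crossentropy(y, logits, from_logits=True)
        )
    grads = tape.gradient(loss, model.trainable_variables)
    accumulator(grads)  # add this batch's gradients to the running sum
    if (step + 1) % accumulation_steps == 0:
        # Average the accumulated gradients, apply them, then reset the accumulator.
        grads = [g / accumulation_steps for g in accumulator.gradients]
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        accumulator.reset()
```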