The two experiments share the same configuration except for max-duration. | |
The md=1000 experiment has better pre-training performance. | |
Both experiments use fp16. |