The two experiments use the same configuration and differ only in max-duration.
The md=1000 (max-duration 1000) experiment has better pre-training performance.
Both experiments use fp16.