It's usually done by reading the whole sentence at once, with a mask that hides the future tokens at each timestep, so every position only sees the tokens up to and including itself.
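As a rough illustration, here is a minimal NumPy sketch of such a mask, assuming the model uses self-attention over a score matrix of shape `(seq_len, seq_len)`. The function names `causal_mask` and `masked_attention_weights` and the random scores are made up for this example and are not taken from any particular library.

```python
import numpy as np

def causal_mask(seq_len):
    # True where the row position i may look at the column position j (j <= i).
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def masked_attention_weights(scores):
    # scores: (seq_len, seq_len) raw attention scores (hypothetical input).
    mask = causal_mask(scores.shape[0])
    # Future positions get -inf so they receive zero weight after the softmax.
    masked = np.where(mask, scores, -np.inf)
    # Subtract the row maximum for numerical stability (the diagonal is always finite).
    masked = masked - masked.max(axis=-1, keepdims=True)
    exp = np.exp(masked)
    return exp / exp.sum(axis=-1, keepdims=True)

# Toy scores for a 4-token sentence; the values are random placeholders.
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 4))
weights = masked_attention_weights(scores)
print(np.round(weights, 3))
```

Each row of `weights` is the attention distribution for one timestep. Everything above the diagonal is zero, so the whole sentence can be processed in one pass while each position still behaves as if the later tokens had not been seen yet.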