Discriminative Unsupervised Alignment of Natural Language Instructions with Corresponding Video Segments