Fusion of Human Gaze and Machine Vision for Predicting Intended Locomotion Mode