Multi-Scale Masked Autoencoders for Cross-Session Emotion Recognition