Active Vision for Scene Understanding