Pretraining

Vision-LSTM -- xLSTM as Generic Vision Backbone

We introduce Vision-LSTM (ViL), an adaptation of the xLSTM building blocks to computer vision.

MIM-Refiner -- A contrastive learning boost from intermediate pre-trained representations

We introduce MIM-Refiner, a contrastive learning boost for pre-trained masked image modeling (MIM) models.