Scaling Transformer to 1M tokens and beyond with RMT
https://arxiv.org/abs/2304.11062