MAMBA: An SSM Model as an Alternative to Transformers

Authors

S. J. Lianet, D. López Ramos, A. B. Bestard Columbié, and D. E. Cruz Morete

DOI:

https://doi.org/10.29019/enfoqueute.1204

Keywords:

language models, hybrid architectures, mamba, state space models, scalability

Abstract

Mamba is a recent State Space Model (SSM) architecture designed to address the computational and scalability limitations of Transformer-based sequence models. In this review, we synthesize Mamba's core design (interleaved SSM and feedforward layers with hardware-aware memory management) and compare it to standard Transformers, highlighting its linear complexity and its ability to process extremely long contexts. We analyze published benchmarks showing that Mamba outperforms or matches open-source baselines (e.g., Pythia, RWKV) of similar size, and even twice its size, on zero-shot tasks; scales more efficiently on genomic sequences (processing >1 M tokens with only 74 K parameters); and supports variants such as Jamba (a Mixture-of-Experts extension), Falcon Mamba 7B, Mamba-2 (Structured SSM), and Mamba-4, which offer further speed or capacity gains. We discuss adaptations to vision (Vision Mamba, Vim) and to multimodal depression detection (DepMamba), as well as emerging hybrids (e.g., Bamba and IBM Granite) that fuse SSM efficiency with Transformer accuracy. Finally, we interpret these findings in the context of real-world constraints, namely compute cost, energy, and tooling maturity, outlining where Mamba excels, where hybrid models may be preferable, and which areas require further optimization. Our conclusions suggest that Mamba and its derivatives offer a viable path toward more sustainable and scalable sequence modeling.
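For intuition only, the sketch below illustrates the kind of selective state-space recurrence the abstract refers to: the input, output, and step-size parameters are computed from each token, while the recurrent state stays fixed in size, which is what keeps the cost linear in sequence length. All names, shapes, and the NumPy loop are our own illustrative assumptions, not the reference Mamba implementation.

```python
import numpy as np

def selective_ssm_scan(x, A, W_B, W_C, W_dt):
    """Minimal, illustrative selective SSM recurrence (not the Mamba kernel).

    x    : (T, D) input sequence (T steps, D channels)
    A    : (D, N) fixed negative state-transition parameters
    W_B  : (D, N) projection giving the input-dependent B_t
    W_C  : (D, N) projection giving the input-dependent C_t
    W_dt : (D, D) projection giving the input-dependent step size

    The state h has fixed size (D, N), so memory does not grow with T,
    and the loop below costs O(T): linear in sequence length.
    """
    T, D = x.shape
    h = np.zeros((D, A.shape[1]))             # recurrent state, fixed size
    y = np.zeros_like(x)
    for t in range(T):                        # linear-time scan
        dt = np.logaddexp(0.0, x[t] @ W_dt)   # softplus step size, shape (D,)
        A_bar = np.exp(dt[:, None] * A)       # discretized transition, (D, N)
        B_t = x[t] @ W_B                      # input-dependent input matrix, (N,)
        C_t = x[t] @ W_C                      # input-dependent output matrix, (N,)
        # state update: h_t = A_bar * h_{t-1} + (dt * B_t) * x_t
        h = A_bar * h + (dt[:, None] * B_t[None, :]) * x[t][:, None]
        y[t] = h @ C_t                        # read out one output step, (D,)
    return y

# Toy usage: a long sequence processed with constant-size state.
rng = np.random.default_rng(0)
T, D, N = 4096, 8, 16
x = rng.standard_normal((T, D))
A = -np.exp(rng.standard_normal((D, N)))      # negative A keeps the dynamics stable
y = selective_ssm_scan(x, A,
                       rng.standard_normal((D, N)) * 0.1,
                       rng.standard_normal((D, N)) * 0.1,
                       rng.standard_normal((D, D)) * 0.1)
print(y.shape)  # (4096, 8)
```

The production Mamba layer evaluates this same recurrence with a hardware-aware, fused parallel scan in GPU on-chip memory; the plain Python loop above only shows why the per-step state is constant and the overall cost scales linearly with context length.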




Published

2026-04-01

Issue

Vol. 17 No. 2 (2026)

Section

Review Articles

How to Cite

[1] S. J. Lianet, D. López Ramos, A. B. Bestard Columbié, and D. E. Cruz Morete, “MAMBA: An SSM Model as an Alternative to Transformers”, Enfoque UTE, vol. 17, no. 2, pp. 8–17, Apr. 2026, doi: 10.29019/enfoqueute.1204.