View on GitHub

GroupMamba

Parameter-Efficient and Accurate Group Visual State Space Model

Abdelrahman Shaker, Syed Talal Wasim, Salman Khan, Jürgen Gall, and Fahad Khan


                                                   Paper: , Code:


News

Introduction

We introduce GroupMamba, inspired by GroupConvolution, designed to enhance computational efficiency and interaction within state-space models. By employing a multi-directional scanning method, GroupMamba ensures comprehensive spatial coverage and effective modeling of both local and global information. We present a series of parameter-efficient generic classification models under the GroupMamba name, based on our proposed Modulated Group Mamba layer. Our tiny variant achieves 83.3% top-1 accuracy on ImageNet-1k with just 23M parameters. Furthermore, the base variant reaches a top-1 accuracy of 84.5% with 57M parameters, outperforming all recent state-space model methods.

Overview

main figure

Overview of the proposed method. Top Row: The overall architecture of our framework with a consistent hierarchical design comprising four stages. Bottom Row: We present (b) The design of the modulated group mamba layer. The input channels are divided into four groups with a single scanning direction for each VSSS block. This significantly reduces the computational complexity compared to the standard mamba layer, with similar performance. Channel Affinity Modulation mechanism is introduced to address the limited interactions within the VSSS blocks. (c) The design of VSSS block. It consists of Mamba block with 1D Selective Scanning block followed by FFN. (d) The four scanning directions used for the four VSSS blocks are illustrated.

Model Zoo

Model pretrain Image Res. #param. Top-1 Acc. Model
GroupMamba - Tiny ImageNet-1k 224x224 23M 83.3 Link
GroupMamba - Small ImageNet-1k 224x224 34M 83.9 Link
GroupMamba - Base ImageNet-1k 224x224 57M 84.5 Link

Comparison on ImageNet-1k

results

Comparison on Object Detection and Instance Segmentation

results

Comparison on Semantic Segmentation

results

Qualitative Results (Object Detection and Instance Segmentation)

results

Qualitative Results (Semantic Segmentation)

results

Citation

If you use our work, please consider citing:

@article{shaker2024GroupMamba,
  title={GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model},
  author={Abdelrahman Shaker and Syed Talal Wasim and Salman Khan and Gall Jürgen and Fahad Shahbaz Khan},
  journal={arXiv preprint arXiv:2407.13772},
  year={2024},
  url={https://arxiv.org/pdf/2407.13772}
}

Contact

Should you have any question, please create an issue on this repository or contact me at abdelrahman.youssief@mbzuai.ac.ae.