ZigzagPointMamba: Spatial-Semantic Mamba for Point Cloud Understanding

Linshuang Diao¹, Sensen Song¹†, Yurong Qian², Dayong Ren³†

¹Key Laboratory of Signal Detection and Processing, Xinjiang University
²Joint International Research Laboratory of Silk Road Multilingual Cognitive Computing, Xinjiang University
³Department of Computer Science and Technology, Nanjing University
†Corresponding authors | NeurIPS 2025

Abstract

State space models (SSMs) such as PointMamba provide efficient feature extraction for point cloud self-supervised learning with linear complexity, surpassing Transformers in computational efficiency. However, existing PointMamba-based methods rely on complex token ordering and random masking, which disrupt spatial continuity and local semantic correlations. We propose ZigzagPointMamba to address these challenges. The key to our approach is a simple zigzag scan path that globally sequences point cloud tokens, enhancing spatial continuity by keeping spatially adjacent point tokens close together in the sequence. Random masking, meanwhile, impairs local semantic modeling in self-supervised learning. To overcome this, we introduce a Semantic-Siamese Masking Strategy (SMS), which masks semantically similar tokens and facilitates reconstruction by integrating the local features of the original and similar tokens, thus overcoming dependence on isolated local features and enabling robust global semantic modeling. Our pre-trained ZigzagPointMamba weights significantly boost downstream tasks, achieving a 1.59% mIoU gain on ShapeNetPart for part segmentation, 0.4% higher accuracy on ModelNet40 for classification, and 0.19%, 1.22%, and 0.72% higher accuracy on the OBJ-BG, OBJ-ONLY, and PB-T50-RS subsets of ScanObjectNN, respectively.
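To make the idea of a zigzag scan path concrete, here is a minimal sketch of one way to order token centers so that consecutive tokens stay spatially close. It is an illustration under stated assumptions, not the paper's implementation: the function name `zigzag_order`, the `grid` parameter, and the choice of a serpentine (boustrophedon) traversal over a uniform voxel grid are all hypothetical.

```python
def zigzag_order(centers, grid=8):
    """Return indices that visit token centers along a serpentine grid path.

    Assumption: centers is a non-empty list of (x, y, z) tuples, and a
    uniform grid with `grid` cells per axis is a reasonable quantization.
    """
    dims = range(3)
    lo = [min(c[d] for c in centers) for d in dims]
    hi = [max(c[d] for c in centers) for d in dims]

    def cell(c, d):
        # Map a coordinate to an integer cell index in [0, grid - 1].
        span = hi[d] - lo[d]
        if span == 0:
            return 0
        return min(int((c[d] - lo[d]) / span * grid), grid - 1)

    def key(i):
        ix, iy, iz = (cell(centers[i], d) for d in dims)
        # Reverse the y direction on every other z-slice.
        y_eff = iy if iz % 2 == 0 else grid - 1 - iy
        row = iz * grid + y_eff
        # Reverse the x direction on every other row, counted globally,
        # so the path never jumps when it changes row or slice.
        x_eff = ix if row % 2 == 0 else grid - 1 - ix
        return row * grid + x_eff

    # sorted() is stable, so tokens in the same cell keep their input order.
    return sorted(range(len(centers)), key=key)


# Usage: on a full 3x3x3 lattice, each step of the path moves to a
# neighboring cell along exactly one axis.
demo_centers = [(x, y, z) for z in range(3) for y in range(3) for x in range(3)]
demo_path = [demo_centers[i] for i in zigzag_order(demo_centers, grid=3)]
```

A plain raster order would jump from the end of one row back to the start of the next; reversing direction on alternate rows and slices avoids those jumps, which is the property the abstract describes as preserving the proximity of adjacent tokens.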

Method Overview and Results

Comprehensive Results: (a) Performance comparison across datasets, (b) SMS vs. random masking reconstruction quality, (c) Feature representations before and after fine-tuning.

Pipeline Overview: ZigzagPointMamba pre-training with zigzag scan path and Semantic-Siamese Masking Strategy.

Zigzag Scan Path: 3D extension preserving spatial proximity while SMS masks semantically similar tokens.
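The contrast between random masking and masking semantically similar tokens can be sketched as follows. This is only one possible reading of "masks semantically similar tokens", not the paper's SMS: the helper names `cosine` and `similarity_mask`, the use of cosine similarity on token features, and the `mask_ratio` parameter are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def similarity_mask(features, mask_ratio):
    """Pick tokens to mask by how similar they are to some other token.

    Assumption: features is a list of at least two equal-length vectors.
    Returns a dict mapping each masked token index to the index of its
    most similar partner, whose features could assist reconstruction.
    """
    n = len(features)
    scored = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        partner = max(others, key=lambda j: cosine(features[i], features[j]))
        scored.append((cosine(features[i], features[partner]), i, partner))
    num_masked = int(n * mask_ratio)
    # Mask the tokens whose best match is strongest.
    chosen = sorted(scored, key=lambda t: -t[0])[:num_masked]
    return {i: partner for _, i, partner in chosen}


# Usage: tokens 0 and 1 point in nearly the same direction, so they are
# the pair selected when half of the four tokens are masked.
demo_features = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (-1.0, 0.0)]
demo_masked = similarity_mask(demo_features, mask_ratio=0.5)
```

Under random masking, the selected indices would be independent of the features; here the selection is driven by feature similarity, and each masked token is paired with a similar token rather than being reconstructed from isolated local context.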

Experimental Results: Classification on ModelNet40 and part segmentation on ShapeNetPart.

Few-shot Learning: Superior performance on ModelNet40 few-shot classification tasks.

ScanObjectNN Results: Consistent improvements across OBJ-BG, OBJ-ONLY, and PB-T50-RS subsets.

Additional Qualitative Results

Reconstruction Quality Analysis

**Figure:** Qualitative analysis of mask predictions from ZigzagPointMamba on the ShapeNet validation set. From left to right: input point cloud, masked version, reconstructed result, and additional object examples.

Part Segmentation Comparison

**Figure:** Qualitative comparison of part segmentation results between PointMamba and ZigzagPointMamba. Top: ground truth. Middle: PointMamba predictions. Bottom: ZigzagPointMamba predictions. Objects include laptop, lamp, guitar, airplane, and table.

BibTeX

@inproceedings{diao2025zigzagpointmamba,
  title={ZigzagPointMamba: Spatial-Semantic Mamba for Point Cloud Understanding},
  author={Diao, Linshuang and Song, Sensen and Qian, Yurong and Ren, Dayong},
  booktitle={Advances in Neural Information Processing Systems},
  year={2025}
}