Vishnu Dutt Sharma, Anukriti Singh, and Pratap Tokekar, “Pre-Trained Masked Image Model for Mobile Robot Navigation.” arXiv, October 2023 [Online]. Available at: http://arxiv.org/abs/2310.07021. [Accessed: October 15, 2023]
@misc{sharma_pre_2023,
  title = {Pre-{Trained} {Masked} {Image} {Model} for {Mobile} {Robot} {Navigation}},
  url = {http://arxiv.org/abs/2310.07021},
  arxivdoi = {10.48550/arXiv.2310.07021},
  urldate = {2023-10-15},
  publisher = {arXiv},
  author = {Sharma, Vishnu Dutt and Singh, Anukriti and Tokekar, Pratap},
  month = oct,
  year = {2023},
  note = {arXiv:2310.07021 [cs]},
  keywords = {Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics},
  website = {https://raaslab.org/projects/MIM4Robots},
  freepdf = {https://arxiv.org/pdf/2310.07021.pdf},
  tldr = {Robotic exploration and map building. Uses an off-the-shelf model for inpainting, MAE (Masked Autoencoder, He et al. 2022), and applies it to three contexts. For field-of-view expansion experiments, the larger the patches to be inpainted, the worse the performance. Tested with semantic and binary (occupancy) maps, synthetic data. No fine-tuning of MAE, and performance is better than classical techniques on single-agent and multiple-agent exploration. I liked the writing in this paper -- the hypothesis and themes are very clear throughout.}
}
2D top-down maps are commonly used for the navigation and exploration of mobile robots through unknown areas. Typically, the robot builds the navigation maps incrementally from local observations using onboard sensors. Recent works have shown that predicting the structural patterns in the environment through learning-based approaches can greatly enhance task efficiency. While many such works build task-specific networks using limited datasets, we show that the existing foundational vision networks can accomplish the same without any fine-tuning. Specifically, we use Masked Autoencoders, pre-trained on street images, to present novel applications for field-of-view expansion, single-agent topological exploration, and multi-agent exploration for indoor mapping, across different input modalities. Our work motivates the use of foundational vision models for generalized structure prediction-driven applications, especially in the dearth of training data. For more qualitative results see https://raaslab.org/projects/MIM4Robots.
tl;dr: Robotic exploration and map building. Uses an off-the-shelf model for inpainting, MAE (Masked Autoencoder, He et al. 2022), and applies it to three contexts. For field-of-view expansion experiments, the larger the patches to be inpainted, the worse the performance. Tested with semantic and binary (occupancy) maps, synthetic data. No fine-tuning of MAE, and performance is better than classical techniques on single-agent and multiple-agent exploration. I liked the writing in this paper – the hypothesis and themes are very clear throughout.
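To make the tl;dr concrete, here is a minimal numpy sketch of the patch-masking step the paper builds on: the map is split into fixed-size patches, unobserved patches are masked, and a pre-trained MAE (not shown; replaced here by a comment) would reconstruct them. All names, the patch size, and the toy map are illustrative assumptions, not the paper's code.

```python
import numpy as np

PATCH = 4  # patch side length in map cells (assumption for illustration)

def to_patches(grid):
    """Split an (H, W) map into (num_patches, PATCH*PATCH) flat patches."""
    h, w = grid.shape
    return (grid.reshape(h // PATCH, PATCH, w // PATCH, PATCH)
                .swapaxes(1, 2)
                .reshape(-1, PATCH * PATCH))

def mask_unobserved(patches, observed):
    """Blank out patches the robot has not observed yet (mask value = 0)."""
    out = patches.copy()
    out[~observed] = 0.0
    return out

# Toy 8x8 binary occupancy map: 1 = occupied, 0 = free.
grid = np.zeros((8, 8))
grid[:, 4:] = 1.0                                 # a wall on the right half
patches = to_patches(grid)                        # 4 patches of 16 cells each
observed = np.array([True, False, True, False])   # right column not yet seen
masked = mask_unobserved(patches, observed)

# A pre-trained MAE would inpaint the masked patches here; per the paper,
# reconstruction quality degrades as the masked region grows.
print(masked.shape)     # (4, 16)
print(masked[1].sum())  # 0.0 -- the masked patch is blanked
```

The observed patches pass through untouched, which mirrors the paper's setting: only the unknown parts of the map are predicted, with no fine-tuning of the model.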
Pre-Trained Masked Image Model for Mobile Robot Navigation.
Vishnu Dutt Sharma, Anukriti Singh, and Pratap Tokekar.
arXiv: https://doi.org/10.48550/arXiv.2310.07021
h/t Amy Tabb
pdf: https://arxiv.org/pdf/2310.07021.pdf
web: https://raaslab.org/projects/MIM4Robots