Today I read.

(Camera) Calibration Wizard; today’s reading. 19 Oct 2024

Calibration Wizard: A Guidance System for Camera Calibration Based on Modelling Geometric and Corner Uncertainty.

Songyou Peng and Peter Sturm.

tl;dr: Uses three freely acquired poses to initialize, then sets up an optimization problem for the next pose such that the expected uncertainty of the intrinsic camera parameters is minimized. The calibration problem is formulated in terms of geometric reprojection error, and its Jacobian matrices are computed. The data is extended with a hypothetical next pose; the next pose and the intrinsic parameters are parameterized within the Jacobian. Through some matrix transformations, the covariance matrix of the intrinsic parameters can be extracted from the Jacobian. Corner uncertainty is incorporated because poses that reduce uncertainty may be perpendicular to the image plane and thus unusable. Code is available but in Matlab.

@inproceedings{peng_calibration_2019,
  address = {Seoul, Korea (South)},
  title = {Calibration {Wizard}: {A} {Guidance} {System} for {Camera} {Calibration} {Based} on {Modelling} {Geometric} and {Corner} {Uncertainty}},
  copyright = {https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html},
  isbn = {978-1-72814-803-8},
  shorttitle = {Calibration {Wizard}},
  url = {https://ieeexplore.ieee.org/document/9009540/},
  doi = {10.1109/ICCV.2019.00158},
  language = {en},
  urldate = {2024-10-20},
  booktitle = {2019 {IEEE}/{CVF} {International} {Conference} on {Computer} {Vision} ({ICCV})},
  publisher = {IEEE},
  author = {Peng, Songyou and Sturm, Peter},
  month = oct,
  year = {2019},
  pages = {1497--1505},
  freepdf = {https://openaccess.thecvf.com/content_ICCV_2019/papers/Peng_Calibration_Wizard_A_Guidance_System_for_Camera_Calibration_Based_on_ICCV_2019_paper.pdf},
  tldr = {Uses three freely acquired poses to initialize, then sets up an optimization problem for the next pose such that the expected uncertainty of the intrinsic camera parameters is minimized. The calibration problem is formulated in terms of geometric reprojection error, and its Jacobian matrices are computed. The data is extended with a hypothetical next pose; the next pose and the intrinsic parameters are parameterized within the Jacobian. Through some matrix transformations, the covariance matrix of the intrinsic parameters can be extracted from the Jacobian. Corner uncertainty is incorporated because poses that reduce uncertainty may be perpendicular to the image plane and thus unusable. Code is available but in Matlab.},
  code = {https://github.com/pengsongyou/CalibrationWizard}
}

It is well known that the accuracy of a calibration depends strongly on the choice of camera poses from which images of a calibration object are acquired. We present a system – Calibration Wizard – that interactively guides a user towards taking optimal calibration images. For each new image to be taken, the system computes, from all previously acquired images, the pose that leads to the globally maximum reduction of expected uncertainty on intrinsic parameters and then guides the user towards that pose. We also show how to incorporate uncertainty in corner point position in a novel principled manner, for both calibration and computation of the next best pose. Synthetic and real-world experiments are performed to demonstrate the effectiveness of Calibration Wizard.
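
As a rough illustration of the covariance step described in the tl;dr above, here is a minimal NumPy sketch. It assumes the Jacobian columns are ordered with the intrinsics first and the pose parameters after them, uses random matrices in place of real reprojection Jacobians, and scores a candidate pose by the trace of the intrinsic covariance. The function names (`intrinsic_covariance`, `extend_with_pose`) and all sizes are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def intrinsic_covariance(J, k, sigma=1.0):
    """Covariance of the first k parameters (assumed: the intrinsics),
    marginalized over the remaining (pose) parameters."""
    H = J.T @ J                           # Gauss-Newton information matrix
    A, B, C = H[:k, :k], H[:k, k:], H[k:, k:]
    S = A - B @ np.linalg.solve(C, B.T)   # Schur complement of the pose block
    return sigma**2 * np.linalg.inv(S)

def extend_with_pose(J, k, Jk_next, Jp_next):
    """Append the rows of a hypothetical next pose: they depend on the
    intrinsics (Jk_next) and on six new pose parameters (Jp_next) only."""
    m, n = J.shape
    r = Jk_next.shape[0]
    top = np.hstack([J, np.zeros((m, Jp_next.shape[1]))])
    bottom = np.hstack([Jk_next, np.zeros((r, n - k)), Jp_next])
    return np.vstack([top, bottom])

rng = np.random.default_rng(0)
k = 4                                     # toy choice: fx, fy, cx, cy
J = rng.normal(size=(200, k + 3 * 6))     # three initial 6-DoF poses
cov_old = intrinsic_covariance(J, k)
full_block = np.linalg.inv(J.T @ J)[:k, :k]

# Score a few random stand-ins for candidate next poses by trace of covariance.
traces = []
for _ in range(3):
    Jk_next = rng.normal(size=(40, k))
    Jp_next = rng.normal(size=(40, 6))
    J_new = extend_with_pose(J, k, Jk_next, Jp_next)
    traces.append(np.trace(intrinsic_covariance(J_new, k)))
best = int(np.argmin(traces))
print("trace before:", np.trace(cov_old), "best candidate:", best, traces[best])
```

The Schur complement gives the same result as inverting the full information matrix and reading off the intrinsic block, but it makes explicit that the pose parameters are treated as nuisance variables. In the real system the candidate Jacobians would come from the projection model evaluated at a proposed pose rather than from random numbers.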

Camera-to-camera infrastructure-based calibration; today’s reading. 01 Apr 2024

Motion-Based Extrinsic Sensor-to-Sensor Calibration: Effect of Reference Frame Selection for New and Existing Methods.

Tuomas Välimäki, Bharath Garigipati, and Reza Ghabcheloo.

tl;dr: Uses the hand-eye robot calibration formulation AX=XB to calibrate sensor pairs in an infrastructure context. The paper explores different calibration methods as well as the choice of relative coordinate frame, since the transformations in the hand-eye calibration problem are relative. I had not seen any treatment in the literature of how to choose the relative transformations when using AX=XB. The answer: ‘it depends.’

@article{valimaki_motion_2023,
  author = {Välimäki, Tuomas and Garigipati, Bharath and Ghabcheloo, Reza},
  title = {Motion-Based Extrinsic Sensor-to-Sensor Calibration: Effect of Reference Frame Selection for New and Existing Methods},
  journal = {Sensors},
  volume = {23},
  year = {2023},
  number = {7},
  article-number = {3740},
  url = {https://www.mdpi.com/1424-8220/23/7/3740},
  pubmedid = {37050800},
  issn = {1424-8220},
  doi = {10.3390/s23073740},
  code = {https://github.com/tau-alma/trajectory_calibration_experiments},
  tldr = {Uses the hand-eye robot calibration formulation AX=XB to calibrate sensor pairs in an infrastructure context. The paper explores different calibration methods as well as the choice of relative coordinate frame, since the transformations in the hand-eye calibration problem are relative. I had not seen any treatment in the literature of how to choose the relative transformations when using AX=XB. The answer: 'it depends.'}
}

This paper studies the effect of reference frame selection in sensor-to-sensor extrinsic calibration when formulated as a motion-based hand–eye calibration problem. As the sensor trajectories typically contain some composition of noise, the aim is to determine which selection strategies work best under which noise conditions. Different reference selection options are tested under varying noise conditions in simulations, and the findings are validated with real data from the KITTI dataset. The study is conducted for four state-of-the-art methods, as well as two proposed cost functions for nonlinear optimization. One of the proposed cost functions incorporates outlier rejection to improve calibration performance and was shown to significantly improve performance in the presence of outliers, and either match or outperform the other algorithms in other noise conditions. However, the performance gain from reference frame selection was deemed larger than that from algorithm selection. In addition, we show that with realistic noise, the reference frame selection method commonly used in the literature, is inferior to other tested options, and that relative error metrics are not reliable for telling which method achieves best calibration performance.
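
To make the AX=XB setup and the reference frame choice concrete, here is a minimal NumPy sketch on noise-free synthetic data. It builds relative motions either with respect to the first pose or between consecutive poses, then recovers X with a generic closed-form approach (rotation by aligning rotation vectors, translation by linear least squares). This is not one of the methods evaluated in the paper; the helper names and the synthetic trajectory are assumptions made for illustration.

```python
import numpy as np

def rot_exp(w):
    """Rodrigues formula: rotation vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * K @ K

def rot_log(R):
    """Rotation matrix -> rotation vector (valid away from angle pi)."""
    theta = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    if theta < 1e-12:
        return np.zeros(3)
    v = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return theta * v / (2 * np.sin(theta))

def make_T(w, t):
    T = np.eye(4)
    T[:3, :3] = rot_exp(np.asarray(w, dtype=float))
    T[:3, 3] = t
    return T

def relative_motions(poses, mode):
    """mode='first': pose 0 -> pose k; mode='consecutive': pose k-1 -> pose k."""
    out = []
    for i in range(1, len(poses)):
        ref = poses[0] if mode == "first" else poses[i - 1]
        out.append(np.linalg.inv(ref) @ poses[i])
    return out

def solve_axxb(As, Bs):
    # Rotation: log(R_A) = R_X log(R_B), solved as an orthogonal Procrustes problem.
    alphas = np.array([rot_log(A[:3, :3]) for A in As])
    betas = np.array([rot_log(B[:3, :3]) for B in Bs])
    U, _, Vt = np.linalg.svd(betas.T @ alphas)
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])
    R = Vt.T @ D @ U.T
    # Translation: (R_A - I) t_X = R_X t_B - t_A, stacked over all motions.
    M = np.vstack([A[:3, :3] - np.eye(3) for A in As])
    d = np.concatenate([R @ B[:3, 3] - A[:3, 3] for A, B in zip(As, Bs)])
    t, *_ = np.linalg.lstsq(M, d, rcond=None)
    X = np.eye(4)
    X[:3, :3], X[:3, 3] = R, t
    return X

rng = np.random.default_rng(1)
X_true = make_T([0.2, -0.4, 0.3], [0.5, 0.1, -0.2])   # pose of sensor b in frame a
poses_a = [make_T(rng.uniform(-0.5, 0.5, 3), rng.uniform(-2, 2, 3)) for _ in range(10)]
W = make_T([0.1, 0.7, -0.3], [3.0, -1.0, 2.0])        # unknown offset between world frames
poses_b = [W @ Ta @ X_true for Ta in poses_a]

estimates = {}
for mode in ("first", "consecutive"):
    As = relative_motions(poses_a, mode)
    Bs = relative_motions(poses_b, mode)
    estimates[mode] = solve_axxb(As, Bs)
    print(mode, np.abs(estimates[mode] - X_true).max())
```

Without noise both reference choices recover the same X, which is why the choice is easy to overlook. The paper's question is how the choices behave once the trajectories are noisy, which this sketch does not attempt to reproduce.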

Tree roots and drought; today’s reading. 01 Apr 2024

How tree roots respond to drought.

Ivano Brunner, Claude Herzog, Melissa A. Dawes, Matthias Arend, and Christoph Sperisen.

tl;dr: A review article covering responses of tree roots to drought conditions in the forest context; discusses drought avoidance as well as drought tolerance. New to me was the discussion of how root turnover contributes to soil organic matter. Table 1 on page 5 lists root traits and how each trait is affected by drought; ‘growth’, ‘architectural’, and ‘morphological’ traits are likely to be of interest for root phenotyping.

@article{brunner_how_2016,
  author = {Brunner, Ivano and Herzog, Claude and Dawes, Melissa A. and Arend, Matthias and Sperisen, Christoph},
  title = {How tree roots respond to drought},
  journal = {Frontiers in Plant Science},
  volume = {6},
  year = {2015},
  url = {https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2015.00547},
  doi = {10.3389/fpls.2015.00547},
  issn = {1664-462X},
  tldr = {A review article covering responses of tree roots to drought conditions in the forest context; discusses drought avoidance as well as drought tolerance. New to me was the discussion of how root turnover contributes to soil organic matter. Table 1 on page 5 lists root traits and how each trait is affected by drought; 'growth', 'architectural', and 'morphological' traits are likely to be of interest for root phenotyping.}
}

The ongoing climate change is characterized by increased temperatures and altered precipitation patterns. In addition, there has been an increase in both the frequency and intensity of extreme climatic events such as drought. Episodes of drought induce a series of interconnected effects, all of which have the potential to alter the carbon balance of forest ecosystems profoundly at different scales of plant organization and ecosystem functioning. During recent years, considerable progress has been made in the understanding of how aboveground parts of trees respond to drought and how these responses affect carbon assimilation. In contrast, processes of belowground parts are relatively underrepresented in research on climate change. In this review, we describe current knowledge about responses of tree roots to drought. Tree roots are capable of responding to drought through a variety of strategies that enable them to avoid and tolerate stress. Responses include root biomass adjustments, anatomical alterations, and physiological acclimations. The molecular mechanisms underlying these responses are characterized to some extent, and involve stress signaling and the induction of numerous genes, leading to the activation of tolerance pathways. In addition, mycorrhizas seem to play important protective roles. The current knowledge compiled in this review supports the view that tree roots are well equipped to withstand drought situations and maintain morphological and physiological functions as long as possible. Further, the reviewed literature demonstrates the important role of tree roots in the functioning of forest ecosystems and highlights the need for more research in this emerging field.

Root reconstruction from images; today’s reading. 01 Apr 2024

Simultaneous Direct Depth Estimation and Synthesis Stereo for Single Image Plant Root Reconstruction.

Yawen Lu, Yuxing Wang, Devarth Parikh, Awais Khan, and Guoyu Lu.

tl;dr: Root reconstruction from one image of young apple tree roots. Two approaches to generating depth maps; first is to predict depth map from one image. The second is to generate another image from a single image, and then generate the depth map using a stereo technique. The results are combined to form the resulting point cloud.

@article{lu_simul_2021,
  author = {Lu, Yawen and Wang, Yuxing and Parikh, Devarth and Khan, Awais and Lu, Guoyu},
  journal = {IEEE Transactions on Image Processing},
  title = {Simultaneous Direct Depth Estimation and Synthesis Stereo for Single Image Plant Root Reconstruction},
  year = {2021},
  volume = {30},
  pages = {4883--4893},
  doi = {10.1109/TIP.2021.3069578},
  freepdf = {https://drive.google.com/file/d/1flz3VJ_Nix18wPcz6btxxJDo2B7TwLzs/view},
  tldr = {Root reconstruction from one image of young apple tree roots. Two approaches to generating depth maps; first is to predict depth map from one image. The second is to generate another image from a single image, and then generate the depth map using a stereo technique. The results are combined to form the resulting point cloud.},
  keywords = {Three-dimensional displays;Image reconstruction;Estimation;Solid modeling;Shape;Cameras;Periodic structures;Root reconstruction;cross-view synthesis;single image depth estimation}
}

Plant roots are the main conduit to its interaction with the physical and biological environment. A 3D root system architecture can provide fundamental and applied knowledge of a plant’s ability to thrive, but the construction of 3D structures for thin and complicated plant roots is challenging. Existing methods such as structure-from-motion and shape-from-silhouette require multiple images, as input, under a complicated optimization process, which is usually not convenient in fieldwork. Little effort has been put into investigating the applications of deep neural network methods to reconstruct thin objects, like plant root systems, from a single image. We propose an unsupervised learning scheme to estimate the root depth from only one image as input, which is further applied to reconstruct the complete root system. The boundaries of the reconstructed object usually contain large errors, which is a significant problem for roots with many thin branches. To reduce reconstruction errors, we integrate a cross-view GAN-based network into the reconstruction process, which predicts the root image from a different perspective. Based on the predicted view, we reconstruct the root system using stereo reconstruction, which helps to identify the accurately reconstructed points by enforcing their consistency. The results on both the real plant root dataset and the synthetic dataset demonstrate the effectiveness of the proposed algorithm compared with state-of-the-art single image 3D reconstruction models on plant roots.

Potato phenotyping; today’s reading. 26 Mar 2024

A scalable, low-cost phenotyping strategy to assess tuber size, shape, and the colorimetric features of tuber skin and flesh in potato breeding populations.

Max J. Feldman et al.

tl;dr: Measures the following traits from images: length and width, aspect ratio, eccentricity, and biomass profiles; uses a size marker in the images. Skin and flesh color are assessed in consumer-camera and flatbed-scanner images, using a color checker to perform color calibration. Deep learning is used to classify halved tubers as possessing the hollow heart defect. Lists tools to automate capture with Python and links to code. Population of 189 F1 progeny.

@article{feldman_scalable_2023,
  author = {Feldman, Max J. and Park, Jaebum and Miller, Nathan and Wakholi, Collins and Greene, Katelyn and Abbasi, Arash and Rippner, Devin A. and Navarre, Duroy and Schmitz Carley, Cari and Shannon, Laura M. and Novy, Rich},
  title = {A scalable, low-cost phenotyping strategy to assess tuber size, shape, and the colorimetric features of tuber skin and flesh in potato breeding populations},
  journal = {The Plant Phenome Journal},
  volume = {7},
  number = {1},
  pages = {e20099},
  doi = {10.1002/ppj2.20099},
  url = {https://acsess.onlinelibrary.wiley.com/doi/abs/10.1002/ppj2.20099},
  eprint = {https://acsess.onlinelibrary.wiley.com/doi/pdf/10.1002/ppj2.20099},
  year = {2024},
  tldr = {Measures the following traits from images: length and width, aspect ratio, eccentricity, and biomass profiles; uses a size marker in the images. Skin and flesh color are assessed in consumer-camera and flatbed-scanner images, using a color checker to perform color calibration. Deep learning is used to classify halved tubers as possessing the hollow heart defect. Lists tools to automate capture with Python and links to code. Population of 189 F1 progeny.}
}

Tuber size, shape, colorimetric characteristics, and defect susceptibility are all factors that influence the acceptance of new potato cultivars. Despite the importance of these characteristics, our understanding of their inheritance is substantially limited by our inability to precisely measure these features quantitatively on the scale needed to evaluate breeding populations. To alleviate this bottleneck, we developed a low-cost, semiautomated workflow to capture data and measure each of these characteristics using machine vision. This workflow was applied to assess the phenotypic variation present within 189 F1 progeny of the A08241 breeding population. Machine vision was applied to estimate linear and volumetric tuber size, assess tuber shape characteristics using aspect ratio and biomass profiles, and quantify tuber skin and flesh color; additionally, a deep learning model was developed to classify the presence of hollow-heart defect. Our results provide an example of quantitative measurements acquired using machine vision methods that are reliable, heritable, and capable of being used to understand and select multiple traits simultaneously in structured potato breeding populations.

Dynamic SLAM and factor graphs; today’s reading. 22 Jan 2024

The Importance of Coordinate Frames in Dynamic SLAM.

Jesse Morris, Yiduo Wang, and Viorela Ila.

tl;dr: A back-end for Dynamic SLAM. Discusses object- versus world-centric dynamic SLAM, advocates for world-centric formulation but evaluates both object- and world-centric versions in factor graph library GTSAM. “Model free” in that tracked points are used (another option would be object pose). Dynamic objects are assumed to be rigid. An example of where gauge choice leads to different formulations and results.

@misc{morris_importance_2023,
  title = {The {Importance} of {Coordinate} {Frames} in {Dynamic} {SLAM}},
  url = {http://arxiv.org/abs/2312.04031},
  arxivdoi = {10.48550/arXiv.2312.04031},
  freepdf = {https://arxiv.org/pdf/2312.04031.pdf},
  urldate = {2024-01-22},
  publisher = {arXiv},
  author = {Morris, Jesse and Wang, Yiduo and Ila, Viorela},
  month = dec,
  year = {2023},
  note = {arXiv:2312.04031 [cs]},
  keywords = {Computer Science - Robotics},
  annote = {Comment: 7 pages, 4 figures, submitted to ICRA 2024},
  tldr = {A back-end for Dynamic SLAM. Discusses object- versus world-centric dynamic SLAM, advocates for world-centric formulation but evaluates both object- and world-centric versions in factor graph library GTSAM. ``Model free'' in that tracked points are used (another option would be object pose). Dynamic objects are assumed to be rigid. An example of where gauge choice leads to different formulations and results.}
}

Most Simultaneous localisation and mapping (SLAM) systems have traditionally assumed a static world, which does not align with real-world scenarios. To enable robots to safely navigate and plan in dynamic environments, it is essential to employ representations capable of handling moving objects. Dynamic SLAM is an emerging field in SLAM research as it improves the overall system accuracy while providing additional estimation of object motions. State-of-the-art literature informs two main formulations for Dynamic SLAM, representing dynamic object points in either the world or object coordinate frame. While expressing object points in a local reference frame may seem intuitive, it may not necessarily lead to the most accurate and robust solutions. This paper conducts and presents a thorough analysis of various Dynamic SLAM formulations, identifying the best approach to address the problem. To this end, we introduce a front-end agnostic framework using GTSAM that can be used to evaluate various Dynamic SLAM formulations.

Comparison of YOLOv8 and Mask-RCNN in a fruit orchard setting; today’s reading. 14 Dec 2023

Comparing YOLOv8 and Mask RCNN for object segmentation in complex orchard environments.

Ranjan Sapkota, Dawood Ahmed, and Manoj Karkee.

tl;dr: Evaluation of YOLOv8 and Mask R-CNN on two datasets and for two different tasks. Datasets are color images of production apple trees; Dataset 1 is from the dormant season (leafless trees) and Dataset 2 is from the growing season with fruitlets. Tasks are single-class instance segmentation of fruitlets from Dataset 2, and multi-class instance segmentation of branches and tree trunks from Dataset 1. Total of about 1550 images, all manually annotated and split into train / val / test sets; models trained on this data. The references to other works using YOLO-N or Mask R-CNN in orchard environments are useful. Concludes that YOLOv8 works better in these environments than Mask R-CNN, with better precision and recall and lower inference times.

@techreport{sapkota_comparing_2023,
  type = {preprint},
  title = {Comparing {YOLOv8} and {Mask} {RCNN} for object segmentation in complex orchard environments},
  url = {https://arxiv.org/abs/2312.07935},
  language = {en},
  urldate = {2023-12-14},
  author = {Sapkota, Ranjan and Ahmed, Dawood and Karkee, Manoj},
  month = dec,
  year = {2023},
  arxivdoi = {10.48550/arXiv.2312.07935},
  freepdf = {https://arxiv.org/pdf/2312.07935.pdf},
  tldr = {Evaluation of YOLOv8 and Mask R-CNN on two datasets and for two different tasks. Datasets are color images of production apple trees; Dataset 1 is from the dormant season (leafless trees) and Dataset 2 is from the growing season with fruitlets. Tasks are single-class instance segmentation of fruitlets from Dataset 2, and multi-class instance segmentation of branches and tree trunks from Dataset 1. Total of about 1550 images, all manually annotated and split into train / val / test sets; models trained on this data. The references to other works using YOLO-N or Mask R-CNN in orchard environments are useful. Concludes that YOLOv8 works better in these environments than Mask R-CNN, with better precision and recall and lower inference times.}
}

Instance segmentation, an important image processing operation for automation in agriculture, is used to precisely delineate individual objects of interest within images, which provides foundational information for various automated or robotic tasks such as selective harvesting and precision pruning. This study compares the one-stage YOLOv8 and the two-stage Mask R-CNN machine learning models for instance segmentation under varying orchard conditions across two datasets. Dataset 1, collected in dormant season, includes images of dormant apple trees, which were used to train multi-object segmentation models delineating tree branches and trunks. Dataset 2, collected in the early growing season, includes images of apple tree canopies with green foliage and immature (green) apples (also called fruitlets), which were used to train single-object segmentation models delineating only immature green apples. The results showed that YOLOv8 performed better than Mask R-CNN, achieving good precision and near-perfect recall across both datasets at a confidence threshold of 0.5. Specifically, for Dataset 1, YOLOv8 achieved a precision of 0.90 and a recall of 0.95 for all classes. In comparison, Mask R-CNN demonstrated a precision of 0.81 and a recall of 0.81 for the same dataset. With Dataset 2, YOLOv8 achieved a precision of 0.93 and a recall of 0.97. Mask R-CNN, in this single-class scenario, achieved a precision of 0.85 and a recall of 0.88. Additionally, the inference times for YOLOv8 were 10.9 ms for multi-class segmentation (Dataset 1) and 7.8 ms for single-class segmentation (Dataset 2), compared to 15.6 ms and 12.8 ms achieved by Mask R-CNN, respectively. These findings show YOLOv8’s superior accuracy and efficiency in machine learning applications compared to two-stage models, specifically Mask R-CNN, which suggests its suitability in developing smart and automated orchard operations, particularly when real-time applications are necessary in such cases as robotic harvesting and robotic immature green fruit thinning.

Solver and certification of optimality in N-view triangulation; today’s reading. 12 Dec 2023

Certifiable Solver for Real-Time N-View Triangulation.

Mercedes Garcia-Salguero and Javier Gonzalez-Jimenez.

tl;dr: Formulates the L2-norm N-view triangulation problem as a QCQP (Quadratically Constrained Quadratic Program), where the constraints are pair-wise epipolar constraints. Solved iteratively using linear relaxations of the QCQP. Solutions are certified for optimality by checking for constraints’ satisfaction and positive semi-definiteness of a Hessian.

@article{garcia_salguero_certifiable_2023,
  title = {Certifiable {Solver} for {Real}-{Time} {N}-{View} {Triangulation}},
  volume = {8},
  issn = {2377-3766, 2377-3774},
  url = {https://ieeexplore.ieee.org/document/10044919/},
  doi = {10.1109/LRA.2023.3245408},
  language = {en},
  number = {4},
  urldate = {2023-12-12},
  journal = {IEEE Robotics and Automation Letters},
  author = {Garcia-Salguero, Mercedes and Gonzalez-Jimenez, Javier},
  month = apr,
  year = {2023},
  pages = {1999--2005},
  code = {https://github.com/mergarsal/FastNViewTriangulation},
  freepdf = {https://mapir.isa.uma.es/papersrepo/2023/2023_mercedes_RAL_Nview_triangulation_paper.pdf},
  tldr = {Formulates the L2-norm N-view triangulation problem as a QCQP (Quadratically Constrained Quadratic Program), where the constraints are pair-wise epipolar constraints. Solved iteratively using linear relaxations of the QCQP. Solutions are certified for optimality by checking for constraints' satisfaction and positive semi-definiteness of a Hessian.}
}

Cutting-edge field robotic systems, such as UAV or autonomous cars, demand fast and optimal solutions for any component at the core of their critical navigational tasks. Among them, we focus on the triangulation of image points from multiple views, which is a cornerstone for more complex tasks such as visual localization and SLAM. In this paper we present a fast and certifiable solver for the N-view triangulation problem that doesn’t require any specific optimization software package and can be implemented with any linear algebra library. The proposal relies on a series of linear convexifications which, in the limit, recovers the original problem, allowing us to solve problem instances with N = 10 views in 150 microseconds on a standard desktop computer. On real data our solver obtains and certifies the optimal solution in more than 99% of the problem instances. We make the code available at https://github.com/mergarsal.
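
For context on the problem this solver addresses, here is a minimal NumPy sketch of plain linear (DLT) N-view triangulation, together with the squared reprojection cost that an L2 formulation minimizes. This is the textbook baseline, not the paper's certifiable QCQP solver, and it comes with no optimality certificate. The camera setup and function names are assumptions made for the example.

```python
import numpy as np

def triangulate_dlt(Ps, xs):
    """Linear N-view triangulation.
    Ps: list of 3x4 projection matrices; xs: list of (u, v) observations."""
    rows = []
    for P, (u, v) in zip(Ps, xs):
        rows.append(u * P[2] - P[0])   # u * (p3 . X) - (p1 . X) = 0
        rows.append(v * P[2] - P[1])   # v * (p3 . X) - (p2 . X) = 0
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    Xh = Vt[-1]                        # right singular vector of the smallest singular value
    return Xh[:3] / Xh[3]

def reprojection_cost(Ps, xs, X):
    """Sum of squared pixel residuals over all views."""
    Xh = np.append(X, 1.0)
    cost = 0.0
    for P, x in zip(Ps, xs):
        p = P @ Xh
        cost += np.sum((p[:2] / p[2] - np.asarray(x)) ** 2)
    return cost

# Toy setup: N = 10 cameras translated along the x-axis, all looking down +z.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
Ps = [K @ np.hstack([np.eye(3), np.array([[-0.1 * i], [0.0], [0.0]])]) for i in range(10)]
X_true = np.array([0.3, -0.2, 5.0])
xs = []
for P in Ps:
    p = P @ np.append(X_true, 1.0)
    xs.append(p[:2] / p[2])

X_est = triangulate_dlt(Ps, xs)
print(X_est, reprojection_cost(Ps, xs, X_est))
```

With noisy observations the DLT estimate minimizes an algebraic error rather than the reprojection cost, so it is usually only a starting point. That gap is what motivates solvers that target the L2 cost directly and can certify the result.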

RANSAC filters outliers and selects well-conditioned minimal problems; today’s reading. 11 Dec 2023

Condition numbers in multiview geometry, instability in relative pose estimation, and RANSAC.

Hongyi Fan, Joe Kileel, and Benjamin Kimia.

tl;dr: Argues that the 5-point and 7-point algorithms (which compute the essential and fundamental matrices, respectively, from image correspondences) may be numerically unstable even in cases with no outliers. RANSAC then not only filters outliers, but also tends towards selecting data points such that condition numbers are well-behaved.

@misc{fan_condition_2023,
  title = {Condition numbers in multiview geometry, instability in relative pose estimation, and {RANSAC}},
  url = {http://arxiv.org/abs/2310.02719},
  arxivdoi = {10.48550/arXiv.2310.02719},
  urldate = {2023-10-27},
  publisher = {arXiv},
  author = {Fan, Hongyi and Kileel, Joe and Kimia, Benjamin},
  month = oct,
  year = {2023},
  note = {arXiv:2310.02719 [cs, math]},
  keywords = {Computer Science - Computer Vision and Pattern Recognition, Mathematics - Numerical Analysis},
  freepdf = {https://arxiv.org/pdf/2310.02719.pdf},
  tldr = {Argues that the 5-point and 7-point algorithms (which compute the essential and fundamental matrices, respectively, from image correspondences) may be numerically unstable even in cases with no outliers. RANSAC then not only filters outliers, but also tends towards selecting data points such that condition numbers are well-behaved.}
}

In this paper we introduce a general framework for analyzing the numerical conditioning of minimal problems in multiple view geometry, using tools from computational algebra and Riemannian geometry. Special motivation comes from the fact that relative pose estimation, based on standard 5-point or 7-point Random Sample Consensus (RANSAC) algorithms, can fail even when no outliers are present and there is enough data to support a hypothesis. We argue that these cases arise due to the intrinsic instability of the 5- and 7-point minimal problems. We apply our framework to characterize the instabilities, both in terms of the world scenes that lead to infinite condition number, and directly in terms of ill-conditioned image data. The approach produces computational tests for assessing the condition number before solving the minimal problem. Lastly synthetic and real data experiments suggest that RANSAC serves not only to remove outliers, but also to select for well-conditioned image data, as predicted by our theory.

Masked autoencoder for image recognition; today’s reading. 19 Oct 2023

Masked Autoencoders Are Scalable Vision Learners.

Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick.

tl;dr: Proposes a masked autoencoder (MAE) for pretraining a Vision Transformer (ViT) for the image recognition task. The masked autoencoder is trained for the reconstruction task, with an asymmetric design; encoder does not take masked patches as input, while the decoder does. For image recognition, the decoder is abandoned and the encoder fine-tuned. Best results: ViT-Huge model, experiments on ImageNet-1K. Ablations abound in the paper.

@inproceedings{he_masked_2022,
  author = {He, Kaiming and Chen, Xinlei and Xie, Saining and Li, Yanghao and Doll\'ar, Piotr and Girshick, Ross},
  title = {Masked Autoencoders Are Scalable Vision Learners},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = jun,
  year = {2022},
  pages = {16000--16009},
  arxivdoi = {10.48550/arXiv.2111.06377},
  tldr = {Proposes a masked autoencoder (MAE) for pretraining a Vision Transformer (ViT) for the image recognition task. The masked autoencoder is trained for the reconstruction task, with an asymmetric design; encoder does not take masked patches as input, while the decoder does. For image recognition, the decoder is abandoned and the encoder fine-tuned. Best results: ViT-Huge model, experiments on ImageNet-1K. Ablations abound in the paper.},
  supplemental = {https://openaccess.thecvf.com/content/CVPR2022/supplemental/He_Masked_Autoencoders_Are_CVPR_2022_supplemental.pdf},
  freepdf = {https://openaccess.thecvf.com/content/CVPR2022/papers/He_Masked_Autoencoders_Are_Scalable_Vision_Learners_CVPR_2022_paper.pdf},
  doi = {10.1109/CVPR52688.2022.01553},
  cvf = {https://openaccess.thecvf.com/content/CVPR2022/html/He_Masked_Autoencoders_Are_Scalable_Vision_Learners_CVPR_2022_paper.html},
  code = {https://paperswithcode.com/paper/masked-autoencoders-are-scalable-vision}
}

This paper shows that masked autoencoders (MAE) are scalable self-supervised learners for computer vision. Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels. It is based on two core designs. First, we develop an asymmetric encoder-decoder architecture, with an encoder that operates only on the visible subset of patches (without mask tokens), along with a lightweight decoder that reconstructs the original image from the latent representation and mask tokens. Second, we find that masking a high proportion of the input image, e.g., 75%, yields a nontrivial and meaningful self-supervisory task. Coupling these two designs enables us to train large models efficiently and effectively: we accelerate training (by 3× or more) and improve accuracy. Our scalable approach allows for learning high-capacity models that generalize well: e.g., a vanilla ViT-Huge model achieves the best accuracy (87.8%) among methods that use only ImageNet-1K data. Transfer performance in downstream tasks outperforms supervised pretraining and shows promising scaling behavior.
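
The masking step described above can be sketched in a few lines of NumPy. This is a minimal illustration that assumes 16×16 patches on a 224×224 RGB image and the 75% mask ratio mentioned in the abstract; the function name `random_patch_mask` is hypothetical and this is not the authors' implementation.

```python
import numpy as np

def random_patch_mask(image, patch=16, mask_ratio=0.75, rng=None):
    """Split an (H, W, C) image into non-overlapping patches and keep a
    random subset. Returns the visible patches (flattened), a boolean mask
    (True = hidden from the encoder), and the indices of the kept patches."""
    rng = np.random.default_rng() if rng is None else rng
    H, W, C = image.shape
    gh, gw = H // patch, W // patch
    # (gh, patch, gw, patch, C) -> (gh, gw, patch, patch, C) -> (gh*gw, patch*patch*C)
    patches = image[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch, C)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(gh * gw, -1)
    n_keep = int(round(gh * gw * (1 - mask_ratio)))
    keep_idx = np.sort(rng.permutation(gh * gw)[:n_keep])
    mask = np.ones(gh * gw, dtype=bool)
    mask[keep_idx] = False
    return patches[keep_idx], mask, keep_idx

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
visible, mask, keep_idx = random_patch_mask(image, rng=rng)
print(visible.shape, int(mask.sum()))
```

Only the `visible` patches would be fed to the encoder; the decoder later receives the encoded visible patches plus mask tokens at the positions where `mask` is true. Feeding the encoder a quarter of the patches is what makes pretraining cheap.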

Variational autoencoders: the beginning; today’s reading. 18 Oct 2023

Auto-Encoding Variational Bayes.

Diederik P. Kingma and Max Welling.

tl;dr: One of the first variational autoencoder papers.

@misc{kingma_auto_encoding_2013,
  title = {Auto-{Encoding} {Variational} {Bayes}},
  url = {https://arxiv.org/abs/1312.6114v11},
  language = {en},
  urldate = {2023-10-11},
  journal = {arXiv.org},
  author = {Kingma, Diederik P. and Welling, Max},
  month = dec,
  year = {2013},
  freepdf = {https://arxiv.org/pdf/1312.6114v11.pdf},
  code = {https://paperswithcode.com/paper/auto-encoding-variational-bayes},
  tldr = {One of the first variational autoencoder papers.}
}

How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contributions are two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.
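
The reparameterization mentioned in the abstract is easiest to see for a diagonal Gaussian posterior. The NumPy sketch below assumes that case and a standard normal prior. It draws samples as a deterministic function of the parameters plus independent noise, and evaluates the closed-form KL term of the lower bound. The function names and toy numbers are illustrative only.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, I). The randomness sits in eps,
    so z is a differentiable function of mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)

rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0])
log_var = np.log(np.array([0.25, 4.0]))

# Draw many samples to check that they follow N(mu, diag(exp(log_var))).
n = 200_000
z = reparameterize(np.tile(mu, (n, 1)), np.tile(log_var, (n, 1)), rng)
print("sample mean:", z.mean(axis=0), "sample std:", z.std(axis=0))
print("KL to prior:", kl_to_standard_normal(mu, log_var))
```

In a full model, `mu` and `log_var` would be outputs of the recognition network, and the KL term would be added to a reconstruction loss evaluated at the sampled `z`.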

Pretrained inpainting for robotic exploration; today’s reading. 16 Oct 2023

Pre-Trained Masked Image Model for Mobile Robot Navigation.

Vishnu Dutt Sharma, Anukriti Singh, and Pratap Tokekar.

tl;dr: Robotic exploration and map building. Uses an off-the-shelf model for inpainting, MAE (Masked Autoencoder, He et al. 2022), and applies it to three contexts. For field-of-view expansion experiments, the larger the patches to be inpainted, the worse the performance. Tested with semantic and binary (occupancy) maps, synthetic data. No fine-tuning of MAE, and performance is better than classical techniques on single-agent and multiple-agent exploration. I liked the writing in this paper – the hypothesis and themes are very clear throughout.

@misc{sharma_pre_2023,
  title = {Pre-{Trained} {Masked} {Image} {Model} for {Mobile} {Robot} {Navigation}},
  url = {http://arxiv.org/abs/2310.07021},
  arxivdoi = {10.48550/arXiv.2310.07021},
  urldate = {2023-10-15},
  publisher = {arXiv},
  author = {Sharma, Vishnu Dutt and Singh, Anukriti and Tokekar, Pratap},
  month = oct,
  year = {2023},
  note = {arXiv:2310.07021 [cs]},
  keywords = {Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics},
  website = {https://raaslab.org/projects/MIM4Robots},
  freepdf = {https://arxiv.org/pdf/2310.07021.pdf},
  tldr = {Robotic exploration and map building. Uses an off-the-shelf model for inpainting, MAE (Masked Autoencoder, He et al. 2022), and applies it to three contexts. For field-of-view expansion experiments, the larger the patches to be inpainted, the worse the performance. Tested with semantic and binary (occupancy) maps, synthetic data. No fine-tuning of MAE, and performance is better than classical techniques on single-agent and multiple-agent exploration. I liked the writing in this paper -- the hypothesis and themes are very clear throughout.}
}

2D top-down maps are commonly used for the navigation and exploration of mobile robots through unknown areas. Typically, the robot builds the navigation maps incrementally from local observations using onboard sensors. Recent works have shown that predicting the structural patterns in the environment through learning-based approaches can greatly enhance task efficiency. While many such works build task-specific networks using limited datasets, we show that the existing foundational vision networks can accomplish the same without any fine-tuning. Specifically, we use Masked Autoencoders, pre-trained on street images, to present novel applications for field-of-view expansion, single-agent topological exploration, and multi-agent exploration for indoor mapping, across different input modalities. Our work motivates the use of foundational vision models for generalized structure prediction-driven applications, especially in the dearth of training data. For more qualitative results see https://raaslab.org/projects/MIM4Robots.

3D reconstruction of seeds; today’s reading. 11 Oct 2023

Deep Learning Based 3d Reconstruction for Phenotyping of Wheat Seeds: a Dataset, Challenge, and Baseline Method.

Vsevolod Cherepashkin, Erenus Yildiz, Andreas Fischbach, Leif Kobbelt, and Hanno Scharr.

tl;dr: Three-dimensional reconstruction of wheat seeds for phenotyping. The most relevant trait is seed volume, because it is indicative of seed mass, which correlates with the nutrients available to a seedling. The dataset consists of image data from the phenoSeeder robotic system; different proportions of the data are used in different scenarios. Test data is held back and uses three views; the train / val sets and the challenge are at the website link. Baseline methods are VGG11 and ResNet-152; no code is published for the baselines.
```
@inproceedings{cherepashkin_deep_2023,
  author = {Cherepashkin, Vsevolod and Yildiz, Erenus and Fischbach, Andreas and Kobbelt, Leif and Scharr, Hanno},
  title = {Deep Learning Based 3d Reconstruction for Phenotyping of Wheat Seeds: a Dataset, Challenge, and Baseline Method},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
  month = oct,
  year = {2023},
  pages = {561--571},
  freepdf = {https://openaccess.thecvf.com/content/ICCV2023W/CVPPA/html/Cherepashkin_Deep_Learning_Based_3d_Reconstruction_for_Phenotyping_of_Wheat_Seeds_ICCVW_2023_paper.html},
  website = {https://helmholtz-data-challenges.de/web/challenges/challenge-page/135/overview},
  tldr = {Three-dimensional reconstruction of wheat seeds for phenotyping. The most relevant trait is seed volume, because it is indicative of seed mass, which correlates with the nutrients available to a seedling. The dataset consists of image data from the phenoSeeder robotic system; different proportions of the data are used in different scenarios. Test data is held back and uses three views; the train / val sets and the challenge are at the website link. Baseline methods are VGG11 and ResNet-152; no code is published for the baselines.}
}
```

Anomaly detection with VAEs; today’s reading. 09 Oct 2023

Variational autoencoder based anomaly detection using reconstruction probability.

Jinwon An and Sungzoon Cho.

tl;dr: In the context of anomaly detection with variational autoencoders, argues that reconstruction probability is a more objective measure than reconstruction error. Experiments with MNIST and the KDD Cup 1999 network intrusion dataset. The VAEs provide reconstructions as well as reconstruction probabilities.

@article{an_variational_2015,
  title = {Variational autoencoder based anomaly detection using reconstruction probability},
  author = {An, Jinwon and Cho, Sungzoon},
  journal = {Special lecture on IE},
  volume = {2},
  number = {1},
  pages = {1--18},
  year = {2015},
  tldr = {In the context of anomaly detection with variational autoencoders, argues that reconstruction probability is a more objective measure than reconstruction error. Experiments with MNIST and the KDD Cup 1999 network intrusion dataset. The VAEs provide reconstructions as well as reconstruction probabilities.},
  code = {https://paperswithcode.com/paper/variational-autoencoder-based-anomaly},
  freepdf = {http://dm.snu.ac.kr/static/docs/TR/SNUDM-TR-2015-03.pdf}
}

We propose an anomaly detection method using the reconstruction probability from the variational autoencoder. The reconstruction probability is a probabilistic measure that takes into account the variability of the distribution of variables. The reconstruction probability has a theoretical background making it a more principled and objective anomaly score than the reconstruction error, which is used by autoencoder and principal components based anomaly detection methods. Experimental results show that the proposed method outperforms autoencoder based and principal components based methods. Utilizing the generative characteristics of the variational autoencoder enables deriving the reconstruction of the data to analyze the underlying cause of the anomaly.
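
A rough sketch of how a reconstruction-probability score could be computed, assuming a trained VAE whose encoder and decoder each return a mean and a standard deviation. `toy_encode` and `toy_decode` are hypothetical stand-ins for such a model, and averaging log-probabilities (rather than probabilities) is a choice made here for numerical stability, so this is a variant and not necessarily the paper's exact procedure.

```python
import math
import random


def gaussian_log_pdf(x, mean, std):
    """Log density of independent Gaussians, summed over dimensions."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * s * s) - (xi - m) ** 2 / (2.0 * s * s)
        for xi, m, s in zip(x, mean, std)
    )


def reconstruction_log_prob(x, encode, decode, n_samples=32, seed=0):
    """Monte Carlo average of log p(x | z) with z drawn from the encoder."""
    rng = random.Random(seed)
    z_mean, z_std = encode(x)
    total = 0.0
    for _ in range(n_samples):
        z = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(z_mean, z_std)]
        x_mean, x_std = decode(z)
        total += gaussian_log_pdf(x, x_mean, x_std)
    return total / n_samples


# Hypothetical stand-ins for a trained VAE on 2-D data near the line x0 == x1.
def toy_encode(x):
    return [sum(x) / len(x)], [0.1]


def toy_decode(z):
    return [z[0], z[0]], [0.5, 0.5]


normal_score = reconstruction_log_prob([1.0, 1.1], toy_encode, toy_decode)
anomaly_score = reconstruction_log_prob([1.0, 6.0], toy_encode, toy_decode)
print(normal_score, anomaly_score)
```

A point would be flagged as anomalous when its score falls below a chosen threshold. Unlike a plain reconstruction error, the score accounts for the decoder's predicted variance in each dimension.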

Lidar dataset paper; today’s reading. 06 Oct 2023

TreeScope: An Agricultural Robotics Dataset for LiDAR-Based Mapping of Trees in Forests and Orchards.

Derek Cheng et al.

tl;dr: Dataset paper. Acquired LiDAR scans of forestry and large orchard trees (almond, pistachio) from under the canopy, using small UAVs or a mobile unit in a backpack or cart. Provides semantic segmentation labels of scans for tree stems, ground, and misc. Provides ground truth diameter at breast height (DBH) measurements. Baseline semantic segmentation methods are RangeNet++, SqueezeSegV2, and SqueezeSegV3. Baseline diameter estimation methods are DBCRE and SLOAM.

@misc{cheng_treescope_2023,
  title = {TreeScope: {An} {Agricultural} {Robotics} {Dataset} for {LiDAR}-{Based} {Mapping} of {Trees} in {Forests} and {Orchards}},
  shorttitle = {TreeScope},
  url = {http://arxiv.org/abs/2310.02162},
  arxivdoi = {10.48550/arXiv.2310.02162},
  urldate = {2023-10-06},
  publisher = {arXiv},
  author = {Cheng, Derek and Ojeda, Fernando Cladera and Prabhu, Ankit and Liu, Xu and Zhu, Alan and Green, Patrick Corey and Ehsani, Reza and Chaudhari, Pratik and Kumar, Vijay},
  month = oct,
  year = {2023},
  note = {arXiv:2310.02162 [cs]},
  keywords = {Computer Science - Robotics},
  annote = {Comment: Submitted to 2024 IEEE International Conference on Robotics and Automation (ICRA 2024) for review},
  freepdf = {https://arxiv.org/pdf/2310.02162.pdf},
  website = {https://treescope.org/},
  code = {https://github.com/KumarRobotics/treescope},
  tldr = {Dataset paper. Acquired LiDAR scans of forestry and large orchard trees (almond, pistachio) from under the canopy, using small UAVs or a mobile unit in a backpack or cart. Provides semantic segmentation labels of scans for tree stems, ground, and misc. Provides ground truth diameter at breast height (DBH) measurements. Baseline semantic segmentation methods are RangeNet++, SqueezeSegV2, and SqueezeSegV3. Baseline diameter estimation methods are DBCRE and SLOAM.}
}

Data collection for forestry, timber, and agriculture currently relies on manual techniques which are labor-intensive and time-consuming. We seek to demonstrate that robotics offers improvements over these techniques and accelerate agricultural research, beginning with semantic segmentation and diameter estimation of trees in forests and orchards. We present TreeScope v1.0, the first robotics dataset for precision agriculture and forestry addressing the counting and mapping of trees in forestry and orchards. TreeScope provides LiDAR data from agricultural environments collected with robotics platforms, such as UAV and mobile robot platforms carried by vehicles and human operators. In the first release of this dataset, we provide ground-truth data with over 1,800 manually annotated semantic labels for tree stems and field-measured tree diameters. We share benchmark scripts for these tasks that researchers may use to evaluate the accuracy of their algorithms. Finally, we run our open-source diameter estimation and off-the-shelf semantic segmentation algorithms and share our baseline results.

3D Modeling of grape vines; today’s reading. 04 Oct 2023

Modelling wine grapevines for autonomous robotic cane pruning.

Henry Williams et al.

tl;dr: Systems paper concerning the 3D modeling of grape vines, for cane pruning. Uses learned methods for panoptic segmentation and stereo inference. (Detectron 2 for panoptic segmentation, HSMnet for stereo inference.) Uses an over-the-row unit with two UR5 arms to acquire camera data.

@article{williams_modelling_2023,
  title = {Modelling wine grapevines for autonomous robotic cane pruning},
  author = {Williams, Henry and Smith, David and Shahabi, Jalil and Gee, Trevor and Nejati, Mahla and McGuinness, Ben and Black, Kale and Tobias, Jonathan and Jangali, Rahul and Lim, Hin and others},
  journal = {Biosystems Engineering},
  volume = {235},
  pages = {31--49},
  year = {2023},
  publisher = {Elsevier},
  doi = {10.1016/j.biosystemseng.2023.09.006},
  url = {https://www.sciencedirect.com/science/article/pii/S1537511023001897},
  freepdf = {https://doi.org/10.1016/j.biosystemseng.2023.09.006},
  tldr = {Systems paper concerning the 3D modeling of grape vines, for cane pruning. Uses learned methods for panoptic segmentation and stereo inference. (Detectron 2 for panoptic segmentation, HSMnet for stereo inference.) Uses an over-the-row unit with two UR5 arms to acquire camera data.}
}

Aotearoa (New Zealand) has a strong and growing winegrape industry struggling to access workers to complete skilled, seasonal tasks such as pruning. Maintaining high-producing vines requires training agricultural workers that can make quality cane pruning decisions, which can be difficult when workers are not readily available. A novel vision system for an autonomous cane pruning robot is presented that can assess a vine to make quality pruning decisions like an expert. The vision system is designed to generate an accurate digital 3D model of a vine with skeletonised cane structures to estimate key pruning metrics for each cane. The presented approach has been extensively evaluated in a real-world vineyard as a commercial platform would be expected to operate. The system is demonstrated to perform consistently at extracting dimensionally accurate digital models of the vines. Detailed evaluation of the digital models shows that 51.45% of the canes were modelled entirely, with a further 35.51% only missing a single internode connection. The quantified results demonstrate that the robotic platform can generate dimensionally accurate metrics of the canes for future decision-making and automation of pruning.

Non-plant roots; today’s reading. 01 Oct 2023

The Beauty of Roots.

John C. Baez, J. Daniel Christensen, and Sam Derbyshire.

tl;dr: A ’Short Stories’ paper, 3 pages. Considers Littlewood polynomials (each coefficient is +1 or -1) of degree n, and the patterns that arise from plotting the set of all roots for a particular degree. Note: Figures are plots of the complex plane, with the intensity proportional to the number of roots at that point. The plots resemble a unit circle with fractal patterns on the circle’s boundary. Subject area is outside of my regular reading; I enjoyed the article.
```
@article{baez_beauty_2023,
  title = {The {Beauty} of {Roots}},
  url = {https://www.ams.org/journals/notices/202309/noti2789/noti2789.html?adat=October%202023&trk=2789&galt=none&cat=feature&pdfissue=202309&pdffile=rnoti-p1495.pdf},
  doi = {10.1090/noti2789},
  journal = {Notices of the American Mathematical Society},
  author = {Baez, John C. and Christensen, J. Daniel and Derbyshire, Sam},
  month = oct,
  year = {2023},
  freepdf = {https://www.ams.org/journals/notices/202309/rnoti-p1495.pdf},
  website = {https://johncarlosbaez.wordpress.com/2011/12/11/the-beauty-of-roots/},
  tldr = {A 'Short Stories' paper, 3 pages. Considers Littlewood polynomials (each coefficient is +1 or -1) of degree n, and the patterns that arise from plotting the set of all roots for a particular degree. Note: Figures are plots of the complex plane, with the intensity proportional to the number of roots at that point. The plots resemble a unit circle with fractal patterns on the circle's boundary. Subject area is outside of my regular reading; I enjoyed the article.}
}
```
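
To get a feel for the point sets described in the tl;dr above, here is a small pure-Python sketch that enumerates Littlewood polynomials of a given degree and approximates their roots. The Durand-Kerner iteration is a choice made here for a dependency-free example (the article does not say how its figures were computed), and the starting guesses and iteration count are arbitrary. Polynomials with repeated roots converge slowly under this scheme, so treat the output as approximate.

```python
import itertools


def poly_eval(coeffs, z):
    """Horner evaluation; coeffs are ordered from highest degree to constant."""
    value = 0
    for c in coeffs:
        value = value * z + c
    return value


def durand_kerner(coeffs, iterations=200):
    """Approximate all complex roots of a polynomial (Durand-Kerner iteration)."""
    monic = [c / coeffs[0] for c in coeffs]
    degree = len(monic) - 1
    roots = [(0.4 + 0.9j) ** k for k in range(degree)]
    for _ in range(iterations):
        updated = []
        for i, r in enumerate(roots):
            denom = 1
            for j, other in enumerate(roots):
                if j != i:
                    denom *= r - other
            # Leave the estimate unchanged if two estimates have collided.
            updated.append(r - poly_eval(monic, r) / denom if denom != 0 else r)
        roots = updated
    return roots


def littlewood_roots(degree):
    """Roots of every polynomial of this degree with coefficients in {+1, -1}."""
    roots = []
    for coeffs in itertools.product((1, -1), repeat=degree + 1):
        roots.extend(durand_kerner(coeffs))
    return roots


roots = littlewood_roots(2)
print(len(roots))  # 16: 8 polynomials with 2 roots each
```

Binning the real and imaginary parts of these points into a 2-D histogram, for a much larger degree, gives the kind of intensity plot the article shows.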

Normalization methods in DNNs; today’s reading. 20 Sep 2023

Normalization Techniques in Training DNNs: Methodology, Analysis and Application.

Lei Huang, Jie Qin, Yi Zhou, Fan Zhu, Li Liu, and Ling Shao.

tl;dr: A review and commentary of normalization methods in DNNs. I skimmed this one. Good for definitions of all of the normalization terms and especially Figure 1.

@article{huang_normalization_2023,
  title = {Normalization {Techniques} in {Training} {DNNs}: {Methodology}, {Analysis} and {Application}},
  volume = {45},
  issn = {1939-3539},
  shorttitle = {Normalization {Techniques} in {Training} {DNNs}},
  doi = {10.1109/TPAMI.2023.3250241},
  number = {8},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  author = {Huang, Lei and Qin, Jie and Zhou, Yi and Zhu, Fan and Liu, Li and Shao, Ling},
  month = aug,
  year = {2023},
  keywords = {Batch normalization, Biological neural networks, Covariance matrices, Decorrelation, deep neural networks, image classification, Optimization, survey, Task analysis, Tensors, Training, weight normalization},
  pages = {10173--10196},
  arxiv = {arXiv:2009.12836 [cs, stat]},
  arxivdoi = {https://doi.org/10.48550/arXiv.2009.12836},
  freepdf = {https://arxiv.org/pdf/2009.12836.pdf},
  tldr = {A review and commentary of normalization methods in DNNs. I skimmed this one. Good for definitions of all of the normalization terms and especially Figure 1.}
}

Normalization techniques are essential for accelerating the training and improving the generalization of deep neural networks (DNNs), and have successfully been used in various applications. This paper reviews and comments on the past, present and future of normalization methods in the context of DNN training. We provide a unified picture of the main motivation behind different approaches from the perspective of optimization, and present a taxonomy for understanding the similarities and differences between them. Specifically, we decompose the pipeline of the most representative normalizing activation methods into three components: the normalization area partitioning, normalization operation and normalization representation recovery. In doing so, we provide insight for designing new normalization technique. Finally, we discuss the current progress in understanding normalization methods, and provide a comprehensive review of the applications of normalization for particular tasks, in which it can effectively solve the key issues.
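
The three components named in the abstract (area partitioning, normalization operation, representation recovery) can be illustrated with a toy example in plain Python. The scalar `gamma` and `beta` are a simplification made here; in practice these are learned, typically per feature or channel. Batch normalization and layer normalization are used only as two familiar examples of different partitionings.

```python
import math


def standardize(values, eps=1e-5):
    """Normalization operation: zero mean, unit variance over one group."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def batch_norm(batch, gamma=1.0, beta=0.0):
    """Partition by feature: each column is standardized across the batch."""
    columns = [standardize(col) for col in zip(*batch)]
    # Representation recovery: affine transform after standardizing.
    return [[gamma * v + beta for v in row] for row in zip(*columns)]


def layer_norm(batch, gamma=1.0, beta=0.0):
    """Partition by sample: each row is standardized across its features."""
    return [[gamma * v + beta for v in standardize(row)] for row in batch]


batch = [[1.0, 10.0, 100.0],
         [2.0, 20.0, 200.0],
         [3.0, 30.0, 300.0]]
bn = batch_norm(batch)
ln = layer_norm(batch)
print(bn[0])
print(ln[0])
```

The only difference between the two functions is which values are grouped together before `standardize` is applied, which is the "area partitioning" part of the taxonomy.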

Whitening transformations and orthogonality of random variables; today’s reading. 08 Sep 2023

Optimal Whitening and Decorrelation.

Agnan Kessy, Alex Lewin, and Korbinian Strimmer.

tl;dr: Covers ‘whitening’: linear transforms that map a random vector to a new random vector whose covariance is the identity matrix. Five types are discussed: zero-phase component analysis (ZCA) or Mahalanobis whitening, PCA whitening, Cholesky whitening, ZCA-cor, and PCA-cor. ZCA whitening is used in the paper ‘CamP: Camera Preconditioning for Neural Radiance Fields’, Park et al. 2023. ‘Whitening’ is equivalent to the term ‘sphering’.

@article{kessy_optimal_2018,
  title = {Optimal {Whitening} and {Decorrelation}},
  volume = {72},
  issn = {0003-1305},
  url = {https://doi.org/10.1080/00031305.2016.1277159},
  doi = {10.1080/00031305.2016.1277159},
  number = {4},
  urldate = {2023-09-08},
  journal = {The American Statistician},
  author = {Kessy, Agnan and Lewin, Alex and Strimmer, Korbinian},
  month = oct,
  year = {2018},
  publisher = {Taylor \& Francis},
  keywords = {CAR score, CAT score, Cholesky decomposition, Decorrelation, Principal components analysis, Whitening, ZCA-Mahalanobis transformation},
  pages = {309--314},
  arxiv = {arXiv:1512.00809 [stat]},
  arxivdoi = {https://doi.org/10.48550/arXiv.1512.00809},
  tldr = {Covers `whitening': linear transforms that map a random vector to a new random vector whose covariance is the identity matrix. Five types are discussed: zero-phase component analysis (ZCA) or Mahalanobis whitening, PCA whitening, Cholesky whitening, ZCA-cor, and PCA-cor. ZCA whitening is used in the paper `CamP: Camera Preconditioning for Neural Radiance Fields', Park et al. 2023. `Whitening' is equivalent to the term `sphering'.}
}

Whitening, or sphering, is a common preprocessing step in statistical analysis to transform random variables to orthogonality. However, due to rotational freedom there are infinitely many possible whitening procedures. Consequently, there is a diverse range of sphering methods in use, for example, based on principal component analysis (PCA), Cholesky matrix decomposition, and zero-phase component analysis (ZCA), among others. Here, we provide an overview of the underlying theory and discuss five natural whitening procedures. Subsequently, we demonstrate that investigating the cross-covariance and the cross-correlation matrix between sphered and original variables allows to break the rotational invariance and to identify optimal whitening transformations. As a result we recommend two particular approaches: ZCA-cor whitening to produce sphered variables that are maximally similar to the original variables, and PCA-cor whitening to obtain sphered variables that maximally compress the original variables.
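
As a concrete instance of ZCA (Mahalanobis) whitening, the whitening matrix is the inverse square root of the covariance matrix. The sketch below is restricted to 2-D data so that the matrix square root has a closed form and no linear algebra library is needed; the function names and the synthetic data are illustrative assumptions.

```python
import math
import random


def covariance_2d(points):
    """Return the mean and the 2x2 sample covariance (1/n normalization)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), ((sxx, sxy), (sxy, syy))


def zca_matrix_2d(cov):
    """W = cov^(-1/2) for a 2x2 symmetric positive-definite matrix."""
    (a, b), (_, d) = cov
    s = math.sqrt(a * d - b * b)      # sqrt(det)
    t = math.sqrt(a + d + 2.0 * s)    # sqrt(trace + 2 * sqrt(det))
    # Closed-form matrix square root: (cov + s * I) / t
    ra, rb, rd = (a + s) / t, b / t, (d + s) / t
    det_r = ra * rd - rb * rb
    # Invert the square root to get the whitening matrix.
    return ((rd / det_r, -rb / det_r), (-rb / det_r, ra / det_r))


def whiten(points):
    (mx, my), cov = covariance_2d(points)
    (w00, w01), (w10, w11) = zca_matrix_2d(cov)
    return [(w00 * (x - mx) + w01 * (y - my),
             w10 * (x - mx) + w11 * (y - my)) for x, y in points]


rng = random.Random(0)
points = []
for _ in range(500):
    x = rng.gauss(0.0, 2.0)
    points.append((x, 0.8 * x + rng.gauss(0.0, 0.5)))

whitened = whiten(points)
_, cov_after = covariance_2d(whitened)
print(cov_after)  # close to the identity matrix
```

Any rotation of `whitened` would also have identity covariance, which is the rotational freedom the abstract refers to; ZCA is the choice for which the whitening matrix is symmetric.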

Preconditioning in NeRF; today’s reading. 06 Sep 2023

CamP: Camera Preconditioning for Neural Radiance Fields.

Keunhong Park, Philipp Henzler, Ben Mildenhall, Jonathan T. Barron, and Ricardo Martin-Brualla.

tl;dr: NeRF joint optimization of camera parameters and scene reconstruction. Uses a left preconditioner for each camera’s parameters (zero-phase component analysis (ZCA) whitening transform (Kessy et al. 2018)), derived from a projection function; apply this at the initial iteration of the optimization. The new method is implemented on top of Zip-NeRF (Barron et al. 2023).

@article{park_camp_2023,
  author = {Park, Keunhong and Henzler, Philipp and Mildenhall, Ben and Barron, Jonathan T. and Martin-Brualla, Ricardo},
  title = {CamP: Camera Preconditioning for Neural Radiance Fields},
  journal = {ACM Trans. Graph.},
  publisher = {ACM},
  year = {2023},
  issue_date = {December 2023},
  freepdf = {https://arxiv.org/pdf/2308.10902},
  website = {https://camp-nerf.github.io/},
  arxiv = {arXiv:2308.10902 [cs.CV]},
  arxivdoi = {https://doi.org/10.48550/arXiv.2308.10902},
  tldr = {NeRF joint optimization of camera parameters and scene reconstruction. Uses a left preconditioner for each camera's parameters (zero-phase component analysis (ZCA) whitening transform (Kessy et al. 2018)), derived from a projection function; apply this at the initial iteration of the optimization. The new method is implemented on top of Zip-NeRF (Barron et al. 2023).}
}

Neural Radiance Fields (NeRF) can be optimized to obtain high-fidelity 3D scene reconstructions of objects and large-scale scenes. However, NeRFs require accurate camera parameters as input – inaccurate camera parameters result in blurry renderings. Extrinsic and intrinsic camera parameters are usually estimated using Structure-from-Motion (SfM) methods as a pre-processing step to NeRF, but these techniques rarely yield perfect estimates. Thus, prior works have proposed jointly optimizing camera parameters alongside a NeRF, but these methods are prone to local minima in challenging settings. In this work, we analyze how different camera parameterizations affect this joint optimization problem, and observe that standard parameterizations exhibit large differences in magnitude with respect to small perturbations, which can lead to an ill-conditioned optimization problem. We propose using a proxy problem to compute a whitening transform that eliminates the correlation between camera parameters and normalizes their effects, and we propose to use this transform as a preconditioner for the camera parameters during joint optimization. Our preconditioned camera optimization significantly improves reconstruction quality on scenes from the Mip-NeRF 360 dataset: we reduce error rates (RMSE) by 67% compared to state-of-the-art NeRF approaches that do not optimize for cameras like Zip-NeRF, and by 29% relative to state-of-the-art joint optimization approaches using the camera parameterization of SCNeRF. Our approach is easy to implement, does not significantly increase runtime, can be applied to a wide variety of camera parameterizations, and can straightforwardly be incorporated into other NeRF-like models.
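
A toy version of the idea in the abstract: build the Jacobian of a projection function with respect to the camera parameters, then whiten it so that every parameter direction has a comparable effect on the projected points. The two-parameter camera (focal length and an x shift), the sample points, and all names below are assumptions for illustration; the paper's actual proxy problem and notation may differ.

```python
import math


def projection_jacobian(points, focal, shift_x):
    """Rows of d(u)/d(focal, shift_x) for u = focal * (X + shift_x) / Z."""
    return [((x + shift_x) / z, focal / z) for x, z in points]


def gram(jac):
    """Entries (a, b, d) of the symmetric 2x2 matrix J^T J."""
    a = sum(r[0] * r[0] for r in jac)
    b = sum(r[0] * r[1] for r in jac)
    d = sum(r[1] * r[1] for r in jac)
    return a, b, d


def inverse_sqrt_2x2(a, b, d):
    """Entries (w00, w01, w11) of the inverse square root of [[a, b], [b, d]]."""
    s = math.sqrt(a * d - b * b)
    t = math.sqrt(a + d + 2.0 * s)
    ra, rb, rd = (a + s) / t, b / t, (d + s) / t
    det_r = ra * rd - rb * rb
    return rd / det_r, -rb / det_r, ra / det_r


# Assumed 3-D sample points given as (X, Z) pairs in front of the camera.
points = [(-1.0, 4.0), (0.5, 5.0), (2.0, 6.0), (1.0, 3.0)]
jac = projection_jacobian(points, focal=1000.0, shift_x=0.0)

w00, w01, w11 = inverse_sqrt_2x2(*gram(jac))
# Jacobian with respect to the whitened (preconditioned) parameters.
jac_pre = [(r[0] * w00 + r[1] * w01, r[0] * w01 + r[1] * w11) for r in jac]

print(gram(jac))      # diagonal entries differ by several orders of magnitude
print(gram(jac_pre))  # close to (1, 0, 1)
```

Before whitening, a unit change in the shift moves the projections far more than a unit change in the focal length, which is the ill-conditioning the abstract describes. After whitening, the two directions are decorrelated and equally scaled.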

Deep learning terminology and references in François Fleuret’s book; today’s reading. 01 Sep 2023

The Little Book of Deep Learning.

François Fleuret.

tl;dr: I really like this introduction to deep learning and reference guide. Want to remember a term without getting in too deep? This little book has it, and the top-level references if I want to read more. See the website to order a physical version; printing two book pages per printed page worked well for me too.
```
@book{fleuret_little_2023,
  title = {The {Little} {Book} of {Deep} {Learning}},
  language = {en},
  author = {Fleuret, François},
  isbn = {9781447678618},
  year = {2023},
  freepdf = {https://fleuret.org/public/lbdl.pdf},
  website = {https://fleuret.org/francois/lbdl.html},
  tldr = {I really like this introduction to deep learning and reference guide. Want to remember a term without getting in too deep? This little book has it, and the top-level references if I want to read more. See the website to order a physical version; printing two book pages per printed page worked well for me too.}
}
```

This is a short introduction to deep learning for readers with a STEM background, originally designed to be read on a phone screen.

Code optimization in Python; Today’s reading. 31 Aug 2023

Making Your Python Code Run Faster.

Brandon Rohrer.

tl;dr: Profiling, vectorization, pre-compilation with Numba, 10 optimization suggestions, "try it and test it", examples presented in context of a physics simulation. Good discussions about troubleshooting and debugging, when to visualize, determining project goals.
```
@incollection{rohrer_chapter6_2023,
  author = {Rohrer, Brandon},
  title = {Making Your Python Code Run Faster},
  chapter = {6},
  booktitle = {How to Train Your Robot},
  crossref = {rohrer_train_2023},
  year = {2023},
  urldate = {2023-08-30},
  url = {https://raw.githubusercontent.com/brohrer/how-to-train-your-robot/main/chapter_6/chapter_6.pdf},
  code = {https://github.com/brohrer/how-to-train-your-robot/tree/main/chapter_6},
  tldr = {Profiling, vectorization, pre-compilation with Numba, 10 optimization suggestions, "try it and test it", examples presented in context of a physics simulation. Good discussions about troubleshooting and debugging, when to visualize, determining project goals.},
  freepdf = {https://tyr.fyi/6},
  website = {https://tyr.fyi}
}
```
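
A small example of the "profile first" step from the tl;dr above, using only the standard library. The particle simulation is a made-up stand-in and is not taken from the chapter; `cProfile` and `pstats` are the standard Python profiling modules.

```python
import cProfile
import io
import pstats


def step_positions(positions, velocities, dt):
    """Naive explicit Euler step for a list of 1-D particles."""
    return [p + v * dt for p, v in zip(positions, velocities)]


def simulate(n_particles=1000, n_steps=200, dt=0.01):
    positions = [0.0] * n_particles
    velocities = [float(i) for i in range(n_particles)]
    for _ in range(n_steps):
        positions = step_positions(positions, velocities, dt)
    return positions


profiler = cProfile.Profile()
profiler.enable()
final_positions = simulate()
profiler.disable()

# Collect the five most expensive calls by cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The report shows where the time goes before any optimization is attempted; a loop like the one in `step_positions` is the kind of hotspot that vectorization or Numba would then target.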

Aerial sampling of insects with a UAS; Today’s reading. 30 Aug 2023

Potential of Unmanned Aerial Sampling for Monitoring Insect Populations in Rice Fields.

Hong Geun Kim, Jong-Seok Park, and Doo-Hyung Lee.

tl;dr: Need to monitor for seasonal insect migrations in rice fields. Uses a UAS with small nets to collect samples of insects at different altitudes. To my knowledge, the only work to collect insects with a UAS versus using already-tagged insects.

@article{kim_potential_2018,
  title = {Potential of {Unmanned} {Aerial} {Sampling} for {Monitoring} {Insect} {Populations} in {Rice} {Fields}},
  volume = {101},
  issn = {0015-4040, 1938-5102},
  url = {https://doi.org/10.1653/024.101.0229},
  doi = {10.1653/024.101.0229},
  number = {2},
  urldate = {2023-08-28},
  journal = {Florida Entomologist},
  author = {Kim, Hong Geun and Park, Jong-Seok and Lee, Doo-Hyung},
  month = jun,
  year = {2018},
  publisher = {Florida Entomological Society},
  pages = {330--334},
  tldr = {Need to monitor for seasonal insect migrations in rice fields. Uses a UAS with small nets to collect samples of insects at different altitudes. To my knowledge, the only work to collect insects with a UAS versus using already-tagged insects.}
}

Conventionally, sampling for insects has been limited to the ground level or low altitudes. Recent progress in unmanned aerial vehicles has made it more feasible to use this technique for aerial sampling of insect populations. In this study, we developed a rotary-wing unmanned aerial vehicle with remote-controlled insect net openings that allows serial sampling at designated altitudes. A total of 21 flights using the unmanned aerial vehicle system captured 251 insects in 6 orders and 22 families at 5, 10, 50, and 100 m above rice fields in South Korea. The results of this study demonstrate that the aerial sampling can collect diverse pest and beneficial insects above rice fields and demonstrate a promising alternative to conventional sampling methods.

VAEs; Today’s reading. 29 Aug 2023

Tutorial on Variational Autoencoders.

Carl Doersch.

tl;dr: Published 2016 with edits in 2021. Tutorial on Variational Autoencoders, reparameterization trick, and Conditional Variational Autoencoders. Examples using MNIST. Does not assume prior knowledge of variational Bayesian methods.

@misc{doersch_tutorial_2021,
  title = {Tutorial on {Variational} {Autoencoders}},
  url = {http://arxiv.org/abs/1606.05908},
  language = {en},
  urldate = {2023-08-28},
  publisher = {arXiv},
  author = {Doersch, Carl},
  month = jan,
  year = {2021},
  arxiv = {arXiv:1606.05908 [cs, stat]},
  arxivdoi = {https://doi.org/10.48550/arXiv.1606.05908},
  keywords = {Computer Science - Machine Learning, Statistics - Machine Learning},
  code = {https://github.com/cdoersch/vae_tutorial},
  tldr = {Published 2016 with edits in 2021. Tutorial on Variational Autoencoders, reparameterization trick, and Conditional Variational Autoencoders. Examples using MNIST. Does not assume prior knowledge of variational Bayesian methods.}
}

In just three years, Variational Autoencoders (VAEs) have emerged as one of the most popular approaches to unsupervised learning of complicated distributions. VAEs are appealing because they are built on top of standard function approximators (neural networks), and can be trained with stochastic gradient descent. VAEs have already shown promise in generating many kinds of complicated data, including handwritten digits [1, 2], faces [1, 3, 4], house numbers [5, 6], CIFAR images [6], physical models of scenes [4], segmentation [7], and predicting the future from static images [8]. This tutorial introduces the intuitions behind VAEs, explains the mathematics behind them, and describes some empirical behavior. No prior knowledge of variational Bayesian methods is assumed.

Amy Tabb
