Publications
This page lists my publications, patents, and projects.
2025
- FAV3R: Fast and Accurate 3D VR Sketch to 3D Shape Retrieval
  Mritunjoy Halder, Shivam Ashok Shukla, Lokender Tiwari, and 2 more authors
  2025
  With the increasing demand for 3D content in AR/VR applications, 3D shape libraries are expanding rapidly. As a result, the ability to retrieve 3D shapes from these libraries has become an essential component. It is well established that text-based queries are insufficient for accurately describing geometric shapes, and 2D sketches lack the necessary detail. In contrast, 3D sketches are more effective at conveying geometric properties. In this paper, we introduce a new learning-based approach for retrieving 3D shapes from 3D VR sketches. Since there is a lack of large-scale datasets containing paired 3D VR sketches and 3D shapes, our second contribution is a novel data generation technique that automatically converts 3D shapes into human-like 3D VR sketches. We conduct a comprehensive evaluation of our data generation and retrieval methods against state-of-the-art approaches, demonstrating that our method is both efficient and achieves superior retrieval accuracy.
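Learning-based sketch-to-shape retrieval of this kind typically embeds query sketches and gallery shapes into a shared space and ranks by similarity. The paper's actual architecture is not reproduced here; this is only a minimal sketch of the ranking step, assuming both modalities have already been encoded into fixed-length embeddings (the encoder itself is omitted and all names are illustrative):

```python
import numpy as np

def retrieve(sketch_emb, shape_embs, k=5):
    """Rank gallery 3D-shape embeddings by cosine similarity to a
    3D VR sketch embedding and return the indices of the top-k matches.

    sketch_emb: (d,) query embedding; shape_embs: (n, d) gallery embeddings.
    """
    q = sketch_emb / np.linalg.norm(sketch_emb)
    g = shape_embs / np.linalg.norm(shape_embs, axis=1, keepdims=True)
    sims = g @ q                    # cosine similarity per gallery shape
    return np.argsort(-sims)[:k]    # highest similarity first

# Toy check: a query that is a slightly perturbed copy of gallery item 42
# should rank item 42 first.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(100, 64))
query = gallery[42] + 0.01 * rng.normal(size=64)
print(retrieve(query, gallery, k=1))  # → [42]
```

In practice the embeddings would come from a trained cross-modal encoder; cosine ranking is only the final, model-agnostic step.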
- InGenCo: Integrated In-Place 3D Scenario Generation and Collaboration
  Raghav Mittal, Lokender Tiwari, Satyam Bhardwaj, and 2 more authors
  In 2025 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), 2025
  Human-robot collaboration is essential in AI-powered Industry 5.0 and personal spaces, aiming to boost human creativity and efficiency. Traditional methods involve creating virtual scenarios in VR to simulate interactions, but they often rely on pre-defined 3D assets or procedural generation, which are time-consuming, expensive, and limited by restricted design options and the need for 3D expertise. Many systems are also desktop-based and lack intuitive interfaces, limiting interactive scenario creation and in-place testing. This paper presents an integrated XR-based system that combines GenAI-based 3D scenario generation and human-robot collaboration in real time. It allows users to create and modify virtual environments on the fly using voice and gesture inputs, and to test collaborations directly in mixed reality, accelerating decision-making. The system, powered by Generative AI and XR, enables even non-experts to generate 3D assets, assemble them, configure robots, and perform collaborative tasks. It features an intuitive interface and multiple configuration options. Through experiments and ablations, the system's effectiveness is validated, showing its potential for interactive 3D scenario creation and in-place collaboration in XR, with applications in robotics, gaming, and other fields.
2024
- A transmission model based deep neural network for image dehazing
  Tannistha Pal, Mritunjoy Halder, and Sattwik Barua
  Multimedia Tools and Applications, 2024
  In recent years, poor visibility in adverse weather has become one of the main contributors to traffic fatalities. In order to lower accident rates, improving visibility is essential for driver assistance systems, image acquisition, and surveillance systems. This research therefore introduces a novel Deep Neural Network (DNN) based on a scattering model for defogging an image that has been affected by fog. The proposed model consists of two components: the fog dilution model and the diluted fog removal model. In addition, we have incorporated a deep learning-based transmission estimation module. The fog dilution model is designed to mitigate the presence of fog, while the diluted fog removal model aims to completely eliminate the remaining fog. To enhance the accuracy of transmission estimation, we employ a green channel prior-based approach, which effectively reduces the distortion caused in the sky in the resulting defogged image. By combining these components, our model offers a comprehensive solution for fog removal, with improved image quality and reduced sky artifacts. In addition to addressing the shortcomings of current vision enhancement techniques as mentioned in the paper, the proposed method also satisfies human perceptual needs. Experimental results demonstrate that the proposed model delivers superior visibility enhancement on both non-reference metrics (e = 0.50, = 0.15, r = 1.49) and full-reference metrics (MSE = 397.16, PSNR = 23.38, NCC = 0.99, MD = 24.39, NAE = 0.13) in a qualitative comparison with eight state-of-the-art defogging techniques. Furthermore, based on the average computational time achieved by the proposed method (0.17 s on the HSTS dataset), it is highly suitable for real-time applications. The suggested approach could therefore be employed as a viable route for improving vision in surveillance and acquisition systems while lowering the risk to users' safety.
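Transmission-model dehazing methods like this one build on the standard atmospheric scattering model, I(x) = J(x)·t(x) + A·(1 − t(x)), where I is the hazy image, J the scene radiance, t the transmission map, and A the atmospheric light. The paper's DNN estimates t; the final recovery step is the model inversion below. This is a generic sketch of that inversion, not the paper's full pipeline (the transmission estimate and green-channel prior are assumed given):

```python
import numpy as np

def dehaze(I, t, A, t0=0.1):
    """Invert the atmospheric scattering model I = J*t + A*(1 - t).

    I: hazy image (H, W, 3) in [0, 1]; t: transmission map (H, W);
    A: atmospheric light (3,). Clamping t at t0 avoids amplifying
    noise where transmission is near zero (dense haze).
    """
    t = np.clip(t, t0, 1.0)[..., None]   # broadcast t over colour channels
    J = (I - A) / t + A                  # recovered scene radiance
    return np.clip(J, 0.0, 1.0)

# Round trip: synthesising haze on a clean image and inverting recovers it.
J_true = np.full((4, 4, 3), 0.3)
A = np.array([0.9, 0.9, 0.9])
t = np.full((4, 4), 0.5)
I = J_true * t[..., None] + A * (1 - t[..., None])
print(np.allclose(dehaze(I, t, A), J_true))  # → True
```

The quality of any such method hinges on the accuracy of the estimated t, which is what the learned transmission module targets.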
- Anomalous activity detection for mobile surveillance robots
  Snehasis Banerjee, Balamuralidhar Purushothaman, and Mritunjoy Halder
  US Patent App. 18/473,595, May 2024
  The technical challenge in unusual human activity detection is to correctly identify only unexpected or unusual movements among the constant regular movements present in a scene, and most techniques are built on the understanding that the camera is static. However, the ego-view camera of a mobile surveillance robot is in motion as the robot navigates. Embodiments herein provide a method and system for anomalous activity detection for mobile surveillance robots by mimicking, in a neural network (NN) model, the 'Konio-Parvocellular-Magno' cells of the human brain, which are responsible for detecting slow, normal, and swift changes in perceived scenes. To detect anomalous activity, the static or normal movements of the scene captured by the ego-view camera are identified as redundant information, and only the region of interest (RoI) is forwarded for further processing, using optical flow and SSIM techniques. The NN model mimicking KPM is trained only on the RoI to detect normal or anomalous activity.
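The redundancy-filtering idea above (forward frames for analysis only when they differ meaningfully from what came before) can be illustrated with a similarity gate. The sketch below uses a simplified single-window SSIM in plain numpy; the patented system combines optical flow with SSIM and a learned model, so the threshold and the global (rather than windowed) SSIM here are illustrative assumptions only:

```python
import numpy as np

def ssim(x, y, c1=0.01**2, c2=0.03**2):
    """Simplified global SSIM between two grayscale frames in [0, 1]
    (computed over the whole frame as one window, not sliding windows)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

def roi_gate(prev, curr, thresh=0.95):
    """Treat a frame as redundant (return None) when it is nearly identical
    to the previous one; otherwise forward it for anomaly analysis."""
    return None if ssim(prev, curr) >= thresh else curr

rng = np.random.default_rng(1)
frame = rng.random((32, 32))
moved = np.roll(frame, 8, axis=1)              # large apparent motion
print(roi_gate(frame, frame) is None)          # → True  (static: skipped)
print(roi_gate(frame, moved) is not None)      # → True  (change: forwarded)
```

In the patented setup this gating runs before the KPM-inspired NN, so the model only ever sees the informative regions of the stream.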
2023
- Anomalous activity detection from ego view camera of surveillance robots
  Mritunjoy Halder, Snehasis Banerjee, and Balamuralidhar Purushothaman
  In 2023 International Joint Conference on Neural Networks (IJCNN), May 2023
  Can a surveillance robot autonomously detect anomalous activity from its ego-view camera perception? This is a challenging task, as it requires identifying what is a normal and what is an abnormal pattern, given the variations of possible anomalies and abnormalities. This paper presents an architecture and method based on a spatio-temporal convolutional neural network to detect and classify anomalies. This work is inspired by the 'Konio-Magno-Parvocellular' cells of the human brain, which are claimed to aid humans in organizing changes in perceived scenes. The model is trained and tested on a benchmark video dataset of human activity, on which we obtain 91% testing accuracy. Experiments in simulation as well as deployment on a real robot show that the proposed methodology can identify anomalous activities effectively. We also list the observations from practical deployment of the model.
- Multi-feature based hazy image classification for vision enhancement
  Tannistha Pal, Mritunjoy Halder, and Sattwik Barua
  Procedia Computer Science, May 2023
  In hazy weather conditions, an image suffers varying degrees of deterioration: reduced contrast, low glow, restricted dynamic range, poor resolution of details, dull natural landscape colours, and reduced saturation. Dehazing haze-degraded images becomes challenging if they are not first classified as hazy or clear, given that image dehazing techniques can only be applied to hazy images. The ability to differentiate between hazy and clear images cannot be left to human perception; hence a robust model is needed that classifies the input image as hazy or clear. Thus, we propose an image classification framework based on nine unique features and K-Nearest Neighbour (KNN), which can accurately classify hazy and clear images. Experimental results demonstrate that the proposed method efficiently classifies hazy and clear images, with an accuracy of 92%, a precision of 0.90, a recall of 0.96, and an F1 score of 0.93 on a benchmark dataset, which has both theoretical and practical implications.
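The classification step above is standard KNN over a hand-crafted feature vector. The nine haze features themselves are the paper's contribution and are not reproduced here; the sketch below shows only the generic KNN voting step, on a synthetic two-feature stand-in (cluster locations and labels are invented for illustration):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify feature vector x by majority vote among its k nearest
    training samples under Euclidean distance (standard KNN)."""
    d = np.linalg.norm(X_train - x, axis=1)     # distance to every sample
    votes = y_train[np.argsort(d)[:k]]          # labels of the k nearest
    return np.bincount(votes).argmax()          # majority class

# Synthetic stand-in for the nine haze features: hazy images cluster at a
# high "haze score", clear images at a low one (labels: 1 = hazy, 0 = clear).
rng = np.random.default_rng(0)
hazy = rng.normal([0.8, 0.2], 0.05, size=(20, 2))
clear = rng.normal([0.2, 0.8], 0.05, size=(20, 2))
X = np.vstack([hazy, clear])
y = np.array([1] * 20 + [0] * 20)
print(knn_predict(X, y, np.array([0.75, 0.25])))  # → 1 (hazy)
```

With real haze features (e.g. contrast and saturation statistics), the same voting rule applies unchanged; only the feature extraction differs.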
- A deep learning model to detect foggy images for vision enhancement
  Tannistha Pal, Mritunjoy Halder, and Sattwik Barua
  The Imaging Science Journal, May 2023
  Fog limits meteorological visibility, posing a significant danger to road safety, and poor visibility is considered a significant contributor to road accidents in foggy weather conditions. However, image defogging techniques can only be applied to foggy images, and in a real-time system defogging becomes difficult if images are not first identified as foggy or clear. Because we cannot rely on human vision to distinguish between foggy and clear pictures, we need a robust model that classifies the input image as foggy or clear based on learned features. This paper proposes a robust Deep Learning (DL) model based on a Convolutional Neural Network (CNN) for classifying the input as foggy or clear. The proposed Deep Neural Network (DNN) architecture is efficient and precise enough to classify images as foggy or clear, with a training time complexity of O(n²) and a prediction time complexity of O(n). The experimental results are promising in both qualitative and quantitative assessments. The model achieves an accuracy of 94.8%, a precision of 91.8%, a recall of 75.8%, and an F1 score of 80.3% when evaluated on the SOTS dataset, indicating that it might be utilized to mitigate the safety risk in vision enhancement systems.
- Dehazing and vision enhancement: challenges and future scope
  Sattwik Barua, Tannistha Pal, and Mritunjoy Halder
  In IET Intelligent Multimedia Processing and Computer Vision, May 2023
  Due to poor visibility conditions, the visibility of outdoor images is drastically decreased. Applications using computer vision, including surveillance systems and intelligent transportation systems, are not able to function properly under limited visibility. Numerous image dehazing methods have been introduced as a solution to this problem, and they are crucial in enhancing the functionality of several computer vision systems; dehazing approaches are consequently intriguing to researchers. In order to demonstrate that dehazing techniques can be successfully used in actual practice, this study conducts an extensive examination of state-of-the-art dehazing approaches and motivates scholars to apply some of these methods for removing haze from hazy images. In this chapter, we discuss several robust mathematical models along with some neural network-based approaches and their implementations in various aspects. Finally, we address several concerns about difficulties and potential future applications of dehazing approaches.
- Detecting Emotional Sentiment in Cartoons
  Revelation23, May 2023
  Social media platforms are widely used by individuals and organizations to express emotions, opinions, and ideas. These platforms generate vast amounts of data, which can be analyzed to gain insights into user behavior, preferences, and sentiment. Accurately classifying the sentiment of social media posts can provide valuable insights for businesses, individuals, and organizations to make informed decisions. To accomplish this task, a customized private cartoon dataset (original images) of social media posts has been provided, which contains labels for each post's emotion category, such as happy, angry, sad, or neutral. The task is to build and fine-tune a machine-learning model that accurately classifies social media posts into their corresponding emotion categories, using synthetic images.
2022
- A Framework for Sex Identification, Accent and Emotion Recognition from Speech Samples
  Sattwik Barua, Mritunjoy Halder, and Mohit Kumar
  In 2022 13th International Conference on Computing Communication and Networking Technologies (ICCCNT), May 2022
  The capacity to understand and interpret the world lies in the cognitive system of a living body. The difficulties that humans face in acknowledging cognitive senses and abilities stem from the singularity of our views on human cognition; if animals have feelings, there might be a case to be made that robots do as well. Non-expert users interacting with robots may misinterpret the robot's functionality. If someone is present in front of a robot and the robot can determine their sex, English accent, and emotion simply from their speech, it helps in the development of an embedded cognitive model. Though they have limitations, studies in neuroscience have shown that animal models have significantly contributed to the understanding of the mechanistic and functional aspects of cognitive activities such as speaking. As a result, the usage of robots in a range of industries requires the capacity to produce understandable and expressive speech. Thus, we present three models for identifying sex, accent, and emotion. To attain better outcomes, we propose a Deep Neural Network (DNN) and adjust it as needed. The model employs an attention layer, bidirectional Long Short-Term Memory (LSTM), and Dropout, and is trained on a very large set of datapoints. The experimental results show strong performance, with accuracies of 94.62% and 97.37% for sex and accent identification respectively, and 99.84% for emotion recognition, which suggests that our study might be applied in the future as a noteworthy solution for the aforementioned problems.