Electrical Engineering and Systems Science
- [1] arXiv:2405.14875 [pdf, ps, other]
Title: BloodCell-Net: A lightweight convolutional neural network for the classification of all microscopic blood cell images of the human body
Authors: Sohag Kumar Mondal, Md. Simul Hasan Talukder, Mohammad Aljaidi, Rejwan Bin Sulaiman, Md Mohiuddin Sarker Tushar, Amjad A Alsuwaylimi
Comments: 24 pages, 7 tables and 13 figures
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Blood cell classification and counting are vital for the diagnosis of various blood-related diseases, such as anemia, leukemia, and thrombocytopenia. The manual process of blood cell classification and counting is time-consuming, prone to errors, and labor-intensive. Therefore, we have proposed a deep learning (DL) based automated system for blood cell classification and counting from microscopic blood smear images. We classify a total of nine types of blood cells: Erythrocyte, Erythroblast, Neutrophil, Basophil, Eosinophil, Lymphocyte, Monocyte, Immature Granulocytes, and Platelet. Several preprocessing steps, such as image resizing, rescaling, contrast enhancement, and augmentation, are applied. To segment the blood cells from the full microscopic images, we employed the U-Net model. This segmentation step extracts the region of interest (ROI) by removing complex and noisy background elements. Both pixel-level metrics, such as accuracy, precision, and sensitivity, and object-level metrics, such as Intersection over Union (IoU) and the Dice coefficient, are used to comprehensively evaluate the performance of the U-Net model. The segmentation model achieved 98.23% accuracy, 98.40% precision, 98.25% sensitivity, 95.97% IoU, and a 97.92% Dice coefficient. Subsequently, a watershed algorithm is applied to the segmented images to separate overlapping blood cells and extract individual cells. We propose the BloodCell-Net approach, which incorporates a custom lightweight convolutional neural network (LWCNN), for classifying individual blood cells into the nine types. The classifier's performance is evaluated using accuracy, precision, recall, and F1 score, achieving averages of 97.10%, 97.19%, 97.01%, and 97.10%, respectively.
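The watershed step for splitting touching cells after semantic segmentation can be sketched as follows. This is a generic distance-transform watershed with `scipy.ndimage.watershed_ift` on synthetic overlapping disks, not the authors' implementation; the threshold for peak seeding is an assumption.

```python
import numpy as np
from scipy import ndimage

def separate_overlapping_cells(mask):
    """Split touching cells in a binary mask using a distance-transform
    watershed: seed one marker per cell at distance-transform peaks, then
    flood a cost image that is cheapest at cell centers."""
    dist = ndimage.distance_transform_edt(mask)
    peaks = dist > 0.8 * dist.max()              # assumed seeding threshold
    markers, n_cells = ndimage.label(peaks)
    markers = markers.astype(np.int16)
    markers[~mask] = n_cells + 1                 # background seed
    cost = (dist.max() - dist).astype(np.uint8)  # low cost at cell centers
    labels = ndimage.watershed_ift(cost, markers)
    labels[~mask] = 0
    return labels, n_cells

# Two overlapping disks standing in for touching blood cells.
yy, xx = np.mgrid[:50, :60]
mask = ((yy - 25) ** 2 + (xx - 20) ** 2 <= 144) | \
       ((yy - 25) ** 2 + (xx - 40) ** 2 <= 144)
labels, n_cells = separate_overlapping_cells(mask)
```

Each connected peak region becomes one cell label, so the two overlapping disks come back with distinct labels even though they form a single connected component in the mask.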
- [2] arXiv:2405.14878 [pdf, ps, other]
Title: Improving and Evaluating Machine Learning Methods for Forensic Shoeprint Matching
Authors: Divij Jain, Saatvik Kher, Lena Liang, Yufeng Wu, Ashley Zheng, Xizhen Cai, Anna Plantinga, Elizabeth Upton
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Applications (stat.AP)
We propose a machine learning pipeline for forensic shoeprint pattern matching that improves on the accuracy and generalisability of existing methods. We extract 2D coordinates from shoeprint scans using edge detection and align the two shoeprints with iterative closest point (ICP). We then extract similarity metrics to quantify how well the two prints match and use these metrics to train a random forest that generates a probabilistic measurement of how likely two prints are to have originated from the same outsole. We assess the generalisability of machine learning methods trained on lab shoeprint scans to more realistic crime scene shoeprint data by evaluating the accuracy of our methods on several shoeprint scenarios: partial prints, prints with varying levels of blurriness, prints with different amounts of wear, and prints from different shoe models. We find that models trained on one type of shoeprint yield extremely high levels of accuracy when tested on shoeprint pairs of the same scenario but fail to generalise to other scenarios. We also discover that models trained on a variety of scenarios predict almost as accurately as models trained on specific scenarios.
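The alignment step can be illustrated with a minimal 2-D ICP loop: alternate nearest-neighbour matching with a Kabsch rigid-transform fit. The point sets, iteration count, and perturbation below are synthetic stand-ins, not the paper's shoeprint data or exact pipeline.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_2d(src, dst, n_iters=30):
    """Rigidly align 2-D point set `src` onto `dst` with iterative closest
    point: match each point to its nearest neighbour, fit the best rigid
    transform (Kabsch), apply it, and repeat."""
    cur = src.copy()
    tree = cKDTree(dst)
    for _ in range(n_iters):
        _, idx = tree.query(cur)                 # closest-point matches
        matched = dst[idx]
        mu_s, mu_d = cur.mean(0), matched.mean(0)
        H = (cur - mu_s).T @ (matched - mu_d)    # cross-covariance
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflection
        R = Vt.T @ np.diag([1.0, d]) @ U.T
        t = mu_d - R @ mu_s
        cur = cur @ R.T + t
    return cur

# Toy stand-in for two scans of the same outsole: `src` is a slightly
# rotated, shifted copy of `dst`.
rng = np.random.default_rng(0)
dst = rng.uniform(0.0, 10.0, size=(500, 2))
theta = 0.02
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
src = (dst - 5.0) @ R_true.T + 5.0 + np.array([0.15, -0.1])
aligned = icp_2d(src, dst)
rmse_before = np.sqrt(np.mean(np.sum((src - dst) ** 2, axis=1)))
rmse_after = np.sqrt(np.mean(np.sum((aligned - dst) ** 2, axis=1)))
```

After alignment, residual point-to-point distances are the raw material for the similarity metrics the random forest consumes.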
- [3] arXiv:2405.14886 [pdf, ps, other]
Title: Brain MRI detection by Semantic Segmentation models - Transfer Learning approach
Subjects: Image and Video Processing (eess.IV)
The paper examines the use of MRI segmentation techniques, focusing on brain tumor detection. It reviews convolutional neural networks (CNNs) for automatic segmentation and highlights challenges such as non-isotropic resolution, Rician noise, and bias field effects. The paper applies models such as VGG16, ResNet50, and ResU-Net to MRI images, comparing predicted masks against the original masks. ResNet50 is found to be a promising model, with high accuracy and F1 score.
- [4] arXiv:2405.14900 [pdf, ps, other]
Title: Fair Evaluation of Federated Learning Algorithms for Automated Breast Density Classification: The Results of the 2022 ACR-NCI-NVIDIA Federated Learning Challenge
Authors: Kendall Schmidt (American College of Radiology, USA), Benjamin Bearce (The Massachusetts General Hospital, USA and University of Colorado, USA), Ken Chang (The Massachusetts General Hospital), Laura Coombs (American College of Radiology, USA), Keyvan Farahani (National Institutes of Health National Cancer Institute, USA), Marawan Elbatele (Computer Vision and Robotics Institute, University of Girona, Spain), Kaouther Mouhebe (Computer Vision and Robotics Institute, University of Girona, Spain), Robert Marti (Computer Vision and Robotics Institute, University of Girona, Spain), Ruipeng Zhang (Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, China and Shanghai AI Laboratory, China), Yao Zhang (Shanghai AI Laboratory, China), Yanfeng Wang (Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, China and Shanghai AI Laboratory, China), Yaojun Hu (Real Doctor AI Research Centre, Zhejiang University, China), Haochao Ying (Real Doctor AI Research Centre, Zhejiang University, China and School of Public Health, Zhejiang University, China), Yuyang Xu (Real Doctor AI Research Centre, Zhejiang University, China and College of Computer Science and Technology, Zhejiang University, China), Conrad Testagrose (University of North Florida College of Computing Jacksonville, USA), Mutlu Demirer (Mayo Clinic Florida Radiology, USA), Vikash Gupta (Mayo Clinic Florida Radiology, USA), Ünal Akünal (Division of Medical Image Computing, German Cancer Research Center, Heidelberg, Germany), Markus Bujotzek (Division of Medical Image Computing, German Cancer Research Center, Heidelberg, Germany), Klaus H. Maier-Hein (Division of Medical Image Computing, German Cancer Research Center, Heidelberg, Germany), Yi Qin (Electronic and Computer Engineering, Hong Kong University of Science and Technology, China), Xiaomeng Li (Electronic and Computer Engineering, Hong Kong University of Science and Technology, China), Jayashree Kalpathy-Cramer (The Massachusetts General Hospital, USA and University of Colorado, USA), Holger R. Roth (NVIDIA, USA)
Comments: 16 pages, 9 figures
Journal-ref: Medical Image Analysis, Volume 95, July 2024, 103206
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
The correct interpretation of breast density is important in the assessment of breast cancer risk. AI has been shown to be capable of accurately predicting breast density; however, due to differences in imaging characteristics across mammography systems, models built using data from one system do not generalize well to others. Though federated learning (FL) has emerged as a way to improve the generalizability of AI without the need to share data, the best way to preserve features from all training data during FL is an active area of research. To explore FL methodology, the breast density classification FL challenge was hosted in partnership with the American College of Radiology, Harvard Medical School's Mass General Brigham, the University of Colorado, NVIDIA, and the National Institutes of Health National Cancer Institute. Challenge participants submitted Docker containers capable of implementing FL across three simulated medical facilities, each containing a unique large mammography dataset. The breast density FL challenge ran from June 15 to September 5, 2022, attracting seven finalists from around the world. The winning FL submission reached a linear kappa score of 0.653 on the challenge test data and 0.413 on an external testing dataset, scoring comparably to a model trained on the same data in a central location.
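For reference, the linearly weighted kappa used for scoring can be computed directly from a confusion matrix. This is a generic implementation; the four-class setup (BI-RADS density categories A-D mapped to 0-3) and the toy labels are illustrative assumptions.

```python
import numpy as np

def linear_weighted_kappa(rater_a, rater_b, n_classes=4):
    """Cohen's kappa with linear weights: 1 minus observed weighted
    disagreement over chance-expected weighted disagreement."""
    conf = np.zeros((n_classes, n_classes))
    for i, j in zip(rater_a, rater_b):
        conf[i, j] += 1.0
    conf /= conf.sum()
    # Disagreement weights grow linearly with category distance.
    w = np.abs(np.subtract.outer(np.arange(n_classes), np.arange(n_classes)))
    expected = np.outer(conf.sum(axis=1), conf.sum(axis=0))
    return 1.0 - (w * conf).sum() / (w * expected).sum()

y_true = [0, 1, 2, 3, 1, 2, 0, 3, 2, 1]
k_perfect = linear_weighted_kappa(y_true, y_true)
y_off = [1, 1, 2, 3, 1, 2, 0, 3, 2, 1]       # one off-by-one error
k_off = linear_weighted_kappa(y_true, y_off)
```

Because the weights are linear in category distance, a prediction that is off by one density category is penalized far less than one that is off by three.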
- [5] arXiv:2405.14905 [pdf, ps, other]
Title: Structural Entities Extraction and Patient Indications Incorporation for Chest X-ray Report Generation
Authors: Kang Liu, Zhuoqi Ma, Xiaolu Kang, Zhusi Zhong, Zhicheng Jiao, Grayson Baird, Harrison Bai, Qiguang Miao
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
The automated generation of imaging reports proves invaluable in alleviating the workload of radiologists. A clinically applicable report generation algorithm should demonstrate its effectiveness in producing reports that accurately describe radiology findings and attend to patient-specific indications. In this paper, we introduce a novel method, \textbf{S}tructural \textbf{E}ntities extraction and patient indications \textbf{I}ncorporation (SEI), for chest X-ray report generation. Specifically, we employ a structural entities extraction (SEE) approach to eliminate presentation-style vocabulary in reports and improve the quality of factual entity sequences. This reduces noise in the subsequent cross-modal alignment module, which aligns X-ray images with the factual entity sequences in reports, thereby enhancing the precision of cross-modal alignment and further aiding the model in gradient-free retrieval of similar historical cases. Subsequently, we propose a cross-modal fusion network to integrate information from X-ray images, similar historical cases, and patient-specific indications. This process allows the text decoder to attend to discriminative features of X-ray images, assimilate historical diagnostic information from similar cases, and understand the examination intention of patients. This, in turn, assists in triggering the text decoder to produce high-quality reports. Experiments conducted on MIMIC-CXR validate the superiority of SEI over state-of-the-art approaches on both natural language generation and clinical efficacy metrics.
- [6] arXiv:2405.14920 [pdf, ps, html, other]
Title: On Robust Controlled Invariants for Continuous-time Monotone Systems
Comments: 11 pages, 3 figures, accepted at IFAC ADHS 2024. This paper explores the idea presented in arXiv:2306.13822 for the class of continuous systems
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
This paper delves into the problem of computing robust controlled invariants for monotone continuous-time systems, with a specific focus on lower-closed specifications. For the classes of state monotone (SM) and control-state monotone (CSM) systems, we provide the structural properties of robust controlled invariants and show how these classes significantly impact the computation of invariants. Additionally, we introduce a notion of feasible points, demonstrating that their existence is sufficient to characterize robust controlled invariants for the considered class of systems. The study further investigates how the feasibility condition can be reduced for CSM and Lipschitz systems, unveiling conditions that guide this reduction. Leveraging these insights, we construct an algorithm for the computation of robust controlled invariants. To demonstrate the practicality of our approach, we apply the developed algorithm to a coupled-tank problem.
- [7] arXiv:2405.14934 [pdf, ps, html, other]
Title: Universal Robustness via Median Randomized Smoothing for Real-World Super-Resolution
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Most of the recent literature on image Super-Resolution (SR) can be classified into two main approaches. The first one involves learning a corruption model tailored to a specific dataset, aiming to mimic the noise and corruption in low-resolution images, such as sensor noise. However, this approach is data-specific, tends to lack adaptability, and its accuracy diminishes when faced with unseen types of image corruptions. A second and more recent approach, referred to as Robust Super-Resolution (RSR), proposes to improve real-world SR by making the model robust to adversarial attacks, thereby harnessing its generalization capabilities. To delve further into this second approach, our paper explores the universality of various methods for enhancing the robustness of deep learning SR models. In other words, we inquire: "Which robustness method exhibits the highest degree of adaptability when dealing with a wide range of adversarial attacks?". Our extensive experimentation on both synthetic and real-world images empirically demonstrates that median randomized smoothing (MRS) is more general in terms of robustness compared to adversarial learning techniques, which tend to focus on specific types of attacks. Furthermore, as expected, we also illustrate that the proposed universal robust method enables the SR model to handle standard corruptions more effectively, such as blur and Gaussian noise, and notably, corruptions naturally present in real-world images. These results support the significance of shifting the paradigm in the development of real-world SR methods towards RSR, especially via MRS.
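The mechanics of median randomized smoothing are easy to sketch: run the model on many Gaussian-perturbed copies of the input and take the pixel-wise median of the outputs. The box-blur "model" below is a hypothetical stand-in for an SR/restoration network, and the noise level and sample count are illustrative; this shows only the smoothing estimator, not the paper's robustness certificates.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def median_randomized_smoothing(model, x, sigma=0.1, n=101, seed=0):
    """Median randomized smoothing: evaluate the model on n Gaussian-
    perturbed copies of the input and return the pixel-wise median."""
    rng = np.random.default_rng(seed)
    outs = np.stack([model(x + sigma * rng.standard_normal(x.shape))
                     for _ in range(n)])
    return np.median(outs, axis=0)

# Hypothetical stand-in for an image-restoration network: a 3x3 box blur.
model = lambda img: uniform_filter(img, size=3)

x = np.linspace(0.0, 1.0, 64).reshape(8, 8)   # toy "image"
smoothed = median_randomized_smoothing(model, x, sigma=0.1, n=101)
clean = model(x)
max_dev = np.max(np.abs(smoothed - clean))
```

For a well-behaved model, the median over noise draws concentrates near the clean prediction, which is the property that makes the smoothed predictor stable under small input perturbations.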
- [8] arXiv:2405.14964 [pdf, ps, other]
Title: Black Start Operation of Grid-Forming Converters Based on Generalized Three-phase Droop Control Under Unbalanced Conditions
Subjects: Systems and Control (eess.SY)
This paper focuses on the challenging task of bottom-up restoration in a complete blackout system using grid-forming (GFM) converters. Challenges arise due to the limited current capability of power converters, resulting in distinct dynamic responses and fault current characteristics compared to synchronous generators. Additionally, GFM control needs to address the presence of unbalanced conditions commonly found in distribution systems. To address these challenges, this paper explores the black start capability of GFM converters with a generalized three-phase GFM droop control. This approach integrates GFM controls individually for each phase, incorporating phase-balancing feedback and enabling current limiting for each phase during unbalanced faults or overloading. The introduction of a phase-balancing gain provides flexibility to trade off between voltage and power imbalances. The study further investigates bottom-up black start operations using GFM converters, incorporating advanced load relays into breakers for gradual load energization without central coordination. The effectiveness of bottom-up black start operations with GFM converters, utilizing the generalized three-phase GFM droop, is evaluated through electromagnetic transient (EMT) simulations in MATLAB/Simulink. The results confirm the performance and effectiveness of this approach in achieving successful black start operations under unbalanced conditions.
- [9] arXiv:2405.14978 [pdf, ps, html, other]
Title: Analog or Digital In-memory Computing? Benchmarking through Quantitative Modeling
Subjects: Signal Processing (eess.SP); Hardware Architecture (cs.AR); Image and Video Processing (eess.IV)
In-Memory Computing (IMC) has emerged as a promising paradigm for energy-efficient, throughput-efficient and area-efficient machine learning at the edge. However, the differences in hardware architectures, array dimensions, and fabrication technologies among published IMC realizations have made it difficult to grasp their relative strengths. Moreover, previous studies have primarily focused on exploring and benchmarking the peak performance of a single IMC macro rather than full system performance on real workloads. This paper aims to address the lack of a quantitative comparison of Analog In-Memory Computing (AIMC) and Digital In-Memory Computing (DIMC) processor architectures. We propose an analytical IMC performance model that is validated against published implementations and integrated into a system-level exploration framework for comprehensive performance assessments on different workloads with varying IMC configurations. Our experiments show that while DIMC generally has higher computational density than AIMC, AIMC with large macro sizes may have better energy efficiency than DIMC on convolutional and pointwise layers, which can exploit high spatial unrolling. On the other hand, DIMC with small macro sizes outperforms AIMC on depthwise layers, which feature limited spatial unrolling opportunities inside a macro.
- [10] arXiv:2405.14994 [pdf, ps, html, other]
Title: Combining Euclidean Alignment and Data Augmentation for BCI decoding
Comments: 8 pages, 4 figures, 2 tables, accepted at EUSIPCO 2024
Subjects: Signal Processing (eess.SP)
Automated classification of electroencephalogram (EEG) signals is complex due to their high dimensionality, non-stationarity, low signal-to-noise ratio, and variability between subjects. Deep neural networks (DNNs) have shown promising results for EEG classification, but the above challenges hinder their performance. Euclidean Alignment (EA) and Data Augmentation (DA) are two promising techniques for improving DNN training by permitting the use of data from multiple subjects, increasing the data, and regularizing the available data. In this paper, we perform a detailed evaluation of the combined use of EA and DA with DNNs for EEG decoding. We trained individual models and shared models with data from multiple subjects and showed that combining EA and DA generates synergies that improve the accuracy of most models and datasets. Also, the shared models combined with fine-tuning benefited the most, with an overall increase of 8.41% in classification accuracy.
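Euclidean Alignment itself is compact: whiten each subject's trials by the inverse square root of that subject's mean spatial covariance, so that the aligned trials of every subject share an identity mean covariance. A minimal sketch with synthetic EEG-shaped data follows; the channel count, trial count, and mixing matrix are made up.

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

def euclidean_alignment(trials):
    """Euclidean Alignment (EA): left-multiply every trial by the inverse
    square root of the subject's mean spatial covariance."""
    covs = np.stack([t @ t.T / t.shape[1] for t in trials])
    r_mean = covs.mean(axis=0)
    r_isqrt = fractional_matrix_power(r_mean, -0.5).real
    return np.stack([r_isqrt @ t for t in trials])

# Synthetic subject: 20 trials x 8 channels x 128 samples, with an
# arbitrary channel-mixing matrix standing in for subject-specific anatomy.
rng = np.random.default_rng(1)
mix = rng.normal(size=(8, 8))
trials = np.stack([mix @ rng.standard_normal((8, 128)) for _ in range(20)])
aligned = euclidean_alignment(trials)
mean_cov = np.mean([t @ t.T / t.shape[1] for t in aligned], axis=0)
```

After alignment, `mean_cov` is the identity by construction, which is what lets trials from different subjects be pooled when training a shared decoder.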
- [11] arXiv:2405.15085 [pdf, ps, html, other]
Title: Acoustical Features as Knee Health Biomarkers: A Critical Analysis
Authors: Christodoulos Kechris, Jerome Thevenot, Tomas Teijeiro, Vincent A. Stadelmann, Nicola A. Maffiuletti, David Atienza
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Acoustical knee health assessment has long promised an alternative to clinically available medical imaging tools, but this modality has yet to be adopted in medical practice. The field is currently led by machine learning models processing acoustical features, which have presented promising diagnostic performances. However, these methods overlook the intricate multi-source nature of audio signals and the underlying mechanisms at play. By addressing this critical gap, the present paper introduces a novel causal framework for validating knee acoustical features. We argue that current machine learning methodologies for acoustical knee diagnosis lack the required assurances and thus cannot be used to classify acoustic features as biomarkers. Our framework establishes a set of essential theoretical guarantees necessary to validate this claim. We apply our methodology to three real-world experiments investigating the effects of researchers' expectations, the experimental protocol, and the employed wearable sensor. This investigation reveals latent issues such as underlying shortcut learning and performance inflation. This study is the first independent result-reproduction study in the field of acoustical knee health evaluation. We conclude with actionable insights from our findings, offering valuable guidance to navigate these crucial limitations in future research.
- [12] arXiv:2405.15093 [pdf, ps, html, other]
Title: Real-Time and Accurate: Zero-shot High-Fidelity Singing Voice Conversion with Multi-Condition Flow Synthesis
Comments: 5 pages, 4 figures
Subjects: Audio and Speech Processing (eess.AS)
Singing voice conversion aims to convert a source singing voice into a target singer's voice while preserving the content. Currently, flow-based models can perform voice conversion, but they struggle to effectively extract latent variables in the more rhythmically rich and emotionally expressive task of singing voice conversion, and they also suffer from low efficiency in speech processing. In this paper, we propose a high-fidelity flow-based model based on multi-decoupling feature constraints, which enhances the capture of vocal details by integrating multiple encoders. We also use the iSTFT to speed up speech processing by replacing some layers of the vocoder. We compare the synthesized singing voice with that of other models along multiple dimensions, and our proposed model is highly competitive with the current state-of-the-art. A demo is available at \url{this https URL}
- [13] arXiv:2405.15098 [pdf, ps, other]
Title: Magnetic Resonance Image Processing Transformer for General Reconstruction
Comments: 29 pages, 3 figures, 3 tables
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Medical Physics (physics.med-ph)
Purpose: To develop and evaluate a deep learning model for general accelerated MRI reconstruction.
Materials and Methods: This retrospective study built a magnetic resonance image processing transformer (MR-IPT) that includes multi-head-tails and a single shared window-transformer main body. Three variants of MR-IPT with different transformer structures were implemented to guide the design of our MR-IPT model. Pre-trained on the MRI subset of RadImageNet, comprising 672,675 images with multiple anatomy categories, the model was further migrated to and evaluated on the fastMRI knee dataset, with 25,012 images, for downstream reconstruction tasks. We performed comparison studies with three CNN-based conventional networks in zero- and few-shot learning scenarios. A transfer learning process was conducted on both MR-IPT and the CNN networks to further validate the generalizability of MR-IPT. To study the stability of model performance, we evaluated our model with downstream dataset sizes ranging from 10 to 2500 images.
Result: The MR-IPT model provided superior performance in multiple downstream tasks compared to conventional CNN networks. MR-IPT achieved a PSNR/SSIM of 26.521/0.6102 (4-fold) and 24.861/0.4996 (8-fold) in 10-epoch learning, surpassing UNet128 at 25.056/0.5832 (4-fold) and 22.984/0.4637 (8-fold). With the same large-scale pre-training, MR-IPT provided a 5% performance boost compared to UNet128 in zero-shot learning in 8-fold and 3% in 4-fold.
Conclusion: The MR-IPT framework benefits from its transformer-based structure and large-scale pre-training and can serve as a solid backbone in other downstream tasks with zero- and few-shot learning.
- [14] arXiv:2405.15099 [pdf, ps, html, other]
Title: Stability analysis of nonlinear stochastic flexibility function in smart energy systems
Authors: Seyed Shahabaldin Tohidi, Tobias K. S. Ritschel, Georgios Tsaousoglou, Uffe Høgsbro Thygesen, Henrik Madsen
Subjects: Systems and Control (eess.SY)
Demand-side management offers great potential for improving the efficiency and reliability of energy systems. This requires a mechanism that connects the market level and the demand side. The flexibility function is a novel approach that bridges the gap between the markets and the dynamics of physical assets at the lower levels of the energy system: it activates demand-side flexibility for decision-making and offers a new framework for balancing and grid services. Because this function is a key ingredient in many decision-making and control algorithms, a mathematically rigorous stability analysis is required for it. In this paper, we investigate the stability properties of two nonlinear flexibility functions, viewed as dynamic mappings between electricity price and power consumption. Specifically, we analyze the stability of a deterministic flexibility function and an Itô stochastic flexibility function. Simulation results are also provided to demonstrate the dynamics of the flexibility functions and to show that the analytical results hold.
- [15] arXiv:2405.15127 [pdf, ps, html, other]
Title: Benchmarking Hierarchical Image Pyramid Transformer for the classification of colon biopsies and polyps in histopathology images
Authors: Nohemi Sofia Leon Contreras, Marina D'Amato, Francesco Ciompi, Clement Grisi, Witali Aswolinskiy, Simona Vatrano, Filippo Fraggetta, Iris Nagtegaal
Comments: 4 pages, 3 figures, to be published in the 2024 IEEE International Symposium on Biomedical Imaging (ISBI) proceedings
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Training neural networks with high-quality pixel-level annotation in histopathology whole-slide images (WSI) is an expensive process due to the gigapixel resolution of WSIs. However, recent advances in self-supervised learning have shown that highly descriptive image representations can be learned without the need for annotations. We investigate the application of the recent Hierarchical Image Pyramid Transformer (HIPT) model for the specific task of classification of colorectal biopsies and polyps. After evaluating the effectiveness of TCGA-learned features in the original HIPT model, we incorporate colon biopsy image information into HIPT's pretraining using two distinct strategies: (1) fine-tuning HIPT from the existing TCGA weights and (2) pretraining HIPT from random weight initialization. We compare the performance of these pretraining regimes on two colorectal biopsy classification tasks: binary and multiclass classification.
- [16] arXiv:2405.15153 [pdf, ps, html, other]
Title: Optimal Reference Nodes Deployment for Positioning Seafloor Anchor Nodes
Subjects: Signal Processing (eess.SP)
Seafloor anchor nodes, which form a geodetic network, are designed to provide surface and underwater users with positioning, navigation and timing (PNT) services. Due to the non-uniform distribution of underwater sound speed, accurate positioning of underwater anchor nodes is a challenging task. Traditional anchor node positioning typically uses cross or circular deployment shapes; however, how to optimize the deployment of reference nodes for positioning underwater anchor nodes, considering the variability of sound speed, has not yet been studied. This paper focuses on optimal reference node deployment strategies for time-of-arrival (TOA) localization in three-dimensional (3D) underwater space. We adopt the criterion of minimizing the trace of the inverse Fisher information matrix (FIM) to determine the optimal reference node deployment under Gaussian measurement noise that is positively related to the signal propagation path. A comprehensive analysis of optimal reference-target geometries is provided in the general setting, with no restriction on the number of reference nodes, the elevation angle, or the reference-target range. A new semi-closed-form solution is found to determine the optimal geometries. To demonstrate the findings of this paper, we conducted both simulations and sea trials on underwater anchor node positioning. Both the simulation and experimental results are consistent with the theoretical analysis.
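The deployment criterion, minimizing the trace of the inverse FIM, can be illustrated for TOA ranging with i.i.d. Gaussian noise (a simplification of the paper's path-dependent noise model). The two four-node surface geometries below are hypothetical; the point is only that a spread geometry yields a much smaller error bound than a clustered one.

```python
import numpy as np

def crlb_trace(refs, target, sigma=1.0):
    """Trace of the inverse Fisher information matrix for TOA ranging with
    i.i.d. Gaussian noise: the lower bound on total position-error variance
    that the deployment criterion minimizes. The FIM is the sum of the
    outer products of the unit vectors from the target to each reference,
    scaled by 1/sigma^2."""
    diffs = refs - target
    units = diffs / np.linalg.norm(diffs, axis=1, keepdims=True)
    fim = units.T @ units / sigma**2
    return np.trace(np.linalg.inv(fim))

# Hypothetical geometries: a seafloor anchor observed by four surface
# references, either well spread around it or clustered on one side.
anchor = np.array([0.0, 0.0, -100.0])
spread = np.array([[500.0, 0.0, 0.0], [-250.0, 433.0, 0.0],
                   [-250.0, -433.0, 0.0], [0.0, 0.0, 0.0]])
clustered = np.array([[500.0, 0.0, 0.0], [480.0, 30.0, 0.0],
                      [510.0, -20.0, 0.0], [490.0, 10.0, 0.0]])
bound_spread = crlb_trace(spread, anchor)
bound_clustered = crlb_trace(clustered, anchor)
```

With nearly parallel line-of-sight directions, the FIM becomes close to singular, so the clustered layout's bound is orders of magnitude worse than the spread layout's.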
- [17] arXiv:2405.15159 [pdf, ps, html, other]
Title: Leveraging Gated Recurrent Units for Iterative Online Precise Attitude Control for Geodetic Missions
Comments: 14 pages
Subjects: Systems and Control (eess.SY)
In this paper, we consider the problem of precise attitude control for geodetic missions, such as the GRACE Follow-On (GRACE-FO) mission. Traditional and well-established control methods, such as Proportional-Integral-Derivative (PID) controllers, have been the standard in attitude control for most space missions, including the GRACE-FO mission. Instead of significantly modifying (or replacing) the original PID controllers that are being used for these missions, we introduce an iterative modification to the PID controller that ensures improved attitude control precision (i.e., reduction in attitude error). The proposed modification leverages Gated Recurrent Units (GRUs) to learn and predict external disturbance trends derived from incoming attitude measurements from the GRACE satellites. Our analysis has revealed a distinct trend in the external disturbance time-series data, suggesting the potential utility of GRUs for predicting future disturbances acting on the system. The learned GRU model compensates for these disturbances within the standard PID control loop in real time via an additive correction term that is updated at regular intervals. The simulation results show a significant reduction in attitude error, verifying the efficacy of our proposed approach.
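A toy version of the additive-correction idea: a PID loop on a single-axis double integrator, where an estimated disturbance is subtracted from the control signal. A one-step persistence estimate stands in for the learned GRU predictor here, and the dynamics, gains, and disturbance are all made-up illustrations, not the mission's models.

```python
import numpy as np

def run_attitude_loop(compensate, n_steps=8000, dt=0.01):
    """Toy single-axis attitude loop: PID control of a double integrator
    under a slowly varying external disturbance. When `compensate` is set,
    an additive correction term (the role the GRU forecast plays in the
    proposed scheme) is subtracted from the control."""
    kp, ki, kd = 4.0, 0.5, 4.0
    theta, omega, integ = 0.2, 0.0, 0.0
    omega_prev, u_prev = 0.0, 0.0
    errs = []
    for k in range(n_steps):
        d = 0.1 * np.sin(0.5 * k * dt)        # external disturbance torque
        e = -theta                            # setpoint is zero attitude
        integ += e * dt
        u = kp * e + ki * integ - kd * omega
        if compensate and k > 0:
            # Reconstruct last step's disturbance from the velocity update
            # and assume it persists one more step (persistence forecast).
            d_hat = (omega - omega_prev) / dt - u_prev
            u -= d_hat
        omega_prev, u_prev = omega, u
        omega += dt * (u + d)
        theta += dt * omega
        errs.append(theta)
    errs = np.asarray(errs)
    return float(np.sqrt(np.mean(errs[n_steps // 2:] ** 2)))

rms_pid = run_attitude_loop(compensate=False)
rms_corrected = run_attitude_loop(compensate=True)
```

Because the disturbance varies slowly relative to the control step, even this crude one-step forecast removes most of the steady-state attitude error; a trained GRU would replace the persistence forecast with a learned multi-step prediction.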
- [18] arXiv:2405.15178 [pdf, ps, html, other]
Title: Distributed Adaptive Control of Disturbed Interconnected Systems with High-Order Tuners
Comments: This is the extended version of the paper accepted for publication in IEEE Control Systems Letters (L-CSS) 2024
Subjects: Systems and Control (eess.SY)
This paper addresses the challenge of network synchronization under limited communication, involving heterogeneous agents with different dynamics and various network topologies, to achieve consensus. We investigate distributed adaptive control for interconnected unknown linear subsystems with a leader and followers, in the presence of input-output disturbance. We enhance the communication within multi-agent systems to achieve consensus under the leader's guidance. While the measured variable is similar among the followers, the incoming measurements are weighted and constructed based on their proximity to the leader. We also explore the convergence rates across various balanced topologies (star-like, cyclic-like, path, random), featuring different numbers of agents, using three distributed algorithms, ranging from first- to high-order tuners, to effectively address time-varying regressors. The mathematical foundation is rigorously presented, from the network designs of the unknown agents following a leader to the distributed methods. Moreover, we conduct several numerical simulations across various networks, agents and tuners to evaluate the effects of sparsity in the interaction between subsystems using the $L_2$-norm and $L_\infty$-norm. Some networks exhibit a trend where an increasing number of agents results in smaller errors, although this is not universally the case. Additionally, patterns observed at initial times may not reliably predict overall performance across different networks. Finally, we demonstrate that the proposed modified high-order tuner outperforms its counterparts, and we provide related insights along with our conclusions.
- [19] arXiv:2405.15187 [pdf, ps, html, other]
Title: Chance-Constrained Economic Dispatch with Flexible Loads and RES
Subjects: Systems and Control (eess.SY)
With the increasing penetration of intermittent renewable energy sources (RESs), it becomes increasingly challenging to maintain the supply-demand balance of power systems by relying solely on the generation side. To combat the volatility introduced by uncertain RESs, demand-side management leveraging multi-dimensional flexibility (MDF) has been recognized as an economic and efficient approach. Thus, it is important to integrate MDF into existing power systems. In this paper, we propose an enhanced day-ahead energy market, where the MDFs of aggregate loads are traded to minimize the generation cost and mitigate the volatility of locational marginal prices (LMPs) in the transmission network. We first explicitly capture the negative impact of the uncertainty from RESs on the day-ahead market by a chance-constrained economic dispatch problem (CEDP). Then, we propose a bidding mechanism for the MDF of the aggregate loads and combine this mechanism with the CEDP for the day-ahead market. Through multiple case studies, we show that MDF from load aggregators can reduce the volatility of LMPs. In addition, we identify the values of the different flexibilities in the MDF bids, which provide useful insights into the design of more complex MDF markets.
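A minimal example of the reformulation behind a chance-constrained dispatch: with Gaussian uncertainty, the probabilistic supply-demand constraint becomes a deterministic margin, leaving a plain LP. The two-generator system and all numbers below are hypothetical, and this sketch omits the paper's network model, LMPs, and MDF bids.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.stats import norm

# Toy two-generator dispatch with uncertain wind w ~ N(mu_w, sigma_w^2).
# The chance constraint P(g1 + g2 + w >= d) >= 1 - eps reformulates
# deterministically as: g1 + g2 >= d - mu_w + z_{1-eps} * sigma_w.
d, mu_w, sigma_w, eps = 100.0, 20.0, 5.0, 0.05
margin = d - mu_w + norm.ppf(1.0 - eps) * sigma_w

cost = [20.0, 30.0]                       # $/MWh marginal costs
res = linprog(cost,
              A_ub=[[-1.0, -1.0]],        # -(g1 + g2) <= -margin
              b_ub=[-margin],
              bounds=[(0.0, 60.0), (0.0, 60.0)])
g1, g2 = res.x
```

The cheaper unit is dispatched to its limit and the expensive one covers the remaining risk-adjusted margin; tightening `eps` raises the margin and shifts more output to the expensive unit.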
- [20] arXiv:2405.15205 [pdf, ps, html, other]
-
Title: Enhancing Generalized Fetal Brain MRI Segmentation using A Cascade Network with Depth-wise Separable Convolution and Attention MechanismSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Automatic segmentation of the fetal brain remains challenging due to the health state of fetal development, motion artifacts, and variability across gestational ages, since existing methods rely on high-quality datasets of healthy fetuses. In this work, we propose a novel cascade network called CasUNext to enhance the accuracy and generalization of fetal brain MRI segmentation. CasUNext incorporates depth-wise separable convolution, attention mechanisms, and a two-step cascade architecture for efficient high-precision segmentation. The first network localizes the fetal brain region, while the second network focuses on detailed segmentation. We evaluate CasUNext on 150 fetal MRI scans between 20 and 36 weeks of gestation acquired on two scanners made by Philips and Siemens, covering axial, coronal, and sagittal views, and also validate it on a dataset of 50 abnormal fetuses. Results demonstrate that CasUNext achieves improved segmentation performance compared to U-Nets and other state-of-the-art approaches. It obtains an average Dice coefficient of 96.1% and a mean intersection over union of 95.9% across diverse scenarios. CasUNext shows promising capabilities for handling the challenges of multi-view fetal MRI and abnormal cases, which could facilitate various quantitative analyses and extend to multi-site data.
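A minimal sketch (not the authors' code, and with hypothetical channel sizes) of why depth-wise separable convolution, the efficiency building block the CasUNext abstract names, reduces parameter count compared with a standard convolution:

```python
# Hypothetical illustration: parameter counts for a standard 2D convolution
# versus a depth-wise separable one (depth-wise k x k per channel, followed
# by a point-wise 1 x 1 channel mixer). Channel sizes below are made up.

def standard_conv_params(c_in, c_out, k):
    # every output channel mixes all input channels over a k x k window
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # depth-wise step: one k x k filter per input channel,
    # point-wise step: 1 x 1 convolution that mixes channels
    return c_in * k * k + c_in * c_out

c_in, c_out, k = 64, 128, 3
std = standard_conv_params(c_in, c_out, k)        # 73728
sep = depthwise_separable_params(c_in, c_out, k)  # 8768
print(std, sep, round(std / sep, 1))  # roughly an 8x reduction here
```

The same factorization underlies MobileNet-style efficient backbones; the exact savings depend on kernel size and channel widths.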
- [21] arXiv:2405.15241 [pdf, ps, html, other]
-
Title: Blaze3DM: Marry Triplane Representation with Diffusion for 3D Medical Inverse Problem SolvingSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Solving 3D medical inverse problems such as image restoration and reconstruction is crucial in the modern medical field. However, the curse of dimensionality in 3D medical data leads mainstream volume-wise methods to suffer from high resource consumption and makes it hard for models to capture the natural distribution, resulting in inevitable volume inconsistency and artifacts. Some recent works attempt to simplify generation in the latent space but lack the capability to efficiently model intricate image details. To address these limitations, we present Blaze3DM, a novel approach that enables fast and high-fidelity generation by integrating a compact triplane neural field with a powerful diffusion model. Technically, Blaze3DM begins by simultaneously optimizing data-dependent triplane embeddings and a shared decoder, reconstructing each triplane back to the corresponding 3D volume. To further enhance 3D consistency, we introduce a lightweight 3D-aware module to model the correlation of the three vertical planes. A diffusion model is then trained on the latent triplane embeddings to achieve both unconditional and conditional triplane generation, which is finally decoded into a volume of arbitrary size. Extensive experiments on zero-shot 3D medical inverse problem solving, including sparse-view CT, limited-angle CT, compressed-sensing MRI, and MRI isotropic super-resolution, demonstrate that Blaze3DM not only achieves state-of-the-art performance but also markedly improves computational efficiency over existing methods (22~40x faster than previous work).
- [22] arXiv:2405.15259 [pdf, ps, html, other]
-
Title: Robust Economic Dispatch with Flexible Demand and Adjustable Uncertainty SetSubjects: Systems and Control (eess.SY)
With more renewable energy sources (RES) integrated into the power system, the intermittency of RES places a heavy burden on the system. The uncertainty of RES is traditionally handled by controllable generators that balance the real-time wind power deviation. As demand-side management develops, the flexibility of aggregate loads can be leveraged to mitigate the negative impact of wind power. In view of this, we study how to exploit the multi-dimensional flexibility of elastic loads to balance the trade-off between a low generation cost and a low system risk related to wind curtailment and power deficiency. These risks are captured by the conditional value-at-risk. Also, unlike most existing studies, the uncertainty set of the wind power output in our model is not fixed; instead, it is undetermined and co-optimized based on the available load flexibility. We transform the original optimization problem into a convex one using a surrogate affine approximation so that it can be solved efficiently. In case studies, we apply our model to a six-bus transmission network and demonstrate how flexible load aggregators can help determine the optimal admissible region for wind power deviations.
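As a hypothetical aside (not the paper's model), the conditional value-at-risk the abstract uses to capture curtailment and deficiency risk has a simple empirical form: the mean of the worst $(1-\alpha)$ fraction of sampled losses.

```python
import numpy as np

# Hypothetical sketch: empirical conditional value-at-risk (CVaR) of a
# loss sample. The sample below is synthetic, for illustration only.

def cvar(losses, alpha=0.95):
    """Mean of the worst (1 - alpha) fraction of losses."""
    losses = np.sort(np.asarray(losses, dtype=float))
    tail_start = int(np.ceil(alpha * len(losses)))
    return losses[tail_start:].mean()

losses = np.arange(1, 101)          # synthetic losses 1..100
print(cvar(losses, alpha=0.95))     # mean of {96,...,100} = 98.0
```

Unlike the plain value-at-risk quantile, CVaR is convex in the decision variables under mild assumptions, which is what makes it attractive inside dispatch optimization.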
- [23] arXiv:2405.15271 [pdf, ps, other]
-
Title: Seamless Integration and Implementation of Distributed Contact and Contactless Vital Sign MonitoringComments: 14 pages,9 figuresSubjects: Systems and Control (eess.SY); Instrumentation and Detectors (physics.ins-det); Optics (physics.optics)
Real-time vital sign monitoring is gaining immense significance not only in the medical field but also in personal health management. To meet the needs of the different application scenarios of the future smart and healthy city, a low-cost, large-scale, scalable, and distributed vital sign monitoring system is of great significance. In this work, a seamlessly integrated contact and contactless vital sign monitoring system, which can simultaneously implement respiration and heartbeat monitoring, is proposed. In contact vital sign monitoring, the chest wall movement due to respiration and heartbeat is translated into changes in the optical output intensity of a fiber Bragg grating (FBG). The FBG is also an important part of radar signal generation for contactless vital sign monitoring, in which the chest wall movement is translated into phase changes of the radar de-chirped signal. By analyzing the intensity of the FBG output and the phase of the radar de-chirped signal, real-time respiration and heartbeat monitoring are realized. In addition, due to the distributed structure of the system and its good integration with the wavelength-division multiplexing optical network, it can be massively scaled by employing more wavelengths. A proof-of-concept experiment is carried out, in which contact and contactless respiration and heartbeat monitoring of three people are simultaneously realized. During a monitoring time of 60 s, the maximum absolute measurement errors of the respiration and heartbeat rates are 1.6 respirations per minute and 2.3 beats per minute, respectively. The measurement error does not change appreciably even when the monitoring time is decreased to 5 s.
- [24] arXiv:2405.15275 [pdf, ps, html, other]
-
Title: NMGrad: Advancing Histopathological Bladder Cancer Grading with Weakly Supervised Deep LearningComments: this https URLSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)
The most prevalent form of bladder cancer is urothelial carcinoma, characterized by a high recurrence rate and substantial lifetime treatment costs for patients. Grading is a prime factor for patient risk stratification, although it suffers from inconsistencies and variations among pathologists. Moreover, the absence of annotations in medical imaging makes it difficult to train deep learning models. To address these challenges, we introduce a pipeline designed for bladder cancer grading using histological slides. First, it extracts urothelium tissue tiles at different magnification levels and processes them with a convolutional neural network for feature extraction. Then, it engages in the slide-level prediction process, employing a nested multiple instance learning approach with attention to predict the grade. To distinguish different levels of malignancy within specific regions of the slide, we include the origins of the tiles in our analysis. The attention scores at the region level are shown to correlate with verified high-grade regions, giving some explainability to the model. Clinical evaluations demonstrate that our model consistently outperforms previous state-of-the-art methods.
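A minimal sketch, under hypothetical shapes and with a made-up scoring vector, of the attention pooling step at the heart of attention-based multiple instance learning: tile features are aggregated into one slide embedding with learned, softmax-normalized weights (this is not the authors' implementation).

```python
import numpy as np

# Hypothetical sketch: attention-weighted pooling over tile features for a
# slide-level prediction. `w` stands in for a learned attention parameter.

def attention_pool(features, w):
    """features: (n_tiles, d); w: (d,) attention scoring vector."""
    scores = features @ w                  # one scalar score per tile
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax over the tiles
    slide_embedding = weights @ features   # weighted average, shape (d,)
    return slide_embedding, weights

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 4))            # 8 synthetic tiles, 4-dim features
emb, attn = attention_pool(feats, w=np.ones(4))
print(attn.sum())  # attention weights sum to 1
```

The per-tile weights are what give such models their region-level explainability: high-weight tiles indicate regions the classifier found most informative.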
- [25] arXiv:2405.15339 [pdf, ps, html, other]
-
Title: Environment Sensing-aided Beam Prediction with Transfer Learning for Smart FactorySubjects: Signal Processing (eess.SP)
In this paper, we propose an environment sensing-aided beam prediction model for smart factories that can be transferred from given environments to a new environment. In particular, we first design a pre-training model that predicts the optimal beam by sensing the present environmental information. When encountering a new environment, it generally requires collecting a large amount of new training data to retrain the model, a cost that severely impedes the application of the designed pre-training model. Therefore, we next design a transfer learning strategy that fine-tunes the pre-trained model with limited labeled data from the new environment. Simulation results show that when the pre-trained model is fine-tuned with 30\% of the labeled data from the new environment, the Top-10 beam prediction accuracy reaches 94\%. Moreover, compared with completely retraining the prediction model, the proposed transfer learning strategy reduces the amount of training data and the time cost by 70\% and 75\%, respectively.
- [26] arXiv:2405.15345 [pdf, ps, html, other]
-
Title: Hybrid-Field Channel Estimation for XL-MIMO Systems with Stochastic Gradient Pursuit AlgorithmComments: 30 pages, 6 figures, been ACCEPTED for publication as a REGULAR paper in the IEEE Transactions on Signal ProcessingSubjects: Signal Processing (eess.SP)
Extremely large-scale multiple-input multiple-output (XL-MIMO) is crucial for satisfying the high data rate requirements of sixth-generation (6G) wireless networks. In this context, ensuring accurate acquisition of channel state information (CSI) with low complexity becomes imperative. Moreover, deploying an extremely large antenna array at the base station (BS) might result in some scatterers being located in the near-field, while others are situated in the far-field, leading to a hybrid-field communication scenario. To address these challenges, this paper introduces two stochastic gradient pursuit (SGP)-based schemes for hybrid-field channel estimation in two scenarios. For the first scenario, in which prior knowledge of the specific proportion of near-field and far-field channel paths is available, the scheme can effectively leverage the angular-domain sparsity of the far-field channels and the polar-domain sparsity of the near-field channels such that the channel estimation in these two fields can be performed separately. For the second scenario, in which the proportion is not available, we propose an off-grid SGP-based channel estimation scheme, which iterates through the values of the proportion parameter based on a criterion before performing the hybrid-field channel estimation. We demonstrate numerically that both of the proposed channel estimation schemes achieve superior performance in terms of both estimation accuracy and achievable rates while enjoying lower computational complexity compared with existing schemes. Additionally, we reveal that as the number of antennas at the user equipment (UE) increases, the normalized mean square error (NMSE) performance of the proposed schemes remains basically unchanged, while that of existing ones improves. Remarkably, even in this scenario, the proposed schemes continue to outperform the existing ones.
- [27] arXiv:2405.15399 [pdf, ps, html, other]
-
Title: Stochastic SR for Gaussian microtexturesSubjects: Image and Video Processing (eess.IV)
Super-Resolution (SR) is the problem of reconstructing images that have been degraded by a zoom-out operator. This is an ill-posed problem that does not have a unique solution, and numerical approaches rely on a prior on high-resolution images. While optimization-based methods are generally deterministic, with the rise of image generative models more and more interest has been given to stochastic SR, that is, sampling among all possible SR images associated with a given low-resolution input. In this paper, we construct an efficient, stable and provably exact sampler for the stochastic SR of Gaussian microtextures. Even though our approach is limited in the scope of images it encompasses, our algorithm is competitive with state-of-the-art deep learning methods in terms of both perceptual metrics and execution time when applied to microtextures. The framework of Gaussian microtextures also allows us to rigorously discuss the limitations of various reconstruction metrics for evaluating the efficiency of SR routines.
- [28] arXiv:2405.15413 [pdf, ps, html, other]
-
Title: MambaVC: Learned Visual Compression with Selective State SpacesShiyu Qin, Jinpeng Wang, Yimin Zhou, Bin Chen, Tianci Luo, Baoyi An, Tao Dai, Shutao Xia, Yaowei WangComments: 17 pages, 15 figuresSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT)
Learned visual compression is an important and active task in multimedia. Existing approaches have explored various CNN- and Transformer-based designs to model content distribution and eliminate redundancy, where balancing efficacy (i.e., rate-distortion trade-off) and efficiency remains a challenge. Recently, state-space models (SSMs) have shown promise due to their long-range modeling capacity and efficiency. Inspired by this, we take the first step to explore SSMs for visual compression. We introduce MambaVC, a simple, strong and efficient compression network based on SSM. MambaVC develops a visual state space (VSS) block with a 2D selective scanning (2DSS) module as the nonlinear activation function after each downsampling, which helps to capture informative global contexts and enhances compression. On compression benchmark datasets, MambaVC achieves superior rate-distortion performance with lower computational and memory overheads. Specifically, it outperforms CNN and Transformer variants by 9.3% and 15.6% on Kodak, respectively, while reducing computation by 42% and 24%, and saving 12% and 71% of memory. MambaVC shows even greater improvements with high-resolution images, highlighting its potential and scalability in real-world applications. We also provide a comprehensive comparison of different network designs, underscoring MambaVC's advantages.
- [29] arXiv:2405.15432 [pdf, ps, html, other]
-
Title: Throughput Requirements for RAN Functional Splits in 3D-NetworksMohammadAmin Vakilifard, Tim Düe, Mohammad Rihan, Maik Röper, Dirk Wübben, Carsten Bockelmann, Armin DekorsyComments: submitted to Globecom2024 SELECTED AREAS IN COMMUNICATIONS SATELLITE AND SPACE COMMUNICATIONSSubjects: Signal Processing (eess.SP); Networking and Internet Architecture (cs.NI)
The rapid growth of non-terrestrial communication necessitates its integration with existing terrestrial networks, as highlighted in 3GPP Releases 16 and 17. This paper analyses the concept of functional splits in 3D-Networks. To manage this complex structure effectively, the adoption of a Radio Access Network (RAN) architecture with Functional Split (FS) offers advantages in flexibility, scalability, and cost-efficiency. RAN achieves this by disaggregating functionalities into three separate units. Analogous to the terrestrial network approach, 3GPP is extending this concept to non-terrestrial platforms as well. This work presents a general analysis of the required Fronthaul (FH) data rate on the feeder link between a non-terrestrial platform and the ground station. Each split option is a trade-off between the FH data rate and the respective complexity. Since flying nodes face more limitations regarding on-board power consumption and complexity than terrestrial ones, we investigate the split options between the lower and higher physical layer.
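To make the trade-off concrete, here is a hypothetical back-of-the-envelope calculation (not taken from the paper): for the lowest-layer splits that forward raw time-domain IQ samples, the fronthaul rate scales with the sampling rate, the IQ bit width, and the antenna count, which is why such splits quickly become prohibitive on power-limited flying nodes. The example numbers are assumptions.

```python
# Hypothetical sketch: fronthaul bit rate for a low-layer split carrying raw
# IQ samples. rate = sample_rate * 2 (I and Q) * bit_width * num_antennas.

def iq_fronthaul_rate_bps(sample_rate_hz, bit_width, num_antennas):
    return sample_rate_hz * 2 * bit_width * num_antennas

# e.g. 30.72 MHz sampling, 12-bit I/Q words, 4 antennas (illustrative values)
rate = iq_fronthaul_rate_bps(30.72e6, 12, 4)
print(rate / 1e9)  # about 2.95 Gbit/s before any line coding overhead
```

Higher-layer splits forward (partially) processed data instead of raw samples, so their rate tracks the user traffic rather than the antenna array size.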
- [30] arXiv:2405.15442 [pdf, ps, html, other]
-
Title: Towards Precision Healthcare: Robust Fusion of Time Series and Image DataAli Rasekh, Reza Heidari, Amir Hosein Haji Mohammad Rezaie, Parsa Sharifi Sedeh, Zahra Ahmadi, Prasenjit Mitra, Wolfgang NejdlSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
With the increasing availability of diverse data types, particularly images and time series data from medical experiments, there is a growing demand for techniques designed to combine various modalities of data effectively. Our motivation comes from the important areas of predicting mortality and phenotyping where using different modalities of data could significantly improve our ability to predict. To tackle this challenge, we introduce a new method that uses two separate encoders, one for each type of data, allowing the model to understand complex patterns in both visual and time-based information. Apart from the technical challenges, our goal is to make the predictive model more robust in noisy conditions and perform better than current methods. We also deal with imbalanced datasets and use an uncertainty loss function, yielding improved results while simultaneously providing a principled means of modeling uncertainty. Additionally, we include attention mechanisms to fuse different modalities, allowing the model to focus on what's important for each task. We tested our approach using the comprehensive multimodal MIMIC dataset, combining MIMIC-IV and MIMIC-CXR datasets. Our experiments show that our method is effective in improving multimodal deep learning for clinical applications. The code will be made available online.
- [31] arXiv:2405.15482 [pdf, ps, html, other]
-
Title: An input-output continuous-time version of Willems' lemmaComments: 6 pagesSubjects: Systems and Control (eess.SY)
We illustrate a novel version of Willems' lemma for the data-based representation of continuous-time systems. Compared to previous works, the main novelties are twofold. First, the proposed framework relies only on measured input-output trajectories from the system; no internal (state) information is required. Second, our system representation makes use of exact system trajectories, without resorting to orthogonal basis representations and the consequent approximations. We first establish necessary and sufficient conditions for the data-based generation of system trajectories in terms of suitable latent variables. Subsequently, we reformulate these conditions using measured input-output data and show how to span the full behavior of the system. Furthermore, we show how to use the developed framework to solve the data-based continuous-time simulation problem.
- [32] arXiv:2405.15493 [pdf, ps, other]
-
Title: Design and Implementation of DC-DC Buck Converter based on Deep Neural Network Sliding Mode ControlSubjects: Systems and Control (eess.SY)
In order to address the challenge that traditional sliding mode controllers struggle to balance suppressing system jitter against accelerating convergence, a deep neural network (DNN)-based sliding mode control strategy is proposed in this paper. The strategy achieves dynamic adjustment of parameters by modelling and learning the system through deep neural networks, which suppresses the system jitter while ensuring the convergence speed of the system. To demonstrate the stability of the system, a Lyapunov function is designed to prove the stability of the mathematical model of the DNN-based sliding mode control strategy for the DC-DC buck switching power supply. We adopt a double closed-loop control mode that combines sliding mode control of the voltage inner loop with PI control of the current outer loop. Simultaneously, the DNN's performance is evaluated through simulation and hardware experiments and compared with conventional control methods. The results demonstrate that the DNN-based sliding mode controller exhibits faster system convergence, enhanced jitter suppression capability, and greater robustness.
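A hypothetical toy example (not the paper's converter model) showing the classic sliding mode behavior the abstract refers to: for a first-order plant $\dot{x} = u$, the law $u = -k\,\mathrm{sgn}(x)$ drives the state to the sliding surface $x = 0$ in finite time, but the discontinuous sign term causes chattering, the "jitter" that adaptive gain tuning tries to suppress. The gains and step size below are made up.

```python
import numpy as np

# Hypothetical sketch: Euler simulation of dx/dt = u with the sliding mode
# law u = -k * sign(x). After reaching the surface x = 0, the state can only
# chatter within one step of size k * dt around it.

def simulate(x0=1.0, k=2.0, dt=1e-3, steps=2000):
    x = x0
    for _ in range(steps):
        u = -k * np.sign(x)   # discontinuous control: source of chattering
        x += u * dt           # forward Euler integration step
    return x

print(abs(simulate()))  # final state stays within about k*dt of zero
```

In practice the sign function is often replaced by a saturation or sigmoid, or the gain `k` is scheduled; the paper's DNN-based approach learns such an adjustment online.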
- [33] arXiv:2405.15500 [pdf, ps, html, other]
-
Title: Hierarchical Loss And Geometric Mask Refinement For Multilabel Ribs SegmentationComments: Accepted to IEEE ISBI 2024Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Automatic rib segmentation and numeration can increase the speed of computed tomography assessment and reduce radiologists' mistakes. We introduce a model for multilabel rib segmentation with a hierarchical loss function, which improves multilabel segmentation quality. We also propose a postprocessing technique to further increase labeling quality. Our model achieved a new state-of-the-art 98.2% label accuracy on the public RibSeg v2 dataset, surpassing the previous result by 6.7%.
- [34] arXiv:2405.15517 [pdf, ps, html, other]
-
Title: Erase to Enhance: Data-Efficient Machine Unlearning in MRI ReconstructionComments: The paper is accpeted by MIDL 2024Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Machine unlearning is a promising paradigm for removing unwanted data samples from a trained model, towards ensuring compliance with privacy regulations and limiting harmful biases. Although unlearning has been shown in, e.g., classification and recommendation systems, its potential in medical image-to-image translation, specifically in image reconstruction, has not been thoroughly investigated. This paper shows that machine unlearning is possible in MRI tasks and has the potential to benefit bias removal. We set up a protocol to study how much shared knowledge exists between datasets of different organs, allowing us to effectively quantify the effect of unlearning. Our study reveals that combining training data can lead to hallucinations and reduced image quality in the reconstructed data. We use unlearning to remove hallucinations as a proxy exemplar of undesired data removal. Indeed, we show that machine unlearning is possible without full retraining. Furthermore, our observations indicate that maintaining high performance is feasible even when using only a subset of the retain data. We have made our code publicly accessible.
- [35] arXiv:2405.15553 [pdf, ps, html, other]
-
Title: Massive MIMO-ISAC System With 1-Bit ADCs/DACsSubjects: Signal Processing (eess.SP)
This paper investigates a hardware-efficient massive multiple-input multiple-output integrated sensing and communication (MIMO-ISAC) system with 1-bit analog-to-digital converters (ADCs)/digital-to-analog converters (DACs). The proposed system, referred to as 1BitISAC, employs 1-bit DACs at the ISAC transmitter and 1-bit ADCs at the sensing receiver, achieving significant reductions in power consumption and hardware costs. For such systems, two 1BitISAC joint transceiver designs, i.e., i) a quality-of-service constrained 1BitISAC design and ii) a quality-of-detection constrained design, are considered and the corresponding problems are formulated. To address these problems, we thoroughly analyze the radar detection performance after 1-bit ADC quantization and the communication bit error rate. This analysis yields new design insights and leads to unique radar and communication metrics, which enable us to simplify the original problems and employ majorization-minimization and integer linear programming methods to solve them. Numerical results are provided to validate the performance analysis of the proposed 1BitISAC and to compare it with other ISAC configurations. The superiority of the proposed 1BitISAC system in balancing ISAC performance and energy efficiency is also demonstrated.
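As a hypothetical illustration (not the authors' transceiver design), the 1-bit quantization the system is built around is simply a complex sign operation: each ADC keeps only the signs of the in-phase and quadrature components, which is what makes the hardware so cheap and also what the detection analysis has to account for.

```python
import numpy as np

# Hypothetical sketch: a 1-bit complex quantizer modeling a pair of 1-bit
# ADCs on the I and Q branches, scaled so every output sample has unit power.

def one_bit_quantize(x):
    return (np.sign(x.real) + 1j * np.sign(x.imag)) / np.sqrt(2)

rng = np.random.default_rng(1)
x = rng.normal(size=8) + 1j * rng.normal(size=8)   # synthetic received samples
y = one_bit_quantize(x)
print(np.allclose(np.abs(y), 1.0))  # every quantized sample has unit modulus
```

All amplitude information is discarded, so only phase-quadrant information survives; recovering detection and communication performance despite this is exactly the design problem the paper studies.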
- [36] arXiv:2405.15607 [pdf, ps, html, other]
-
Title: Channel Estimation and Reconstruction in Fluid Antenna System: Oversampling is EssentialComments: 12 pages, 14 figures - including subfigures. Submitted for potential publicationSubjects: Signal Processing (eess.SP)
Fluid antenna system (FAS) has recently surfaced as a promising technology for the upcoming sixth generation (6G) wireless networks. Unlike a traditional antenna system (TAS) with a fixed antenna location, FAS introduces a flexible component whose radiating element can switch its position within a predefined space. This capability allows FAS to achieve additional diversity and multiplexing gains. Nevertheless, to fully reap the benefits of FAS, obtaining channel state information (CSI) over the predefined space is crucial. In this paper, we explore the interaction between a transmitter equipped with a traditional antenna and a receiver with a fluid antenna over an electromagnetic-compliant channel model. We address the challenges of channel estimation and reconstruction using Nyquist sampling and maximum likelihood estimation (MLE) methods. Our analysis reveals a fundamental tradeoff between the accuracy of the reconstructed channel and the number of estimated channels, indicating that half-wavelength sampling is insufficient for perfect reconstruction and that oversampling is essential to enhance accuracy. Despite its advantages, oversampling can introduce practical challenges. Consequently, we propose a suboptimal sampling distance that facilitates efficient channel reconstruction. In addition, we employ the MLE method to bound the channel estimation error by $\epsilon$ with a specific confidence interval (CI). Our findings enable us to determine the minimum number of estimated channels and the total number of pilot symbols required for efficient channel reconstruction in a given space. Lastly, we investigate the rate performance of FAS and TAS and demonstrate that FAS with imperfect CSI can outperform TAS with perfect CSI.
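A hypothetical arithmetic sketch (not from the paper) of the pilot-cost side of the tradeoff: over a linear space of $W$ wavelengths, half-wavelength sampling needs relatively few channel estimates, while oversampling by a factor $m$ shrinks the spacing to $\lambda/(2m)$ and multiplies the estimate count accordingly.

```python
# Hypothetical sketch: number of channel-estimate positions over a linear
# space of w_wavelengths wavelengths, for a given oversampling factor.

def num_samples(w_wavelengths, oversampling=1):
    spacing = 0.5 / oversampling           # sampling distance in wavelengths
    return int(w_wavelengths / spacing) + 1  # endpoints included

print(num_samples(2))                  # 5 positions at lambda/2 spacing
print(num_samples(2, oversampling=4))  # 17 positions at lambda/8 spacing
```

This linear growth in estimated channels (and hence pilot symbols) is what motivates looking for a suboptimal sampling distance that balances reconstruction accuracy against estimation cost.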
- [37] arXiv:2405.15701 [pdf, ps, html, other]
-
Title: realSEUDO for real-time calcium imaging analysisComments: 20 pages, 8 figuresSubjects: Image and Video Processing (eess.IV); Neurons and Cognition (q-bio.NC); Quantitative Methods (q-bio.QM); Computation (stat.CO)
Closed-loop neuroscience experimentation, where recorded neural activity is used to modify the experiment on the fly, is critical for deducing causal connections and optimizing experimental time. A critical step in creating a closed-loop experiment is real-time inference of neural activity from streaming recordings. One challenging modality for real-time processing is multi-photon calcium imaging (CI). CI enables the recording of activity in large populations of neurons; however, it often requires batch processing of the video data to extract single-neuron activity from the fluorescence videos. We use the recently proposed robust time-trace estimator, the Sparse Emulation of Unused Dictionary Objects (SEUDO) algorithm, as the basis for a new on-line processing algorithm that simultaneously identifies neurons in the fluorescence video and infers their time traces in a way that is robust to as-yet unidentified neurons. To achieve real-time SEUDO (realSEUDO), we optimize the core estimator via both algorithmic improvements and a fast C-based implementation, and create a new cell-finding loop to enable realSEUDO to also identify new cells. We demonstrate comparable performance to offline algorithms (e.g., CNMF), and improved performance over the current on-line approach (OnACID), at speeds of 120 Hz on average.
- [38] arXiv:2405.15717 [pdf, ps, html, other]
-
Title: Integrated Design for Wave Energy Converter Farms: Assessing Plant, Control, Layout, and Site Selection Coupling in the Presence of Irregular WavesComments: 12 pages and 7 figuresSubjects: Systems and Control (eess.SY)
A promising direction towards reducing the levelized cost of energy for wave energy converter (WEC) farms is to improve their performance. WEC design studies generally focus on a single design domain (e.g., geometry, control, or layout) to improve the farm's performance under simplifying assumptions, such as regular waves. This strategy, however, has resulted in design recommendations that are impractical or limited in scope because WEC farms are complex systems that exhibit strong coupling among geometry, control, and layout domains. In addition, the location of the candidate site, which has a large impact on the performance of the farm, is often overlooked. Motivated by some of the limitations observed in WEC literature, this study uses an integrated design framework, based on simultaneous control co-design (CCD) principles, to discuss the impact of site selection and wave type on WEC farm design. Interactions among plant, control, and layout are also investigated and discussed using a wide range of simulations and optimization studies. All of the studies were conducted using frequency-domain heaving cylinder WEC devices within a farm with a linear reactive controller in the presence of irregular probabilistic waves. The results provide high-level guidelines to help the WEC design community move toward an integrated design perspective.
New submissions for Monday, 27 May 2024 (showing 38 of 38 entries)
- [39] arXiv:2405.14882 (cross-list from cs.CV) [pdf, ps, other]
-
Title: LookUp3D: Data-Driven 3D ScanningComments: 10 pages and 4 ancillary filesSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Image and Video Processing (eess.IV)
We introduce a novel calibration and reconstruction procedure for structured light scanning that foregoes explicit point triangulation in favor of a data-driven lookup procedure. The key idea is to sweep a calibration checkerboard over the entire scanning volume with a linear stage and acquire a dense stack of images to build a per-pixel lookup table from colors to depths. Imperfections in the setup, lens distortion, and sensor defects are baked into the calibration data, leading to a more reliable and accurate reconstruction. Existing structured light scanners can be reused without modifications while enjoying the superior precision and resilience that our calibration and reconstruction algorithms offer. Our algorithm shines when paired with a custom-designed analog projector, which enables 1-megapixel high-speed 3D scanning at up to 500 fps. We describe our algorithm and hardware prototype for high-speed 3D scanning and compare them with commercial and open-source structured light scanning methods.
- [40] arXiv:2405.15033 (cross-list from cs.CV) [pdf, ps, html, other]
-
Title: Generating camera failures as a class of physics-based adversarial examplesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
While there has been extensive work on generating physics-based adversarial samples recently, an overlooked class of such samples comes from physical failures in the camera. Camera failures can occur as a result of an external physical process, i.e. breakdown of a component due to stress, or an internal component failure. In this work, we develop a simulated physical process for generating broken lenses as a class of physics-based adversarial samples. We create a stress-based physical simulation by generating particles constrained in a mesh and applying stress at a random point and a random angle. We propagate the stress through the mesh, and the end result is a corresponding image that simulates the broken-lens pattern. We also develop a neural emulator that learns the non-linear mapping between the mesh, as a graph, and the stress propagation using a constrained propagation setup. We can then statistically compare the generated adversarial samples with real, simulated and emulated adversarial examples, using the detection failure rate of the different classes and, between the samples, the Frechet Inception distance. Our goal through this work is to provide a robust physics-based process for generating adversarial samples.
- [41] arXiv:2405.15096 (cross-list from cs.SD) [pdf, ps, html, other]
-
Title: Music Genre Classification: Training an AI model
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Music genre classification is an area that applies machine learning models and techniques to the processing of audio signals, with applications ranging from content recommendation to music recommendation systems. In this research I explore various machine learning algorithms for the purpose of music genre classification, using features extracted from audio signals. The systems are a Multilayer Perceptron (built from scratch), a k-Nearest Neighbours classifier (also built from scratch), a Convolutional Neural Network, and lastly a Random Forest wide model. To process the audio signals, feature extraction methods such as the Short-Time Fourier Transform and the extraction of Mel-Frequency Cepstral Coefficients (MFCCs) are performed. Through this extensive research, I aim to assess the robustness of machine learning models for genre classification and to compare their results.
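As a sketch of the feature-extraction step described above, the Short-Time Fourier Transform can be computed with plain NumPy; the frame length, hop size, and test tone below are illustrative choices, not the paper's settings.

```python
import numpy as np

def stft_mag(signal, frame_len=512, hop=256):
    """Magnitude STFT: Hann-windowed frames followed by a real FFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))   # (n_frames, frame_len // 2 + 1)

# Illustrative input: a 440 Hz tone at 16 kHz; its energy should concentrate
# near FFT bin 440 / 16000 * 512 ≈ 14.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
S = stft_mag(tone)
peak_bin = int(S.mean(axis=0).argmax())
```

MFCCs would be obtained from this magnitude spectrogram by applying a mel filterbank, a log, and a discrete cosine transform.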
- [42] arXiv:2405.15101 (cross-list from cs.RO) [pdf, ps, html, other]
-
Title: Social Zone as a Barrier Function for Socially-Compliant Robot Navigation
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
This study addresses the challenge of integrating social norms into robot navigation, which is essential for ensuring that robots operate safely and efficiently in human-centric environments. Social norms, often unspoken and implicitly understood among people, are difficult to explicitly define and implement in robotic systems. To overcome this, we derive these norms from real human trajectory data, utilizing the comprehensive ATC dataset to identify the minimum social zones humans and robots must respect. These zones are integrated into the robot's navigation system by applying barrier functions, ensuring the robot consistently remains within the designated safety set. Simulation results demonstrate that our system effectively mimics human-like navigation strategies, such as passing on the right side and adjusting speed or pausing in constrained spaces. The proposed framework is versatile, easily comprehensible, and tunable, demonstrating the potential to advance the development of robots designed to navigate effectively in human-centric environments.
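The abstract does not spell out its barrier-function formulation; the following is a generic control-barrier-function sketch under assumed single-integrator robot dynamics, a static human, and a circular social zone, with `d_min` and `alpha` as illustrative parameters.

```python
import numpy as np

def cbf_filter(p_robot, p_human, u_des, d_min=1.2, alpha=1.0):
    """Minimally modify a desired velocity so the social zone stays invariant.
    Barrier h(p) = ||p - p_h||^2 - d_min^2; safety condition dh/dt >= -alpha*h,
    which for a single integrator reads a @ u >= b."""
    diff = p_robot - p_human
    h = diff @ diff - d_min ** 2
    a = 2 * diff                  # gradient of h, dotted with the velocity u
    b = -alpha * h
    if a @ u_des >= b:            # desired command is already safe
        return u_des
    # closed-form solution of the one-constraint QP min ||u - u_des||^2
    return u_des + (b - a @ u_des) / (a @ a) * a

# Robot 2 m from the human, commanded straight at them: the filter slows it down.
u = cbf_filter(np.array([0.0, 0.0]), np.array([2.0, 0.0]), np.array([1.0, 0.0]))
```

With these numbers the unsafe command `[1, 0]` is scaled back to `[0.64, 0]`, so the robot decelerates as the social zone boundary approaches.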
- [43] arXiv:2405.15103 (cross-list from cs.SD) [pdf, ps, html, other]
-
Title: The Rarity of Musical Audio Signals Within the Space of Possible Audio Generation
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
A white noise signal can access any possible configuration of values, though statistically over many samples it tends to a uniform spectral distribution, and is highly unlikely to produce intelligible sound. But how unlikely? The probability that white noise generates a music-like signal over different durations is analyzed, based on some necessary features observed in real music audio signals, such as mostly proximate movement and zero crossing rate. Given the mathematical results, the rarity of music as a signal is considered overall. The applicability of this study is not just to show that music has a precious rarity value, but that examining the size of music relative to the overall size of audio signal space provides information to inform new generations of algorithmic music systems (which are now often founded on audio signal generation directly, and may relate to white noise via such machine learning processes as diffusion). Estimated upper bounds on the rarity of music are compared to the size of various physical and musical spaces, to better understand the magnitude of the results (pun intended). Underlying the research are the questions 'how much music is still out there?' and 'how much music could a machine learning process actually reach?'.
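A minimal Monte Carlo illustration of the idea: test short white-noise clips against two crude necessary conditions named in the abstract (mostly proximate sample-to-sample movement and a bounded zero crossing rate). The thresholds are illustrative assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def looks_music_like(x, max_step=0.3, max_zcr=0.4):
    """Two crude necessary conditions: mostly proximate sample-to-sample
    movement, and a bounded zero crossing rate."""
    proximate = np.mean(np.abs(np.diff(x)) < max_step)
    zcr = np.mean(np.sign(x[1:]) != np.sign(x[:-1]))
    return proximate > 0.95 and zcr < max_zcr

# White noise essentially never passes even these weak tests; a smooth tone does.
noise_hits = np.mean([looks_music_like(rng.uniform(-1, 1, 256))
                      for _ in range(500)])
tone = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 256))
```

Even at 256 samples the pass rate for noise is effectively zero, hinting at how quickly the music-like fraction of signal space shrinks with duration.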
- [44] arXiv:2405.15163 (cross-list from quant-ph) [pdf, ps, html, other]
-
Title: Provably Quantum-Secure Microgrids through Enhanced Quantum Distributed Control
Subjects: Quantum Physics (quant-ph); Systems and Control (eess.SY)
Distributed control of multi-inverter microgrids has attracted considerable attention as it can achieve the combined goals of flexible plug-and-play architecture guaranteeing frequency and voltage regulation while preserving power sharing among nonidentical distributed energy resources (DERs). However, cybersecurity has emerged as a serious concern in distributed control schemes. Inspired by quantum communication developments and their security advantages, this paper devises a scalable quantum distributed controller that can guarantee synchronization and power sharing among DERs. The key innovation lies in the fact that the new quantum distributed scheme allows for exchanging secret information directly through quantum channels among the participating DERs, making microgrids inherently cybersecure. Case studies on two ac and dc microgrids verify the efficacy of the new quantum distributed control strategy.
- [45] arXiv:2405.15216 (cross-list from cs.LG) [pdf, ps, html, other]
-
Title: Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition
Comments: under review
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Language models (LMs) have long been used to improve results of automatic speech recognition (ASR) systems, but they are unaware of the errors that ASR systems make. Error correction models are designed to fix ASR errors; however, they have shown little improvement over traditional LMs, mainly due to the lack of supervised training data. In this paper, we present Denoising LM (DLM), which is a $\textit{scaled}$ error correction model trained with vast amounts of synthetic data, significantly exceeding prior attempts while achieving new state-of-the-art ASR performance. We use text-to-speech (TTS) systems to synthesize audio, which is fed into an ASR system to produce noisy hypotheses, which are then paired with the original texts to train the DLM. DLM has several $\textit{key ingredients}$: (i) up-scaled model and data; (ii) usage of multi-speaker TTS systems; (iii) combination of multiple noise augmentation strategies; and (iv) new decoding techniques. With a Transformer-CTC ASR, DLM achieves 1.5% word error rate (WER) on $\textit{test-clean}$ and 3.3% WER on $\textit{test-other}$ on Librispeech, which to our knowledge are the best reported numbers in the setting where no external audio data are used, and even match self-supervised methods which use external audio data. Furthermore, a single DLM is applicable to different ASRs and greatly surpasses the performance of conventional LM-based beam-search rescoring. These results indicate that properly investigated error correction models have the potential to replace conventional LMs, holding the key to a new level of accuracy in ASR systems.
- [46] arXiv:2405.15336 (cross-list from cs.RO) [pdf, ps, other]
-
Title: An iterative closest point algorithm for marker-free 3D shape registration of continuum robots
Comments: 11 pages, 8 figures, 2 algorithms, journal
Subjects: Robotics (cs.RO); Image and Video Processing (eess.IV)
Continuum robots have emerged as a promising technology in the medical field due to their potential to access deep-seated locations of the human body with low surgical trauma. When deriving physics-based models for these robots, evaluating the models poses a significant challenge due to the difficulty in accurately measuring their intricate shapes. In this work, we present an optimization-based 3D shape registration algorithm for estimating the backbone shape of slender continuum robots as part of a photogrammetric measurement. Our approach estimates the backbone by optimally matching a parametric three-dimensional curve to images of the robot. Since we incorporate an iterative closest point algorithm into our method, we do not need prior knowledge of the robot's position within the respective images. In our experiments with artificial and real images of a concentric tube continuum robot, we found an average maximum deviation of the reconstruction of 0.665 mm from simulation data and 0.939 mm from manual measurements. These results show that our algorithm is capable of producing high-accuracy positional data from images of continuum robots.
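A minimal 2D sketch of the iterative-closest-point building block (nearest-neighbour correspondences alternated with a Kabsch rigid-alignment update). The point clouds and transform below are synthetic; the paper's actual method fits a parametric curve to images rather than aligning point sets.

```python
import numpy as np

def best_rigid(A, B):
    """Least-squares R, t with R @ A_i + t ≈ B_i (Kabsch algorithm)."""
    ca, cb = A.mean(0), B.mean(0)
    U, _, Vt = np.linalg.svd((A - ca).T @ (B - cb))
    R = (U @ Vt).T
    if np.linalg.det(R) < 0:          # guard against reflections
        Vt[-1] *= -1
        R = (U @ Vt).T
    return R, cb - R @ ca

def icp(src, dst, iters=50):
    """Alternate brute-force nearest-neighbour matching with a Kabsch update."""
    cur = src.copy()
    for _ in range(iters):
        nn = dst[((cur[:, None] - dst[None]) ** 2).sum(-1).argmin(1)]
        R, t = best_rigid(cur, nn)
        cur = cur @ R.T + t
    return cur

# Synthetic demo: a cloud and a slightly rotated/translated copy of it.
rng = np.random.default_rng(1)
src = rng.uniform(0, 1, (30, 2))
theta, c = 0.1, src.mean(0)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
dst = (src - c) @ R_true.T + c + np.array([0.03, -0.02])
aligned = icp(src, dst)
```

No correspondence labels (markers) are given to `icp`; matching and alignment are discovered jointly, which is the property the abstract relies on.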
- [47] arXiv:2405.15338 (cross-list from cs.SD) [pdf, ps, html, other]
-
Title: SoundLoCD: An Efficient Conditional Discrete Contrastive Latent Diffusion Model for Text-to-Sound Generation
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
We present SoundLoCD, a novel text-to-sound generation framework, which incorporates a LoRA-based conditional discrete contrastive latent diffusion model. Unlike recent large-scale sound generation models, our model can be efficiently trained under limited computational resources. The integration of a contrastive learning strategy further enhances the connection between text conditions and the generated outputs, resulting in coherent and high-fidelity performance. Our experiments demonstrate that SoundLoCD outperforms the baseline with greatly reduced computational resources. A comprehensive ablation study further validates the contribution of each component within SoundLoCD. Demo page: \url{this https URL}.
- [48] arXiv:2405.15381 (cross-list from cs.AR) [pdf, ps, html, other]
-
Title: Single-Event Upset Analysis of a Systolic Array based Deep Neural Network Accelerator
Comments: This work has been submitted to RADECS 2024 for possible publication
Subjects: Hardware Architecture (cs.AR); Signal Processing (eess.SP)
Deep Neural Network (DNN) accelerators are extensively used to improve the computational efficiency of DNNs, but are prone to faults through Single-Event Upsets (SEUs). In this work, we present an in-depth analysis of the impact of SEUs on a Systolic Array (SA) based DNN accelerator. A fault injection campaign is performed through a Register-Transfer Level (RTL) based simulation environment to improve the observability of each hardware block, including the SA itself as well as the post-processing pipeline. From this analysis, we present the sensitivity, independent of a DNN model architecture, for various flip-flop groups both in terms of fault propagation probability and fault magnitude. This allows us to draw detailed conclusions and determine optimal mitigation strategies.
- [49] arXiv:2405.15415 (cross-list from cs.IT) [pdf, ps, other]
-
Title: Semi-Supervised Learning via Cross-Prediction-Powered Inference for Wireless Systems
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
In many wireless application scenarios, acquiring labeled data can be prohibitively costly, requiring complex optimization processes or measurement campaigns. Semi-supervised learning leverages unlabeled samples to augment the available dataset by assigning synthetic labels obtained via machine learning (ML)-based predictions. However, treating the synthetic labels as true labels may yield worse-performing models as compared to models trained using only labeled data. Inspired by the recently developed prediction-powered inference (PPI) framework, this work investigates how to leverage the synthetic labels produced by an ML model, while accounting for the inherent bias with respect to true labels. To this end, we first review PPI and its recent extensions, namely tuned PPI and cross-prediction-powered inference (CPPI). Then, we introduce a novel variant of PPI, referred to as tuned CPPI, that provides CPPI with an additional degree of freedom in adapting to the quality of the ML-based labels. Finally, we showcase two applications of PPI-based techniques in wireless systems, namely beam alignment based on channel knowledge maps in millimeter-wave systems and received signal strength information-based indoor localization. Simulation results show the advantages of PPI-based techniques over conventional approaches that rely solely on labeled data or that apply standard pseudo-labeling strategies from semi-supervised learning. Furthermore, the proposed tuned CPPI method is observed to guarantee the best performance among all benchmark schemes, especially in the regime of limited labeled data.
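The basic PPI mean estimator can be sketched in a few lines: average the model's predictions on the large unlabelled set, then subtract the prediction bias measured on the labelled set. The synthetic data below (a prediction bias of +0.5) is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def ppi_mean(y_lab, yhat_lab, yhat_unlab):
    """Prediction-powered estimate of E[Y]: mean prediction on the unlabelled
    set, debiased by the prediction error observed on the labelled set."""
    return yhat_unlab.mean() - (yhat_lab - y_lab).mean()

# Synthetic example: true mean 2.0, ML predictions biased by +0.5
n_lab, n_unlab = 50, 5000
y_lab = rng.normal(2.0, 1.0, n_lab)
y_unlab = rng.normal(2.0, 1.0, n_unlab)
yhat_lab = y_lab + 0.5 + rng.normal(0, 0.1, n_lab)
yhat_unlab = y_unlab + 0.5 + rng.normal(0, 0.1, n_unlab)

naive = yhat_unlab.mean()                       # inherits the +0.5 bias
ppi = ppi_mean(y_lab, yhat_lab, yhat_unlab)     # bias is cancelled
```

Tuned PPI and CPPI refine this recipe by weighting the correction term and by cross-fitting the predictor, but the debiasing structure is the same.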
- [50] arXiv:2405.15438 (cross-list from cs.CV) [pdf, ps, html, other]
-
Title: Comparing remote sensing-based forest biomass mapping approaches using new forest inventory plots in contrasting forests in northeastern and southwestern China
Wenquan Dong, Edward T.A. Mitchard, Yuwei Chen, Man Chen, Congfeng Cao, Peilun Hu, Cong Xu, Steven Hancock
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Large-scale high spatial resolution aboveground biomass (AGB) maps play a crucial role in determining forest carbon stocks and how they are changing, which is instrumental in understanding the global carbon cycle, and implementing policy to mitigate climate change. The advent of the new space-borne LiDAR sensor, NASA's GEDI instrument, provides unparalleled possibilities for the accurate and unbiased estimation of forest AGB at high resolution, particularly in dense and tall forests, where Synthetic Aperture Radar (SAR) and passive optical data exhibit saturation. However, GEDI is a sampling instrument, collecting dispersed footprints, and its data must be combined with that from other continuous cover satellites to create high-resolution maps, using local machine learning methods. In this study, we developed local models to estimate forest AGB from GEDI L2A data, as the models used to create GEDI L4 AGB data incorporated minimal field data from China. We then applied LightGBM and random forest regression to generate wall-to-wall AGB maps at 25 m resolution, using extensive GEDI footprints as well as Sentinel-1 data, ALOS-2 PALSAR-2 and Sentinel-2 optical data. Through a 5-fold cross-validation, LightGBM demonstrated a slightly better performance than Random Forest across two contrasting regions. However, in both regions, the computation speed of LightGBM is substantially faster than that of the random forest model, requiring roughly one-third of the time to compute on the same hardware. Through the validation against field data, the 25 m resolution AGB maps generated using the local models developed in this study exhibited higher accuracy compared to the GEDI L4B AGB data. We found in both regions an increase in error as slope increased. The trained models were tested on nearby but different regions and exhibited good performance.
- [51] arXiv:2405.15454 (cross-list from cs.CL) [pdf, ps, html, other]
-
Title: Linearly Controlled Language Generation with Performative Guarantees
Subjects: Computation and Language (cs.CL); Systems and Control (eess.SY)
The increasing prevalence of Large Language Models (LMs) in critical applications highlights the need for controlled language generation strategies that are not only computationally efficient but that also enjoy performance guarantees. To achieve this, we use a common model of concept semantics as linearly represented in an LM's latent space. In particular, we take the view that natural language generation traces a trajectory in this continuous semantic space, realized by the language model's hidden activations. This view permits a control-theoretic treatment of text generation in latent space, in which we propose a lightweight, gradient-free intervention that dynamically steers trajectories away from regions corresponding to undesired meanings. Crucially, we show that this intervention, which we compute in closed form, is guaranteed (in probability) to steer the output into the allowed region. Finally, we demonstrate on a toxicity avoidance objective that the intervention steers language away from undesired content while maintaining text quality.
- [52] arXiv:2405.15477 (cross-list from cs.CV) [pdf, ps, other]
-
Title: MagicBathyNet: A Multimodal Remote Sensing Dataset for Bathymetry Prediction and Pixel-based Classification in Shallow Waters
Comments: 5 pages, 3 figures, 5 tables. Accepted at IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Accurate, detailed, and high-frequency bathymetry, coupled with complex semantic content, is crucial for the undermapped shallow seabed areas facing intense climatological and anthropogenic pressures. Current methods exploiting remote sensing images to derive bathymetry or seabed classes mainly exploit non-open data. This lack of openly accessible benchmark archives prevents the wider use of deep learning methods in such applications. To address this issue, in this paper we present MagicBathyNet, a benchmark dataset made up of image patches of Sentinel-2, SPOT-6 and aerial imagery, bathymetry in raster format, and annotations of seabed classes. MagicBathyNet is then exploited to benchmark state-of-the-art methods in learning-based bathymetry and pixel-based classification. Dataset, pre-trained weights, and code are publicly available at this http URL.
- [53] arXiv:2405.15519 (cross-list from physics.optics) [pdf, ps, other]
-
Title: Confocal structured illumination microscopy
Subjects: Optics (physics.optics); Image and Video Processing (eess.IV)
Confocal microscopy, a critical advancement in optical imaging, is widely applied because of its excellent anti-noise ability. However, it has low imaging efficiency and can cause phototoxicity. Optical-sectioning structured illumination microscopy (OS-SIM) can overcome the limitations of confocal microscopy but still faces challenges in imaging depth and signal-to-noise ratio (SNR). We introduce the concept of confocal imaging into OS-SIM and propose confocal structured illumination microscopy (CSIM) to enhance the imaging performance of OS-SIM. CSIM exploits the principle of dual photography to reconstruct a dual image from each pixel of the camera. The reconstructed dual image is equivalent to the image obtained by using the spatial light modulator (SLM) as a virtual camera, enabling the separation of the conjugate and non-conjugate signals recorded by the camera pixel. Once the conjugate relationship between the camera and the SLM is established, we can reject the non-conjugate signals by extracting the conjugate signal from each dual image to reconstruct a confocal image. We have constructed the theoretical framework of CSIM. Optical-sectioning experimental results demonstrate that CSIM can reconstruct images with superior SNR and greater imaging depth compared with existing OS-SIM. CSIM is expected to expand the application scope of OS-SIM.
- [54] arXiv:2405.15542 (cross-list from cs.NI) [pdf, ps, html, other]
-
Title: SATSense: Multi-Satellite Collaborative Framework for Spectrum Sensing
Comments: 13 pages, 16 figures
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Signal Processing (eess.SP)
Low Earth Orbit satellite Internet has recently been deployed, providing worldwide service with non-terrestrial networks. With the large-scale deployment of both non-terrestrial and terrestrial networks, limited spectrum resources will no longer suffice for exclusive allocation. Consequently, dynamic spectrum sharing is crucial for their coexistence in the same spectrum, where accurate spectrum sensing is essential. However, spectrum sensing in space is more challenging than in terrestrial networks due to variable channel conditions, making single-satellite sensing unstable. Therefore, we first attempt to design a collaborative sensing scheme utilizing diverse data from multiple satellites. However, it is non-trivial to achieve this collaboration due to heterogeneous channel quality, considerable raw sampling data, and packet loss. To address the above challenges, we first establish connections between the satellites by modeling their sensing data as a graph and devising a graph neural network-based algorithm to achieve effective spectrum sensing. Meanwhile, we establish a joint sub-Nyquist sampling and autoencoder data compression framework to reduce the amount of transmitted sensing data. Finally, we propose a contrastive learning-based mechanism that compensates for missing packets. Extensive experiments demonstrate that our proposed strategy can achieve efficient spectrum sensing performance and outperform the conventional deep learning algorithm in spectrum sensing accuracy.
- [55] arXiv:2405.15550 (cross-list from cs.CV) [pdf, ps, other]
-
Title: CowScreeningDB: A public benchmark dataset for lameness detection in dairy cows
Journal-ref: Computers and Electronics in Agriculture, vol.216, pp.108500, 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Lameness is one of the costliest pathological problems affecting dairy animals. It is usually assessed by trained veterinary clinicians who observe features such as gait symmetry or gait parameters such as step counts in real time. With the development of artificial intelligence, various modular systems have been proposed to minimize subjectivity in lameness assessment. However, the major limitation in their development is the unavailability of a public dataset, as existing ones are either commercial or privately held. To tackle this limitation, we have introduced CowScreeningDB, which was created using sensory data. This dataset was sourced from 43 cows at a dairy located in Gran Canaria, Spain. It consists of a multi-sensor dataset built on data collected using an Apple Watch 6 during the normal daily routine of a dairy cow. The documented collection environment, sampling technique, sensor information, and applications used for data conversion and storage make the dataset a transparent one. This transparency allows further development of techniques for lameness detection in dairy cows that can be objectively compared. Aside from the public sharing of the dataset, we have also shared a machine-learning technique that classifies the cows as healthy or lame using the raw sensory data, thereby validating the major objective, which is to establish the relationship between sensor data and lameness.
- [56] arXiv:2405.15570 (cross-list from cs.NI) [pdf, ps, html, other]
-
Title: Multi-Gigabit Interactive Extended Reality over Millimeter-Wave: An End-to-End System Approach
Comments: Accepted at IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC) 2024
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Achieving high-quality wireless interactive Extended Reality (XR) will require multi-gigabit throughput at extremely low latency. The Millimeter-Wave (mmWave) frequency bands, between 24 and 300 GHz, can achieve such extreme performance. However, maintaining a consistently high Quality of Experience with highly mobile users is challenging, as mmWave communications are inherently directional. In this work, we present and evaluate an end-to-end approach to such a mmWave-based mobile XR system. We perform a highly realistic simulation of the system, incorporating accurate XR data traffic, detailed mmWave propagation models and actual user motion. We evaluate the impact of the beamforming strategy and frequency on the overall performance. In addition, we provide the first system-level evaluation of the CoVRage algorithm, a proactive and spatially aware user-side beamforming approach designed specifically for highly mobile XR environments.
- [57] arXiv:2405.15655 (cross-list from cs.SD) [pdf, ps, html, other]
-
Title: HiddenSpeaker: Generate Imperceptible Unlearnable Audios for Speaker Verification System
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
In recent years, the remarkable advancements in deep neural networks have brought tremendous convenience. However, the training process of a highly effective model necessitates a substantial quantity of samples, which brings huge potential threats, like unauthorized exploitation with privacy leakage. In response, we propose a framework named HiddenSpeaker, embedding imperceptible perturbations within the training speech samples and rendering them unlearnable for deep-learning-based speaker verification systems that employ large-scale speakers for efficient training. The HiddenSpeaker utilizes a simplified error-minimizing method named Single-Level Error-Minimizing (SLEM) to generate specific and effective perturbations. Additionally, a hybrid objective function is employed for human perceptual optimization, ensuring the perturbation is indistinguishable from human listeners. We conduct extensive experiments on multiple state-of-the-art (SOTA) models in the speaker verification domain to evaluate HiddenSpeaker. Our results demonstrate that HiddenSpeaker not only deceives the model with unlearnable samples but also enhances the imperceptibility of the perturbations, showcasing strong transferability across different models.
- [58] arXiv:2405.15705 (cross-list from cs.AR) [pdf, ps, html, other]
-
Title: Sums: Sniffing Unknown Multiband Signals under Low Sampling Rates
Jinbo Peng, Zhe Chen, Zheng Lin, Haoxuan Yuan, Zihan Fang, Lingzhong Bao, Zihang Song, Ying Li, Jing Ren, Yue Gao
Comments: 12 pages, 9 figures
Subjects: Hardware Architecture (cs.AR); Systems and Control (eess.SY)
Due to sophisticated deployments of all kinds of wireless networks (e.g., 5G, Wi-Fi, Bluetooth, LEO satellite, etc.), multiband signals are distributed across a large bandwidth (e.g., from 70 MHz to 8 GHz). Consequently, for network monitoring and spectrum sharing applications, a sniffer that extracts physical layer information, such as packet structure, at a low sampling rate (in particular, sub-Nyquist sampling) can significantly improve their cost- and energy-efficiency. However, building such a multiband signal sniffer is a real challenge. To this end, we propose Sums, a system that can sniff and analyze multiband signals in a blind manner. Sums takes advantage of hardware and algorithm co-design: multi-coset sub-Nyquist sampling hardware and a multi-task deep learning framework. The hardware component breaks the Nyquist rule to sample GHz of bandwidth while only paying for a 50 MSPS sampling rate. Our multi-task learning framework directly tackles the sampled data to perform spectrum sensing, physical layer protocol recognition, and demodulation for deep inspection of multiband signals. Extensive experiments demonstrate that Sums achieves higher accuracy than the state-of-the-art baselines in spectrum sensing, modulation classification, and demodulation. As a result, Sums can help researchers and end-users diagnose or troubleshoot problems in their wireless infrastructure deployments in practice.
- [59] arXiv:2405.15719 (cross-list from cs.CV) [pdf, ps, other]
-
Title: Hierarchical Uncertainty Exploration via Feedforward Posterior Trees
Comments: 32 pages, 21 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
When solving ill-posed inverse problems, one often desires to explore the space of potential solutions rather than be presented with a single plausible reconstruction. Valuable insights into these feasible solutions and their associated probabilities are embedded in the posterior distribution. However, when confronted with data of high dimensionality (such as images), visualizing this distribution becomes a formidable challenge, necessitating the application of effective summarization techniques before user examination. In this work, we introduce a new approach for visualizing posteriors across multiple levels of granularity using tree-valued predictions. Our method predicts a tree-valued hierarchical summarization of the posterior distribution for any input measurement, in a single forward pass of a neural network. We showcase the efficacy of our approach across diverse datasets and image restoration challenges, highlighting its prowess in uncertainty quantification and visualization. Our findings reveal that our method performs comparably to a baseline that hierarchically clusters samples from a diffusion-based posterior sampler, yet achieves this with orders of magnitude greater speed.
- [60] arXiv:2405.15731 (cross-list from cs.LG) [pdf, ps, html, other]
-
Title: Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Softmax attention is the principal backbone of foundation models for various artificial intelligence applications, yet its quadratic complexity in sequence length can limit its inference throughput in long-context settings. To address this challenge, alternative architectures such as linear attention, State Space Models (SSMs), and Recurrent Neural Networks (RNNs) have been considered as more efficient alternatives. While connections between these approaches exist, such models are commonly developed in isolation and there is a lack of theoretical understanding of the shared principles underpinning these architectures and their subtle differences, greatly influencing performance and scalability. In this paper, we introduce the Dynamical Systems Framework (DSF), which allows a principled investigation of all these architectures in a common representation. Our framework facilitates rigorous comparisons, providing new insights on the distinctive characteristics of each model class. For instance, we compare linear attention and selective SSMs, detailing their differences and conditions under which both are equivalent. We also provide principled comparisons between softmax attention and other model classes, discussing the theoretical conditions under which softmax attention can be approximated. Additionally, we substantiate these new insights with empirical validations and mathematical arguments. This shows the DSF's potential to guide the systematic development of future more efficient and scalable foundation models.
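One of the connections the abstract alludes to can be made concrete in a few lines: unnormalised causal linear attention admits an exactly equivalent recurrent (state-space-like) form that carries a d×d state. The positive feature map `phi` below is an assumption for illustration, not the paper's choice.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 16, 4
Q, K, V = (rng.normal(size=(T, d)) for _ in range(3))
phi = lambda x: np.maximum(x, 0.0) + 1e-3    # assumed positive feature map

# Parallel form: y_t = phi(q_t) @ sum_{s<=t} phi(k_s) v_s^T  (unnormalised)
Y_par = np.zeros((T, d))
for t in range(T):
    S = sum(np.outer(phi(K[s]), V[s]) for s in range(t + 1))
    Y_par[t] = phi(Q[t]) @ S

# Recurrent form: carry a d x d state, one outer-product update per step
S = np.zeros((d, d))
Y_rec = np.zeros((T, d))
for t in range(T):
    S = S + np.outer(phi(K[t]), V[t])
    Y_rec[t] = phi(Q[t]) @ S
```

The recurrent form costs O(d²) per token regardless of context length, which is the efficiency argument behind these attention alternatives.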
- [61] arXiv:2405.15762 (cross-list from math.OC) [pdf, ps, html, other]
-
Title: Sliding-Mode Nash Equilibrium Seeking for a Quadratic Duopoly Game
Comments: 8 pages and 2 figures. arXiv admin note: substantial text overlap with arXiv:2404.07287
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
This paper introduces a new method to achieve stable convergence to Nash equilibrium in duopoly noncooperative games. Inspired by the recent fixed-time Nash Equilibrium seeking (NES) as well as prescribed-time extremum seeking (ES) and source seeking schemes, our approach employs a distributed sliding mode control (SMC) scheme, integrating extremum seeking with sinusoidal perturbation signals to estimate the pseudogradients of quadratic payoff functions. Notably, this is the first attempt to address noncooperative games without relying on models, combining classical extremum seeking with relay components instead of proportional control laws. We prove finite-time convergence of the closed-loop average system to Nash equilibrium using stability analysis techniques such as time-scaling, Lyapunov's direct method, and averaging theory for discontinuous systems. Additionally, we quantify the size of residual sets around the Nash equilibrium and validate our theoretical results through simulations.
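To make the target of such schemes concrete, here is a sketch of gradient play on a quadratic (Cournot-type) duopoly with a closed-form Nash equilibrium; the paper's method instead estimates these pseudogradients model-free via sinusoidal perturbations and sliding-mode relays, and the payoff parameters below are illustrative.

```python
import numpy as np

# Quadratic (Cournot-type) duopoly: J_i(u) = u_i * (a - b * (u1 + u2)) - c * u_i
a, b, c = 10.0, 1.0, 1.0
nash = (a - c) / (3 * b)          # closed-form Nash equilibrium: u1* = u2* = 3

# Gradient play using the exact pseudogradients dJ_i/du_i
u = np.array([0.5, 6.0])
eta = 0.05
for _ in range(2000):
    g = np.array([a - c - 2 * b * u[0] - b * u[1],
                  a - c - 2 * b * u[1] - b * u[0]])
    u = u + eta * g
```

Replacing the exact gradients `g` with perturbation-based estimates recovers the model-free extremum-seeking flavour of the paper.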
Cross submissions for Monday, 27 May 2024 (showing 23 of 23 entries)
- [62] arXiv:2211.17182 (replaced) [pdf, ps, html, other]
-
Title: Direct Data-Driven State-Feedback Control of Linear Parameter-Varying Systems
Comments: 27 pages
Subjects: Systems and Control (eess.SY)
The framework of linear parameter-varying (LPV) systems has been shown to be a powerful tool for the design of controllers for complex nonlinear systems using linear tools. In this work, we derive novel methods that allow the synthesis of LPV state-feedback controllers directly from a single sequence of data and guarantee stability and performance of the closed-loop system, without knowing the model of the plant. We show that if the measured open-loop data from the system satisfies a persistency of excitation condition, then the full open-loop and closed-loop input-scheduling-state behavior can be represented using only the data. With this representation, we formulate synthesis problems that yield controllers guaranteeing stability and performance in terms of the infinite-horizon quadratic cost, the generalized $\mathcal{H}_2$-norm, and the $\ell_2$-gain of the closed-loop system. The controllers are synthesized by solving an SDP with a finite set of LMI constraints. Additionally, we provide a synthesis method to handle noisy measurement data. Competitive performance of the proposed data-driven synthesis methods is demonstrated w.r.t. model-based synthesis with complete knowledge of the true system model in multiple simulation studies, including a nonlinear unbalanced disc system.
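A minimal sketch of the persistency-of-excitation idea this abstract relies on: an input sequence is persistently exciting of order L when its depth-L Hankel matrix has full row rank. The scalar-input test below is the standard version, not the paper's LPV construction:

```python
import numpy as np

def hankel(u, L):
    """Depth-L Hankel matrix of a scalar signal u."""
    N = len(u) - L + 1
    return np.array([u[i:i + N] for i in range(L)])

rng = np.random.default_rng(1)
u_rich = rng.normal(size=50)   # random input: generically exciting
u_poor = np.ones(50)           # constant input: all rows identical, rank 1

L = 5
assert np.linalg.matrix_rank(hankel(u_rich, L)) == L   # persistently exciting
assert np.linalg.matrix_rank(hankel(u_poor, L)) == 1   # not exciting
```

When the rank condition holds, every length-L input-output trajectory of the system can be written as a linear combination of the Hankel matrix columns, which is what lets the data replace the model.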
- [63] arXiv:2301.03701 (replaced) [pdf, ps, html, other]
-
Title: Artificial Intelligence Model for Tumoral Clinical Decision Support SystemsComments: 16 pages, 8 figures, 3 tablesSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)
Comparative diagnosis in brain tumor evaluation makes it possible to use the available information of a medical center to compare similar cases when a new patient is evaluated. By leveraging Artificial Intelligence models, the proposed system is able to retrieve the most similar cases of brain tumors for a given query. The primary objective is to enhance the diagnostic process by generating more accurate representations of medical images, with a particular focus on patient-specific normal features and pathologies. The proposed model uses Artificial Intelligence to detect patient features and recommend the most similar cases from a database. The system not only suggests similar cases but also balances the representation of healthy and abnormal features in its design. This not only encourages the generalization of its use but also aids clinicians in their decision-making processes. We conducted a comparative analysis of our approach in relation to similar studies. The proposed architecture obtains a Dice coefficient of 0.474 in both tumoral and healthy regions of the patients, which outperforms previous literature. Our proposed model excels at extracting and combining anatomical and pathological features from brain MR images, achieving state-of-the-art results while relying on less expensive label information. This substantially reduces the overall cost of the training process. This paper provides substantial grounds for further exploration of the broader applicability and optimization of the proposed architecture to enhance clinical decision-making. The novel approach presented in this work marks a significant advancement in the field of medical diagnosis, particularly in the context of Artificial Intelligence-assisted image retrieval, and promises to reduce costs and improve the quality of patient care using Artificial Intelligence as a support tool instead of a black-box system.
- [64] arXiv:2303.11423 (replaced) [pdf, ps, html, other]
-
Title: Heart Murmur and Abnormal PCG Detection via Wavelet Scattering Transform & a 1D-CNNComments: 11 pages, 8 figures, 10 tables, under review with a journalSubjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
Heart murmurs provide valuable information about the mechanical activity of the heart, which aids in the diagnosis of various heart valve diseases. This work performs automatic and accurate heart murmur detection from phonocardiogram (PCG) recordings. Two public PCG datasets (the CirCor DigiScope 2022 dataset and the PCG 2016 dataset) from the PhysioNet online database are utilized to train and test three custom neural networks (NN): a 1D convolutional neural network (CNN), a long short-term memory (LSTM) recurrent neural network (RNN), and a convolutional RNN (C-RNN). We first perform pre-processing, which includes the following key steps: denoising, segmentation, re-labeling of noise-only segments, data normalization, and time-frequency analysis of the PCG segments using the wavelet scattering transform. We then conduct four experiments, the first three (E1-E3) using the PCG 2022 dataset and the fourth (E4) using the PCG 2016 dataset. It turns out that our custom 1D-CNN outperforms the other two NNs (LSTM-RNN and C-RNN). Further, our 1D-CNN model outperforms the related work in terms of accuracy, weighted accuracy, F1-score, and AUROC for experiment E3 (which utilizes the cleaned and re-labeled PCG 2022 dataset). As for experiment E1 (which utilizes the original PCG 2022 dataset), our model performs quite close to the related work in terms of weighted accuracy and F1-score.
- [65] arXiv:2307.06129 (replaced) [pdf, ps, html, other]
-
Title: Channel Estimation for Beyond Diagonal Reconfigurable Intelligent Surfaces with Group-Connected ArchitecturesComments: 5 pages, 2 figures, accepted by CAMSAP 2023Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
We study channel estimation for a beyond diagonal reconfigurable intelligent surface (BD-RIS) aided multiple input single output system. We first describe the channel estimation strategy based on the least square (LS) method, derive the mean square error (MSE) of the LS estimator, and formulate the BD-RIS design problem that minimizes the estimation MSE with unique constraints induced by group-connected architectures of BD-RIS. Then, we propose an efficient BD-RIS design which theoretically guarantees to achieve the MSE lower bound. Finally, we provide simulation results to verify the effectiveness of the proposed channel estimation scheme.
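As a toy illustration of the least-squares channel estimation and its MSE described above (the training matrix, dimensions, and noise level below are illustrative assumptions, not the paper's BD-RIS setup):

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 64, 8                       # pilot symbols, unknown channel coefficients
A = rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N))  # known training matrix
h = rng.normal(size=N) + 1j * rng.normal(size=N)            # true channel
sigma2 = 0.01                       # per-sample complex noise variance
y = A @ h + np.sqrt(sigma2 / 2) * (rng.normal(size=M) + 1j * rng.normal(size=M))

# Least-squares estimate: h_hat = (A^H A)^{-1} A^H y
h_hat = np.linalg.lstsq(A, y, rcond=None)[0]

# Theoretical MSE of the LS estimator under white noise: sigma2 * tr((A^H A)^{-1})
mse_theory = sigma2 * np.trace(np.linalg.inv(A.conj().T @ A)).real

assert np.linalg.norm(h_hat - h) ** 2 < 0.1
```

The MSE expression is what a training-matrix design (here random, in the paper the BD-RIS configuration under its group-connected constraints) would aim to minimize.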
- [66] arXiv:2310.02708 (replaced) [pdf, ps, html, other]
-
Title: Beyond Diagonal Reconfigurable Intelligent Surfaces with Mutual Coupling: Modeling and OptimizationComments: 5 pages, 3 figures, accepted by IEEE Commun. LettSubjects: Signal Processing (eess.SP); Information Theory (cs.IT)
This work studies the modeling and optimization of beyond diagonal reconfigurable intelligent surface (BD-RIS) aided wireless communication systems in the presence of mutual coupling among the RIS elements. Specifically, we first derive the mutual coupling aware BD-RIS aided communication model using scattering and impedance parameter analysis. Based on the obtained communication model, we propose a general BD-RIS optimization algorithm applicable to different architectures of BD-RIS to maximize the channel gain. Numerical results validate the effectiveness of the proposed design and demonstrate that the larger the mutual coupling the larger the gain offered by BD-RIS over conventional diagonal RIS.
- [67] arXiv:2310.04440 (replaced) [pdf, ps, html, other]
-
Title: Facilitating Battery Swapping Services for Freight Trucks with Spatial-Temporal Demand PredictionComments: 9 pages, 6 figuresSubjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI)
Electrifying heavy-duty trucks offers a substantial opportunity to curtail carbon emissions, advancing toward a carbon-neutral future. However, the inherent challenges of limited battery energy and the sheer weight of heavy-duty trucks lead to reduced mileage and prolonged charging durations. Consequently, battery-swapping services emerge as an attractive solution for these trucks. This paper employs a two-fold approach to investigate the potential and enhance the efficacy of such services. Firstly, spatial-temporal demand prediction models are adopted to predict the traffic patterns for the upcoming hours. Subsequently, the prediction guides an optimization module for efficient battery allocation and deployment. Analyzing heavy-duty truck data on a highway network spanning over 2,500 miles, our model and analysis underscore the value of prediction/machine learning in facilitating future decision-making. In particular, we find that the initial phase of implementing battery-swapping services favors mobile battery-swapping stations, but as the system matures, fixed-location stations are preferred.
- [68] arXiv:2310.08087 (replaced) [pdf, ps, html, other]
-
Title: A Carbon Tracking Model for Federated Learning: Impact of Quantization and SparsificationComments: accepted for presentation at IEEE CAMAD 2023Journal-ref: 2023 IEEE 28th International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD)Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
Federated Learning (FL) methods adopt efficient communication technologies to distribute machine learning tasks across edge devices, reducing the overhead in terms of data storage and computational complexity compared to centralized solutions. Rather than moving large data volumes from producers (sensors, machines) to energy-hungry data centers, raising environmental concerns due to resource demands, FL provides an alternative solution to mitigate the energy demands of several learning tasks while enabling new Artificial Intelligence of Things (AIoT) applications. This paper proposes a framework for real-time monitoring of the energy and carbon footprint impacts of FL systems. The carbon tracking tool is evaluated for consensus (fully decentralized) and classical FL policies. For the first time, we present a quantitative evaluation of different computationally and communication efficient FL methods from the perspectives of energy consumption and carbon equivalent emissions, while also suggesting general guidelines for energy-efficient design. Results indicate that consensus-driven FL implementations should be preferred for limiting carbon emissions when the energy efficiency of the communication is low (i.e., < 25 kbit/Joule). Besides, quantization and sparsification operations are shown to strike a balance between learning performance and energy consumption, leading to sustainable FL designs.
- [69] arXiv:2312.12342 (replaced) [pdf, ps, html, other]
-
Title: Scalable Near-Field Localization Based on Partitioned Large-Scale Antenna ArraySubjects: Signal Processing (eess.SP)
This paper studies a passive localization system, where an extremely large-scale antenna array (ELAA) is deployed at the base station (BS) to locate a user equipment (UE) residing in its near-field (Fresnel) region. We propose a novel algorithm, named array partitioning-based location estimation (APLE), for scalable near-field localization. The APLE algorithm is developed based on the basic assumption that, by partitioning the ELAA into multiple subarrays, the UE can be approximated as being in the far-field region of each subarray. We establish a Bayesian inference framework based on the geometric constraints between the UE location and the angles of arrival (AoAs) at different subarrays. Then, the APLE algorithm is designed based on the message-passing principle for the localization of the UE. APLE exhibits linear computational complexity in the number of BS antennas, leading to a significant reduction in complexity compared to existing methods. We further propose an enhanced APLE (E-APLE) algorithm that refines the location estimate obtained from APLE by following the maximum likelihood principle. The E-APLE algorithm achieves superior localization accuracy compared to APLE while maintaining linear complexity in the number of BS antennas. Numerical results demonstrate that the proposed APLE and E-APLE algorithms outperform the existing baselines in terms of localization accuracy.
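The partitioning idea, treating the user as far-field for each subarray and fusing the per-subarray AoAs through geometric constraints, can be illustrated in 2D with two subarrays and noiseless bearings (a toy sketch, not the APLE message-passing algorithm):

```python
import numpy as np

c1, c2 = np.array([0.0, 0.0]), np.array([10.0, 0.0])  # subarray centers
ue = np.array([4.0, 7.0])                             # true user location

def bearing(center):
    """Far-field approximation: each subarray observes only the AoA of the UE."""
    d = ue - center
    return np.arctan2(d[1], d[0])

a1, a2 = bearing(c1), bearing(c2)
u1 = np.array([np.cos(a1), np.sin(a1)])
u2 = np.array([np.cos(a2), np.sin(a2)])

# Intersect the two bearing rays: c1 + t1*u1 = c2 + t2*u2
t = np.linalg.solve(np.column_stack([u1, -u2]), c2 - c1)
estimate = c1 + t[0] * u1
assert np.allclose(estimate, ue)
```

With noisy AoAs and many subarrays, the intersection becomes an inference problem over these same geometric constraints, which is where a Bayesian message-passing treatment comes in.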
- [70] arXiv:2401.02445 (replaced) [pdf, ps, html, other]
-
Title: Social and Economic Impact Analysis of Solar Mini-Grids in Rural Africa: A Cohort Study from Kenya and NigeriaAmy Town Carabajal, Akoua Orsot, Marie Pelagie Elimbi Moudio, Tracy Haggai, Chioma Joy Okonkwo, George Truett Jarrard III, Nicholas Stearns SelbyComments: 43 pages, 16 figures, accepted by __Environmental Research: Infrastructure and Sustainability__Subjects: Systems and Control (eess.SY); Applications (stat.AP)
This study presents the first comprehensive analysis of the social and economic effects of solar mini-grids in rural African settings, specifically in Kenya and Nigeria. A group of 2,658 household heads and business owners connected to mini-grids over the last five years were interviewed both before and one year after their connection. These interviews focused on changes in gender equality, productivity, health, safety, and economic activity. The results show notable improvements in all areas. Economic activities and productivity increased significantly among the connected households and businesses. The median income of rural Kenyan community members quadrupled. Gender equality also improved, with women gaining more opportunities in decision making and business. Health and safety enhancements were linked to reduced use of hazardous energy sources like kerosene lamps. The introduction of solar mini-grids not only transformed the energy landscape but also led to broad socioeconomic benefits in these rural areas. The research highlights the substantial impact of decentralized renewable energy on the social and economic development of rural African communities. Its findings are crucial for policymakers, development agencies, and stakeholders focused on promoting sustainable energy and development in Africa.
- [71] arXiv:2402.09423 (replaced) [pdf, ps, other]
-
Title: Online Mean Estimation for Multi-frame Optical Fiber Signals On HighwaysComments: 10 pages, 11 figuresSubjects: Signal Processing (eess.SP); Data Analysis, Statistics and Probability (physics.data-an)
In the era of Big Data, prompt analysis and processing of data sets is critical. Meanwhile, statistical methods provide key tools and techniques to extract valuable insights and knowledge from complex data sets. This paper applies statistical methods to the field of traffic, particularly focusing on the preprocessing of multi-frame signals obtained by an optical fiber-based Distributed Acoustic Sensing (DAS) system. An online non-parametric regression model based on Local Polynomial Regression (LPR) and variable bandwidth selection is employed to dynamically update the estimate of the mean function as signals flow in. This mean estimation method derives average information from multi-frame fiber signals, thus providing the basis for subsequent vehicle trajectory extraction algorithms. To further evaluate the effectiveness of the proposed method, comparison experiments were conducted under real highway scenarios, showing that our approach not only handles multi-frame signals more accurately than the classical filter-based Kalman and wavelet methods, but also better meets practical requirements for low memory use and rapid response. It provides a new, reliable means of signal processing that can be integrated with other existing methods.
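A minimal sketch of degree-1 local polynomial regression with a Gaussian kernel, the building block behind LPR-based mean estimation (the fixed bandwidth and test signal below are assumptions; the paper's method is online with variable bandwidth selection):

```python
import numpy as np

def local_linear(x_query, x, y, h):
    """Local polynomial regression of degree 1 with a Gaussian kernel."""
    sw = np.sqrt(np.exp(-0.5 * ((x - x_query) / h) ** 2))  # sqrt of kernel weights
    X = np.column_stack([np.ones_like(x), x - x_query])
    # Weighted least squares; the fitted intercept is the estimate at x_query
    beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta[0]

rng = np.random.default_rng(5)
x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x) + 0.1 * rng.normal(size=200)   # noisy observations of a mean function

queries = np.linspace(1.0, 5.0, 9)
estimates = np.array([local_linear(q, x, y, h=0.3) for q in queries])
assert np.max(np.abs(estimates - np.sin(queries))) < 0.2
```

Centering the design matrix at `x_query` makes the intercept the local estimate; an online variant would update the weighted sums incrementally as new frames arrive instead of refitting from scratch.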
- [72] arXiv:2403.12852 (replaced) [pdf, ps, html, other]
-
Title: Generative Enhancement for 3D Medical ImagesComments: 20 pages, 8 figuresSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
The limited availability of 3D medical image datasets, due to privacy concerns and high collection or annotation costs, poses significant challenges in the field of medical imaging. While a promising alternative is the use of synthesized medical data, there are few solutions for realistic 3D medical image synthesis due to difficulties in backbone design and fewer 3D training samples compared to 2D counterparts. In this paper, we propose GEM-3D, a novel generative approach to the synthesis of 3D medical images and the enhancement of existing datasets using conditional diffusion models. Our method begins with a 2D slice, denoted the informed slice, which serves as the patient prior, and propagates the generation process using a 3D segmentation mask. By decomposing the 3D medical images into masks and patient prior information, GEM-3D offers a flexible yet effective solution for generating versatile 3D images from existing datasets. GEM-3D can enable dataset enhancement by combining informed slice selection and generation at random positions, along with editable mask volumes to introduce large variations in diffusion sampling. Moreover, as the informed slice contains patient-wise information, GEM-3D can also facilitate counterfactual image synthesis and dataset-level de-enhancement with desired control. Experiments on brain MRI and abdomen CT images demonstrate that GEM-3D is capable of synthesizing high-quality 3D medical images with volumetric consistency, offering a straightforward solution for dataset enhancement during inference. The code is available at this https URL.
- [73] arXiv:2404.13884 (replaced) [pdf, ps, other]
-
Title: MambaUIE&SR: Unraveling the Ocean's Secrets with Only 2.8 GFLOPsComments: arXiv admin note: text overlap with arXiv:2305.08824 by other authorsSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Underwater Image Enhancement (UIE) techniques aim to address the problem of underwater image degradation due to light absorption and scattering. In recent years, both Convolutional Neural Network (CNN)-based and Transformer-based methods have been widely explored. In addition, combining CNNs and Transformers can effectively fuse global and local information for enhancement. However, this approach is still affected by the quadratic complexity of the Transformer and cannot maximize performance. Recently, the state space model (SSM)-based architecture Mamba has been proposed, which excels at modeling long-range dependencies while maintaining linear complexity. This paper explores the potential of this SSM-based model for UIE from both efficiency and effectiveness perspectives. However, the performance of directly applying Mamba is poor, because local fine-grained features, which are crucial for image enhancement, cannot be fully utilized. We therefore customize the MambaUIE architecture for efficient UIE: we introduce visual state space (VSS) blocks to capture global contextual information at the macro level while mining local information at the micro level. For these two kinds of information, we propose a Dynamic Interaction Block (DIB) and a Spatial feed-forward Network (SGFN) for intra-block feature aggregation. MambaUIE is able to efficiently synthesize global and local information while maintaining a very small number of parameters with high accuracy. Experiments on the UIEB dataset show that our method reduces GFLOPs by 67.4% (to 2.715G) relative to the SOTA method. To the best of our knowledge, this is the first UIE model constructed on an SSM that breaks the limitation of FLOPs on accuracy in UIE. The official repository of MambaUIE is available at this https URL.
- [74] arXiv:2404.14596 (replaced) [pdf, ps, html, other]
-
Title: Efficient and Timely Memory AccessComments: To be presented at ISIT 2024Subjects: Systems and Control (eess.SY); Information Theory (cs.IT)
This paper investigates the optimization of memory sampling in status updating systems, where source updates are published in a shared memory and a reader process samples the memory for source updates by paying a sampling cost. We formulate a discrete-time decision problem to find a sampling policy that minimizes the average cost, comprising the age at the client and the cost incurred due to sampling. We establish that an optimal policy is a stationary, deterministic, threshold-type policy, and subsequently derive the optimal threshold and the corresponding optimal average cost.
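The structure of such threshold policies can be pictured with a toy discrete-time model (the age and cost dynamics below are an assumed simplification, not the paper's formulation): sample whenever the age reaches a threshold tau, and scan tau for the lowest average cost.

```python
def avg_cost(tau, C=8.0, cycles=1000):
    """Average per-slot cost when sampling (cost C) whenever age reaches tau."""
    age, cost, slots = 0, 0.0, tau * cycles
    for _ in range(slots):
        age += 1
        cost += age            # age penalty accrued this slot
        if age >= tau:
            cost += C          # pay the sampling cost and reset the age
            age = 0
    return cost / slots

costs = {tau: avg_cost(tau) for tau in range(1, 11)}
best = min(costs, key=costs.get)
# In this toy model the average cost is (tau + 1)/2 + C/tau, minimized near sqrt(2C)
assert best == 4
assert abs(costs[4] - 4.5) < 1e-9
```

The trade-off is visible in the closed form: a larger threshold amortizes the sampling cost over more slots but lets the age grow, so the optimum balances the two terms.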
- [75] arXiv:2405.03334 (replaced) [pdf, ps, html, other]
-
Title: On the constrained feedback linearization control based on the MILP representation of a ReLU-ANNSubjects: Systems and Control (eess.SY)
In this work, we explore the efficacy of rectified linear unit artificial neural networks in addressing the intricate challenges of convoluted constraints arising from feedback linearization mapping. Our approach involves a comprehensive procedure, encompassing the approximation of constraints through a regression process. Subsequently, we transform these constraints into an equivalent representation of mixed-integer linear constraints, seamlessly integrating them into other stabilizing control architectures. The advantage resides in the compatibility with the linear control design and the constraint satisfaction in the model predictive control setup, even for forecasted trajectories. Simulations are provided to validate the proposed constraint reformulation.
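The core reformulation, encoding a ReLU as mixed-integer linear constraints, follows the standard big-M construction: with a binary variable z, the constraints y >= x, y >= 0, y <= x + M(1-z), y <= Mz are satisfiable exactly when y = max(0, x). A minimal feasibility check (the big-M value is chosen arbitrarily):

```python
def relu_bigM_feasible(x, y, z, M=100.0, tol=1e-9):
    """Check the big-M MILP constraints encoding y = ReLU(x) with binary z."""
    return (y >= x - tol and y >= -tol
            and y <= x + M * (1 - z) + tol
            and y <= M * z + tol
            and z in (0, 1))

# y = ReLU(x) is feasible with the right binary; other pairs are not
assert relu_bigM_feasible(3.0, 3.0, 1)
assert relu_bigM_feasible(-2.0, 0.0, 0)
assert not relu_bigM_feasible(-2.0, 1.0, 0)   # y != ReLU(x): infeasible for z = 0
assert not relu_bigM_feasible(-2.0, 1.0, 1)   # ... and for z = 1
```

Applying this encoding to every neuron turns the trained ReLU network, and hence the regressed constraint approximation, into a set of mixed-integer linear constraints a standard MILP solver can handle.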
- [76] arXiv:2405.08228 (replaced) [pdf, ps, html, other]
-
Title: Slow Inter-area Electro-mechanical Oscillations Revisited: Structural Property of Complex Multi-area Electric Power SystemsComments: 6 pages, 4 figures, 10th International Conference on Control, Decision and Information TechnologiesSubjects: Systems and Control (eess.SY)
This paper introduces a physically-intuitive notion of inter-area dynamics in systems comprising multiple interconnected energy conversion modules. The idea builds on an earlier general approach of setting their structural properties by modeling internal dynamics in stand-alone modules (components, areas) using the fundamental conservation laws between energy stored and generated, and then constraining explicitly their Tellegen's quantities (power and rate of change of power). In this paper we derive, by following the same principles, a transformed state-space model for a general nonlinear system. Using this model we show the existence of an area-level interaction variable, intVar, whose rate of change depends solely on the area's internal power imbalance and is independent of the model complexity used for representing individual module dynamics in the area. Given these structural properties of stand-alone modules, we define in this paper for the first time an inter-area variable as the difference between the power wave incident to the tie-line from Area I and the power wave reflected into the tie-line from Area II. Notably, these power waves represent the interaction variables associated with the two respective interconnected areas. We illustrate these notions using a linearized case of two lossless interconnected areas, and show the existence of a new inter-area mode when the areas get connected. We suggest that the lessons learned in this paper open possibilities for computationally-efficient modeling and control of inter-area oscillations, and further offer the basis for modeling and control of dynamics in changing systems comprising faster energy conversion processes.
- [77] arXiv:2405.09298 (replaced) [pdf, ps, other]
-
Title: Deep Blur Multi-Model (DeepBlurMM) -- a strategy to mitigate the impact of image blur on deep learning model performance in histopathology image analysisSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
AI-based analysis of histopathology whole slide images (WSIs) is central to computational pathology. However, image quality, including unsharp areas of WSIs, impacts model performance. We investigate the impact of blur and propose a multi-model approach to mitigate the negative impact of unsharp image areas. In this study, we use a simulation approach, evaluating model performance under varying levels of Gaussian blur added to image tiles from >900 H&E-stained breast cancer WSIs. To reduce the impact of blur, we propose a novel multi-model approach (DeepBlurMM) in which multiple models, each trained on data with a different amount of Gaussian blur, are used to predict tiles based on their blur levels. Using histological grade as a principal example, we found that models trained with mildly blurred tiles improved performance over the base model when moderate-to-high blur was present. DeepBlurMM outperformed the base model in the presence of moderate blur across all tiles (AUC: 0.764 vs. 0.710), and in the presence of a mix of low, moderate, and high blur across tiles (AUC: 0.821 vs. 0.789). Unsharp image tiles in WSIs impact prediction performance. DeepBlurMM improved prediction performance under some conditions and has the potential to increase quality in both research and clinical applications.
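The routing idea behind such a multi-model approach can be sketched as follows: estimate each tile's sharpness (here via the variance of a discrete Laplacian response, a common proxy) and dispatch it to the model trained at the matching blur level. The thresholds and model names below are hypothetical, not the paper's:

```python
import numpy as np

def laplacian_var(img):
    """Sharpness proxy: variance of a 5-point discrete Laplacian response."""
    lap = (-4 * img[1:-1, 1:-1] + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return lap.var()

def route(img, thresholds=(0.05, 0.01)):
    """Pick a model by estimated blur level (hypothetical thresholds)."""
    v = laplacian_var(img)
    if v > thresholds[0]:
        return "model_sharp"
    if v > thresholds[1]:
        return "model_mild_blur"
    return "model_heavy_blur"

rng = np.random.default_rng(3)
sharp = rng.random((64, 64))          # synthetic sharp tile (random texture)
blurred = sharp.copy()                # heavily blurred copy via repeated 3x3 box filter
for _ in range(10):
    p = np.pad(blurred, 1, mode="edge")
    blurred = sum(p[i:i + 64, j:j + 64] for i in range(3) for j in range(3)) / 9

assert laplacian_var(sharp) > laplacian_var(blurred)
assert route(sharp) == "model_sharp"
```

Each tile then gets a prediction from the model whose training distribution best matches its sharpness, which is the mechanism by which the ensemble outperforms a single base model under mixed blur.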
- [78] arXiv:2405.10833 (replaced) [pdf, ps, html, other]
-
Title: Automatic segmentation of Organs at Risk in Head and Neck cancer patients from CT and MRI scansSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Background and purpose: Deep Learning (DL) has been widely explored for Organs at Risk (OARs) segmentation; however, most studies have focused on a single modality, either CT or MRI, not both simultaneously. This study presents a high-performing DL pipeline for segmentation of 30 OARs from MRI and CT scans of Head and Neck (H&N) cancer patients.
Materials and methods: Paired CT and MRI-T1 images from 42 H&N cancer patients alongside annotation for 30 OARs from the H&N OAR CT & MR segmentation challenge dataset were used to develop a segmentation pipeline. After cropping irrelevant regions, rigid followed by non-rigid registration of CT and MRI volumes was performed. Two versions of the CT volume, representing soft tissues and bone anatomy, were stacked with the MRI volume and used as input to an nnU-Net pipeline. Modality Dropout was used during the training to force the model to learn from the different modalities. Segmentation masks were predicted with the trained model for an independent set of 14 new patients. The mean Dice Score (DS) and Hausdorff Distance (HD) were calculated for each OAR across these patients to evaluate the pipeline.
Results: This resulted in an overall mean DS and HD of 0.777 +- 0.118 and 3.455 +- 1.679, respectively, establishing the state-of-the-art (SOTA) for this challenge at the time of submission.
Conclusion: The proposed pipeline achieved the best DS and HD among all participants of the H&N OAR CT and MR segmentation challenge and sets a new SOTA for automated segmentation of H&N OARs.
- [79] arXiv:2405.11401 (replaced) [pdf, ps, html, other]
-
Title: PDE Control Gym: A Benchmark for Data-Driven Boundary Control of Partial Differential EquationsComments: 26 pages 10 figures. Accepted L4DC 2024Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Optimization and Control (math.OC)
Over the last decade, data-driven methods have surged in popularity, emerging as valuable tools for control theory. As such, neural network approximations of control feedback laws, system dynamics, and even Lyapunov functions have attracted growing attention. With the ascent of learning-based control, the need for accurate, fast, and easy-to-use benchmarks has increased. In this work, we present the first learning-based environment for boundary control of PDEs. In our benchmark, we introduce three foundational PDE problems - a 1D transport PDE, a 1D reaction-diffusion PDE, and a 2D Navier-Stokes PDE - whose solvers are bundled in a user-friendly reinforcement learning gym. With this gym, we then present the first set of model-free, reinforcement learning algorithms for solving this series of benchmark problems, achieving stability, although at a higher cost compared to model-based PDE backstepping. With the set of benchmark environments and detailed examples, this work significantly lowers the barrier to entry for learning-based PDE control - a topic largely unexplored by the data-driven control community. The entire benchmark is available on GitHub along with detailed documentation, and the presented reinforcement learning models are open-sourced.
- [80] arXiv:2405.12609 (replaced) [pdf, ps, html, other]
-
Title: Mamba in Speech: Towards an Alternative to Self-AttentionXiangyu Zhang, Qiquan Zhang, Hexin Liu, Tianyi Xiao, Xinyuan Qian, Beena Ahmed, Eliathamby Ambikairajah, Haizhou Li, Julien EppsSubjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
Transformer and its derivatives have achieved success in diverse tasks across computer vision, natural language processing, and speech processing. To reduce the complexity of computations within the multi-head self-attention mechanism in Transformer, Selective State Space Models (i.e., Mamba) were proposed as an alternative. Mamba exhibited its effectiveness in natural language processing and computer vision tasks, but its superiority has rarely been investigated in speech signal processing. This paper explores solutions for applying Mamba to speech processing using two typical speech processing tasks: speech recognition, which requires semantic and sequential information, and speech enhancement, which focuses primarily on sequential patterns. The experimental results exhibit the superiority of bidirectional Mamba (BiMamba) over vanilla Mamba for speech processing. Moreover, experiments demonstrate the effectiveness of BiMamba as an alternative to the self-attention module in Transformer and its derivatives, particularly for the semantic-aware task. The crucial technologies for transferring Mamba to speech are then summarized in ablation studies and the discussion section to offer insights for future research.
- [81] arXiv:2405.14327 (replaced) [pdf, ps, html, other]
-
Title: Autoregressive Image Diffusion: Generation of Image Sequence and Application in MRISubjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Magnetic resonance imaging (MRI) is a widely used non-invasive imaging modality. However, a persistent challenge lies in balancing image quality with imaging speed. This trade-off is primarily constrained by k-space measurements, which traverse specific trajectories in the spatial Fourier domain (k-space). These measurements are often undersampled to shorten acquisition times, resulting in image artifacts and compromised quality. Generative models learn image distributions and can be used to reconstruct high-quality images from undersampled k-space data. In this work, we present the autoregressive image diffusion (AID) model for image sequences and use it to sample the posterior for accelerated MRI reconstruction. The algorithm incorporates both undersampled k-space and pre-existing information. Models trained on the fastMRI dataset are evaluated comprehensively. The results show that the AID model can robustly generate sequentially coherent image sequences. In 3D and dynamic MRI, AID can outperform the standard diffusion model and reduce hallucinations, due to the learned inter-image dependencies.
- [82] arXiv:2405.14802 (replaced) [pdf, ps, html, other]
-
Title: Fast-DDPM: Fast Denoising Diffusion Probabilistic Models for Medical Image-to-Image GenerationSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Denoising diffusion probabilistic models (DDPMs) have achieved unprecedented success in computer vision. However, they remain underutilized in medical imaging, a field crucial for disease diagnosis and treatment planning. This is primarily due to the high computational cost associated with (1) the use of a large number of time steps (e.g., 1,000) in diffusion processes and (2) the increased dimensionality of medical images, which are often 3D or 4D. Training a diffusion model on medical images typically takes days to weeks, while sampling each image volume takes minutes to hours. To address this challenge, we introduce Fast-DDPM, a simple yet effective approach capable of improving training speed, sampling speed, and generation quality simultaneously. Unlike DDPM, which trains the image denoiser across 1,000 time steps, Fast-DDPM trains and samples using only 10 time steps. The key to our method lies in aligning the training and sampling procedures to optimize time-step utilization. Specifically, we introduce two efficient noise schedulers with 10 time steps: one with uniform time-step sampling and another with non-uniform sampling. We evaluated Fast-DDPM across three medical image-to-image generation tasks: multi-image super-resolution, image denoising, and image-to-image translation. Fast-DDPM outperformed DDPM and current state-of-the-art methods based on convolutional networks and generative adversarial networks in all tasks. Additionally, Fast-DDPM reduced the training time to 0.2x and the sampling time to 0.01x compared to DDPM. Our code is publicly available at: this https URL.
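The two reduced schedulers can be pictured as two ways of choosing 10 of the 1,000 time steps; the quadratic spacing below is an assumed example of non-uniform sampling, not necessarily the paper's schedule:

```python
import numpy as np

T, S = 1000, 10   # full and reduced numbers of diffusion time steps

# Uniform selection: evenly spaced time steps across [0, T-1]
uniform = np.linspace(0, T - 1, S).round().astype(int)

# Non-uniform selection (assumed quadratic form): denser near t = 0,
# where the final denoising steps matter most
nonuniform = ((np.linspace(0, 1, S) ** 2) * (T - 1)).round().astype(int)

assert len(set(uniform)) == S and uniform[-1] == T - 1
# Gaps grow with t: small steps early, large steps late
assert nonuniform[1] - nonuniform[0] < nonuniform[-1] - nonuniform[-2]
```

Training the denoiser only at the selected steps aligns it with the 10-step sampler, which is the alignment of training and sampling the abstract describes.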
- [83] arXiv:2302.04344 (replaced) [pdf, ps, html, other]
-
Title: Learning Dynamical Systems by Leveraging Data from Similar SystemsComments: 15 pages, 9 figuresSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Systems and Control (eess.SY)
We consider the problem of learning the dynamics of a linear system when one has access to data generated by an auxiliary system that shares similar (but not identical) dynamics, in addition to data from the true system. We use a weighted least squares approach, and provide finite sample error bounds of the learned model as a function of the number of samples and various system parameters from the two systems as well as the weight assigned to the auxiliary data. We show that the auxiliary data can help to reduce the intrinsic system identification error due to noise, at the price of adding a portion of error that is due to the differences between the two system models. We further provide a data-dependent bound that is computable when some prior knowledge about the systems, such as upper bounds on noise levels and model difference, is available. This bound can also be used to determine the weight that should be assigned to the auxiliary data during the model training stage.
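The weighted least-squares estimator described above admits a compact NumPy sketch: stack target-system and auxiliary-system state pairs, downweighting the auxiliary rows. The system matrices, noise level, and weight q below are illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def collect_rollout(A, n_steps, noise_std, rng):
    """Generate state pairs (x_t, x_{t+1}) from x_{t+1} = A x_t + w_t."""
    d = A.shape[0]
    X, Y = [], []
    x = rng.standard_normal(d)
    for _ in range(n_steps):
        x_next = A @ x + noise_std * rng.standard_normal(d)
        X.append(x)
        Y.append(x_next)
        x = x_next
    return np.array(X), np.array(Y)

A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
A_aux = A_true + 0.02                             # similar, not identical
X, Y = collect_rollout(A_true, 50, 0.1, rng)      # scarce target data
Xa, Ya = collect_rollout(A_aux, 500, 0.1, rng)    # abundant auxiliary data

def weighted_ls(X, Y, Xa, Ya, q):
    """Weighted LS: weight q in [0, 1] on the auxiliary samples.
    Solves min_A ||X A^T - Y||_F^2 + q ||Xa A^T - Ya||_F^2."""
    Xs = np.vstack([X, np.sqrt(q) * Xa])
    Ys = np.vstack([Y, np.sqrt(q) * Ya])
    return np.linalg.lstsq(Xs, Ys, rcond=None)[0].T

A_hat = weighted_ls(X, Y, Xa, Ya, q=0.3)
err = np.linalg.norm(A_hat - A_true)
```

Sweeping q traces the bias-variance trade-off the bounds describe: q = 0 ignores the auxiliary data (high variance), while q = 1 treats it as exact (bias from the model difference).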
- [84] arXiv:2308.03586 (replaced) [pdf, ps, other]
-
Title: SoilNet: An Attention-based Spatio-temporal Deep Learning Framework for Soil Organic Carbon Prediction with Digital Soil Mapping in EuropeComments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibleSubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Digital soil mapping (DSM) is an advanced approach that integrates statistical modeling and cutting-edge technologies, including machine learning (ML) methods, to accurately depict soil properties and their spatial distribution. Soil organic carbon (SOC) is a crucial soil attribute providing valuable insights into soil health, nutrient cycling, greenhouse gas emissions, and overall ecosystem productivity. This study highlights the significance of spatio-temporal deep learning (DL) techniques within the DSM framework. A novel architecture is proposed, incorporating spatial information using a base convolutional neural network (CNN) model and a spatial attention mechanism, along with climate temporal information using a long short-term memory (LSTM) network, for SOC prediction across Europe. The model utilizes a comprehensive set of environmental features, including Landsat-8 images, topography, remote sensing indices, and climate time series, as input features. Results demonstrate that the proposed framework outperforms conventional ML approaches, such as random forests, commonly used in DSM, yielding lower root mean square error (RMSE). This model is a robust tool for predicting SOC and could be applied to other soil properties, thereby contributing to the advancement of DSM techniques and facilitating land management and decision-making processes based on accurate information.
- [85] arXiv:2309.17371 (replaced) [pdf, ps, html, other]
-
Title: Adversarial Imitation Learning from Visual Observations using Latent InformationSubjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
We focus on the problem of imitation learning from visual observations, where the learning agent has access to videos of experts as its sole learning source. The challenges of this framework include the absence of expert actions and the partial observability of the environment, as the ground-truth states can only be inferred from pixels. To tackle this problem, we first conduct a theoretical analysis of imitation learning in partially observable environments. We establish upper bounds on the suboptimality of the learning agent with respect to the divergence between the expert and the agent latent state-transition distributions. Motivated by this analysis, we introduce an algorithm called Latent Adversarial Imitation from Observations, which combines off-policy adversarial imitation techniques with a learned latent representation of the agent's state from sequences of observations. In experiments on high-dimensional continuous robotic tasks, we show that our model-free approach in latent space matches state-of-the-art performance. Additionally, we show how our method can be used to improve the efficiency of reinforcement learning from pixels by leveraging expert videos. To ensure reproducibility, we provide free access to our code.
- [86] arXiv:2310.03679 (replaced) [pdf, ps, html, other]
-
Title: Role of Spatial Coherence in Diffractive Optical Neural NetworksComments: 9 pages, 3 figuresSubjects: Optics (physics.optics); Image and Video Processing (eess.IV)
Diffractive optical neural networks (DONNs) have emerged as a promising optical hardware platform for ultra-fast and energy-efficient signal processing for machine learning tasks, particularly in computer vision. Previous experimental demonstrations of DONNs have only been performed using coherent light. However, many real-world DONN applications require consideration of the spatial coherence properties of the optical signals. Here, we study the role of spatial coherence in DONN operation and performance. We propose a numerical approach to efficiently simulate DONNs under incoherent and partially coherent input illumination and discuss the corresponding computational complexity. As a demonstration, we train and evaluate simulated DONNs on the MNIST dataset of handwritten digits to process light with varying spatial coherence.
- [87] arXiv:2310.10107 (replaced) [pdf, ps, html, other]
-
Title: Posterior Sampling-based Online Learning for Episodic POMDPsComments: 32 pages, 4 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY); Machine Learning (stat.ML)
Learning in POMDPs is known to be significantly harder than in MDPs. In this paper, we consider the online learning problem for episodic POMDPs with unknown transition and observation models. We propose a posterior sampling-based reinforcement learning algorithm for POMDPs (PS4POMDPs), which is considerably simpler and easier to implement than state-of-the-art optimism-based online learning algorithms for POMDPs. We show that the Bayesian regret of the proposed algorithm scales as the square root of the number of episodes, matching the lower bound, and is polynomial in the other parameters. In a general setting, its regret scales exponentially in the horizon length $H$, and we show that this is inevitable by providing a lower bound. However, when the POMDP is undercomplete and weakly revealing (a common assumption in the recent literature), we establish a polynomial Bayesian regret bound. We finally propose a posterior sampling algorithm for multi-agent POMDPs, and show it too has sublinear regret.
- [88] arXiv:2311.01479 (replaced) [pdf, ps, html, other]
-
Title: Detecting Out-of-Distribution Through the Lens of Neural CollapseSubjects: Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Efficient and versatile Out-of-Distribution (OOD) detection is essential for the safe deployment of AI yet remains challenging for existing algorithms. Inspired by Neural Collapse, we discover that features of in-distribution (ID) samples cluster closer to the weight vectors than features of OOD samples. In addition, we reveal that ID features tend to expand in space to form a simplex Equiangular Tight Frame (ETF), which nicely explains the prevalent observation that ID features reside further from the origin than OOD features. Taking both insights from Neural Collapse into consideration, we propose to leverage feature proximity to weight vectors for OOD detection and further complement this perspective by using feature norms to filter OOD samples. Extensive experiments on off-the-shelf models demonstrate the efficiency and effectiveness of our method across diverse classification tasks and model architectures, enhancing the generalization capability of OOD detection.
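The two Neural Collapse observations translate into a simple score: angular proximity to the closest class weight vector, complemented by the feature norm. Below is a minimal NumPy sketch on synthetic features; the combination rule (product) and the synthetic data are illustrative assumptions, not the paper's exact scoring function:

```python
import numpy as np

def nc_ood_score(features, W):
    """Higher score = more ID-like. W has shape (num_classes, dim).
    (1) proximity: max cosine similarity to any class weight vector;
    (2) norm: ID features tend to sit further from the origin."""
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    w = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-12)
    proximity = (f @ w.T).max(axis=1)         # angular closeness to weights
    norm = np.linalg.norm(features, axis=1)   # feature-norm filter
    return proximity * norm

rng = np.random.default_rng(1)
W = rng.standard_normal((10, 64))             # 10 class weight vectors
# Synthetic ID features: aligned with a class weight vector, large norm.
id_feats = 5.0 * W[rng.integers(0, 10, 100)] \
           + 0.3 * rng.standard_normal((100, 64))
# Synthetic OOD features: isotropic, small norm.
ood_feats = 0.5 * rng.standard_normal((100, 64))

scores_id = nc_ood_score(id_feats, W)
scores_ood = nc_ood_score(ood_feats, W)
```

Thresholding the score then flags low-scoring samples as OOD.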
- [89] arXiv:2311.10701 (replaced) [pdf, ps, html, other]
-
Title: SpACNN-LDVAE: Spatial Attention Convolutional Latent Dirichlet Variational Autoencoder for Hyperspectral Pixel UnmixingComments: Accepted at IGARSS 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Hyperspectral pixel unmixing aims to find the underlying materials (endmembers) and their proportions (abundances) in the pixels of a hyperspectral image. This work extends the Latent Dirichlet Variational Autoencoder (LDVAE) pixel unmixing scheme by taking local spatial context into account while performing pixel unmixing. The proposed method uses an isotropic convolutional neural network with spatial attention to encode pixels as a Dirichlet distribution over endmembers. We have evaluated our model on the Samson, HYDICE Urban, Cuprite, and OnTech-HSI-Syn-21 datasets. Our model also leverages the transfer learning paradigm for the Cuprite dataset, where we train the model on synthetic data and evaluate it on real-world data. The results suggest that incorporating spatial context improves both endmember extraction and abundance estimation.
- [90] arXiv:2312.04610 (replaced) [pdf, ps, other]
-
Title: Data-driven Semi-supervised Machine Learning with Surrogate Safety Measures for Abnormal Driving Behavior DetectionComments: 22 pages, 10 figures, accepted by the 103rd Transportation Research Board (TRB) Annual Meeting, under third round review by Transportation Research Record: Journal of the Transportation Research BoardSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP); Other Statistics (stat.OT)
Detecting abnormal driving behavior is critical for road traffic safety and the evaluation of drivers' behavior. With the advancement of machine learning (ML) algorithms and the accumulation of naturalistic driving data, many ML models have been adopted for abnormal driving behavior detection. Most existing ML-based detectors rely on (fully) supervised ML methods, which require substantial labeled data. However, ground truth labels are not always available in the real world, and labeling large amounts of data is tedious. Thus, there is a need to explore unsupervised or semi-supervised methods to make the anomaly detection process more feasible and efficient. To fill this research gap, this study analyzes large-scale real-world data revealing several abnormal driving behaviors (e.g., sudden acceleration, rapid lane-changing) and develops a Hierarchical Extreme Learning Machines (HELM) based semi-supervised ML method using partly labeled data to accurately detect the identified abnormal driving behaviors. Moreover, previous ML-based approaches predominantly utilize basic vehicle motion features (such as velocity and acceleration) to label and detect abnormal driving behaviors, while this study introduces Surrogate Safety Measures (SSMs) as the input features for ML models to improve detection performance. Results from extensive experiments demonstrate the effectiveness of the proposed semi-supervised ML model with the introduced SSMs serving as important features. The proposed semi-supervised ML method outperforms other baseline semi-supervised or unsupervised methods across various metrics, e.g., achieving the best accuracy (99.58%) and the best F1 score (0.9913). The ablation study further highlights the significance of SSMs for advancing detection performance.
- [91] arXiv:2402.05967 (replaced) [pdf, ps, html, other]
-
Title: The last Dance : Robust backdoor attack via diffusion models and bayesian approachComments: Preprint (Last update): audio backdoor attack on Hugging Face's Transformer pre-trained models. This attack incorporates state-of-the-art Bayesian techniques, a modified Fokker-Planck equation (via Yang-Mills), and a diffusion model approachSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Signal Processing (eess.SP)
Diffusion models are state-of-the-art deep learning generative models trained on the principle of learning forward and backward diffusion processes via the progressive addition of noise and denoising. In this paper, we aim to fool audio-based DNN models, in particular the transformer-based models available through the Hugging Face framework. We demonstrate the feasibility of backdoor attacks (called `BacKBayDiffMod`) on audio transformers derived from Hugging Face, a popular framework in artificial intelligence research. The backdoor attack developed in this paper is based on poisoning the model's training data by incorporating backdoor diffusion sampling and a Bayesian approach to the distribution of the poisoned data.
- [92] arXiv:2402.15659 (replaced) [pdf, ps, html, other]
-
Title: DeepLight: Reconstructing High-Resolution Observations of Nighttime Light With Multi-Modal Remote Sensing DataComments: This paper has been accepted in IJCAI 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Nighttime light (NTL) remote sensing observation serves as a unique proxy for quantitatively assessing progress toward a series of Sustainable Development Goals (SDGs), such as poverty estimation, sustainable urban development, and carbon emission. However, existing NTL observations often suffer from pervasive degradation and inconsistency, limiting their utility for computing the indicators defined by the SDGs. In this study, we propose a novel approach to reconstruct high-resolution NTL images using multi-modal remote sensing data. To support this research endeavor, we introduce DeepLightMD, a comprehensive dataset comprising data from five heterogeneous sensors, offering fine spatial resolution and rich spectral information at a national scale. Additionally, we present DeepLightSR, a calibration-aware method for bridging spatially heterogeneous modality data in multi-modality super-resolution. DeepLightSR integrates calibration-aware alignment, auxiliary-to-main multi-modality fusion, and auxiliary-embedded refinement to effectively address spatial heterogeneity, fuse diversely representative features, and enhance performance in $8\times$ super-resolution (SR) tasks. Extensive experiments demonstrate the superiority of DeepLightSR over 8 competing methods, as evidenced by improvements in PSNR (2.01 dB $ \sim $ 13.25 dB) and PIQE (0.49 $ \sim $ 9.32). Our findings underscore the practical significance of the proposed dataset and model in reconstructing high-resolution NTL data, supporting efficient and quantitative assessment of SDG progress.
- [93] arXiv:2403.11732 (replaced) [pdf, ps, html, other]
-
Title: Hallucination in Perceptual Metric-Driven Speech Enhancement NetworksComments: Accepted for EUSIPCO 2024Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Within the area of speech enhancement, there is an ongoing interest in the creation of neural systems which explicitly aim to improve the perceptual quality of the processed audio. In concert with this is the topic of non-intrusive (i.e. without clean reference) speech quality prediction, for which neural networks are trained to predict human-assigned quality labels directly from distorted audio. When combined, these areas allow for the creation of powerful new speech enhancement systems which can leverage large real-world datasets of distorted audio, by taking inference of a pre-trained speech quality predictor as the sole loss function of the speech enhancement system. This paper aims to identify a potential pitfall with this approach, namely hallucinations which are introduced by the enhancement system `tricking' the speech quality predictor.
- [94] arXiv:2403.15405 (replaced) [pdf, ps, other]
-
Title: Predicting Parkinson's disease trajectory using clinical and functional MRI features: a reproduction and replication studyElodie Germani (EMPENN, LACODAM), Nikhil Baghwat, Mathieu Dugré (CSE), Rémi Gau, Albert Montillo, Kevin Nguyen, Andrzej Sokolowski (CSE), Madeleine Sharp, Jean-Baptiste Poline, Tristan Glatard (CSE)Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Parkinson's disease (PD) is a common neurodegenerative disorder with a poorly understood physiopathology and no established biomarkers for the diagnosis of early stages or the prediction of disease progression. Several neuroimaging biomarkers have been studied recently, but these are susceptible to several sources of variability. In this context, an evaluation of the robustness of such biomarkers is essential. This study is part of a larger project investigating the replicability of potential neuroimaging biomarkers of PD. Here, we attempt to reproduce (same data, same method) and replicate (different data or method) the models described in Nguyen et al., 2021 to predict an individual's current PD state and progression using demographic, clinical, and neuroimaging features (fALFF and ReHo extracted from resting-state fMRI). We use the Parkinson's Progression Markers Initiative dataset (PPMI, this http URL), as in Nguyen et al., 2021, and aim to reproduce the original cohort, imaging features, and machine learning models as closely as possible using the information available in the paper and the code. We also investigated methodological variations in cohort selection, feature extraction pipelines, and sets of input features. The success of the reproduction was assessed using different criteria. Notably, we obtained significantly better-than-chance performance using the analysis pipeline closest to that in the original study (R2 > 0), which is consistent with its findings. The challenges encountered while reproducing and replicating the original work are likely explained by the complexity of neuroimaging studies, in particular in clinical settings. We provide recommendations to further facilitate the reproducibility of such studies in the future.
- [95] arXiv:2404.08986 (replaced) [pdf, ps, other]
-
Title: Airship Formations for Animal Motion Capture and Behavior AnalysisComments: Accepted for presentation at the 2nd International Conference on Design and Engineering of Lighter-Than-Air systems (DELTAS2024)Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Using UAVs for wildlife observation and motion capture offers manifold advantages for studying animals in the wild, especially grazing herds in open terrain. The aerial perspective allows observation at a scale and depth that is not possible on the ground, offering new insights into group behavior. However, the very nature of wildlife field studies pushes traditional fixed-wing and multi-copter systems to their limits: limited flight time, noise, and safety aspects affect their efficacy, whereas lighter-than-air systems can remain on station for many hours. Nevertheless, airships are challenging from a ground handling perspective as well as from a control point of view, being voluminous and highly affected by wind. In this work, we showcase a system designed to use airship formations to track, follow, and visually record wild horses from multiple angles, covering airship design, simulation, control, onboard computer vision, autonomous operation, and practical aspects of field experiments.
- [96] arXiv:2404.09466 (replaced) [pdf, ps, html, other]
-
Title: Scoring Intervals using Non-Hierarchical Transformer For Automatic Piano TranscriptionComments: Fixed TyposSubjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
The neural semi-Markov Conditional Random Field (semi-CRF) framework has demonstrated promise for event-based piano transcription. In this framework, all events (notes or pedals) are represented as closed intervals tied to specific event types. The neural semi-CRF approach requires an interval scoring matrix that assigns a score to every candidate interval. However, designing an efficient and expressive architecture for scoring intervals is not trivial. In this paper, we introduce a simple method for scoring intervals using scaled inner product operations that resemble how attention scoring is done in transformers. We show theoretically that, due to the special structure arising from encoding the non-overlapping intervals, under a mild condition, the inner product operations are expressive enough to represent an ideal scoring matrix that can yield the correct transcription result. We then demonstrate that an encoder-only non-hierarchical transformer backbone, operating only on a low-time-resolution feature map, is capable of transcribing piano notes and pedals with high accuracy and time precision. Experiments show that our approach achieves new state-of-the-art performance across all subtasks in terms of the F1 measure on the Maestro dataset.
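The scaled-inner-product interval scoring can be sketched directly: project each frame's feature with two matrices and score interval [i, j] by an attention-style dot product between the projections of its boundary frames. The random matrices below stand in for trained projections, and the details (projection width, masking convention) are illustrative assumptions:

```python
import numpy as np

def interval_scores(H, d_proj, rng):
    """Score every candidate interval (i, j), i <= j, with a scaled inner
    product between projections of the onset frame i and offset frame j,
    in the spirit of transformer attention scoring."""
    T, d = H.shape
    Wq = rng.standard_normal((d, d_proj)) / np.sqrt(d)  # "onset" projection
    Wk = rng.standard_normal((d, d_proj)) / np.sqrt(d)  # "offset" projection
    Q, K = H @ Wq, H @ Wk
    S = (Q @ K.T) / np.sqrt(d_proj)   # S[i, j] scores interval [i, j]
    return np.triu(S)                 # only onset <= offset is a valid event

rng = np.random.default_rng(0)
H = rng.standard_normal((8, 16))      # 8 frames of encoder features
S = interval_scores(H, d_proj=32, rng=rng)
```

This produces the full T-by-T scoring matrix with two matrix multiplications, which is what makes the approach efficient compared to scoring each interval with a separate network.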
- [97] arXiv:2405.13168 (replaced) [pdf, ps, html, other]
-
Title: Modeling and Simulation of Charge-Induced Signals in Photon-Counting CZT Detectors for Medical Imaging ApplicationsManuel Ballester, Jaromir Kaspar, Francesc Massanes, Srutarshi Banerjee, Alexander Hans Vija, Aggelos K. KatsaggelosSubjects: Instrumentation and Detectors (physics.ins-det); Image and Video Processing (eess.IV)
Photon-counting detectors based on CZT are essential in nuclear medical imaging, particularly for SPECT applications. Although CZT detectors are known for their precise energy resolution, defects within the CZT crystals significantly impact their performance. These defects result in inhomogeneous material properties throughout the bulk of the detector. The present work introduces an efficient computational model that simulates the operation of semiconductor detectors, accounting for the spatial variability of the crystal properties. Our simulator reproduces the charge-induced pulse signals generated after the X/gamma-rays interact with the detector. The performance evaluation of the model shows an RMSE in the signal below 0.70%. Our simulator can function as a digital twin to accurately replicate the operation of actual detectors. Thus, it can be used to mitigate and compensate for adverse effects arising from crystal impurities.
- [98] arXiv:2405.14598 (replaced) [pdf, ps, html, other]
-
Title: Visual Echoes: A Simple Unified Transformer for Audio-Visual GenerationShiqi Yang, Zhi Zhong, Mengjie Zhao, Shusuke Takahashi, Masato Ishii, Takashi Shibuya, Yuki MitsufujiComments: 10 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
In recent years, with their realistic generation results and a wide range of personalized applications, diffusion-based generative models have gained significant attention in both the visual and audio generation areas. Compared to the considerable advancements in text2image or text2audio generation, research in audio2visual or visual2audio generation has been relatively slow. Recent audio-visual generation methods usually resort to large language models or composable diffusion models. Instead of designing another giant model for audio-visual generation, in this paper we take a step back and show that a simple and lightweight generative transformer, not fully investigated in multi-modal generation, can achieve excellent results on image2audio generation. The transformer operates in the discrete audio and visual Vector-Quantized GAN space and is trained in a mask-denoising manner. After training, classifier-free guidance can be deployed off the shelf, achieving better performance without any extra training or modification. Since the transformer model is modality-symmetrical, it can also be directly deployed for audio2image generation and co-generation. In experiments, we show that our simple method surpasses recent image2audio generation methods. Generated audio samples can be found at this https URL.