IEEE ICASSP 2022

2022 IEEE International Conference on Acoustics, Speech and Signal Processing

7-13 May 2022
  • Virtual (all paper presentations)
22-27 May 2022
  • Main Venue: Marina Bay Sands Expo & Convention Center, Singapore
27-28 October 2022
  • Satellite Venue: Crowne Plaza Shenzhen Longgang City Centre, Shenzhen, China

List of Accepted Papers

The following is the list of accepted ICASSP 2022 papers, sorted by paper title. Use your web browser's search feature to find your paper number. Notifications have also been sent to all authors by email. If you have not received your notification by email, please contact us at papers@2022.ieeeicassp.org.

Paper Number Paper Title
1929 3D CROSS-SCALE FEATURE TRANSFORMER NETWORK FOR BRAIN MR IMAGE SUPER-RESOLUTION
4429 3D TEXTURE SUPER RESOLUTION VIA THE RENDERING LOSS
4597 4D CONVOLUTIONAL NEURAL NETWORKS FOR MULTI-SPECTRAL AND MULTI-TEMPORAL REMOTE SENSING DATA CLASSIFICATION
4012 A Bayesian Permutation training deep representation learning method for speech enhancement with variational autoencoder
8962 A BENCHMARK OF STATE-OF-THE-ART SOUND EVENT DETECTION SYSTEMS EVALUATED ON SYNTHETIC SOUNDSCAPES
3369 A BERT based Joint Learning Model with Feature Gated Mechanism for Spoken Language Understanding
4579 A BRIDGE BETWEEN FEATURES AND EVIDENCE FOR BINARY ATTRIBUTE-DRIVEN PERFECT PRIVACY
5995 A Byzantine-resilient Dual Subgradient Method for Vertical Federated Learning
6464 A CHANNEL ATTENTION BASED MLP-MIXER NETWORK FOR MOTOR IMAGERY DECODING WITH EEG
3372 A CHARACTER-LEVEL SPAN-BASED MODEL FOR MANDARIN PROSODIC STRUCTURE PREDICTION
8736 A CLOSER LOOK AT AUTOENCODERS FOR UNSUPERVISED ANOMALY DETECTION
8757 A CLUSTERING-BASED ML SCHEME FOR CAPACITY APPROACHING SOFT LEVEL SENSING IN 3D TLC NAND
9143 A COMMONSENSE KNOWLEDGE ENHANCED NETWORK WITH RETROSPECTIVE LOSS FOR EMOTION RECOGNITION IN SPOKEN DIALOG
8790 A COMMUNICATION EFFICIENT QUASI-NEWTON METHOD FOR LARGE-SCALE DISTRIBUTED MULTI-AGENT OPTIMIZATION
4532 A COMPARISON OF DISCRETE AND SOFT SPEECH UNITS FOR IMPROVED VOICE CONVERSION
4193 A COMPLEX SPECTRAL MAPPING WITH INPLACE CONVOLUTION RECURRENT NEURAL NETWORKS FOR ACOUSTIC ECHO CANCELLATION
2525 A Configurable Multilingual Model is All You Need to Recognize All Languages
4411 A CONVEX FORMULATION FOR THE ROBUST ESTIMATION OF MULTIVARIATE EXPONENTIAL POWER MODELS
1164 A CRLB ANALYSIS OF AOA ESTIMATION USING BLUETOOTH 5
4653 A DATA-DRIVEN APPROACH FOR ACOUSTIC PARAMETER SIMILARITY ESTIMATION OF SPEECH RECORDING
1857 A DATA-DRIVEN COGNITIVE SALIENCE MODEL FOR OBJECTIVE PERCEPTUAL AUDIO QUALITY ASSESSMENT
4546 A DATA-DRIVEN QUANTIZATION DESIGN FOR DISTRIBUTED TESTING AGAINST INDEPENDENCE WITH COMMUNICATION CONSTRAINTS
4253 A DIFFERENTIABLE OPTIMISATION FRAMEWORK FOR THE DESIGN OF INDIVIDUALISED DNN-BASED HEARING-AID STRATEGIES
1087 A DILATED RESIDUAL VISION TRANSFORMER FOR ATRIAL FIBRILLATION DETECTION FROM STACKED TIME-FREQUENCY ECG REPRESENTATIONS
4068 A DNN BASED POST-FILTER TO ENHANCE THE QUALITY OF CODED SPEECH IN MDCT DOMAIN
8853 A domain transfer based data augmentation method for automated respiratory classification
4665 A DYNAMIC REWEIGHTING STRATEGY FOR FAIR FEDERATED LEARNING
4313 A FAST AND EFFICIENT NETWORK FOR SINGLE IMAGE SHADOW DETECTION
2491 A Few-sample Strategy for Guitar Tablature Transcription Based on Inharmonicity Analysis and Playability Constraints
8818 A FRAME LOSS OF MULTIPLE INSTANCE LEARNING FOR WEAKLY SUPERVISED SOUND EVENT DETECTION
5035 A Framework for Private Communication with Secret Block Structure
2386 A FREE LUNCH FROM VIT: ADAPTIVE ATTENTION MULTI-SCALE FUSION TRANSFORMER FOR FINE-GRAINED VISUAL RECOGNITION
3033 A Gaussian Mixture Model for Dialogue Generation with Dynamic Parameter Sharing Strategy
9279 A GENERAL FRAMEWORK FOR DISTRIBUTED INFERENCE WITH UNCERTAIN MODELS
1055 A GENERAL FRAMEWORK FOR INCOMPLETE CROSS-MODAL RETRIEVAL WITH MISSING LABELS AND MISSING MODALITIES
1733 A GENERALIZED HIERARCHICAL NONNEGATIVE TENSOR DECOMPOSITION
1579 A Generalized Kernel Risk Sensitive Loss for Robust Two-dimensional Singular Value Decomposition
3613 A GENERIC METHOD TO ESTIMATE CAMERA EXTRINSIC PARAMETERS
4860 A glance-and-gaze network for respiratory sound classification
9225 A Global to Local Guiding Network for Missing Data Imputation
3000 A GRAPH ATTENTION INTERACTIVE REFINE FRAMEWORK WITH CONTEXTUAL REGULARIZATION FOR JOINTING INTENT DETECTION AND SLOT FILLING
3430 A HYBRID APPROACH TO COMBINE WIRELESS AND EARCUP MICROPHONES FOR ANC HEADPHONES WITH ERROR SEPARATION MODULE
6023 A HYBRID LEARNING FRAMEWORK FOR DEEP SPIKING NEURAL NETWORKS WITH ONE-SPIKE TEMPORAL CODING
1847 A KNOWLEDGE/DATA ENHANCED METHOD FOR JOINT EVENT AND TEMPORAL RELATION EXTRACTION
1831 A LIGHT WEIGHT MODEL FOR VIDEO SHOT OCCLUSION DETECTION
4718 A LIGHTWEIGHT INSTRUMENT-AGNOSTIC MODEL FOR POLYPHONIC NOTE TRANSCRIPTION AND MULTIPITCH ESTIMATION
2302 A LIGHTWEIGHT SELF-SUPERVISED TRAINING FRAMEWORK FOR MONOCULAR DEPTH ESTIMATION
4710 A likelihood ratio based domain adaptation method for E2E models
3087 A LOW-PARAMETRIC MODEL FOR BIT-RATE ESTIMATION OF VVC RESIDUAL CODING
4292 A Maximal Correlation Approach to Imposing Fairness in Machine Learning
5377 A MELODY-UNSUPERVISION MODEL FOR SINGING VOICE SYNTHESIS
4919 A METHOD FOR DETECTING CORONARY ARTERY DISEASE USING NOISY ULTRASHORT ELECTROCARDIOGRAM RECORDINGS
2280 A METHOD FOR ESTIMATING THE GROUPING OF PARTICIPANTS IN CLASSROOM GROUP WORK USING ONLY AUDIO INFORMATION
3236 A METHOD TO REVEAL SPEAKER IDENTITY IN DISTRIBUTED ASR TRAINING, AND HOW TO COUNTER IT
8754 A MINIMALLY SUPERVISED APPROACH FOR MEDICAL IMAGE QUALITY ASSESSMENT IN DOMAIN SHIFT SETTINGS
5891 A MODEL FOR ASSESSOR BIAS IN AUTOMATIC PRONUNCIATION ASSESSMENT
1800 A MULTI DOMAIN KNOWLEDGE ENHANCED MATCHING NETWORK FOR RESPONSE SELECTION IN RETRIEVAL-BASED DIALOGUE SYSTEMS
2301 A MULTI-RESOLUTION LOW-RANK TENSOR DECOMPOSITION
2765 A MULTISCALE GRADIENT-BACKPROPAGATION OPTIMIZATION FRAMEWORK FOR DEFORMABLE CONVOLUTION BASED COMPRESSED VIDEO ENHANCEMENT
2255 A MULTI-TASK LEARNING FRAMEWORK FOR CHINESE MEDICAL PROCEDURE ENTITY NORMALIZATION
2198 A MULTITASK LEARNING FRAMEWORK FOR SPEAKER CHANGE DETECTION WITH CONTENT INFORMATION FROM UNSUPERVISED SPEECH DECOMPOSITION
4053 A MULTI-TASK LEARNING METHOD FOR WEAKLY SUPERVISED SOUND EVENT DETECTION
2105 A MUTUAL LEARNING FRAMEWORK FOR FEW-SHOT SOUND EVENT DETECTION
1475 A NEURAL NETWORK-BASED HOWLING DETECTION METHOD FOR REAL-TIME COMMUNICATION APPLICATIONS
2636 A NEURAL PROSODY ENCODER FOR END-TO-END DIALOGUE ACT CLASSIFICATION
1028 A NEW COPRIME-ARRAY-BASED CONFIGURATION WITH AUGMENTED DEGREES OF FREEDOM AND REDUCED MUTUAL COUPLING
3518 A NEW DATA AUGMENTATION METHOD FOR INTENT CLASSIFICATION ENHANCEMENT AND ITS APPLICATION ON SPOKEN CONVERSATION DATASETS
5733 A NEW DEEP LEARNING METHOD FOR MULTISPECTRAL IMAGE TIME SERIES COMPLETION USING HYPERSPECTRAL DATA
2305 A NEW FRAMEWORK FOR MULTIPLE DEEP CORRELATION FILTERS BASED OBJECT TRACKING
1930 A NOISE-ROBUST SELF-SUPERVISED PRE-TRAINING MODEL BASED SPEECH REPRESENTATION LEARNING FOR AUTOMATIC SPEECH RECOGNITION
3967 A NON-CONVEX PROXIMAL APPROACH FOR CENTROID-BASED CLASSIFICATION
1255 A NON-HIERARCHICAL ATTENTION NETWORK WITH MODALITY DROPOUT FOR TEXTUAL RESPONSE GENERATION IN MULTIMODAL DIALOGUE SYSTEMS
3525 A NONLINEAR STEERABLE COMPLEX WAVELET DECOMPOSITION OF IMAGES
4336 A NOTE ON TOTALLY SYMMETRIC EQUI-ISOCLINIC TIGHT FUSION FRAMES
6126 A NOVEL 1D STATE SPACE FOR EFFICIENT MUSIC RHYTHMIC ANALYSIS
3734 A NOVEL ANGULAR ESTIMATION METHOD IN THE PRESENCE OF NONUNIFORM NOISE
3267 A NOVEL CONVOLUTIONAL NEURAL NETWORK BASED ON ADAPTIVE MULTI-SCALE AGGREGATION AND BOUNDARY-AWARE FOR LATERAL VENTRICLE SEGMENTATION ON MR IMAGES
2051 A NOVEL LIGHTWEIGHT NETWORK FOR FAST MONOCULAR DEPTH ESTIMATION
1645 A NOVEL MICRO-EXPRESSION RECOGNITION APPROACH USING ATTENTION-BASED MAGNIFICATION-ADAPTIVE NETWORKS
1252 A NOVEL NEGATIVE L1 PENALTY APPROACH FOR MULTIUSER ONE-BIT MASSIVE MIMO DOWNLINK WITH PSK SIGNALING
2573 A NOVEL PART FEATURE INTEGRATION AND FUSION METHOD FOR FINE-GRAINED VEHICLE RECOGNITION
6301 A NOVEL SEQUENTIAL MONTE CARLO FRAMEWORK FOR PREDICTING AMBIGUOUS EMOTION STATES
1470 A NOVEL UNSUPERVISED AUTOENCODER-BASED HFOS DETECTOR IN INTRACRANIAL EEG SIGNALS
6378 A PERFORMANCE ANALYSIS FOR MULTI-RIS-ASSISTED FULL DUPLEX WIRELESS COMMUNICATION SYSTEM
5669 A PRE-TRAINED AUDIO-VISUAL TRANSFORMER FOR EMOTION RECOGNITION
5237 A PRIORI SNR ESTIMATION FOR SPEECH ENHANCEMENT BASED ON PESQ-INDUCED REINFORCEMENT LEARNING
3175 A QUESTION-ORIENTED PROPAGATION NETWORK FOR NEWS READING COMPREHENSION
5298 A REMEDY FOR DISTRIBUTIONAL SHIFTS THROUGH EXPECTED DOMAIN TRANSLATION
4180 A ROBUST CONTRASTIVE ALIGNMENT METHOD FOR MULTI-DOMAIN TEXT CLASSIFICATION
2490 A ROBUST DEEP AUDIO SPLICING DETECTION METHOD VIA SINGULARITY DETECTION FEATURE
3690 A ROBUST OBJECT SEGMENTATION NETWORK FOR UNDERWATER SCENES
1949 A SELF-SUPERVISED PRE-TRAINING FRAMEWORK FOR VISION-BASED SEIZURE CLASSIFICATION
1506 A SEMI-HANDCRAFTED KEYPOINT DETECTOR WITH DISCRIMINATIVE FEATURE ENCODING
2990 A set-theoretic approach to MIMO detection
5146 A SIMPLE FORMULA FOR THE MOMENTS OF UNITARILY INVARIANT MATRIX DISTRIBUTIONS
3408 A SIMPLE GRAPH NEURAL NETWORK VIA LAYER SNIFFER
4045 A SIMPLE HYBRID FILTER PRUNING FOR EFFICIENT EDGE INFERENCE
4112 A SLIDE-SAVE BASED FRAMEWORK FOR MULTI-SOURCE DOA EXTRACTION WITH SPATIAL CLOSELY SEPARATED SOURCES
2416 A STIMULI-RELEVANT DIRECTED DEPENDENCY INDEX FOR TIME SERIES
5143 A STUDY OF DESIGNING COMPACT AUDIO-VISUAL WAKE WORD SPOTTING SYSTEM BASED ON ITERATIVE FINE-TUNING IN NEURAL NETWORK PRUNING
5429 A STUDY OF THE ROBUSTNESS OF RAW WAVEFORM BASED SPEAKER EMBEDDINGS UNDER MISMATCHED CONDITIONS
2962 A study on the efficacy of model pre-training in developing neural text-to-speech system
2308 A STYLE TRANSFER MAPPING AND FINE-TUNING SUBJECT TRANSFER FRAMEWORK USING CONVOLUTIONAL NEURAL NETWORKS FOR SURFACE ELECTROMYOGRAM PATTERN RECOGNITION
2855 A TEST FOR CONDITIONAL CORRELATION BETWEEN RANDOM VECTORS BASED ON WEIGHTED U-STATISTICS
2354 A TIME DOMAIN PROGRESSIVE LEARNING APPROACH WITH SNR CONSTRICTION FOR SINGLE-CHANNEL SPEECH ENHANCEMENT AND RECOGNITION
6234 A TIME ENCODING APPROACH TO TRAINING SPIKING NEURAL NETWORKS
5618 A TRAINABLE BOUNDED DENOISER USING DOUBLE TIGHT FRAME NETWORK FOR SNAPSHOT COMPRESSIVE IMAGING
4816 A TRAINING FRAMEWORK FOR STEREO-AWARE SPEECH ENHANCEMENT USING DEEP NEURAL NETWORKS
3260 A TRANSFER LEARNING APPROACH FOR PRONUNCIATION SCORING
3732 A TWO-STAGE CONTRASTIVE LEARNING FRAMEWORK FOR IMBALANCED AERIAL SCENE RECOGNITION
1116 A TWO-STAGE U-NET FOR HIGH-FIDELITY DENOISING OF HISTORICAL RECORDINGS
4081 A TWO-STEP APPROACH TO LEVERAGE CONTEXTUAL DATA: SPEECH RECOGNITION IN AIR-TRAFFIC COMMUNICATION
5800 A TWO-STEP BACKWARD COMPATIBLE FULLBAND SPEECH ENHANCEMENT SYSTEM
2608 A TWO-STREAM INFORMATION FUSION APPROACH TO ABNORMAL EVENT DETECTION IN VIDEO
9165 A unified two-stage model for separating superimposed images
1693 A UNIVERSAL ORDINAL REGRESSION FOR ASSESSING PHONEME-LEVEL PRONUNCIATION
2182 A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer
4370 A WAVELET-BASED DUAL-STREAM NETWORK FOR UNDERWATER IMAGE ENHANCEMENT
2372 AASIST: AUDIO ANTI-SPOOFING USING INTEGRATED SPECTRO-TEMPORAL GRAPH ATTENTION NETWORKS
4195 ACCELERATED INTRAVASCULAR ULTRASOUND IMAGING USING DEEP REINFORCEMENT LEARNING
4297 ACCELERATING ILL-CONDITIONED ROBUST LOW-RANK TENSOR REGRESSION
5867 ACCESS CONTROL FOR PRIVACY-PRESERVING GAUSSIAN PROCESS REGRESSION
5360 ACCURATE AND RESOURCE-EFFICIENT LIPREADING WITH EFFICIENTNETV2 AND TRANSFORMERS
1029 ACCURATE INSTANCE SEGMENTATION VIA COLLABORATIVE LEARNING
5620 Accurate Multiscale Selective Fusion of CT and Video Images for Real-Time Endoscopic Camera 3D Tracking in Robotic Surgery
5149 ACOUSTIC APPLICATION OF PHASE RECONSTRUCTION ALGORITHMS IN OPTICS
1473 ACOUSTIC COMPARISON OF PHYSICAL VOCAL TRACT MODELS WITH HARD AND SOFT WALLS
4456 ACOUSTIC IMAGING ABOARD THE INTERNATIONAL SPACE STATION (ISS): CHALLENGES AND PRELIMINARY RESULTS
3558 ACOUSTIC-TO-ARTICULATORY INVERSION BASED ON SPEECH DECOMPOSITION AND AUXILIARY FEATURE
3746 ACP: ADAPTIVE CHANNEL PRUNING FOR EFFICIENT NEURAL NETWORKS
8560 Ada-JSR: SAMPLE EFFICIENT ADAPTIVE JOINT SUPPORT RECOVERY FROM EXTREMELY COMPRESSED MEASUREMENT VECTORS
3037 AdaPID: An Adaptive PID Optimizer for Training Deep Neural Networks
4129 ADAPTING SPEECH SEPARATION TO REAL-WORLD MEETINGS USING MIXTURE INVARIANT TRAINING
1162 Adaptive Actor-Critic Bilateral Filter
3170 ADAPTIVE ATTENTION GRAPH CAPSULE NETWORK
2870 ADAPTIVE DIFFUSION WITH COMPRESSED COMMUNICATION
8878 ADAPTIVE DISCOUNTING OF IMPLICIT LANGUAGE MODELS IN RNN-TRANSDUCERS
3157 ADAPTIVE GROUP TESTING WITH MISMATCHED MODELS
1097 ADAPTIVE IDENTIFICATION OF UNDERWATER ACOUSTIC CHANNEL WITH A MIX OF STATIC AND TIME-VARYING PARAMETERS
2013 ADAPTIVE INTRA-GROUP AGGREGATION FOR CO-SALIENCY DETECTION
1342 ADAPTIVE MATCHING STRATEGY FOR MULTI-TARGET MULTI-CAMERA TRACKING
4405 ADAPTIVE NODE PARTICIPATION FOR STRAGGLER-RESILIENT FEDERATED LEARNING
2197 Adaptive Pseudo Labeling for Source-Free Domain Adaptation in Medical Image Segmentation
9319 Adaptive Rank Selection for Tensor Ring Decomposition
5736 ADAPTIVE VARIATIONAL NONLINEAR CHIRP MODE DECOMPOSITION
1799 ADAPTIVE WEIGHTED NETWORK WITH EDGE ENHANCEMENT MODULE FOR MONOCULAR SELF-SUPERVISED DEPTH ESTIMATION
5552 ADAPTIVE WIRELESS POWER ALLOCATION WITH GRAPH NEURAL NETWORKS
4804 ADA-STNET: A DYNAMIC ADABOOST SPATIO-TEMPORAL NETWORK FOR TRAFFIC FLOW PREDICTION
2889 ADA-VAD: UNPAIRED ADVERSARIAL DOMAIN ADAPTATION FOR NOISE-ROBUST VOICE ACTIVITY DETECTION
1700 ADDERIC: TOWARDS LOW COMPUTATION COST IMAGE COMPRESSION
3941 ADIMA: ABUSE DETECTION IN MULTILINGUAL AUDIO
8803 Adjacency Pairs-Aware Hierarchical Attention Networks for Dialogue Intent Classification
4154 ADMM-DAD NET: A DEEP UNFOLDING NETWORK FOR ANALYSIS COMPRESSED SENSING
4989 ADT: ANTI-DEEPFAKE TRANSFORMER
2666 ADVANCING MOMENTUM PSEUDO-LABELING WITH CONFORMER AND INITIALIZATION STRATEGY
8031 ADVERFACIAL: PRIVACY-PRESERVING UNIVERSAL ADVERSARIAL PERTURBATION AGAINST FACIAL MICRO-EXPRESSION LEAKAGES
5617 ADVERSARIAL AUDIO SYNTHESIS USING A HARMONIC-PERCUSSIVE DISCRIMINATOR
4794 ADVERSARIAL EXAMPLES DETECTION BASED ON ERROR LEVEL ANALYSIS AND SPACE MAPPING
2495 ADVERSARIAL EXAMPLES FOR IMAGE CROPPING IN SOCIAL MEDIA
4923 ADVERSARIAL INPUT ABLATION FOR AUDIO-VISUAL LEARNING
1250 ADVERSARIAL LEARNING ENHANCEMENT FOR 3D HUMAN POSE AND SHAPE ESTIMATION
3077 ADVERSARIAL LEARNING IN TRANSFORMER BASED NEURAL NETWORK IN RADIO SIGNAL CLASSIFICATION
7814 ADVERSARIAL LINEAR QUADRATIC REGULATOR UNDER FALSIFIED ACTIONS
3301 ADVERSARIAL MASK TRANSFORMER FOR SEQUENTIAL LEARNING
1600 ADVERSARIAL ROBUSTNESS BY DESIGN THROUGH ANALOG COMPUTING AND SYNTHETIC GRADIENTS
3316 Adversarial sample detection for speaker verification by neural vocoders
9296 Adversarially-Trained Nonnegative Matrix Factorization
2258 ADVERSARY DISTILLATION FOR ONE-SHOT ATTACKS ON 3D TARGET TRACKING
5063 ADVERSPARSE: AN ADVERSARIAL ATTACK FRAMEWORK FOR DEEP SPATIAL-TEMPORAL GRAPH NEURAL NETWORKS
4830 ADVIN: AUTOMATICALLY DISCOVERING NOVEL DOMAINS AND INTENTS FROM USER TEXT UTTERANCES
3758 AECMOS: A SPEECH QUALITY ASSESSMENT METRIC FOR ECHO IMPAIRMENT
1269 AERIAL BASE STATION PLACEMENT LEVERAGING RADIO TOMOGRAPHIC MAPS
2336 AGCYCLEGAN: ATTENTION-GUIDED CYCLEGAN FOR SINGLE UNDERWATER IMAGE RESTORATION
2635 AIMNET: ADAPTIVE IMAGE-TAG MERGING NETWORK FOR AUTOMATIC MEDICAL REPORT GENERATION
1725 Airborne MIMO Radar Transmit-Receive Design Under Spectral Constraint in Signal-Dependent Clutter
5090 AISHELL-NER: NAMED ENTITY RECOGNITION FROM CHINESE SPEECH
1825 ALARM SOUND DETECTION USING TOPOLOGICAL SIGNAL PROCESSING
5537 Alignment-Learning based single-step decoding for accurate and fast non-autoregressive speech recognition
6330 Alleviating the Loss-Metric Mismatch in Supervised Single-Channel Speech Enhancement
2036 ALL-NEURAL BEAMFORMER FOR CONTINUOUS SPEECH SEPARATION
8762 ALSNET: A DILATED 1-D CNN FOR IDENTIFYING ALS FROM RAW EMG SIGNAL
8938 AMBIGUITY MODELLING WITH LABEL DISTRIBUTION LEARNING FOR MUSIC CLASSIFICATION
1619 AMICABLE EXAMPLES FOR INFORMED SOURCE SEPARATION
3341 AN ACCELERATED RANK-(L,L,1,1) BLOCK TERM DECOMPOSITION OF MULTI-SUBJECT FMRI DATA UNDER SPATIAL ORTHONORMALITY CONSTRAINT
3766 AN ADAPTER BASED PRE-TRAINING FOR EFFICIENT AND SCALABLE SELF-SUPERVISED SPEECH REPRESENTATION LEARNING
9257 AN ADAPTIVE ALL-PASS FILTER FOR TIME-VARYING DELAY ESTIMATION
3529 AN ADAPTIVE ORIENTATIONAL BEAMFORMING TECHNIQUE FOR NARROWBAND INTERFERENCE REJECTION
8976 An Anomaly Detection Method Based on Self-supervised Learning With Soft Label Assignment for Defect Visual Inspection
3965 AN APPROACH TO MISPRONUNCIATION DETECTION AND DIAGNOSIS WITH ACOUSTIC, PHONETIC AND LINGUISTIC (APL) EMBEDDINGS
3795 AN ASYMPTOTICALLY OPTIMAL APPROXIMATION OF THE CONDITIONAL MEAN CHANNEL ESTIMATOR BASED ON GAUSSIAN MIXTURE MODELS
3343 AN AUDIO-SALIENCY MASKING TRANSFORMER FOR AUDIO EMOTION CLASSIFICATION IN MOVIEs
3431 An effective steganalysis for robust steganography with repetitive JPEG compression
1384 AN EFFICIENT DP-SGD MECHANISM FOR LARGE SCALE NLU MODELS
2067 An Efficient Framework for Detection and Recognition of Numerical Traffic Signs
3116 AN EFFICIENT METHOD FOR GENERIC DSP IMPLEMENTATION OF DILATED CONVOLUTION
4210 AN EFFICIENT METHOD FOR MODEL PRUNING USING KNOWLEDGE DISTILLATION WITH FEW SAMPLES
1469 An Embarrassingly Simple Model for Dialogue Relation Extraction
4096 AN END-TO-END CHINESE TEXT NORMALIZATION MODEL BASED ON RULE-GUIDED FLAT-LATTICE TRANSFORMER
4715 AN END-TO-END DEEP LEARNING FRAMEWORK FOR MULTIPLE AUDIO SOURCE SEPARATION AND LOCALIZATION
6604 AN END-TO-END DEEP LEARNING SPEECH CODING AND DENOISING STRATEGY FOR COCHLEAR IMPLANTS
4222 AN ENHANCED DEEP LEARNING APPROACH FOR TECTONIC FAULT AND FRACTURE EXTRACTION IN VERY HIGH RESOLUTION OPTICAL IMAGES
5421 AN ERROR CORRECTION SCHEME FOR IMPROVED AIR-TISSUE BOUNDARY IN REAL-TIME MRI VIDEO FOR SPEECH PRODUCTION
4905 An Experimental Study on Transferring Data-driven Image Compressive Sensing to Bioelectric Signal
5048 AN EXPLORATION OF HUBERT WITH LARGE NUMBER OF CLUSTER UNITS AND MODEL ASSESSMENT USING BAYESIAN INFORMATION CRITERION
9130 AN IMPLICIT GRADIENT-TYPE METHOD FOR LINEARLY CONSTRAINED BILEVEL PROBLEMS
2257 AN INFORMATION MAXIMIZATION BASED BLIND SOURCE SEPARATION APPROACH FOR DEPENDENT AND INDEPENDENT SOURCES
4942 AN INVESTIGATION OF STREAMING NON-AUTOREGRESSIVE SEQUENCE-TO-SEQUENCE VOICE CONVERSION
4236 AN INVESTIGATION OF THE EFFECTIVENESS OF PHASE FOR AUDIO CLASSIFICATION
2678 AN ONLINE THROUGHPUT MAXIMIZATION ALGORITHM FOR GREEN COORDINATED MULTI-POINT SYSTEMS
2887 AN OVERVIEW OF THE FIRST ICASSP SPECIAL SESSION ON COMPUTER AUDITION FOR HEALTHCARE
4366 ANALYZING THE ROBUSTNESS OF UNSUPERVISED SPEECH RECOGNITION
7179 ANNIHILATION FILTER APPROACH FOR ESTIMATING GRAPH DYNAMICS FROM DIFFUSION PROCESSES
2025 ANNO-MI: A DATASET OF EXPERT-ANNOTATED COUNSELLING DIALOGUES
4773 ANOMALOUS SOUND DETECTION USING SPECTRAL-TEMPORAL INFORMATION FUSION
1735 A-PIXELHOP: A GREEN, ROBUST AND EXPLAINABLE FAKE-IMAGE DETECTOR
5147 APPLADE: ADJUSTABLE PLUG-AND-PLAY AUDIO DECLIPPER COMBINING DNN WITH SPARSE OPTIMIZATION
3960 APPLYING DEEP LEARNING TO KNOWN-PLAINTEXT ATTACK ON CHAOTIC IMAGE ENCRYPTION SCHEMES
1362 APPLYING DIFFERENTIAL PRIVACY TO TENSOR COMPLETION
8999 APPROACHES TOWARD PHYSICAL AND GENERAL VIDEO ANOMALY DETECTION
8775 APPROXIMATING THE LIKELIHOOD RATIO IN LINEAR-GAUSSIAN STATE-SPACE MODELS FOR CHANGE DETECTION
5094 ARCHITECTURE FOR VARIABLE BITRATE NEURAL SPEECH CODEC WITH CONFIGURABLE COMPUTATION COMPLEXITY
9141 Are GAN-based Morphs Threatening Face Recognition?
8824 ARM 4-BIT PQ: SIMD-BASED ACCELERATION FOR APPROXIMATE NEAREST NEIGHBOR SEARCH ON ARM
4699 ASD-TRANSFORMER: EFFICIENT ACTIVE SPEAKER DETECTION USING SELF AND MULTIMODAL TRANSFORMERS
9115 ASR ERROR CORRECTION WITH DUAL-CHANNEL SELF-SUPERVISED LEARNING
2604 ASR-AWARE END-TO-END NEURAL DIARIZATION
2733 ASSEM-VC: REALISTIC VOICE CONVERSION BY ASSEMBLING MODERN SPEECH SYNTHESIS TECHNIQUES
3528 ATOMIC NORM BASED LOCALIZATION AND ORIENTATION ESTIMATION FOR MILLIMETER-WAVE MIMO OFDM SYSTEMS
4270 ATTACHMENT RECOGNITION IN SCHOOL-AGE CHILDREN: A MULTIMODAL APPROACH BASED ON LANGUAGE AND PARALANGUAGE ANALYSIS
2742 ATTENTION BACK-END FOR AUTOMATIC SPEAKER VERIFICATION WITH MULTIPLE ENROLLMENT UTTERANCES
3548 ATTENTION GUIDED INVARIANCE SELECTION FOR LOCAL FEATURE DESCRIPTORS
3671 ATTENTION PROBE: VISION TRANSFORMER DISTILLATION IN THE WILD
2507 Attentional Gated Res2Net for Multivariate Time Series Classification
1594 ATTENTION-BASED ADVERSARIAL PARTIAL DOMAIN ADAPTATION
2863 ATTENTION-BASED DUAL-STREAM VISION TRANSFORMER FOR RADAR GAIT RECOGNITION
4568 ATTENTION-BASED FUSION FOR BONE-CONDUCTED AND AIR-CONDUCTED SPEECH ENHANCEMENT IN THE COMPLEX DOMAIN
3553 AttentionPIT: Soft permutation invariant training for audio source separation with attention mechanism
1160 ATTENTIVE MAX FEATURE MAP AND JOINT TRAINING FOR ACOUSTIC SCENE CLASSIFICATION
1668 ATTENUATION OF ACOUSTIC EARLY REFLECTIONS IN TELEVISION STUDIOS USING PRETRAINED SPEECH SYNTHESIS NEURAL NETWORK
6345 ATTRIBUTABLE WATERMARKING OF SPEECH GENERATIVE MODELS
1833 Attribute-conditioned Face swapping Network for Low-Resolution images
4333 AUDIO PEAK REDUCTION USING A SYNCED ALLPASS FILTER
9244 Audio scene monitoring using redundant ad-hoc microphone arrays
3167 AUDIO SIGNAL PROCESSING FOR TELEPRESENCE BASED ON WEARABLE ARRAY IN NOISY AND DYNAMIC SCENES
8362 AUDIOCLIP: EXTENDING CLIP TO IMAGE, TEXT AND AUDIO
3404 AUDIO-TEXT RETRIEVAL IN CONTEXT
1472 Audio-to-symbolic Arrangement via Cross-modal Music Representation Learning
2062 AUDIO-VISUAL MULTI-CHANNEL SPEECH SEPARATION, DEREVERBERATION AND RECOGNITION
4177 AUDIO-VISUAL SCENE-AWARE DIALOG AND REASONING USING AUDIO-VISUAL TRANSFORMERS WITH JOINT STUDENT-TEACHER LEARNING
4694 Audio-Visual Tracking of Multiple Speakers via a PMBM Filter
4580 AUDITORY-BASED DATA AUGMENTATION FOR END-TO-END AUTOMATIC SPEECH RECOGNITION
3256 AUGMENTATION STRATEGY OPTIMIZATION FOR LANGUAGE UNDERSTANDING
4439 AUGMENTING MOLECULAR DEEP GENERATIVE MODELS WITH TOPOLOGICAL DATA ANALYSIS REPRESENTATIONS
1906 Automated Audio Captioning using Transfer Learning and Reconstruction Latent Space Similarity Regularization
4395 AUTOMATED PROSODY CLASSIFICATION FOR ORAL READING FLUENCY WITH QUADRATIC KAPPA LOSS AND ATTENTIVE X-VECTORS
2530 Automatic Assessment of the Degree of Clinical Depression from Speech Using X-Vectors
3510 AUTOMATIC DEPRESSION DETECTION: AN EMOTIONAL AUDIO-TEXTUAL CORPUS AND A GRU/BILSTM-BASED MODEL
2692 AUTOMATIC DEPRESSION LEVEL ASSESSMENT FROM SPEECH BY LONG-TERM GLOBAL INFORMATION EMBEDDING
5365 AUTOMATIC DJ TRANSITIONS WITH DIFFERENTIABLE AUDIO EFFECTS AND GENERATIVE ADVERSARIAL NETWORKS
6099 AUTOMATIC RESPIRATORY SOUND CLASSIFICATION VIA MULTI-BRANCH TEMPORAL CONVOLUTIONAL NETWORK
8917 AUTOREGRESSIVE VARIATIONAL AUTOENCODER WITH A HIDDEN SEMI-MARKOV MODEL-BASED STRUCTURED ATTENTION FOR SPEECH SYNTHESIS
4629 AuxFormer: Robust Approach to Audiovisual Emotion Recognition
5415 AUXILIARY LOSS OF TRANSFORMER WITH RESIDUAL CONNECTION FOR END-TO-END SPEAKER DIARIZATION
5046 AVQVC: One-shot Voice Conversion by Vector Quantization with Applying Contrastive Learning
3961 AXONAL DELAY AS A SHORT-TERM MEMORY FOR FEED FORWARD DEEP SPIKING NEURAL NETWORKS
9008 BALANCED RANKING AND SORTING FOR CLASS INCREMENTAL OBJECT DETECTION
8536 BALANCED STRIPE-WISE PRUNING IN THE FILTER
3958 BAYESIAN CONTINUAL IMPUTATION AND PREDICTION FOR IRREGULARLY SAMPLED TIME SERIES DATA
9231 BAYESIAN POST-MODEL-SELECTION ESTIMATION
4144 BEING GREEDY DOES NOT HURT: SAMPLING STRATEGIES FOR END-TO-END SPEECH RECOGNITION
2653 BEST OF BOTH WORLDS: MULTI-TASK AUDIO-VISUAL AUTOMATIC SPEECH RECOGNITION AND ACTIVE SPEAKER DETECTION
5056 BI-DIRECTIONAL MODALITY FUSION NETWORK FOR AUDIO-VISUAL EVENT LOCALIZATION
5990 BI-DIRECTIONAL NORMALIZATION AND COLOR ATTENTION-GUIDED GENERATIVE ADVERSARIAL NETWORK FOR IMAGE ENHANCEMENT
3318 BILEVEL LEARNING OF L1 REGULARIZERS WITH CLOSED-FORM GRADIENTS (BLORC)
2591 BILINGUAL END-TO-END ASR WITH BYTE-LEVEL SUBWORDS
2092 BINARY DENSE PREDICTORS FOR HUMAN POSE ESTIMATION BASED ON DYNAMIC THRESHOLDS AND FILTERING
9316 BINAURAL REPRODUCTION BASED ON BILATERAL AMBISONICS AND EAR-ALIGNED HRTFS
1444 BIP-NET: BIDIRECTIONAL PERSPECTIVE STRATEGY BASED ARBITRARY-SHAPED TEXT DETECTION NETWORK
3011 BLIND EQUALIZATION OF MOVING AVERAGE CHANNELS OVER GALOIS FIELDS
4317 BLIND EXTRACTION OF EQUITABLE PARTITIONS FROM GRAPH SIGNALS
9313 BLIND LOCALIZATION OF EARLY ROOM REFLECTIONS USING PHASE ALIGNED SPATIAL CORRELATION
2589 BLIND MODULO ANALOG-TO-DIGITAL CONVERSION OF VECTOR PROCESSES
2861 BLIND REVERBERATION TIME ESTIMATION IN DYNAMIC ACOUSTIC CONDITIONS
2570 BLIND SEPARATION OF LINEAR-QUADRATIC MIXTURES OF MUTUALLY INDEPENDENT AND AUTOCORRELATED SOURCES
6903 BLIND SOURCE SEPARATION VIA A WEAK EXCLUSION PRINCIPLE
4390 BLIND UNMIXING USING A DOUBLE DEEP IMAGE PRIOR
1033 BLOCK-ACTIVATED ALGORITHMS FOR MULTICOMPONENT FULLY NONSMOOTH MINIMIZATION
1294 BLOCK-COORDINATE FRANK-WOLFE ALGORITHM AND CONVERGENCE ANALYSIS FOR SEMI-RELAXED OPTIMAL TRANSPORT PROBLEM
8691 BLOCK-SPARSE ADVERSARIAL ATTACK TO FOOL TRANSFORMER-BASED TEXT CLASSIFIERS
8737 BLOOM-NET: BLOCKWISE OPTIMIZATION FOR MASKING NETWORKS TOWARD SCALABLE AND EFFICIENT SPEECH ENHANCEMENT
3563 BNU: A BALANCE-NORMALIZATION-UNCERTAINTY MODEL FOR INCREMENTAL EVENT DETECTION
3770 BONA FIDE RIESZ PROJECTIONS FOR DENSITY ESTIMATION
9029 Boost Ensemble Learning for Classification of CTG Signals
3556 BOUNDARY-AWARE BIAS LOSS FOR TRANSFORMER-BASED AERIAL IMAGE SEGMENTATION MODEL
2483 BOUNDED SIMPLEX-STRUCTURED MATRIX FACTORIZATION
3912 BOUNDING BOX DISTRIBUTION LEARNING AND CENTER POINT CALIBRATION FOR ROBUST VISUAL TRACKING
1923 BSOLO: BOUNDARY-AWARE ONE-STAGE INSTANCE SEGMENTATION SOLO
4709 BUILDING ROBUST SPOKEN LANGUAGE UNDERSTANDING BY CROSS ATTENTION BETWEEN PHONEME SEQUENCE AND ASR HYPOTHESIS
1265 BUNDLE ICP WITH VIRTUAL DEPTH FOR HAND-HELD 3D SCANNER
8734 BYTECOVER2: TOWARDS DIMENSIONALITY REDUCTION OF LATENT EMBEDDING FOR EFFICIENT COVER SONG IDENTIFICATION
2273 BYZANTINE-RESILIENT DECENTRALIZED COLLABORATIVE LEARNING
2756 Byzantine-resilient Decentralized Resource Allocation
1439 Byzantine-Robust Aggregation with Gradient Difference Compression and Stochastic Variance Reduction for Federated Learning
1697 BYZANTINE-ROBUST AND COMMUNICATION-EFFICIENT DISTRIBUTED NON-CONVEX LEARNING OVER NON-IID DATA
4074 BYZANTINE-ROBUST FEDERATED DEEP DETERMINISTIC POLICY GRADIENT
9294 CAA-NET: CONDITIONAL ATROUS CNNS WITH ATTENTION FOR EXPLAINABLE DEVICE-ROBUST ACOUSTIC SCENE CLASSIFICATION
5784 CACHE: MODELING CONTRIBUTION-AWARE CONTEXT HIERARCHICALLY FOR LONG-RANGE DIALOGUE STATE TRACKING
4464 CACHING NETWORKS: CAPITALIZING ON COMMON SPEECH FOR ASR
4246 CALL-SIGN RECOGNITION AND UNDERSTANDING FOR NOISY AIR-TRAFFIC TRANSCRIPTS USING SURVEILLANCE INFORMATION
3448 Camera Calibration through Camera Projection Loss
3893 CAN AUDIO CAPTIONS BE EVALUATED WITH IMAGE CAPTION METRICS?
1310 CAPITALIZATION NORMALIZATION FOR LANGUAGE MODELING WITH AN ACCURATE AND EFFICIENT HIERARCHICAL RNN MODEL
1972 CARINA – A CORPUS OF ALIGNED GERMAN READ SPEECH INCLUDING ANNOTATIONS
4499 CASCADE MULTI-CHANNEL NOISE REDUCTION AND ACOUSTIC FEEDBACK CANCELLATION
1044 CASCADING BANDIT UNDER DIFFERENTIAL PRIVACY
1075 CATEGORY-ADAPTED SOUND EVENT ENHANCEMENT WITH WEAKLY LABELED DATA
8856 Category-Adaptive Domain Adaptation for Semantic Segmentation
3207 CAUSAL LINEAR TOPOLOGICAL FILTERS OVER A 2-SIMPLEX
5689 CDMA: CROSS-DOMAIN DISTANCE METRIC ADAPTATION FOR SPEAKER VERIFICATION
5996 CDX-Net: Cross-Domain Multi-Feature Fusion Modeling via Deep Neural Networks for Multivariate Time Series Forecasting in AIOps
8301 CELL-FREE MASSIVE MIMO: EXPLOITING THE WAX DECOMPOSITION
2873 CF-NET: COMPLEMENTARY FUSION NETWORK FOR ROTATION INVARIANT POINT CLOUD COMPLETION
2291 CHANNEL REDUNDANCY AND OVERLAP IN CONVOLUTIONAL NEURAL NETWORKS WITH CHANNEL-WISE NNK GRAPHS
3287 Characterizing the adversarial vulnerability of speech self-supervised learning
8300 CHINESE SPELLING TEXT GENERATION OF MATHEMATICAL FORMULAS
2329 CHUNKFUSION: A LEARNING-BASED RGB-D 3D RECONSTRUCTION FRAMEWORK VIA CHUNK-WISE INTEGRATION
8912 CLASSICAL-TO-QUANTUM TRANSFER LEARNING FOR SPOKEN COMMAND RECOGNITION BASED ON QUANTUM NEURAL NETWORKS
2773 CLIMATE AND WEATHER: INSPECTING DEPRESSION DETECTION VIA EMOTION RECOGNITION
5907 CLIPCAM: A Simple Baseline for Zero-shot Text-guided Object and Action Localization
2918 Cloning one's voice using very limited data in the wild
3810 Closed-form single source direction-of-arrival estimator using first-order relative harmonic coefficients
4368 CLOSING THE SIM-TO-REAL GAP IN GUIDED WAVE DAMAGE DETECTION WITH ADVERSARIAL TRAINING OF VARIATIONAL AUTO-ENCODERS
4123 CLSEG: Contrastive Learning of Story Ending Generation
4870 CLUSTERING AND SEPARATING SIMILARITIES FOR DEEP UNSUPERVISED HASHING
5968 CLUSTERING COMPLEX SUBSPACES IN LARGE DIMENSIONS
3359 cMRI2SPEC: Cine MRI Sequence to Spectrogram Synthesis via a Pairwise Heterogeneous Translator
3225 CNN-AIDED FACTOR GRAPHS WITH ESTIMATED MUTUAL INFORMATION FEATURES FOR SEIZURE DETECTION
1957 CNN-TRANSFORMER WITH SELF-ATTENTION NETWORK FOR SOUND EVENT DETECTION
6003 COARRAY MANIFOLD SEPARATION IN THE SPHERICAL HARMONICS DOMAIN FOR ENHANCED SOURCE LOCALIZATION
2090 COARSE-TO-FINE UNSUPERVISED CHANGE DETECTION FOR REMOTE SENSING IMAGES VIA OBJECT-BASED MRF AND INCEPTION UNET
1271 CO-ATTENTION-GUIDED BILINEAR MODEL FOR ECHO-BASED DEPTH ESTIMATION
9270 Cognitive Antenna Selection for Automotive Radar Using Bobrovsky-Zakai Bound
4847 COGNITIVE CODING OF SPEECH
2609 COLLABORATIVE OBJECT DETECTORS ADAPTIVE TO BANDWIDTH AND COMPUTATION
2134 Combating False Sense of Security: Breaking the Defense of Adversarial Training via Non-Gradient Adversarial Attack
1465 COMBINING MULTIPLE STYLE TRANSFER NETWORKS AND TRANSFER LEARNING FOR LGE-CMR SEGMENTATION
4891 COMBINING UNSUPERVISED AND TEXT AUGMENTED SEMI-SUPERVISED LEARNING FOR LOW RESOURCED AUTOREGRESSIVE SPEECH RECOGNITION
1393 Communication-Efficient Distributed MAX-VAR Generalized CCA via Error Feedback-Assisted Quantization
2468 Communication-Efficient Online Federated Learning Framework for Nonlinear Regression
8629 COMPARISON OF BOUNDARY ARTIFACT REMOVAL METHODS IN CODING OF GENERALIZED CUBEMAP PROJECTION USING VVC
2294 COMPETITIVE MULTI-AGENT REINFORCEMENT LEARNING WITH SELF-SUPERVISED REPRESENTATION
3597 COMPLEX IRM-AWARE TRAINING FOR VOICE ACTIVITY DETECTION USING ATTENTION MODEL
1989 COMPLEX-VALUED SPATIAL AUTOENCODERS FOR MULTICHANNEL SPEECH ENHANCEMENT
2707 COMPOSING GRAPHICAL MODELS WITH GENERATIVE ADVERSARIAL NETWORKS FOR EEG SIGNAL MODELING
3931 COMPRESSED DATA SHARING BASED ON INFORMATION BOTTLENECK MODEL
9304 Compressed Super-Resolution of Positive Sources
2769 Compressing Transformer-based ASR Model by Task-driven Loss and Attention-based Multi-level Feature Distillation
4214 COMPRESSION-AWARE PROJECTION WITH GREEDY DIMENSION REDUCTION FOR CONVOLUTIONAL NEURAL NETWORK ACTIVATIONS
8776 COMPRESSIVE PHASE RETRIEVAL BASED ON SPARSE LATENT GENERATIVE PRIORS
2566 Compressive Scanning Transmission Electron Microscopy
8837 COMPUTATIONALLY EFFICIENT FIXED-FILTER ANC FOR SPEECH BASED ON LONG-TERM PREDICTION FOR HEADPHONE APPLICATIONS
4849 CONDITIONAL DIFFUSION PROBABILISTIC MODEL FOR SPEECH ENHANCEMENT
5714 CONDITIONALLY FACTORIZED VARIATIONAL BAYES WITH IMPORTANCE SAMPLING
1959 ConeFace: Approximate Pairwise Loss for Face Recognition
3389 CONFIDENCE ESTIMATION FOR SPEECH EMOTION RECOGNITION BASED ON THE RELATIONSHIP BETWEEN EMOTION CATEGORIES AND PRIMITIVES
3985 CONFIDENCE-AWARE MULTI-TEACHER KNOWLEDGE DISTILLATION
4938 CONFORMER-BASED HYBRID ASR SYSTEM FOR SWITCHBOARD DATASET
3268 CONFORMER-BASED SELF-SUPERVISED LEARNING FOR NON-SPEECH AUDIO TASKS
6843 CONFORMER-BASED SPEECH RECOGNITION WITH LINEAR NYSTRÖM ATTENTION AND ROTARY POSITION EMBEDDING
1458 Conjugate Augmented Spatial-Temporal Near-Field Sources Localization with Cross Array
1425 CONNECTING TARGETS VIA LATENT TOPICS AND CONTRASTIVE LEARNING: A UNIFIED FRAMEWORK FOR ROBUST ZERO-SHOT AND FEW-SHOT STANCE DETECTION
1263 Considering user agreement in learning to predict the aesthetic quality
1673 CONSISTENT TRAINING AND DECODING FOR END-TO-END SPEECH RECOGNITION USING LATTICE-FREE MMI
9184 CONSTANT Q CEPSTRAL COEFFICIENTS FOR CLASSIFICATION OF NORMAL VS. PATHOLOGICAL INFANT CRY
4513 CONTENT PRESERVING SCALE SPACE NETWORK FOR FAST IMAGE RESTORATION FROM NOISY-BLURRY PAIRS
2444 CONTEXT MODELING WITH EVIDENCE FILTER FOR MULTIPLE CHOICE QUESTION ANSWERING
3137 Context-Adaptive Document-Level Neural Machine Translation
4853 CONTEXT-AWARE GRAPH-BASED SELF-SUPERVISED LEARNING OF WHOLE SLIDE IMAGES
3040 CONTEXT-AWARE MASK PREDICTION NETWORK FOR END-TO-END TEXT-BASED SPEECH EDITING
4480 CONTEXTUAL ADAPTERS FOR PERSONALIZED SPEECH RECOGNITION IN NEURAL TRANSDUCERS
4903 Continual learning using lattice-free MMI for speech recognition
5163 CONTINUAL SELF-TRAINING WITH BOOTSTRAPPED REMIXING FOR SPEECH ENHANCEMENT
4434 CONTINUOUS SPEECH SEPARATION WITH RECURRENT SELECTIVE ATTENTION NETWORK
1193 CONTINUOUS STREAMING MULTI-TALKER ASR WITH DUAL-PATH TRANSDUCERS
2265 CONTRASTIVE HEARTBEATS: CONTRASTIVE LEARNING FOR SELF-SUPERVISED ECG REPRESENTATION AND PHENOTYPING
1127 CONTRASTIVE KNOWLEDGE GRAPH ATTENTION NETWORK FOR REQUEST-BASED RECIPE RECOMMENDATION
7416 CONTRASTIVE PREDICTION STRATEGIES FOR UNSUPERVISED SEGMENTATION AND CATEGORIZATION OF PHONEMES AND WORDS
2448 CONTRASTIVE PREDICTIVE CODING FOR ANOMALY DETECTION OF FETAL HEALTH FROM THE CARDIOTOCOGRAM
3130 CONTRASTIVE SENSOR TRANSFORMER FOR PREDICTIVE MAINTENANCE OF INDUSTRIAL ASSETS
4575 CONTRASTIVE SIAMESE NETWORK FOR SEMI-SUPERVISED SPEECH RECOGNITION
9036 CONTRASTIVE TRANSLATION LEARNING FOR MEDICAL IMAGE SEGMENTATION
2669 Contrastive-Mixup Learning for Improved Speaker Verification
5160 CONTROLLABLE SPEECH REPRESENTATION LEARNING VIA VOICE CONVERSION AND AIC LOSS
8866 CONTROLLED SENSING AND ANOMALY DETECTION VIA SOFT ACTOR-CRITIC REINFORCEMENT LEARNING
2029 CONTROLLING SMART PROPAGATION ENVIRONMENTS: LONG-TERM VERSUS SHORT-TERM PHASE SHIFT OPTIMIZATION
3300 CONTROLLING THE FRÉCHET VARIANCE IMPROVES BATCH NORMALIZATION ON THE SYMMETRIC POSITIVE DEFINITE MANIFOLD
4029Conversational Speech Recognition by Learning Conversation-level Characteristics
2797CONVEX CLUSTERING FOR AUTOCORRELATED TIME SERIES
1546CONVMIXER: FEATURE INTERACTIVE CONVOLUTION WITH CURRICULUM LEARNING FOR SMALL FOOTPRINT AND NOISY FAR-FIELD KEYWORD SPOTTING
4539CONVOLUTIONAL TRANSFORMER WITH ADAPTIVE POSITION EMBEDDING FOR COVID-19 DETECTION FROM COUGH SOUNDS
2171CONVOLUTIONAL BEAMSPACE USING IIR FILTERS
7095CONVOLUTIONAL FILTERING IN SIMPLICIAL COMPLEXES
1510CONVOLUTIONAL ISTA NETWORK WITH TEMPORAL CONSISTENCY CONSTRAINTS FOR VIDEO RECONSTRUCTION FROM EVENT CAMERAS
4750CONVOLUTIONAL WEIGHTED MINIMUM MEAN SQUARE ERROR FILTER FOR JOINT SOURCE SEPARATION AND DEREVERBERATION
4967COUGHTRIGGER: EARBUDS IMU BASED COUGH DETECTION ACTIVATOR USING AN ENERGY-EFFICIENT SENSITIVITY-PRIORITIZED TIME SERIES CLASSIFIER
2001Counting the number of different scaling exponents in multivariate scale-free dynamics: Clustering by bootstrap in the wavelet domain
1345COUPLED FEATURE LEARNING VIA STRUCTURED CONVOLUTIONAL SPARSE CODING FOR MULTIMODAL IMAGE FUSION
3873CPD computation via recursive eigenspace decompositions
4892CPT: CROSS-MODAL PREFIX-TUNING FOR SPEECH-TO-TEXT TRANSLATION
4790CRAMER-RAO BOUND ANALYSIS OF DISTRIBUTED DOA ESTIMATION EXPLOITING MIXED-PRECISION COVARIANCE MATRIX
5044CRAMÉR-RAO BOUND AND ANTENNA SELECTION OPTIMIZATION FOR DUAL RADAR-COMMUNICATION DESIGN
9246CRAMÉR-RAO BOUND FOR ESTIMATION AFTER MODEL SELECTION AND ITS APPLICATION TO SPARSE VECTOR ESTIMATION
2228CRAMER-RAO BOUND FOR THE TIME-VARYING POISSON
9230CROSS-CORPUS SPEECH EMOTION RECOGNITION BASED ON FEW-SHOT LEARNING AND DOMAIN ADAPTATION
1756CROSS-DOMAIN FEW-SHOT LEARNING FOR RARE-DISEASE SKIN LESION SEGMENTATION
3286CROSS-DOMAIN SPEECH ENHANCEMENT WITH A NEURAL CASCADE ARCHITECTURE
9264CROSS-EPOCH LEARNING FOR WEAKLY SUPERVISED ANOMALY DETECTION IN SURVEILLANCE VIDEOS
5781CROSS-LAYER AGGREGATION WITH TRANSFORMERS FOR MULTI-LABEL IMAGE CLASSIFICATION
3230CROSS-MODAL KNOWLEDGE DISTILLATION FOR VISION-TO-SENSOR ACTION RECOGNITION
3942CROSS-MODAL KNOWLEDGE DISTILLATION IN MULTI-MODAL FAKE NEWS DETECTION
2971Cross-speaker style transfer for text-to-speech using data augmentation
4179CROSS-TARGET STANCE DETECTION VIA REFINED META-LEARNING
1363CRPN: DISTINGUISH NOVEL CATEGORIES VIA CLASS-RELEVANT REGION PROPOSAL NETWORK FOR FEW-SHOT OBJECT DETECTION
1432CSENET: COMPLEX SQUEEZE-AND-EXCITATION NETWORK FOR SPEECH DEPRESSION LEVEL PREDICTION
2177CS-GRESNET: A SIMPLE AND HIGHLY EFFICIENT NETWORK FOR FACIAL EXPRESSION RECOGNITION
3645CSI CLUSTERING WITH VARIATIONAL AUTOENCODING
2279CS-REP: MAKING SPEAKER VERIFICATION NETWORKS EMBRACING RE-PARAMETERIZATION
3337CURRICULUM OPTIMIZATION FOR LOW-RESOURCE SPEECH RECOGNITION
4793CUSTOM ATTRIBUTION LOSS FOR IMPROVING GENERALIZATION AND INTERPRETABILITY OF DEEPFAKE DETECTION
8991CUSTOMER SATISFACTION ESTIMATION USING UNSUPERVISED REPRESENTATION LEARNING WITH MULTI-FORMAT PREDICTION LOSS
4103CUSTOMIZABLE END-TO-END OPTIMIZATION OF ONLINE NEURAL NETWORK-SUPPORTED DEREVERBERATION FOR HEARING DEVICES
1636Cut and Continuous Paste Towards Real-time Deep Fall Detection
6461CYBER-THREAT PROPAGATION OVER NETWORK-SLICING ARCHITECTURES
5717DAM-GAN: IMAGE INPAINTING USING DYNAMIC ATTENTION MAP BASED ON FAKE TEXTURE DETECTION
2204Data Agnostic Filter Gating for Efficient Deep Networks
2574DATA AUGMENTATION FOR LONG-TAILED AND IMBALANCED POLYPHONE DISAMBIGUATION IN MANDARIN
2046DATA EFFICIENT SUPPORT VECTOR MACHINE TRAINING USING THE MINIMUM DESCRIPTION LENGTH PRINCIPLE
3765DATA INCUBATION — SYNTHESIZING MISSING DATA FOR HANDWRITING RECOGNITION
8358Data Shapley Value for Handling Noisy Labels: An application in Screening COVID-19 Pneumonia from Chest CT Scans
3521DATA-DRIVEN ALGORITHMS FOR GAUSSIAN MEASUREMENT MATRIX DESIGN IN COMPRESSIVE SENSING
2243DATA-DRIVEN APPROACH FOR THE FLOQUET PROPAGATOR INVERSE PROBLEM SOLUTION
3911Data-driven Optimization for Zero-delay Lossy Source Coding with Side Information
1618DATA-DRIVEN SPATIALLY DEPENDENT PDE IDENTIFICATION
1772DCNGAN: A DEFORMABLE CONVOLUTION-BASED GAN WITH QP ADAPTATION FOR PERCEPTUAL QUALITY ENHANCEMENT OF COMPRESSED VIDEO
3699DCSN: Deformable Convolutional Semantic Segmentation Neural Network for Non-Rigid Scenes
3345DECENTRALIZED BILEVEL OPTIMIZATION FOR PERSONALIZED CLIENT LEARNING
1267DECENTRALIZED LEARNING IN THE PRESENCE OF LOW-RANK NOISE
5111DEEP ACTOR-CRITIC FOR CONTINUOUS 3D MOTION CONTROL IN MOBILE RELAY BEAMFORMING NETWORKS
2915DEEP ADAPTATION CONTROL FOR ACOUSTIC ECHO CANCELLATION
4526DEEP ADAPTIVE AEC: HYBRID OF DEEP LEARNING AND ADAPTIVE ACOUSTIC ECHO CANCELLATION
1254DEEP AUGMENTED MUSIC ALGORITHM FOR DATA-DRIVEN DOA ESTIMATION
9307Deep Collaborative Multi-Modal Learning for Unsupervised Kinship Estimation
4587DEEP DETERMINISTIC INDEPENDENT COMPONENT ANALYSIS FOR HYPERSPECTRAL UNMIXING
4514DEEP HASHING WITH HASH CENTER UPDATE FOR EFFICIENT IMAGE RETRIEVAL
3265DEEP IMPULSE RESPONSES: ESTIMATING AND PARAMETERIZING FILTERS WITH DEEP NETWORKS
3250DEEP INITIALIZATION FOR GUARANTEED UNIMODULAR QUADRATIC PROGRAMMING
4369DEEP ITERATIVE PHASE RETRIEVAL FOR PTYCHOGRAPHY
3329DEEP JOINT SOURCE-CHANNEL CODING FOR WIRELESS IMAGE TRANSMISSION WITH ADAPTIVE RATE CONTROL
5062DEEP KERNEL LEARNING NETWORKS WITH MULTIPLE LEARNING PATHS
8829DEEP LEARNING BASED OFF-ANGLE IRIS RECOGNITION
3438DEEP LEARNING BASED PASSIVE BEAMFORMING FOR IRS-ASSISTED MONOSTATIC BACKSCATTER SYSTEMS
1305DEEP LEARNING FOR LOCATION BASED BEAMFORMING WITH NLOS CHANNELS
4443DEEP LEARNING FOR PROMINENCE DETECTION IN CHILDREN'S READ SPEECH
4623DEEP LEARNING ON THE SPHERE FOR MULTI-MODEL ENSEMBLING OF SIGNIFICANT WAVE HEIGHT
6298Deep Markov Clustering For Panoptic Segmentation
3773DEEP NEURAL NETWORK (DNN) AUDIO CODER USING A PERCEPTUALLY IMPROVED TRAINING METHOD
1375DEEP OBJECT DETECTION WITH EXAMPLE ATTRIBUTE BASED PREDICTION MODULATION
2805DEEP PERFORMER: SCORE-TO-AUDIO MUSIC PERFORMANCE SYNTHESIS
1676DEEP PIECEWISE HASHING FOR EFFICIENT HAMMING SPACE RETRIEVAL
9158DEEP PROXIMAL UNFOLDING FOR IMAGE RECOVERY FROM UNDER-SAMPLED CHANNEL DATA IN INTRAVASCULAR ULTRASOUND
3697DEEP RANK CROSS-MODAL HASHING WITH SEMANTIC CONSISTENT FOR IMAGE-TEXT RETRIEVAL
2900DEEP RESIDUAL ECHO SUPPRESSION AND NOISE REDUCTION: A MULTI-INPUT FCRN APPROACH IN A HYBRID SPEECH ENHANCEMENT SYSTEM
2785DEEP SCALE-AWARE IMAGE SMOOTHING
4689DEEP SEQUENTIAL BEAMFORMER LEARNING FOR MULTIPATH CHANNELS IN MMWAVE COMMUNICATION SYSTEMS
1424Deep Spatio-Temporal Wind Power Forecasting
9032DEEP TEMPORAL INTERPOLATION OF RADAR-BASED PRECIPITATION
3107DEEP VIDEO INPAINTING GUIDED BY AUDIO-VISUAL SELF-SUPERVISION
3677DEEP VIDEO INPAINTING LOCALIZATION USING SPATIAL AND TEMPORAL TRACES
2132DEEPCHORUS: A HYBRID MODEL OF MULTI-SCALE CONVOLUTION AND SELF-ATTENTION FOR CHORUS DETECTION
3940DEEPFAKE SPEECH DETECTION THROUGH EMOTION RECOGNITION: A SEMANTIC APPROACH
9166DEEPFILTERNET: A LOW COMPLEXITY SPEECH ENHANCEMENT FRAMEWORK FOR FULL-BAND AUDIO BASED ON DEEP FILTERING
5168DeepGBASS: Deep Guided Boundary-Aware Semantic Segmentation
5382DEEPHULL: FAST CONVEX HULL APPROXIMATION IN HIGH DIMENSIONS
4298DEEP-LEARNING-ASSISTED CONFIGURATION OF RECONFIGURABLE INTELLIGENT SURFACES IN DYNAMIC RICH-SCATTERING ENVIRONMENTS
4854DEEP-MLE: FUSION BETWEEN A NEURAL NETWORK AND MLE FOR A SINGLE SNAPSHOT DOA ESTIMATION
1771DEFENDING AGAINST BACKDOOR ATTACKS IN FEDERATED LEARNING WITH DIFFERENTIAL PRIVACY
1484Defending Against Universal Attack via Curvature-aware Category Adversarial Training
9265DEFENSIVE COMPRESSIVE TIME DELAY ESTIMATION USING INFORMATION BOTTLENECK
9091Deformable Convolution Dense Network for Compressed Video Quality Enhancement
3331Deformable VisTR: Spatio temporal deformable attention for video instance segmentation
3083DELAY-ORIENTED DISTRIBUTED SCHEDULING USING GRAPH NEURAL NETWORKS
4497DELIBERATION OF STREAMING RNN-TRANSDUCER BY NON-AUTOREGRESSIVE DECODING
8854DELTA DISTANCING: A LIFTING APPROACH TO LOCALIZING ITEMS FROM USER COMPARISONS
2136DEMENTIA DETECTION BY FUSING SPEECH AND EYE-TRACKING REPRESENTATION
2630DEMON: IMPROVED NEURAL NETWORK TRAINING WITH MOMENTUM DECAY
3868DENOISING-GUIDED DEEP REINFORCEMENT LEARNING FOR SOCIAL RECOMMENDATION
1369DENOISING-ORIENTED DEEP HIERARCHICAL REINFORCEMENT LEARNING FOR NEXT-BASKET RECOMMENDATION
2753DEPTH PRUNING WITH AUXILIARY NETWORKS FOR TINYML
1699DEPTH REMOVAL DISTILLATION FOR RGB-D SEMANTIC SEGMENTATION
6429DEPTH-BASED ENSEMBLE LEARNING NETWORK FOR FACE ANTI-SPOOFING
8783Deriving Explainable Discriminative Attributes Using Confusion About Counterfactual Class
1880DESIGN OF REAL-TIME SYSTEM BASED ON MACHINE LEARNING FOR SNORING AND OSA DETECTION
3415DESIGNING A QAM SIGNAL DETECTOR FOR MASSIVE MIMO SYSTEMS VIA PS-ADMM APPROACH
8753DETAIL GENERATION AND FUSION NETWORKS FOR IMAGE INPAINTING
5713DETECTING ANOMALY IN CHEMICAL SENSORS VIA REGULARIZED CONTRASTIVE LEARNING
1926DETECTING BACKDOOR ATTACKS AGAINST POINT CLOUD CLASSIFIERS
5656Detection of COPD exacerbation from speech: comparison of acoustic features and deep learning based speech breathing models
4901DETECTION OF COVID-19 FROM JOINT TIME AND FREQUENCY ANALYSIS OF SPEECH, BREATHING AND COUGH AUDIO
3875DETERMINING JOINT PERIODICITIES IN MULTI-TIME DATA WITH SAMPLING UNCERTAINTIES
3517DETERMINING THE BEST ACOUSTIC FEATURES FOR SMOKER IDENTIFICATION
5660DETERMINISTIC TRANSFORM BASED WEIGHT MATRICES FOR NEURAL NETWORKS
2233DGC-VECTOR: A NEW SPEAKER EMBEDDING FOR ZERO-SHOT VOICE CONVERSION
9071DHWP: LEARNING HIGH-QUALITY SHORT HASH CODES VIA WEIGHT PRUNING
5606DICTIONARY LEARNING WITH UNIFORM SPARSE REPRESENTATIONS FOR ANOMALY DETECTION
3646DIFFERENTIABLE DIGITAL SIGNAL PROCESSING MIXTURE MODEL FOR SYNTHESIS PARAMETER EXTRACTION FROM MIXTURE OF HARMONIC SOUNDS
1912DIFFERENTIABLE PROGRAMMING A LA MOREAU
1889DIFFERENTIABLE WAVETABLE SYNTHESIS
4536DIFFERENTIATE-AND-FIRE TIME-ENCODING OF FINITE-RATE-OF-INNOVATION SIGNALS
3918DIFFICULTY-AWARE NEURAL BAND-TO-PIANO SCORE ARRANGEMENT BASED ON NOTE- AND STATISTIC-LEVEL CRITERIA
9237DIGRAPH SIGNAL PROCESSING WITH GENERALIZED BOUNDARY CONDITIONS
1629DILATED CONVOLUTIONAL NEURAL NETWORK-BASED DEEP REFERENCE PICTURE GENERATION FOR VIDEO COMPRESSION
4347DIRECT DESIGN OF BIQUAD FILTER CASCADES WITH DEEP LEARNING BY SAMPLING RANDOM POLYNOMIALS
3121DIRECT LOCALIZATION: AN ISING MODEL APPROACH
5347DIRECT NOISY SPEECH MODELING FOR NOISY-TO-NOISY VOICE CONVERSION
4703DISCOURSE-LEVEL PROSODY MODELING WITH A VARIATIONAL AUTOENCODER FOR NON-AUTOREGRESSIVE EXPRESSIVE SPEECH SYNTHESIS
3111DISCRETE MULTI-KERNEL K-MEANS WITH DIVERSE AND OPTIMAL KERNEL LEARNING
2739DISENTANGLED FEATURE-GUIDED MULTI-EXPOSURE HIGH DYNAMIC RANGE IMAGING
3580DISENTANGLED SPEAKER EMBEDDING FOR ROBUST SPEAKER VERIFICATION
1888DISENTANGLING CONTENT AND FINE-GRAINED PROSODY INFORMATION VIA HYBRID ASR BOTTLENECK FEATURES FOR VOICE CONVERSION
6570DISPEECH: A SYNTHETIC TOY DATASET FOR SPEECH DISENTANGLING
2821DISTILHUBERT: SPEECH REPRESENTATION LEARNING BY LAYER-WISE DISTILLATION OF HIDDEN-UNIT BERT
3388DISTRIBUTED AUDIO-VISUAL PARSING BASED ON MULTIMODAL TRANSFORMER AND DEEP JOINT SOURCE CHANNEL CODING
4760DISTRIBUTED GRAPH LEARNING WITH SMOOTH DATA PRIORS
9189DISTRIBUTED HYBRID BEAMFORMING FOR MMWAVE CELL-FREE MASSIVE MIMO
5681DISTRIBUTED IMAGE TRANSMISSION USING DEEP JOINT SOURCE-CHANNEL CODING
3947DISTRIBUTED LABEL DEQUANTIZED GAUSSIAN PROCESS LATENT VARIABLE MODEL FOR MULTI-VIEW DATA INTEGRATION
9027DISTRIBUTED LINK SPARSIFICATION FOR SCALABLE SCHEDULING USING GRAPH NEURAL NETWORKS
1508DISTRIBUTED PARTICLE FILTERS FOR STATE TRACKING ON THE STIEFEL MANIFOLD USING TANGENT SPACE STATISTICS
4412DISTRIBUTION AUGMENTATION FOR LOW-RESOURCE EXPRESSIVE TEXT-TO-SPEECH
8848DISTRIBUTION LEARNING FOR AGE ESTIMATION FROM SPEECH
2440DIVERGENCE-GUIDED FEATURE ALIGNMENT FOR CROSS-DOMAIN OBJECT DETECTION
4642DIVERSE AUDIO CAPTIONING VIA ADVERSARIAL TRAINING
4795DIVERSITY-CONTROLLABLE AND ACCURATE AUDIO CAPTIONING BASED ON NEURAL CONDITION
1514DMANET: DEEP LEARNING-BASED DIFFERENTIAL MICROPHONE ARRAYS FOR MULTI-CHANNEL SPEECH SEPARATION
1485DNN BASED MULTIFRAME SINGLE-CHANNEL NOISE REDUCTION FILTERS
2086DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors
1239DO YOU LIVE A HEALTHY LIFE? ANALYZING LIFESTYLE BY VISUAL LIFE LOGGING
1914DOA ESTIMATION VIA COARRAY TENSOR COMPLETION WITH MISSING SLICES
1243DOA M-ESTIMATION USING SPARSE BAYESIAN LEARNING
1211DOCUMENT-LEVEL EVENT EXTRACTION VIA HUMAN-LIKE READING PROCESS
9312DOMAIN ADAPTATION FOR FOOD INTAKE CLASSIFICATION WITH TEACHER/STUDENT LEARNING
4825Domain Adaptation for Speaker Recognition in Singing and Spoken Voice
4863DOMAIN ADAPTATION VIA MUTUAL INFORMATION MAXIMIZATION FOR HANDWRITING RECOGNITION
8459DOMAIN DECOMPOSITION ALGORITHMS FOR REAL-TIME HOMOGENEOUS DIFFUSION INPAINTING IN 4K
1435DOMAIN GENERALIZED FEW-SHOT IMAGE CLASSIFICATION VIA META REGULARIZATION NETWORK
4020DOMAIN ROBUST DEEP EMBEDDING LEARNING FOR SPEAKER RECOGNITION
3224DOMAIN-AGNOSTIC META-LEARNING FOR CROSS-DOMAIN FEW-SHOT CLASSIFICATION
1552DomainDesc: Learning Local Descriptors with Domain Adaptation
1534DOMAIN-INVARIANT FEATURE LEARNING FOR CROSS CORPUS SPEECH EMOTION RECOGNITION
2998DOMAIN-INVARIANT REPRESENTATION LEARNING FROM EEG WITH PRIVATE ENCODERS
3310Don't Separate, Learn to Remix: End-to-End Neural Remixing with Joint Optimization
8811DON'T SPEAK TOO FAST: THE IMPACT OF DATA BIAS ON SELF-SUPERVISED SPEECH MODELS
1395DOUBLE CLOSED-LOOP NETWORK FOR IMAGE DEBLURRING
1779DOUBLE NOISE MEAN TEACHER SELF-ENSEMBLING MODEL FOR SEMI-SUPERVISED TUMOR SEGMENTATION
2934DOUBLE-RIS VERSUS SINGLE-RIS AIDED SYSTEMS: TENSOR-BASED MIMO CHANNEL ESTIMATION AND DESIGN PERSPECTIVES
3462DOWNSTREAM AUGMENTATION GENERATION FOR CONTRASTIVE LEARNING
1390DPCCN: DENSELY-CONNECTED PYRAMID COMPLEX CONVOLUTIONAL NETWORK FOR ROBUST SPEECH SEPARATION AND EXTRACTION
2211DP-DWA: DUAL-PATH DYNAMIC WEIGHT ATTENTION NETWORK WITH STREAMING DFSMN-SAN FOR AUTOMATIC SPEECH RECOGNITION
2333DPT-FSNET: DUAL-PATH TRANSFORMER BASED FULL-BAND AND SUB-BAND FUSION NETWORK FOR SPEECH ENHANCEMENT
3587DRC-NET: DENSELY CONNECTED RECURRENT CONVOLUTIONAL NEURAL NETWORK FOR SPEECH DEREVERBERATION
5039DRVC: A Framework of Any-to-Any Voice Conversion with Self-Supervised Learning
2459Dual Active Noise Control with Common Sensors
8900DUAL ATTENTION POOLING NETWORK FOR RECORDING DEVICE CLASSIFICATION USING NEUTRAL AND WHISPERED SPEECH
2702Dual Graph Cross-domain Few-shot Learning for Hyperspectral Image Classification
6020DUAL PATH GRAPH CONVOLUTIONAL NETWORKS
2106DUAL-ATTENTION NETWORK FOR FEW-SHOT SEGMENTATION
1098Dual-branch Attention-In-Attention Transformer for single-channel speech enhancement
1919DUAL-DOMAIN LOW-RANK FUSION DEEP METRIC LEARNING FOR OFF-THE-PERSON ECG BIOMETRICS
5414DURATION MODELING OF NEURAL TTS FOR AUTOMATIC DUBBING
2295Dynamic Binary Neural Network by learning channel-wise thresholds
2242DYNAMIC MULTI-SCALE LOSS BALANCE FOR OBJECT DETECTION
5010Dynamic Point Cloud Interpolation
3748DYNAMIC PORTFOLIO CUTS: A SPECTRAL APPROACH TO GRAPH-THEORETIC DIVERSIFICATION
4643DYNAMIC RESOURCE OPTIMIZATION FOR ADAPTIVE FEDERATED LEARNING EMPOWERED BY RECONFIGURABLE INTELLIGENT SURFACES
5032DYNAMIC SLIDING WINDOW FOR REALTIME DENOISING NETWORKS
5984DYNAMIC TEXTURE RECOGNITION USING PDV HASHING AND DICTIONARY LEARNING ON MULTI-SCALE VOLUME LOCAL BINARY PATTERN
3081DYNAMICALLY PRUNING SEGFORMER FOR EFFICIENT SEMANTIC SEGMENTATION
2650DYNIMP: DYNAMIC IMPUTATION FOR WEARABLE SENSING DATA THROUGH SENSORY AND TEMPORAL RELATEDNESS
3819DynSNN: A Dynamic Approach to Reduce Redundancy in Spiking Neural Networks
1427DYSFLUENCY CLASSIFICATION IN STUTTERED SPEECH USING DEEP LEARNING FOR REAL-TIME APPLICATIONS
1680EAD-CONFORMER: A CONFORMER-BASED ENCODER-ATTENTION-DECODER-NETWORK FOR MULTI-TASK AUDIO SOURCE SEPARATION
7989ECHO-AWARE ADAPTATION OF SOUND EVENT LOCALIZATION AND DETECTION IN UNKNOWN ENVIRONMENTS
3784ECO-FEDSPLIT: FEDERATED LEARNING WITH ERROR-COMPENSATED COMPRESSION
2383ECONOMICS OF SEMANTIC COMMUNICATION SYSTEM IN WIRELESS POWERED INTERNET OF THINGS
2793EDGE SAMPLING OF GRAPHS BASED ON EDGE SMOOTHNESS
3891EFFECT OF NOISE SUPPRESSION LOSSES ON SPEECH DISTORTION AND ASR PERFORMANCE
3351EFFECTIVE AND INCONSPICUOUS OVER-THE-AIR ADVERSARIAL EXAMPLES WITH ADAPTIVE FILTERING
3803EFFICIENT ADAPTER TRANSFER OF SELF-SUPERVISED SPEECH MODELS FOR AUTOMATIC SPEECH RECOGNITION
8880EFFICIENT AND STABLE INFORMATION DIRECTED EXPLORATION FOR CONTINUOUS REINFORCEMENT LEARNING
4877EFFICIENT IDENTITY-BASED CHAMELEON HASH FOR MOBILE DEVICES
9269EFFICIENT IMAGE-WARPING FRAMEWORK FOR CONTENT-ADAPTIVE SUPERPIXELS GENERATION
3618EFFICIENT MONAURAL SPEECH SEPARATION WITH MULTISCALE TIME-DELAY SAMPLING
4616EFFICIENT SEQUENCE TRAINING OF ATTENTION MODELS USING APPROXIMATIVE RECOMBINATION
3395EFFICIENT TWO-STAGE BEAM TRAINING AND CHANNEL ESTIMATION FOR RIS-AIDED MMWAVE SYSTEMS VIA FAST ALTERNATING LEAST SQUARES
2015EFFICIENT UNIVERSAL SHUFFLE ATTACK FOR VISUAL OBJECT TRACKING
3491EFFICIENTLY AND GLOBALLY SOLVING JOINT BEAMFORMING AND COMPRESSION PROBLEM IN THE COOPERATIVE CELLULAR NETWORK VIA LAGRANGIAN DUALITY
1213Embedding and Beamforming: All-neural Causal Beamformer for Multichannel Speech Enhancement
4167EMBEDDING SIGNALS ON GRAPHS WITH UNBALANCED DIFFUSION EARTH MOVER’S DISTANCE
3881EMGSE: ACOUSTIC/EMG FUSION FOR MULTIMODAL SPEECH ENHANCEMENT
3550EMOQ-TTS: EMOTION INTENSITY QUANTIZATION FOR FINE-GRAINED CONTROLLABLE EMOTIONAL TEXT-TO-SPEECH
2209EMOTIONFLOW: CAPTURE THE DIALOGUE LEVEL EMOTION TRANSITIONS
2057ENABLING ON-DEVICE TRAINING OF SPEECH RECOGNITION MODELS WITH FEDERATED DROPOUT
1215ENCRYPTED IMAGE VISUAL SECURITY INDEX VIA NON-LOCAL RECOGNIZABLE DEGREE EVALUATION
4244ENCRYPTION RESISTANT DEEP NEURAL NETWORK WATERMARKING
3327Endpoint Detection for Streaming End-to-End Multi-talker ASR
2993End-to-end Alexa Device Arbitration
8908END-TO-END ASR-ENHANCED NEURAL NETWORK FOR ALZHEIMER’S DISEASE DIAGNOSIS
1740END-TO-END COMPLEX-VALUED MULTIDILATED CONVOLUTIONAL NEURAL NETWORK FOR JOINT ACOUSTIC ECHO CANCELLATION AND NOISE SUPPRESSION
1133END-TO-END DEEP LEARNING-BASED ADAPTATION CONTROL FOR FREQUENCY-DOMAIN ADAPTIVE SYSTEM IDENTIFICATION
2523END-TO-END KEYWORD SPOTTING USING NEURAL ARCHITECTURE SEARCH AND QUANTIZATION
3001END-TO-END LOW RESOURCE KEYWORD SPOTTING THROUGH CHARACTER RECOGNITION AND BEAM-SEARCH RE-SCORING
3835End-to-end multi-modal speech recognition with air and bone conducted speech
5943END-TO-END MUSIC REMASTERING SYSTEM USING SELF-SUPERVISED AND ADVERSARIAL TRAINING
4077END-TO-END NETWORK BASED ON TRANSFORMER FOR AUTOMATIC DETECTION OF COVID-19
1719End-to-end Neural Coreference Resolution Revisited: A Simple yet Effective Baseline
5220END-TO-END NEURAL SPEECH CODING FOR REAL-TIME COMMUNICATIONS
2738End-to-End Speech Recognition from Federated Acoustic Models
2387END-TO-END SPEECH RECOGNITION WITH JOINT DEREVERBERATION OF SUB-BAND AUTOREGRESSIVE ENVELOPES
4743END-TO-END SPEECH SUMMARIZATION USING RESTRICTED SELF-ATTENTION
3382ENERGY ALIGNMENT FOR BIAS RECTIFICATION IN CLASS INCREMENTAL LEARNING
4802ENHANCE RNNLMS WITH HIERARCHICAL MULTI-TASK LEARNING FOR ASR
6829ENHANCING AFFECTIVE REPRESENTATIONS OF MUSIC-INDUCED EEG THROUGH MULTIMODAL SUPERVISION AND LATENT DOMAIN ADAPTATION
1560ENHANCING AND DISSECTING CROWD COUNTING BY SYNTHETIC DATA
1599ENHANCING CLASS UNDERSTANDING VIA PROMPT-TUNING FOR ZERO-SHOT TEXT CLASSIFICATION
8540ENHANCING CONTEXTUAL ENCODING WITH STAGE-CONFUSION AND STAGE-TRANSITION ESTIMATION FOR EEG-BASED SLEEP STAGING
3187ENHANCING CONTRASTIVE LEARNING WITH TEMPORAL COGNIZANCE FOR AUDIO-VISUAL REPRESENTATION GENERATION
4721Enhancing Privacy Through Domain Adaptive Noise Injection for Speech Emotion Recognition
1541ENHANCING PROTOTYPICAL FEW-SHOT LEARNING BY LEVERAGING THE LOCAL-LEVEL STRATEGY
8728ENHANCING SPEAKING STYLES IN CONVERSATIONAL TEXT-TO-SPEECH SYNTHESIS WITH GRAPH-BASED MULTI-MODAL CONTEXT MODELING
5512ENHANCING UTILITY IN THE WATCHDOG PRIVACY MECHANISM
2407ENRICH FEATURES FOR FEW-SHOT POINT CLOUD CLASSIFICATION
3110ENTRAINMENT ANALYSIS FOR ASSESSMENT OF AUTISTIC SPEECH PROSODY USING BOTTLENECK FEATURES OF DEEP NEURAL NETWORK
2361ENVIRONMENTAL SOUND EXTRACTION USING ONOMATOPOEIC WORDS
9271EPIGRAPHICAL RELAXATION FOR MINIMIZING LAYERED MIXED NORMS
4730Epileptic Spike Detection by Recurrent Neural Networks with Self-Attention Mechanism
8911EQUAL LOSS: A SIMPLE LOSS FUNCTION FOR NOISE ROBUST LEARNING
1829ER-PIQA: A TASK-GUIDED PEDESTRIAN IMAGE QUALITY ASSESSMENT VIA EMBEDDING RECONSTRUCTION
5372ESPNET-SLU: ADVANCING SPOKEN LANGUAGE UNDERSTANDING THROUGH ESPNET
2903ESTIMATING THE CONFIDENCE OF SPEECH SPOOFING COUNTERMEASURE
9155ESTIMATION OF CHANNELS IN SYSTEMS WITH INTELLIGENT REFLECTING SURFACES
5509ESTIMATION OF THE ADMITTANCE MATRIX IN POWER SYSTEMS UNDER LAPLACIAN AND PHYSICAL CONSTRAINTS
4805EVALUATION OF ORTHOGONAL CHIRP DIVISION MULTIPLEXING FOR AUTOMOTIVE INTEGRATED SENSING AND COMMUNICATIONS
4067EVALUATION OF VIDEO CODING FOR MACHINES WITHOUT GROUND TRUTH
2337EVENT-BASED MULTIMODAL SPIKING NEURAL NETWORK WITH ATTENTION MECHANISM
3231EVOLUTIONARY NEURAL ARCHITECTURE DESIGN OF LIQUID STATE MACHINE FOR IMAGE CLASSIFICATION
2554EXACT PARTITIONING OF HIGH-ORDER PLANTED MODELS WITH A TENSOR NUCLEAR NORM CONSTRAINT
2679EXACT SPARSE SUPER-RESOLUTION VIA MODEL AGGREGATION
2999EXPECTATION CONSISTENT PLUG-AND-PLAY FOR MRI
3663EXPERIMENTAL INVESTIGATION ON STFT PHASE REPRESENTATIONS FOR DEEP LEARNING-BASED DYSARTHRIC SPEECH DETECTION
8808EXPERTS VERSUS ALL-ROUNDERS: TARGET LANGUAGE EXTRACTION FOR MULTIPLE TARGET LANGUAGES
5131EXPLAINABLE ARTIFICIAL INTELLIGENCE FOR AUTHORSHIP ATTRIBUTION ON SOCIAL MEDIA
3152EXPLAINABLE FACT-CHECKING THROUGH QUESTION ANSWERING
4273EXPLAINING DEEP LEARNING MODELS FOR SPOOFING AND DEEPFAKE DETECTION WITH SHAPLEY ADDITIVE EXPLANATIONS
1378Explicitly Modeling Importance and Coherence for Timeline Summarization
4748EXPLOITING ANNOTATORS’ TYPED DESCRIPTION OF EMOTION PERCEPTION TO MAXIMIZE UTILIZATION OF RATINGS FOR SPEECH EMOTION RECOGNITION
3184EXPLOITING CAPTION DIVERSITY FOR UNSUPERVISED VIDEO SUMMARIZATION
2191EXPLOITING CROSS DOMAIN ACOUSTIC-TO-ARTICULATORY INVERTED FEATURES FOR DISORDERED SPEECH RECOGNITION
3093EXPLOITING HYBRID MODELS OF TENSOR-TRAIN NETWORKS FOR SPOKEN COMMAND RECOGNITION
9309Exploiting Information About the Structure of Signals of Opportunity for Passive Radar Performance Increase
1659Exploiting Language Model for Efficient Linguistic Steganalysis
9290EXPLOITING TEMPORAL CONTEXT IN CNN BASED MULTISOURCE DOA ESTIMATION
6888EXPLORING AUDITORY ACOUSTIC FEATURES FOR THE DIAGNOSIS OF COVID-19
2435EXPLORING CATEGORY CONSISTENCY FOR WEAKLY SUPERVISED SEMANTIC SEGMENTATION
4004EXPLORING COMPLEMENTARITY OF GLOBAL AND LOCAL SPATIOTEMPORAL INFORMATION FOR FAKE FACE VIDEO DETECTION
3673EXPLORING DEEPER GRAPH CONVOLUTIONS FOR SEMI-SUPERVISED NODE CLASSIFICATION
8708EXPLORING DEMENTIA DETECTION FROM SPEECH: CROSS CORPUS ANALYSIS
1364Exploring Dual Stream Global Information for Image Captioning
3916EXPLORING EFFECTIVE DATA UTILIZATION FOR LOW-RESOURCE SPEECH RECOGNITION
4685EXPLORING HETEROGENEOUS CHARACTERISTICS OF LAYERS IN ASR MODELS FOR MORE EFFICIENT TRAINING
1750Exploring Machine Speech Chain for Domain Adaptation
3637EXPLORING NON-AUTOREGRESSIVE END-TO-END NEURAL MODELING FOR ENGLISH MISPRONUNCIATION DETECTION AND DIAGNOSIS
5291EXPLORING THE EFFECT OF L0/L2 REGULARIZATION IN NEURAL NETWORK PRUNING USING THE LC TOOLKIT
1962EXPLORING TRANSFERABILITY MEASURES AND DOMAIN SELECTION IN CROSS-DOMAIN SLOT FILLING
3449EXPLORING TRANSFORMER’S POTENTIAL ON AUTOMATIC PIANO TRANSCRIPTION
9259EXPONENTIAL HYPERBOLIC COSINE ROBUST ADAPTIVE FILTERS FOR AUDIO SIGNAL PROCESSING
4842EXTENDED GRAPH TEMPORAL CLASSIFICATION FOR MULTI-SPEAKER END-TO-END ASR
9129EXTENDING THE USE OF MDL FOR HIGH-DIMENSIONAL PROBLEMS: VARIABLE SELECTION, ROBUST FITTING, AND ADDITIVE MODELING
8521EXTRACTING AND DISTILLING DIRECTION-ADAPTIVE KNOWLEDGE FOR LIGHTWEIGHT OBJECT DETECTION IN REMOTE SENSING IMAGES
8865EXTREME-POINT PURSUIT FOR UNIT-MODULUS OPTIMIZATION
5141EYES TELL ALL: IRREGULAR PUPIL SHAPES REVEAL GAN-GENERATED FACES
1509Factorized Neural Transducer for Efficient Language Model Adaptation
4761FAIRNESS-AWARE SELECTIVE SAMPLING ON ATTRIBUTED GRAPHS
9233Fast Adaptive Active Noise Control Based on Modified Model-Agnostic Meta-Learning Algorithm
4829FAST AND STABLE CONVERGENCE OF ONLINE SGD FOR CV@R-BASED RISK-AWARE LEARNING
3282Fast Contextual Adaptation with Neural Associative Memory for On-Device Personalized Speech Recognition
5743FAST FAULT DIAGNOSIS METHOD OF ROLLING BEARINGS IN MULTI-SENSOR MEASUREMENT ENVIRONMENT
9267Fast Graph Filters for Decentralized Subspace Projection
5050FAST GRAPH SAMPLING FOR SHORT VIDEO SUMMARIZATION USING GERSHGORIN DISC ALIGNMENT
2937FAST LEARNING OF FAST TRANSFORMS, WITH GUARANTEES
4287FAST LOW RANK COLUMN-WISE COMPRESSIVE SENSING FOR ACCELERATED DYNAMIC MRI
4291FAST MULTISCALE DIFFUSION ON GRAPHS
3707FAST TASK-SPECIFIC ADAPTATION IN SPOKEN LANGUAGE ASSESSMENT WITH META-LEARNING
2922FAST VIDEO OBJECT SEGMENTATION VIA DYNAMIC YOLACT
4589FastAudio: A Learnable Audio Front-End for Spoof Speech Detection
1736FAST-RIR: FAST NEURAL DIFFUSE ROOM IMPULSE RESPONSE GENERATOR
2625FAST-SLOW TRANSFORMER FOR VISUALLY GROUNDING SPEECH
1747FAZ-BV: A DIABETIC MACULAR ISCHEMIA GRADING FRAMEWORK COMBINING FAZ ATTENTION NETWORK AND BLOOD VESSEL ENHANCEMENT FILTERS
1581FDSNET: AN ACCURATE REAL-TIME SURFACE DEFECT SEGMENTATION NETWORK
9303FEASIBILITY OF JOINT POWER OPTIMIZATION OF MULTIPLE SOURCE-DESTINATIONS IN AN AF RELAY NETWORK
8768FEATURE AUGMENTATION LEARNING FOR FEW-SHOT PALMPRINT IMAGE RECOGNITION WITH UNCONSTRAINED ACQUISITION
4754Feature Imitating Networks
1246FEATURE SPACE MESSAGE PASSING NETWORK FOR MEDICAL IMAGE SEMANTIC SEGMENTATION
4788FEATURE-BASED SENSING MATRIX DESIGN FOR ANALOG TO INFORMATION CONVERTERS
2921FedClean: A Defense Mechanism Against Parameter Poisoning Attacks in Federated Learning
2647Federated Learning Challenges and Opportunities: An Outlook
5203FEDERATED MULTI-ARMED BANDIT VIA UNCOORDINATED EXPLORATION
4784FEDERATED OVER-AIR ROBUST SUBSPACE TRACKING FROM MISSING DATA
4970FEDERATED SELF-SUPERVISED LEARNING FOR ACOUSTIC EVENT CLASSIFICATION
2802FEDERATED SELF-TRAINING FOR DATA-EFFICIENT AUDIO RECOGNITION
1445FEDERATED STOCHASTIC GRADIENT DESCENT BEGETS SELF-INDUCED MOMENTUM
5243FEW-SHOT GAZE ESTIMATION WITH MODEL OFFSET PREDICTORS
2713FEW-SHOT GENERATION BY MODELING STEREOSCOPIC PRIORS
4026Few-shot learning with improved local representations via bias rectify module
4232FEW-SHOT MUSICAL SOURCE SEPARATION
4130FEW-SHOT OBJECT DETECTION WITH LOCAL CORRESPONDENCE RPN AND ATTENTIVE HEAD
9187FEW-SHOT ONE-CLASS DOMAIN ADAPTATION BASED ON FREQUENCY FOR IRIS PRESENTATION ATTACK DETECTION
9326FifthNet: Structured Compact Neural Networks for Automatic Chord Recognition
3589FilterAugment: An Acoustic Environmental Data Augmentation Method
1059FIND THE WAY BACK: INVERTIBLE KERNEL ESTIMATOR FOR BLIND IMAGE SUPER-RESOLUTION
6873Fine-Grained Dynamic Loss for Accurate Single-Image Super-Resolution
4787FINE-GRAINED STYLE CONTROL IN TRANSFORMER-BASED TEXT-TO-SPEECH SYNTHESIS
1826FINE-TUNING WAV2VEC2 FOR SPEAKER RECOGNITION
9175FINT: FIELD-AWARE INTERACTION NEURAL NETWORK FOR CLICK-THROUGH RATE PREDICTION
2747FLDP: Flexible strategy for local differential privacy
4814Floor Plan Reconstruction with High-Precision RF-based Tracking
9073FLOW-BASED FAST MULTICHANNEL NONNEGATIVE MATRIX FACTORIZATION FOR BLIND SOURCE SEPARATION
2235FLOW-BASED POINT CLOUD COMPLETION NETWORK WITH ADVERSARIAL REFINEMENT
3217FLOWDT: A FLOW-AWARE DIGITAL TWIN FOR COMPUTER NETWORKS
4716FORENSIC ANALYSIS AND LOCALIZATION OF MULTIPLY COMPRESSED MP3 AUDIO USING TRANSFORMERS
1298FOSTERING THE ROBUSTNESS OF WHITE-BOX DEEP NEURAL NETWORK WATERMARKS BY NEURON ALIGNMENT
1677FOV-BASED CODING OPTIMIZATION FOR 360-DEGREE VIRTUAL REALITY VIDEOS
2464FRACTURE DETECTION AND LOCALIZATION IN CHEST X-RAYS USING SEMI-SUPERVISED LEARNING WITH DYNAMIC SHARPENING
2381FrAUG: A Frame Rate Based Data Augmentation Method for Depression Detection from Speech Signals
1945FREE LUNCH FOR CROSS-DOMAIN OCCLUDED FACE RECOGNITION WITHOUT SOURCE DATA
3545FRE-GAN 2: FAST AND EFFICIENT FREQUENCY-CONSISTENT AUDIO SYNTHESIS
9262Frequency Domain Long-Term Prediction for Low Delay General Audio Coding
3474FREQUENCY-SPECIFIC NON-LINEAR GRANGER CAUSALITY IN A NETWORK OF BRAIN SIGNALS
2205FROM BOTTOM-UP TO TOP-DOWN: CHARACTERIZATION OF TRAINING PROCESS IN GAZE MODELING
2584FROM SHALLOW TO DEEP: COMPOSITIONAL REASONING OVER GRAPHS FOR VISUAL QUESTION ANSWERING
5900FRONTEND ATTRIBUTES DISENTANGLEMENT FOR SPEECH EMOTION RECOGNITION
1748FSM: FEATURE SAMPLING MODULE FOR OBJECT DETECTION
2074FSOINET: FEATURE-SPACE OPTIMIZATION-INSPIRED NETWORK FOR IMAGE COMPRESSIVE SENSING
3022FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement
6017FUSING ASR OUTPUTS IN JOINT TRAINING FOR SPEECH EMOTION RECOGNITION
3118FUSION AND ORTHOGONAL PROJECTION FOR IMPROVED FACE-VOICE ASSOCIATION
1734FUSION OF MODULATION SPECTRAL AND SPECTRAL FEATURES WITH SYMPTOM METADATA FOR IMPROVED SPEECH-BASED COVID-19 DETECTION
3261FUSION-ID: A PHOTOPLETHYSMOGRAPHY AND MOTION SENSOR FUSION BIOMETRIC AUTHENTICATOR WITH FEW-SHOT ON-BOARDING
9095GAN-BASED JOINT ACTIVITY DETECTION AND CHANNEL ESTIMATION FOR GRANT-FREE RANDOM ACCESS
9058GANET: UNARY ATTENTION REACHES PAIRWISE ATTENTION VIA IMPLICIT GROUP CLUSTERING IN LIGHT-WEIGHT CNNS
4702GATED MULTIMODAL FUSION WITH CONTRASTIVE LEARNING FOR TURN-TAKING PREDICTION IN HUMAN-ROBOT DIALOGUE
4850GAZEATTENTIONNET: GAZE ESTIMATION WITH ATTENTIONS
3694GENERALIZATION ABILITY OF MOS PREDICTION NETWORKS
1371GENERALIZED AUTOCORRELATION ANALYSIS FOR MULTI-TARGET DETECTION
1707GENERALIZED FACE ANTI-SPOOFING VIA CROSS-ADVERSARIAL DISENTANGLEMENT WITH MIXING AUGMENTATION
4489GENERALIZED MATCHING PURSUITS FOR THE SPARSE OPTIMIZATION OF SEPARABLE OBJECTIVES
3608GENERALIZED SLICED PROBABILITY METRICS
4641GENERALIZED TIME DOMAIN VELOCITY VECTOR
9042GENERALIZED ZERO-SHOT LEARNING USING CONDITIONAL WASSERSTEIN AUTOENCODER
9240GENERALIZING AUC OPTIMIZATION TO MULTICLASS CLASSIFICATION FOR AUDIO SEGMENTATION WITH LIMITED TRAINING DATA
2248GENERATING DISENTANGLED ARGUMENTS WITH PROMPTS: A SIMPLE EVENT EXTRACTION FRAMEWORK THAT WORKS
1916GENERATION FOR UNSUPERVISED DOMAIN ADAPTATION: A GAN-BASED APPROACH FOR OBJECT CLASSIFICATION WITH 3D POINT CLOUD DATA
2815GENERATION OF PERSONAL SOUND FIELDS IN REVERBERANT ENVIRONMENTS USING INTERFRAME CORRELATION
3843GENERATIVE ADVERSARIAL NETWORK INCLUDING REFERRING IMAGE SEGMENTATION FOR TEXT-GUIDED IMAGE MANIPULATION
5505GENRE-CONDITIONED ACOUSTIC MODELS FOR AUTOMATIC LYRICS TRANSCRIPTION OF POLYPHONIC MUSIC
5710Genre-Conditioned Long-Term 3D Dance Generation Driven by Music
3400GEOMETRIC LOW-RANK TENSOR APPROXIMATION FOR REMOTELY SENSED HYPERSPECTRAL AND MULTISPECTRAL IMAGERY FUSION
3571GLASSOFORMER: A QUERY-SPARSE TRANSFORMER FOR POST-FAULT POWER GRID VOLTAGE PREDICTION
3365GLOBAL EVOLUTION NEURAL NETWORK FOR SEGMENTATION OF REMOTE SENSING IMAGES
3543GLOBAL OPTIMIZATION SOLUTION FOR DYNAMIC ADAPTIVE 360-DEGREE STREAMING
3433GLOBAL-LOCAL FEATURE ENHANCEMENT NETWORK FOR ROBUST OBJECT DETECTION USING MMWAVE RADAR AND CAMERA
8952GOAL-ORIENTED COMMUNICATION FOR EDGE LEARNING BASED ON THE INFORMATION BOTTLENECK
2528GOS: A LARGE-SCALE ANNOTATED OUTDOOR SCENE SYNTHETIC DATASET
8984GPU-ACCELERATED FORWARD-BACKWARD ALGORITHM WITH APPLICATION TO LATTICE-FREE MMI
4756GRADIENT STALENESS IN ASYNCHRONOUS OPTIMIZATION UNDER RANDOM COMMUNICATION DELAYS
1308GRADIENT VARIANCE LOSS FOR STRUCTURE-ENHANCED IMAGE SUPER-RESOLUTION
4570Gradient-weighted Class Activation Mapping for spatio temporal graph convolutional network
3762GRADUAL SURROGATE GRADIENT LEARNING IN DEEP SPIKING NEURAL NETWORKS
2854GRAPH ATTENTIVE FEATURE AGGREGATION FOR TEXT-INDEPENDENT SPEAKER VERIFICATION
2282GRAPH CONVOLUTION FOR RE-RANKING IN PERSON RE-IDENTIFICATION
3490GRAPH CONVOLUTIONAL NETWORK BASED SEMI-SUPERVISED LEARNING ON MULTI-SPEAKER MEETING DATA
4634GRAPH CONVOLUTIONAL NETWORKS WITH AUTOENCODER-BASED COMPRESSION AND MULTI-LAYER GRAPH LEARNING
2856Graph Fine-Grained Contrastive Representation Learning
1796GRAPH LEARNING BASED AUTOENCODER FOR HYPERSPECTRAL BAND SELECTION
3269Graph Learning from Multivariate Dependent Time Series via a Multi-Attribute Formulation
6022GRAPH LEARNING INFORMATION CRITERION
9268GRAPH SIGNAL PROCESSING: VERTEX MULTIPLICATION
5577GRAPH-BASED POINT CLOUD DENOISING USING SHAPE-AWARE CONSISTENCY FOR FREE-VIEWPOINT VIDEO
3275GRAPHON-AIDED JOINT ESTIMATION OF MULTIPLE GRAPHS
3537GRAPH-STRUCTURED SPARSE REGULARIZATION VIA CONVEX OPTIMIZATION
4133Grassmannian Dimensionality Reduction Using Triplet Margin Loss for UME Classification of 3D Point Clouds
9248GRIDLESS DOA ESTIMATION AND ROOT-MUSIC FOR NON-UNIFORM LINEAR ARRAYS
1616Gridless DOA Estimation Under the Multi-frequency Model
5752Group-wise Feature Selection for Supervised Learning
3919HALF INVERTED NESTED ARRAYS WITH LARGE HOLE-FREE FOURTH-ORDER DIFFERENCE CO-ARRAYS
5258Hand Gesture Recognition Using Temporal Convolutions and Attention Mechanism
5144HARMONIC AND PERCUSSIVE SOUND SEPARATION BASED ON MIXED PARTIAL DERIVATIVE OF PHASE SPECTROGRAM
5158HARMONICITY PLAYS A CRITICAL ROLE IN DNN BASED VERSUS IN BIOLOGICALLY-INSPIRED MONAURAL SPEECH SEGREGATION SYSTEMS
9252HARMONIC-TEMPORAL FACTOR DECOMPOSITION FOR UNSUPERVISED MONAURAL SEPARATION OF HARMONIC SOUNDS
4907HARVESTING PARTIALLY-DISJOINT TIME-FREQUENCY INFORMATION FOR IMPROVING DEGENERATE UNMIXING ESTIMATION TECHNIQUE
5695HAVE BEST OF BOTH WORLDS: TWO-PASS HYBRID AND E2E CASCADING FRAMEWORK FOR SPEECH RECOGNITION
1639HBP: AN EFFICIENT BLOCK PERMUTATION SOLVER USING HUNGARIAN ALGORITHM AND SPECTROGRAM INPAINTING FOR MULTICHANNEL AUDIO SOURCE SEPARATION
1986HEART RATE AND OXYGEN SATURATION ESTIMATION FROM FACIAL VIDEO WITH MULTIMODAL PHYSIOLOGICAL DATA GENERATION
9117Heterogeneous Graph Node Classification with Multi-Hops Relation Features
8799HEURISTIC DROPOUT: AN EFFICIENT REGULARIZATION METHOD FOR MEDICAL IMAGE SEGMENTATION MODELS
3955HGCN: HARMONIC GATED COMPENSATION NETWORK FOR SPEECH ENHANCEMENT
1876HIERARCHICAL AND MULTI-VIEW DEPENDENCY MODELLING NETWORK FOR CONVERSATIONAL EMOTION RECOGNITION
1333HIERARCHICAL CLASSIFICATION OF SINGING ACTIVITY, GENDER, AND TYPE IN COMPLEX MUSIC RECORDINGS
1980HIERARCHICAL CONDITIONAL END-TO-END ASR WITH CTC AND MULTI-GRANULAR SUBWORD UNITS
2649HIERARCHICAL DEEP LEARNING MODEL WITH INERTIAL AND PHYSIOLOGICAL SENSORS FUSION FOR WEARABLE-BASED HUMAN ACTIVITY RECOGNITION
3519HIERARCHICAL FEATURE AGGREGATION NETWORK FOR DEEP IMAGE COMPRESSION
5946Hierarchical Graph-based Neural Network for Singing Melody Extraction
4490HIERARCHICAL PROSODY MODELING AND CONTROL IN NON-AUTOREGRESSIVE PARALLEL NEURAL TTS
4181Hierarchical Signal Fusion Network for Pulsar Detection with Phase-Correlation and Signal Attentions
2127HIFIDENOISE: HIGH-FIDELITY DENOISING TEXT TO SPEECH WITH ADVERSARIAL NETWORKS
2776HIFI-SVC: FAST HIGH FIDELITY CROSS-DOMAIN SINGING VOICE CONVERSION
4913HIGH-DIMENSIONAL SPARSE BAYESIAN LEARNING WITHOUT COVARIANCE MATRICES
8090High-fidelity Portrait Editing via Exploring Differentiable Guided Sketches from the Latent Space
2393HIGH-QUALITY SELF-SUPERVISED SNAPSHOT HYPERSPECTRAL IMAGING
3410HIRL: Hybrid Image Restoration based on Hierarchical Deep Reinforcement Learning via Two-Step Analysis
3815HISTOGRAM-GUIDED SEMANTIC-AWARE COLORIZATION
3031HISTOKT: CROSS KNOWLEDGE TRANSFER IN COMPUTATIONAL PATHOLOGY
2601HODGELETS: LOCALIZED SPECTRAL REPRESENTATIONS OF FLOWS ON SIMPLICIAL COMPLEXES
1359HOLISTIC SEMI-SUPERVISED APPROACHES FOR EEG REPRESENTATION LEARNING
9194HOQRI: Higher-order QR Iteration for Scalable Tucker Decomposition
4726HOW CAN A COGNITIVE RADAR MASK ITS COGNITION?
5261HOW NEURAL PROCESSES IMPROVE GRAPH LINK PREDICTION
5252HOW SECURE ARE THE ADVERSARIAL EXAMPLES THEMSELVES?
4739HTS-AT: A HIERARCHICAL TOKEN-SEMANTIC AUDIO TRANSFORMER FOR SOUND CLASSIFICATION AND DETECTION
1608Human Decision Making with Bounded Rationality
8946Human emotion recognition using multi-modal biological signals based on time lag-considered correlation maximization
4769HYBRID ATTENTION-BASED PROTOTYPICAL NETWORKS FOR FEW-SHOT SOUND CLASSIFICATION
5278HYBRID RNN-T/ATTENTION-BASED STREAMING ASR WITH TRIGGERED CHUNKWISE ATTENTION AND DUAL INTERNAL LANGUAGE MODEL INTEGRATION
8997Hybrid sub-word segmentation for handling long tail in morphologically rich low resource languages
3023Hybrid Weighting Loss for Precipitation Nowcasting from Radar Images
2957HYPERGRAPH-BASED REINFORCEMENT LEARNING FOR STOCK PORTFOLIO SELECTION
2082HYPERGRAPHS WITH EDGE-DEPENDENT VERTEX WEIGHTS: SPECTRAL CLUSTERING BASED ON THE 1-LAPLACIAN
4353HYPERSPECTRAL IMAGE CLASSIFICATION BASED ON CO-LEARNING THROUGH DUAL-ARCHITECTURE ENSEMBLE
4141HYPERSPECTRAL IMAGE SUPER-RESOLUTION WITH DEEP PRIORS AND DEGRADATION MODEL INVERSION
9243Identification of Edge Disconnections in Networks Based on Graph Filter Outputs
4704IDENTIFICATION OF PULSE STREAMS OF UNKNOWN SHAPE FROM TIME ENCODING MACHINE SAMPLES
4070IMAGE DENOISING WITH DEEP UNFOLDING AND NORMALIZING FLOWS
9285Image Restoration via Reconciliation of Group Sparsity and Low-Rank Models
4245IMAGE STEGANALYSIS WITH CONVOLUTIONAL VISION TRANSFORMER
2111IMAGE-TEXT ALIGNMENT AND RETRIEVAL USING LIGHT-WEIGHT TRANSFORMER
2640IMAGE-TO-GRAPH TRANSFORMERS FOR CHEMICAL STRUCTURE RECOGNITION
1148IMAGE-TO-VIDEO RE-IDENTIFICATION VIA MUTUAL DISCRIMINATIVE KNOWLEDGE TRANSFER
3502Importance of switch optimization criterion in Switching WPE dereverberation
4561IMPORTANCE SAMPLING CAMS FOR WEAKLY-SUPERVISED SEGMENTATION
8989IMPORTANTAUG: A DATA AUGMENTATION AGENT FOR SPEECH
2565IMPQ: REDUCED COMPLEXITY NEURAL NETWORKS VIA GRANULAR PRECISION ASSIGNMENT
2590IMPROVE FEW-SHOT VOICE CLONING USING MULTI-MODAL LEARNING
3456IMPROVE IMAGE CAPTIONING VIA RELATION MODELING
9178IMPROVED BEAMFORMING ENCODING FOR JOINT RADAR AND COMMUNICATION
5081IMPROVED LANGUAGE IDENTIFICATION THROUGH CROSS-LINGUAL SELF-SUPERVISED LEARNING
5826IMPROVED META LEARNING FOR LOW RESOURCE SPEECH RECOGNITION
4605IMPROVED REPRESENTATION LEARNING FOR ACOUSTIC EVENT CLASSIFICATION USING TREE-STRUCTURED ONTOLOGY
4601IMPROVED SIMULATION OF REALISTICALLY-SPATIALISED SIMULTANEOUS SPEECH USING MULTI-CAMERA ANALYSIS IN THE CHIME-5 DATASET
4732IMPROVED SINGING VOICE SEPARATION WITH CHROMAGRAM-BASED PITCH-AWARE REMIXING
8789IMPROVING ACTOR-CRITIC REINFORCEMENT LEARNING VIA HAMILTONIAN MONTE CARLO METHOD
1453IMPROVING ADVERSARIAL WAVEFORM GENERATION BASED SINGING VOICE CONVERSION WITH HARMONIC SIGNALS
1013IMPROVING ANOMALY DETECTION WITH A SELF-SUPERVISED TASK BASED ON GENERATIVE ADVERSARIAL NETWORK
3270IMPROVING BCI-BASED COLOR VISION ASSESSMENT USING GAUSSIAN PROCESS REGRESSION
1303IMPROVING BIOMEDICAL NAMED ENTITY RECOGNITION WITH A UNIFIED MULTI-TASK MRC FRAMEWORK
4473IMPROVING BIRD CLASSIFICATION WITH UNSUPERVISED SOUND SEPARATION
5783IMPROVING BRAIN DECODING METHODS AND EVALUATION
8838Improving Character Error Rate Is Not Equal to Having Clean Speech: Speech Enhancement for ASR Systems with Black-box Acoustic Models
2646IMPROVING CLASS ACTIVATION MAP FOR WEAKLY SUPERVISED OBJECT LOCALIZATION
3210IMPROVING CONFIDENCE ESTIMATION ON OUT-OF-DOMAIN DATA FOR END-TO-END SPEECH RECOGNITION
2365Improving Contextual Coherence in Variational Personalized and Empathetic Dialogue Agents
2148IMPROVING CROSS-LINGUAL SPEECH SYNTHESIS WITH TRIPLET TRAINING SCHEME
1758IMPROVING CROSS-MODAL UNDERSTANDING IN VISUAL DIALOG VIA CONTRASTIVE LEARNING
2725IMPROVING CTC-BASED SPEECH RECOGNITION VIA KNOWLEDGE TRANSFERRING FROM PRE-TRAINED LANGUAGE MODELS
2931IMPROVING DIALOGUE GENERATION VIA PROACTIVELY QUERYING GROUNDED KNOWLEDGE
1537IMPROVING DUAL-MICROPHONE SPEECH ENHANCEMENT BY LEARNING CROSS-CHANNEL FEATURES WITH MULTI-HEAD ATTENTION
2970IMPROVING DYNAMIC GRAPH CONVOLUTIONAL NETWORK WITH FINE-GRAINED ATTENTION MECHANISM
1837IMPROVING EMOTIONAL SPEECH SYNTHESIS BY USING SUS-CONSTRAINED VAE AND TEXT ENCODER AGGREGATION
1498IMPROVING END-TO-END CONTEXTUAL SPEECH RECOGNITION WITH FINE-GRAINED CONTEXTUAL KNOWLEDGE SELECTION
3228IMPROVING END-TO-END MODELS FOR SET PREDICTION IN SPOKEN LANGUAGE UNDERSTANDING
2902IMPROVING END-TO-END SPEECH TRANSLATION MODEL WITH BERT-BASED CONTEXTUAL INFORMATION
4647Improving Factored Hybrid HMM Acoustic Modeling without State Tying
2064IMPROVING FAIRNESS IN SPEAKER VERIFICATION VIA GROUP-ADAPTED FUSION NETWORK
2914IMPROVING FASTSPEECH TTS WITH EFFICIENT SELF-ATTENTION AND COMPACT FEED-FORWARD NETWORK
2806IMPROVING FEATURE GENERALIZABILITY WITH MULTITASK LEARNING IN CLASS INCREMENTAL LEARNING
2516IMPROVING INFERENCE FOR SPATIAL SIGNALS BY CONTEXTUAL FALSE DISCOVERY RATES
1521Improving Joint Sparse Hyperspectral Unmixing by Simultaneously Clustering Pixels According to their Mixtures
3126IMPROVING LYRICS ALIGNMENT THROUGH JOINT PITCH DETECTION
8615Improving Maximum Likelihood Difference Scaling method to measure inter content scale
4455IMPROVING NOISE ROBUSTNESS OF CONTRASTIVE SPEECH REPRESENTATION LEARNING WITH SPEECH RECONSTRUCTION
4920IMPROVING NON-AUTOREGRESSIVE END-TO-END SPEECH RECOGNITION WITH PRE-TRAINED ACOUSTIC AND LANGUAGE MODELS
5216IMPROVING PHASE-RECTIFIED SIGNAL AVERAGING FOR FETAL HEART RATE ANALYSIS
2475IMPROVING PHONETIC REALIZATIONS IN TTS BY USING PHONEME-ALIGNED GRAPHEMES
2076IMPROVING PSEUDO-LABEL TRAINING FOR END-TO-END SPEECH RECOGNITION USING GRADIENT MASK
1846IMPROVING RECOGNITION-SYNTHESIS BASED ANY-TO-ONE VOICE CONVERSION WITH CYCLIC TRAINING
3621IMPROVING REFERENCE-BASED IMAGE COLORIZATION FOR LINE ARTS VIA FEATURE AGGREGATION AND CONTRASTIVE LEARNING
3002IMPROVING SELF-SUPERVISED LEARNING FOR SPEECH RECOGNITION WITH INTERMEDIATE LAYER SUPERVISION
2368IMPROVING SEPARATION-BASED SPEAKER DIARIZATION VIA ITERATIVE MODEL REFINEMENT AND SPEAKER EMBEDDING BASED POST-PROCESSING
2720IMPROVING SOURCE SEPARATION BY EXPLICITLY MODELING DEPENDENCIES BETWEEN SOURCES
1222IMPROVING SPOKEN LANGUAGE UNDERSTANDING BY ENHANCING TEXT REPRESENTATION
3278IMPROVING THE CLASSIFICATION OF PHONETIC SEGMENTS FROM RAW ULTRASOUND USING SELF-SUPERVISED LEARNING AND HARD EXAMPLE MINING
3970IMPROVING THE FUSION OF ACOUSTIC AND TEXT REPRESENTATIONS IN RNN-T
2065IMPROVING THE LATENCY AND QUALITY OF CASCADED ENCODERS
4572Improving Ultrasound Image Classification With Local Texture Quantisation
3328In Pursuit of Preserving the Fidelity of Adversarial Images
2668INCIPIENT FAULT SEVERITY ESTIMATION USING LOCAL MAHALANOBIS DISTANCE
1225INCOHERENT SYNTHESIS OF SPARSE BROADBAND ARRAYS BASED ON A PARAMETER-FREE SUBSPACE CLUSTERING
3486INCORPORATING END-TO-END FRAMEWORK INTO TARGET-SPEAKER VOICE ACTIVITY DETECTION
9221INCORPORATING GAZE BEHAVIOR USING JOINT EMBEDDING WITH SCENE CONTEXT FOR DRIVER TAKEOVER DETECTION
3509Increasing Loudness in Audio Signals: a perceptually motivated approach to preserve audio quality
4098INCREMENTAL CONTEXT AWARE ATTENTIVE KNOWLEDGE TRACING
9232INCREMENTAL TEXT-TO-SPEECH SYNTHESIS USING PSEUDO LOOKAHEAD WITH LARGE PRETRAINED LANGUAGE MODEL
5708INCREMENTAL USER EMBEDDING MODELING FOR PERSONALIZED TEXT CLASSIFICATION
4562Independent Vector Analysis Based Subgroup Identification from Multisubject fMRI data
9272INDEPENDENT VECTOR ANALYSIS VIA LOG-QUADRATICALLY PENALIZED QUADRATIC MINIMIZATION
1386INDIVIDUALIZED HEAR-THROUGH FOR ACOUSTIC TRANSPARENCY USING PCA-BASED SOUND PRESSURE ESTIMATION AT THE EARDRUM
5159INFANT CRYING DETECTION IN REAL-WORLD ENVIRONMENTS
4486INFERGRAD: IMPROVING DIFFUSION MODELS FOR VOCODER BY CONSIDERING INFERENCE IN TRAINING
2156Inferring Camera Intrinsics Based on Surfaces of Revolution: A Single Image Geometric Network Approach for Camera Calibration
5127INFORMATION THEORETIC LIMITS FOR STANDARD AND ONE-BIT COMPRESSED SENSING WITH GRAPH-STRUCTURED SPARSITY
2343Informative Attention Supervision for Grounded Video Description
7908INITIALIZATION-FREE IMPLICIT-FOCUSING (IF2) FOR WIDEBAND DIRECTION-OF-ARRIVAL ESTIMATION
9016INJECTING TEXT AND CROSS-LINGUAL SUPERVISION IN FEW-SHOT LEARNING FROM SELF-SUPERVISED MODELS
6360INSTANTANEOUS LINEAR DIMENSIONALITY REDUCTION OF MULTICHANNEL TIME-SERIES SIGNAL FOR ARRAY SIGNAL PROCESSING
2184INTEGER-ONLY ZERO-SHOT QUANTIZATION FOR EFFICIENT SPEECH RECOGNITION
8863Integrated Sensing and Communications via 5G NR Waveform: Performance Analysis
2997INTEGRATING DEPENDENCY TREE INTO SELF-ATTENTION FOR SENTENCE REPRESENTATION
5232Integrating multiple ASR systems into NLP backend with attention fusion
2933INTEGRATING PRETRAINED LANGUAGE MODEL FOR DIALOGUE POLICY EVALUATION
4660INTEGRATING STATISTICAL UNCERTAINTY INTO NEURAL NETWORK-BASED SPEECH ENHANCEMENT
3193INTEGRATING TEXT INPUTS FOR TRAINING AND ADAPTING RNN TRANSDUCER ASR MODELS
1431INTEGRATION OF ANOMALY MACHINE SOUND DETECTION INTO ACTIVE NOISE CONTROL TO SHAPE THE RESIDUAL SOUND
2481INTEGRATION OF PRE-TRAINED NETWORKS WITH CONTINUOUS TOKEN INTERFACE FOR END-TO-END SPOKEN LANGUAGE UNDERSTANDING
4372Intelligent Wi-Fi Based Child Presence Detection System
2783INTERACTIVE FEATURE FUSION FOR END-TO-END NOISE-ROBUST SPEECH RECOGNITION
4654INTERACTIVE MULTI-LEVEL PROSODY CONTROL FOR EXPRESSIVE SPEECH SYNTHESIS
5789INTERMIX: AN INTERFERENCE-BASED DATA AUGMENTATION AND REGULARIZATION TECHNIQUE FOR AUTOMATIC DEEP SOUND CLASSIFICATION
5614INTERNET STREAMING AUDIO BASED SPEECH RECEPTION THRESHOLD MEASUREMENT IN COCHLEAR IMPLANT USERS
4975INTERPRETABLE IMAGE CLASSIFICATION USING SPARSE OBLIQUE DECISION TREES
2994INTERPRETING INTERMEDIATE CONVOLUTIONAL LAYERS IN UNSUPERVISED ACOUSTIC WORD CLASSIFICATION
3168INVERSE IMAGING WITH GENERATIVE PRIORS VIA LANGEVIN DYNAMICS
8412INVESTIGATING ROBUSTNESS OF BIOLOGICAL VS. BACKPROP BASED LEARNING
9078INVESTIGATING SELF-SUPERVISED LEARNING FOR SPEECH ENHANCEMENT AND SEPARATION
8766INVESTIGATING SEQUENCE-LEVEL NORMALISATION FOR CTC-LIKE END-TO-END ASR
5588INVESTIGATING THE POTENTIAL OF AUXILIARY-CLASSIFIER GANS FOR IMAGE CLASSIFICATION IN LOW DATA REGIMES
5054INVESTIGATION AND COMPARISON OF OPTIMIZATION METHODS FOR VARIATIONAL AUTOENCODER-BASED UNDERDETERMINED MULTICHANNEL SOURCE SEPARATION
5182INVESTIGATION OF ROBUSTNESS OF HUBERT FEATURES FROM DIFFERENT LAYERS TO DOMAIN, ACCENT AND LANGUAGE VARIATIONS
1606INVISIBLE AND EFFICIENT BACKDOOR ATTACKS FOR COMPRESSED DEEP NEURAL NETWORKS
4619IS CROSS-ATTENTION PREFERABLE TO SELF-ATTENTION FOR MULTI-MODAL EMOTION RECOGNITION?
2331ISDA: POSITION-AWARE INSTANCE SEGMENTATION WITH DEFORMABLE ATTENTION
5268ISOMETRIC MT: NEURAL MACHINE TRANSLATION FOR AUTOMATIC DUBBING
4283ISTFTNET: FAST AND LIGHTWEIGHT MEL-SPECTROGRAM VOCODER INCORPORATING INVERSE SHORT-TIME FOURIER TRANSFORM
3640ITERATIVE CHANNEL ESTIMATION AND DATA DETECTION ALGORITHM FOR OTFS MODULATION
2508Iterative Learning for Distorted Image Restoration
2540Iterative Re-weighted Least Squares Algorithms for Non-negative Sparse and Group-sparse Recovery
1385ITERATIVE SELF KNOWLEDGE DISTILLATION --- FROM POTHOLE CLASSIFICATION TO FINE-GRAINED AND COVID RECOGNITION
1780ITOWAVE: ITO STOCHASTIC DIFFERENTIAL EQUATION IS ALL YOU NEED FOR WAVE GENERATION
2119JE2Net: Joint Exploitation and Exploration in Reinforcement Learning Based Image Restoration
1670JMPNET: JOINT MOTION PREDICTION FOR LEARNING-BASED VIDEO COMPRESSION
5193JOINT AND ADVERSARIAL TRAINING WITH ASR FOR EXPRESSIVE SPEECH SYNTHESIS
1220JOINT BEAM SELECTION AND PRECODING BASED ON DIFFERENTIAL EVOLUTION FOR MILLIMETER-WAVE MASSIVE MIMO SYSTEMS
3791Joint calibration and mapping of satellite altimetry data using trainable variational models
2749JOINT CENTRALITY ESTIMATION AND GRAPH IDENTIFICATION FROM MIXTURE OF LOW PASS GRAPH SIGNALS
1324JOINT DUAL-DOMAIN MATRIX FACTORIZATION FOR ECG BIOMETRIC RECOGNITION
1460JOINT EGO-NOISE SUPPRESSION AND KEYWORD SPOTTING ON SWEEPING ROBOTS
2414Joint Far- and Near-End Speech Intelligibility Enhancement based on the Approximated Speech Intelligibility Index
5079Joint Global-Local alignment for domain adaptive semantic segmentation
1910JOINT HYPOGLYCEMIA PREDICTION AND GLUCOSE FORECASTING VIA DEEP MULTI-TASK LEARNING
3198JOINT INFERENCE OF MULTIPLE GRAPHS WITH HIDDEN VARIABLES FROM STATIONARY GRAPH SIGNALS
7004JOINT LEARNING FOR ADDRESSEE SELECTION AND RESPONSE GENERATION IN MULTI-PARTY CONVERSATION
8931JOINT LEARNING OF FEATURE EXTRACTION AND COST AGGREGATION FOR SEMANTIC CORRESPONDENCE
1095Joint magnitude estimation and phase recovery using Cycle-in-Cycle GAN for non-parallel speech enhancement
3814JOINT MODEL ORDER ESTIMATION FOR MULTIPLE TENSORS WITH A COUPLED MODE AND APPLICATIONS TO THE JOINT DECOMPOSITION OF EEG, MEG MAGNETOMETER, AND GRADIOMETER TENSORS
4764JOINT MODELING OF CODE-SWITCHED AND MONOLINGUAL ASR VIA CONDITIONAL FACTORIZATION
1993JOINT MULTIPLE INTENT DETECTION AND SLOT FILLING VIA SELF-DISTILLATION
3644Joint Normality Test via Two-dimensional Projection
4109JOINT RADAR-COMMUNICATIONS PROCESSING FROM A DUAL-BLIND DECONVOLUTION PERSPECTIVE
4992JOINT SOURCE LOCALIZATION AND ASSOCIATION THROUGH OVERCOMPLETE REPRESENTATION UNDER MULTIPATH PROPAGATION ENVIRONMENT
9275Joint Source-Channel Coding for Semantics-Aware Grant-Free Radio Access in IoT Fog Networks
1991JOINT SPEECH RECOGNITION AND AUDIO CAPTIONING
5563JOINT TEMPORAL CONVOLUTIONAL NETWORKS AND ADVERSARIAL DISCRIMINATIVE DOMAIN ADAPTATION FOR EEG-BASED CROSS-SUBJECT EMOTION RECOGNITION
6203JOINT UNSUPERVISED AND SUPERVISED TRAINING FOR MULTILINGUAL ASR
3422KARASINGER: SCORE-FREE SINGING VOICE SYNTHESIS WITH VQ-VAE USING MEL-SPECTROGRAMS
2446K-Converter: An unsupervised Singing Voice Conversion System
1278KERNEL ESTIMATION NETWORK FOR BLIND SUPER-RESOLUTION
1841KEY-SPARSE TRANSFORMER FOR MULTIMODAL SPEECH EMOTION RECOGNITION
9308Kinship Verification Based on Cross-Generation Feature Interaction Learning
2534Knowledge Augmented BERT Mutual Network in Multi-turn Spoken Dialogues
7593KNOWLEDGE DISTILLATION FOR NEURAL TRANSDUCERS FROM LARGE SELF-SUPERVISED PRE-TRAINED MODELS
1850KNOWLEDGE DISTILLATION FROM LANGUAGE MODEL TO ACOUSTIC MODEL: A HIERARCHICAL MULTI-TASK LEARNING APPROACH
2719KNOWLEDGE TRANSFER FROM LARGE-SCALE PRETRAINED LANGUAGE MODELS TO END-TO-END SPEECH RECOGNIZERS
9287KRYLOV-LEVENBERG-MARQUARDT ALGORITHM FOR STRUCTURED TUCKER TENSOR DECOMPOSITIONS
5206LABEL PROPAGATION ACROSS GRAPHS: NODE CLASSIFICATION USING GRAPH NEURAL TANGENT KERNELS
4203LABEL-AWARE RANKED LOSS FOR ROBUST PEOPLE COUNTING USING AUTOMOTIVE IN-CABIN RADAR
2269LABEL-OCCURRENCE-BALANCED MIXUP FOR LONG-TAILED RECOGNITION
1773LANGUAGE ADAPTIVE CROSS-LINGUAL SPEECH REPRESENTATION LEARNING WITH SPARSE SHARING SUB-NETWORKS
1909LARGE-SCALE ASR DOMAIN ADAPTATION USING SELF- AND SEMI-SUPERVISED LEARNING
1587LARGE-SCALE INDEPENDENT COMPONENT ANALYSIS BY SPEEDING UP LIE GROUP TECHNIQUES
4276LARGE-SCALE SELF-SUPERVISED SPEECH REPRESENTATION LEARNING FOR AUTOMATIC SPEAKER VERIFICATION
5526LATENT SPACE SLICING FOR ENHANCED ENTROPY MODELING IN LEARNING-BASED POINT CLOUD GEOMETRY COMPRESSION
2973LATTENTION: LATTICE-ATTENTION IN ASR RESCORING
2325LATTICE RESCORING BASED ON LARGE ENSEMBLE OF COMPLEMENTARY NEURAL LANGUAGE MODELS
8432LATTICEBART: LATTICE-TO-LATTICE PRE-TRAINING FOR SPEECH RECOGNITION
3172LDNET: UNIFIED LISTENER DEPENDENT MODELING IN MOS PREDICTION FOR SYNTHETIC SPEECH
4099Learnable Hypergraph Laplacian for Hypergraph Learning
4479LEARNABLE NONLINEAR COMPRESSION FOR ROBUST SPEAKER VERIFICATION
6265Learnable Wavelet Packet Transform for Data-Adapted Spectrograms
5008LEARNED ACOUSTIC RECONSTRUCTION USING SYNTHETIC APERTURE FOCUSING
3653LEARNING ACOUSTIC FRAME LABELING FOR PHONEME SEGMENTATION WITH REGULARIZED ATTENTION MECHANISM
2710LEARNING ADJUSTABLE IMAGE RESCALING WITH JOINT OPTIMIZATION OF PERCEPTION AND DISTORTION
4900LEARNING APPROACH FOR FAST APPROXIMATE MATRIX FACTORIZATIONS
5602Learning Common Dependency Structure for Unsupervised Cross-Domain NER
7049LEARNING CONTINUOUS REPRESENTATION OF AUDIO FOR ARBITRARY SCALE SUPER RESOLUTION
3850LEARNING CORRELATION FOR ONLINE MULTIPLE OBJECT TRACKING
3924LEARNING DECOUPLING FEATURES THROUGH ORTHOGONALITY REGULARIZATION
6015LEARNING DEEP PATHOLOGICAL FEATURES FOR WSI-LEVEL CERVICAL CANCER GRADING
3867LEARNING DOMAIN-INVARIANT TRANSFORMATION FOR SPEAKER VERIFICATION
2437Learning Expanding Graphs for Signal Interpolation
8610LEARNING FILTERBANKS FOR END-TO-END ACOUSTIC BEAMFORMING
3215LEARNING GAUSSIAN GRAPHICAL MODELS WITH DIFFERING PAIRWISE SAMPLE SIZES
5106LEARNING MONOCULAR 3D HUMAN POSE ESTIMATION WITH SKELETAL INTERPOLATION
3776LEARNING MONOCULAR MESH RECOVERY OF MULTIPLE BODY PARTS VIA SYNTHETICS
5230LEARNING MULTIPLE EXPLAINABLE AND GENERALIZABLE CUES FOR FACE ANTI-SPOOFING
4530LEARNING MUSIC AUDIO REPRESENTATIONS VIA WEAK LANGUAGE SUPERVISION
1401Learning Music Sequence Representation from Text Supervision
4359LEARNING SEMANTIC-ALIGNED FEATURE REPRESENTATION FOR TEXT-BASED PERSON SEARCH
4727LEARNING SOUND LOCALIZATION BETTER FROM SEMANTICALLY SIMILAR SAMPLES
4483LEARNING SPARSE GRAPHS WITH A CORE-PERIPHERY STRUCTURE
5459LEARNING STRUCTURED SPARSITY FOR TIME-FREQUENCY RECONSTRUCTION
4674LEARNING SUBJECT-INVARIANT REPRESENTATIONS FROM SPEECH-EVOKED EEG USING VARIATIONAL AUTOENCODERS
1948LEARNING TASK-SPECIFIC REPRESENTATION FOR VIDEO ANOMALY DETECTION WITH SPATIAL-TEMPORAL ATTENTION
2634LEARNING TO ENHANCE OR NOT: NEURAL NETWORK-BASED SWITCHING OF ENHANCED AND OBSERVED SIGNALS FOR OVERLAPPING SPEECH RECOGNITION
2801LEARNING TO FUSE HETEROGENEOUS FEATURES FOR LOW-LIGHT IMAGE ENHANCEMENT
5629LEARNING TO INTEGRATE VISION DATA INTO ROAD NETWORK DATA
2834LEARNING TO PREDICT SPEECH IN SILENT VIDEOS VIA AUDIOVISUAL ANALOGY
1835LEARNING TO SAMPLE FOR SPARSE SIGNALS
9298LEARNING YOUR HEART ACTIONS FROM PULSE: ECG WAVEFORM RECONSTRUCTION FROM PPG
1615Learning-aided initialization for variational Bayesian DOA estimation
2798LEARNING-BASED PERSONAL SPEECH ENHANCEMENT FOR TELECONFERENCING BY EXPLOITING SPATIAL-SPECTRAL FEATURES
7953LEARNING-BASED RESOURCE ALLOCATION WITH DYNAMIC DATA RATE CONSTRAINTS
4632LEARNINGS FROM FEDERATED LEARNING IN THE REAL WORLD
9119LERPS: LIGHTING ESTIMATION AND RELIGHTING FOR PHOTOMETRIC STEREO
1336LETR: A LIGHTWEIGHT AND EFFICIENT TRANSFORMER FOR KEYWORD SPOTTING
1943LEVERAGING BILINEAR ATTENTION TO IMPROVE SPOKEN LANGUAGE UNDERSTANDING
3194LEVERAGING LOCAL TEMPORAL INFORMATION FOR MULTIMODAL SCENE CLASSIFICATION
3437Leveraging Sparse Coding for EEG Based Emotion Recognition in Shooting
3032LIGHTPOSE: A LIGHTWEIGHT AND EFFICIENT MODEL WITH TRANSFORMER FOR HUMAN POSE ESTIMATION
4683Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition
4545LINEAR-TIME SAMPLING ON SIGNED GRAPHS VIA GERSHGORIN DISC PERFECT ALIGNMENT
1422LIPREADING MODEL BASED ON WHOLE-PART COLLABORATIVE LEARNING
5180LISTEN, KNOW AND SPELL: KNOWLEDGE-INFUSED SUBWORD MODELING FOR IMPROVING ASR PERFORMANCE OF OOV NAMED ENTITIES
3420LiteHAR: LIGHTWEIGHT HUMAN ACTIVITY RECOGNITION FROM WIFI SIGNALS WITH RANDOM CONVOLUTION KERNELS
3024LMS AND NLMS ALGORITHMS FOR THE IDENTIFICATION OF IMPULSE RESPONSES WITH INTRINSIC SYMMETRIC OR ANTISYMMETRIC PROPERTIES
2270LOCAL AND GLOBAL ALIGNMENTS FOR GENERALIZABLE SENSOR-BASED HUMAN ACTIVITY RECOGNITION
2126LOCAL CONTEXT INTERACTION-AWARE GLYPH-VECTORS FOR CHINESE SEQUENCE TAGGING
4076LOCAL INFORMATION MODELING WITH SELF-ATTENTION FOR SPEAKER VERIFICATION
2681LOCAL-GLOBAL FEATURE AGGREGATION FOR LIGHT FIELD IMAGE SUPER-RESOLUTION
3186LOCALIZATION BASED SEQUENTIAL GROUPING FOR CONTINUOUS SPEECH SEPARATION
9147LOCALIZING MORE SOURCES THAN SENSORS IN PRESENCE OF COHERENT SOURCES
5387Locate This, Not That: Class-Conditioned Sound Event DOA Estimation
2626LOCATION-BASED TRAINING FOR MULTI-CHANNEL TALKER-INDEPENDENT SPEAKER SEPARATION
3216LOCUNET: FAST URBAN POSITIONING USING RADIO MAPS AND DEEP LEARNING
1985LOOK, LISTEN AND PAY MORE ATTENTION: FUSING MULTI-MODAL INFORMATION FOR VIDEO VIOLENCE DETECTION
3120LOW COMPLEX ACCURATE MULTI-SOURCE RTF ESTIMATION
8901LOW COMPLEXITY EQUALIZATION FOR AFDM IN DOUBLY DISPERSIVE CHANNELS
4817Low Precision Local Learning for Hardware-friendly Neuromorphic Visual Recognition
4323LOW RESOURCES ONLINE SINGLE-MICROPHONE SPEECH ENHANCEMENT WITH HARMONIC EMPHASIS
2463LOW-COMPLEXITY ATTENTION MODELLING VIA GRAPH TENSOR NETWORKS
2116LOW-COMPLEXITY MULTI-MODEL CNN IN-LOOP FILTER FOR AVS3
4306LOW-LATENCY HUMAN-COMPUTER AUDITORY INTERFACE BASED ON REAL-TIME VISION ANALYSIS
1592LOW-LIGHT IMAGE ENHANCEMENT VIA FEATURE RESTORATION
3141LOW-RANK PHASE RETRIEVAL WITH STRUCTURED TENSOR MODELS
4713LPC AUGMENT: AN LPC-BASED ASR DATA AUGMENTATION ALGORITHM FOR LOW AND ZERO-RESOURCE CHILDREN’S DIALECTS
4055LRPD: LARGE REPLAY PARALLEL DATASET
3090L-SpEx: Localized Target Speaker Extraction
3526M2MeT: THE ICASSP 2022 MULTI-CHANNEL MULTI-PARTY MEETING TRANSCRIPTION CHALLENGE
9242MACRO: Multi-Attention Convolutional Recurrent Model for Subject-Independent ERP Detection
5662MAG+: AN EXTENDED MULTIMODAL ADAPTATION GATE FOR MULTIMODAL SENTIMENT ANALYSIS
4774MAGIC DUST FOR CROSS-LINGUAL ADAPTATION OF MONOLINGUAL WAV2VEC-2.0
1548MAKD: MULTIPLE AUXILIARY KNOWLEDGE DISTILLATION
5202MAKING THE UNKNOWN MORE CERTAIN: A STACKED ENSEMBLE CLASSIFIER FOR OPEN GESTURE RECOGNITION WITH A SOCIAL ROBOT
3787MA-NET: MULTI-SCALE ATTENTION-AWARE NETWORK FOR OPTICAL FLOW ESTIMATION
3099MANIFOLD LEARNING-SUPPORTED ESTIMATION OF RELATIVE TRANSFER FUNCTIONS FOR SPATIAL FILTERING
1063MANNER: MULTI-VIEW ATTENTION NETWORK FOR NOISE ERASURE
5279MANNET: A LARGE-SCALE MANIPULATED IMAGE DETECTION DATASET AND BASELINE EVALUATIONS
5657MAP: MULTISPECTRAL ADVERSARIAL PATCH TO ATTACK PERSON DETECTION
1709MASK-BASED ATTENTION PARALLEL NETWORK FOR IN-THE-WILD FACIAL EXPRESSION RECOGNITION
8427MASKED ACOUSTIC UNIT FOR MISPRONUNCIATION DETECTION AND CORRECTION
5061MASSIVE UNSOURCED RANDOM ACCESS BASED ON BILINEAR VECTOR APPROXIMATE MESSAGE PASSING
8995MASSIVELY MULTILINGUAL ASR: A LIFELONG LEARNING SOLUTION
9281MATCHED MANIFOLD DETECTION FOR GROUP-INVARIANT REGISTRATION AND CLASSIFICATION OF IMAGES
1595Matching Point Sets with Quantum Circuit Learning
5653MATERIAL-GUIDED SIAMESE FUSION NETWORK FOR HYPERSPECTRAL OBJECT TRACKING
4811MATRIX DECOMPOSITION ON GRAPHS: A SIMPLIFIED FUNCTIONAL VIEW
5164MAXIMIZING AUDIO EVENT DETECTION MODEL PERFORMANCE ON SMALL DATASETS THROUGH KNOWLEDGE TRANSFER, DATA AUGMENTATION, AND PRETRAINING: AN ABLATION STUDY
2637MAXIMUM BATCH FROBENIUS NORM FOR MULTI-DOMAIN TEXT CLASSIFICATION
9251MAXIMUM LIKELIHOOD SENSOR ARRAY CALIBRATION USING NON-APPROXIMATE HESSIAN MATRIX
9090MBA-RainGAN: A Multi-branch Attention Generative Adversarial Network for Mixture of Rain Removal
1200MBNET: A MULTI-RESOLUTION BRANCH NETWORK FOR SEMANTIC SEGMENTATION OF ULTRA-HIGH RESOLUTION IMAGES
5468MEJIGCLU: MORE EFFECTIVE JIGSAW CLUSTERING FOR UNSUPERVISED VISUAL REPRESENTATION LEARNING
4101MELONS: GENERATING MELODY WITH LONG-TERM STRUCTURE USING TRANSFORMERS AND STRUCTURE GRAPH
3251MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition
2303Memory in Echo State Networks and the Controllability Matrix rank
4502MEMORY-BASED MESSAGE PASSING: DECOUPLING THE MESSAGE FOR PROPAGATION FROM DISCRIMINATION
1651Message Passing-based Cooperative Localization with embedded Particle Flow
1454META TALK: LEARNING TO DATA-EFFICIENTLY GENERATE AUDIO-DRIVEN LIP-SYNCHRONIZED TALKING FACE WITH HIGH DEFINITION
3783MetricBERT: Text Representation Learning via Self-Supervised Triplet Training
2867METRICGAN-U: UNSUPERVISED SPEECH ENHANCEMENT/ DEREVERBERATION BASED ONLY ON NOISY/ REVERBERATED SPEECH
3973MFA: TDNN WITH MULTI-SCALE FREQUENCY-CHANNEL ATTENTION FOR TEXT-INDEPENDENT SPEAKER VERIFICATION WITH SHORT UTTERANCES
8913MIMO Detection by Variational Posterior Inference
1627MINIMIZING RESIDUALS FOR NATIVE-NONNATIVE VOICE CONVERSION IN A SPARSE, ANCHOR-BASED REPRESENTATION OF SPEECH
8996MINIMUM WORD ERROR TRAINING FOR NON-AUTOREGRESSIVE TRANSFORMER-BASED CODE-SWITCHING ASR
4174MINING HARD SAMPLES LOCALLY AND GLOBALLY FOR IMPROVED SPEECH SEPARATION
2557MISMATCHED SUPERVISED LEARNING
1952Mitigating Closed-model Adversarial Examples with Bayesian Neural Modeling for Enhanced End-to-End Speech Recognition
1904MIXED IN TIME AND MODALITY: CURSE OR BLESSING? CROSS-INSTANCE DATA AUGMENTATION FOR WEAKLY SUPERVISED MULTIMODAL TEMPORAL FUSION
5465MIXED KNOWLEDGE RELATION TRANSFORMER FOR IMAGE CAPTIONING
2919MIXED PRECISION DNN QUANTIZATION FOR OVERLAPPED SPEECH SEPARATION AND RECOGNITION
5270MIXED TRANSFORMER U-NET FOR MEDICAL IMAGE SEGMENTATION
4242MIXER-TTS: NON-AUTOREGRESSIVE, FAST AND COMPACT TEXT-TO-SPEECH MODEL CONDITIONED ON LANGUAGE MODEL EMBEDDINGS
4887MIXTURE MODEL AUTO-ENCODERS: DEEP CLUSTERING THROUGH DICTIONARY LEARNING
3994MLP-SVNET: A MULTI-LAYER PERCEPTRONS BASED NETWORK FOR SPEAKER VERIFICATION
4078MM-DFN: Multimodal Dynamic Fusion Network For Emotion Recognition in Conversations
4506MMLATCH: BOTTOM-UP TOP-DOWN FUSION FOR MULTIMODAL SENTIMENT ANALYSIS
8832MODEL SELECTION VIA MISSPECIFIED CRAMER-RAO BOUND MINIMIZATION
1229MODEL-BASED APPROACH FOR MEASURING THE FAIRNESS IN ASR
2978MODEL-BASED ONLINE LEARNING FOR RESOURCE SHARING IN JOINT RADAR-COMMUNICATION SYSTEMS
4684MODEL-BASED RECONSTRUCTION FOR COLLIMATED BEAM ULTRASOUND SYSTEMS
3454MODELING BEATS AND DOWNBEATS WITH A TIME-FREQUENCY TRANSFORMER
6422MODELING HUMAN MEMORY IN MULTI-OBJECT TRACKING WITH TRANSFORMERS
2539Modeling Intention, Emotion and External World in Dialogue Systems
3104MODELING OF PRE-TRAINED NEURAL NETWORK EMBEDDINGS LEARNED FROM RAW WAVEFORM FOR COVID-19 INFECTION DETECTION
3335Modeling the Detection Capability of High-Speed Spiking Cameras
1397MODERNN: TOWARDS FINE-GRAINED MOTION DETAILS FOR SPATIOTEMPORAL PREDICTIVE LEARNING
9154MODULO EVENT-DRIVEN SAMPLING: SYSTEM IDENTIFICATION AND HARDWARE EXPERIMENTS
1084Monocular Vehicle 3D Bounding Box Estimation Using Homography and Geometry in Traffic Scene
4799Monotonic Generalized Nash Games with Application to the Management of Energy-Aware Aloha Networks
3977MOS Predictor for Synthetic Speech with I-vector Inputs
2079MOTIF-TOPOLOGY AND REWARD-LEARNING IMPROVED SPIKING NEURAL NETWORK FOR EFFICIENT MULTI-SENSORY INTEGRATION
9254Moving Source Localization in Passive Sensor Network With Location Uncertainty
4693MRI RECOVERY WITH A SELF-CALIBRATED DENOISER
1578MSDTRON: A HIGH-CAPABILITY MULTI-SPEAKER SPEECH SYNTHESIS SYSTEM FOR DIVERSE DATA USING CHARACTERISTIC INFORMATION
5989MS-ROCANET: MULTI-SCALE RESIDUAL ORTHOGONAL-CHANNEL ATTENTION NETWORK FOR SCENE TEXT DETECTION
8471MTAF: SHOPPING GUIDE MICRO-VIDEOS POPULARITY PREDICTION USING MULTIMODAL AND TEMPORAL ATTENTION FUSION APPROACH
4063MULTI-ACCDOA: LOCALIZING AND DETECTING OVERLAPPING SOUNDS FROM THE SAME CLASS WITH AUXILIARY DUPLICATING PERMUTATION INVARIANT TRAINING
3823MULTIBAND IMAGE FUSION WITH CONTROLLABLE ERROR GUARANTEES
1440MULTI-CHANNEL ATTENTIVE GRAPH CONVOLUTIONAL NETWORK WITH SENTIMENT FUSION FOR MULTIMODAL SENTIMENT ANALYSIS
1718MULTI-CHANNEL END-TO-END NEURAL DIARIZATION WITH DISTRIBUTED MICROPHONES
5309MULTI-CHANNEL MULTI-SPEAKER ASR USING 3D SPATIAL FEATURE
2406MULTI-CHANNEL NARROW-BAND DEEP SPEECH SEPARATION WITH FULL-BAND PERMUTATION INVARIANT TRAINING
7414MULTICHANNEL NOISE REDUCTION USING DILATED MULTICHANNEL U-NET AND PRE-TRAINED SINGLE-CHANNEL NETWORK
1832MULTI-CHANNEL SPEAKER DIARIZATION USING SPATIAL FEATURES FOR MEETINGS
4593MULTI-CHANNEL SPEAKER VERIFICATION WITH CONV-TASNET BASED BEAMFORMER
4723MULTI-CHANNEL SPEECH DENOISING FOR MACHINE EARS
5674Multichannel Speech Enhancement without Beamforming
1684MULTI-DOMAIN UNPAIRED ULTRASOUND IMAGE ARTIFACT REMOVAL USING A SINGLE CONVOLUTIONAL NEURAL NETWORK
5719MULTI-DOMAIN UNSUPERVISED IMAGE-TO-IMAGE TRANSLATION WITH APPEARANCE ADAPTIVE CONVOLUTION
4403MULTI-FEATURE INTEGRATION FOR SPEAKER EMBEDDING EXTRACTION
3852Multi-Focus Guided Semantic Aggregation for Video Object Detection
2392MULTI-FRAME FULL-RANK SPATIAL COVARIANCE ANALYSIS FOR UNDERDETERMINED BSS IN REVERBERANT ENVIRONMENTS
3972Multi-frame super-resolution with raw images via modified deformable convolution
5178Multi-Head ReLU Implicit Neural Representation Networks
2384MULTI-HIERARCHY PROXY STRUCTURE FOR DEEP METRIC LEARNING
2102MULTI-LEVEL CONTRASTIVE LEARNING FOR CROSS-LINGUAL ALIGNMENT
5353MULTI-LEVEL RELATION AWARE NETWORK FOR PERSON RE-IDENTIFICATION
3005MULTI-LEVEL SPATIAL-TEMPORAL ADAPTATION NETWORK FOR MOTOR IMAGERY CLASSIFICATION
3413MULTI-LINGUAL MULTI-TASK SPEECH EMOTION RECOGNITION USING WAV2VEC 2.0
4823MULTILINGUAL SECOND-PASS RESCORING FOR AUTOMATIC SPEECH RECOGNITION SYSTEMS
5014MULTILINGUAL TEXT-TO-SPEECH TRAINING USING CROSS LANGUAGE VOICE CONVERSION AND SELF-SUPERVISED LEARNING OF SPEECH REPRESENTATIONS
2091MULTI-MODAL ACOUSTIC-ARTICULATORY FEATURE FUSION FOR DYSARTHRIC SPEECH RECOGNITION
9321MULTIMODAL DATA FUSION IN HIGH-DIMENSIONAL HETEROGENEOUS DATASETS VIA GENERATIVE MODELS
3649MULTIMODAL DEPRESSION CLASSIFICATION USING ARTICULATORY COORDINATION FEATURES AND HIERARCHICAL ATTENTION BASED TEXT EMBEDDINGS
3721MULTI-MODAL EMOTION RECOGNITION WITH SELF-GUIDED MODALITY CALIBRATION
8858MULTIMODAL EMOTION RECOGNITION WITH SURGICAL AND FABRIC MASKS
8835MULTIMODAL EVALUATION METHOD FOR SOUND EVENT DETECTION
5999MULTIMODAL GRAPH SIGNAL DENOISING VIA TWOFOLD GRAPH SMOOTHNESS REGULARIZATION WITH DEEP ALGORITHM UNROLLING
6085MULTI-MODAL LEARNING WITH TEXT MERGING FOR TEXTVQA
4449MULTI-MODAL PRE-TRAINING FOR AUTOMATED SPEECH RECOGNITION
5352MULTI-MODAL RECURRENT FUSION FOR INDOOR LOCALIZATION
4952Multimodal Sentiment Analysis on Unaligned Sequences via Holographic Embedding
5937Multimodal Transformer With Learnable Frontend and Self Attention for Emotion Recognition
4941MULTIPLE INSTANCE LEARNING WITH TASK-SPECIFIC MULTI-LEVEL FEATURES FOR WEAKLY ANNOTATED HISTOPATHOLOGICAL IMAGE CLASSIFICATION
1565MULTIPLE KERNEL K-MEANS CLUSTERING WITH SIMULTANEOUS SPECTRAL ROTATION
4566MULTIPLE OFFSETS MULTILATERATION: A NEW PARADIGM FOR SENSOR NETWORK CALIBRATION WITH UNSYNCHRONIZED REFERENCE NODES
2812MULTIPLE PATCH-AWARE NETWORK FOR FASTER REAL-WORLD IMAGE DEHAZING
2907MULTIPLE TEMPORAL CONTEXT EMBEDDING NETWORKS FOR UNSUPERVISED TIME SERIES ANOMALY DETECTION
4890Multiplication-Avoiding Variant of Power Iteration with Applications
2762MULTI-POSE VIRTUAL TRY-ON VIA SELF-ADAPTIVE FEATURE FILTERING
6973MULTI-QUERY MULTI-HEAD ATTENTION POOLING AND INTER-TOPK PENALTY FOR SPEAKER VERIFICATION
4548MULTI-RELATION MESSAGE PASSING FOR MULTI-LABEL TEXT CLASSIFICATION
2442MULTI-ROLE EVENT ARGUMENT EXTRACTION AS MACHINE READING COMPREHENSION WITH ARGUMENT MATCH OPTIMIZATION
2108MULTI-SAMPLE SUBBAND WAVERNN VIA MULTIVARIATE GAUSSIAN
3426Multiscale attention aggregation network for 2D vessel segmentation
3920MULTISCALE CROWD COUNTING AND LOCALIZATION BY MULTITASK POINT SUPERVISION
2421MULTI-SCALE REINFORCEMENT LEARNING STRATEGY FOR OBJECT DETECTION
3759MULTI-SCALE SPEAKER EMBEDDING-BASED GRAPH ATTENTION NETWORKS FOR SPEAKER DIARISATION
9286Multi-Sensor Network Information for Linear-Gaussian Multi-Target Tracking Systems
1511MULTI-SPEAKER PITCH TRACKING VIA EMBODIED SELF-SUPERVISED LEARNING
1691MULTI-STAGE GRAPH REPRESENTATION LEARNING FOR DIALOGUE-LEVEL SPEECH EMOTION RECOGNITION
5745MULTISTREAM NEURAL ARCHITECTURES FOR CUED SPEECH RECOGNITION USING A PRE-TRAINED VISUAL FEATURE EXTRACTOR AND CONSTRAINED CTC DECODING
8061MULTISV: DATASET FOR FAR-FIELD MULTI-CHANNEL SPEAKER VERIFICATION
4388MULTI-TASK FMRI DATA FUSION USING IVA AND PARAFAC2
4648MULTI-TASK GAUSSIAN PROCESS REGRESSION FOR THE DETECTION OF SLEEP CYCLES IN PREMATURE INFANTS
1760Multitask Gaussian Process with Hierarchical Latent Interactions
8778MULTI-TASK LEARNING IMPROVES SYNTHETIC SPEECH DETECTION
1183MULTI-TASK LEARNING IMPROVES THE BRAIN STROKE LESION SEGMENTATION
2335MULTI-TASK RNN-T WITH SEMANTIC DECODER FOR STREAMABLE SPOKEN LANGUAGE UNDERSTANDING
1994MULTITASK SPARSE NEURAL NETWORK FOR HYPERSPECTRAL IMAGE DENOISING
2161MULTI-TASK VOICE ACTIVATED FRAMEWORK USING SELF-SUPERVISED LEARNING
8709MULTI-TURN INCOMPLETE UTTERANCE RESTORATION AS OBJECT DETECTION
2596MULTI-TURN RNN-T FOR STREAMING RECOGNITION OF MULTI-PARTY SPEECH
4332MULTIVARIATE MULTISCALE COSINE SIMILARITY ENTROPY
8466MULTI-VIEW AND MULTI-MODAL EVENT DETECTION UTILIZING TRANSFORMER-BASED MULTI-SENSOR FUSION
1682MULTI-VIEW DATA REPRESENTATION VIA DEEP AUTOENCODER-LIKE NONNEGATIVE MATRIX FACTORIZATION
2837MULTI-VIEW INFORMATION BOTTLENECK WITHOUT VARIATIONAL APPROXIMATION
5980MULTI-VIEW LEARNING BASED ON NON-REDUNDANT FUSION FOR ICU PATIENT MORTALITY PREDICTION
1483MULTIVIEW LONG-SHORT SPATIAL CONTRASTIVE LEARNING FOR 3D MEDICAL IMAGE ANALYSIS
4352MULTI-VIEW SELF-ATTENTION BASED TRANSFORMER FOR SPEAKER RECOGNITION
5139MUSIC ENHANCEMENT VIA IMAGE TRANSLATION AND VOCODING
9011Music Identification Using brain responses to Initial Snippets
2168MUSIC PHRASE INPAINTING USING LONG-TERM REPRESENTATION AND CONTRASTIVE LOSS
4304MUSIC SOURCE SEPARATION WITH DEEP EQUILIBRIUM MODELS
1352MUSICYOLO: A SIGHT-SINGING ONSET/OFFSET DETECTION FRAMEWORK BASED ON OBJECT DETECTION INSTEAD OF SPECTRUM FRAMES
4328Natural-looking Adversarial Examples from Freehand Sketches
1189NAVIGATING AUDIO-VISUAL EVENT DETECTION ACROSS MISMATCHED MODALITIES
3296NEAREST SUBSPACE SEARCH IN THE SIGNED CUMULATIVE DISTRIBUTION TRANSFORM SPACE FOR 1D SIGNAL CLASSIFICATION
9288Near-field Tracking with Large Antenna Arrays: Fundamental Limits and Practical Algorithms
1893NEARTRACKER: ACOUSTIC 2-D TARGET TRACKING WITH NEARBY REFLECTOR IN SISO SYSTEM
1675NEIGHBOR-AUGMENTED TRANSFORMER-BASED EMBEDDING FOR RETRIEVAL
2194NEUFA: NEURAL NETWORK BASED END-TO-END FORCED ALIGNMENT WITH BIDIRECTIONAL ATTENTION MECHANISM
3179Neural Architecture Search for Speech Emotion Recognition
4569NEURAL AUDIO-TO-SCORE MUSIC TRANSCRIPTION FOR UNCONSTRAINED POLYPHONY USING COMPACT OUTPUT REPRESENTATIONS
4551NEURAL CASCADE ARCHITECTURE FOR JOINT ACOUSTIC ECHO AND NOISE SUPPRESSION
4758Neural Collapse in Deep Homogeneous Classifiers and the role of Weight Decay
9302NEURAL FULL-RANK SPATIAL COVARIANCE ANALYSIS FOR BLIND SOURCE SEPARATION
3826NEURAL GRAPHEME-TO-PHONEME CONVERSION WITH PRE-TRAINED GRAPHEME MODELS
1031NEURAL HMMS ARE ALL YOU NEED (FOR HIGH-QUALITY ATTENTION-FREE TTS)
5748NEURAL NETWORK-BASED COMPRESSION FRAMEWORK FOR DOA ESTIMATION EXPLOITING DISTRIBUTED ARRAY
4600Neural Speech Synthesis on a Shoestring: Improving the Efficiency of LPCNet
5184NEURAL-FST CLASS LANGUAGE MODEL FOR END-TO-END SPEECH RECOGNITION
1728NEW IMPROVED CRITERION FOR MODEL SELECTION IN SPARSE HIGH-DIMENSIONAL LINEAR REGRESSION MODELS
5690NEWS RECOMMENDATION VIA MULTI-INTEREST NEWS SEQUENCE MODELLING
8951NEX+: NOVEL VIEW SYNTHESIS WITH NEURAL REGULARISATION OVER MULTI-PLANE IMAGES
8773NFT-K: NON-FUNGIBLE TANGENT KERNELS
1745NN3A: NEURAL NETWORK SUPPORTED ACOUSTIC ECHO CANCELLATION, NOISE SUPPRESSION AND AUTOMATIC GAIN CONTROL FOR REAL-TIME COMMUNICATIONS
5059nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-shot Multi-speaker Text-to-Speech
8841No More Than 6ft Apart: Robust K-Means via Radius Upper Bounds
1882NODE SLICING BROAD LEARNING SYSTEM FOR TEXT CLASSIFICATION
2482NODE-SCREENING TESTS FOR THE L0-PENALIZED LEAST-SQUARES PROBLEM
1234NOISE SUPPRESSION FOR IMPROVED FEW-SHOT LEARNING
4340NOISE-ROBUST SPEECH RECOGNITION WITH 10 MINUTES UNPARALLELED IN-DOMAIN DATA
6103NON-AUTOREGRESSIVE ASR WITH SELF-CONDITIONED FOLDED ENCODERS
5005NON-AUTOREGRESSIVE END-TO-END AUTOMATIC SPEECH RECOGNITION INCORPORATING DOWNSTREAM NATURAL LANGUAGE PROCESSING
1114NON-AUTOREGRESSIVE TRANSFORMER WITH UNIFIED BIDIRECTIONAL DECODER FOR AUTOMATIC SPEECH RECOGNITION
9235NON-BAYESIAN ESTIMATION FRAMEWORK FOR SIGNAL RECOVERY ON GRAPHS
8639NON-INVASIVE BLOOD PRESSURE MONITORING WITH MULTI-MODAL IN-EAR SENSING
9000Nonlinear signal decomposition based on block sparse approximation
2839NON-RIGID TRANSFORMATION BASED ADVERSARIAL ATTACK AGAINST 3D OBJECT TRACKING
4472NONVERBAL SOUND DETECTION FOR DISORDERED SPEECH
2350NO-REFERENCE QUALITY ASSESSMENT OF VARIABLE FRAME-RATE VIDEOS USING TEMPORAL BANDPASS STATISTICS
4780NOT ALL FEATURES ARE EQUAL: SELECTION OF ROBUST FEATURES FOR SPEECH EMOTION RECOGNITION IN NOISY ENVIRONMENTS
5707Novel Class Discovery: A Dependency Approach
4557NOVEL INSTANCE MINING WITH PSEUDO-MARGIN EVALUATION FOR FEW-SHOT OBJECT DETECTION
2020NVC-NET: END-TO-END ADVERSARIAL VOICE CONVERSION
4826OBJECT DETECTION AND TRACKING IN ULTRASOUND SCANS USING AN OPTICAL FLOW AND SEMANTIC SEGMENTATION FRAMEWORK BASED ON CONVOLUTIONAL NEURAL NETWORKS
3792OBJECT-ORIENTED BACKDOOR ATTACK AGAINST IMAGE CAPTIONING
3515OCCLUDED PERSON RE-IDENTIFICATION VIA RELATIONAL ADAPTIVE FEATURE CORRECTION LEARNING
2981OFF-THE-GRID COVARIANCE-BASED SUPER-RESOLUTION FLUCTUATION MICROSCOPY
2925OFF-THE-SHELF DEEP INTEGRATION FOR RESIDUAL-ECHO SUPPRESSION
5247OMNI-SPARSITY DNN: FAST SPARSITY OPTIMIZATION FOR ON-DEVICE STREAMING E2E ASR VIA SUPERNET
3234On Adversarial Robustness of Large-scale Audio Visual Learning
4875ON CONTINUOUS-DOMAIN INVERSE PROBLEMS WITH SPARSE SUPERPOSITIONS OF DECAYING SINUSOIDS AS SOLUTIONS
9314ON DATA AUGMENTATION FOR GAN TRAINING
3897ON FEDERATED LEARNING WITH ENERGY HARVESTING CLIENTS
2772ON IDENTIFIABLE POLYTOPE CHARACTERIZATION FOR POLYTOPIC MATRIX FACTORIZATION
4436On Language Model Integration for RNN Transducer based Speech Recognition
3636ON LOSS FUNCTIONS AND EVALUATION METRICS FOR MUSIC SOURCE SEPARATION
3720ON MINI-BATCH TRAINING WITH VARYING LENGTH TIME SERIES
2452ON SPECTRAL AND TEMPORAL SPARSIFICATION OF SPEECH SIGNALS FOR THE IMPROVEMENT OF SPEECH PERCEPTION IN CI LISTENERS
9256ON STABILITY AND CONVERGENCE OF DISTRIBUTED FILTERS
8839On Submodular Set Cover Problems For Near-Optimal Online Kernel Basis Selection
4381ON SYNCHRONIZATION OF WIRELESS ACOUSTIC SENSOR NETWORKS IN THE PRESENCE OF TIME-VARYING SAMPLING RATE OFFSETS AND SPEAKER CHANGES
2292ON THE ACQUISITION OF STATIONARY SIGNALS USING UNIFORM ADCS
4491ON THE CONVERGENCE OF ADAM-TYPE ALGORITHMS FOR SOLVING STRUCTURED SINGLE NODE AND DECENTRALIZED MIN-MAX SADDLE POINT GAMES
4650On the Effectiveness of Active Learning by Uncertainty Sampling in Classification of High-Dimensional Gaussian Mixture Data
2237On the false alarm probability of the Normalized Matched Filter for off-grid target detection
6132ON THE IMPACT OF NORMALIZATION STRATEGIES IN UNSUPERVISED ADVERSARIAL DOMAIN ADAPTATION FOR ACOUSTIC SCENE CLASSIFICATION
3902ON THE IMPORTANCE OF DIFFERENT FREQUENCY BINS FOR SPEAKER VERIFICATION
4652ON THE INTERPLAY BETWEEN SPARSITY, NATURALNESS, INTELLIGIBILITY, AND PROSODY IN SPEECH SYNTHESIS
1122ON THE OBSERVABILITY IN VISUAL SLAM NETWORKS
1713ON THE POTENTIAL OF SPATIALLY-SPREAD ORTHOGONAL TIME FREQUENCY SPACE MODULATION FOR ISAC TRANSMISSIONS
4762ON THE PREDICTION OF THE FREQUENCY RESPONSE OF A WOODEN PLATE FROM ITS MECHANICAL PARAMETERS
1311ON THE RELAXATION OF ORTHOGONAL TENSOR RANK AND ITS NONCONVEX RIEMANNIAN OPTIMIZATION FOR TENSOR COMPLETION
9258ON THE SIZE AND REDUNDANCY OF THE FOURTH-ORDER DIFFERENCE CO-ARRAY
5208ON THE STABILITY OF LOW PASS GRAPH FILTER WITH A LARGE NUMBER OF EDGE REWIRES
3353ON THE USE OF COMPONENT STRUCTURAL CHARACTERISTICS FOR VOXEL SEGMENTATION IN SEMICON 3D IMAGES
2808ON THE USE OF GEODESIC TRIANGLES BETWEEN GAUSSIAN DISTRIBUTIONS FOR CLASSIFICATION PROBLEMS
1522ONE MODEL TO ENHANCE THEM ALL: ARRAY GEOMETRY AGNOSTIC MULTI-CHANNEL PERSONALIZED SPEECH ENHANCEMENT
3314ONE TTS ALIGNMENT TO RULE THEM ALL
9280ONE-CLASS LEARNING TOWARDS SYNTHETIC VOICE SPOOFING DETECTION
3954ONE-SHOT VOICE CONVERSION FOR STYLE TRANSFER BASED ON SPEAKER ADAPTATION
3969ONLINE CONTINUAL LEARNING USING ENHANCED RANDOM VECTOR FUNCTIONAL LINK NETWORKS
9030ONLINE DETECTION OF SCALP-INVISIBLE MESIAL-TEMPORAL BRAIN INTERICTAL EPILEPTIFORM DISCHARGES FROM EEG
2948ONLINE ECG BIOMETRICS VIA HADAMARD CODE
4725ONLINE LEARNING FOR LATENT YULE-SIMON PROCESSES
3457Online Learning with Probabilistic Feedback
9311ONLINE TRAINING OF STEREO SELF-CALIBRATION USING MONOCULAR DEPTH ESTIMATION
3143OPENFEAT: IMPROVING SPEAKER IDENTIFICATION BY OPEN-SET FEW-SHOT EMBEDDING ADAPTATION WITH TRANSFORMER
8813Operator Formulation for Linear Transformations and Signal Estimation in the Joint Spatial-Slepian Domain
2223OPTE: ONLINE PER-TITLE ENCODING FOR LIVE VIDEO STREAMING
1704OPTIMAL COMBINATION POLICIES FOR ADAPTIVE SOCIAL LEARNING
2730OPTIMAL QOS-AWARE NETWORK SLICING FOR SERVICE-ORIENTED NETWORKS WITH FLEXIBLE ROUTING
3176OPTIMAL RESOURCE ALLOCATION AND BEAMFORMING FOR TWO-USER MISO WPCNs FOR A NON-LINEAR CIRCUIT-BASED EH MODEL
4896Optimization Guarantees for ISTA and ADMM based Unfolded Networks
8975Optimization of a Fixed Virtual Sensing Feedback ANC Controller for In-Ear Headphones with Multiple Loudspeakers
8164Optimization of compressive light field display in dual-guided learning
5422OPTIMIZE WAV2VEC2S ARCHITECTURE FOR SMALL TRAINING SET THROUGH ANALYZING ITS PRE-TRAINED MODELS ATTENTION PATTERN
4227OPTIMIZING ALIGNMENT OF SPEECH AND LANGUAGE LATENT SPACES FOR END-TO-END SPEECH RECOGNITION AND UNDERSTANDING
2786Optimizing Latent Space Directions For GAN-based Local Image Editing
2533OPTIMIZING THE CONSUMPTION OF SPIKING NEURAL NETWORKS WITH ACTIVITY REGULARIZATION
4736OPTM3SEC: OPTIMIZING MULTICAST IRS-AIDED MULTIANTENNA DFRC SECRECY CHANNEL WITH MULTIPLE EAVESDROPPERS
2605ORCA-PARTY: AN AUTOMATIC KILLER WHALE SOUND TYPE SEPARATION TOOLKIT USING DEEP LEARNING
3427ORTHOGONAL NONNEGATIVE MATRIX TRI-FACTORIZATION FOR COMMUNITY DETECTION IN MULTIPLEX NETWORKS
2521OT CLEANER: LABEL CORRECTION AS OPTIMAL TRANSPORT
5245OUT-OF-DISTRIBUTION AS A TARGET CLASS IN SEMI-SUPERVISED LEARNING
5847OVER-PARAMETERIZED NETWORK SOLVES PHASE RETRIEVAL EFFECTIVELY
4888OVER-THE-AIR PERSONALIZED FEDERATED LEARNING
1544PAIR-LEVEL SUPERVISED CONTRASTIVE LEARNING FOR NATURAL LANGUAGE INFERENCE
1867PAMA-TTS: PROGRESSION-AWARE MONOTONIC ATTENTION FOR STABLE SEQ2SEQ TTS WITH ACCURATE PHONEME DURATION CONTROL
4006PANCHROMATIC IMAGERY COPY-PASTE LOCALIZATION THROUGH DATA-DRIVEN SENSOR ATTRIBUTION
1612Parallel Composition of Weighted Finite-State Transducers
3027Parameter Estimation in Sparse Inverse Problems using Bernoulli-Gaussian Prior
1047PARAMETER-FREE STYLE PROJECTION FOR ARBITRARY IMAGE STYLE TRANSFER
1526PARAMETRIC MODELING OF HUMAN WRIST FOR BIOIMPEDANCE-BASED PHYSIOLOGICAL SENSING
1664PARAMETRIC MODELS FOR DOA TRAJECTORY LOCALIZATION
6324PARTIAL ARITHMETIC CONSENSUS BASED DISTRIBUTED INTENSITY PARTICLE FLOW SMC-PHD FILTER FOR MULTI-TARGET TRACKING
4753PARTIAL VARIABLE TRAINING FOR EFFICIENT ON-DEVICE FEDERATED LEARNING
2577PARTIALLY RELAXED ORTHOGONAL LEAST SQUARES WEIGHTED SUBSPACE FITTING DIRECTION-OF-ARRIVAL ESTIMATION
2655PART-OF-SPEECH MODELS COMPRESSION METHODS FOR ON-DEVICE GRAPHEME-TO-PHONEME CONVERSION
1256PAS-MEF: MULTI-EXPOSURE IMAGE FUSION BASED ON PRINCIPAL COMPONENT ANALYSIS, ADAPTIVE WELL-EXPOSEDNESS AND SALIENCY MAP
4987PASSTRANS: AN IMPROVED PASSWORD REUSE MODEL BASED ON TRANSFORMER
1956PATCH STEGANALYSIS: A SAMPLING BASED DEFENSE AGAINST ADVERSARIAL STEGANOGRAPHY
1872PATH SIGNATURES FOR NON-INTRUSIVE LOAD MONITORING
1585PDD-NET: A PRECISE DEFECT DETECTION NETWORK BASED ON POINT SET REPRESENTATION
2809PEAR: Photographic Embedding for Aesthetic Rating
5873PEER COLLABORATIVE LEARNING FOR POLYPHONIC SOUND EVENT DETECTION
9234PERCEPTUAL-SIMILARITY-AWARE DEEP SPEAKER REPRESENTATION LEARNING FOR MULTI-SPEAKER GENERATIVE MODELING
1971PERFECT RECONSTRUCTION OF CLASSES OF NON-BANDLIMITED SIGNALS FROM PROJECTIONS WITH UNKNOWN ANGLES
3257Performance Optimization for Wireless Semantic Communications over Energy Harvesting Networks
1530Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition
3304PERSONALIZED AUTOMATIC SPEECH RECOGNITION TRAINED ON SMALL DISORDERED SPEECH DATASETS
9085Personalized PageRank Graph Attention Networks
2673PERSONALIZED SPEECH ENHANCEMENT: NEW MODELS AND COMPREHENSIVE EVALUATION
3288PGTRNET: TWO-PHASE WEAKLY SUPERVISED OBJECT DETECTION WITH PSEUDO GROUND TRUTH REFINEMENT
3664PHASE CONTINUITY: LEARNING DERIVATIVES OF PHASE SPECTRUM FOR SPEECH ENHANCEMENT
5548Phase Control of Parametric Array Loudspeaker by Optimizing Sideband Weights
4216PHASE SHIFTED BEDROSIAN FILTERBANK: AN INTERPRETABLE AUDIO FRONT-END FOR TIME-DOMAIN AUDIO SOURCE SEPARATION
4708Phase-Only Reconfigurable Sparse Array Beamforming using Deep Learning
2723PHONE-INFORMED REFINEMENT OF SYNTHESIZED MEL SPECTROGRAM FOR DATA AUGMENTATION IN SPEECH RECOGNITION
9292PHONEME LEVEL LYRICS ALIGNMENT AND TEXT-INFORMED SINGING VOICE SEPARATION
3723PHONEME MISPRONUNCIATION DETECTION BY JOINTLY LEARNING TO ALIGN
3232Phone-to-audio alignment without text: A Semi-supervised Approach
2007PHONOLOGY RECOGNITION IN AMERICAN SIGN LANGUAGE
5307PHONOTACTIC LANGUAGE RECOGNITION USING A UNIVERSAL PHONEME RECOGNIZER AND A TRANSFORMER ARCHITECTURE
4160PHOTON-LIMITED DEBLURRING USING ALGORITHM UNROLLING
1399PHYSICAL LAYER ANONYMOUS COMMUNICATIONS: AN ANONYMITY ENTROPY ORIENTED PRECODING DESIGN
4631PICKNET: REAL-TIME CHANNEL SELECTION FOR AD HOC MICROPHONE ARRAYS
9276PITCH ESTIMATION BY MULTIPLE OCTAVE DECODERS
2751PIXEL-LEVEL AND AFFINITY-LEVEL KNOWLEDGE DISTILLATION FOR UNSUPERVISED SEGMENTATION OF COVID-19 LESIONS
1731PIXINWAV: RESIDUAL STEGANOGRAPHY FOR HIDING PIXELS IN AUDIO
1312PLUG-AND-PLAY AND RELAY REGULARIZATIONS ON NOISY LOW RANK TENSOR COMPLETION FOR SNAPSHOT MULTISPECTRAL IMAGE RESTORATION
2894PMP-NET: RETHINKING VISUAL CONTEXT FOR SCENE GRAPH GENERATION
5359POINT CLOUD ATTRIBUTE COMPRESSION VIA CHROMA SUBSAMPLING
5572POINT CLOUD DENOISING USING NORMAL VECTOR-BASED GRAPH WAVELET SHRINKAGE
1958POINT-MASS FILTER WITH DECOMPOSITION OF TRANSIENT DENSITY
3440POLYPHONE DISAMBIGUATION AND ACCENT PREDICTION USING PRE-TRAINED LANGUAGE MODELS IN JAPANESE TTS FRONT-END
4431Polyphonic audio event detection: multi-label or multi-class multi-task classification problem?
1331POPO: PESSIMISTIC OFFLINE POLICY OPTIMIZATION
1538POSITION-INVARIANT ADVERSARIAL ATTACKS ON NEURAL MODULATION RECOGNITION
3582POSTGAN: A GAN-BASED POST-PROCESSOR TO ENHANCE THE QUALITY OF CODED SPEECH
4846Power allocation for wireless federated learning using graph neural networks
3827POWER-EFFICIENT HYBRID MIMO RECEIVER WITH TASK-SPECIFIC BEAMFORMING USING LOW-RESOLUTION ADCS
4256PREDICTING FLAT-FADING CHANNELS VIA META-LEARNED CLOSED-FORM LINEAR FILTERS AND EQUILIBRIUM PROPAGATION
2096PREDICTING HUMAN MOTION USING KEY SUBSEQUENCES
5251PREDICTING THE GENERALIZATION GAP IN DEEP MODELS USING ANCHORING
2633PRELIMINARY RESULTS ON THE GENERATION OF ARTIFICIAL HANDWRITING DATA USING A DECOMPOSITION-RECOMBINATION STRATEGY
9320PremiUm-CNN: Propagating Uncertainty Towards Robust Convolutional Neural Networks
8060PRESERVING TRAJECTORY PRIVACY IN DRIVING DATA RELEASE
5373PRIME KNOWLEDGE WITH LOCAL PATTERN CONSISTENCY FOR KNOWLEDGE DISTILLATION
3638PRIOR-BERT AND MULTI-TASK LEARNING FOR TARGET-ASPECT-SENTIMENT JOINT DETECTION
4690PRIVACY ATTACKS FOR AUTOMATIC SPEECH RECOGNITION ACOUSTIC MODELS IN A FEDERATED LEARNING FRAMEWORK
2038PRIVACY PROTECTION IN LEARNING FAIR REPRESENTATIONS
2371PRIVACY SENSITIVE SPEECH ANALYSIS USING FEDERATED LEARNING TO ASSESS DEPRESSION
8929PRIVACY-AWARE COMMUNICATION OVER A WIRETAP CHANNEL WITH GENERATIVE NETWORKS
5960PRIVACY-ENHANCING APPLIANCE FILTERING FOR SMART METERS
2362PRIVACY-PRESERVING ACTION RECOGNITION
1974PRIVACY-PRESERVING DISTRIBUTED EXPECTATION MAXIMIZATION FOR GAUSSIAN MIXTURE MODEL USING SUBSPACE PERTURBATION
1318PRIVACY-PRESERVING FEDERATED MULTI-TASK LINEAR REGRESSION: A ONE-SHOT LINEAR MIXING APPROACH INSPIRED BY GRAPH REGULARIZATION
8961PRIVATE LEARNING VIA KNOWLEDGE TRANSFER WITH HIGH-DIMENSIONAL TARGETS
5513PROBABILISTIC FINE-GRAINED URBAN FLOW INFERENCE WITH NORMALIZING FLOWS
5234PROBABLY PLEASANT? A NEURAL-PROBABILISTIC APPROACH TO AUTOMATIC MASKER SELECTION FOR URBAN SOUNDSCAPE AUGMENTATION
2838PROGRESSIVE CONTINUAL LEARNING FOR SPOKEN KEYWORD SPOTTING
5126PROGRESSIVE IMAGE SUPER-RESOLUTION VIA NEURAL DIFFERENTIAL EQUATION
5460PROGRESSIVE MULTI-STAGE NEURAL AUDIO CODING WITH GUIDED REFERENCES
1638PROGRESSIVE TEACHER-STUDENT TRAINING FRAMEWORK FOR MUSIC TAGGING
1754Progressive-Granularity Retrieval via Hierarchical Feature Alignment for Person Re-Identification
4518PROSODYSPEECH: TOWARDS ADVANCED PROSODY MODEL FOR NEURAL TEXT-TO-SPEECH
1586ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech
4243PROTOTYPE LEARNING FOR INTERPRETABLE RESPIRATORY SOUND ANALYSIS
4859PROTOTYPE-BASED INTER-CAMERA LEARNING FOR PERSON RE-IDENTIFICATION
5084PROVABLE SAMPLE COMPLEXITY GUARANTEES FOR LEARNING OF CONTINUOUS-ACTION GRAPHICAL GAMES WITH NONPARAMETRIC UTILITIES
1751Provable Second-order Riemannian Gauss-Newton Method for Low-rank Tensor Estimation
6676PROXIMAL-BASED ADAPTIVE SIMULATED ANNEALING FOR GLOBAL OPTIMIZATION
9324PRUNING BY TRAINING: A NOVEL DEEP NEURAL NETWORK COMPRESSION FRAMEWORK FOR IMAGE PROCESSING
1737PSEUDO STRONG LABELS FOR LARGE SCALE WEAKLY SUPERVISED AUDIO TAGGING
2003PSEUDO-INTERACTING GUIDED NETWORK FOR FEW-SHOT SEGMENTATION
5829PSEUDO-LABEL TRANSFER FROM FRAME-LEVEL TO NOTE-LEVEL IN A TEACHER-STUDENT FRAMEWORK FOR SINGING TRANSCRIPTION FROM POLYPHONIC MUSIC
4873Pseudo-Labeling for Massively Multilingual Speech Recognition
9274PSLA: IMPROVING AUDIO TAGGING WITH PRETRAINING, SAMPLING, LABELING, AND AGGREGATION
9273PSYCHOACOUSTIC CALIBRATION OF LOSS FUNCTIONS FOR EFFICIENT END-TO-END NEURAL AUDIO CODING
7552PUNCTUATION PREDICTION FOR STREAMING ON-DEVICE SPEECH RECOGNITION
1337PU-REFINER: A GEOMETRY REFINER WITH ADVERSARIAL LEARNING FOR POINT CLOUD UPSAMPLING
3547PVAE-TTS: ADAPTIVE TEXT-TO-SPEECH VIA PROGRESSIVE STYLE ADAPTATION
4172PYRAMID FUSION ATTENTION NETWORK FOR SINGLE IMAGE SUPER-RESOLUTION
5506PYXIS: AN OPEN-SOURCE PERFORMANCE DATASET OF SPARSE ACCELERATORS
9180QA4QG: Using Question Answering to Constrain Multi-Hop Question Generation
3295QRELATION: AN AGENT RELATION-BASED APPROACH FOR MULTI-AGENT REINFORCEMENT LEARNING VALUE FUNCTION FACTORIZATION
2261QUANTIFYING DISCRIMINABILITY BETWEEN NMF BASES
2457QUANTIZATION-AWARE PRECODING FOR MU-MIMO WITH LIMITED-CAPACITY FRONTHAUL
1100QUANTIZED WINOGRAD ACCELERATION FOR CONV1D EQUIPPED ASR MODELS ON MOBILE DEVICES
2300QUANTUM FEDERATED LEARNING WITH QUANTUM DATA
4001QUANTUM LONG SHORT-TERM MEMORY
4603QUICKEST DETECTION OF COMPOSITE AND NON-STATIONARY CHANGES WITH APPLICATION TO PANDEMIC MONITORING
9266RADAR TARGET DETECTION AIDED BY RECONFIGURABLE INTELLIGENT SURFACES
4159Randomized Smoothing Under Attack: How Good Is It In Practice?
2767RANGEINET: FAST LIDAR POINT CLOUD TEMPORAL INTERPOLATION
6369RANK-BASED LOSS FOR LEARNING HIERARCHICAL REPRESENTATIONS
4069RATE CODING OR DIRECT CODING: WHICH ONE IS BETTER FOR ACCURATE, ROBUST, AND ENERGY-EFFICIENT SPIKING NEURAL NETWORKS?
3346RATE CONTROL FOR LEARNED VIDEO COMPRESSION
1525RATIONAL ARRAYS FOR DOA ESTIMATION
5487RAW PLENOPTIC VIDEO CODING UNDER HEXAGONAL LATTICE RESOLUTION OF MOTION VECTORS
5599Raw source and filter modelling for dysarthric speech recognition
3103RAWBOOST: A RAW DATA BOOSTING AND AUGMENTATION METHOD APPLIED TO AUTOMATIC SPEAKER VERIFICATION ANTI-SPOOFING
3386RAWNEXT: SPEAKER VERIFICATION SYSTEM FOR VARIABLE-DURATION UTTERANCES WITH DEEP LAYER AGGREGATION AND EXTENDED DYNAMIC SCALING POLICIES
9289Ray-Space-Based Multichannel Nonnegative Matrix Factorization for Audio Source Separation
6885RCANET: ROW-COLUMN ATTENTION NETWORK FOR SEMANTIC SEGMENTATION
8885REAL ADDITIVE MARGIN SOFTMAX FOR SPEAKER VERIFICATION
2218REALISTIC MONOCULAR-TO-3D VIRTUAL TRY-ON VIA MULTI-SCALE CHARACTERISTICS CAPTURE
2324REAL-M: TOWARDS SPEECH SEPARATION ON REAL MIXTURES
9325Real-Time Audio-Guided Multi-Face Reenactment
4005REAL-TIME FALL DETECTION USING MMWAVE RADAR
1130Real-World Adversarial Examples via Makeup
1860REAL-WORLD ON-BOARD UAV AUDIO DATA SET FOR PROPELLER ANOMALIES
9310RECEIVER DESIGN WITH REDUCED DOF IN FREQUENCY DOMAIN FOR TARGET DETECTION UNDER GAUSSIAN CLUTTER
1517RECOGNITION OF SILENTLY SPOKEN WORD FROM EEG SIGNALS USING DENSE ATTENTION NETWORK (DAN)
9315RECONSTRUCTING SPEECH FROM CNN EMBEDDINGS
2748RECOVERY OF GRAPH SIGNALS FROM SIGN MEASUREMENTS
5214RECOVERY OF NOISY POOLED TESTS VIA LEARNED FACTOR GRAPHS WITH APPLICATION TO COVID-19 TESTING
2562Recurrent Design of Probing Waveform for Sparse Bayesian Learning Based DOA Estimation
2494REFEREE: TOWARDS REFERENCE-FREE CROSS-SPEAKER STYLE TRANSFER WITH LOW-QUALITY DATA FOR EXPRESSIVE SPEECH SYNTHESIS
4239REFERENCE MICROPHONE SELECTION AND LOW-RANK APPROXIMATION BASED MULTICHANNEL WIENER FILTER WITH APPLICATION TO SPEECH RECOGNITION
3134Reformulating Speaker Diarization as Community Detection With Emphasis On Topological Structure
1967REGION-TO-REGION KERNEL INTERPOLATION OF ACOUSTIC TRANSFER FUNCTION WITH DIRECTIONAL WEIGHTING
4722REGRESSION ASSISTED MATRIX COMPLETION FOR RECONSTRUCTING A PROPAGATION FIELD WITH APPLICATION TO SOURCE LOCALIZATION
3527REGULARIZATION USING DENOISING: EXACT AND ROBUST SIGNAL RECOVERY
1838REGULARIZED LATENT SPACE EXPLORATION FOR DISCRIMINATIVE FACE SUPER-RESOLUTION
1516RELATION DISCOVERY IN NONLINEARLY RELATED LARGE-SCALE SETTINGS
2441RELATIVE VIEWPOINT ESTIMATION BASED ON STRUCTURED 3D REPRESENTATION ALIGNMENT
3900REMIX-CYCLE-CONSISTENT LEARNING ON ADVERSARIALLY LEARNED SEPARATOR FOR ACCURATE AND STABLE UNSUPERVISED SPEECH SEPARATION
5763REPEAT AFTER ME: SELF-SUPERVISED LEARNING OF ACOUSTIC-TO-ARTICULATORY MAPPING BY VOCAL IMITATION
4385REPETITION ASSESSMENT FOR SPEECH AND LANGUAGE DISORDERS: A STUDY OF THE LOGOPENIC VARIANT OF PRIMARY PROGRESSIVE APHASIA
4676REPRESENTATION LEARNING THROUGH CROSS-MODAL CONDITIONAL TEACHER-STUDENT TRAINING FOR SPEECH EMOTION RECOGNITION
5399RESCOREBERT: DISCRIMINATIVE SPEECH RECOGNITION RESCORING WITH BERT
3142RESIDUAL RECOVERY ALGORITHM FOR MODULO SAMPLING
3620RESIDUAL-GUIDED PERSONALIZED SPEECH SYNTHESIS BASED ON FACE IMAGE
9249RESOURCE ALLOCATION AND DITHERING OF BAYESIAN PARAMETER ESTIMATION USING MIXED-RESOLUTION DATA
2251RESTLESS MULTI-ARMED BANDITS UNDER EXOGENOUS GLOBAL MARKOV PROCESS
5265RETHINKING COMPUTER-AIDED PELVIS SEGMENTATION
2972Rethinking Two-B-Real Net for Real-Time Salient Object Detection
2520RETRIEVAL BIAS AWARE ENSEMBLE MODEL FOR CONDITIONAL SENTENCE GENERATION
1492RETRIEVAL ENHANCED SEGMENT GENERATION NEURAL NETWORK FOR TASK-ORIENTED DIALOGUE SYSTEMS
4645RETRIEVING SPEAKER INFORMATION FROM PERSONALIZED ACOUSTIC MODELS FOR SPEECH RECOGNITION
5215R-G2P: EVALUATING AND ENHANCING ROBUSTNESS OF GRAPHEME TO PHONEME CONVERSION BY CONTROLLED NOISE INTRODUCING AND CONTEXTUAL INFORMATION INCORPORATION
3668RIS-AIDED MONOSTATIC MIMO RADAR WITH CO-LOCATED ANTENNAS
4745r-LOCAL UNLABELED SENSING: IMPROVED ALGORITHM AND APPLICATIONS
2043ROBUST ADAPTIVE BEAMFORMING BASED ON POWER METHOD PROCESSING AND SPATIAL SPECTRUM MATCHING
4855Robust adaptive beamforming maximizing the worst-case SINR over distributional uncertainty sets for random INC matrix and signal steering vector
4146ROBUST ADAPTIVE NOISE CANCELLER ALGORITHM WITH SNR-BASED STEPSIZE CONTROL AND NOISE-PATH GAIN COMPENSATION
2831ROBUST AND EFFICIENT UNCERTAINTY AWARE BIOSIGNAL CLASSIFICATION VIA EARLY EXIT ENSEMBLES
3655ROBUST BAYESIAN RECONSTRUCTION OF MULTISPECTRAL SINGLE-PHOTON 3D LIDAR DATA WITH NON-UNIFORM BACKGROUND
5814ROBUST CLASSIFICATION WITH FLEXIBLE DISCRIMINANT ANALYSIS IN HETEROGENEOUS DATA
9171ROBUST COLLABORATIVE LEARNING FOR SEQUENCE MODELLING
9111Robust Disentangled Variational Speech Representation Learning for Zero-shot Voice Conversion
9250ROBUST DYNAMIC MULTI-MODAL DATA FUSION: A MODEL UNCERTAINTY PERSPECTIVE
1138Robust High-Order Tensor Recovery via Nonconvex Low-Rank Approximation
4681ROBUST NONPARAMETRIC DISTRIBUTION FORECAST WITH BACKTEST-BASED BOOTSTRAP AND ADAPTIVE RESIDUAL SELECTION
2226ROBUST PARAMETER ESTIMATION BASED ON THE K-DIVERGENCE
3908ROBUST PRESSURE MATCHING WITH ATF PERTURBATION CONSTRAINTS FOR SOUND FIELD CONTROL
9253Robust Recovery of Jointly-Sparse Signals Using Minimax Concave Loss Function
1307Robust self-supervised speaker representation learning via instance mix regularization
8802ROBUST SIGNAL PROCESSING OVER SIMPLICIAL COMPLEXES
2469Robust speaker verification using Population-based Data Augmentation
1205ROBUST SPEAKER VERIFICATION WITH JOINT SELF-SUPERVISED AND SUPERVISED LEARNING
9261Robust TDOA Source Localization Based on Lagrange Programming Neural Network
2881ROBUST THERMAL INFRARED PEDESTRIAN DETECTION BY ASSOCIATING VISIBLE PEDESTRIAN KNOWLEDGE
2080ROBUST UNSTRUCTURED KNOWLEDGE ACCESS IN CONVERSATIONAL DIALOGUE WITH ASR ERRORS
4889ROBUST VIDEO HASHING BASED ON LOCAL FLUCTUATION PRESERVING FOR TRACKING DEEP FAKE VIDEOS
2297RTSNET: DEEP LEARNING AIDED KALMAN SMOOTHING
3769RUN-AND-BACK STITCH SEARCH: NOVEL BLOCK SYNCHRONOUS DECODING FOR STREAMING ENCODER-DECODER ASR
3370S2 REDUCER: HIGH-PERFORMANCE SPARSE COMMUNICATION TO ACCELERATE DISTRIBUTED DEEP LEARNING
3169S3PRL-VC: OPEN-SOURCE VOICE CONVERSION FRAMEWORK WITH SELF-SUPERVISED SPEECH REPRESENTATIONS
6024S3T: SELF-SUPERVISED PRE-TRAINING WITH SWIN TRANSFORMER FOR MUSIC CLASSIFICATION
3495SADN: LEARNED LIGHT FIELD IMAGE COMPRESSION WITH SPATIAL-ANGULAR DECORRELATION
4402SAFARI FROM VISUAL SIGNALS: RECOVERING VOLUMETRIC 3D SHAPES
3718SAFEGUARDING UAV NETWORKS THROUGH INTEGRATED SENSING, JAMMING, AND COMMUNICATIONS
5944SAGA: SELF-AUGMENTATION WITH GUIDED ATTENTION FOR REPRESENTATION LEARNING
9293SAGRNN: SELF-ATTENTIVE GATED RNN FOR BINAURAL SPEAKER SEPARATION WITH INTERAURAL CUE PRESERVATION
1507SAIN: SIMILARITY-AWARE VIDEO FRAME INTERPOLATION
3656SALSA-Lite: A Fast and Effective Feature for Polyphonic Sound Event Localization and Detection with Microphone Arrays
5881SAMPLING SET SELECTION FOR GRAPH SIGNALS UNDER ARBITRARY SIGNAL PRIORS
3676SAR-ShipNet: SAR-Ship Detection Neural Network via Bidirectional Coordinate Attention and Multi-resolution Feature Fusion
5951SA-SDR: A NOVEL LOSS FUNCTION FOR SEPARATION OF MEETING STYLE DATA
4414Scalable Data Association and Multi-target Tracking under a Poisson Mixture Measurement Process
4200SCALABLE NEURAL ARCHITECTURES FOR END-TO-END ENVIRONMENTAL SOUND CLASSIFICATION
9127SCALABLE RIDGE LEVERAGE SCORE SAMPLING FOR THE NYSTRÖM METHOD
3263Scattering Statistics of Generalized Spatial Poisson Point Processes
7937SCORE DIFFICULTY ANALYSIS FOR PIANO PERFORMANCE EDUCATION BASED ON FINGERING
2487SCREEN & RELAX: ACCELERATING THE RESOLUTION OF ELASTIC-NET BY SAFE IDENTIFICATION OF THE SOLUTION SUPPORT
4106S-DCCRN: Super Wide Band DCCRN with learnable complex feature for speech enhancement
4311SDETR: Attention-guided Salient Object Detection with Transformer
2938SDNET: LIGHTWEIGHT FACIAL EXPRESSION RECOGNITION FOR SAMPLE DISEQUILIBRIUM
2578SDR — MEDIUM RARE WITH FAST COMPUTATIONS
2662SECMPNN: 3-PARTY PRIVACY-PRESERVING MOLECULAR STRUCTURE PROPERTIES INFERENCE
5288SEED: SOUND EVENT EARLY DETECTION VIA EVIDENTIAL UNCERTAINTY
5466SEGNET-BASED DEEP REPRESENTATION LEARNING FOR DYSPHAGIA CLASSIFICATION
4667SEISMIC FAULT IDENTIFICATION USING GRAPH HIGH-FREQUENCY COMPONENTS AS INPUT TO GRAPH CONVOLUTIONAL NETWORK
5539SELECTIVE MULTI-TASK LEARNING FOR SPEECH EMOTION RECOGNITION USING CORPORA OF DIFFERENT STYLES
3480SELECTIVE MUTUAL LEARNING: AN EFFICIENT APPROACH FOR SINGLE CHANNEL SPEECH SEPARATION
7339SELECTIVE SCALE CASCADE ATTENTION NETWORK FOR BREAST CANCER HISTOPATHOLOGY IMAGE CLASSIFICATION
5864Self supervised representation learning with deep clustering for acoustic unit discovery from raw speech
1817SELF-ATTENTION FOR INCOMPLETE UTTERANCE REWRITING
8897SELF-CRITICAL SEQUENCE TRAINING FOR AUTOMATIC SPEECH RECOGNITION
5706Self-Ensemble Variance Regularization for Domain Adaptation
1206SELF-KNOWLEDGE DISTILLATION BASED SELF-SUPERVISED LEARNING FOR COVID-19 DETECTION FROM CHEST X-RAY IMAGES
2179SELF-KNOWLEDGE DISTILLATION VIA FEATURE ENHANCEMENT FOR SPEAKER VERIFICATION
3095SELF-LEARNED VIDEO SUPER-RESOLUTION WITH AUGMENTED SPATIAL AND TEMPORAL CONTEXT
4315SELF-SUPERVISED ACOUSTIC ANOMALY DETECTION VIA CONTRASTIVE LEARNING
2617Self-supervised Contrastive Learning for Cross-domain Hyperspectral Image Representation
1218SELF-SUPERVISED LEARNING FOR SENTIMENT ANALYSIS VIA IMAGE-TEXT MATCHING
9068SELF-SUPERVISED LEARNING METHOD USING MULTIPLE SAMPLING STRATEGIES FOR GENERAL-PURPOSE AUDIO REPRESENTATION
3508Self-supervised learning on a lightweight low-light image enhancement model with curve refinement
6153SELF-SUPERVISED REPRESENTATION LEARNING FOR UNSUPERVISED ANOMALOUS SOUND DETECTION UNDER DOMAIN SHIFT
1388Self-supervised Speaker Recognition Training Using Human-Machine Dialogues
2945SELF-SUPERVISED SPEAKER RECOGNITION WITH LOSS-GATED LEARNING
2157SELF-SUPERVISED SPEAKER VERIFICATION WITH SIMPLE SIAMESE NETWORK AND SELF-SUPERVISED REGULARIZATION
4485SEMANTIC ASSOCIATION NETWORK FOR VIDEO CORPUS MOMENT RETRIEVAL
4155SEMANTICALLY PROPORTIONAL PATCHMIX FOR FEW-SHOT LEARNING
1806SEMIDEFINITE RELAXATION METHOD FOR MOVING OBJECT LOCALIZATION USING A STATIONARY TRANSMITTER AT UNKNOWN POSITION
3349SEMI-SUPERVISED 360° DEPTH ESTIMATION FROM MULTIPLE FISHEYE CAMERAS WITH PIXEL-LEVEL SELECTIVE LOSS
6349SEMI-SUPERVISED GAUSSIAN MIXTURE VARIATIONAL AUTOENCODER FOR PULSE SHAPE DISCRIMINATION
9277SEMI-SUPERVISED NEURAL CHORD ESTIMATION BASED ON A VARIATIONAL AUTOENCODER WITH LATENT CHORD LABELS AND FEATURES
1726SEMI-SUPERVISED SOURCE LOCALIZATION WITH RESIDUAL PHYSICAL LEARNING
2871SEMI-SUPERVISED STANDARDIZED DETECTION OF PERIODIC SIGNALS WITH APPLICATION TO EXOPLANET DETECTION
2374SENSING-ASSISTED BEAM TRACKING IN V2I NETWORKS: EXTENDED TARGET CASE
9177SENSORS TO SIGN LANGUAGE: A NATURAL APPROACH TO EQUITABLE COMMUNICATION
3158Sentiment-Aware Automatic Speech Recognition pre-training for enhanced Speech Emotion Recognition
2385SENTIMENT-AWARE DISTILLATION FOR BITCOIN TREND FORECASTING UNDER PARTIAL OBSERVABILITY
4543SEQUENCE TRANSDUCTION WITH GRAPH-BASED SUPERVISION
8168SEQUENTIAL MCMC METHODS FOR AUDIO SIGNAL ENHANCEMENT
4461SERAB: A MULTI-LINGUAL BENCHMARK FOR SPEECH EMOTION RECOGNITION
3013SHORT-AND-SPARSE DECONVOLUTION VIA RANK-ONE CONSTRAINED OPTIMIZATION (ROCO)
2916SIGNAL COMPRESSION VIA NEURAL IMPLICIT REPRESENTATIONS
4233SIGNAL PROCESSING ON CELL COMPLEXES
1032SIGNAL RECOVERY FROM INCONSISTENT NONLINEAR OBSERVATIONS
6266SIG-VC: A SPEAKER INFORMATION GUIDED ZERO-SHOT VOICE CONVERSION SYSTEM FOR BOTH HUMAN BEINGS AND MACHINES
2876Simple Attention Module based Speaker Verification with Iterative noisy label detection
2010SIMPLER IS BETTER: SPECTRAL REGULARIZATION AND UP-SAMPLING TECHNIQUES FOR VARIATIONAL AUTOENCODERS
3559SIMPLICIAL CONVOLUTIONAL NEURAL NETWORKS
1874SIMULATION-AND-MINING: TOWARDS ACCURATE SOURCE-FREE UNSUPERVISED DOMAIN ADAPTIVE OBJECT DETECTION
1462Simultaneous Nonlocal Low-Rank and Deep Priors for Poisson Denoising
1710SINGLE IMAGE DE-RAINING WITH HIGH-LOW FREQUENCY GUIDANCE
9328SINGLE IMAGE SUPER-RESOLUTION USING ASYNCHRONOUS MULTI-SCALE NETWORK
5015SINGLE-SHOT BALANCED DETECTOR FOR GEOSPATIAL OBJECT DETECTION
9135Sketch storytelling
2021SKETCHED RT3D: HOW TO RECONSTRUCT BILLIONS OF PHOTONS PER SECOND
4114SKIM: SKIPPING MEMORY LSTM FOR LOW-LATENCY REAL-TIME CONTINUOUS SPEECH SEPARATION
8940SLEEPGAN: TOWARDS PERSONALIZED SLEEP THERAPY MUSIC
1188SLIM: EXPLICIT SLOT-INTENT MAPPING WITH BERT FOR JOINT MULTI-INTENT DETECTION AND SLOT FILLING
4500SLUE: NEW BENCHMARK TASKS FOR SPOKEN LANGUAGE UNDERSTANDING EVALUATION ON NATURAL SPEECH
1129SOCIAL WELFARE MAXIMIZATION IN CROSS-SILO FEDERATED LEARNING
2017SODA: Self-organizing data augmentation in deep neural networks - Application to biomedical image segmentation tasks
4772SOLVING THE LONG-TAILED PROBLEM VIA INTRA- AND INTER-CATEGORY BALANCE
3455SOUND EVENT DETECTION GUIDED BY SEMANTIC CONTEXTS OF SCENES
9301SOUND EVENT DETECTION: A TUTORIAL
2735SOURCE MIXING AND SEPARATION ROBUST AUDIO STEGANOGRAPHY
2602SOURCE SEPARATION BY STEERING PRETRAINED MUSIC MODELS
2369SP ATTACK: SINGLE-PERSPECTIVE ATTACK FOR GENERATING ADVERSARIAL OMNIDIRECTIONAL IMAGES
9183SPAIN-NET: SPATIALLY-INFORMED STEREOPHONIC MUSIC SOURCE SEPARATION
5115Sparse Adversarial Attack for video via Gradient-Based Keyframe Selection
9295SPARSE ANALYSIS MODEL BASED DICTIONARY LEARNING FOR SIGNAL DECLIPPING
3660SPARSE ARRAY SOURCE ENUMERATION VIA COARRAY SUBSPACE OPTIMIZATION
3971SPARSE MODELING OF THE EARLY PART OF NOISY ROOM IMPULSE RESPONSES WITH SPARSE BAYESIAN LEARNING
1348SPARSE MULTI-REFERENCE ALIGNMENT: SAMPLE COMPLEXITY AND COMPUTATIONAL HARDNESS
3161Sparse Recovery of Acoustic Waves
5177SPARSE SELF-ATTENTION FOR SEMI-SUPERVISED SOUND EVENT DETECTION
1801SPARSE SUBSPACE TRACKING IN HIGH DIMENSIONS
4644SPARSEBFA: ATTACKING SPARSE DEEP NEURAL NETWORKS WITH THE WORST-CASE BIT FLIPS ON COORDINATES
3264Sparse-Group Log-Sum Penalized Graphical Model Learning For Time Series
3397SPARSITY IMPROVES UNSUPERVISED ATTRIBUTE DISCOVERY IN STYLEGAN
8883SPARSITY-BASED SOUND FIELD SEPARATION IN THE SPHERICAL HARMONICS DOMAIN
2070SPATIAL ACTIVE NOISE CONTROL BASED ON INDIVIDUAL KERNEL INTERPOLATION OF PRIMARY AND SECONDARY SOUND FIELDS
3487SPATIAL ACTIVE NOISE CONTROL WITH THE REMOTE MICROPHONE TECHNIQUE: AN APPROACH WITH A MOVING HIGHER ORDER MICROPHONE
4057SPATIAL DATA AUGMENTATION WITH SIMULATED ROOM IMPULSE RESPONSES FOR SOUND EVENT LOCALIZATION AND DETECTION
4578SPATIAL MIXUP: DIRECTIONAL LOUDNESS MODIFICATION AS DATA AUGMENTATION FOR SOUND EVENT LOCALIZATION AND DETECTION
3366SPATIAL PROCESSING FRONT-END FOR DISTANT ASR EXPLOITING SELF-ATTENTION CHANNEL COMBINATOR
2789SPATIAL-CONTEXT-AWARE DEEP NEURAL NETWORK FOR MULTI-CLASS IMAGE CLASSIFICATION
8752SPATIAL-TEMPORAL GRAPH CONVOLUTION NETWORK FOR MULTICHANNEL SPEECH ENHANCEMENT
3871SPATIO-TEMPORAL ATTENTION GRAPH CONVOLUTION NETWORK FOR FUNCTIONAL CONNECTOME CLASSIFICATION
5850SPATIO-TEMPORAL GRAPH COMPLEMENTARY SCATTERING NETWORKS
4541SPATIO-TEMPORAL GRAPH CONVOLUTIONAL NETWORKS FOR CONTINUOUS SIGN LANGUAGE RECOGNITION
1104SPATIO-TEMPORAL MOTION AGGREGATION NETWORK FOR VIDEO ACTION DETECTION
2698Spatio-Temporal PRRS Epidemic Forecasting via Factorized Deep Generative Modeling
3190SPEAKER EMBEDDING CONVERSION FOR BACKWARD AND CROSS-CHANNEL COMPATIBILITY
3669SPEAKER GENERATION
7128SPEAKER IDENTITY PRESERVATION IN DYSARTHRIC SPEECH RECONSTRUCTION BY ADVERSARIAL SPEAKER ADAPTATION
2885SPEAKER NORMALIZATION FOR SELF-SUPERVISED SPEECH EMOTION RECOGNITION
4598SPEAKER REINFORCEMENT USING TARGET SOURCE EXTRACTION FOR ROBUST AUTOMATIC SPEECH RECOGNITION
3467SPEAKER-TARGETED AUDIO-VISUAL SPEECH RECOGNITION USING A HYBRID CTC/ATTENTION MODEL WITH INTERFERENCE LOSS
4335SPECIALISED VIDEO QUALITY MODEL FOR ENHANCED USER GENERATED CONTENT (UGC) WITH SPECIAL EFFECTS
5167Spectral permutation test on persistence diagrams
2579SPECTRAL-SPATIAL SYMMETRICAL AGGREGATION CROSS-LINKING MULTI-MODAL DATA FUSION NETWORK
6151SPEECH DENOISING IN THE WAVEFORM DOMAIN WITH SELF-ATTENTION
4927SPEECH EMOTION RECOGNITION USING SELF-SUPERVISED FEATURES
8977SPEECH EMOTION RECOGNITION WITH CO-ATTENTION BASED MULTI-LEVEL ACOUSTIC INFORMATION
4238SPEECH EMOTION RECOGNITION WITH GLOBAL-AWARE FUSION ON MULTI-SCALE FEATURE REPRESENTATION
4767SPEECH ENHANCEMENT FOR LOW BIT RATE SPEECH CODEC
8904Speech enhancement with neural homomorphic synthesis
3927SPEECH PATTERN BASED BLACK-BOX MODEL WATERMARKING FOR AUTOMATIC SPEECH RECOGNITION
4711SPEECH RECOGNITION USING BIOLOGICALLY-INSPIRED NEURAL NETWORKS
3750SPEECH RECOVERY FOR REAL-WORLD SELF-POWERED INTERMITTENT DEVICES
4931SPEECH TASKS RELEVANT TO SLEEPINESS DETERMINED WITH DEEP TRANSFER LEARNING
4023SPEECHMOE2: MIXTURE-OF-EXPERTS MODEL WITH IMPROVED ROUTING
5930SPEECHSPLIT2.0: UNSUPERVISED SPEECH DISENTANGLEMENT FOR VOICE CONVERSION WITHOUT TUNING AUTOENCODER BOTTLENECKS
4017SPELL MY NAME: KEYWORD BOOSTED SPEECH RECOGNITION
1253SPHERICAL CONVOLUTIONAL RECURRENT NEURAL NETWORK FOR REAL-TIME SOUND SOURCE TRACKING
9278SPLIT BREGMAN APPROACH TO LINEAR PREDICTION BASED DEREVERBERATION WITH ENFORCED SPEECH SPARSITY
4118Spoken language recognition with cluster-based modeling
2304SQAPP: No-Reference Speech Quality Assessment via Pairwise Preference
2489SRP-DNN: LEARNING DIRECT-PATH PHASE DIFFERENCE FOR MULTIPLE MOVING SOUND SOURCE LOCALIZATION
4929SRU++: PIONEERING FAST RECURRENCE WITH ATTENTION FOR SPEECH RECOGNITION
4672STABILITY ANALYSIS OF UNFOLDED WMMSE FOR POWER ALLOCATION
4626STABILITY OF NEURAL NETWORKS ON MANIFOLDS TO RELATIVE PERTURBATIONS
4810STABLE AND TRANSFERABLE WIRELESS RESOURCE ALLOCATION POLICIES VIA MANIFOLD NEURAL NETWORKS
6165STACKED MULTI-SCALE ATTENTION NETWORK FOR IMAGE COLORIZATION
3641STATISTICAL PYRAMID DENSE TIME DELAY NEURAL NETWORK FOR SPEAKER VERIFICATION
7986STATISTICAL, SPECTRAL AND GRAPH REPRESENTATIONS FOR VIDEO-BASED FACIAL EXPRESSION RECOGNITION IN CHILDREN
2310STEALTHY BACKDOOR ATTACK WITH ADVERSARIAL TRAINING
1224STGAT-MAD: Spatial-Temporal Graph Attention Network for Multivariate Time Series Anomaly Detection
2830STPointGCN: Spatial Temporal Graph Convolutional Network for Multiple People Recognition Using Millimeter-Wave Radar
5175STREAMING ON-DEVICE DETECTION OF DEVICE DIRECTED SPEECH FROM VOICE AND TOUCH-BASED INVOCATION
2389STREAMING TRANSFORMER TRANSDUCER BASED SPEECH RECOGNITION USING NON-CAUSAL CONVOLUTION
1782STRUCTURAL PRIOR MODELS FOR 3-D DEEP VESSEL SEGMENTATION
3380STUDY OF POSITIONAL ENCODING APPROACHES FOR AUDIO SPECTROGRAM TRANSFORMERS
9245STUDY OF PRE-PROCESSING DEFENSES AGAINST ADVERSARIAL ATTACKS ON STATE-OF-THE-ART SPEAKER RECOGNITION SYSTEMS
2995STUDY OF THE NULL DIRECTIONS ON THE PERFORMANCE OF DIFFERENTIAL BEAMFORMERS
1789STUDY ON TIME-OF-FLIGHT ESTIMATION IN ULTRASONIC WELL LOGGING TOOL: MODEL-DRIVEN TRANSFER LEARNING
3953STUDYING THREE FAMILIES OF DIVERGENCES TO COMPARE WIDE-SENSE STATIONARY GAUSSIAN ARMA PROCESSES
4638StyleGAN-induced data-driven regularization for inverse problems
9113Subgraph Representation Learning With Hard Negative Samples for Inductive Link Prediction
1262Subjective and Objective Quality Assessment of Mobile Gaming Video
3651SUBSPACE CLUSTERING USING UNSUPERVISED DATA AUGMENTATION
9260SUBSPACE DETECTION AND BLIND SOURCE SEPARATION OF MULTIVARIATE SIGNALS BY DYNAMICAL COMPONENT ANALYSIS (DYCA)
8981SUPERRESOLUTION AND SEGMENTATION OF OCT SCANS USING MULTI-STAGE ADVERSARIAL GUIDED ATTENTION TRAINING
5227SUPER-RESOLUTION OF SATELLITE IMAGES BY TWO-DIMENSIONAL RRDB AND EDGE-ENHANCEMENT GENERATIVE ADVERSARIAL NETWORK
4765SUPERVISED AND SELF-SUPERVISED PRETRAINING BASED COVID-19 DETECTION USING ACOUSTIC BREATHING/COUGH/SPEECH SIGNALS
3220SUPERVISED ATTENTION IN SEQUENCE-TO-SEQUENCE MODELS FOR SPEECH RECOGNITION
5595Supervised Learning based Sparse Channel Estimation for RIS aided Communications
4082SUPERVISED TRAINING OF SIAMESE SPIKING NEURAL NETWORKS WITH EARTH MOVER’S DISTANCE
2200SYMBOL-LEVEL ONLINE CHANNEL TRACKING FOR DEEP RECEIVERS
2075Synergistic Network Learning and Label Correction for Noise-robust Image Classification
2529SYNPOSE: A LARGE-SCALE AND DENSELY ANNOTATED SYNTHETIC DATASET FOR HUMAN POSE ESTIMATION IN CLASSROOM
4435SYNT++: UTILIZING IMPERFECT SYNTHETIC DATA TO IMPROVE SPEECH RECOGNITION
4995SYNTAX-BASED GRAPH MATCHING FOR KNOWLEDGE BASE QUESTION ANSWERING
4755SYNTHESIS OF ADVERSARIAL SAMPLES IN TWO-STAGE CLASSIFIERS
4207SYNTHESIZING DYSARTHRIC SPEECH USING MULTI-SPEAKER TTS FOR DYSARTHRIC SPEECH RECOGNITION
2949TACKLING DATA SCARCITY IN SPEECH TRANSLATION USING ZERO-SHOT MULTILINGUAL MACHINE TRANSLATION TECHNIQUES
4800TACKLING THE SCORE SHIFT IN CROSS-LINGUAL SPEAKER VERIFICATION BY EXPLOITING LANGUAGE INFORMATION
1637TALKINGFLOW: TALKING FACIAL LANDMARK GENERATION WITH MULTI-SCALE NORMALIZING FLOW NETWORK
3877TARGET-AWARE AUTO-AUGMENTATION FOR UNSUPERVISED DOMAIN ADAPTIVE OBJECT DETECTION
1339TARGETDROP: A TARGETED REGULARIZATION METHOD FOR CONVOLUTIONAL NEURAL NETWORKS
1293TCRNET: MAKE TRANSFORMER, CNN AND RNN COMPLEMENT EACH OTHER
3532TEACHING CNNS TO MIMIC HUMAN VISUAL COGNITIVE PROCESS & REGULARISE TEXTURE-SHAPE BIAS
4528TED TALK TEASER GENERATION WITH PRE-TRAINED MODELS
4949TEMPO: IMPROVING TRAINING PERFORMANCE IN CROSS-SILO FEDERATED LEARNING
5058TEMPORAL CONTRASTIVE-LOSS FOR AUDIO EVENT DETECTION
2816TEMPORAL CROSS-GRAPH NETWORK FOR BRAIN FUNCTIONAL ACTIVITY PREDICTION
2778Temporal Dynamic Convolutional Neural Network for Text-Independent Speaker Verification and Phonemic Analysis
4798Temporal Early Exiting for Streaming Speech Commands Recognition
5998TEMPORAL KNOWLEDGE DISTILLATION FOR ON-DEVICE AUDIO CLASSIFICATION
2152TENSOR-BASED ORTHOGONAL MATCHING PURSUIT WITH PHASE ROTATION FOR CHANNEL ESTIMATION IN HYBRID BEAMFORMING MIMO-OFDM SYSTEMS
2567Terahertz Image Restoration Benchmarking Dataset
5437TEST-TIME DETECTION OF BACKDOOR TRIGGERS FOR POISONED DEEP NEURAL NETWORKS
5739TEXT ADAPTIVE DETECTION FOR CUSTOMIZABLE KEYWORD SPOTTING
3611Text2Poster: Laying out Stylized Texts on Retrieved Images
1272Text2Video: Text-driven Talking-head Video Synthesis with Personalized Phoneme-Pose Dictionary
4419TEXT-FREE NON-PARALLEL MANY-TO-MANY VOICE CONVERSION USING NORMALISING FLOWS
4535TEXT-IMAGE DE-CONTEXTUALIZATION DETECTION USING VISION-LANGUAGE MODELS
4170Texture Information Boosts Video Quality Assessment
1828TFPSNET: TIME-FREQUENCY DOMAIN PATH SCANNING NETWORK FOR SPEECH SEPARATION
5009THE COCKTAIL FORK PROBLEM: THREE-STEM AUDIO SEPARATION FOR REAL-WORLD SOUNDTRACKS
3113THE CORAL++ ALGORITHM FOR UNSUPERVISED DOMAIN ADAPTATION OF SPEAKER RECOGNITION
2583THE DATA/IDENTITY TRADEOFF WITH CENSORED SENSORS
2061THE DAWN OF QUANTUM NATURAL LANGUAGE PROCESSING
9317THE EFFECT OF PARTIAL TIME-FREQUENCY MASKING OF THE DIRECT SOUND ON THE PERCEPTION OF REVERBERANT SPEECH
2436The impact of cross language on acoustic-to-articulatory inversion and its influence on articulatory speech synthesis
4523THE IMPACT OF JPEG COMPRESSION ON PRIOR IMAGE NOISE
2541THE IMPACT OF REMOVING HEAD MOVEMENTS ON AUDIO-VISUAL SPEECH ENHANCEMENT
4418THE MIRRORNET: LEARNING AUDIO SYNTHESIZER CONTROLS INSPIRED BY SENSORIMOTOR INTERACTION
2370THE PROTOTYPE CO-PRIME ARRAY WITH A ROBUST DIFFERENCE CO-ARRAY
8267THE REPRESENTATION JENSEN-RÉNYI DIVERGENCE
4471The Second DiCOVA Challenge: Dataset and performance analysis for Diagnosis of COVID-19 using acoustics
4571THIN SLICES OF DEPRESSION: IMPROVING DEPRESSION DETECTION PERFORMANCE THROUGH DATA SEGMENTATION
2170TH-NET: A METHOD OF SINGLE 3D OBJECT TRACKING BASED ON TRANSFORMERS AND HAUSDORFF DISTANCE
7914Threshold Independent Evaluation of Sound Event Detection Scores
5407TIE YOUR EMBEDDINGS DOWN: CROSS-MODAL LATENT SPACES FOR END-TO-END SPOKEN LANGUAGE UNDERSTANDING
8941Tight integration of neural- and clustering-based diarization through deep unfolding of infinite Gaussian mixture model
2677TIME DOMAIN RADIAL FILTER DESIGN FOR SPHERICAL WAVES
3178TIME-BALANCED FOCAL LOSS FOR AUDIO EVENT DETECTION
2363TIME-DOMAIN ACOUSTIC CONTRAST CONTROL WITH A SPATIAL UNIFORMITY CONSTRAINT FOR PERSONAL AUDIO SYSTEMS
9236TIME-DOMAIN AUDIO SOURCE SEPARATION WITH NEURAL NETWORKS BASED ON MULTIRESOLUTION ANALYSIS
7978TIME-DOMAIN AUDIO-VISUAL SPEECH SEPARATION ON LOW QUALITY VIDEOS
1322TIME-FREQUENCY AND GEOMETRIC ANALYSIS OF TASK-DEPENDENT LEARNING IN RAW WAVEFORM BASED ACOUSTIC MODELS
1181TIME-FREQUENCY ATTENTION FOR MONAURAL SPEECH ENHANCEMENT
3213TINYS2I: A SMALL-FOOTPRINT UTTERANCE CLASSIFICATION MODEL WITH CONTEXTUAL SUPPORT FOR ON-DEVICE SLU
3277TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context
4272T-NGA: TEMPORAL NETWORK GRAFTING ALGORITHM FOR LEARNING TO PROCESS SPIKING AUDIO SENSOR EVENTS
2342TNTC: two-stream network with transformer-based complementarity for gait-based emotion recognition
5064TO CATCH A CHORUS, VERSE, INTRO, OR ANYTHING ELSE: ANALYZING A SONG WITH STRUCTURAL FUNCTIONS
4801TONET: TONE-OCTAVE NETWORK FOR SINGING MELODY EXTRACTION FROM POLYPHONIC MUSIC
5186Topological correlation of brain signals
1505TORCHAUDIO: BUILDING BLOCKS FOR AUDIO AND SPEECH PROCESSING
4303TOWARD DEGRADATION-ROBUST VOICE CONVERSION
3163TOWARD MMWAVE-BASED SOUND ENHANCEMENT AND SEPARATION
4659TOWARDS A COMMON SPEECH ANALYSIS ENGINE
4577TOWARDS ACCURATE CROSS-DOMAIN IN-BED HUMAN POSE ESTIMATION
4957TOWARDS AUTOMATIC TRANSCRIPTION OF POLYPHONIC ELECTRIC GUITAR MUSIC: A NEW DATASET AND A MULTI-LOSS TRANSFORMER MODEL
5123TOWARDS BETTER META-INITIALIZATION WITH TASK AUGMENTATION FOR KINDERGARTEN-AGED SPEECH RECOGNITION
3738TOWARDS CLOSED-LOOP SPEECH SYNTHESIS FROM STEREOTACTIC EEG: A UNIT SELECTION APPROACH
4925TOWARDS CONTROLLABLE AND PHYSICAL INTERPRETABLE UNDERWATER SCENE SIMULATION
4022TOWARDS END-TO-END INTEGRATION OF DIALOG HISTORY FOR IMPROVED SPOKEN LANGUAGE UNDERSTANDING
5204Towards End-to-End Speaker Diarization with Generalized Neural Speaker Clustering
8925TOWARDS EXPRESSIVE SPEAKING STYLE MODELLING WITH HIERARCHICAL CONTEXT INFORMATION FOR MANDARIN SPEECH SYNTHESIS
4934TOWARDS FAST AND CONVENIENT END-TO-END HRTF PERSONALIZATION
2886TOWARDS FASTER CONTINUOUS MULTI-CHANNEL HRTF MEASUREMENTS BASED ON LEARNING SYSTEM MODELS
3144TOWARDS IDENTITY PRESERVING NORMAL TO DYSARTHRIC VOICE CONVERSION
3139Towards Interpretability of Speech Pause in Dementia Detection using Adversarial Learning
5735Towards Interpreting Deep Learning Models to Understand Loss of Speech Intelligibility in Speech Disorders Step 2: Contribution of the emergence of phonetic traits
1696TOWARDS JOINT FRAME-LEVEL AND MOS QUALITY PREDICTIONS WITH LOW-COMPLEXITY OBJECTIVE MODELS
3416TOWARDS LEARNING UNIVERSAL AUDIO REPRESENTATIONS
2055TOWARDS LIFELONG LEARNING OF MULTILINGUAL TEXT-TO-SPEECH SYNTHESIS
1575Towards Lightweight Applications: Asymmetric Enroll-Verify Structure for Speaker Verification
2696TOWARDS MEASURING FAIRNESS IN SPEECH RECOGNITION: CASUAL CONVERSATIONS DATASET TRANSCRIPTIONS
1085Towards Practical and Efficient Long Video Summary
3199TOWARDS REDUCING THE NEED FOR SPEECH TRAINING DATA TO BUILD SPOKEN LANGUAGE UNDERSTANDING SYSTEMS
4669TOWARDS ROBUST SPEECH-TO-TEXT ADVERSARIAL ATTACK
4604Towards Robust Visual Transformer Networks via K-Sparse Attention
4940TOWARDS SPEAKER AGE ESTIMATION WITH LABEL DISTRIBUTION LEARNING
8843TOWARDS TRANSFERABLE SPEECH EMOTION REPRESENTATION: ON LOSS FUNCTIONS FOR CROSS-LINGUAL LATENT REPRESENTATIONS
3633Towards Using Clothes Style Transfer for Scenario-aware Person Video Generation
5586TPARN: Triple-path attentive recurrent network for time-domain multichannel speech enhancement
4803TP-VIT: A TWO-PATHWAY VISION TRANSFORMER FOR VIDEO ACTION RECOGNITION
4947TRACKING THE DIMENSIONS OF LATENT SPACES OF GAUSSIAN PROCESS LATENT VARIABLE MODELS
9323TRADE-OFFS IN DECENTRALIZED MULTI-ANTENNA ARCHITECTURES: THE WAX DECOMPOSITION
4881TRAINING PRIVACY-PRESERVING VIDEO ANALYTICS PIPELINES BY SUPPRESSING FEATURES THAT REVEAL INFORMATION ABOUT PRIVATE ATTRIBUTES
3465TRAINING ROBUST ZERO-SHOT VOICE CONVERSION MODELS WITH SELF-SUPERVISED FEATURES
5114TRAINING STABLE GRAPH NEURAL NETWORKS THROUGH CONSTRAINED LEARNING
4916TRAINING STRATEGIES FOR AUTOMATIC SONG WRITING: A UNIFIED FRAMEWORK PERSPECTIVE
5963Training Strategies For Improved Lip-reading
1630TRANSCRIBE-TO-DIARIZE: NEURAL SPEAKER DIARIZATION FOR UNLIMITED NUMBER OF SPEAKERS USING END-TO-END SPEAKER-ATTRIBUTED ASR
1891Transducer-Based Streaming Deliberation For Cascaded Encoders
3350TRANSDUCTIVE CLIP WITH CLASS-CONDITIONAL CONTRASTIVE LEARNING
3330Transformer-based Domain Adaptation for Event Data Classification
4211Transformer-Based Estimation of Spoken Sentences using Electrocorticography
4851TRANSFORMER-BASED MULTI-ASPECT MULTI-GRANULARITY NON-NATIVE ENGLISH SPEAKER PRONUNCIATION ASSESSMENT
3793TRANSFORMER-BASED PERSON SEARCH MODEL WITH SYMMETRIC ONLINE INSTANCE MATCHING
4656TRANSFORMER-BASED STREAMING ASR WITH CUMULATIVE ATTENTION
4168TRANSFORMER-S2A: ROBUST AND EFFICIENT SPEECH-TO-ANIMATION
3334TRANSIENT ANALYSIS OF CLUSTERED MULTITASK DIFFUSION RLS ALGORITHM
2236TRANSIENT DETECTION WITH UNKNOWN STATISTICS VIA SOURCE CODING
2488Transmit Beamforming with Fixed Covariance for Integrated MIMO Radar and Multiuser Communications
4231TranSTL: Spatial-Temporal Localization Transformer for Multi-Label Video Classification
1917TRIBYOL: TRIPLET BYOL FOR SELF-SUPERVISED REPRESENTATION LEARNING
9284Triply Complementary Priors for Image Restoration
1840T-SVD BASED BROADBAND NON-SYNCHRONOUS MEASUREMENTS
4014Tts4pretrain 2.0: Advancing the use of text and speech in ASR pretraining with consistency and contrastive losses
3485TUNET: A BLOCK-ONLINE BANDWIDTH EXTENSION MODEL BASED ON TRANSFORMERS AND SELF-SUPERVISED PRETRAINING
1347TURN-TO-DIARIZE: ONLINE SPEAKER DIARIZATION CONSTRAINED BY TRANSFORMER TRANSDUCER SPEAKER TURN DETECTION
3569TWO STRATEGIES TOWARD LIGHTWEIGHT IMAGE SUPER-RESOLUTION
3009TWO-PATH GMM-RESNET AND GMM-SENET FOR ASV SPOOFING DETECTION
4373Two-snapshot DOA Estimation via Hankel-structured Matrix Completion
7896TYPE-AWARE MEDICAL VISUAL QUESTION ANSWERING
3219UBILUNG: MULTI-MODAL PASSIVE-BASED LUNG HEALTH ASSESSMENT
9128UBIQUITOUS PHYSIOLOGICAL PREDICTION OF SUD PATIENTS’ WELLNESS STATE USING MEMORY-BASED CONVOLUTIONAL MODELS
1775UFORMER: A UNET BASED DILATED COMPLEX & REAL DUAL-PATH CONFORMER NETWORK FOR SIMULTANEOUS SPEECH ENHANCEMENT AND DEREVERBERATION
2278U-GAT-VC: Unsupervised Generative Attentional Networks for Non-parallel Voice Conversion
4527UNCERTAINTY ESTIMATION WITH A VAE-CLASSIFIER HYBRID MODEL
2587UNCERTAINTY IN DATA-DRIVEN KALMAN FILTERING FOR PARTIALLY KNOWN STATE-SPACE MODELS
9305Underdetermined Direction-of-Arrival Estimation Using Sparse Circular Arrays on a Rotating Platform
3632UNDERDETERMINED TWO-DIMENSIONAL LOCALIZATION FOR WIDEBAND SOURCES BASED ON DISTRIBUTED SENSOR ARRAY NETWORKS
3921UNDERWATER IMAGE ENHANCEMENT VIA LEARNING WATER TYPE DESENSITIZED REPRESENTATIONS
5020UNDERWATER SMALL TARGET DETECTION BASED ON DEFORMABLE CONVOLUTIONAL PYRAMID
2913UNDERWATER STEREO MATCHING VIA UNSUPERVISED APPEARANCE AND FEATURE ADAPTATION NETWORKS
1166UNET-TTS: IMPROVING UNSEEN SPEAKER AND STYLE TRANSFER IN ONE-SHOT VOICE CLONING
3566UNFOLDING MODEL-BASED BEAMFORMING FOR HIGH QUALITY ULTRASOUND IMAGING
5290UNIFIED MATRIX CODING FOR NN ORIGINATED MIP IN H.266/VVC
5796Unified Multimodal Punctuation Restoration Framework for Mixed-Modality Corpus
3290UNIFIED SPECULATION, DETECTION, AND VERIFICATION KEYWORD SPOTTING
8894UNIMODULAR WAVEFORM DESIGN WITH LOW CORRELATION LEVELS: A FAST ALGORITHM DEVELOPMENT TO SUPPORT LARGE-SCALE CODE LENGTHS
2884UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING
1379UNIVERSAL EFFICIENT VARIABLE-RATE NEURAL IMAGE COMPRESSION
2641UNIVERSAL PARALINGUISTIC SPEECH REPRESENTATIONS USING SELF-SUPERVISED CONFORMERS
9074UNLIMITED SAMPLING WITH LOCAL AVERAGES
4468UNLIMITED SAMPLING WITH SPARSE OUTLIERS: EXPERIMENTS WITH IMPULSIVE AND JUMP OR RESET NOISE
4319UNROLLING PARTICLES: UNSUPERVISED LEARNING OF SAMPLING DISTRIBUTIONS
2777UNSUPERVISED AND UNTRAINED UNDERWATER IMAGE RESTORATION BASED ON PHYSICAL IMAGE FORMATION MODEL
2451UNSUPERVISED ANOMALY DETECTION FOR CONTAINER CLOUD VIA BILSTM-BASED VARIATIONAL AUTO-ENCODER
3634UNSUPERVISED AUDIO-CAPTION ALIGNING LEARNS CORRESPONDENCES BETWEEN INDIVIDUAL SOUND EVENTS AND TEXTUAL PHRASES
5195UNSUPERVISED CLUSTERING AND ANALYSIS OF CONTRACTION-DEPENDENT FETAL HEART RATE SEGMENTS
8957UNSUPERVISED CONTRASTIVE HASHING FOR CROSS-MODAL RETRIEVAL IN REMOTE SENSING
6273UNSUPERVISED DATA SELECTION FOR SPEECH RECOGNITION WITH CONTRASTIVE LOSS RATIOS
4973UNSUPERVISED DEEP LEARNING NETWORK FOR DEFORMABLE FUNDUS IMAGE REGISTRATION
1503UNSUPERVISED HIERARCHICAL TRANSLATION-BASED MODEL FOR MULTI-MODAL MEDICAL IMAGE REGISTRATION
3475UNSUPERVISED MODEL ADAPTATION FOR END-TO-END ASR
4325UNSUPERVISED SPEECH ENHANCEMENT WITH SPEECH RECOGNITION EMBEDDING AND DISENTANGLEMENT LOSSES
3004UNSUPERVISED WORD-LEVEL PROSODY TAGGING FOR CONTROLLABLE SPEECH SYNTHESIS
4030Upmixing via style transfer: a variational autoencoder for disentangling spatial images and musical content
4706URBAN SOUND & SIGHT: DATASET AND BENCHMARK FOR AUDIO-VISUAL URBAN SCENE UNDERSTANDING
2041USER SCHEDULING USING GRAPH NEURAL NETWORKS FOR RECONFIGURABLE INTELLIGENT SURFACE ASSISTED MULTIUSER DOWNLINK COMMUNICATIONS
1561USING A SINGLE INPUT TO FORECAST HUMAN ACTION KEYSTATES IN EVERYDAY PICK AND PLACE ACTIONS
3925USING ACOUSTIC DEEP NEURAL NETWORK EMBEDDINGS TO DETECT MULTIPLE SCLEROSIS FROM SPEECH
5154USING MULTIPLE REFERENCE AUDIOS AND STYLE EMBEDDING CONSTRAINTS FOR SPEECH SYNTHESIS
5679USING SPECTRAL SEQUENCE-TO-SEQUENCE AUTOENCODERS TO ASSESS MILD COGNITIVE IMPAIRMENT
5434USTED: IMPROVING ASR WITH A UNIFIED SPEECH AND TEXT ENCODER-DECODER
4734VADOI: VOICE-ACTIVITY-DETECTION OVERLAPPING INFERENCE FOR END-TO-END LONG-FORM SPEECH RECOGNITION
2035VARARRAY: ARRAY-GEOMETRY-AGNOSTIC CONTINUOUS SPEECH SEPARATION
3015VARIABLE SPAN TRADE-OFF FILTER FOR SOUND ZONE CONTROL WITH KERNEL INTERPOLATION WEIGHTING
2667VARIANCE REDUCTION-BOOSTED BYZANTINE ROBUSTNESS IN DECENTRALIZED STOCHASTIC OPTIMIZATION
2896VarianceFlow: High-quality and Controllable Text-to-Speech Using Variance Information via Normalizing Flow
3568VARIATIONAL BAYESIAN FRAMEWORK FOR ADVANCED IMAGE GENERATION WITH DOMAIN-RELATED VARIABLES
5716VARIATIONAL BAYESIAN GRAPH CONVOLUTIONAL NETWORK FOR ROBUST COLLABORATIVE FILTERING
1649VARIATIONAL BAYESIAN TENSOR NETWORKS WITH STRUCTURED POSTERIORS
2882VCD: VIEW-CONSTRAINT DISENTANGLEMENT FOR ACTION RECOGNITION
9142VCVTS: MULTI-SPEAKER VIDEO-TO-SPEECH SYNTHESIS VIA CROSS-MODAL KNOWLEDGE TRANSFER FROM VOICE CONVERSION
2569VIDEO ANOMALY DETECTION VIA PREDICTION NETWORK WITH ENHANCED SPATIO-TEMPORAL MEMORY EXCHANGE
1922VIDEO FRAME INTERPOLATION VIA LOCAL LIGHTWEIGHT BIDIRECTIONAL ENCODING WITH CHANNEL ATTENTION CASCADE
5171VIOLINIST IDENTIFICATION USING NOTE-LEVEL TIMBRE FEATURE DISTRIBUTIONS
3963VISINGER: VARIATIONAL INFERENCE WITH ADVERSARIAL LEARNING FOR END-TO-END SINGING VOICE SYNTHESIS
1683VISION TRANSFORMER EQUIPPED WITH NEURAL RESIZER ON FACIAL EXPRESSION RECOGNITION TASK
5683VISION TRANSFORMER-BASED RETINA VESSEL SEGMENTATION WITH DEEP ADAPTIVE GAMMA CORRECTION
4393VISUAL REPRESENTATION LEARNING WITH SELF-SUPERVISED ATTENTION FOR LOW-LABEL HIGH-DATA REGIME
4197VISUALTTS: TTS WITH ACCURATE LIP-SPEECH SYNCHRONIZATION FOR AUTOMATIC VOICE OVER
4584VOCALSOUND: A DATASET FOR IMPROVING HUMAN VOCAL SOUNDS RECOGNITION
5297VOCBENCH: A NEURAL VOCODER BENCHMARK FOR SPEECH SYNTHESIS
4428VOICE FILTER: FEW-SHOT TEXT-TO-SPEECH SPEAKER ADAPTATION USING VOICE CONVERSION AS A POST-PROCESSING MODULE
3088VQA-BC: ROBUST VISUAL QUESTION ANSWERING VIA BIDIRECTIONAL CHAINING
8800VR-FAM: VARIANCE-REDUCED ENCODER WITH NONLINEAR TRANSFORMATION FOR FACIAL ATTRIBUTE MANIPULATION
1052VSEGAN: VISUAL SPEECH ENHANCEMENT GENERATIVE ADVERSARIAL NETWORK
5318VU-BERT: A UNIFIED FRAMEWORK FOR VISUAL DIALOG
7434W-ART: ACTION RELATION TRANSFORMER FOR WEAKLY-SUPERVISED TEMPORAL ACTION LOCALIZATION
5082Wasserstein Cross-lingual Alignment for Named Entity Recognition
1209WASSERTRAIN: AN ADVERSARIAL TRAINING FRAMEWORK AGAINST WASSERSTEIN ADVERSARIAL ATTACKS
3205WATERMARKING IMAGES IN SELF-SUPERVISED LATENT SPACES
2600WAV2CLIP: LEARNING ROBUST AUDIO REPRESENTATIONS FROM CLIP
3222WAV2VEC-SWITCH: CONTRASTIVE LEARNING FROM ORIGINAL-NOISY SPEECH PAIRS FOR ROBUST SPEECH RECOGNITION
9291WAVE DIGITAL MODELING AND IMPLEMENTATION OF NONLINEAR AUDIO CIRCUITS WITH NULLORS
9060WAVEBENDER GAN: AN ARCHITECTURE FOR PHONETICALLY MEANINGFUL SPEECH MANIPULATION
4478WAVE-DOMAIN APPROACH FOR CANCELLING NOISE ENTERING OPEN WINDOWS
1195WAVEFORM OPTIMIZATION FOR WIRELESS POWER TRANSFER WITH POWER AMPLIFIER AND ENERGY HARVESTER NON-LINEARITIES
2597WAVELET-BASED UNSUPERVISED LABEL-TO-IMAGE TRANSLATION
3659WEAK TARGET DETECTION IN MASSIVE MIMO RADAR VIA AN IMPROVED REINFORCEMENT LEARNING APPROACH
1643Weakly Supervised Point Cloud Upsampling via Optimal Transport
6607WEARABLE SELD DATASET: DATASET FOR SOUND EVENT LOCALIZATION AND DETECTION USING WEARABLE DEVICES AROUND HEAD
3847WEIGHTED GRAPH EMBEDDED LOW-RANK PROJECTION LEARNING FOR FEATURE EXTRACTION
1115WEIGHTED WAVELET-BASED SPECTRAL-SPATIAL TRANSFORMS FOR CFA-SAMPLED RAW CAMERA IMAGE COMPRESSION CONSIDERING IMAGE FEATURES
5156WENETSPEECH: A 10000+ HOURS MULTI-DOMAIN MANDARIN CORPUS FOR SPEECH RECOGNITION
2478What is the Patient Looking at? Robust Gaze-Scene Intersection under free-viewing conditions
1523When BERT Meets Quantum Temporal Convolution Learning for Text Classification in Heterogeneous Computing
5338WHEN DOES BACKDOOR ATTACK SUCCEED IN IMAGE RECONSTRUCTION? A STUDY OF HEURISTICS VS. BI-LEVEL SOLUTION
3476WIDE-SENSE STATIONARITY AND SPECTRAL ESTIMATION FOR GENERALIZED GRAPH SIGNAL
4627wikiTAG: Wikipedia-based knowledge embeddings towards improved acoustic event classification
4608Win the Lottery Ticket via Fourier Analysis: Frequencies Guided Network Pruning
8833WISHART LOCALIZATION PRIOR ON SPATIAL COVARIANCE MATRIX IN AMBISONIC SOURCE SEPARATION USING NON-NEGATIVE TENSOR FACTORIZATION
1436WLINKER: MODELING RELATIONAL TRIPLET EXTRACTION AS WORD LINKING
4375WLS DESIGN OF ARMA GRAPH FILTERS USING ITERATIVE SECOND-ORDER CONE PROGRAMMING
9159WORD ORDER DOES NOT MATTER FOR SPEECH RECOGNITION
4451WORDMARKOV: A NEW PASSWORD PROBABILITY MODEL OF SEMANTICS
2009Zero-shot Cross-lingual Transfer using multi-stream encoder and efficient speaker representation
8847ZEROTH-ORDER RANDOMIZED SUBSPACE NEWTON METHODS