List of Accepted Papers
The following is the list of accepted ICASSP 2022 papers, sorted by paper title. You can use the search feature of your web browser to find your paper number. Notifications have also been sent to all authors by email. If you have not received your notification by email, please contact us at papers@2022.ieeeicassp.org.
Paper Number | Paper Title |
---|---|
1929 | 3D CROSS-SCALE FEATURE TRANSFORMER NETWORK FOR BRAIN MR IMAGE SUPER-RESOLUTION |
4429 | 3D TEXTURE SUPER RESOLUTION VIA THE RENDERING LOSS |
4597 | 4D CONVOLUTIONAL NEURAL NETWORKS FOR MULTI-SPECTRAL AND MULTI-TEMPORAL REMOTE SENSING DATA CLASSIFICATION |
4012 | A Bayesian Permutation training deep representation learning method for speech enhancement with variational autoencoder |
8962 | A BENCHMARK OF STATE-OF-THE-ART SOUND EVENT DETECTION SYSTEMS EVALUATED ON SYNTHETIC SOUNDSCAPES |
3369 | A BERT based Joint Learning Model with Feature Gated Mechanism for Spoken Language Understanding |
4579 | A BRIDGE BETWEEN FEATURES AND EVIDENCE FOR BINARY ATTRIBUTE-DRIVEN PERFECT PRIVACY |
5995 | A Byzantine-resilient Dual Subgradient Method for Vertical Federated Learning |
6464 | A CHANNEL ATTENTION BASED MLP-MIXER NETWORK FOR MOTOR IMAGERY DECODING WITH EEG |
3372 | A CHARACTER-LEVEL SPAN-BASED MODEL FOR MANDARIN PROSODIC STRUCTURE PREDICTION |
8736 | A CLOSER LOOK AT AUTOENCODERS FOR UNSUPERVISED ANOMALY DETECTION |
8757 | A CLUSTERING-BASED ML SCHEME FOR CAPACITY APPROACHING SOFT LEVEL SENSING IN 3D TLC NAND |
9143 | A COMMONSENSE KNOWLEDGE ENHANCED NETWORK WITH RETROSPECTIVE LOSS FOR EMOTION RECOGNITION IN SPOKEN DIALOG |
8790 | A COMMUNICATION EFFICIENT QUASI-NEWTON METHOD FOR LARGE-SCALE DISTRIBUTED MULTI-AGENT OPTIMIZATION |
4532 | A COMPARISON OF DISCRETE AND SOFT SPEECH UNITS FOR IMPROVED VOICE CONVERSION |
4193 | A COMPLEX SPECTRAL MAPPING WITH INPLACE CONVOLUTION RECURRENT NEURAL NETWORKS FOR ACOUSTIC ECHO CANCELLATION |
2525 | A Configurable Multilingual Model is All You Need to Recognize All Languages |
4411 | A CONVEX FORMULATION FOR THE ROBUST ESTIMATION OF MULTIVARIATE EXPONENTIAL POWER MODELS |
1164 | A CRLB ANALYSIS OF AOA ESTIMATION USING BLUETOOTH 5 |
4653 | A DATA-DRIVEN APPROACH FOR ACOUSTIC PARAMETER SIMILARITY ESTIMATION OF SPEECH RECORDING |
1857 | A DATA-DRIVEN COGNITIVE SALIENCE MODEL FOR OBJECTIVE PERCEPTUAL AUDIO QUALITY ASSESSMENT |
4546 | A DATA-DRIVEN QUANTIZATION DESIGN FOR DISTRIBUTED TESTING AGAINST INDEPENDENCE WITH COMMUNICATION CONSTRAINTS |
4253 | A DIFFERENTIABLE OPTIMISATION FRAMEWORK FOR THE DESIGN OF INDIVIDUALISED DNN-BASED HEARING-AID STRATEGIES |
1087 | A DILATED RESIDUAL VISION TRANSFORMER FOR ATRIAL FIBRILLATION DETECTION FROM STACKED TIME-FREQUENCY ECG REPRESENTATIONS |
4068 | A DNN BASED POST-FILTER TO ENHANCE THE QUALITY OF CODED SPEECH IN MDCT DOMAIN |
8853 | A domain transfer based data augmentation method for automated respiratory classification |
4665 | A DYNAMIC REWEIGHTING STRATEGY FOR FAIR FEDERATED LEARNING |
4313 | A FAST AND EFFICIENT NETWORK FOR SINGLE IMAGE SHADOW DETECTION |
2491 | A Few-sample Strategy for Guitar Tablature Transcription Based on Inharmonicity Analysis and Playability Constraints |
8818 | A FRAME LOSS OF MULTIPLE INSTANCE LEARNING FOR WEAKLY SUPERVISED SOUND EVENT DETECTION |
5035 | A Framework for Private Communication with Secret Block Structure |
2386 | A FREE LUNCH FROM VIT: ADAPTIVE ATTENTION MULTI-SCALE FUSION TRANSFORMER FOR FINE-GRAINED VISUAL RECOGNITION |
3033 | A Gaussian Mixture Model for Dialogue Generation with Dynamic Parameter Sharing Strategy |
9279 | A GENERAL FRAMEWORK FOR DISTRIBUTED INFERENCE WITH UNCERTAIN MODELS |
1055 | A GENERAL FRAMEWORK FOR INCOMPLETE CROSS-MODAL RETRIEVAL WITH MISSING LABELS AND MISSING MODALITIES |
1733 | A GENERALIZED HIERARCHICAL NONNEGATIVE TENSOR DECOMPOSITION |
1579 | A Generalized Kernel Risk Sensitive Loss for Robust Two-dimensional Singular Value Decomposition |
3613 | A GENERIC METHOD TO ESTIMATE CAMERA EXTRINSIC PARAMETERS |
4860 | A glance-and-gaze network for respiratory sound classification |
9225 | A Global to Local Guiding Network for Missing Data Imputation |
3000 | A GRAPH ATTENTION INTERACTIVE REFINE FRAMEWORK WITH CONTEXTUAL REGULARIZATION FOR JOINTING INTENT DETECTION AND SLOT FILLING |
3430 | A HYBRID APPROACH TO COMBINE WIRELESS AND EARCUP MICROPHONES FOR ANC HEADPHONES WITH ERROR SEPARATION MODULE |
6023 | A HYBRID LEARNING FRAMEWORK FOR DEEP SPIKING NEURAL NETWORKS WITH ONE-SPIKE TEMPORAL CODING |
1847 | A KNOWLEDGE/DATA ENHANCED METHOD FOR JOINT EVENT AND TEMPORAL RELATION EXTRACTION |
1831 | A LIGHT WEIGHT MODEL FOR VIDEO SHOT OCCLUSION DETECTION |
4718 | A LIGHTWEIGHT INSTRUMENT-AGNOSTIC MODEL FOR POLYPHONIC NOTE TRANSCRIPTION AND MULTIPITCH ESTIMATION |
2302 | A LIGHTWEIGHT SELF-SUPERVISED TRAINING FRAMEWORK FOR MONOCULAR DEPTH ESTIMATION |
4710 | A likelihood ratio based domain adaptation method for E2E models |
3087 | A LOW-PARAMETRIC MODEL FOR BIT-RATE ESTIMATION OF VVC RESIDUAL CODING |
4292 | A Maximal Correlation Approach to Imposing Fairness in Machine Learning |
5377 | A MELODY-UNSUPERVISION MODEL FOR SINGING VOICE SYNTHESIS |
4919 | A METHOD FOR DETECTING CORONARY ARTERY DISEASE USING NOISY ULTRASHORT ELECTROCARDIOGRAM RECORDINGS |
2280 | A METHOD FOR ESTIMATING THE GROUPING OF PARTICIPANTS IN CLASSROOM GROUP WORK USING ONLY AUDIO INFORMATION |
3236 | A METHOD TO REVEAL SPEAKER IDENTITY IN DISTRIBUTED ASR TRAINING, AND HOW TO COUNTER IT |
8754 | A MINIMALLY SUPERVISED APPROACH FOR MEDICAL IMAGE QUALITY ASSESSMENT IN DOMAIN SHIFT SETTINGS |
5891 | A MODEL FOR ASSESSOR BIAS IN AUTOMATIC PRONUNCIATION ASSESSMENT |
1800 | A MULTI DOMAIN KNOWLEDGE ENHANCED MATCHING NETWORK FOR RESPONSE SELECTION IN RETRIEVAL-BASED DIALOGUE SYSTEMS |
2301 | A MULTI-RESOLUTION LOW-RANK TENSOR DECOMPOSITION |
2765 | A MULTISCALE GRADIENT-BACKPROPAGATION OPTIMIZATION FRAMEWORK FOR DEFORMABLE CONVOLUTION BASED COMPRESSED VIDEO ENHANCEMENT |
2255 | A MULTI-TASK LEARNING FRAMEWORK FOR CHINESE MEDICAL PROCEDURE ENTITY NORMALIZATION |
2198 | A MULTITASK LEARNING FRAMEWORK FOR SPEAKER CHANGE DETECTION WITH CONTENT INFORMATION FROM UNSUPERVISED SPEECH DECOMPOSITION |
4053 | A MULTI-TASK LEARNING METHOD FOR WEAKLY SUPERVISED SOUND EVENT DETECTION |
2105 | A MUTUAL LEARNING FRAMEWORK FOR FEW-SHOT SOUND EVENT DETECTION |
1475 | A NEURAL NETWORK-BASED HOWLING DETECTION METHOD FOR REAL-TIME COMMUNICATION APPLICATIONS |
2636 | A NEURAL PROSODY ENCODER FOR END-TO-END DIALOGUE ACT CLASSIFICATION |
1028 | A NEW COPRIME-ARRAY-BASED CONFIGURATION WITH AUGMENTED DEGREES OF FREEDOM AND REDUCED MUTUAL COUPLING |
3518 | A NEW DATA AUGMENTATION METHOD FOR INTENT CLASSIFICATION ENHANCEMENT AND ITS APPLICATION ON SPOKEN CONVERSATION DATASETS |
5733 | A NEW DEEP LEARNING METHOD FOR MULTISPECTRAL IMAGE TIME SERIES COMPLETION USING HYPERSPECTRAL DATA |
2305 | A NEW FRAMEWORK FOR MULTIPLE DEEP CORRELATION FILTERS BASED OBJECT TRACKING |
1930 | A NOISE-ROBUST SELF-SUPERVISED PRE-TRAINING MODEL BASED SPEECH REPRESENTATION LEARNING FOR AUTOMATIC SPEECH RECOGNITION |
3967 | A NON-CONVEX PROXIMAL APPROACH FOR CENTROID-BASED CLASSIFICATION |
1255 | A NON-HIERARCHICAL ATTENTION NETWORK WITH MODALITY DROPOUT FOR TEXTUAL RESPONSE GENERATION IN MULTIMODAL DIALOGUE SYSTEMS |
3525 | A NONLINEAR STEERABLE COMPLEX WAVELET DECOMPOSITION OF IMAGES |
4336 | A NOTE ON TOTALLY SYMMETRIC EQUI-ISOCLINIC TIGHT FUSION FRAMES |
6126 | A NOVEL 1D STATE SPACE FOR EFFICIENT MUSIC RHYTHMIC ANALYSIS |
3734 | A NOVEL ANGULAR ESTIMATION METHOD IN THE PRESENCE OF NONUNIFORM NOISE |
3267 | A NOVEL CONVOLUTIONAL NEURAL NETWORK BASED ON ADAPTIVE MULTI-SCALE AGGREGATION AND BOUNDARY-AWARE FOR LATERAL VENTRICLE SEGMENTATION ON MR IMAGES |
2051 | A NOVEL LIGHTWEIGHT NETWORK FOR FAST MONOCULAR DEPTH ESTIMATION |
1645 | A NOVEL MICRO-EXPRESSION RECOGNITION APPROACH USING ATTENTION-BASED MAGNIFICATION-ADAPTIVE NETWORKS |
1252 | A NOVEL NEGATIVE L1 PENALTY APPROACH FOR MULTIUSER ONE-BIT MASSIVE MIMO DOWNLINK WITH PSK SIGNALING |
2573 | A NOVEL PART FEATURE INTEGRATION AND FUSION METHOD FOR FINE-GRAINED VEHICLE RECOGNITION |
6301 | A NOVEL SEQUENTIAL MONTE CARLO FRAMEWORK FOR PREDICTING AMBIGUOUS EMOTION STATES |
1470 | A NOVEL UNSUPERVISED AUTOENCODER-BASED HFOS DETECTOR IN INTRACRANIAL EEG SIGNALS |
6378 | A PERFORMANCE ANALYSIS FOR MULTI-RIS-ASSISTED FULL DUPLEX WIRELESS COMMUNICATION SYSTEM |
5669 | A PRE-TRAINED AUDIO-VISUAL TRANSFORMER FOR EMOTION RECOGNITION |
5237 | A PRIORI SNR ESTIMATION FOR SPEECH ENHANCEMENT BASED ON PESQ-INDUCED REINFORCEMENT LEARNING |
3175 | A QUESTION-ORIENTED PROPAGATION NETWORK FOR NEWS READING COMPREHENSION |
5298 | A REMEDY FOR DISTRIBUTIONAL SHIFTS THROUGH EXPECTED DOMAIN TRANSLATION |
4180 | A ROBUST CONTRASTIVE ALIGNMENT METHOD FOR MULTI-DOMAIN TEXT CLASSIFICATION |
2490 | A ROBUST DEEP AUDIO SPLICING DETECTION METHOD VIA SINGULARITY DETECTION FEATURE |
3690 | A ROBUST OBJECT SEGMENTATION NETWORK FOR UNDERWATER SCENES |
1949 | A SELF-SUPERVISED PRE-TRAINING FRAMEWORK FOR VISION-BASED SEIZURE CLASSIFICATION |
1506 | A SEMI-HANDCRAFTED KEYPOINT DETECTOR WITH DISCRIMINATIVE FEATURE ENCODING |
2990 | A set-theoretic approach to MIMO detection |
5146 | A SIMPLE FORMULA FOR THE MOMENTS OF UNITARILY INVARIANT MATRIX DISTRIBUTIONS |
3408 | A SIMPLE GRAPH NEURAL NETWORK VIA LAYER SNIFFER |
4045 | A SIMPLE HYBRID FILTER PRUNING FOR EFFICIENT EDGE INFERENCE |
4112 | A SLIDE-SAVE BASED FRAMEWORK FOR MULTI-SOURCE DOA EXTRACTION WITH SPATIAL CLOSELY SEPARATED SOURCES |
2416 | A STIMULI-RELEVANT DIRECTED DEPENDENCY INDEX FOR TIME SERIES |
5143 | A STUDY OF DESIGNING COMPACT AUDIO-VISUAL WAKE WORD SPOTTING SYSTEM BASED ON ITERATIVE FINE-TUNING IN NEURAL NETWORK PRUNING |
5429 | A STUDY OF THE ROBUSTNESS OF RAW WAVEFORM BASED SPEAKER EMBEDDINGS UNDER MISMATCHED CONDITIONS |
2962 | A study on the efficacy of model pre-training in developing neural text-to-speech system |
2308 | A STYLE TRANSFER MAPPING AND FINE-TUNING SUBJECT TRANSFER FRAMEWORK USING CONVOLUTIONAL NEURAL NETWORKS FOR SURFACE ELECTROMYOGRAM PATTERN RECOGNITION |
2855 | A TEST FOR CONDITIONAL CORRELATION BETWEEN RANDOM VECTORS BASED ON WEIGHTED U-STATISTICS |
2354 | A TIME DOMAIN PROGRESSIVE LEARNING APPROACH WITH SNR CONSTRICTION FOR SINGLE-CHANNEL SPEECH ENHANCEMENT AND RECOGNITION |
6234 | A TIME ENCODING APPROACH TO TRAINING SPIKING NEURAL NETWORKS |
5618 | A TRAINABLE BOUNDED DENOISER USING DOUBLE TIGHT FRAME NETWORK FOR SNAPSHOT COMPRESSIVE IMAGING |
4816 | A TRAINING FRAMEWORK FOR STEREO-AWARE SPEECH ENHANCEMENT USING DEEP NEURAL NETWORKS |
3260 | A TRANSFER LEARNING APPROACH FOR PRONUNCIATION SCORING |
3732 | A TWO-STAGE CONTRASTIVE LEARNING FRAMEWORK FOR IMBALANCED AERIAL SCENE RECOGNITION |
1116 | A TWO-STAGE U-NET FOR HIGH-FIDELITY DENOISING OF HISTORICAL RECORDINGS |
4081 | A TWO-STEP APPROACH TO LEVERAGE CONTEXTUAL DATA: SPEECH RECOGNITION IN AIR-TRAFFIC COMMUNICATION |
5800 | A TWO-STEP BACKWARD COMPATIBLE FULLBAND SPEECH ENHANCEMENT SYSTEM |
2608 | A TWO-STREAM INFORMATION FUSION APPROACH TO ABNORMAL EVENT DETECTION IN VIDEO |
9165 | A unified two-stage model for separating superimposed images |
1693 | A UNIVERSAL ORDINAL REGRESSION FOR ASSESSING PHONEME-LEVEL PRONUNCIATION |
2182 | A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer |
4370 | A WAVELET-BASED DUAL-STREAM NETWORK FOR UNDERWATER IMAGE ENHANCEMENT |
2372 | AASIST: AUDIO ANTI-SPOOFING USING INTEGRATED SPECTRO-TEMPORAL GRAPH ATTENTION NETWORKS |
4195 | ACCELERATED INTRAVASCULAR ULTRASOUND IMAGING USING DEEP REINFORCEMENT LEARNING |
4297 | ACCELERATING ILL-CONDITIONED ROBUST LOW-RANK TENSOR REGRESSION |
5867 | ACCESS CONTROL FOR PRIVACY-PRESERVING GAUSSIAN PROCESS REGRESSION |
5360 | ACCURATE AND RESOURCE-EFFICIENT LIPREADING WITH EFFICIENTNETV2 AND TRANSFORMERS |
1029 | ACCURATE INSTANCE SEGMENTATION VIA COLLABORATIVE LEARNING |
5620 | Accurate Multiscale Selective Fusion of CT and Video Images for Real-Time Endoscopic Camera 3D Tracking in Robotic Surgery |
5149 | ACOUSTIC APPLICATION OF PHASE RECONSTRUCTION ALGORITHMS IN OPTICS |
1473 | ACOUSTIC COMPARISON OF PHYSICAL VOCAL TRACT MODELS WITH HARD AND SOFT WALLS |
4456 | ACOUSTIC IMAGING ABOARD THE INTERNATIONAL SPACE STATION (ISS): CHALLENGES AND PRELIMINARY RESULTS |
3558 | ACOUSTIC-TO-ARTICULATORY INVERSION BASED ON SPEECH DECOMPOSITION AND AUXILIARY FEATURE |
3746 | ACP: ADAPTIVE CHANNEL PRUNING FOR EFFICIENT NEURAL NETWORKS |
8560 | Ada-JSR: SAMPLE EFFICIENT ADAPTIVE JOINT SUPPORT RECOVERY FROM EXTREMELY COMPRESSED MEASUREMENT VECTORS |
3037 | AdaPID: An Adaptive PID Optimizer for Training Deep Neural Networks |
4129 | ADAPTING SPEECH SEPARATION TO REAL-WORLD MEETINGS USING MIXTURE INVARIANT TRAINING |
1162 | Adaptive Actor-Critic Bilateral Filter |
3170 | ADAPTIVE ATTENTION GRAPH CAPSULE NETWORK |
2870 | ADAPTIVE DIFFUSION WITH COMPRESSED COMMUNICATION |
8878 | ADAPTIVE DISCOUNTING OF IMPLICIT LANGUAGE MODELS IN RNN-TRANSDUCERS |
3157 | ADAPTIVE GROUP TESTING WITH MISMATCHED MODELS |
1097 | ADAPTIVE IDENTIFICATION OF UNDERWATER ACOUSTIC CHANNEL WITH A MIX OF STATIC AND TIME-VARYING PARAMETERS |
2013 | ADAPTIVE INTRA-GROUP AGGREGATION FOR CO-SALIENCY DETECTION |
1342 | ADAPTIVE MATCHING STRATEGY FOR MULTI-TARGET MULTI-CAMERA TRACKING |
4405 | ADAPTIVE NODE PARTICIPATION FOR STRAGGLER-RESILIENT FEDERATED LEARNING |
2197 | Adaptive Pseudo Labeling for Source-Free Domain Adaptation in Medical Image Segmentation |
9319 | Adaptive Rank Selection for Tensor Ring Decomposition |
5736 | ADAPTIVE VARIATIONAL NONLINEAR CHIRP MODE DECOMPOSITION |
1799 | ADAPTIVE WEIGHTED NETWORK WITH EDGE ENHANCEMENT MODULE FOR MONOCULAR SELF-SUPERVISED DEPTH ESTIMATION |
5552 | ADAPTIVE WIRELESS POWER ALLOCATION WITH GRAPH NEURAL NETWORKS |
4804 | ADA-STNET: A DYNAMIC ADABOOST SPATIO-TEMPORAL NETWORK FOR TRAFFIC FLOW PREDICTION |
2889 | ADA-VAD: UNPAIRED ADVERSARIAL DOMAIN ADAPTATION FOR NOISE-ROBUST VOICE ACTIVITY DETECTION |
1700 | ADDERIC: TOWARDS LOW COMPUTATION COST IMAGE COMPRESSION |
3941 | ADIMA: ABUSE DETECTION IN MULTILINGUAL AUDIO |
8803 | Adjacency Pairs-Aware Hierarchical Attention Networks for Dialogue Intent Classification |
4154 | ADMM-DAD NET: A DEEP UNFOLDING NETWORK FOR ANALYSIS COMPRESSED SENSING |
4989 | ADT: ANTI-DEEPFAKE TRANSFORMER |
2666 | ADVANCING MOMENTUM PSEUDO-LABELING WITH CONFORMER AND INITIALIZATION STRATEGY |
8031 | ADVERFACIAL: PRIVACY-PRESERVING UNIVERSAL ADVERSARIAL PERTURBATION AGAINST FACIAL MICRO-EXPRESSION LEAKAGES |
5617 | ADVERSARIAL AUDIO SYNTHESIS USING A HARMONIC-PERCUSSIVE DISCRIMINATOR |
4794 | ADVERSARIAL EXAMPLES DETECTION BASED ON ERROR LEVEL ANALYSIS AND SPACE MAPPING |
2495 | ADVERSARIAL EXAMPLES FOR IMAGE CROPPING IN SOCIAL MEDIA |
4923 | ADVERSARIAL INPUT ABLATION FOR AUDIO-VISUAL LEARNING |
1250 | ADVERSARIAL LEARNING ENHANCEMENT FOR 3D HUMAN POSE AND SHAPE ESTIMATION |
3077 | ADVERSARIAL LEARNING IN TRANSFORMER BASED NEURAL NETWORK IN RADIO SIGNAL CLASSIFICATION |
7814 | ADVERSARIAL LINEAR QUADRATIC REGULATOR UNDER FALSIFIED ACTIONS |
3301 | ADVERSARIAL MASK TRANSFORMER FOR SEQUENTIAL LEARNING |
1600 | ADVERSARIAL ROBUSTNESS BY DESIGN THROUGH ANALOG COMPUTING AND SYNTHETIC GRADIENTS |
3316 | Adversarial sample detection for speaker verification by neural vocoders |
9296 | Adversarially-Trained Nonnegative Matrix Factorization |
2258 | ADVERSARY DISTILLATION FOR ONE-SHOT ATTACKS ON 3D TARGET TRACKING |
5063 | ADVERSPARSE: AN ADVERSARIAL ATTACK FRAMEWORK FOR DEEP SPATIAL-TEMPORAL GRAPH NEURAL NETWORKS |
4830 | ADVIN: AUTOMATICALLY DISCOVERING NOVEL DOMAINS AND INTENTS FROM USER TEXT UTTERANCES |
3758 | AECMOS: A SPEECH QUALITY ASSESSMENT METRIC FOR ECHO IMPAIRMENT |
1269 | AERIAL BASE STATION PLACEMENT LEVERAGING RADIO TOMOGRAPHIC MAPS |
2336 | AGCYCLEGAN: ATTENTION-GUIDED CYCLEGAN FOR SINGLE UNDERWATER IMAGE RESTORATION |
2635 | AIMNET: ADAPTIVE IMAGE-TAG MERGING NETWORK FOR AUTOMATIC MEDICAL REPORT GENERATION |
1725 | Airborne MIMO Radar Transmit-Receive Design Under Spectral Constraint in Signal-Dependent Clutter |
5090 | AISHELL-NER: NAMED ENTITY RECOGNITION FROM CHINESE SPEECH |
1825 | ALARM SOUND DETECTION USING TOPOLOGICAL SIGNAL PROCESSING |
5537 | Alignment-Learning based single-step decoding for accurate and fast non-autoregressive speech recognition |
6330 | Alleviating the Loss-Metric Mismatch in Supervised Single-Channel Speech Enhancement |
2036 | ALL-NEURAL BEAMFORMER FOR CONTINUOUS SPEECH SEPARATION |
8762 | ALSNET: A DILATED 1-D CNN FOR IDENTIFYING ALS FROM RAW EMG SIGNAL |
8938 | AMBIGUITY MODELLING WITH LABEL DISTRIBUTION LEARNING FOR MUSIC CLASSIFICATION |
1619 | AMICABLE EXAMPLES FOR INFORMED SOURCE SEPARATION |
3341 | AN ACCELERATED RANK-(L,L,1,1) BLOCK TERM DECOMPOSITION OF MULTI-SUBJECT FMRI DATA UNDER SPATIAL ORTHONORMALITY CONSTRAINT |
3766 | AN ADAPTER BASED PRE-TRAINING FOR EFFICIENT AND SCALABLE SELF-SUPERVISED SPEECH REPRESENTATION LEARNING |
9257 | AN ADAPTIVE ALL-PASS FILTER FOR TIME-VARYING DELAY ESTIMATION |
3529 | AN ADAPTIVE ORIENTATIONAL BEAMFORMING TECHNIQUE FOR NARROWBAND INTERFERENCE REJECTION |
8976 | An Anomaly Detection Method Based on Self-supervised Learning With Soft Label Assignment for Defect Visual Inspection |
3965 | AN APPROACH TO MISPRONUNCIATION DETECTION AND DIAGNOSIS WITH ACOUSTIC, PHONETIC AND LINGUISTIC (APL) EMBEDDINGS |
3795 | AN ASYMPTOTICALLY OPTIMAL APPROXIMATION OF THE CONDITIONAL MEAN CHANNEL ESTIMATOR BASED ON GAUSSIAN MIXTURE MODELS |
3343 | AN AUDIO-SALIENCY MASKING TRANSFORMER FOR AUDIO EMOTION CLASSIFICATION IN MOVIES |
3431 | An effective steganalysis for robust steganography with repetitive JPEG compression |
1384 | AN EFFICIENT DP-SGD MECHANISM FOR LARGE SCALE NLU MODELS |
2067 | An Efficient Framework for Detection and Recognition of Numerical Traffic Signs |
3116 | AN EFFICIENT METHOD FOR GENERIC DSP IMPLEMENTATION OF DILATED CONVOLUTION |
4210 | AN EFFICIENT METHOD FOR MODEL PRUNING USING KNOWLEDGE DISTILLATION WITH FEW SAMPLES |
1469 | An Embarrassingly Simple Model for Dialogue Relation Extraction |
4096 | AN END-TO-END CHINESE TEXT NORMALIZATION MODEL BASED ON RULE-GUIDED FLAT-LATTICE TRANSFORMER |
4715 | AN END-TO-END DEEP LEARNING FRAMEWORK FOR MULTIPLE AUDIO SOURCE SEPARATION AND LOCALIZATION |
6604 | AN END-TO-END DEEP LEARNING SPEECH CODING AND DENOISING STRATEGY FOR COCHLEAR IMPLANTS |
4222 | AN ENHANCED DEEP LEARNING APPROACH FOR TECTONIC FAULT AND FRACTURE EXTRACTION IN VERY HIGH RESOLUTION OPTICAL IMAGES |
5421 | AN ERROR CORRECTION SCHEME FOR IMPROVED AIR-TISSUE BOUNDARY IN REAL-TIME MRI VIDEO FOR SPEECH PRODUCTION |
4905 | An Experimental Study on Transferring Data-driven Image Compressive Sensing to Bioelectric Signal |
5048 | AN EXPLORATION OF HUBERT WITH LARGE NUMBER OF CLUSTER UNITS AND MODEL ASSESSMENT USING BAYESIAN INFORMATION CRITERION |
9130 | AN IMPLICIT GRADIENT-TYPE METHOD FOR LINEARLY CONSTRAINED BILEVEL PROBLEMS |
2257 | AN INFORMATION MAXIMIZATION BASED BLIND SOURCE SEPARATION APPROACH FOR DEPENDENT AND INDEPENDENT SOURCES |
4942 | AN INVESTIGATION OF STREAMING NON-AUTOREGRESSIVE SEQUENCE-TO-SEQUENCE VOICE CONVERSION |
4236 | AN INVESTIGATION OF THE EFFECTIVENESS OF PHASE FOR AUDIO CLASSIFICATION |
2678 | AN ONLINE THROUGHPUT MAXIMIZATION ALGORITHM FOR GREEN COORDINATED MULTI-POINT SYSTEMS |
2887 | AN OVERVIEW OF THE FIRST ICASSP SPECIAL SESSION ON COMPUTER AUDITION FOR HEALTHCARE |
4366 | ANALYZING THE ROBUSTNESS OF UNSUPERVISED SPEECH RECOGNITION |
7179 | ANNIHILATION FILTER APPROACH FOR ESTIMATING GRAPH DYNAMICS FROM DIFFUSION PROCESSES |
2025 | ANNO-MI: A DATASET OF EXPERT-ANNOTATED COUNSELLING DIALOGUES |
4773 | ANOMALOUS SOUND DETECTION USING SPECTRAL-TEMPORAL INFORMATION FUSION |
1735 | A-PIXELHOP: A GREEN, ROBUST AND EXPLAINABLE FAKE-IMAGE DETECTOR |
5147 | APPLADE: ADJUSTABLE PLUG-AND-PLAY AUDIO DECLIPPER COMBINING DNN WITH SPARSE OPTIMIZATION |
3960 | APPLYING DEEP LEARNING TO KNOWN-PLAINTEXT ATTACK ON CHAOTIC IMAGE ENCRYPTION SCHEMES |
1362 | APPLYING DIFFERENTIAL PRIVACY TO TENSOR COMPLETION |
8999 | APPROACHES TOWARD PHYSICAL AND GENERAL VIDEO ANOMALY DETECTION |
8775 | APPROXIMATING THE LIKELIHOOD RATIO IN LINEAR-GAUSSIAN STATE-SPACE MODELS FOR CHANGE DETECTION |
5094 | ARCHITECTURE FOR VARIABLE BITRATE NEURAL SPEECH CODEC WITH CONFIGURABLE COMPUTATION COMPLEXITY |
9141 | Are GAN-based Morphs Threatening Face Recognition? |
8824 | ARM 4-BIT PQ: SIMD-BASED ACCELERATION FOR APPROXIMATE NEAREST NEIGHBOR SEARCH ON ARM |
4699 | ASD-TRANSFORMER: EFFICIENT ACTIVE SPEAKER DETECTION USING SELF AND MULTIMODAL TRANSFORMERS |
9115 | ASR ERROR CORRECTION WITH DUAL-CHANNEL SELF-SUPERVISED LEARNING |
2604 | ASR-AWARE END-TO-END NEURAL DIARIZATION |
2733 | ASSEM-VC: REALISTIC VOICE CONVERSION BY ASSEMBLING MODERN SPEECH SYNTHESIS TECHNIQUES |
3528 | ATOMIC NORM BASED LOCALIZATION AND ORIENTATION ESTIMATION FOR MILLIMETER-WAVE MIMO OFDM SYSTEMS |
4270 | ATTACHMENT RECOGNITION IN SCHOOL-AGE CHILDREN: A MULTIMODAL APPROACH BASED ON LANGUAGE AND PARALANGUAGE ANALYSIS |
2742 | ATTENTION BACK-END FOR AUTOMATIC SPEAKER VERIFICATION WITH MULTIPLE ENROLLMENT UTTERANCES |
3548 | ATTENTION GUIDED INVARIANCE SELECTION FOR LOCAL FEATURE DESCRIPTORS |
3671 | ATTENTION PROBE: VISION TRANSFORMER DISTILLATION IN THE WILD |
2507 | Attentional Gated Res2Net for Multivariate Time Series Classification |
1594 | ATTENTION-BASED ADVERSARIAL PARTIAL DOMAIN ADAPTATION |
2863 | ATTENTION-BASED DUAL-STREAM VISION TRANSFORMER FOR RADAR GAIT RECOGNITION |
4568 | ATTENTION-BASED FUSION FOR BONE-CONDUCTED AND AIR-CONDUCTED SPEECH ENHANCEMENT IN THE COMPLEX DOMAIN |
3553 | AttentionPIT: Soft permutation invariant training for audio source separation with attention mechanism |
1160 | ATTENTIVE MAX FEATURE MAP AND JOINT TRAINING FOR ACOUSTIC SCENE CLASSIFICATION |
1668 | ATTENUATION OF ACOUSTIC EARLY REFLECTIONS IN TELEVISION STUDIOS USING PRETRAINED SPEECH SYNTHESIS NEURAL NETWORK |
6345 | ATTRIBUTABLE WATERMARKING OF SPEECH GENERATIVE MODELS |
1833 | Attribute-conditioned Face swapping Network for Low-Resolution images |
4333 | AUDIO PEAK REDUCTION USING A SYNCED ALLPASS FILTER |
9244 | Audio scene monitoring using redundant ad-hoc microphone arrays |
3167 | AUDIO SIGNAL PROCESSING FOR TELEPRESENCE BASED ON WEARABLE ARRAY IN NOISY AND DYNAMIC SCENES |
8362 | AUDIOCLIP: EXTENDING CLIP TO IMAGE, TEXT AND AUDIO |
3404 | AUDIO-TEXT RETRIEVAL IN CONTEXT |
1472 | Audio-to-symbolic Arrangement via Cross-modal Music Representation Learning |
2062 | AUDIO-VISUAL MULTI-CHANNEL SPEECH SEPARATION, DEREVERBERATION AND RECOGNITION |
4177 | AUDIO-VISUAL SCENE-AWARE DIALOG AND REASONING USING AUDIO-VISUAL TRANSFORMERS WITH JOINT STUDENT-TEACHER LEARNING |
4694 | Audio-Visual Tracking of Multiple Speakers via a PMBM Filter |
4580 | AUDITORY-BASED DATA AUGMENTATION FOR END-TO-END AUTOMATIC SPEECH RECOGNITION |
3256 | AUGMENTATION STRATEGY OPTIMIZATION FOR LANGUAGE UNDERSTANDING |
4439 | AUGMENTING MOLECULAR DEEP GENERATIVE MODELS WITH TOPOLOGICAL DATA ANALYSIS REPRESENTATIONS |
1906 | Automated Audio Captioning using Transfer Learning and Reconstruction Latent Space Similarity Regularization |
4395 | AUTOMATED PROSODY CLASSIFICATION FOR ORAL READING FLUENCY WITH QUADRATIC KAPPA LOSS AND ATTENTIVE X-VECTORS |
2530 | Automatic Assessment of the Degree of Clinical Depression from Speech Using X-Vectors |
3510 | AUTOMATIC DEPRESSION DETECTION: AN EMOTIONAL AUDIO-TEXTUAL CORPUS AND A GRU/BILSTM-BASED MODEL |
2692 | AUTOMATIC DEPRESSION LEVEL ASSESSMENT FROM SPEECH BY LONG-TERM GLOBAL INFORMATION EMBEDDING |
5365 | AUTOMATIC DJ TRANSITIONS WITH DIFFERENTIABLE AUDIO EFFECTS AND GENERATIVE ADVERSARIAL NETWORKS |
6099 | AUTOMATIC RESPIRATORY SOUND CLASSIFICATION VIA MULTI-BRANCH TEMPORAL CONVOLUTIONAL NETWORK |
8917 | AUTOREGRESSIVE VARIATIONAL AUTOENCODER WITH A HIDDEN SEMI-MARKOV MODEL-BASED STRUCTURED ATTENTION FOR SPEECH SYNTHESIS |
4629 | AuxFormer: Robust Approach to Audiovisual Emotion Recognition |
5415 | AUXILIARY LOSS OF TRANSFORMER WITH RESIDUAL CONNECTION FOR END-TO-END SPEAKER DIARIZATION |
5046 | AVQVC: One-shot Voice Conversion by Vector Quantization with Applying Contrastive Learning |
3961 | AXONAL DELAY AS A SHORT-TERM MEMORY FOR FEED FORWARD DEEP SPIKING NEURAL NETWORKS |
9008 | BALANCED RANKING AND SORTING FOR CLASS INCREMENTAL OBJECT DETECTION |
8536 | BALANCED STRIPE-WISE PRUNING IN THE FILTER |
3958 | BAYESIAN CONTINUAL IMPUTATION AND PREDICTION FOR IRREGULARLY SAMPLED TIME SERIES DATA |
9231 | BAYESIAN POST-MODEL-SELECTION ESTIMATION |
4144 | BEING GREEDY DOES NOT HURT: SAMPLING STRATEGIES FOR END-TO-END SPEECH RECOGNITION |
2653 | BEST OF BOTH WORLDS: MULTI-TASK AUDIO-VISUAL AUTOMATIC SPEECH RECOGNITION AND ACTIVE SPEAKER DETECTION |
5056 | BI-DIRECTIONAL MODALITY FUSION NETWORK FOR AUDIO-VISUAL EVENT LOCALIZATION |
5990 | BI-DIRECTIONAL NORMALIZATION AND COLOR ATTENTION-GUIDED GENERATIVE ADVERSARIAL NETWORK FOR IMAGE ENHANCEMENT |
3318 | BILEVEL LEARNING OF L1 REGULARIZERS WITH CLOSED-FORM GRADIENTS (BLORC) |
2591 | BILINGUAL END-TO-END ASR WITH BYTE-LEVEL SUBWORDS |
2092 | BINARY DENSE PREDICTORS FOR HUMAN POSE ESTIMATION BASED ON DYNAMIC THRESHOLDS AND FILTERING |
9316 | BINAURAL REPRODUCTION BASED ON BILATERAL AMBISONICS AND EAR-ALIGNED HRTFS |
1444 | BIP-NET: BIDIRECTIONAL PERSPECTIVE STRATEGY BASED ARBITRARY-SHAPED TEXT DETECTION NETWORK |
3011 | BLIND EQUALIZATION OF MOVING AVERAGE CHANNELS OVER GALOIS FIELDS |
4317 | BLIND EXTRACTION OF EQUITABLE PARTITIONS FROM GRAPH SIGNALS |
9313 | BLIND LOCALIZATION OF EARLY ROOM REFLECTIONS USING PHASE ALIGNED SPATIAL CORRELATION |
2589 | BLIND MODULO ANALOG-TO-DIGITAL CONVERSION OF VECTOR PROCESSES |
2861 | BLIND REVERBERATION TIME ESTIMATION IN DYNAMIC ACOUSTIC CONDITIONS |
2570 | BLIND SEPARATION OF LINEAR-QUADRATIC MIXTURES OF MUTUALLY INDEPENDENT AND AUTOCORRELATED SOURCES |
6903 | BLIND SOURCE SEPARATION VIA A WEAK EXCLUSION PRINCIPLE |
4390 | BLIND UNMIXING USING A DOUBLE DEEP IMAGE PRIOR |
1033 | BLOCK-ACTIVATED ALGORITHMS FOR MULTICOMPONENT FULLY NONSMOOTH MINIMIZATION |
1294 | BLOCK-COORDINATE FRANK-WOLFE ALGORITHM AND CONVERGENCE ANALYSIS FOR SEMI-RELAXED OPTIMAL TRANSPORT PROBLEM |
8691 | BLOCK-SPARSE ADVERSARIAL ATTACK TO FOOL TRANSFORMER-BASED TEXT CLASSIFIERS |
8737 | BLOOM-NET: BLOCKWISE OPTIMIZATION FOR MASKING NETWORKS TOWARD SCALABLE AND EFFICIENT SPEECH ENHANCEMENT |
3563 | BNU: A BALANCE-NORMALIZATION-UNCERTAINTY MODEL FOR INCREMENTAL EVENT DETECTION |
3770 | BONA FIDE RIESZ PROJECTIONS FOR DENSITY ESTIMATION |
9029 | Boost Ensemble Learning for Classification of CTG Signals |
3556 | BOUNDARY-AWARE BIAS LOSS FOR TRANSFORMER-BASED AERIAL IMAGE SEGMENTATION MODEL |
2483 | BOUNDED SIMPLEX-STRUCTURED MATRIX FACTORIZATION |
3912 | BOUNDING BOX DISTRIBUTION LEARNING AND CENTER POINT CALIBRATION FOR ROBUST VISUAL TRACKING |
1923 | BSOLO: BOUNDARY-AWARE ONE-STAGE INSTANCE SEGMENTATION SOLO |
4709 | BUILDING ROBUST SPOKEN LANGUAGE UNDERSTANDING BY CROSS ATTENTION BETWEEN PHONEME SEQUENCE AND ASR HYPOTHESIS |
1265 | BUNDLE ICP WITH VIRTUAL DEPTH FOR HAND-HELD 3D SCANNER |
8734 | BYTECOVER2: TOWARDS DIMENSIONALITY REDUCTION OF LATENT EMBEDDING FOR EFFICIENT COVER SONG IDENTIFICATION |
2273 | BYZANTINE-RESILIENT DECENTRALIZED COLLABORATIVE LEARNING |
2756 | Byzantine-resilient Decentralized Resource Allocation |
1439 | Byzantine-Robust Aggregation with Gradient Difference Compression and Stochastic Variance Reduction for Federated Learning |
1697 | BYZANTINE-ROBUST AND COMMUNICATION-EFFICIENT DISTRIBUTED NON-CONVEX LEARNING OVER NON-IID DATA |
4074 | BYZANTINE-ROBUST FEDERATED DEEP DETERMINISTIC POLICY GRADIENT |
9294 | CAA-NET: CONDITIONAL ATROUS CNNS WITH ATTENTION FOR EXPLAINABLE DEVICE-ROBUST ACOUSTIC SCENE CLASSIFICATION |
5784 | CACHE: MODELING CONTRIBUTION-AWARE CONTEXT HIERARCHICALLY FOR LONG-RANGE DIALOGUE STATE TRACKING |
4464 | CACHING NETWORKS: CAPITALIZING ON COMMON SPEECH FOR ASR |
4246 | CALL-SIGN RECOGNITION AND UNDERSTANDING FOR NOISY AIR-TRAFFIC TRANSCRIPTS USING SURVEILLANCE INFORMATION |
3448 | Camera Calibration through Camera Projection Loss |
3893 | CAN AUDIO CAPTIONS BE EVALUATED WITH IMAGE CAPTION METRICS? |
1310 | CAPITALIZATION NORMALIZATION FOR LANGUAGE MODELING WITH AN ACCURATE AND EFFICIENT HIERARCHICAL RNN MODEL |
1972 | CARINA – A CORPUS OF ALIGNED GERMAN READ SPEECH INCLUDING ANNOTATIONS |
4499 | CASCADE MULTI-CHANNEL NOISE REDUCTION AND ACOUSTIC FEEDBACK CANCELLATION |
1044 | CASCADING BANDIT UNDER DIFFERENTIAL PRIVACY |
1075 | CATEGORY-ADAPTED SOUND EVENT ENHANCEMENT WITH WEAKLY LABELED DATA |
8856 | Category-Adaptive Domain Adaptation for Semantic Segmentation |
3207 | CAUSAL LINEAR TOPOLOGICAL FILTERS OVER A 2-SIMPLEX |
5689 | CDMA: CROSS-DOMAIN DISTANCE METRIC ADAPTATION FOR SPEAKER VERIFICATION |
5996 | CDX-Net: Cross-Domain Multi-Feature Fusion Modeling via Deep Neural Networks for Multivariate Time Series Forecasting in AIOps |
8301 | CELL-FREE MASSIVE MIMO: EXPLOITING THE WAX DECOMPOSITION |
2873 | CF-NET: COMPLEMENTARY FUSION NETWORK FOR ROTATION INVARIANT POINT CLOUD COMPLETION |
2291 | CHANNEL REDUNDANCY AND OVERLAP IN CONVOLUTIONAL NEURAL NETWORKS WITH CHANNEL-WISE NNK GRAPHS |
3287 | Characterizing the adversarial vulnerability of speech self-supervised learning |
8300 | CHINESE SPELLING TEXT GENERATION OF MATHEMATICAL FORMULAS |
2329 | CHUNKFUSION: A LEARNING-BASED RGB-D 3D RECONSTRUCTION FRAMEWORK VIA CHUNK-WISE INTEGRATION |
8912 | CLASSICAL-TO-QUANTUM TRANSFER LEARNING FOR SPOKEN COMMAND RECOGNITION BASED ON QUANTUM NEURAL NETWORKS |
2773 | CLIMATE AND WEATHER: INSPECTING DEPRESSION DETECTION VIA EMOTION RECOGNITION |
5907 | CLIPCAM: A Simple Baseline for Zero-shot Text-guided Object and Action Localization |
2918 | Cloning one's voice using very limited data in the wild |
3810 | Closed-form single source direction-of-arrival estimator using first-order relative harmonic coefficients |
4368 | CLOSING THE SIM-TO-REAL GAP IN GUIDED WAVE DAMAGE DETECTION WITH ADVERSARIAL TRAINING OF VARIATIONAL AUTO-ENCODERS |
4123 | CLSEG: Contrastive Learning of Story Ending Generation |
4870 | CLUSTERING AND SEPARATING SIMILARITIES FOR DEEP UNSUPERVISED HASHING |
5968 | CLUSTERING COMPLEX SUBSPACES IN LARGE DIMENSIONS |
3359 | cMRI2SPEC: Cine MRI Sequence to Spectrogram Synthesis via a Pairwise Heterogeneous Translator |
3225 | CNN-AIDED FACTOR GRAPHS WITH ESTIMATED MUTUAL INFORMATION FEATURES FOR SEIZURE DETECTION |
1957 | CNN-TRANSFORMER WITH SELF-ATTENTION NETWORK FOR SOUND EVENT DETECTION |
6003 | COARRAY MANIFOLD SEPARATION IN THE SPHERICAL HARMONICS DOMAIN FOR ENHANCED SOURCE LOCALIZATION |
2090 | COARSE-TO-FINE UNSUPERVISED CHANGE DETECTION FOR REMOTE SENSING IMAGES VIA OBJECT-BASED MRF AND INCEPTION UNET |
1271 | CO-ATTENTION-GUIDED BILINEAR MODEL FOR ECHO-BASED DEPTH ESTIMATION |
9270 | Cognitive Antenna Selection for Automotive Radar Using Bobrovsky-Zakai Bound |
4847 | COGNITIVE CODING OF SPEECH |
2609 | COLLABORATIVE OBJECT DETECTORS ADAPTIVE TO BANDWIDTH AND COMPUTATION |
2134 | Combating False Sense of Security: Breaking the Defense of Adversarial Training via Non-Gradient Adversarial Attack |
1465 | COMBINING MULTIPLE STYLE TRANSFER NETWORKS AND TRANSFER LEARNING FOR LGE-CMR SEGMENTATION |
4891 | COMBINING UNSUPERVISED AND TEXT AUGMENTED SEMI-SUPERVISED LEARNING FOR LOW RESOURCED AUTOREGRESSIVE SPEECH RECOGNITION |
1393 | Communication-Efficient Distributed MAX-VAR Generalized CCA via Error Feedback-Assisted Quantization |
2468 | Communication-Efficient Online Federated Learning Framework for Nonlinear Regression |
8629 | COMPARISON OF BOUNDARY ARTIFACT REMOVAL METHODS IN CODING OF GENERALIZED CUBEMAP PROJECTION USING VVC |
2294 | COMPETITIVE MULTI-AGENT REINFORCEMENT LEARNING WITH SELF-SUPERVISED REPRESENTATION |
3597 | COMPLEX IRM-AWARE TRAINING FOR VOICE ACTIVITY DETECTION USING ATTENTION MODEL |
1989 | COMPLEX-VALUED SPATIAL AUTOENCODERS FOR MULTICHANNEL SPEECH ENHANCEMENT |
2707 | COMPOSING GRAPHICAL MODELS WITH GENERATIVE ADVERSARIAL NETWORKS FOR EEG SIGNAL MODELING |
3931 | COMPRESSED DATA SHARING BASED ON INFORMATION BOTTLENECK MODEL |
9304 | Compressed Super-Resolution of Positive Sources |
2769 | Compressing Transformer-based ASR Model by Task-driven Loss and Attention-based Multi-level Feature Distillation |
4214 | COMPRESSION-AWARE PROJECTION WITH GREEDY DIMENSION REDUCTION FOR CONVOLUTIONAL NEURAL NETWORK ACTIVATIONS |
8776 | COMPRESSIVE PHASE RETRIEVAL BASED ON SPARSE LATENT GENERATIVE PRIORS |
2566 | Compressive Scanning Transmission Electron Microscopy |
8837 | COMPUTATIONALLY EFFICIENT FIXED-FILTER ANC FOR SPEECH BASED ON LONG-TERM PREDICTION FOR HEADPHONE APPLICATIONS |
4849 | CONDITIONAL DIFFUSION PROBABILISTIC MODEL FOR SPEECH ENHANCEMENT |
5714 | CONDITIONALLY FACTORIZED VARIATIONAL BAYES WITH IMPORTANCE SAMPLING |
1959 | ConeFace: Approximate Pairwise Loss for Face Recognition |
3389 | CONFIDENCE ESTIMATION FOR SPEECH EMOTION RECOGNITION BASED ON THE RELATIONSHIP BETWEEN EMOTION CATEGORIES AND PRIMITIVES |
3985 | CONFIDENCE-AWARE MULTI-TEACHER KNOWLEDGE DISTILLATION |
4938 | CONFORMER-BASED HYBRID ASR SYSTEM FOR SWITCHBOARD DATASET |
3268 | CONFORMER-BASED SELF-SUPERVISED LEARNING FOR NON-SPEECH AUDIO TASKS |
6843 | CONFORMER-BASED SPEECH RECOGNITION WITH LINEAR NYSTRÖM ATTENTION AND ROTARY POSITION EMBEDDING |
1458 | Conjugate Augmented Spatial-Temporal Near-Field Sources Localization with Cross Array |
1425 | CONNECTING TARGETS VIA LATENT TOPICS AND CONTRASTIVE LEARNING: A UNIFIED FRAMEWORK FOR ROBUST ZERO-SHOT AND FEW-SHOT STANCE DETECTION |
1263 | Considering user agreement in learning to predict the aesthetic quality |
1673 | CONSISTENT TRAINING AND DECODING FOR END-TO-END SPEECH RECOGNITION USING LATTICE-FREE MMI |
9184 | CONSTANT Q CEPSTRAL COEFFICIENTS FOR CLASSIFICATION OF NORMAL VS. PATHOLOGICAL INFANT CRY |
4513 | CONTENT PRESERVING SCALE SPACE NETWORK FOR FAST IMAGE RESTORATION FROM NOISY-BLURRY PAIRS |
2444 | CONTEXT MODELING WITH EVIDENCE FILTER FOR MULTIPLE CHOICE QUESTION ANSWERING |
3137 | Context-Adaptive Document-Level Neural Machine Translation |
4853 | CONTEXT-AWARE GRAPH-BASED SELF-SUPERVISED LEARNING OF WHOLE SLIDE IMAGES |
3040 | CONTEXT-AWARE MASK PREDICTION NETWORK FOR END-TO-END TEXT-BASED SPEECH EDITING |
4480 | CONTEXTUAL ADAPTERS FOR PERSONALIZED SPEECH RECOGNITION IN NEURAL TRANSDUCERS |
4903 | Continual learning using lattice-free MMI for speech recognition |
5163 | CONTINUAL SELF-TRAINING WITH BOOTSTRAPPED REMIXING FOR SPEECH ENHANCEMENT |
4434 | CONTINUOUS SPEECH SEPARATION WITH RECURRENT SELECTIVE ATTENTION NETWORK |
1193 | CONTINUOUS STREAMING MULTI-TALKER ASR WITH DUAL-PATH TRANSDUCERS |
2265 | CONTRASTIVE HEARTBEATS: CONTRASTIVE LEARNING FOR SELF-SUPERVISED ECG REPRESENTATION AND PHENOTYPING |
1127 | CONTRASTIVE KNOWLEDGE GRAPH ATTENTION NETWORK FOR REQUEST-BASED RECIPE RECOMMENDATION |
7416 | CONTRASTIVE PREDICTION STRATEGIES FOR UNSUPERVISED SEGMENTATION AND CATEGORIZATION OF PHONEMES AND WORDS |
2448 | CONTRASTIVE PREDICTIVE CODING FOR ANOMALY DETECTION OF FETAL HEALTH FROM THE CARDIOTOCOGRAM |
3130 | CONTRASTIVE SENSOR TRANSFORMER FOR PREDICTIVE MAINTENANCE OF INDUSTRIAL ASSETS |
4575 | CONTRASTIVE SIAMESE NETWORK FOR SEMI-SUPERVISED SPEECH RECOGNITION |
9036 | CONTRASTIVE TRANSLATION LEARNING FOR MEDICAL IMAGE SEGMENTATION |
2669 | Contrastive-Mixup Learning for Improved Speaker Verification |
5160 | CONTROLLABLE SPEECH REPRESENTATION LEARNING VIA VOICE CONVERSION AND AIC LOSS |
8866 | CONTROLLED SENSING AND ANOMALY DETECTION VIA SOFT ACTOR-CRITIC REINFORCEMENT LEARNING |
2029 | CONTROLLING SMART PROPAGATION ENVIRONMENTS: LONG-TERM VERSUS SHORT-TERM PHASE SHIFT OPTIMIZATION |
3300 | CONTROLLING THE FRÉCHET VARIANCE IMPROVES BATCH NORMALIZATION ON THE SYMMETRIC POSITIVE DEFINITE MANIFOLD |
4029 | Conversational Speech Recognition by Learning Conversation-level Characteristics |
2797 | CONVEX CLUSTERING FOR AUTOCORRELATED TIME SERIES |
1546 | CONVMIXER: FEATURE INTERACTIVE CONVOLUTION WITH CURRICULUM LEARNING FOR SMALL FOOTPRINT AND NOISY FAR-FIELD KEYWORD SPOTTING |
4539 | CONVOLUTIONAL TRANSFORMER WITH ADAPTIVE POSITION EMBEDDING FOR COVID-19 DETECTION FROM COUGH SOUNDS |
2171 | CONVOLUTIONAL BEAMSPACE USING IIR FILTERS |
7095 | CONVOLUTIONAL FILTERING IN SIMPLICIAL COMPLEXES |
1510 | CONVOLUTIONAL ISTA NETWORK WITH TEMPORAL CONSISTENCY CONSTRAINTS FOR VIDEO RECONSTRUCTION FROM EVENT CAMERAS |
4750 | CONVOLUTIONAL WEIGHTED MINIMUM MEAN SQUARE ERROR FILTER FOR JOINT SOURCE SEPARATION AND DEREVERBERATION |
4967 | COUGHTRIGGER: EARBUDS IMU BASED COUGH DETECTION ACTIVATOR USING AN ENERGY-EFFICIENT SENSITIVITY-PRIORITIZED TIME SERIES CLASSIFIER |
2001 | Counting the number of different scaling exponents in multivariate scale-free dynamics: Clustering by bootstrap in the wavelet domain |
1345 | COUPLED FEATURE LEARNING VIA STRUCTURED CONVOLUTIONAL SPARSE CODING FOR MULTIMODAL IMAGE FUSION |
3873 | CPD computation via recursive eigenspace decompositions |
4892 | CPT: CROSS-MODAL PREFIX-TUNING FOR SPEECH-TO-TEXT TRANSLATION |
4790 | CRAMER-RAO BOUND ANALYSIS OF DISTRIBUTED DOA ESTIMATION EXPLOITING MIXED-PRECISION COVARIANCE MATRIX |
5044 | CRAMÉR-RAO BOUND AND ANTENNA SELECTION OPTIMIZATION FOR DUAL RADAR-COMMUNICATION DESIGN |
9246 | CRAMÉR-RAO BOUND FOR ESTIMATION AFTER MODEL SELECTION AND ITS APPLICATION TO SPARSE VECTOR ESTIMATION |
2228 | CRAMER-RAO BOUND FOR THE TIME-VARYING POISSON |
9230 | CROSS-CORPUS SPEECH EMOTION RECOGNITION BASED ON FEW-SHOT LEARNING AND DOMAIN ADAPTATION |
1756 | CROSS-DOMAIN FEW-SHOT LEARNING FOR RARE-DISEASE SKIN LESION SEGMENTATION |
3286 | CROSS-DOMAIN SPEECH ENHANCEMENT WITH A NEURAL CASCADE ARCHITECTURE |
9264 | CROSS-EPOCH LEARNING FOR WEAKLY SUPERVISED ANOMALY DETECTION IN SURVEILLANCE VIDEOS |
5781 | CROSS-LAYER AGGREGATION WITH TRANSFORMERS FOR MULTI-LABEL IMAGE CLASSIFICATION |
3230 | CROSS-MODAL KNOWLEDGE DISTILLATION FOR VISION-TO-SENSOR ACTION RECOGNITION |
3942 | CROSS-MODAL KNOWLEDGE DISTILLATION IN MULTI-MODAL FAKE NEWS DETECTION |
2971 | Cross-speaker style transfer for text-to-speech using data augmentation |
4179 | CROSS-TARGET STANCE DETECTION VIA REFINED META-LEARNING |
1363 | CRPN: DISTINGUISH NOVEL CATEGORIES VIA CLASS-RELEVANT REGION PROPOSAL NETWORK FOR FEW-SHOT OBJECT DETECTION |
1432 | CSENET: COMPLEX SQUEEZE-AND-EXCITATION NETWORK FOR SPEECH DEPRESSION LEVEL PREDICTION |
2177 | CS-GRESNET: A SIMPLE AND HIGHLY EFFICIENT NETWORK FOR FACIAL EXPRESSION RECOGNITION |
3645 | CSI CLUSTERING WITH VARIATIONAL AUTOENCODING |
2279 | CS-REP: MAKING SPEAKER VERIFICATION NETWORKS EMBRACING RE-PARAMETERIZATION |
3337 | CURRICULUM OPTIMIZATION FOR LOW-RESOURCE SPEECH RECOGNITION |
4793 | CUSTOM ATTRIBUTION LOSS FOR IMPROVING GENERALIZATION AND INTERPRETABILITY OF DEEPFAKE DETECTION |
8991 | CUSTOMER SATISFACTION ESTIMATION USING UNSUPERVISED REPRESENTATION LEARNING WITH MULTI-FORMAT PREDICTION LOSS |
4103 | CUSTOMIZABLE END-TO-END OPTIMIZATION OF ONLINE NEURAL NETWORK-SUPPORTED DEREVERBERATION FOR HEARING DEVICES |
1636 | Cut and Continuous Paste Towards Real-time Deep Fall Detection |
6461 | CYBER-THREAT PROPAGATION OVER NETWORK-SLICING ARCHITECTURES |
5717 | DAM-GAN: IMAGE INPAINTING USING DYNAMIC ATTENTION MAP BASED ON FAKE TEXTURE DETECTION |
2204 | Data Agnostic Filter Gating for Efficient Deep Networks |
2574 | DATA AUGMENTATION FOR LONG-TAILED AND IMBALANCED POLYPHONE DISAMBIGUATION IN MANDARIN |
2046 | DATA EFFICIENT SUPPORT VECTOR MACHINE TRAINING USING THE MINIMUM DESCRIPTION LENGTH PRINCIPLE |
3765 | DATA INCUBATION — SYNTHESIZING MISSING DATA FOR HANDWRITING RECOGNITION |
8358 | Data Shapley Value for Handling Noisy Labels: An application in Screening COVID-19 Pneumonia from Chest CT Scans |
3521 | DATA-DRIVEN ALGORITHMS FOR GAUSSIAN MEASUREMENT MATRIX DESIGN IN COMPRESSIVE SENSING |
2243 | DATA-DRIVEN APPROACH FOR THE FLOQUET PROPAGATOR INVERSE PROBLEM SOLUTION |
3911 | Data-driven Optimization for Zero-delay Lossy Source Coding with Side Information |
1618 | DATA-DRIVEN SPATIALLY DEPENDENT PDE IDENTIFICATION |
1772 | DCNGAN: A DEFORMABLE CONVOLUTION-BASED GAN WITH QP ADAPTATION FOR PERCEPTUAL QUALITY ENHANCEMENT OF COMPRESSED VIDEO |
3699 | DCSN: Deformable Convolutional Semantic Segmentation Neural Network for Non-Rigid Scenes |
3345 | DECENTRALIZED BILEVEL OPTIMIZATION FOR PERSONALIZED CLIENT LEARNING |
1267 | DECENTRALIZED LEARNING IN THE PRESENCE OF LOW-RANK NOISE |
5111 | DEEP ACTOR-CRITIC FOR CONTINUOUS 3D MOTION CONTROL IN MOBILE RELAY BEAMFORMING NETWORKS |
2915 | DEEP ADAPTATION CONTROL FOR ACOUSTIC ECHO CANCELLATION |
4526 | DEEP ADAPTIVE AEC: HYBRID OF DEEP LEARNING AND ADAPTIVE ACOUSTIC ECHO CANCELLATION |
1254 | DEEP AUGMENTED MUSIC ALGORITHM FOR DATA-DRIVEN DOA ESTIMATION |
9307 | Deep Collaborative Multi-Modal Learning for Unsupervised Kinship Estimation |
4587 | DEEP DETERMINISTIC INDEPENDENT COMPONENT ANALYSIS FOR HYPERSPECTRAL UNMIXING |
4514 | DEEP HASHING WITH HASH CENTER UPDATE FOR EFFICIENT IMAGE RETRIEVAL |
3265 | DEEP IMPULSE RESPONSES: ESTIMATING AND PARAMETERIZING FILTERS WITH DEEP NETWORKS |
3250 | DEEP INITIALIZATION FOR GUARANTEED UNIMODULAR QUADRATIC PROGRAMMING |
4369 | DEEP ITERATIVE PHASE RETRIEVAL FOR PTYCHOGRAPHY |
3329 | DEEP JOINT SOURCE-CHANNEL CODING FOR WIRELESS IMAGE TRANSMISSION WITH ADAPTIVE RATE CONTROL |
5062 | DEEP KERNEL LEARNING NETWORKS WITH MULTIPLE LEARNING PATHS |
8829 | DEEP LEARNING BASED OFF-ANGLE IRIS RECOGNITION |
3438 | DEEP LEARNING BASED PASSIVE BEAMFORMING FOR IRS-ASSISTED MONOSTATIC BACKSCATTER SYSTEMS |
1305 | DEEP LEARNING FOR LOCATION BASED BEAMFORMING WITH NLOS CHANNELS |
4443 | DEEP LEARNING FOR PROMINENCE DETECTION IN CHILDREN'S READ SPEECH |
4623 | DEEP LEARNING ON THE SPHERE FOR MULTI-MODEL ENSEMBLING OF SIGNIFICANT WAVE HEIGHT |
6298 | Deep Markov Clustering For Panoptic Segmentation |
3773 | DEEP NEURAL NETWORK (DNN) AUDIO CODER USING A PERCEPTUALLY IMPROVED TRAINING METHOD |
1375 | DEEP OBJECT DETECTION WITH EXAMPLE ATTRIBUTE BASED PREDICTION MODULATION |
2805 | DEEP PERFORMER: SCORE-TO-AUDIO MUSIC PERFORMANCE SYNTHESIS |
1676 | DEEP PIECEWISE HASHING FOR EFFICIENT HAMMING SPACE RETRIEVAL |
9158 | DEEP PROXIMAL UNFOLDING FOR IMAGE RECOVERY FROM UNDER-SAMPLED CHANNEL DATA IN INTRAVASCULAR ULTRASOUND |
3697 | DEEP RANK CROSS-MODAL HASHING WITH SEMANTIC CONSISTENT FOR IMAGE-TEXT RETRIEVAL |
2900 | DEEP RESIDUAL ECHO SUPPRESSION AND NOISE REDUCTION: A MULTI-INPUT FCRN APPROACH IN A HYBRID SPEECH ENHANCEMENT SYSTEM |
2785 | DEEP SCALE-AWARE IMAGE SMOOTHING |
4689 | DEEP SEQUENTIAL BEAMFORMER LEARNING FOR MULTIPATH CHANNELS IN MMWAVE COMMUNICATION SYSTEMS |
1424 | Deep Spatio-Temporal Wind Power Forecasting |
9032 | DEEP TEMPORAL INTERPOLATION OF RADAR-BASED PRECIPITATION |
3107 | DEEP VIDEO INPAINTING GUIDED BY AUDIO-VISUAL SELF-SUPERVISION |
3677 | DEEP VIDEO INPAINTING LOCALIZATION USING SPATIAL AND TEMPORAL TRACES |
2132 | DEEPCHORUS: A HYBRID MODEL OF MULTI-SCALE CONVOLUTION AND SELF-ATTENTION FOR CHORUS DETECTION |
3940 | DEEPFAKE SPEECH DETECTION THROUGH EMOTION RECOGNITION: A SEMANTIC APPROACH |
9166 | DEEPFILTERNET: A LOW COMPLEXITY SPEECH ENHANCEMENT FRAMEWORK FOR FULL-BAND AUDIO BASED ON DEEP FILTERING |
5168 | DeepGBASS: Deep Guided Boundary-Aware Semantic Segmentation |
5382 | DEEPHULL: FAST CONVEX HULL APPROXIMATION IN HIGH DIMENSIONS |
4298 | DEEP-LEARNING-ASSISTED CONFIGURATION OF RECONFIGURABLE INTELLIGENT SURFACES IN DYNAMIC RICH-SCATTERING ENVIRONMENTS |
4854 | DEEP-MLE: FUSION BETWEEN A NEURAL NETWORK AND MLE FOR A SINGLE SNAPSHOT DOA ESTIMATION |
1771 | DEFENDING AGAINST BACKDOOR ATTACKS IN FEDERATED LEARNING WITH DIFFERENTIAL PRIVACY |
1484 | Defending Against Universal Attack via Curvature-aware Category Adversarial Training |
9265 | DEFENSIVE COMPRESSIVE TIME DELAY ESTIMATION USING INFORMATION BOTTLENECK |
9091 | Deformable Convolution Dense Network for Compressed Video Quality Enhancement |
3331 | Deformable VisTR: Spatio temporal deformable attention for video instance segmentation |
3083 | DELAY-ORIENTED DISTRIBUTED SCHEDULING USING GRAPH NEURAL NETWORKS |
4497 | DELIBERATION OF STREAMING RNN-TRANSDUCER BY NON-AUTOREGRESSIVE DECODING |
8854 | DELTA DISTANCING: A LIFTING APPROACH TO LOCALIZING ITEMS FROM USER COMPARISONS |
2136 | DEMENTIA DETECTION BY FUSING SPEECH AND EYE-TRACKING REPRESENTATION |
2630 | DEMON: IMPROVED NEURAL NETWORK TRAINING WITH MOMENTUM DECAY |
3868 | DENOISING-GUIDED DEEP REINFORCEMENT LEARNING FOR SOCIAL RECOMMENDATION |
1369 | DENOISING-ORIENTED DEEP HIERARCHICAL REINFORCEMENT LEARNING FOR NEXT-BASKET RECOMMENDATION |
2753 | DEPTH PRUNING WITH AUXILIARY NETWORKS FOR TINYML |
1699 | DEPTH REMOVAL DISTILLATION FOR RGB-D SEMANTIC SEGMENTATION |
6429 | DEPTH-BASED ENSEMBLE LEARNING NETWORK FOR FACE ANTI-SPOOFING |
8783 | Deriving Explainable Discriminative Attributes Using Confusion About Counterfactual Class |
1880 | DESIGN OF REAL-TIME SYSTEM BASED ON MACHINE LEARNING FOR SNORING AND OSA DETECTION |
3415 | DESIGNING A QAM SIGNAL DETECTOR FOR MASSIVE MIMO SYSTEMS VIA PS-ADMM APPROACH |
8753 | DETAIL GENERATION AND FUSION NETWORKS FOR IMAGE INPAINTING |
5713 | DETECTING ANOMALY IN CHEMICAL SENSORS VIA REGULARIZED CONTRASTIVE LEARNING |
1926 | DETECTING BACKDOOR ATTACKS AGAINST POINT CLOUD CLASSIFIERS |
5656 | Detection of COPD exacerbation from speech: comparison of acoustic features and deep learning based speech breathing models |
4901 | DETECTION OF COVID-19 FROM JOINT TIME AND FREQUENCY ANALYSIS OF SPEECH, BREATHING AND COUGH AUDIO |
3875 | DETERMINING JOINT PERIODICITIES IN MULTI-TIME DATA WITH SAMPLING UNCERTAINTIES |
3517 | DETERMINING THE BEST ACOUSTIC FEATURES FOR SMOKER IDENTIFICATION |
5660 | DETERMINISTIC TRANSFORM BASED WEIGHT MATRICES FOR NEURAL NETWORKS |
2233 | DGC-VECTOR: A NEW SPEAKER EMBEDDING FOR ZERO-SHOT VOICE CONVERSION |
9071 | DHWP: LEARNING HIGH-QUALITY SHORT HASH CODES VIA WEIGHT PRUNING |
5606 | DICTIONARY LEARNING WITH UNIFORM SPARSE REPRESENTATIONS FOR ANOMALY DETECTION |
3646 | DIFFERENTIABLE DIGITAL SIGNAL PROCESSING MIXTURE MODEL FOR SYNTHESIS PARAMETER EXTRACTION FROM MIXTURE OF HARMONIC SOUNDS |
1912 | DIFFERENTIABLE PROGRAMMING A LA MOREAU |
1889 | DIFFERENTIABLE WAVETABLE SYNTHESIS |
4536 | DIFFERENTIATE-AND-FIRE TIME-ENCODING OF FINITE-RATE-OF-INNOVATION SIGNALS |
3918 | DIFFICULTY-AWARE NEURAL BAND-TO-PIANO SCORE ARRANGEMENT BASED ON NOTE- AND STATISTIC-LEVEL CRITERIA |
9237 | DIGRAPH SIGNAL PROCESSING WITH GENERALIZED BOUNDARY CONDITIONS |
1629 | DILATED CONVOLUTIONAL NEURAL NETWORK-BASED DEEP REFERENCE PICTURE GENERATION FOR VIDEO COMPRESSION |
4347 | DIRECT DESIGN OF BIQUAD FILTER CASCADES WITH DEEP LEARNING BY SAMPLING RANDOM POLYNOMIALS |
3121 | DIRECT LOCALIZATION: AN ISING MODEL APPROACH |
5347 | DIRECT NOISY SPEECH MODELING FOR NOISY-TO-NOISY VOICE CONVERSION |
4703 | DISCOURSE-LEVEL PROSODY MODELING WITH A VARIATIONAL AUTOENCODER FOR NON-AUTOREGRESSIVE EXPRESSIVE SPEECH SYNTHESIS |
3111 | DISCRETE MULTI-KERNEL K-MEANS WITH DIVERSE AND OPTIMAL KERNEL LEARNING |
2739 | DISENTANGLED FEATURE-GUIDED MULTI-EXPOSURE HIGH DYNAMIC RANGE IMAGING |
3580 | DISENTANGLED SPEAKER EMBEDDING FOR ROBUST SPEAKER VERIFICATION |
1888 | DISENTANGLING CONTENT AND FINE-GRAINED PROSODY INFORMATION VIA HYBRID ASR BOTTLENECK FEATURES FOR VOICE CONVERSION |
6570 | DISPEECH: A SYNTHETIC TOY DATASET FOR SPEECH DISENTANGLING |
2821 | DISTILHUBERT: SPEECH REPRESENTATION LEARNING BY LAYER-WISE DISTILLATION OF HIDDEN-UNIT BERT |
3388 | DISTRIBUTED AUDIO-VISUAL PARSING BASED ON MULTIMODAL TRANSFORMER AND DEEP JOINT SOURCE CHANNEL CODING |
4760 | DISTRIBUTED GRAPH LEARNING WITH SMOOTH DATA PRIORS |
9189 | DISTRIBUTED HYBRID BEAMFORMING FOR MMWAVE CELL-FREE MASSIVE MIMO |
5681 | DISTRIBUTED IMAGE TRANSMISSION USING DEEP JOINT SOURCE-CHANNEL CODING |
3947 | DISTRIBUTED LABEL DEQUANTIZED GAUSSIAN PROCESS LATENT VARIABLE MODEL FOR MULTI-VIEW DATA INTEGRATION |
9027 | DISTRIBUTED LINK SPARSIFICATION FOR SCALABLE SCHEDULING USING GRAPH NEURAL NETWORKS |
1508 | DISTRIBUTED PARTICLE FILTERS FOR STATE TRACKING ON THE STIEFEL MANIFOLD USING TANGENT SPACE STATISTICS |
4412 | DISTRIBUTION AUGMENTATION FOR LOW-RESOURCE EXPRESSIVE TEXT-TO-SPEECH |
8848 | DISTRIBUTION LEARNING FOR AGE ESTIMATION FROM SPEECH |
2440 | DIVERGENCE-GUIDED FEATURE ALIGNMENT FOR CROSS-DOMAIN OBJECT DETECTION |
4642 | DIVERSE AUDIO CAPTIONING VIA ADVERSARIAL TRAINING |
4795 | DIVERSITY-CONTROLLABLE AND ACCURATE AUDIO CAPTIONING BASED ON NEURAL CONDITION |
1514 | DMANET: DEEP LEARNING-BASED DIFFERENTIAL MICROPHONE ARRAYS FOR MULTI-CHANNEL SPEECH SEPARATION |
1485 | DNN BASED MULTIFRAME SINGLE-CHANNEL NOISE REDUCTION FILTERS |
2086 | DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors |
1239 | DO YOU LIVE A HEALTHY LIFE? ANALYZING LIFESTYLE BY VISUAL LIFE LOGGING |
1914 | DOA ESTIMATION VIA COARRAY TENSOR COMPLETION WITH MISSING SLICES |
1243 | DOA M-ESTIMATION USING SPARSE BAYESIAN LEARNING |
1211 | DOCUMENT-LEVEL EVENT EXTRACTION VIA HUMAN-LIKE READING PROCESS |
9312 | DOMAIN ADAPTATION FOR FOOD INTAKE CLASSIFICATION WITH TEACHER/STUDENT LEARNING |
4825 | Domain Adaptation for Speaker Recognition in Singing and Spoken Voice |
4863 | DOMAIN ADAPTATION VIA MUTUAL INFORMATION MAXIMIZATION FOR HANDWRITING RECOGNITION |
8459 | DOMAIN DECOMPOSITION ALGORITHMS FOR REAL-TIME HOMOGENEOUS DIFFUSION INPAINTING IN 4K |
1435 | DOMAIN GENERALIZED FEW-SHOT IMAGE CLASSIFICATION VIA META REGULARIZATION NETWORK |
4020 | DOMAIN ROBUST DEEP EMBEDDING LEARNING FOR SPEAKER RECOGNITION |
3224 | DOMAIN-AGNOSTIC META-LEARNING FOR CROSS-DOMAIN FEW-SHOT CLASSIFICATION |
1552 | DomainDesc: Learning Local Descriptors with Domain Adaptation |
1534 | DOMAIN-INVARIANT FEATURE LEARNING FOR CROSS CORPUS SPEECH EMOTION RECOGNITION |
2998 | DOMAIN-INVARIANT REPRESENTATION LEARNING FROM EEG WITH PRIVATE ENCODERS |
3310 | Don't Separate, Learn to Remix: End-to-End Neural Remixing with Joint Optimization |
8811 | DON'T SPEAK TOO FAST: THE IMPACT OF DATA BIAS ON SELF-SUPERVISED SPEECH MODELS |
1395 | DOUBLE CLOSED-LOOP NETWORK FOR IMAGE DEBLURRING |
1779 | DOUBLE NOISE MEAN TEACHER SELF-ENSEMBLING MODEL FOR SEMI-SUPERVISED TUMOR SEGMENTATION |
2934 | DOUBLE-RIS VERSUS SINGLE-RIS AIDED SYSTEMS: TENSOR-BASED MIMO CHANNEL ESTIMATION AND DESIGN PERSPECTIVES |
3462 | DOWNSTREAM AUGMENTATION GENERATION FOR CONTRASTIVE LEARNING |
1390 | DPCCN: DENSELY-CONNECTED PYRAMID COMPLEX CONVOLUTIONAL NETWORK FOR ROBUST SPEECH SEPARATION AND EXTRACTION |
2211 | DP-DWA: DUAL-PATH DYNAMIC WEIGHT ATTENTION NETWORK WITH STREAMING DFSMN-SAN FOR AUTOMATIC SPEECH RECOGNITION |
2333 | DPT-FSNET: DUAL-PATH TRANSFORMER BASED FULL-BAND AND SUB-BAND FUSION NETWORK FOR SPEECH ENHANCEMENT |
3587 | DRC-NET: DENSELY CONNECTED RECURRENT CONVOLUTIONAL NEURAL NETWORK FOR SPEECH DEREVERBERATION |
5039 | DRVC: A Framework of Any-to-Any Voice Conversion with Self-Supervised Learning |
2459 | Dual Active Noise Control with Common Sensors |
8900 | DUAL ATTENTION POOLING NETWORK FOR RECORDING DEVICE CLASSIFICATION USING NEUTRAL AND WHISPERED SPEECH |
2702 | Dual Graph Cross-domain Few-shot Learning for Hyperspectral Image Classification |
6020 | DUAL PATH GRAPH CONVOLUTIONAL NETWORKS |
2106 | DUAL-ATTENTION NETWORK FOR FEW-SHOT SEGMENTATION |
1098 | Dual-branch Attention-In-Attention Transformer for single-channel speech enhancement |
1919 | DUAL-DOMAIN LOW-RANK FUSION DEEP METRIC LEARNING FOR OFF-THE-PERSON ECG BIOMETRICS |
5414 | DURATION MODELING OF NEURAL TTS FOR AUTOMATIC DUBBING |
2295 | Dynamic Binary Neural Network by learning channel-wise thresholds |
2242 | DYNAMIC MULTI-SCALE LOSS BALANCE FOR OBJECT DETECTION |
5010 | Dynamic Point Cloud Interpolation |
3748 | DYNAMIC PORTFOLIO CUTS: A SPECTRAL APPROACH TO GRAPH-THEORETIC DIVERSIFICATION |
4643 | DYNAMIC RESOURCE OPTIMIZATION FOR ADAPTIVE FEDERATED LEARNING EMPOWERED BY RECONFIGURABLE INTELLIGENT SURFACES |
5032 | DYNAMIC SLIDING WINDOW FOR REALTIME DENOISING NETWORKS |
5984 | DYNAMIC TEXTURE RECOGNITION USING PDV HASHING AND DICTIONARY LEARNING ON MULTI-SCALE VOLUME LOCAL BINARY PATTERN |
3081 | DYNAMICALLY PRUNING SEGFORMER FOR EFFICIENT SEMANTIC SEGMENTATION |
2650 | DYNIMP: DYNAMIC IMPUTATION FOR WEARABLE SENSING DATA THROUGH SENSORY AND TEMPORAL RELATEDNESS |
3819 | DynSNN: A Dynamic Approach to Reduce Redundancy in Spiking Neural Networks |
1427 | DYSFLUENCY CLASSIFICATION IN STUTTERED SPEECH USING DEEP LEARNING FOR REAL-TIME APPLICATIONS |
1680 | EAD-CONFORMER: A CONFORMER-BASED ENCODER-ATTENTION-DECODER-NETWORK FOR MULTI-TASK AUDIO SOURCE SEPARATION |
7989 | ECHO-AWARE ADAPTATION OF SOUND EVENT LOCALIZATION AND DETECTION IN UNKNOWN ENVIRONMENTS |
3784 | ECO-FEDSPLIT: FEDERATED LEARNING WITH ERROR-COMPENSATED COMPRESSION |
2383 | ECONOMICS OF SEMANTIC COMMUNICATION SYSTEM IN WIRELESS POWERED INTERNET OF THINGS |
2793 | EDGE SAMPLING OF GRAPHS BASED ON EDGE SMOOTHNESS |
3891 | EFFECT OF NOISE SUPPRESSION LOSSES ON SPEECH DISTORTION AND ASR PERFORMANCE |
3351 | EFFECTIVE AND INCONSPICUOUS OVER-THE-AIR ADVERSARIAL EXAMPLES WITH ADAPTIVE FILTERING |
3803 | EFFICIENT ADAPTER TRANSFER OF SELF-SUPERVISED SPEECH MODELS FOR AUTOMATIC SPEECH RECOGNITION |
8880 | EFFICIENT AND STABLE INFORMATION DIRECTED EXPLORATION FOR CONTINUOUS REINFORCEMENT LEARNING |
4877 | EFFICIENT IDENTITY-BASED CHAMELEON HASH FOR MOBILE DEVICES |
9269 | EFFICIENT IMAGE-WARPING FRAMEWORK FOR CONTENT-ADAPTIVE SUPERPIXELS GENERATION |
3618 | EFFICIENT MONAURAL SPEECH SEPARATION WITH MULTISCALE TIME-DELAY SAMPLING |
4616 | EFFICIENT SEQUENCE TRAINING OF ATTENTION MODELS USING APPROXIMATIVE RECOMBINATION |
3395 | EFFICIENT TWO-STAGE BEAM TRAINING AND CHANNEL ESTIMATION FOR RIS-AIDED MMWAVE SYSTEMS VIA FAST ALTERNATING LEAST SQUARES |
2015 | EFFICIENT UNIVERSAL SHUFFLE ATTACK FOR VISUAL OBJECT TRACKING |
3491 | EFFICIENTLY AND GLOBALLY SOLVING JOINT BEAMFORMING AND COMPRESSION PROBLEM IN THE COOPERATIVE CELLULAR NETWORK VIA LAGRANGIAN DUALITY |
1213 | Embedding and Beamforming: All-neural Causal Beamformer for Multichannel Speech Enhancement |
4167 | EMBEDDING SIGNALS ON GRAPHS WITH UNBALANCED DIFFUSION EARTH MOVER’S DISTANCE |
3881 | EMGSE: ACOUSTIC/EMG FUSION FOR MULTIMODAL SPEECH ENHANCEMENT |
3550 | EMOQ-TTS: EMOTION INTENSITY QUANTIZATION FOR FINE-GRAINED CONTROLLABLE EMOTIONAL TEXT-TO-SPEECH |
2209 | EMOTIONFLOW: CAPTURE THE DIALOGUE LEVEL EMOTION TRANSITIONS |
2057 | ENABLING ON-DEVICE TRAINING OF SPEECH RECOGNITION MODELS WITH FEDERATED DROPOUT |
1215 | ENCRYPTED IMAGE VISUAL SECURITY INDEX VIA NON-LOCAL RECOGNIZABLE DEGREE EVALUATION |
4244 | ENCRYPTION RESISTANT DEEP NEURAL NETWORK WATERMARKING |
3327 | Endpoint Detection for Streaming End-to-End Multi-talker ASR |
2993 | End-to-end Alexa Device Arbitration |
8908 | END-TO-END ASR-ENHANCED NEURAL NETWORK FOR ALZHEIMER’S DISEASE DIAGNOSIS |
1740 | END-TO-END COMPLEX-VALUED MULTIDILATED CONVOLUTIONAL NEURAL NETWORK FOR JOINT ACOUSTIC ECHO CANCELLATION AND NOISE SUPPRESSION |
1133 | END-TO-END DEEP LEARNING-BASED ADAPTATION CONTROL FOR FREQUENCY-DOMAIN ADAPTIVE SYSTEM IDENTIFICATION |
2523 | END-TO-END KEYWORD SPOTTING USING NEURAL ARCHITECTURE SEARCH AND QUANTIZATION |
3001 | END-TO-END LOW RESOURCE KEYWORD SPOTTING THROUGH CHARACTER RECOGNITION AND BEAM-SEARCH RE-SCORING |
3835 | End-to-end multi-modal speech recognition with air and bone conducted speech |
5943 | END-TO-END MUSIC REMASTERING SYSTEM USING SELF-SUPERVISED AND ADVERSARIAL TRAINING |
4077 | END-TO-END NETWORK BASED ON TRANSFORMER FOR AUTOMATIC DETECTION OF COVID-19 |
1719 | End-to-end Neural Coreference Resolution Revisited: A Simple yet Effective Baseline |
5220 | END-TO-END NEURAL SPEECH CODING FOR REAL-TIME COMMUNICATIONS |
2738 | End-to-End Speech Recognition from Federated Acoustic Models |
2387 | END-TO-END SPEECH RECOGNITION WITH JOINT DEREVERBERATION OF SUB-BAND AUTOREGRESSIVE ENVELOPES |
4743 | END-TO-END SPEECH SUMMARIZATION USING RESTRICTED SELF-ATTENTION |
3382 | ENERGY ALIGNMENT FOR BIAS RECTIFICATION IN CLASS INCREMENTAL LEARNING |
4802 | ENHANCE RNNLMS WITH HIERARCHICAL MULTI-TASK LEARNING FOR ASR |
6829 | ENHANCING AFFECTIVE REPRESENTATIONS OF MUSIC-INDUCED EEG THROUGH MULTIMODAL SUPERVISION AND LATENT DOMAIN ADAPTATION |
1560 | ENHANCING AND DISSECTING CROWD COUNTING BY SYNTHETIC DATA |
1599 | ENHANCING CLASS UNDERSTANDING VIA PROMPT-TUNING FOR ZERO-SHOT TEXT CLASSIFICATION |
8540 | ENHANCING CONTEXTUAL ENCODING WITH STAGE-CONFUSION AND STAGE-TRANSITION ESTIMATION FOR EEG-BASED SLEEP STAGING |
3187 | ENHANCING CONTRASTIVE LEARNING WITH TEMPORAL COGNIZANCE FOR AUDIO-VISUAL REPRESENTATION GENERATION |
4721 | Enhancing Privacy Through Domain Adaptive Noise Injection for Speech Emotion Recognition |
1541 | ENHANCING PROTOTYPICAL FEW-SHOT LEARNING BY LEVERAGING THE LOCAL-LEVEL STRATEGY |
8728 | ENHANCING SPEAKING STYLES IN CONVERSATIONAL TEXT-TO-SPEECH SYNTHESIS WITH GRAPH-BASED MULTI-MODAL CONTEXT MODELING |
5512 | ENHANCING UTILITY IN THE WATCHDOG PRIVACY MECHANISM |
2407 | ENRICH FEATURES FOR FEW-SHOT POINT CLOUD CLASSIFICATION |
3110 | ENTRAINMENT ANALYSIS FOR ASSESSMENT OF AUTISTIC SPEECH PROSODY USING BOTTLENECK FEATURES OF DEEP NEURAL NETWORK |
2361 | ENVIRONMENTAL SOUND EXTRACTION USING ONOMATOPOEIC WORDS |
9271 | EPIGRAPHICAL RELAXATION FOR MINIMIZING LAYERED MIXED NORMS |
4730 | Epileptic Spike Detection by Recurrent Neural Networks with Self-Attention Mechanism |
8911 | EQUAL LOSS: A SIMPLE LOSS FUNCTION FOR NOISE ROBUST LEARNING |
1829 | ER-PIQA: A TASK-GUIDED PEDESTRIAN IMAGE QUALITY ASSESSMENT VIA EMBEDDING RECONSTRUCTION |
5372 | ESPNET-SLU: ADVANCING SPOKEN LANGUAGE UNDERSTANDING THROUGH ESPNET |
2903 | ESTIMATING THE CONFIDENCE OF SPEECH SPOOFING COUNTERMEASURE |
9155 | ESTIMATION OF CHANNELS IN SYSTEMS WITH INTELLIGENT REFLECTING SURFACES |
5509 | ESTIMATION OF THE ADMITTANCE MATRIX IN POWER SYSTEMS UNDER LAPLACIAN AND PHYSICAL CONSTRAINTS |
4805 | EVALUATION OF ORTHOGONAL CHIRP DIVISION MULTIPLEXING FOR AUTOMOTIVE INTEGRATED SENSING AND COMMUNICATIONS |
4067 | EVALUATION OF VIDEO CODING FOR MACHINES WITHOUT GROUND TRUTH |
2337 | EVENT-BASED MULTIMODAL SPIKING NEURAL NETWORK WITH ATTENTION MECHANISM |
3231 | EVOLUTIONARY NEURAL ARCHITECTURE DESIGN OF LIQUID STATE MACHINE FOR IMAGE CLASSIFICATION |
2554 | EXACT PARTITIONING OF HIGH-ORDER PLANTED MODELS WITH A TENSOR NUCLEAR NORM CONSTRAINT |
2679 | EXACT SPARSE SUPER-RESOLUTION VIA MODEL AGGREGATION |
2999 | EXPECTATION CONSISTENT PLUG-AND-PLAY FOR MRI |
3663 | EXPERIMENTAL INVESTIGATION ON STFT PHASE REPRESENTATIONS FOR DEEP LEARNING-BASED DYSARTHRIC SPEECH DETECTION |
8808 | EXPERTS VERSUS ALL-ROUNDERS: TARGET LANGUAGE EXTRACTION FOR MULTIPLE TARGET LANGUAGES |
5131 | EXPLAINABLE ARTIFICIAL INTELLIGENCE FOR AUTHORSHIP ATTRIBUTION ON SOCIAL MEDIA |
3152 | EXPLAINABLE FACT-CHECKING THROUGH QUESTION ANSWERING |
4273 | EXPLAINING DEEP LEARNING MODELS FOR SPOOFING AND DEEPFAKE DETECTION WITH SHAPLEY ADDITIVE EXPLANATIONS |
1378 | Explicitly Modeling Importance and Coherence for Timeline Summarization |
4748 | EXPLOITING ANNOTATORS’ TYPED DESCRIPTION OF EMOTION PERCEPTION TO MAXIMIZE UTILIZATION OF RATINGS FOR SPEECH EMOTION RECOGNITION |
3184 | EXPLOITING CAPTION DIVERSITY FOR UNSUPERVISED VIDEO SUMMARIZATION |
2191 | EXPLOITING CROSS DOMAIN ACOUSTIC-TO-ARTICULATORY INVERTED FEATURES FOR DISORDERED SPEECH RECOGNITION |
3093 | EXPLOITING HYBRID MODELS OF TENSOR-TRAIN NETWORKS FOR SPOKEN COMMAND RECOGNITION |
9309 | Exploiting Information About the Structure of Signals of Opportunity for Passive Radar Performance Increase |
1659 | Exploiting Language Model for Efficient Linguistic Steganalysis |
9290 | EXPLOITING TEMPORAL CONTEXT IN CNN BASED MULTISOURCE DOA ESTIMATION |
6888 | EXPLORING AUDITORY ACOUSTIC FEATURES FOR THE DIAGNOSIS OF COVID-19 |
2435 | EXPLORING CATEGORY CONSISTENCY FOR WEAKLY SUPERVISED SEMANTIC SEGMENTATION |
4004 | EXPLORING COMPLEMENTARITY OF GLOBAL AND LOCAL SPATIOTEMPORAL INFORMATION FOR FAKE FACE VIDEO DETECTION |
3673 | EXPLORING DEEPER GRAPH CONVOLUTIONS FOR SEMI-SUPERVISED NODE CLASSIFICATION |
8708 | EXPLORING DEMENTIA DETECTION FROM SPEECH: CROSS CORPUS ANALYSIS |
1364 | Exploring Dual Stream Global Information for Image Captioning |
3916 | EXPLORING EFFECTIVE DATA UTILIZATION FOR LOW-RESOURCE SPEECH RECOGNITION |
4685 | EXPLORING HETEROGENEOUS CHARACTERISTICS OF LAYERS IN ASR MODELS FOR MORE EFFICIENT TRAINING |
1750 | Exploring Machine Speech Chain for Domain Adaptation |
3637 | EXPLORING NON-AUTOREGRESSIVE END-TO-END NEURAL MODELING FOR ENGLISH MISPRONUNCIATION DETECTION AND DIAGNOSIS |
5291 | EXPLORING THE EFFECT OF L0/L2 REGULARIZATION IN NEURAL NETWORK PRUNING USING THE LC TOOLKIT |
1962 | EXPLORING TRANSFERABILITY MEASURES AND DOMAIN SELECTION IN CROSS-DOMAIN SLOT FILLING |
3449 | EXPLORING TRANSFORMER’S POTENTIAL ON AUTOMATIC PIANO TRANSCRIPTION |
9259 | EXPONENTIAL HYPERBOLIC COSINE ROBUST ADAPTIVE FILTERS FOR AUDIO SIGNAL PROCESSING |
4842 | EXTENDED GRAPH TEMPORAL CLASSIFICATION FOR MULTI-SPEAKER END-TO-END ASR |
9129 | EXTENDING THE USE OF MDL FOR HIGH-DIMENSIONAL PROBLEMS: VARIABLE SELECTION, ROBUST FITTING, AND ADDITIVE MODELING |
8521 | EXTRACTING AND DISTILLING DIRECTION-ADAPTIVE KNOWLEDGE FOR LIGHTWEIGHT OBJECT DETECTION IN REMOTE SENSING IMAGES |
8865 | EXTREME-POINT PURSUIT FOR UNIT-MODULUS OPTIMIZATION |
5141 | EYES TELL ALL: IRREGULAR PUPIL SHAPES REVEAL GAN-GENERATED FACES |
1509 | Factorized Neural Transducer for Efficient Language Model Adaptation |
4761 | FAIRNESS-AWARE SELECTIVE SAMPLING ON ATTRIBUTED GRAPHS |
9233 | Fast Adaptive Active Noise Control Based on Modified Model-Agnostic Meta-Learning Algorithm |
4829 | FAST AND STABLE CONVERGENCE OF ONLINE SGD FOR CV@R-BASED RISK-AWARE LEARNING |
3282 | Fast Contextual Adaptation with Neural Associative Memory for On-Device Personalized Speech Recognition |
5743 | FAST FAULT DIAGNOSIS METHOD OF ROLLING BEARINGS IN MULTI-SENSOR MEASUREMENT ENVIRONMENT |
9267 | Fast Graph Filters for Decentralized Subspace Projection |
5050 | FAST GRAPH SAMPLING FOR SHORT VIDEO SUMMARIZATION USING GERSHGORIN DISC ALIGNMENT |
2937 | FAST LEARNING OF FAST TRANSFORMS, WITH GUARANTEES |
4287 | FAST LOW RANK COLUMN-WISE COMPRESSIVE SENSING FOR ACCELERATED DYNAMIC MRI |
4291 | FAST MULTISCALE DIFFUSION ON GRAPHS |
3707 | FAST TASK-SPECIFIC ADAPTATION IN SPOKEN LANGUAGE ASSESSMENT WITH META-LEARNING |
2922 | FAST VIDEO OBJECT SEGMENTATION VIA DYNAMIC YOLACT |
4589 | FastAudio: A Learnable Audio Front-End for Spoof Speech Detection |
1736 | FAST-RIR: FAST NEURAL DIFFUSE ROOM IMPULSE RESPONSE GENERATOR |
2625 | FAST-SLOW TRANSFORMER FOR VISUALLY GROUNDING SPEECH |
1747 | FAZ-BV: A DIABETIC MACULAR ISCHEMIA GRADING FRAMEWORK COMBINING FAZ ATTENTION NETWORK AND BLOOD VESSEL ENHANCEMENT FILTERS |
1581 | FDSNET: AN ACCURATE REAL-TIME SURFACE DEFECT SEGMENTATION NETWORK |
9303 | FEASIBILITY OF JOINT POWER OPTIMIZATION OF MULTIPLE SOURCE-DESTINATIONS IN AN AF RELAY NETWORK |
8768 | FEATURE AUGMENTATION LEARNING FOR FEW-SHOT PALMPRINT IMAGE RECOGNITION WITH UNCONSTRAINED ACQUISITION |
4754 | Feature Imitating Networks |
1246 | FEATURE SPACE MESSAGE PASSING NETWORK FOR MEDICAL IMAGE SEMANTIC SEGMENTATION |
4788 | FEATURE-BASED SENSING MATRIX DESIGN FOR ANALOG TO INFORMATION CONVERTERS |
2921 | FedClean: A Defense Mechanism Against Parameter Poisoning Attacks in Federated Learning |
2647 | Federated Learning Challenges and Opportunities: An Outlook |
5203 | FEDERATED MULTI-ARMED BANDIT VIA UNCOORDINATED EXPLORATION |
4784 | FEDERATED OVER-AIR ROBUST SUBSPACE TRACKING FROM MISSING DATA |
4970 | FEDERATED SELF-SUPERVISED LEARNING FOR ACOUSTIC EVENT CLASSIFICATION |
2802 | FEDERATED SELF-TRAINING FOR DATA-EFFICIENT AUDIO RECOGNITION |
1445 | FEDERATED STOCHASTIC GRADIENT DESCENT BEGETS SELF-INDUCED MOMENTUM |
5243 | FEW-SHOT GAZE ESTIMATION WITH MODEL OFFSET PREDICTORS |
2713 | FEW-SHOT GENERATION BY MODELING STEREOSCOPIC PRIORS |
4026 | Few-shot learning with improved local representations via bias rectify module |
4232 | FEW-SHOT MUSICAL SOURCE SEPARATION |
4130 | FEW-SHOT OBJECT DETECTION WITH LOCAL CORRESPONDENCE RPN AND ATTENTIVE HEAD |
9187 | FEW-SHOT ONE-CLASS DOMAIN ADAPTATION BASED ON FREQUENCY FOR IRIS PRESENTATION ATTACK DETECTION |
9326 | FifthNet: Structured Compact Neural Networks for Automatic Chord Recognition |
3589 | FilterAugment: An Acoustic Environmental Data Augmentation Method |
1059 | FIND THE WAY BACK: INVERTIBLE KERNEL ESTIMATOR FOR BLIND IMAGE SUPER-RESOLUTION |
6873 | Fine-Grained Dynamic Loss for Accurate Single-Image Super-Resolution |
4787 | FINE-GRAINED STYLE CONTROL IN TRANSFORMER-BASED TEXT-TO-SPEECH SYNTHESIS |
1826 | FINE-TUNING WAV2VEC2 FOR SPEAKER RECOGNITION |
9175 | FINT: FIELD-AWARE INTERACTION NEURAL NETWORK FOR CLICK-THROUGH RATE PREDICTION |
2747 | FLDP: Flexible strategy for local differential privacy |
4814 | Floor Plan Reconstruction with High-Precision RF-based Tracking |
9073 | FLOW-BASED FAST MULTICHANNEL NONNEGATIVE MATRIX FACTORIZATION FOR BLIND SOURCE SEPARATION |
2235 | FLOW-BASED POINT CLOUD COMPLETION NETWORK WITH ADVERSARIAL REFINEMENT |
3217 | FLOWDT: A FLOW-AWARE DIGITAL TWIN FOR COMPUTER NETWORKS |
4716 | FORENSIC ANALYSIS AND LOCALIZATION OF MULTIPLY COMPRESSED MP3 AUDIO USING TRANSFORMERS |
1298 | FOSTERING THE ROBUSTNESS OF WHITE-BOX DEEP NEURAL NETWORK WATERMARKS BY NEURON ALIGNMENT |
1677 | FOV-BASED CODING OPTIMIZATION FOR 360-DEGREE VIRTUAL REALITY VIDEOS |
2464 | FRACTURE DETECTION AND LOCALIZATION IN CHEST X-RAYS USING SEMI-SUPERVISED LEARNING WITH DYNAMIC SHARPENING |
2381 | FrAUG: A Frame Rate Based Data Augmentation Method for Depression Detection from Speech Signals |
1945 | FREE LUNCH FOR CROSS-DOMAIN OCCLUDED FACE RECOGNITION WITHOUT SOURCE DATA |
3545 | FRE-GAN 2: FAST AND EFFICIENT FREQUENCY-CONSISTENT AUDIO SYNTHESIS |
9262 | Frequency Domain Long-Term Prediction for Low Delay General Audio Coding |
3474 | FREQUENCY-SPECIFIC NON-LINEAR GRANGER CAUSALITY IN A NETWORK OF BRAIN SIGNALS |
2205 | FROM BOTTOM-UP TO TOP-DOWN: CHARACTERIZATION OF TRAINING PROCESS IN GAZE MODELING |
2584 | FROM SHALLOW TO DEEP: COMPOSITIONAL REASONING OVER GRAPHS FOR VISUAL QUESTION ANSWERING |
5900 | FRONTEND ATTRIBUTES DISENTANGLEMENT FOR SPEECH EMOTION RECOGNITION |
1748 | FSM: FEATURE SAMPLING MODULE FOR OBJECT DETECTION |
2074 | FSOINET: FEATURE-SPACE OPTIMIZATION-INSPIRED NETWORK FOR IMAGE COMPRESSIVE SENSING |
3022 | FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement |
6017 | FUSING ASR OUTPUTS IN JOINT TRAINING FOR SPEECH EMOTION RECOGNITION |
3118 | FUSION AND ORTHOGONAL PROJECTION FOR IMPROVED FACE-VOICE ASSOCIATION |
1734 | FUSION OF MODULATION SPECTRAL AND SPECTRAL FEATURES WITH SYMPTOM METADATA FOR IMPROVED SPEECH-BASED COVID-19 DETECTION |
3261 | FUSION-ID: A PHOTOPLETHYSMOGRAPHY AND MOTION SENSOR FUSION BIOMETRIC AUTHENTICATOR WITH FEW-SHOT ON-BOARDING |
9095 | GAN-BASED JOINT ACTIVITY DETECTION AND CHANNEL ESTIMATION FOR GRANT-FREE RANDOM ACCESS |
9058 | GANET: UNARY ATTENTION REACHES PAIRWISE ATTENTION VIA IMPLICIT GROUP CLUSTERING IN LIGHT-WEIGHT CNNS |
4702 | GATED MULTIMODAL FUSION WITH CONTRASTIVE LEARNING FOR TURN-TAKING PREDICTION IN HUMAN-ROBOT DIALOGUE |
4850 | GAZEATTENTIONNET: GAZE ESTIMATION WITH ATTENTIONS |
3694 | GENERALIZATION ABILITY OF MOS PREDICTION NETWORKS |
1371 | GENERALIZED AUTOCORRELATION ANALYSIS FOR MULTI-TARGET DETECTION |
1707 | GENERALIZED FACE ANTI-SPOOFING VIA CROSS-ADVERSARIAL DISENTANGLEMENT WITH MIXING AUGMENTATION |
4489 | GENERALIZED MATCHING PURSUITS FOR THE SPARSE OPTIMIZATION OF SEPARABLE OBJECTIVES |
3608 | GENERALIZED SLICED PROBABILITY METRICS |
4641 | GENERALIZED TIME DOMAIN VELOCITY VECTOR |
9042 | GENERALIZED ZERO-SHOT LEARNING USING CONDITIONAL WASSERSTEIN AUTOENCODER |
9240 | GENERALIZING AUC OPTIMIZATION TO MULTICLASS CLASSIFICATION FOR AUDIO SEGMENTATION WITH LIMITED TRAINING DATA |
2248 | GENERATING DISENTANGLED ARGUMENTS WITH PROMPTS: A SIMPLE EVENT EXTRACTION FRAMEWORK THAT WORKS |
1916 | GENERATION FOR UNSUPERVISED DOMAIN ADAPTATION: A GAN-BASED APPROACH FOR OBJECT CLASSIFICATION WITH 3D POINT CLOUD DATA |
2815 | GENERATION OF PERSONAL SOUND FIELDS IN REVERBERANT ENVIRONMENTS USING INTERFRAME CORRELATION |
3843 | GENERATIVE ADVERSARIAL NETWORK INCLUDING REFERRING IMAGE SEGMENTATION FOR TEXT-GUIDED IMAGE MANIPULATION |
5505 | GENRE-CONDITIONED ACOUSTIC MODELS FOR AUTOMATIC LYRICS TRANSCRIPTION OF POLYPHONIC MUSIC |
5710 | Genre-Conditioned Long-Term 3D Dance Generation Driven by Music |
3400 | GEOMETRIC LOW-RANK TENSOR APPROXIMATION FOR REMOTELY SENSED HYPERSPECTRAL AND MULTISPECTRAL IMAGERY FUSION |
3571 | GLASSOFORMER: A QUERY-SPARSE TRANSFORMER FOR POST-FAULT POWER GRID VOLTAGE PREDICTION |
3365 | GLOBAL EVOLUTION NEURAL NETWORK FOR SEGMENTATION OF REMOTE SENSING IMAGES |
3543 | GLOBAL OPTIMIZATION SOLUTION FOR DYNAMIC ADAPTIVE 360-DEGREE STREAMING |
3433 | GLOBAL-LOCAL FEATURE ENHANCEMENT NETWORK FOR ROBUST OBJECT DETECTION USING MMWAVE RADAR AND CAMERA |
8952 | GOAL-ORIENTED COMMUNICATION FOR EDGE LEARNING BASED ON THE INFORMATION BOTTLENECK |
2528 | GOS: A LARGE-SCALE ANNOTATED OUTDOOR SCENE SYNTHETIC DATASET |
8984 | GPU-ACCELERATED FORWARD-BACKWARD ALGORITHM WITH APPLICATION TO LATTICE-FREE MMI |
4756 | GRADIENT STALENESS IN ASYNCHRONOUS OPTIMIZATION UNDER RANDOM COMMUNICATION DELAYS |
1308 | GRADIENT VARIANCE LOSS FOR STRUCTURE-ENHANCED IMAGE SUPER-RESOLUTION |
4570 | Gradient-weighted Class Activation Mapping for spatio temporal graph convolutional network |
3762 | GRADUAL SURROGATE GRADIENT LEARNING IN DEEP SPIKING NEURAL NETWORKS |
2854 | GRAPH ATTENTIVE FEATURE AGGREGATION FOR TEXT-INDEPENDENT SPEAKER VERIFICATION |
2282 | GRAPH CONVOLUTION FOR RE-RANKING IN PERSON RE-IDENTIFICATION |
3490 | GRAPH CONVOLUTIONAL NETWORK BASED SEMI-SUPERVISED LEARNING ON MULTI-SPEAKER MEETING DATA |
4634 | GRAPH CONVOLUTIONAL NETWORKS WITH AUTOENCODER-BASED COMPRESSION AND MULTI-LAYER GRAPH LEARNING |
2856 | Graph Fine-Grained Contrastive Representation Learning |
1796 | GRAPH LEARNING BASED AUTOENCODER FOR HYPERSPECTRAL BAND SELECTION |
3269 | Graph Learning from Multivariate Dependent Time Series via a Multi-Attribute Formulation |
6022 | GRAPH LEARNING INFORMATION CRITERION |
9268 | GRAPH SIGNAL PROCESSING: VERTEX MULTIPLICATION |
5577 | GRAPH-BASED POINT CLOUD DENOISING USING SHAPE-AWARE CONSISTENCY FOR FREE-VIEWPOINT VIDEO |
3275 | GRAPHON-AIDED JOINT ESTIMATION OF MULTIPLE GRAPHS |
3537 | GRAPH-STRUCTURED SPARSE REGULARIZATION VIA CONVEX OPTIMIZATION |
4133 | Grassmannian Dimensionality Reduction Using Triplet Margin Loss for UME Classification of 3D Point Clouds |
9248 | GRIDLESS DOA ESTIMATION AND ROOT-MUSIC FOR NON-UNIFORM LINEAR ARRAYS |
1616 | Gridless DOA Estimation Under the Multi-frequency Model |
5752 | Group-wise Feature Selection for Supervised Learning |
3919 | HALF INVERTED NESTED ARRAYS WITH LARGE HOLE-FREE FOURTH-ORDER DIFFERENCE CO-ARRAYS |
5258 | Hand Gesture Recognition Using Temporal Convolutions and Attention Mechanism |
5144 | HARMONIC AND PERCUSSIVE SOUND SEPARATION BASED ON MIXED PARTIAL DERIVATIVE OF PHASE SPECTROGRAM |
5158 | HARMONICITY PLAYS A CRITICAL ROLE IN DNN BASED VERSUS IN BIOLOGICALLY-INSPIRED MONAURAL SPEECH SEGREGATION SYSTEMS |
9252 | HARMONIC-TEMPORAL FACTOR DECOMPOSITION FOR UNSUPERVISED MONAURAL SEPARATION OF HARMONIC SOUNDS |
4907 | HARVESTING PARTIALLY-DISJOINT TIME-FREQUENCY INFORMATION FOR IMPROVING DEGENERATE UNMIXING ESTIMATION TECHNIQUE |
5695 | HAVE BEST OF BOTH WORLDS: TWO-PASS HYBRID AND E2E CASCADING FRAMEWORK FOR SPEECH RECOGNITION |
1639 | HBP: AN EFFICIENT BLOCK PERMUTATION SOLVER USING HUNGARIAN ALGORITHM AND SPECTROGRAM INPAINTING FOR MULTICHANNEL AUDIO SOURCE SEPARATION |
1986 | HEART RATE AND OXYGEN SATURATION ESTIMATION FROM FACIAL VIDEO WITH MULTIMODAL PHYSIOLOGICAL DATA GENERATION |
9117 | Heterogeneous Graph Node Classification with Multi-Hops Relation Features |
8799 | HEURISTIC DROPOUT: AN EFFICIENT REGULARIZATION METHOD FOR MEDICAL IMAGE SEGMENTATION MODELS |
3955 | HGCN: HARMONIC GATED COMPENSATION NETWORK FOR SPEECH ENHANCEMENT |
1876 | HIERARCHICAL AND MULTI-VIEW DEPENDENCY MODELLING NETWORK FOR CONVERSATIONAL EMOTION RECOGNITION |
1333 | HIERARCHICAL CLASSIFICATION OF SINGING ACTIVITY, GENDER, AND TYPE IN COMPLEX MUSIC RECORDINGS |
1980 | HIERARCHICAL CONDITIONAL END-TO-END ASR WITH CTC AND MULTI-GRANULAR SUBWORD UNITS |
2649 | HIERARCHICAL DEEP LEARNING MODEL WITH INERTIAL AND PHYSIOLOGICAL SENSORS FUSION FOR WEARABLE-BASED HUMAN ACTIVITY RECOGNITION |
3519 | HIERARCHICAL FEATURE AGGREGATION NETWORK FOR DEEP IMAGE COMPRESSION |
5946 | Hierarchical Graph-based Neural Network for Singing Melody Extraction |
4490 | HIERARCHICAL PROSODY MODELING AND CONTROL IN NON-AUTOREGRESSIVE PARALLEL NEURAL TTS |
4181 | Hierarchical Signal Fusion Network for Pulsar Detection with Phase-Correlation and Signal Attentions |
2127 | HIFIDENOISE: HIGH-FIDELITY DENOISING TEXT TO SPEECH WITH ADVERSARIAL NETWORKS |
2776 | HIFI-SVC: FAST HIGH FIDELITY CROSS-DOMAIN SINGING VOICE CONVERSION |
4913 | HIGH-DIMENSIONAL SPARSE BAYESIAN LEARNING WITHOUT COVARIANCE MATRICES |
8090 | High-fidelity Portrait Editing via Exploring Differentiable Guided Sketches from the Latent Space |
2393 | HIGH-QUALITY SELF-SUPERVISED SNAPSHOT HYPERSPECTRAL IMAGING |
3410 | HIRL: Hybrid Image Restoration based on Hierarchical Deep Reinforcement Learning via Two-Step Analysis |
3815 | HISTOGRAM-GUIDED SEMANTIC-AWARE COLORIZATION |
3031 | HISTOKT: CROSS KNOWLEDGE TRANSFER IN COMPUTATIONAL PATHOLOGY |
2601 | HODGELETS: LOCALIZED SPECTRAL REPRESENTATIONS OF FLOWS ON SIMPLICIAL COMPLEXES |
1359 | HOLISTIC SEMI-SUPERVISED APPROACHES FOR EEG REPRESENTATION LEARNING |
9194 | HOQRI: Higher-order QR Iteration for Scalable Tucker Decomposition |
4726 | HOW CAN A COGNITIVE RADAR MASK ITS COGNITION? |
5261 | HOW NEURAL PROCESSES IMPROVE GRAPH LINK PREDICTION |
5252 | HOW SECURE ARE THE ADVERSARIAL EXAMPLES THEMSELVES? |
4739 | HTS-AT: A HIERARCHICAL TOKEN-SEMANTIC AUDIO TRANSFORMER FOR SOUND CLASSIFICATION AND DETECTION |
1608 | Human Decision Making with Bounded Rationality |
8946 | Human emotion recognition using multi-modal biological signals based on time lag-considered correlation maximization |
4769 | HYBRID ATTENTION-BASED PROTOTYPICAL NETWORKS FOR FEW-SHOT SOUND CLASSIFICATION |
5278 | HYBRID RNN-T/ATTENTION-BASED STREAMING ASR WITH TRIGGERED CHUNKWISE ATTENTION AND DUAL INTERNAL LANGUAGE MODEL INTEGRATION |
8997 | Hybrid sub-word segmentation for handling long tail in morphologically rich low resource languages |
3023 | Hybrid Weighting Loss for Precipitation Nowcasting from Radar Images |
2957 | HYPERGRAPH-BASED REINFORCEMENT LEARNING FOR STOCK PORTFOLIO SELECTION |
2082 | HYPERGRAPHS WITH EDGE-DEPENDENT VERTEX WEIGHTS: SPECTRAL CLUSTERING BASED ON THE 1-LAPLACIAN |
4353 | HYPERSPECTRAL IMAGE CLASSIFICATION BASED ON CO-LEARNING THROUGH DUAL-ARCHITECTURE ENSEMBLE |
4141 | HYPERSPECTRAL IMAGE SUPER-RESOLUTION WITH DEEP PRIORS AND DEGRADATION MODEL INVERSION |
9243 | Identification of Edge Disconnections in Networks Based on Graph Filter Outputs |
4704 | IDENTIFICATION OF PULSE STREAMS OF UNKNOWN SHAPE FROM TIME ENCODING MACHINE SAMPLES |
4070 | IMAGE DENOISING WITH DEEP UNFOLDING AND NORMALIZING FLOWS |
9285 | Image Restoration via Reconciliation of Group Sparsity and Low-Rank Models |
4245 | IMAGE STEGANALYSIS WITH CONVOLUTIONAL VISION TRANSFORMER |
2111 | IMAGE-TEXT ALIGNMENT AND RETRIEVAL USING LIGHT-WEIGHT TRANSFORMER |
2640 | IMAGE-TO-GRAPH TRANSFORMERS FOR CHEMICAL STRUCTURE RECOGNITION |
1148 | IMAGE-TO-VIDEO RE-IDENTIFICATION VIA MUTUAL DISCRIMINATIVE KNOWLEDGE TRANSFER |
3502 | Importance of switch optimization criterion in Switching WPE dereverberation |
4561 | IMPORTANCE SAMPLING CAMS FOR WEAKLY-SUPERVISED SEGMENTATION |
8989 | IMPORTANTAUG: A DATA AUGMENTATION AGENT FOR SPEECH |
2565 | IMPQ: REDUCED COMPLEXITY NEURAL NETWORKS VIA GRANULAR PRECISION ASSIGNMENT |
2590 | IMPROVE FEW-SHOT VOICE CLONING USING MULTI-MODAL LEARNING |
3456 | IMPROVE IMAGE CAPTIONING VIA RELATION MODELING |
9178 | IMPROVED BEAMFORMING ENCODING FOR JOINT RADAR AND COMMUNICATION |
5081 | IMPROVED LANGUAGE IDENTIFICATION THROUGH CROSS-LINGUAL SELF-SUPERVISED LEARNING |
5826 | IMPROVED META LEARNING FOR LOW RESOURCE SPEECH RECOGNITION |
4605 | IMPROVED REPRESENTATION LEARNING FOR ACOUSTIC EVENT CLASSIFICATION USING TREE-STRUCTURED ONTOLOGY |
4601 | IMPROVED SIMULATION OF REALISTICALLY-SPATIALISED SIMULTANEOUS SPEECH USING MULTI-CAMERA ANALYSIS IN THE CHIME-5 DATASET |
4732 | IMPROVED SINGING VOICE SEPARATION WITH CHROMAGRAM-BASED PITCH-AWARE REMIXING |
8789 | IMPROVING ACTOR-CRITIC REINFORCEMENT LEARNING VIA HAMILTONIAN MONTE CARLO METHOD |
1453 | IMPROVING ADVERSARIAL WAVEFORM GENERATION BASED SINGING VOICE CONVERSION WITH HARMONIC SIGNALS |
1013 | IMPROVING ANOMALY DETECTION WITH A SELF-SUPERVISED TASK BASED ON GENERATIVE ADVERSARIAL NETWORK |
3270 | IMPROVING BCI-BASED COLOR VISION ASSESSMENT USING GAUSSIAN PROCESS REGRESSION |
1303 | IMPROVING BIOMEDICAL NAMED ENTITY RECOGNITION WITH A UNIFIED MULTI-TASK MRC FRAMEWORK |
4473 | IMPROVING BIRD CLASSIFICATION WITH UNSUPERVISED SOUND SEPARATION |
5783 | IMPROVING BRAIN DECODING METHODS AND EVALUATION |
8838 | Improving Character Error Rate Is Not Equal to Having Clean Speech: Speech Enhancement for ASR Systems with Black-box Acoustic Models |
2646 | IMPROVING CLASS ACTIVATION MAP FOR WEAKLY SUPERVISED OBJECT LOCALIZATION |
3210 | IMPROVING CONFIDENCE ESTIMATION ON OUT-OF-DOMAIN DATA FOR END-TO-END SPEECH RECOGNITION |
2365 | Improving Contextual Coherence in Variational Personalized and Empathetic Dialogue Agents |
2148 | IMPROVING CROSS-LINGUAL SPEECH SYNTHESIS WITH TRIPLET TRAINING SCHEME |
1758 | IMPROVING CROSS-MODAL UNDERSTANDING IN VISUAL DIALOG VIA CONTRASTIVE LEARNING |
2725 | IMPROVING CTC-BASED SPEECH RECOGNITION VIA KNOWLEDGE TRANSFERRING FROM PRE-TRAINED LANGUAGE MODELS |
2931 | IMPROVING DIALOGUE GENERATION VIA PROACTIVELY QUERYING GROUNDED KNOWLEDGE |
1537 | IMPROVING DUAL-MICROPHONE SPEECH ENHANCEMENT BY LEARNING CROSS-CHANNEL FEATURES WITH MULTI-HEAD ATTENTION |
2970 | IMPROVING DYNAMIC GRAPH CONVOLUTIONAL NETWORK WITH FINE-GRAINED ATTENTION MECHANISM |
1837 | IMPROVING EMOTIONAL SPEECH SYNTHESIS BY USING SUS-CONSTRAINED VAE AND TEXT ENCODER AGGREGATION |
1498 | IMPROVING END-TO-END CONTEXTUAL SPEECH RECOGNITION WITH FINE-GRAINED CONTEXTUAL KNOWLEDGE SELECTION |
3228 | IMPROVING END-TO-END MODELS FOR SET PREDICTION IN SPOKEN LANGUAGE UNDERSTANDING |
2902 | IMPROVING END-TO-END SPEECH TRANSLATION MODEL WITH BERT-BASED CONTEXTUAL INFORMATION |
4647 | Improving Factored Hybrid HMM Acoustic Modeling without State Tying |
2064 | IMPROVING FAIRNESS IN SPEAKER VERIFICATION VIA GROUP-ADAPTED FUSION NETWORK |
2914 | IMPROVING FASTSPEECH TTS WITH EFFICIENT SELF-ATTENTION AND COMPACT FEED-FORWARD NETWORK |
2806 | IMPROVING FEATURE GENERALIZABILITY WITH MULTITASK LEARNING IN CLASS INCREMENTAL LEARNING |
2516 | IMPROVING INFERENCE FOR SPATIAL SIGNALS BY CONTEXTUAL FALSE DISCOVERY RATES |
1521 | Improving Joint Sparse Hyperspectral Unmixing by Simultaneously Clustering Pixels According to their Mixtures |
3126 | IMPROVING LYRICS ALIGNMENT THROUGH JOINT PITCH DETECTION |
8615 | Improving Maximum Likelihood Difference Scaling method to measure inter content scale |
4455 | IMPROVING NOISE ROBUSTNESS OF CONTRASTIVE SPEECH REPRESENTATION LEARNING WITH SPEECH RECONSTRUCTION |
4920 | IMPROVING NON-AUTOREGRESSIVE END-TO-END SPEECH RECOGNITION WITH PRE-TRAINED ACOUSTIC AND LANGUAGE MODELS |
5216 | IMPROVING PHASE-RECTIFIED SIGNAL AVERAGING FOR FETAL HEART RATE ANALYSIS |
2475 | IMPROVING PHONETIC REALIZATIONS IN TTS BY USING PHONEME-ALIGNED GRAPHEMES |
2076 | IMPROVING PSEUDO-LABEL TRAINING FOR END-TO-END SPEECH RECOGNITION USING GRADIENT MASK |
1846 | IMPROVING RECOGNITION-SYNTHESIS BASED ANY-TO-ONE VOICE CONVERSION WITH CYCLIC TRAINING |
3621 | IMPROVING REFERENCE-BASED IMAGE COLORIZATION FOR LINE ARTS VIA FEATURE AGGREGATION AND CONTRASTIVE LEARNING |
3002 | IMPROVING SELF-SUPERVISED LEARNING FOR SPEECH RECOGNITION WITH INTERMEDIATE LAYER SUPERVISION |
2368 | IMPROVING SEPARATION-BASED SPEAKER DIARIZATION VIA ITERATIVE MODEL REFINEMENT AND SPEAKER EMBEDDING BASED POST-PROCESSING |
2720 | IMPROVING SOURCE SEPARATION BY EXPLICITLY MODELING DEPENDENCIES BETWEEN SOURCES |
1222 | IMPROVING SPOKEN LANGUAGE UNDERSTANDING BY ENHANCING TEXT REPRESENTATION |
3278 | IMPROVING THE CLASSIFICATION OF PHONETIC SEGMENTS FROM RAW ULTRASOUND USING SELF-SUPERVISED LEARNING AND HARD EXAMPLE MINING |
3970 | IMPROVING THE FUSION OF ACOUSTIC AND TEXT REPRESENTATIONS IN RNN-T |
2065 | IMPROVING THE LATENCY AND QUALITY OF CASCADED ENCODERS |
4572 | Improving Ultrasound Image Classification With Local Texture Quantisation |
3328 | In Pursuit of Preserving the Fidelity of Adversarial Images |
2668 | INCIPIENT FAULT SEVERITY ESTIMATION USING LOCAL MAHALANOBIS DISTANCE |
1225 | INCOHERENT SYNTHESIS OF SPARSE BROADBAND ARRAYS BASED ON A PARAMETER-FREE SUBSPACE CLUSTERING |
3486 | INCORPORATING END-TO-END FRAMEWORK INTO TARGET-SPEAKER VOICE ACTIVITY DETECTION |
9221 | INCORPORATING GAZE BEHAVIOR USING JOINT EMBEDDING WITH SCENE CONTEXT FOR DRIVER TAKEOVER DETECTION |
3509 | Increasing Loudness in Audio Signals: a perceptually motivated approach to preserve audio quality |
4098 | INCREMENTAL CONTEXT AWARE ATTENTIVE KNOWLEDGE TRACING |
9232 | INCREMENTAL TEXT-TO-SPEECH SYNTHESIS USING PSEUDO LOOKAHEAD WITH LARGE PRETRAINED LANGUAGE MODEL |
5708 | INCREMENTAL USER EMBEDDING MODELING FOR PERSONALIZED TEXT CLASSIFICATION |
4562 | Independent Vector Analysis Based Subgroup Identification from Multisubject fMRI data |
9272 | INDEPENDENT VECTOR ANALYSIS VIA LOG-QUADRATICALLY PENALIZED QUADRATIC MINIMIZATION |
1386 | INDIVIDUALIZED HEAR-THROUGH FOR ACOUSTIC TRANSPARENCY USING PCA-BASED SOUND PRESSURE ESTIMATION AT THE EARDRUM |
5159 | INFANT CRYING DETECTION IN REAL-WORLD ENVIRONMENTS |
4486 | INFERGRAD: IMPROVING DIFFUSION MODELS FOR VOCODER BY CONSIDERING INFERENCE IN TRAINING |
2156 | Inferring Camera Intrinsics Based on Surfaces of Revolution: A Single Image Geometric Network Approach for Camera Calibration |
5127 | INFORMATION THEORETIC LIMITS FOR STANDARD AND ONE-BIT COMPRESSED SENSING WITH GRAPH-STRUCTURED SPARSITY |
2343 | Informative Attention Supervision for Grounded Video Description |
7908 | INITIALIZATION-FREE IMPLICIT-FOCUSING (IF2) FOR WIDEBAND DIRECTION-OF-ARRIVAL ESTIMATION |
9016 | INJECTING TEXT AND CROSS-LINGUAL SUPERVISION IN FEW-SHOT LEARNING FROM SELF-SUPERVISED MODELS |
6360 | INSTANTANEOUS LINEAR DIMENSIONALITY REDUCTION OF MULTICHANNEL TIME-SERIES SIGNAL FOR ARRAY SIGNAL PROCESSING |
2184 | INTEGER-ONLY ZERO-SHOT QUANTIZATION FOR EFFICIENT SPEECH RECOGNITION |
8863 | Integrated Sensing and Communications via 5G NR Waveform: Performance Analysis |
2997 | INTEGRATING DEPENDENCY TREE INTO SELF-ATTENTION FOR SENTENCE REPRESENTATION |
5232 | Integrating multiple ASR systems into NLP backend with attention fusion |
2933 | INTEGRATING PRETRAINED LANGUAGE MODEL FOR DIALOGUE POLICY EVALUATION |
4660 | INTEGRATING STATISTICAL UNCERTAINTY INTO NEURAL NETWORK-BASED SPEECH ENHANCEMENT |
3193 | INTEGRATING TEXT INPUTS FOR TRAINING AND ADAPTING RNN TRANSDUCER ASR MODELS |
1431 | INTEGRATION OF ANOMALY MACHINE SOUND DETECTION INTO ACTIVE NOISE CONTROL TO SHAPE THE RESIDUAL SOUND |
2481 | INTEGRATION OF PRE-TRAINED NETWORKS WITH CONTINUOUS TOKEN INTERFACE FOR END-TO-END SPOKEN LANGUAGE UNDERSTANDING |
4372 | Intelligent Wi-Fi Based Child Presence Detection System |
2783 | INTERACTIVE FEATURE FUSION FOR END-TO-END NOISE-ROBUST SPEECH RECOGNITION |
4654 | INTERACTIVE MULTI-LEVEL PROSODY CONTROL FOR EXPRESSIVE SPEECH SYNTHESIS |
5789 | INTERMIX: AN INTERFERENCE-BASED DATA AUGMENTATION AND REGULARIZATION TECHNIQUE FOR AUTOMATIC DEEP SOUND CLASSIFICATION |
5614 | INTERNET STREAMING AUDIO BASED SPEECH RECEPTION THRESHOLD MEASUREMENT IN COCHLEAR IMPLANT USERS |
4975 | INTERPRETABLE IMAGE CLASSIFICATION USING SPARSE OBLIQUE DECISION TREES |
2994 | INTERPRETING INTERMEDIATE CONVOLUTIONAL LAYERS IN UNSUPERVISED ACOUSTIC WORD CLASSIFICATION |
3168 | INVERSE IMAGING WITH GENERATIVE PRIORS VIA LANGEVIN DYNAMICS |
8412 | INVESTIGATING ROBUSTNESS OF BIOLOGICAL VS. BACKPROP BASED LEARNING |
9078 | INVESTIGATING SELF-SUPERVISED LEARNING FOR SPEECH ENHANCEMENT AND SEPARATION |
8766 | INVESTIGATING SEQUENCE-LEVEL NORMALISATION FOR CTC-LIKE END-TO-END ASR |
5588 | INVESTIGATING THE POTENTIAL OF AUXILIARY-CLASSIFIER GANS FOR IMAGE CLASSIFICATION IN LOW DATA REGIMES |
5054 | INVESTIGATION AND COMPARISON OF OPTIMIZATION METHODS FOR VARIATIONAL AUTOENCODER-BASED UNDERDETERMINED MULTICHANNEL SOURCE SEPARATION |
5182 | INVESTIGATION OF ROBUSTNESS OF HUBERT FEATURES FROM DIFFERENT LAYERS TO DOMAIN, ACCENT AND LANGUAGE VARIATIONS |
1606 | INVISIBLE AND EFFICIENT BACKDOOR ATTACKS FOR COMPRESSED DEEP NEURAL NETWORKS |
4619 | IS CROSS-ATTENTION PREFERABLE TO SELF-ATTENTION FOR MULTI-MODAL EMOTION RECOGNITION? |
2331 | ISDA: POSITION-AWARE INSTANCE SEGMENTATION WITH DEFORMABLE ATTENTION |
5268 | ISOMETRIC MT: NEURAL MACHINE TRANSLATION FOR AUTOMATIC DUBBING |
4283 | ISTFTNET: FAST AND LIGHTWEIGHT MEL-SPECTROGRAM VOCODER INCORPORATING INVERSE SHORT-TIME FOURIER TRANSFORM |
3640 | ITERATIVE CHANNEL ESTIMATION AND DATA DETECTION ALGORITHM FOR OTFS MODULATION |
2508 | Iterative Learning for Distorted Image Restoration |
2540 | Iterative Re-weighted Least Squares Algorithms for Non-negative Sparse and Group-sparse Recovery |
1385 | ITERATIVE SELF KNOWLEDGE DISTILLATION --- FROM POTHOLE CLASSIFICATION TO FINE-GRAINED AND COVID RECOGNITION |
1780 | ITOWAVE: ITO STOCHASTIC DIFFERENTIAL EQUATION IS ALL YOU NEED FOR WAVE GENERATION |
2119 | JE2Net: Joint Exploitation and Exploration in Reinforcement Learning Based Image Restoration |
1670 | JMPNET: JOINT MOTION PREDICTION FOR LEARNING-BASED VIDEO COMPRESSION |
5193 | JOINT AND ADVERSARIAL TRAINING WITH ASR FOR EXPRESSIVE SPEECH SYNTHESIS |
1220 | JOINT BEAM SELECTION AND PRECODING BASED ON DIFFERENTIAL EVOLUTION FOR MILLIMETER-WAVE MASSIVE MIMO SYSTEMS |
3791 | Joint calibration and mapping of satellite altimetry data using trainable variational models |
2749 | JOINT CENTRALITY ESTIMATION AND GRAPH IDENTIFICATION FROM MIXTURE OF LOW PASS GRAPH SIGNALS |
1324 | JOINT DUAL-DOMAIN MATRIX FACTORIZATION FOR ECG BIOMETRIC RECOGNITION |
1460 | JOINT EGO-NOISE SUPPRESSION AND KEYWORD SPOTTING ON SWEEPING ROBOTS |
2414 | Joint Far- and Near-End Speech Intelligibility Enhancement based on the Approximated Speech Intelligibility Index |
5079 | Joint Global-Local alignment for domain adaptive semantic segmentation |
1910 | JOINT HYPOGLYCEMIA PREDICTION AND GLUCOSE FORECASTING VIA DEEP MULTI-TASK LEARNING |
3198 | JOINT INFERENCE OF MULTIPLE GRAPHS WITH HIDDEN VARIABLES FROM STATIONARY GRAPH SIGNALS |
7004 | JOINT LEARNING FOR ADDRESSEE SELECTION AND RESPONSE GENERATION IN MULTI-PARTY CONVERSATION |
8931 | JOINT LEARNING OF FEATURE EXTRACTION AND COST AGGREGATION FOR SEMANTIC CORRESPONDENCE |
1095 | Joint magnitude estimation and phase recovery using Cycle-in-Cycle GAN for non-parallel speech enhancement |
3814 | JOINT MODEL ORDER ESTIMATION FOR MULTIPLE TENSORS WITH A COUPLED MODE AND APPLICATIONS TO THE JOINT DECOMPOSITION OF EEG, MEG MAGNETOMETER, AND GRADIOMETER TENSORS |
4764 | JOINT MODELING OF CODE-SWITCHED AND MONOLINGUAL ASR VIA CONDITIONAL FACTORIZATION |
1993 | JOINT MULTIPLE INTENT DETECTION AND SLOT FILLING VIA SELF-DISTILLATION |
3644 | Joint Normality Test via Two-dimensional Projection |
4109 | JOINT RADAR-COMMUNICATIONS PROCESSING FROM A DUAL-BLIND DECONVOLUTION PERSPECTIVE |
4992 | JOINT SOURCE LOCALIZATION AND ASSOCIATION THROUGH OVERCOMPLETE REPRESENTATION UNDER MULTIPATH PROPAGATION ENVIRONMENT |
9275 | Joint Source-Channel Coding for Semantics-Aware Grant-Free Radio Access in IoT Fog Networks |
1991 | JOINT SPEECH RECOGNITION AND AUDIO CAPTIONING |
5563 | JOINT TEMPORAL CONVOLUTIONAL NETWORKS AND ADVERSARIAL DISCRIMINATIVE DOMAIN ADAPTATION FOR EEG-BASED CROSS-SUBJECT EMOTION RECOGNITION |
6203 | JOINT UNSUPERVISED AND SUPERVISED TRAINING FOR MULTILINGUAL ASR |
3422 | KARASINGER: SCORE-FREE SINGING VOICE SYNTHESIS WITH VQ-VAE USING MEL-SPECTROGRAMS |
2446 | K-Converter: An unsupervised Singing Voice Conversion System |
1278 | KERNEL ESTIMATION NETWORK FOR BLIND SUPER-RESOLUTION |
1841 | KEY-SPARSE TRANSFORMER FOR MULTIMODAL SPEECH EMOTION RECOGNITION |
9308 | Kinship Verification Based on Cross-Generation Feature Interaction Learning |
2534 | Knowledge Augmented BERT Mutual Network in Multi-turn Spoken Dialogues |
7593 | KNOWLEDGE DISTILLATION FOR NEURAL TRANSDUCERS FROM LARGE SELF-SUPERVISED PRE-TRAINED MODELS |
1850 | KNOWLEDGE DISTILLATION FROM LANGUAGE MODEL TO ACOUSTIC MODEL: A HIERARCHICAL MULTI-TASK LEARNING APPROACH |
2719 | KNOWLEDGE TRANSFER FROM LARGE-SCALE PRETRAINED LANGUAGE MODELS TO END-TO-END SPEECH RECOGNIZERS |
9287 | KRYLOV-LEVENBERG-MARQUARDT ALGORITHM FOR STRUCTURED TUCKER TENSOR DECOMPOSITIONS |
5206 | LABEL PROPAGATION ACROSS GRAPHS: NODE CLASSIFICATION USING GRAPH NEURAL TANGENT KERNELS |
4203 | LABEL-AWARE RANKED LOSS FOR ROBUST PEOPLE COUNTING USING AUTOMOTIVE IN-CABIN RADAR |
2269 | LABEL-OCCURRENCE-BALANCED MIXUP FOR LONG-TAILED RECOGNITION |
1773 | LANGUAGE ADAPTIVE CROSS-LINGUAL SPEECH REPRESENTATION LEARNING WITH SPARSE SHARING SUB-NETWORKS |
1909 | LARGE-SCALE ASR DOMAIN ADAPTATION USING SELF- AND SEMI-SUPERVISED LEARNING |
1587 | LARGE-SCALE INDEPENDENT COMPONENT ANALYSIS BY SPEEDING UP LIE GROUP TECHNIQUES |
4276 | LARGE-SCALE SELF-SUPERVISED SPEECH REPRESENTATION LEARNING FOR AUTOMATIC SPEAKER VERIFICATION |
5526 | LATENT SPACE SLICING FOR ENHANCED ENTROPY MODELING IN LEARNING-BASED POINT CLOUD GEOMETRY COMPRESSION |
2973 | LATTENTION: LATTICE-ATTENTION IN ASR RESCORING |
2325 | LATTICE RESCORING BASED ON LARGE ENSEMBLE OF COMPLEMENTARY NEURAL LANGUAGE MODELS |
8432 | LATTICEBART: LATTICE-TO-LATTICE PRE-TRAINING FOR SPEECH RECOGNITION |
3172 | LDNET: UNIFIED LISTENER DEPENDENT MODELING IN MOS PREDICTION FOR SYNTHETIC SPEECH |
4099 | Learnable Hypergraph Laplacian for Hypergraph Learning |
4479 | LEARNABLE NONLINEAR COMPRESSION FOR ROBUST SPEAKER VERIFICATION |
6265 | Learnable Wavelet Packet Transform for Data-Adapted Spectrograms |
5008 | LEARNED ACOUSTIC RECONSTRUCTION USING SYNTHETIC APERTURE FOCUSING |
3653 | LEARNING ACOUSTIC FRAME LABELING FOR PHONEME SEGMENTATION WITH REGULARIZED ATTENTION MECHANISM |
2710 | LEARNING ADJUSTABLE IMAGE RESCALING WITH JOINT OPTIMIZATION OF PERCEPTION AND DISTORTION |
4900 | LEARNING APPROACH FOR FAST APPROXIMATE MATRIX FACTORIZATIONS |
5602 | Learning Common Dependency Structure for Unsupervised Cross-Domain NER |
7049 | LEARNING CONTINUOUS REPRESENTATION OF AUDIO FOR ARBITRARY SCALE SUPER RESOLUTION |
3850 | LEARNING CORRELATION FOR ONLINE MULTIPLE OBJECT TRACKING |
3924 | LEARNING DECOUPLING FEATURES THROUGH ORTHOGONALITY REGULARIZATION |
6015 | LEARNING DEEP PATHOLOGICAL FEATURES FOR WSI-LEVEL CERVICAL CANCER GRADING |
3867 | LEARNING DOMAIN-INVARIANT TRANSFORMATION FOR SPEAKER VERIFICATION |
2437 | Learning Expanding Graphs for Signal Interpolation |
8610 | LEARNING FILTERBANKS FOR END-TO-END ACOUSTIC BEAMFORMING |
3215 | LEARNING GAUSSIAN GRAPHICAL MODELS WITH DIFFERING PAIRWISE SAMPLE SIZES |
5106 | LEARNING MONOCULAR 3D HUMAN POSE ESTIMATION WITH SKELETAL INTERPOLATION |
3776 | LEARNING MONOCULAR MESH RECOVERY OF MULTIPLE BODY PARTS VIA SYNTHETICS |
5230 | LEARNING MULTIPLE EXPLAINABLE AND GENERALIZABLE CUES FOR FACE ANTI-SPOOFING |
4530 | LEARNING MUSIC AUDIO REPRESENTATIONS VIA WEAK LANGUAGE SUPERVISION |
1401 | Learning Music Sequence Representation from Text Supervision |
4359 | LEARNING SEMANTIC-ALIGNED FEATURE REPRESENTATION FOR TEXT-BASED PERSON SEARCH |
4727 | LEARNING SOUND LOCALIZATION BETTER FROM SEMANTICALLY SIMILAR SAMPLES |
4483 | LEARNING SPARSE GRAPHS WITH A CORE-PERIPHERY STRUCTURE |
5459 | LEARNING STRUCTURED SPARSITY FOR TIME-FREQUENCY RECONSTRUCTION |
4674 | LEARNING SUBJECT-INVARIANT REPRESENTATIONS FROM SPEECH-EVOKED EEG USING VARIATIONAL AUTOENCODERS |
1948 | LEARNING TASK-SPECIFIC REPRESENTATION FOR VIDEO ANOMALY DETECTION WITH SPATIAL-TEMPORAL ATTENTION |
2634 | LEARNING TO ENHANCE OR NOT: NEURAL NETWORK-BASED SWITCHING OF ENHANCED AND OBSERVED SIGNALS FOR OVERLAPPING SPEECH RECOGNITION |
2801 | LEARNING TO FUSE HETEROGENEOUS FEATURES FOR LOW-LIGHT IMAGE ENHANCEMENT |
5629 | LEARNING TO INTEGRATE VISION DATA INTO ROAD NETWORK DATA |
2834 | LEARNING TO PREDICT SPEECH IN SILENT VIDEOS VIA AUDIOVISUAL ANALOGY |
1835 | LEARNING TO SAMPLE FOR SPARSE SIGNALS |
9298 | LEARNING YOUR HEART ACTIONS FROM PULSE: ECG WAVEFORM RECONSTRUCTION FROM PPG |
1615 | Learning-aided initialization for variational Bayesian DOA estimation |
2798 | LEARNING-BASED PERSONAL SPEECH ENHANCEMENT FOR TELECONFERENCING BY EXPLOITING SPATIAL-SPECTRAL FEATURES |
7953 | LEARNING-BASED RESOURCE ALLOCATION WITH DYNAMIC DATA RATE CONSTRAINTS |
4632 | LEARNINGS FROM FEDERATED LEARNING IN THE REAL WORLD |
9119 | LERPS: LIGHTING ESTIMATION AND RELIGHTING FOR PHOTOMETRIC STEREO |
1336 | LETR: A LIGHTWEIGHT AND EFFICIENT TRANSFORMER FOR KEYWORD SPOTTING |
1943 | LEVERAGING BILINEAR ATTENTION TO IMPROVE SPOKEN LANGUAGE UNDERSTANDING |
3194 | LEVERAGING LOCAL TEMPORAL INFORMATION FOR MULTIMODAL SCENE CLASSIFICATION |
3437 | Leveraging Sparse Coding for EEG Based Emotion Recognition in Shooting |
3032 | LIGHTPOSE: A LIGHTWEIGHT AND EFFICIENT MODEL WITH TRANSFORMER FOR HUMAN POSE ESTIMATION |
4683 | Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition |
4545 | LINEAR-TIME SAMPLING ON SIGNED GRAPHS VIA GERSHGORIN DISC PERFECT ALIGNMENT |
1422 | LIPREADING MODEL BASED ON WHOLE-PART COLLABORATIVE LEARNING |
5180 | LISTEN, KNOW AND SPELL: KNOWLEDGE-INFUSED SUBWORD MODELING FOR IMPROVING ASR PERFORMANCE OF OOV NAMED ENTITIES |
3420 | LiteHAR: LIGHTWEIGHT HUMAN ACTIVITY RECOGNITION FROM WIFI SIGNALS WITH RANDOM CONVOLUTION KERNELS |
3024 | LMS AND NLMS ALGORITHMS FOR THE IDENTIFICATION OF IMPULSE RESPONSES WITH INTRINSIC SYMMETRIC OR ANTISYMMETRIC PROPERTIES |
2270 | LOCAL AND GLOBAL ALIGNMENTS FOR GENERALIZABLE SENSOR-BASED HUMAN ACTIVITY RECOGNITION |
2126 | LOCAL CONTEXT INTERACTION-AWARE GLYPH-VECTORS FOR CHINESE SEQUENCE TAGGING |
4076 | LOCAL INFORMATION MODELING WITH SELF-ATTENTION FOR SPEAKER VERIFICATION |
2681 | LOCAL-GLOBAL FEATURE AGGREGATION FOR LIGHT FIELD IMAGE SUPER-RESOLUTION |
3186 | LOCALIZATION BASED SEQUENTIAL GROUPING FOR CONTINUOUS SPEECH SEPARATION |
9147 | LOCALIZING MORE SOURCES THAN SENSORS IN PRESENCE OF COHERENT SOURCES |
5387 | Locate This, Not That: Class-Conditioned Sound Event DOA Estimation |
2626 | LOCATION-BASED TRAINING FOR MULTI-CHANNEL TALKER-INDEPENDENT SPEAKER SEPARATION |
3216 | LOCUNET: FAST URBAN POSITIONING USING RADIO MAPS AND DEEP LEARNING |
1985 | LOOK, LISTEN AND PAY MORE ATTENTION: FUSING MULTI-MODAL INFORMATION FOR VIDEO VIOLENCE DETECTION |
3120 | LOW COMPLEX ACCURATE MULTI-SOURCE RTF ESTIMATION |
8901 | LOW COMPLEXITY EQUALIZATION FOR AFDM IN DOUBLY DISPERSIVE CHANNELS |
4817 | Low Precision Local Learning for Hardware-friendly Neuromorphic Visual Recognition |
4323 | LOW RESOURCES ONLINE SINGLE-MICROPHONE SPEECH ENHANCEMENT WITH HARMONIC EMPHASIS |
2463 | LOW-COMPLEXITY ATTENTION MODELLING VIA GRAPH TENSOR NETWORKS |
2116 | LOW-COMPLEXITY MULTI-MODEL CNN IN-LOOP FILTER FOR AVS3 |
4306 | LOW-LATENCY HUMAN-COMPUTER AUDITORY INTERFACE BASED ON REAL-TIME VISION ANALYSIS |
1592 | LOW-LIGHT IMAGE ENHANCEMENT VIA FEATURE RESTORATION |
3141 | LOW-RANK PHASE RETRIEVAL WITH STRUCTURED TENSOR MODELS |
4713 | LPC AUGMENT: AN LPC-BASED ASR DATA AUGMENTATION ALGORITHM FOR LOW AND ZERO-RESOURCE CHILDREN’S DIALECTS |
4055 | LRPD: LARGE REPLAY PARALLEL DATASET |
3090 | L-SpEx: Localized Target Speaker Extraction |
3526 | M2MeT: THE ICASSP 2022 MULTI-CHANNEL MULTI-PARTY MEETING TRANSCRIPTION CHALLENGE |
9242 | MACRO: Multi-Attention Convolutional Recurrent Model for Subject-Independent ERP Detection |
5662 | MAG+: AN EXTENDED MULTIMODAL ADAPTATION GATE FOR MULTIMODAL SENTIMENT ANALYSIS |
4774 | MAGIC DUST FOR CROSS-LINGUAL ADAPTATION OF MONOLINGUAL WAV2VEC-2.0 |
1548 | MAKD: MULTIPLE AUXILIARY KNOWLEDGE DISTILLATION |
5202 | MAKING THE UNKNOWN MORE CERTAIN: A STACKED ENSEMBLE CLASSIFIER FOR OPEN GESTURE RECOGNITION WITH A SOCIAL ROBOT |
3787 | MA-NET: MULTI-SCALE ATTENTION-AWARE NETWORK FOR OPTICAL FLOW ESTIMATION |
3099 | MANIFOLD LEARNING-SUPPORTED ESTIMATION OF RELATIVE TRANSFER FUNCTIONS FOR SPATIAL FILTERING |
1063 | MANNER: MULTI-VIEW ATTENTION NETWORK FOR NOISE ERASURE |
5279 | MANNET: A LARGE-SCALE MANIPULATED IMAGE DETECTION DATASET AND BASELINE EVALUATIONS |
5657 | MAP: MULTISPECTRAL ADVERSARIAL PATCH TO ATTACK PERSON DETECTION |
1709 | MASK-BASED ATTENTION PARALLEL NETWORK FOR IN-THE-WILD FACIAL EXPRESSION RECOGNITION |
8427 | MASKED ACOUSTIC UNIT FOR MISPRONUNCIATION DETECTION AND CORRECTION |
5061 | MASSIVE UNSOURCED RANDOM ACCESS BASED ON BILINEAR VECTOR APPROXIMATE MESSAGE PASSING |
8995 | MASSIVELY MULTILINGUAL ASR: A LIFELONG LEARNING SOLUTION |
9281 | MATCHED MANIFOLD DETECTION FOR GROUP-INVARIANT REGISTRATION AND CLASSIFICATION OF IMAGES |
1595 | Matching Point Sets with Quantum Circuit Learning |
5653 | MATERIAL-GUIDED SIAMESE FUSION NETWORK FOR HYPERSPECTRAL OBJECT TRACKING |
4811 | MATRIX DECOMPOSITION ON GRAPHS: A SIMPLIFIED FUNCTIONAL VIEW |
5164 | MAXIMIZING AUDIO EVENT DETECTION MODEL PERFORMANCE ON SMALL DATASETS THROUGH KNOWLEDGE TRANSFER, DATA AUGMENTATION, AND PRETRAINING: AN ABLATION STUDY |
2637 | MAXIMUM BATCH FROBENIUS NORM FOR MULTI-DOMAIN TEXT CLASSIFICATION |
9251 | MAXIMUM LIKELIHOOD SENSOR ARRAY CALIBRATION USING NON-APPROXIMATE HESSION MATRIX |
9090 | MBA-RainGAN: A Multi-branch Attention Generative Adversarial Network for Mixture of Rain Removal |
1200 | MBNET: A MULTI-RESOLUTION BRANCH NETWORK FOR SEMANTIC SEGMENTATION OF ULTRA-HIGH RESOLUTION IMAGES |
5468 | MEJIGCLU: MORE EFFECTIVE JIGSAW CLUSTERING FOR UNSUPERVISED VISUAL REPRESENTATION LEARNING |
4101 | MELONS: GENERATING MELODY WITH LONG-TERM STRUCTURE USING TRANSFORMERS AND STRUCTURE GRAPH |
3251 | MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition |
2303 | Memory in Echo State Networks and the Controllability Matrix rank |
4502 | MEMORY-BASED MESSAGE PASSING: DECOUPLING THE MESSAGE FOR PROPAGATION FROM DISCRIMINATION |
1651 | Message Passing-based Cooperative Localization with embedded Particle Flow |
1454 | META TALK: LEARNING TO DATA-EFFICIENTLY GENERATE AUDIO-DRIVEN LIP-SYNCHRONIZED TALKING FACE WITH HIGH DEFINITION |
3783 | MetricBERT: Text Representation Learning via Self-Supervised Triplet Training |
2867 | METRICGAN-U: UNSUPERVISED SPEECH ENHANCEMENT/DEREVERBERATION BASED ONLY ON NOISY/REVERBERATED SPEECH |
3973 | MFA: TDNN WITH MULTI-SCALE FREQUENCY-CHANNEL ATTENTION FOR TEXT-INDEPENDENT SPEAKER VERIFICATION WITH SHORT UTTERANCES |
8913 | MIMO Detection by Variational Posterior Inference |
1627 | MINIMIZING RESIDUALS FOR NATIVE-NONNATIVE VOICE CONVERSION IN A SPARSE, ANCHOR-BASED REPRESENTATION OF SPEECH |
8996 | MINIMUM WORD ERROR TRAINING FOR NON-AUTOREGRESSIVE TRANSFORMER-BASED CODE-SWITCHING ASR |
4174 | MINING HARD SAMPLES LOCALLY AND GLOBALLY FOR IMPROVED SPEECH SEPARATION |
2557 | MISMATCHED SUPERVISED LEARNING |
1952 | Mitigating Closed-model Adversarial Examples with Bayesian Neural Modeling for Enhanced End-to-End Speech Recognition |
1904 | MIXED IN TIME AND MODALITY: CURSE OR BLESSING? CROSS-INSTANCE DATA AUGMENTATION FOR WEAKLY SUPERVISED MULTIMODAL TEMPORAL FUSION |
5465 | MIXED KNOWLEDGE RELATION TRANSFORMER FOR IMAGE CAPTIONING |
2919 | MIXED PRECISION DNN QUANTIZATION FOR OVERLAPPED SPEECH SEPARATION AND RECOGNITION |
5270 | MIXED TRANSFORMER U-NET FOR MEDICAL IMAGE SEGMENTATION |
4242 | MIXER-TTS: NON-AUTOREGRESSIVE, FAST AND COMPACT TEXT-TO-SPEECH MODEL CONDITIONED ON LANGUAGE MODEL EMBEDDINGS |
4887 | MIXTURE MODEL AUTO-ENCODERS: DEEP CLUSTERING THROUGH DICTIONARY LEARNING |
3994 | MLP-SVNET: A MULTI-LAYER PERCEPTRONS BASED NETWORK FOR SPEAKER VERIFICATION |
4078 | MM-DFN: Multimodal Dynamic Fusion Network For Emotion Recognition in Conversations |
4506 | MMLATCH: BOTTOM-UP TOP-DOWN FUSION FOR MULTIMODAL SENTIMENT ANALYSIS |
8832 | MODEL SELECTION VIA MISSPECIFIED CRAMER-RAO BOUND MINIMIZATION |
1229 | MODEL-BASED APPROACH FOR MEASURING THE FAIRNESS IN ASR |
2978 | MODEL-BASED ONLINE LEARNING FOR RESOURCE SHARING IN JOINT RADAR-COMMUNICATION SYSTEMS |
4684 | MODEL-BASED RECONSTRUCTION FOR COLLIMATED BEAM ULTRASOUND SYSTEMS |
3454 | MODELING BEATS AND DOWNBEATS WITH A TIME-FREQUENCY TRANSFORMER |
6422 | MODELING HUMAN MEMORY IN MULTI-OBJECT TRACKING WITH TRANSFORMERS |
2539 | Modeling Intention, Emotion and External World in Dialogue Systems |
3104 | MODELING OF PRE-TRAINED NEURAL NETWORK EMBEDDINGS LEARNED FROM RAW WAVEFORM FOR COVID-19 INFECTION DETECTION |
3335 | Modeling the Detection Capability of High-Speed Spiking Cameras |
1397 | MODERNN: TOWARDS FINE-GRAINED MOTION DETAILS FOR SPATIOTEMPORAL PREDICTIVE LEARNING |
9154 | MODULO EVENT-DRIVEN SAMPLING: SYSTEM IDENTIFICATION AND HARDWARE EXPERIMENTS |
1084 | Monocular Vehicle 3D Bounding Box Estimation Using Homography and Geometry in Traffic Scene |
4799 | Monotonic Generalized Nash Games with Application to the Management of Energy-Aware Aloha Networks |
3977 | MOS Predictor for Synthetic Speech with I-vector Inputs |
2079 | MOTIF-TOPOLOGY AND REWARD-LEARNING IMPROVED SPIKING NEURAL NETWORK FOR EFFICIENT MULTI-SENSORY INTEGRATION |
9254 | Moving Source Localization in Passive Sensor Network With Location Uncertainty |
4693 | MRI RECOVERY WITH A SELF-CALIBRATED DENOISER |
1578 | MSDTRON: A HIGH-CAPABILITY MULTI-SPEAKER SPEECH SYNTHESIS SYSTEM FOR DIVERSE DATA USING CHARACTERISTIC INFORMATION |
5989 | MS-ROCANET: MULTI-SCALE RESIDUAL ORTHOGONAL-CHANNEL ATTENTION NETWORK FOR SCENE TEXT DETECTION |
8471 | MTAF: SHOPPING GUIDE MICRO-VIDEOS POPULARITY PREDICTION USING MULTIMODAL AND TEMPORAL ATTENTION FUSION APPROACH |
4063 | MULTI-ACCDOA: LOCALIZING AND DETECTING OVERLAPPING SOUNDS FROM THE SAME CLASS WITH AUXILIARY DUPLICATING PERMUTATION INVARIANT TRAINING |
3823 | MULTIBAND IMAGE FUSION WITH CONTROLLABLE ERROR GUARANTEES |
1440 | MULTI-CHANNEL ATTENTIVE GRAPH CONVOLUTIONAL NETWORK WITH SENTIMENT FUSION FOR MULTIMODAL SENTIMENT ANALYSIS |
1718 | MULTI-CHANNEL END-TO-END NEURAL DIARIZATION WITH DISTRIBUTED MICROPHONES |
5309 | MULTI-CHANNEL MULTI-SPEAKER ASR USING 3D SPATIAL FEATURE |
2406 | MULTI-CHANNEL NARROW-BAND DEEP SPEECH SEPARATION WITH FULL-BAND PERMUTATION INVARIANT TRAINING |
7414 | MULTICHANNEL NOISE REDUCTION USING DILATED MULTICHANNEL U-NET AND PRE-TRAINED SINGLE-CHANNEL NETWORK |
1832 | MULTI-CHANNEL SPEAKER DIARIZATION USING SPATIAL FEATURES FOR MEETINGS |
4593 | MULTI-CHANNEL SPEAKER VERIFICATION WITH CONV-TASNET BASED BEAMFORMER |
4723 | MULTI-CHANNEL SPEECH DENOISING FOR MACHINE EARS |
5674 | Multichannel Speech Enhancement without Beamforming |
1684 | MULTI-DOMAIN UNPAIRED ULTRASOUND IMAGE ARTIFACT REMOVAL USING A SINGLE CONVOLUTIONAL NEURAL NETWORK |
5719 | MULTI-DOMAIN UNSUPERVISED IMAGE-TO-IMAGE TRANSLATION WITH APPEARANCE ADAPTIVE CONVOLUTION |
4403 | MULTI-FEATURE INTEGRATION FOR SPEAKER EMBEDDING EXTRACTION |
3852 | Multi-Focus Guided Semantic Aggregation for Video Object Detection |
2392 | MULTI-FRAME FULL-RANK SPATIAL COVARIANCE ANALYSIS FOR UNDERDETERMINED BSS IN REVERBERANT ENVIRONMENTS |
3972 | Multi-frame super-resolution with raw images via modified deformable convolution |
5178 | Multi-Head ReLU Implicit Neural Representation Networks |
2384 | MULTI-HIERARCHY PROXY STRUCTURE FOR DEEP METRIC LEARNING |
2102 | MULTI-LEVEL CONTRASTIVE LEARNING FOR CROSS-LINGUAL ALIGNMENT |
5353 | MULTI-LEVEL RELATION AWARE NETWORK FOR PERSON RE-IDENTIFICATION |
3005 | MULTI-LEVEL SPATIAL-TEMPORAL ADAPTATION NETWORK FOR MOTOR IMAGERY CLASSIFICATION |
3413 | MULTI-LINGUAL MULTI-TASK SPEECH EMOTION RECOGNITION USING WAV2VEC 2.0 |
4823 | MULTILINGUAL SECOND-PASS RESCORING FOR AUTOMATIC SPEECH RECOGNITION SYSTEMS |
5014 | MULTILINGUAL TEXT-TO-SPEECH TRAINING USING CROSS LANGUAGE VOICE CONVERSION AND SELF-SUPERVISED LEARNING OF SPEECH REPRESENTATIONS |
2091 | MULTI-MODAL ACOUSTIC-ARTICULATORY FEATURE FUSION FOR DYSARTHRIC SPEECH RECOGNITION |
9321 | MULTIMODAL DATA FUSION IN HIGH-DIMENSIONAL HETEROGENEOUS DATASETS VIA GENERATIVE MODELS |
3649 | MULTIMODAL DEPRESSION CLASSIFICATION USING ARTICULATORY COORDINATION FEATURES AND HIERARCHICAL ATTENTION BASED TEXT EMBEDDINGS |
3721 | MULTI-MODAL EMOTION RECOGNITION WITH SELF-GUIDED MODALITY CALIBRATION |
8858 | MULTIMODAL EMOTION RECOGNITION WITH SURGICAL AND FABRIC MASKS |
8835 | MULTIMODAL EVALUATION METHOD FOR SOUND EVENT DETECTION |
5999 | MULTIMODAL GRAPH SIGNAL DENOISING VIA TWOFOLD GRAPH SMOOTHNESS REGULARIZATION WITH DEEP ALGORITHM UNROLLING |
6085 | MULTI-MODAL LEARNING WITH TEXT MERGING FOR TEXTVQA |
4449 | MULTI-MODAL PRE-TRAINING FOR AUTOMATED SPEECH RECOGNITION |
5352 | MULTI-MODAL RECURRENT FUSION FOR INDOOR LOCALIZATION |
4952 | Multimodal Sentiment Analysis on Unaligned Sequences via Holographic Embedding |
5937 | Multimodal Transformer With Learnable Frontend and Self Attention for Emotion Recognition |
4941 | MULTIPLE INSTANCE LEARNING WITH TASK-SPECIFIC MULTI-LEVEL FEATURES FOR WEAKLY ANNOTATED HISTOPATHOLOGICAL IMAGE CLASSIFICATION |
1565 | MULTIPLE KERNEL K-MEANS CLUSTERING WITH SIMULTANEOUS SPECTRAL ROTATION |
4566 | MULTIPLE OFFSETS MULTILATERATION: A NEW PARADIGM FOR SENSOR NETWORK CALIBRATION WITH UNSYNCHRONIZED REFERENCE NODES |
2812 | MULTIPLE PATCH-AWARE NETWORK FOR FASTER REAL-WORLD IMAGE DEHAZING |
2907 | MULTIPLE TEMPORAL CONTEXT EMBEDDING NETWORKS FOR UNSUPERVISED TIME SERIES ANOMALY DETECTION |
4890 | Multiplication-Avoiding Variant of Power Iteration with Applications |
2762 | MULTI-POSE VIRTUAL TRY-ON VIA SELF-ADAPTIVE FEATURE FILTERING |
6973 | MULTI-QUERY MULTI-HEAD ATTENTION POOLING AND INTER-TOPK PENALTY FOR SPEAKER VERIFICATION |
4548 | MULTI-RELATION MESSAGE PASSING FOR MULTI-LABEL TEXT CLASSIFICATION |
2442 | MULTI-ROLE EVENT ARGUMENT EXTRACTION AS MACHINE READING COMPREHENSION WITH ARGUMENT MATCH OPTIMIZATION |
2108 | MULTI-SAMPLE SUBBAND WAVERNN VIA MULTIVARIATE GAUSSIAN |
3426 | Multiscale attention aggregation network for 2D vessel segmentation |
3920 | MULTISCALE CROWD COUNTING AND LOCALIZATION BY MULTITASK POINT SUPERVISION |
2421 | MULTI-SCALE REINFORCEMENT LEARNING STRATEGY FOR OBJECT DETECTION |
3759 | MULTI-SCALE SPEAKER EMBEDDING-BASED GRAPH ATTENTION NETWORKS FOR SPEAKER DIARISATION |
9286 | Multi-Sensor Network Information for Linear-Gaussian Multi-Target Tracking Systems |
1511 | MULTI-SPEAKER PITCH TRACKING VIA EMBODIED SELF-SUPERVISED LEARNING |
1691 | MULTI-STAGE GRAPH REPRESENTATION LEARNING FOR DIALOGUE-LEVEL SPEECH EMOTION RECOGNITION |
5745 | MULTISTREAM NEURAL ARCHITECTURES FOR CUED SPEECH RECOGNITION USING A PRE-TRAINED VISUAL FEATURE EXTRACTOR AND CONSTRAINED CTC DECODING |
8061 | MULTISV: DATASET FOR FAR-FIELD MULTI-CHANNEL SPEAKER VERIFICATION |
4388 | MULTI-TASK FMRI DATA FUSION USING IVA AND PARAFAC2 |
4648 | MULTI-TASK GAUSSIAN PROCESS REGRESSION FOR THE DETECTION OF SLEEP CYCLES IN PREMATURE INFANTS |
1760 | Multitask Gaussian Process with Hierarchical Latent Interactions |
8778 | MULTI-TASK LEARNING IMPROVES SYNTHETIC SPEECH DETECTION |
1183 | MULTI-TASK LEARNING IMPROVES THE BRAIN STROKE LESION SEGMENTATION |
2335 | MULTI-TASK RNN-T WITH SEMANTIC DECODER FOR STREAMABLE SPOKEN LANGUAGE UNDERSTANDING |
1994 | MULTITASK SPARSE NEURAL NETWORK FOR HYPERSPECTRAL IMAGE DENOISING |
2161 | MULTI-TASK VOICE ACTIVATED FRAMEWORK USING SELF-SUPERVISED LEARNING |
8709 | MULTI-TURN INCOMPLETE UTTERANCE RESTORATION AS OBJECT DETECTION |
2596 | MULTI-TURN RNN-T FOR STREAMING RECOGNITION OF MULTI-PARTY SPEECH |
4332 | MULTIVARIATE MULTISCALE COSINE SIMILARITY ENTROPY |
8466 | MULTI-VIEW AND MULTI-MODAL EVENT DETECTION UTILIZING TRANSFORMER-BASED MULTI-SENSOR FUSION |
1682 | MULTI-VIEW DATA REPRESENTATION VIA DEEP AUTOENCODER-LIKE NONNEGATIVE MATRIX FACTORIZATION |
2837 | MULTI-VIEW INFORMATION BOTTLENECK WITHOUT VARIATIONAL APPROXIMATION |
5980 | MULTI-VIEW LEARNING BASED ON NON-REDUNDANT FUSION FOR ICU PATIENT MORTALITY PREDICTION |
1483 | MULTIVIEW LONG-SHORT SPATIAL CONTRASTIVE LEARNING FOR 3D MEDICAL IMAGE ANALYSIS |
4352 | MULTI-VIEW SELF-ATTENTION BASED TRANSFORMER FOR SPEAKER RECOGNITION |
5139 | MUSIC ENHANCEMENT VIA IMAGE TRANSLATION AND VOCODING |
9011 | Music Identification Using brain responses to Initial Snippets |
2168 | MUSIC PHRASE INPAINTING USING LONG-TERM REPRESENTATION AND CONTRASTIVE LOSS |
4304 | MUSIC SOURCE SEPARATION WITH DEEP EQUILIBRIUM MODELS |
1352 | MUSICYOLO: A SIGHT-SINGING ONSET/OFFSET DETECTION FRAMEWORK BASED ON OBJECT DETECTION INSTEAD OF SPECTRUM FRAMES |
4328 | Natural-looking Adversarial Examples from Freehand Sketches |
1189 | NAVIGATING AUDIO-VISUAL EVENT DETECTION ACROSS MISMATCHED MODALITIES |
3296 | NEAREST SUBSPACE SEARCH IN THE SIGNED CUMULATIVE DISTRIBUTION TRANSFORM SPACE FOR 1D SIGNAL CLASSIFICATION |
9288 | Near-field Tracking with Large Antenna Arrays: Fundamental Limits and Practical Algorithms |
1893 | NEARTRACKER: ACOUSTIC 2-D TARGET TRACKING WITH NEARBY REFLECTOR IN SISO SYSTEM |
1675 | NEIGHBOR-AUGMENTED TRANSFORMER-BASED EMBEDDING FOR RETRIEVAL |
2194 | NEUFA: NEURAL NETWORK BASED END-TO-END FORCED ALIGNMENT WITH BIDIRECTIONAL ATTENTION MECHANISM |
3179 | Neural Architecture Search for Speech Emotion Recognition |
4569 | NEURAL AUDIO-TO-SCORE MUSIC TRANSCRIPTION FOR UNCONSTRAINED POLYPHONY USING COMPACT OUTPUT REPRESENTATIONS |
4551 | NEURAL CASCADE ARCHITECTURE FOR JOINT ACOUSTIC ECHO AND NOISE SUPPRESSION |
4758 | Neural Collapse in Deep Homogeneous Classifiers and the role of Weight Decay |
9302 | NEURAL FULL-RANK SPATIAL COVARIANCE ANALYSIS FOR BLIND SOURCE SEPARATION |
3826 | NEURAL GRAPHEME-TO-PHONEME CONVERSION WITH PRE-TRAINED GRAPHEME MODELS |
1031 | NEURAL HMMS ARE ALL YOU NEED (FOR HIGH-QUALITY ATTENTION-FREE TTS) |
5748 | NEURAL NETWORK-BASED COMPRESSION FRAMEWORK FOR DOA ESTIMATION EXPLOITING DISTRIBUTED ARRAY |
4600 | Neural Speech Synthesis on a Shoestring: Improving the Efficiency of LPCNet |
5184 | NEURAL-FST CLASS LANGUAGE MODEL FOR END-TO-END SPEECH RECOGNITION |
1728 | NEW IMPROVED CRITERION FOR MODEL SELECTION IN SPARSE HIGH-DIMENSIONAL LINEAR REGRESSION MODELS |
5690 | NEWS RECOMMENDATION VIA MULTI-INTEREST NEWS SEQUENCE MODELLING |
8951 | NEX+: NOVEL VIEW SYNTHESIS WITH NEURAL REGULARISATION OVER MULTI-PLANE IMAGES |
8773 | NFT-K: NON-FUNGIBLE TANGENT KERNELS |
1745 | NN3A: NEURAL NETWORK SUPPORTED ACOUSTIC ECHO CANCELLATION, NOISE SUPPRESSION AND AUTOMATIC GAIN CONTROL FOR REAL-TIME COMMUNICATIONS |
5059 | nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-shot Multi-speaker Text-to-Speech |
8841 | No More Than 6ft Apart: Robust K-Means via Radius Upper Bounds |
1882 | NODE SLICING BROAD LEARNING SYSTEM FOR TEXT CLASSIFICATION |
2482 | NODE-SCREENING TESTS FOR THE L0-PENALIZED LEAST-SQUARES PROBLEM |
1234 | NOISE SUPPRESSION FOR IMPROVED FEW-SHOT LEARNING |
4340 | NOISE-ROBUST SPEECH RECOGNITION WITH 10 MINUTES UNPARALLELED IN-DOMAIN DATA |
6103 | NON-AUTOREGRESSIVE ASR WITH SELF-CONDITIONED FOLDED ENCODERS |
5005 | NON-AUTOREGRESSIVE END-TO-END AUTOMATIC SPEECH RECOGNITION INCORPORATING DOWNSTREAM NATURAL LANGUAGE PROCESSING |
1114 | NON-AUTOREGRESSIVE TRANSFORMER WITH UNIFIED BIDIRECTIONAL DECODER FOR AUTOMATIC SPEECH RECOGNITION |
9235 | NON-BAYESIAN ESTIMATION FRAMEWORK FOR SIGNAL RECOVERY ON GRAPHS |
8639 | NON-INVASIVE BLOOD PRESSURE MONITORING WITH MULTI-MODAL IN-EAR SENSING |
9000 | Nonlinear signal decomposition based on block sparse approximation |
2839 | NON-RIGID TRANSFORMATION BASED ADVERSARIAL ATTACK AGAINST 3D OBJECT TRACKING |
4472 | NONVERBAL SOUND DETECTION FOR DISORDERED SPEECH |
2350 | NO-REFERENCE QUALITY ASSESSMENT OF VARIABLE FRAME-RATE VIDEOS USING TEMPORAL BANDPASS STATISTICS |
4780 | NOT ALL FEATURES ARE EQUAL: SELECTION OF ROBUST FEATURES FOR SPEECH EMOTION RECOGNITION IN NOISY ENVIRONMENTS |
5707 | Novel Class Discovery: A Dependency Approach |
4557 | NOVEL INSTANCE MINING WITH PSEUDO-MARGIN EVALUATION FOR FEW-SHOT OBJECT DETECTION |
2020 | NVC-NET: END-TO-END ADVERSARIAL VOICE CONVERSION |
4826 | OBJECT DETECTION AND TRACKING IN ULTRASOUND SCANS USING AN OPTICAL FLOW AND SEMANTIC SEGMENTATION FRAMEWORK BASED ON CONVOLUTIONAL NEURAL NETWORKS |
3792 | OBJECT-ORIENTED BACKDOOR ATTACK AGAINST IMAGE CAPTIONING |
3515 | OCCLUDED PERSON RE-IDENTIFICATION VIA RELATIONAL ADAPTIVE FEATURE CORRECTION LEARNING |
2981 | OFF-THE-GRID COVARIANCE-BASED SUPER-RESOLUTION FLUCTUATION MICROSCOPY |
2925 | OFF-THE-SHELF DEEP INTEGRATION FOR RESIDUAL-ECHO SUPPRESSION |
5247 | OMNI-SPARSITY DNN: FAST SPARSITY OPTIMIZATION FOR ON-DEVICE STREAMING E2E ASR VIA SUPERNET |
3234 | On Adversarial Robustness of Large-scale Audio Visual Learning |
4875 | ON CONTINUOUS-DOMAIN INVERSE PROBLEMS WITH SPARSE SUPERPOSITIONS OF DECAYING SINUSOIDS AS SOLUTIONS |
9314 | ON DATA AUGMENTATION FOR GAN TRAINING |
3897 | ON FEDERATED LEARNING WITH ENERGY HARVESTING CLIENTS |
2772 | ON IDENTIFIABLE POLYTOPE CHARACTERIZATION FOR POLYTOPIC MATRIX FACTORIZATION |
4436 | On Language Model Integration for RNN Transducer based Speech Recognition |
3636 | ON LOSS FUNCTIONS AND EVALUATION METRICS FOR MUSIC SOURCE SEPARATION |
3720 | ON MINI-BATCH TRAINING WITH VARYING LENGTH TIME SERIES |
2452 | ON SPECTRAL AND TEMPORAL SPARSIFICATION OF SPEECH SIGNALS FOR THE IMPROVEMENT OF SPEECH PERCEPTION IN CI LISTENERS |
9256 | ON STABILITY AND CONVERGENCE OF DISTRIBUTED FILTERS |
8839 | On Submodular Set Cover Problems For Near-Optimal Online Kernel Basis Selection |
4381 | ON SYNCHRONIZATION OF WIRELESS ACOUSTIC SENSOR NETWORKS IN THE PRESENCE OF TIME-VARYING SAMPLING RATE OFFSETS AND SPEAKER CHANGES |
2292 | ON THE ACQUISITION OF STATIONARY SIGNALS USING UNIFORM ADCS |
4491 | ON THE CONVERGENCE OF ADAM-TYPE ALGORITHMS FOR SOLVING STRUCTURED SINGLE NODE AND DECENTRALIZED MIN-MAX SADDLE POINT GAMES |
4650 | On the Effectiveness of Active Learning by Uncertainty Sampling in Classification of High-Dimensional Gaussian Mixture Data |
2237 | On the false alarm probability of the Normalized Matched Filter for off-grid target detection |
6132 | ON THE IMPACT OF NORMALIZATION STRATEGIES IN UNSUPERVISED ADVERSARIAL DOMAIN ADAPTATION FOR ACOUSTIC SCENE CLASSIFICATION |
3902 | ON THE IMPORTANCE OF DIFFERENT FREQUENCY BINS FOR SPEAKER VERIFICATION |
4652 | ON THE INTERPLAY BETWEEN SPARSITY, NATURALNESS, INTELLIGIBILITY, AND PROSODY IN SPEECH SYNTHESIS |
1122 | ON THE OBSERVABILITY IN VISUAL SLAM NETWORKS |
1713 | ON THE POTENTIAL OF SPATIALLY-SPREAD ORTHOGONAL TIME FREQUENCY SPACE MODULATION FOR ISAC TRANSMISSIONS |
4762 | ON THE PREDICTION OF THE FREQUENCY RESPONSE OF A WOODEN PLATE FROM ITS MECHANICAL PARAMETERS |
1311 | ON THE RELAXATION OF ORTHOGONAL TENSOR RANK AND ITS NONCONVEX RIEMANNIAN OPTIMIZATION FOR TENSOR COMPLETION |
9258 | ON THE SIZE AND REDUNDANCY OF THE FOURTH-ORDER DIFFERENCE CO-ARRAY |
5208 | ON THE STABILITY OF LOW PASS GRAPH FILTER WITH A LARGE NUMBER OF EDGE REWIRES |
3353 | ON THE USE OF COMPONENT STRUCTURAL CHARACTERISTICS FOR VOXEL SEGMENTATION IN SEMICON 3D IMAGES |
2808 | ON THE USE OF GEODESIC TRIANGLES BETWEEN GAUSSIAN DISTRIBUTIONS FOR CLASSIFICATION PROBLEMS |
1522 | ONE MODEL TO ENHANCE THEM ALL: ARRAY GEOMETRY AGNOSTIC MULTI-CHANNEL PERSONALIZED SPEECH ENHANCEMENT |
3314 | ONE TTS ALIGNMENT TO RULE THEM ALL |
9280 | ONE-CLASS LEARNING TOWARDS SYNTHETIC VOICE SPOOFING DETECTION |
3954 | ONE-SHOT VOICE CONVERSION FOR STYLE TRANSFER BASED ON SPEAKER ADAPTATION |
3969 | ONLINE CONTINUAL LEARNING USING ENHANCED RANDOM VECTOR FUNCTIONAL LINK NETWORKS |
9030 | ONLINE DETECTION OF SCALP-INVISIBLE MESIAL-TEMPORAL BRAIN INTERICTAL EPILEPTIFORM DISCHARGES FROM EEG |
2948 | ONLINE ECG BIOMETRICS VIA HADAMARD CODE |
4725 | ONLINE LEARNING FOR LATENT YULE-SIMON PROCESSES |
3457 | Online Learning with Probabilistic Feedback |
9311 | ONLINE TRAINING OF STEREO SELF-CALIBRATION USING MONOCULAR DEPTH ESTIMATION |
3143 | OPENFEAT: IMPROVING SPEAKER IDENTIFICATION BY OPEN-SET FEW-SHOT EMBEDDING ADAPTATION WITH TRANSFORMER |
8813 | Operator Formulation for Linear Transformations and Signal Estimation in the Joint Spatial-Slepian Domain |
2223 | OPTE: ONLINE PER-TITLE ENCODING FOR LIVE VIDEO STREAMING |
1704 | OPTIMAL COMBINATION POLICIES FOR ADAPTIVE SOCIAL LEARNING |
2730 | OPTIMAL QOS-AWARE NETWORK SLICING FOR SERVICE-ORIENTED NETWORKS WITH FLEXIBLE ROUTING |
3176 | OPTIMAL RESOURCE ALLOCATION AND BEAMFORMING FOR TWO-USER MISO WPCNs FOR A NON-LINEAR CIRCUIT-BASED EH MODEL |
4896 | Optimization Guarantees for ISTA and ADMM based Unfolded Networks |
8975 | Optimization of a Fixed Virtual Sensing Feedback ANC Controller for In-Ear Headphones with Multiple Loudspeakers |
8164 | Optimization of compressive light field display in dual-guided learning |
5422 | OPTIMIZE WAV2VEC2S ARCHITECTURE FOR SMALL TRAINING SET THROUGH ANALYZING ITS PRE-TRAINED MODELS ATTENTION PATTERN |
4227 | OPTIMIZING ALIGNMENT OF SPEECH AND LANGUAGE LATENT SPACES FOR END-TO-END SPEECH RECOGNITION AND UNDERSTANDING |
2786 | Optimizing Latent Space Directions For GAN-based Local Image Editing |
2533 | OPTIMIZING THE CONSUMPTION OF SPIKING NEURAL NETWORKS WITH ACTIVITY REGULARIZATION |
4736 | OPTM3SEC: OPTIMIZING MULTICAST IRS-AIDED MULTIANTENNA DFRC SECRECY CHANNEL WITH MULTIPLE EAVESDROPPERS |
2605 | ORCA-PARTY: AN AUTOMATIC KILLER WHALE SOUND TYPE SEPARATION TOOLKIT USING DEEP LEARNING |
3427 | ORTHOGONAL NONNEGATIVE MATRIX TRI-FACTORIZATION FOR COMMUNITY DETECTION IN MULTIPLEX NETWORKS |
2521 | OT CLEANER: LABEL CORRECTION AS OPTIMAL TRANSPORT |
5245 | OUT-OF-DISTRIBUTION AS A TARGET CLASS IN SEMI-SUPERVISED LEARNING |
5847 | OVER-PARAMETERIZED NETWORK SOLVES PHASE RETRIEVAL EFFECTIVELY |
4888 | OVER-THE-AIR PERSONALIZED FEDERATED LEARNING |
1544 | PAIR-LEVEL SUPERVISED CONTRASTIVE LEARNING FOR NATURAL LANGUAGE INFERENCE |
1867 | PAMA-TTS: PROGRESSION-AWARE MONOTONIC ATTENTION FOR STABLE SEQ2SEQ TTS WITH ACCURATE PHONEME DURATION CONTROL |
4006 | PANCHROMATIC IMAGERY COPY-PASTE LOCALIZATION THROUGH DATA-DRIVEN SENSOR ATTRIBUTION |
1612 | Parallel Composition of Weighted Finite-State Transducers |
3027 | Parameter Estimation in Sparse Inverse Problems using Bernoulli-Gaussian Prior |
1047 | PARAMETER-FREE STYLE PROJECTION FOR ARBITRARY IMAGE STYLE TRANSFER |
1526 | PARAMETRIC MODELING OF HUMAN WRIST FOR BIOIMPEDANCE-BASED PHYSIOLOGICAL SENSING |
1664 | PARAMETRIC MODELS FOR DOA TRAJECTORY LOCALIZATION |
6324 | PARTIAL ARITHMETIC CONSENSUS BASED DISTRIBUTED INTENSITY PARTICLE FLOW SMC-PHD FILTER FOR MULTI-TARGET TRACKING |
4753 | PARTIAL VARIABLE TRAINING FOR EFFICIENT ON-DEVICE FEDERATED LEARNING |
2577 | PARTIALLY RELAXED ORTHOGONAL LEAST SQUARES WEIGHTED SUBSPACE FITTING DIRECTION-OF-ARRIVAL ESTIMATION |
2655 | PART-OF-SPEECH MODELS COMPRESSION METHODS FOR ON-DEVICE GRAPHEME-TO-PHONEME CONVERSION |
1256 | PAS-MEF: MULTI-EXPOSURE IMAGE FUSION BASED ON PRINCIPAL COMPONENT ANALYSIS, ADAPTIVE WELL-EXPOSEDNESS AND SALIENCY MAP |
4987 | PASSTRANS: AN IMPROVED PASSWORD REUSE MODEL BASED ON TRANSFORMER |
1956 | PATCH STEGANALYSIS: A SAMPLING BASED DEFENSE AGAINST ADVERSARIAL STEGANOGRAPHY |
1872 | PATH SIGNATURES FOR NON-INTRUSIVE LOAD MONITORING |
1585 | PDD-NET: A PRECISE DEFECT DETECTION NETWORK BASED ON POINT SET REPRESENTATION |
2809 | PEAR: Photographic Embedding for Aesthetic Rating |
5873 | PEER COLLABORATIVE LEARNING FOR POLYPHONIC SOUND EVENT DETECTION |
9234 | PERCEPTUAL-SIMILARITY-AWARE DEEP SPEAKER REPRESENTATION LEARNING FOR MULTI-SPEAKER GENERATIVE MODELING |
1971 | PERFECT RECONSTRUCTION OF CLASSES OF NON-BANDLIMITED SIGNALS FROM PROJECTIONS WITH UNKNOWN ANGLES |
3257 | Performance Optimization for Wireless Semantic Communications over Energy Harvesting Networks |
1530 | Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition |
3304 | PERSONALIZED AUTOMATIC SPEECH RECOGNITION TRAINED ON SMALL DISORDERED SPEECH DATASETS |
9085 | Personalized PageRank Graph Attention Networks |
2673 | PERSONALIZED SPEECH ENHANCEMENT: NEW MODELS AND COMPREHENSIVE EVALUATION |
3288 | PGTRNET: TWO-PHASE WEAKLY SUPERVISED OBJECT DETECTION WITH PSEUDO GROUND TRUTH REFINEMENT |
3664 | PHASE CONTINUITY: LEARNING DERIVATIVES OF PHASE SPECTRUM FOR SPEECH ENHANCEMENT |
5548 | Phase Control of Parametric Array Loudspeaker by Optimizing Sideband Weights |
4216 | PHASE SHIFTED BEDROSIAN FILTERBANK: AN INTERPRETABLE AUDIO FRONT-END FOR TIME-DOMAIN AUDIO SOURCE SEPARATION |
4708 | Phase-Only Reconfigurable Sparse Array Beamforming using Deep Learning |
2723 | PHONE-INFORMED REFINEMENT OF SYNTHESIZED MEL SPECTROGRAM FOR DATA AUGMENTATION IN SPEECH RECOGNITION |
9292 | PHONEME LEVEL LYRICS ALIGNMENT AND TEXT-INFORMED SINGING VOICE SEPARATION |
3723 | PHONEME MISPRONUNCIATION DETECTION BY JOINTLY LEARNING TO ALIGN |
3232 | Phone-to-audio alignment without text: A Semi-supervised Approach |
2007 | PHONOLOGY RECOGNITION IN AMERICAN SIGN LANGUAGE |
5307 | PHONOTACTIC LANGUAGE RECOGNITION USING A UNIVERSAL PHONEME RECOGNIZER AND A TRANSFORMER ARCHITECTURE |
4160 | PHOTON-LIMITED DEBLURRING USING ALGORITHM UNROLLING |
1399 | PHYSICAL LAYER ANONYMOUS COMMUNICATIONS: AN ANONYMITY ENTROPY ORIENTED PRECODING DESIGN |
4631 | PICKNET: REAL-TIME CHANNEL SELECTION FOR AD HOC MICROPHONE ARRAYS |
9276 | PITCH ESTIMATION BY MULTIPLE OCTAVE DECODERS |
2751 | PIXEL-LEVEL AND AFFINITY-LEVEL KNOWLEDGE DISTILLATION FOR UNSUPERVISED SEGMENTATION OF COVID-19 LESIONS |
1731 | PIXINWAV: RESIDUAL STEGANOGRAPHY FOR HIDING PIXELS IN AUDIO |
1312 | PLUG-AND-PLAY AND RELAY REGULARIZATIONS ON NOISY LOW RANK TENSOR COMPLETION FOR SNAPSHOT MULTISPECTRAL IMAGE RESTORATION |
2894 | PMP-NET: RETHINKING VISUAL CONTEXT FOR SCENE GRAPH GENERATION |
5359 | POINT CLOUD ATTRIBUTE COMPRESSION VIA CHROMA SUBSAMPLING |
5572 | POINT CLOUD DENOISING USING NORMAL VECTOR-BASED GRAPH WAVELET SHRINKAGE |
1958 | POINT-MASS FILTER WITH DECOMPOSITION OF TRANSIENT DENSITY |
3440 | POLYPHONE DISAMBIGUATION AND ACCENT PREDICTION USING PRE-TRAINED LANGUAGE MODELS IN JAPANESE TTS FRONT-END |
4431 | Polyphonic audio event detection: multi-label or multi-class multi-task classification problem? |
1331 | POPO: PESSIMISTIC OFFLINE POLICY OPTIMIZATION |
1538 | POSITION-INVARIANT ADVERSARIAL ATTACKS ON NEURAL MODULATION RECOGNITION |
3582 | POSTGAN: A GAN-BASED POST-PROCESSOR TO ENHANCE THE QUALITY OF CODED SPEECH |
4846 | Power allocation for wireless federated learning using graph neural networks |
3827 | POWER-EFFICIENT HYBRID MIMO RECEIVER WITH TASK-SPECIFIC BEAMFORMING USING LOW-RESOLUTION ADCS |
4256 | PREDICTING FLAT-FADING CHANNELS VIA META-LEARNED CLOSED-FORM LINEAR FILTERS AND EQUILIBRIUM PROPAGATION |
2096 | PREDICTING HUMAN MOTION USING KEY SUBSEQUENCES |
5251 | PREDICTING THE GENERALIZATION GAP IN DEEP MODELS USING ANCHORING |
2633 | PRELIMINARY RESULTS ON THE GENERATION OF ARTIFICIAL HANDWRITING DATA USING A DECOMPOSITION-RECOMBINATION STRATEGY |
9320 | PremiUm-CNN: Propagating Uncertainty Towards Robust Convolutional Neural Networks |
8060 | PRESERVING TRAJECTORY PRIVACY IN DRIVING DATA RELEASE |
5373 | PRIME KNOWLEDGE WITH LOCAL PATTERN CONSISTENCY FOR KNOWLEDGE DISTILLATION |
3638 | PRIOR-BERT AND MULTI-TASK LEARNING FOR TARGET-ASPECT-SENTIMENT JOINT DETECTION |
4690 | PRIVACY ATTACKS FOR AUTOMATIC SPEECH RECOGNITION ACOUSTIC MODELS IN A FEDERATED LEARNING FRAMEWORK |
2038 | PRIVACY PROTECTION IN LEARNING FAIR REPRESENTATIONS |
2371 | PRIVACY SENSITIVE SPEECH ANALYSIS USING FEDERATED LEARNING TO ASSESS DEPRESSION |
8929 | PRIVACY-AWARE COMMUNICATION OVER A WIRETAP CHANNEL WITH GENERATIVE NETWORKS |
5960 | PRIVACY-ENHANCING APPLIANCE FILTERING FOR SMART METERS |
2362 | PRIVACY-PRESERVING ACTION RECOGNITION |
1974 | PRIVACY-PRESERVING DISTRIBUTED EXPECTATION MAXIMIZATION FOR GAUSSIAN MIXTURE MODEL USING SUBSPACE PERTURBATION |
1318 | PRIVACY-PRESERVING FEDERATED MULTI-TASK LINEAR REGRESSION: A ONE-SHOT LINEAR MIXING APPROACH INSPIRED BY GRAPH REGULARIZATION |
8961 | PRIVATE LEARNING VIA KNOWLEDGE TRANSFER WITH HIGH-DIMENSIONAL TARGETS |
5513 | PROBABILISTIC FINE-GRAINED URBAN FLOW INFERENCE WITH NORMALIZING FLOWS |
5234 | PROBABLY PLEASANT? A NEURAL-PROBABILISTIC APPROACH TO AUTOMATIC MASKER SELECTION FOR URBAN SOUNDSCAPE AUGMENTATION |
2838 | PROGRESSIVE CONTINUAL LEARNING FOR SPOKEN KEYWORD SPOTTING |
5126 | PROGRESSIVE IMAGE SUPER-RESOLUTION VIA NEURAL DIFFERENTIAL EQUATION |
5460 | PROGRESSIVE MULTI-STAGE NEURAL AUDIO CODING WITH GUIDED REFERENCES |
1638 | PROGRESSIVE TEACHER-STUDENT TRAINING FRAMEWORK FOR MUSIC TAGGING |
1754 | Progressive-Granularity Retrieval via Hierarchical Feature Alignment for Person Re-Identification |
4518 | PROSODYSPEECH: TOWARDS ADVANCED PROSODY MODEL FOR NEURAL TEXT-TO-SPEECH |
1586 | ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech |
4243 | PROTOTYPE LEARNING FOR INTERPRETABLE RESPIRATORY SOUND ANALYSIS |
4859 | PROTOTYPE-BASED INTER-CAMERA LEARNING FOR PERSON RE-IDENTIFICATION |
5084 | PROVABLE SAMPLE COMPLEXITY GUARANTEES FOR LEARNING OF CONTINUOUS-ACTION GRAPHICAL GAMES WITH NONPARAMETRIC UTILITIES |
1751 | Provable Second-order Riemannian Gauss-Newton Method for Low-rank Tensor Estimation |
6676 | PROXIMAL-BASED ADAPTIVE SIMULATED ANNEALING FOR GLOBAL OPTIMIZATION |
9324 | PRUNING BY TRAINING: A NOVEL DEEP NEURAL NETWORK COMPRESSION FRAMEWORK FOR IMAGE PROCESSING |
1737 | PSEUDO STRONG LABELS FOR LARGE SCALE WEAKLY SUPERVISED AUDIO TAGGING |
2003 | PSEUDO-INTERACTING GUIDED NETWORK FOR FEW-SHOT SEGMENTATION |
5829 | PSEUDO-LABEL TRANSFER FROM FRAME-LEVEL TO NOTE-LEVEL IN A TEACHER-STUDENT FRAMEWORK FOR SINGING TRANSCRIPTION FROM POLYPHONIC MUSIC |
4873 | Pseudo-Labeling for Massively Multilingual Speech Recognition |
9274 | PSLA: IMPROVING AUDIO TAGGING WITH PRETRAINING, SAMPLING, LABELING, AND AGGREGATION |
9273 | PSYCHOACOUSTIC CALIBRATION OF LOSS FUNCTIONS FOR EFFICIENT END-TO-END NEURAL AUDIO CODING |
7552 | PUNCTUATION PREDICTION FOR STREAMING ON-DEVICE SPEECH RECOGNITION |
1337 | PU-REFINER: A GEOMETRY REFINER WITH ADVERSARIAL LEARNING FOR POINT CLOUD UPSAMPLING |
3547 | PVAE-TTS: ADAPTIVE TEXT-TO-SPEECH VIA PROGRESSIVE STYLE ADAPTATION |
4172 | PYRAMID FUSION ATTENTION NETWORK FOR SINGLE IMAGE SUPER-RESOLUTION |
5506 | PYXIS: AN OPEN-SOURCE PERFORMANCE DATASET OF SPARSE ACCELERATORS |
9180 | QA4QG: Using Question Answering to Constrain Multi-Hop Question Generation |
3295 | QRELATION: AN AGENT RELATION-BASED APPROACH FOR MULTI-AGENT REINFORCEMENT LEARNING VALUE FUNCTION FACTORIZATION |
2261 | QUANTIFYING DISCRIMINABILITY BETWEEN NMF BASES |
2457 | QUANTIZATION-AWARE PRECODING FOR MU-MIMO WITH LIMITED-CAPACITY FRONTHAUL |
1100 | QUANTIZED WINOGRAD ACCELERATION FOR CONV1D EQUIPPED ASR MODELS ON MOBILE DEVICES |
2300 | QUANTUM FEDERATED LEARNING WITH QUANTUM DATA |
4001 | QUANTUM LONG SHORT-TERM MEMORY |
4603 | QUICKEST DETECTION OF COMPOSITE AND NON-STATIONARY CHANGES WITH APPLICATION TO PANDEMIC MONITORING |
9266 | RADAR TARGET DETECTION AIDED BY RECONFIGURABLE INTELLIGENT SURFACES |
4159 | Randomized Smoothing Under Attack: How Good Is It In Practice? |
2767 | RANGEINET: FAST LIDAR POINT CLOUD TEMPORAL INTERPOLATION |
6369 | RANK-BASED LOSS FOR LEARNING HIERARCHICAL REPRESENTATIONS |
4069 | RATE CODING OR DIRECT CODING: WHICH ONE IS BETTER FOR ACCURATE, ROBUST, AND ENERGY-EFFICIENT SPIKING NEURAL NETWORKS? |
3346 | RATE CONTROL FOR LEARNED VIDEO COMPRESSION |
1525 | RATIONAL ARRAYS FOR DOA ESTIMATION |
5487 | RAW PLENOPTIC VIDEO CODING UNDER HEXAGONAL LATTICE RESOLUTION OF MOTION VECTORS |
5599 | Raw source and filter modelling for dysarthric speech recognition |
3103 | RAWBOOST: A RAW DATA BOOSTING AND AUGMENTATION METHOD APPLIED TO AUTOMATIC SPEAKER VERIFICATION ANTI-SPOOFING |
3386 | RAWNEXT: SPEAKER VERIFICATION SYSTEM FOR VARIABLE-DURATION UTTERANCES WITH DEEP LAYER AGGREGATION AND EXTENDED DYNAMIC SCALING POLICIES |
9289 | Ray-Space-Based Multichannel Nonnegative Matrix Factorization for Audio Source Separation |
6885 | RCANET: ROW-COLUMN ATTENTION NETWORK FOR SEMANTIC SEGMENTATION |
8885 | REAL ADDITIVE MARGIN SOFTMAX FOR SPEAKER VERIFICATION |
2218 | REALISTIC MONOCULAR-TO-3D VIRTUAL TRY-ON VIA MULTI-SCALE CHARACTERISTICS CAPTURE |
2324 | REAL-M: TOWARDS SPEECH SEPARATION ON REAL MIXTURES |
9325 | Real-Time Audio-Guided Multi-Face Reenactment |
4005 | REAL-TIME FALL DETECTION USING MMWAVE RADAR |
1130 | Real-World Adversarial Examples via Makeup |
1860 | REAL-WORLD ON-BOARD UAV AUDIO DATA SET FOR PROPELLER ANOMALIES |
9310 | RECEIVER DESIGN WITH REDUCED DOF IN FREQUENCY DOMAIN FOR TARGET DETECTION UNDER GAUSSIAN CLUTTER |
1517 | RECOGNITION OF SILENTLY SPOKEN WORD FROM EEG SIGNALS USING DENSE ATTENTION NETWORK (DAN) |
9315 | RECONSTRUCTING SPEECH FROM CNN EMBEDDINGS |
2748 | RECOVERY OF GRAPH SIGNALS FROM SIGN MEASUREMENTS |
5214 | RECOVERY OF NOISY POOLED TESTS VIA LEARNED FACTOR GRAPHS WITH APPLICATION TO COVID-19 TESTING |
2562 | Recurrent Design of Probing Waveform for Sparse Bayesian Learning Based DOA Estimation |
2494 | REFEREE: TOWARDS REFERENCE-FREE CROSS-SPEAKER STYLE TRANSFER WITH LOW-QUALITY DATA FOR EXPRESSIVE SPEECH SYNTHESIS |
4239 | REFERENCE MICROPHONE SELECTION AND LOW-RANK APPROXIMATION BASED MULTICHANNEL WIENER FILTER WITH APPLICATION TO SPEECH RECOGNITION |
3134 | Reformulating Speaker Diarization as Community Detection With Emphasis On Topological Structure |
1967 | REGION-TO-REGION KERNEL INTERPOLATION OF ACOUSTIC TRANSFER FUNCTION WITH DIRECTIONAL WEIGHTING |
4722 | REGRESSION ASSISTED MATRIX COMPLETION FOR RECONSTRUCTING A PROPAGATION FIELD WITH APPLICATION TO SOURCE LOCALIZATION |
3527 | REGULARIZATION USING DENOISING: EXACT AND ROBUST SIGNAL RECOVERY |
1838 | REGULARIZED LATENT SPACE EXPLORATION FOR DISCRIMINATIVE FACE SUPER-RESOLUTION |
1516 | RELATION DISCOVERY IN NONLINEARLY RELATED LARGE-SCALE SETTINGS |
2441 | RELATIVE VIEWPOINT ESTIMATION BASED ON STRUCTURED 3D REPRESENTATION ALIGNMENT |
3900 | REMIX-CYCLE-CONSISTENT LEARNING ON ADVERSARIALLY LEARNED SEPARATOR FOR ACCURATE AND STABLE UNSUPERVISED SPEECH SEPARATION |
5763 | REPEAT AFTER ME: SELF-SUPERVISED LEARNING OF ACOUSTIC-TO-ARTICULATORY MAPPING BY VOCAL IMITATION |
4385 | REPETITION ASSESSMENT FOR SPEECH AND LANGUAGE DISORDERS: A STUDY OF THE LOGOPENIC VARIANT OF PRIMARY PROGRESSIVE APHASIA |
4676 | REPRESENTATION LEARNING THROUGH CROSS-MODAL CONDITIONAL TEACHER-STUDENT TRAINING FOR SPEECH EMOTION RECOGNITION |
5399 | RESCOREBERT: DISCRIMINATIVE SPEECH RECOGNITION RESCORING WITH BERT |
3142 | RESIDUAL RECOVERY ALGORITHM FOR MODULO SAMPLING |
3620 | RESIDUAL-GUIDED PERSONALIZED SPEECH SYNTHESIS BASED ON FACE IMAGE |
9249 | RESOURCE ALLOCATION AND DITHERING OF BAYESIAN PARAMETER ESTIMATION USING MIXED-RESOLUTION DATA |
2251 | RESTLESS MULTI-ARMED BANDITS UNDER EXOGENOUS GLOBAL MARKOV PROCESS |
5265 | RETHINKING COMPUTER-AIDED PELVIS SEGMENTATION |
2972 | Rethinking Two-B-Real Net for Real-Time Salient Object Detection |
2520 | RETRIEVAL BIAS AWARE ENSEMBLE MODEL FOR CONDITIONAL SENTENCE GENERATION |
1492 | RETRIEVAL ENHANCED SEGMENT GENERATION NEURAL NETWORK FOR TASK-ORIENTED DIALOGUE SYSTEMS |
4645 | RETRIEVING SPEAKER INFORMATION FROM PERSONALIZED ACOUSTIC MODELS FOR SPEECH RECOGNITION |
5215 | R-G2P: EVALUATING AND ENHANCING ROBUSTNESS OF GRAPHEME TO PHONEME CONVERSION BY CONTROLLED NOISE INTRODUCING AND CONTEXTUAL INFORMATION INCORPORATION |
3668 | RIS-AIDED MONOSTATIC MIMO RADAR WITH CO-LOCATED ANTENNAS |
4745 | r-LOCAL UNLABELED SENSING: IMPROVED ALGORITHM AND APPLICATIONS |
2043 | ROBUST ADAPTIVE BEAMFORMING BASED ON POWER METHOD PROCESSING AND SPATIAL SPECTRUM MATCHING |
4855 | Robust adaptive beamforming maximizing the worst-case SINR over distributional uncertainty sets for random INC matrix and signal steering vector |
4146 | ROBUST ADAPTIVE NOISE CANCELLER ALGORITHM WITH SNR-BASED STEPSIZE CONTROL AND NOISE-PATH GAIN COMPENSATION |
2831 | ROBUST AND EFFICIENT UNCERTAINTY AWARE BIOSIGNAL CLASSIFICATION VIA EARLY EXIT ENSEMBLES |
3655 | ROBUST BAYESIAN RECONSTRUCTION OF MULTISPECTRAL SINGLE-PHOTON 3D LIDAR DATA WITH NON-UNIFORM BACKGROUND |
5814 | ROBUST CLASSIFICATION WITH FLEXIBLE DISCRIMINANT ANALYSIS IN HETEROGENEOUS DATA |
9171 | ROBUST COLLABORATIVE LEARNING FOR SEQUENCE MODELLING |
9111 | Robust Disentangled Variational Speech Representation Learning for Zero-shot Voice Conversion |
9250 | ROBUST DYNAMIC MULTI-MODAL DATA FUSION: A MODEL UNCERTAINTY PERSPECTIVE |
1138 | Robust High-Order Tensor Recovery via Nonconvex Low-Rank Approximation |
4681 | ROBUST NONPARAMETRIC DISTRIBUTION FORECAST WITH BACKTEST-BASED BOOTSTRAP AND ADAPTIVE RESIDUAL SELECTION |
2226 | ROBUST PARAMETER ESTIMATION BASED ON THE K-DIVERGENCE |
3908 | ROBUST PRESSURE MATCHING WITH ATF PERTURBATION CONSTRAINTS FOR SOUND FIELD CONTROL |
9253 | Robust Recovery of Jointly-Sparse Signals Using Minimax Concave Loss Function |
1307 | Robust self-supervised speaker representation learning via instance mix regularization |
8802 | ROBUST SIGNAL PROCESSING OVER SIMPLICIAL COMPLEXES |
2469 | Robust speaker verification using Population-based Data Augmentation |
1205 | ROBUST SPEAKER VERIFICATION WITH JOINT SELF-SUPERVISED AND SUPERVISED LEARNING |
9261 | Robust TDOA Source Localization Based on Lagrange Programming Neural Network |
2881 | ROBUST THERMAL INFRARED PEDESTRIAN DETECTION BY ASSOCIATING VISIBLE PEDESTRIAN KNOWLEDGE |
2080 | ROBUST UNSTRUCTURED KNOWLEDGE ACCESS IN CONVERSATIONAL DIALOGUE WITH ASR ERRORS |
4889 | ROBUST VIDEO HASHING BASED ON LOCAL FLUCTUATION PRESERVING FOR TRACKING DEEP FAKE VIDEOS |
2297 | RTSNET: DEEP LEARNING AIDED KALMAN SMOOTHING |
3769 | RUN-AND-BACK STITCH SEARCH: NOVEL BLOCK SYNCHRONOUS DECODING FOR STREAMING ENCODER-DECODER ASR |
3370 | S2 REDUCER: HIGH-PERFORMANCE SPARSE COMMUNICATION TO ACCELERATE DISTRIBUTED DEEP LEARNING |
3169 | S3PRL-VC: OPEN-SOURCE VOICE CONVERSION FRAMEWORK WITH SELF-SUPERVISED SPEECH REPRESENTATIONS |
6024 | S3T: SELF-SUPERVISED PRE-TRAINING WITH SWIN TRANSFORMER FOR MUSIC CLASSIFICATION |
3495 | SADN: LEARNED LIGHT FIELD IMAGE COMPRESSION WITH SPATIAL-ANGULAR DECORRELATION |
4402 | SAFARI FROM VISUAL SIGNALS: RECOVERING VOLUMETRIC 3D SHAPES |
3718 | SAFEGUARDING UAV NETWORKS THROUGH INTEGRATED SENSING, JAMMING, AND COMMUNICATIONS |
5944 | SAGA: SELF-AUGMENTATION WITH GUIDED ATTENTION FOR REPRESENTATION LEARNING |
9293 | SAGRNN: SELF-ATTENTIVE GATED RNN FOR BINAURAL SPEAKER SEPARATION WITH INTERAURAL CUE PRESERVATION |
1507 | SAIN: SIMILARITY-AWARE VIDEO FRAME INTERPOLATION |
3656 | SALSA-Lite: A Fast and Effective Feature for Polyphonic Sound Event Localization and Detection with Microphone Arrays |
5881 | SAMPLING SET SELECTION FOR GRAPH SIGNALS UNDER ARBITRARY SIGNAL PRIORS |
3676 | SAR-ShipNet: SAR-Ship Detection Neural Network via Bidirectional Coordinate Attention and Multi-resolution Feature Fusion |
5951 | SA-SDR: A NOVEL LOSS FUNCTION FOR SEPARATION OF MEETING STYLE DATA |
4414 | Scalable Data Association and Multi-target Tracking under a Poisson Mixture Measurement Process |
4200 | SCALABLE NEURAL ARCHITECTURES FOR END-TO-END ENVIRONMENTAL SOUND CLASSIFICATION |
9127 | SCALABLE RIDGE LEVERAGE SCORE SAMPLING FOR THE NYSTRÖM METHOD |
3263 | Scattering Statistics of Generalized Spatial Poisson Point Processes |
7937 | SCORE DIFFICULTY ANALYSIS FOR PIANO PERFORMANCE EDUCATION BASED ON FINGERING |
2487 | SCREEN & RELAX: ACCELERATING THE RESOLUTION OF ELASTIC-NET BY SAFE IDENTIFICATION OF THE SOLUTION SUPPORT |
4106 | S-DCCRN: Super Wide Band DCCRN with learnable complex feature for speech enhancement |
4311 | SDETR: Attention-guided Salient Object Detection with Transformer |
2938 | SDNET: LIGHTWEIGHT FACIAL EXPRESSION RECOGNITION FOR SAMPLE DISEQUILIBRIUM |
2578 | SDR — MEDIUM RARE WITH FAST COMPUTATIONS |
2662 | SECMPNN: 3-PARTY PRIVACY-PRESERVING MOLECULAR STRUCTURE PROPERTIES INFERENCE |
5288 | SEED: SOUND EVENT EARLY DETECTION VIA EVIDENTIAL UNCERTAINTY |
5466 | SEGNET-BASED DEEP REPRESENTATION LEARNING FOR DYSPHAGIA CLASSIFICATION |
4667 | SEISMIC FAULT IDENTIFICATION USING GRAPH HIGH-FREQUENCY COMPONENTS AS INPUT TO GRAPH CONVOLUTIONAL NETWORK |
5539 | SELECTIVE MULTI-TASK LEARNING FOR SPEECH EMOTION RECOGNITION USING CORPORA OF DIFFERENT STYLES |
3480 | SELECTIVE MUTUAL LEARNING: AN EFFICIENT APPROACH FOR SINGLE CHANNEL SPEECH SEPARATION |
7339 | SELECTIVE SCALE CASCADE ATTENTION NETWORK FOR BREAST CANCER HISTOPATHOLOGY IMAGE CLASSIFICATION |
5864 | Self supervised representation learning with deep clustering for acoustic unit discovery from raw speech |
1817 | SELF-ATTENTION FOR INCOMPLETE UTTERANCE REWRITING |
8897 | SELF-CRITICAL SEQUENCE TRAINING FOR AUTOMATIC SPEECH RECOGNITION |
5706 | Self-Ensemble Variance Regularization for Domain Adaptation |
1206 | SELF-KNOWLEDGE DISTILLATION BASED SELF-SUPERVISED LEARNING FOR COVID-19 DETECTION FROM CHEST X-RAY IMAGES |
2179 | SELF-KNOWLEDGE DISTILLATION VIA FEATURE ENHANCEMENT FOR SPEAKER VERIFICATION |
3095 | SELF-LEARNED VIDEO SUPER-RESOLUTION WITH AUGMENTED SPATIAL AND TEMPORAL CONTEXT |
4315 | SELF-SUPERVISED ACOUSTIC ANOMALY DETECTION VIA CONTRASTIVE LEARNING |
2617 | Self-supervised Contrastive Learning for Cross-domain Hyperspectral Image Representation |
1218 | SELF-SUPERVISED LEARNING FOR SENTIMENT ANALYSIS VIA IMAGE-TEXT MATCHING |
9068 | SELF-SUPERVISED LEARNING METHOD USING MULTIPLE SAMPLING STRATEGIES FOR GENERAL-PURPOSE AUDIO REPRESENTATION |
3508 | Self-supervised learning on a lightweight low-light image enhancement model with curve refinement |
6153 | SELF-SUPERVISED REPRESENTATION LEARNING FOR UNSUPERVISED ANOMALOUS SOUND DETECTION UNDER DOMAIN SHIFT |
1388 | Self-supervised Speaker Recognition Training Using Human-Machine Dialogues |
2945 | SELF-SUPERVISED SPEAKER RECOGNITION WITH LOSS-GATED LEARNING |
2157 | SELF-SUPERVISED SPEAKER VERIFICATION WITH SIMPLE SIAMESE NETWORK AND SELF-SUPERVISED REGULARIZATION |
4485 | SEMANTIC ASSOCIATION NETWORK FOR VIDEO CORPUS MOMENT RETRIEVAL |
4155 | SEMANTICALLY PROPORTIONAL PATCHMIX FOR FEW-SHOT LEARNING |
1806 | SEMIDEFINITE RELAXATION METHOD FOR MOVING OBJECT LOCALIZATION USING A STATIONARY TRANSMITTER AT UNKNOWN POSITION |
3349 | SEMI-SUPERVISED 360° DEPTH ESTIMATION FROM MULTIPLE FISHEYE CAMERAS WITH PIXEL-LEVEL SELECTIVE LOSS |
6349 | SEMI-SUPERVISED GAUSSIAN MIXTURE VARIATIONAL AUTOENCODER FOR PULSE SHAPE DISCRIMINATION |
9277 | SEMI-SUPERVISED NEURAL CHORD ESTIMATION BASED ON A VARIATIONAL AUTOENCODER WITH LATENT CHORD LABELS AND FEATURES |
1726 | SEMI-SUPERVISED SOURCE LOCALIZATION WITH RESIDUAL PHYSICAL LEARNING |
2871 | SEMI-SUPERVISED STANDARDIZED DETECTION OF PERIODIC SIGNALS WITH APPLICATION TO EXOPLANET DETECTION |
2374 | SENSING-ASSISTED BEAM TRACKING IN V2I NETWORKS: EXTENDED TARGET CASE |
9177 | SENSORS TO SIGN LANGUAGE: A NATURAL APPROACH TO EQUITABLE COMMUNICATION |
3158 | Sentiment-Aware Automatic Speech Recognition pre-training for enhanced Speech Emotion Recognition |
2385 | SENTIMENT-AWARE DISTILLATION FOR BITCOIN TREND FORECASTING UNDER PARTIAL OBSERVABILITY |
4543 | SEQUENCE TRANSDUCTION WITH GRAPH-BASED SUPERVISION |
8168 | SEQUENTIAL MCMC METHODS FOR AUDIO SIGNAL ENHANCEMENT |
4461 | SERAB: A MULTI-LINGUAL BENCHMARK FOR SPEECH EMOTION RECOGNITION |
3013 | SHORT-AND-SPARSE DECONVOLUTION VIA RANK-ONE CONSTRAINED OPTIMIZATION (ROCO) |
2916 | SIGNAL COMPRESSION VIA NEURAL IMPLICIT REPRESENTATIONS |
4233 | SIGNAL PROCESSING ON CELL COMPLEXES |
1032 | SIGNAL RECOVERY FROM INCONSISTENT NONLINEAR OBSERVATIONS |
6266 | SIG-VC: A SPEAKER INFORMATION GUIDED ZERO-SHOT VOICE CONVERSION SYSTEM FOR BOTH HUMAN BEINGS AND MACHINES |
2876 | Simple Attention Module based Speaker Verification with Iterative noisy label detection |
2010 | SIMPLER IS BETTER: SPECTRAL REGULARIZATION AND UP-SAMPLING TECHNIQUES FOR VARIATIONAL AUTOENCODERS |
3559 | SIMPLICIAL CONVOLUTIONAL NEURAL NETWORKS |
1874 | SIMULATION-AND-MINING: TOWARDS ACCURATE SOURCE-FREE UNSUPERVISED DOMAIN ADAPTIVE OBJECT DETECTION |
1462 | Simultaneous Nonlocal Low-Rank and Deep Priors for Poisson Denoising |
1710 | SINGLE IMAGE DE-RAINING WITH HIGH-LOW FREQUENCY GUIDANCE |
9328 | SINGLE IMAGE SUPER-RESOLUTION USING ASYNCHRONOUS MULTI-SCALE NETWORK |
5015 | SINGLE-SHOT BALANCED DETECTOR FOR GEOSPATIAL OBJECT DETECTION |
9135 | Sketch storytelling |
2021 | SKETCHED RT3D: HOW TO RECONSTRUCT BILLIONS OF PHOTONS PER SECOND |
4114 | SKIM: SKIPPING MEMORY LSTM FOR LOW-LATENCY REAL-TIME CONTINUOUS SPEECH SEPARATION |
8940 | SLEEPGAN: TOWARDS PERSONALIZED SLEEP THERAPY MUSIC |
1188 | SLIM: EXPLICIT SLOT-INTENT MAPPING WITH BERT FOR JOINT MULTI-INTENT DETECTION AND SLOT FILLING |
4500 | SLUE: NEW BENCHMARK TASKS FOR SPOKEN LANGUAGE UNDERSTANDING EVALUATION ON NATURAL SPEECH |
1129 | SOCIAL WELFARE MAXIMIZATION IN CROSS-SILO FEDERATED LEARNING |
2017 | SODA: Self-organizing data augmentation in deep neural networks - Application to biomedical image segmentation tasks |
4772 | SOLVING THE LONG-TAILED PROBLEM VIA INTRA- AND INTER-CATEGORY BALANCE |
3455 | SOUND EVENT DETECTION GUIDED BY SEMANTIC CONTEXTS OF SCENES |
9301 | SOUND EVENT DETECTION: A TUTORIAL |
2735 | SOURCE MIXING AND SEPARATION ROBUST AUDIO STEGANOGRAPHY |
2602 | SOURCE SEPARATION BY STEERING PRETRAINED MUSIC MODELS |
2369 | SP ATTACK: SINGLE-PERSPECTIVE ATTACK FOR GENERATING ADVERSARIAL OMNIDIRECTIONAL IMAGES |
9183 | SPAIN-NET: SPATIALLY-INFORMED STEREOPHONIC MUSIC SOURCE SEPARATION |
5115 | Sparse Adversarial Attack for video via Gradient-Based Keyframe Selection |
9295 | SPARSE ANALYSIS MODEL BASED DICTIONARY LEARNING FOR SIGNAL DECLIPPING |
3660 | SPARSE ARRAY SOURCE ENUMERATION VIA COARRAY SUBSPACE OPTIMIZATION |
3971 | SPARSE MODELING OF THE EARLY PART OF NOISY ROOM IMPULSE RESPONSES WITH SPARSE BAYESIAN LEARNING |
1348 | SPARSE MULTI-REFERENCE ALIGNMENT: SAMPLE COMPLEXITY AND COMPUTATIONAL HARDNESS |
3161 | Sparse Recovery of Acoustic Waves |
5177 | SPARSE SELF-ATTENTION FOR SEMI-SUPERVISED SOUND EVENT DETECTION |
1801 | SPARSE SUBSPACE TRACKING IN HIGH DIMENSIONS |
4644 | SPARSEBFA: ATTACKING SPARSE DEEP NEURAL NETWORKS WITH THE WORST-CASE BIT FLIPS ON COORDINATES |
3264 | Sparse-Group Log-Sum Penalized Graphical Model Learning For Time Series |
3397 | SPARSITY IMPROVES UNSUPERVISED ATTRIBUTE DISCOVERY IN STYLEGAN |
8883 | SPARSITY-BASED SOUND FIELD SEPARATION IN THE SPHERICAL HARMONICS DOMAIN |
2070 | SPATIAL ACTIVE NOISE CONTROL BASED ON INDIVIDUAL KERNEL INTERPOLATION OF PRIMARY AND SECONDARY SOUND FIELDS |
3487 | SPATIAL ACTIVE NOISE CONTROL WITH THE REMOTE MICROPHONE TECHNIQUE: AN APPROACH WITH A MOVING HIGHER ORDER MICROPHONE |
4057 | SPATIAL DATA AUGMENTATION WITH SIMULATED ROOM IMPULSE RESPONSES FOR SOUND EVENT LOCALIZATION AND DETECTION |
4578 | SPATIAL MIXUP: DIRECTIONAL LOUDNESS MODIFICATION AS DATA AUGMENTATION FOR SOUND EVENT LOCALIZATION AND DETECTION |
3366 | SPATIAL PROCESSING FRONT-END FOR DISTANT ASR EXPLOITING SELF-ATTENTION CHANNEL COMBINATOR |
2789 | SPATIAL-CONTEXT-AWARE DEEP NEURAL NETWORK FOR MULTI-CLASS IMAGE CLASSIFICATION |
8752 | SPATIAL-TEMPORAL GRAPH CONVOLUTION NETWORK FOR MULTICHANNEL SPEECH ENHANCEMENT |
3871 | SPATIO-TEMPORAL ATTENTION GRAPH CONVOLUTION NETWORK FOR FUNCTIONAL CONNECTOME CLASSIFICATION |
5850 | SPATIO-TEMPORAL GRAPH COMPLEMENTARY SCATTERING NETWORKS |
4541 | SPATIO-TEMPORAL GRAPH CONVOLUTIONAL NETWORKS FOR CONTINUOUS SIGN LANGUAGE RECOGNITION |
1104 | SPATIO-TEMPORAL MOTION AGGREGATION NETWORK FOR VIDEO ACTION DETECTION |
2698 | Spatio-Temporal PRRS Epidemic Forecasting via Factorized Deep Generative Modeling |
3190 | SPEAKER EMBEDDING CONVERSION FOR BACKWARD AND CROSS-CHANNEL COMPATIBILITY |
3669 | SPEAKER GENERATION |
7128 | SPEAKER IDENTITY PRESERVATION IN DYSARTHRIC SPEECH RECONSTRUCTION BY ADVERSARIAL SPEAKER ADAPTATION |
2885 | SPEAKER NORMALIZATION FOR SELF-SUPERVISED SPEECH EMOTION RECOGNITION |
4598 | SPEAKER REINFORCEMENT USING TARGET SOURCE EXTRACTION FOR ROBUST AUTOMATIC SPEECH RECOGNITION |
3467 | SPEAKER-TARGETED AUDIO-VISUAL SPEECH RECOGNITION USING A HYBRID CTC/ATTENTION MODEL WITH INTERFERENCE LOSS |
4335 | SPECIALISED VIDEO QUALITY MODEL FOR ENHANCED USER GENERATED CONTENT (UGC) WITH SPECIAL EFFECTS |
5167 | Spectral permutation test on persistence diagrams |
2579 | SPECTRAL-SPATIAL SYMMETRICAL AGGREGATION CROSS-LINKING MULTI-MODAL DATA FUSION NETWORK |
6151 | SPEECH DENOISING IN THE WAVEFORM DOMAIN WITH SELF-ATTENTION |
4927 | SPEECH EMOTION RECOGNITION USING SELF-SUPERVISED FEATURES |
8977 | SPEECH EMOTION RECOGNITION WITH CO-ATTENTION BASED MULTI-LEVEL ACOUSTIC INFORMATION |
4238 | SPEECH EMOTION RECOGNITION WITH GLOBAL-AWARE FUSION ON MULTI-SCALE FEATURE REPRESENTATION |
4767 | SPEECH ENHANCEMENT FOR LOW BIT RATE SPEECH CODEC |
8904 | Speech enhancement with neural homomorphic synthesis |
3927 | SPEECH PATTERN BASED BLACK-BOX MODEL WATERMARKING FOR AUTOMATIC SPEECH RECOGNITION |
4711 | SPEECH RECOGNITION USING BIOLOGICALLY-INSPIRED NEURAL NETWORKS |
3750 | SPEECH RECOVERY FOR REAL-WORLD SELF-POWERED INTERMITTENT DEVICES |
4931 | SPEECH TASKS RELEVANT TO SLEEPINESS DETERMINED WITH DEEP TRANSFER LEARNING |
4023 | SPEECHMOE2: MIXTURE-OF-EXPERTS MODEL WITH IMPROVED ROUTING |
5930 | SPEECHSPLIT2.0: UNSUPERVISED SPEECH DISENTANGLEMENT FOR VOICE CONVERSION WITHOUT TUNING AUTOENCODER BOTTLENECKS |
4017 | SPELL MY NAME: KEYWORD BOOSTED SPEECH RECOGNITION |
1253 | SPHERICAL CONVOLUTIONAL RECURRENT NEURAL NETWORK FOR REAL-TIME SOUND SOURCE TRACKING |
9278 | SPLIT BREGMAN APPROACH TO LINEAR PREDICTION BASED DEREVERBERATION WITH ENFORCED SPEECH SPARSITY |
4118 | Spoken language recognition with cluster-based modeling |
2304 | SQAPP: No-Reference Speech Quality Assessment via Pairwise Preference |
2489 | SRP-DNN: LEARNING DIRECT-PATH PHASE DIFFERENCE FOR MULTIPLE MOVING SOUND SOURCE LOCALIZATION |
4929 | SRU++: PIONEERING FAST RECURRENCE WITH ATTENTION FOR SPEECH RECOGNITION |
4672 | STABILITY ANALYSIS OF UNFOLDED WMMSE FOR POWER ALLOCATION |
4626 | STABILITY OF NEURAL NETWORKS ON MANIFOLDS TO RELATIVE PERTURBATIONS |
4810 | STABLE AND TRANSFERABLE WIRELESS RESOURCE ALLOCATION POLICIES VIA MANIFOLD NEURAL NETWORKS |
6165 | STACKED MULTI-SCALE ATTENTION NETWORK FOR IMAGE COLORIZATION |
3641 | STATISTICAL PYRAMID DENSE TIME DELAY NEURAL NETWORK FOR SPEAKER VERIFICATION |
7986 | STATISTICAL, SPECTRAL AND GRAPH REPRESENTATIONS FOR VIDEO-BASED FACIAL EXPRESSION RECOGNITION IN CHILDREN |
2310 | STEALTHY BACKDOOR ATTACK WITH ADVERSARIAL TRAINING |
1224 | STGAT-MAD: Spatial-Temporal Graph Attention Network for Multivariate Time Series Anomaly Detection |
2830 | STPointGCN: Spatial Temporal Graph Convolutional Network for Multiple People Recognition Using Millimeter-Wave Radar |
5175 | STREAMING ON-DEVICE DETECTION OF DEVICE DIRECTED SPEECH FROM VOICE AND TOUCH-BASED INVOCATION |
2389 | STREAMING TRANSFORMER TRANSDUCER BASED SPEECH RECOGNITION USING NON-CAUSAL CONVOLUTION |
1782 | STRUCTURAL PRIOR MODELS FOR 3-D DEEP VESSEL SEGMENTATION |
3380 | STUDY OF POSITIONAL ENCODING APPROACHES FOR AUDIO SPECTROGRAM TRANSFORMERS |
9245 | STUDY OF PRE-PROCESSING DEFENSES AGAINST ADVERSARIAL ATTACKS ON STATE-OF-THE-ART SPEAKER RECOGNITION SYSTEMS |
2995 | STUDY OF THE NULL DIRECTIONS ON THE PERFORMANCE OF DIFFERENTIAL BEAMFORMERS |
1789 | STUDY ON TIME-OF-FLIGHT ESTIMATION IN ULTRASONIC WELL LOGGING TOOL: MODEL-DRIVEN TRANSFER LEARNING |
3953 | STUDYING THREE FAMILIES OF DIVERGENCES TO COMPARE WIDE-SENSE STATIONARY GAUSSIAN ARMA PROCESSES |
4638 | StyleGAN-induced data-driven regularization for inverse problems |
9113 | Subgraph Representation Learning With Hard Negative Samples for Inductive Link Prediction |
1262 | Subjective and Objective Quality Assessment of Mobile Gaming Video |
3651 | SUBSPACE CLUSTERING USING UNSUPERVISED DATA AUGMENTATION |
9260 | SUBSPACE DETECTION AND BLIND SOURCE SEPARATION OF MULTIVARIATE SIGNALS BY DYNAMICAL COMPONENT ANALYSIS (DYCA) |
8981 | SUPERRESOLUTION AND SEGMENTATION OF OCT SCANS USING MULTI-STAGE ADVERSARIAL GUIDED ATTENTION TRAINING |
5227 | SUPER-RESOLUTION OF SATELLITE IMAGES BY TWO-DIMENSIONAL RRDB AND EDGE-ENHANCEMENT GENERATIVE ADVERSARIAL NETWORK |
4765 | SUPERVISED AND SELF-SUPERVISED PRETRAINING BASED COVID-19 DETECTION USING ACOUSTIC BREATHING/COUGH/SPEECH SIGNALS |
3220 | SUPERVISED ATTENTION IN SEQUENCE-TO-SEQUENCE MODELS FOR SPEECH RECOGNITION |
5595 | Supervised Learning based Sparse Channel Estimation for RIS aided Communications |
4082 | SUPERVISED TRAINING OF SIAMESE SPIKING NEURAL NETWORKS WITH EARTH MOVER’S DISTANCE |
2200 | SYMBOL-LEVEL ONLINE CHANNEL TRACKING FOR DEEP RECEIVERS |
2075 | Synergistic Network Learning and Label Correction for Noise-robust Image Classification |
2529 | SYNPOSE: A LARGE-SCALE AND DENSELY ANNOTATED SYNTHETIC DATASET FOR HUMAN POSE ESTIMATION IN CLASSROOM |
4435 | SYNT++: UTILIZING IMPERFECT SYNTHETIC DATA TO IMPROVE SPEECH RECOGNITION |
4995 | SYNTAX-BASED GRAPH MATCHING FOR KNOWLEDGE BASE QUESTION ANSWERING |
4755 | SYNTHESIS OF ADVERSARIAL SAMPLES IN TWO-STAGE CLASSIFIERS |
4207 | SYNTHESIZING DYSARTHRIC SPEECH USING MULTI-SPEAKER TTS FOR DYSARTHRIC SPEECH RECOGNITION |
2949 | TACKLING DATA SCARCITY IN SPEECH TRANSLATION USING ZERO-SHOT MULTILINGUAL MACHINE TRANSLATION TECHNIQUES |
4800 | TACKLING THE SCORE SHIFT IN CROSS-LINGUAL SPEAKER VERIFICATION BY EXPLOITING LANGUAGE INFORMATION |
1637 | TALKINGFLOW: TALKING FACIAL LANDMARK GENERATION WITH MULTI-SCALE NORMALIZING FLOW NETWORK |
3877 | TARGET-AWARE AUTO-AUGMENTATION FOR UNSUPERVISED DOMAIN ADAPTIVE OBJECT DETECTION |
1339 | TARGETDROP: A TARGETED REGULARIZATION METHOD FOR CONVOLUTIONAL NEURAL NETWORKS |
1293 | TCRNET: MAKE TRANSFORMER, CNN AND RNN COMPLEMENT EACH OTHER |
3532 | TEACHING CNNS TO MIMIC HUMAN VISUAL COGNITIVE PROCESS & REGULARISE TEXTURE-SHAPE BIAS |
4528 | TED TALK TEASER GENERATION WITH PRE-TRAINED MODELS |
4949 | TEMPO: IMPROVING TRAINING PERFORMANCE IN CROSS-SILO FEDERATED LEARNING |
5058 | TEMPORAL CONTRASTIVE-LOSS FOR AUDIO EVENT DETECTION |
2816 | TEMPORAL CROSS-GRAPH NETWORK FOR BRAIN FUNCTIONAL ACTIVITY PREDICTION |
2778 | Temporal Dynamic Convolutional Neural Network for Text-Independent Speaker Verification and Phonemic Analysis |
4798 | Temporal Early Exiting for Streaming Speech Commands Recognition |
5998 | TEMPORAL KNOWLEDGE DISTILLATION FOR ON-DEVICE AUDIO CLASSIFICATION |
2152 | TENSOR-BASED ORTHOGONAL MATCHING PURSUIT WITH PHASE ROTATION FOR CHANNEL ESTIMATION IN HYBRID BEAMFORMING MIMO-OFDM SYSTEMS |
2567 | Terahertz Image Restoration Benchmarking Dataset |
5437 | TEST-TIME DETECTION OF BACKDOOR TRIGGERS FOR POISONED DEEP NEURAL NETWORKS |
5739 | TEXT ADAPTIVE DETECTION FOR CUSTOMIZABLE KEYWORD SPOTTING |
3611 | Text2Poster: Laying out Stylized Texts on Retrieved Images |
1272 | Text2Video: Text-driven Talking-head Video Synthesis with Personalized Phoneme-Pose Dictionary |
4419 | TEXT-FREE NON-PARALLEL MANY-TO-MANY VOICE CONVERSION USING NORMALISING FLOWS |
4535 | TEXT-IMAGE DE-CONTEXTUALIZATION DETECTION USING VISION-LANGUAGE MODELS |
4170 | Texture Information Boosts Video Quality Assessment |
1828 | TFPSNET: TIME-FREQUENCY DOMAIN PATH SCANNING NETWORK FOR SPEECH SEPARATION |
5009 | THE COCKTAIL FORK PROBLEM: THREE-STEM AUDIO SEPARATION FOR REAL-WORLD SOUNDTRACKS |
3113 | THE CORAL++ ALGORITHM FOR UNSUPERVISED DOMAIN ADAPTATION OF SPEAKER RECOGNITION |
2583 | THE DATA/IDENTITY TRADEOFF WITH CENSORED SENSORS |
2061 | THE DAWN OF QUANTUM NATURAL LANGUAGE PROCESSING |
9317 | THE EFFECT OF PARTIAL TIME-FREQUENCY MASKING OF THE DIRECT SOUND ON THE PERCEPTION OF REVERBERANT SPEECH |
2436 | The impact of cross language on acoustic-to-articulatory inversion and its influence on articulatory speech synthesis |
4523 | THE IMPACT OF JPEG COMPRESSION ON PRIOR IMAGE NOISE |
2541 | THE IMPACT OF REMOVING HEAD MOVEMENTS ON AUDIO-VISUAL SPEECH ENHANCEMENT |
4418 | THE MIRRORNET: LEARNING AUDIO SYNTHESIZER CONTROLS INSPIRED BY SENSORIMOTOR INTERACTION |
2370 | THE PROTOTYPE CO-PRIME ARRAY WITH A ROBUST DIFFERENCE CO-ARRAY |
8267 | THE REPRESENTATION JENSEN-RÉNYI DIVERGENCE |
4471 | The Second DiCOVA Challenge: Dataset and performance analysis for Diagnosis of COVID-19 using acoustics |
4571 | THIN SLICES OF DEPRESSION: IMPROVING DEPRESSION DETECTION PERFORMANCE THROUGH DATA SEGMENTATION |
2170 | TH-NET: A METHOD OF SINGLE 3D OBJECT TRACKING BASED ON TRANSFORMERS AND HAUSDORFF DISTANCE |
7914 | Threshold Independent Evaluation of Sound Event Detection Scores |
5407 | TIE YOUR EMBEDDINGS DOWN: CROSS-MODAL LATENT SPACES FOR END-TO-END SPOKEN LANGUAGE UNDERSTANDING |
8941 | Tight integration of neural- and clustering-based diarization through deep unfolding of infinite Gaussian mixture model |
2677 | TIME DOMAIN RADIAL FILTER DESIGN FOR SPHERICAL WAVES |
3178 | TIME-BALANCED FOCAL LOSS FOR AUDIO EVENT DETECTION |
2363 | TIME-DOMAIN ACOUSTIC CONTRAST CONTROL WITH A SPATIAL UNIFORMITY CONSTRAINT FOR PERSONAL AUDIO SYSTEMS |
9236 | TIME-DOMAIN AUDIO SOURCE SEPARATION WITH NEURAL NETWORKS BASED ON MULTIRESOLUTION ANALYSIS |
7978 | TIME-DOMAIN AUDIO-VISUAL SPEECH SEPARATION ON LOW QUALITY VIDEOS |
1322 | TIME-FREQUENCY AND GEOMETRIC ANALYSIS OF TASK-DEPENDENT LEARNING IN RAW WAVEFORM BASED ACOUSTIC MODELS |
1181 | TIME-FREQUENCY ATTENTION FOR MONAURAL SPEECH ENHANCEMENT |
3213 | TINYS2I: A SMALL-FOOTPRINT UTTERANCE CLASSIFICATION MODEL WITH CONTEXTUAL SUPPORT FOR ON-DEVICE SLU |
3277 | TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context |
4272 | T-NGA: TEMPORAL NETWORK GRAFTING ALGORITHM FOR LEARNING TO PROCESS SPIKING AUDIO SENSOR EVENTS |
2342 | TNTC: two-stream network with transformer-based complementarity for gait-based emotion recognition |
5064 | TO CATCH A CHORUS, VERSE, INTRO, OR ANYTHING ELSE: ANALYZING A SONG WITH STRUCTURAL FUNCTIONS |
4801 | TONET: TONE-OCTAVE NETWORK FOR SINGING MELODY EXTRACTION FROM POLYPHONIC MUSIC |
5186 | Topological correlation of brain signals |
1505 | TORCHAUDIO: BUILDING BLOCKS FOR AUDIO AND SPEECH PROCESSING |
4303 | TOWARD DEGRADATION-ROBUST VOICE CONVERSION |
3163 | TOWARD MMWAVE-BASED SOUND ENHANCEMENT AND SEPARATION |
4659 | TOWARDS A COMMON SPEECH ANALYSIS ENGINE |
4577 | TOWARDS ACCURATE CROSS-DOMAIN IN-BED HUMAN POSE ESTIMATION |
4957 | TOWARDS AUTOMATIC TRANSCRIPTION OF POLYPHONIC ELECTRIC GUITAR MUSIC: A NEW DATASET AND A MULTI-LOSS TRANSFORMER MODEL |
5123 | TOWARDS BETTER META-INITIALIZATION WITH TASK AUGMENTATION FOR KINDERGARTEN-AGED SPEECH RECOGNITION |
3738 | TOWARDS CLOSED-LOOP SPEECH SYNTHESIS FROM STEREOTACTIC EEG: A UNIT SELECTION APPROACH |
4925 | TOWARDS CONTROLLABLE AND PHYSICAL INTERPRETABLE UNDERWATER SCENE SIMULATION |
4022 | TOWARDS END-TO-END INTEGRATION OF DIALOG HISTORY FOR IMPROVED SPOKEN LANGUAGE UNDERSTANDING |
5204 | Towards End-to-End Speaker Diarization with Generalized Neural Speaker Clustering |
8925 | TOWARDS EXPRESSIVE SPEAKING STYLE MODELLING WITH HIERARCHICAL CONTEXT INFORMATION FOR MANDARIN SPEECH SYNTHESIS |
4934 | TOWARDS FAST AND CONVENIENT END-TO-END HRTF PERSONALIZATION |
2886 | TOWARDS FASTER CONTINUOUS MULTI-CHANNEL HRTF MEASUREMENTS BASED ON LEARNING SYSTEM MODELS |
3144 | TOWARDS IDENTITY PRESERVING NORMAL TO DYSARTHRIC VOICE CONVERSION |
3139 | Towards Interpretability of Speech Pause in Dementia Detection using Adversarial Learning |
5735 | Towards Interpreting Deep Learning Models to Understand Loss of Speech Intelligibility in Speech Disorders Step 2: Contribution of the emergence of phonetic traits |
1696 | TOWARDS JOINT FRAME-LEVEL AND MOS QUALITY PREDICTIONS WITH LOW-COMPLEXITY OBJECTIVE MODELS |
3416 | TOWARDS LEARNING UNIVERSAL AUDIO REPRESENTATIONS |
2055 | TOWARDS LIFELONG LEARNING OF MULTILINGUAL TEXT-TO-SPEECH SYNTHESIS |
1575 | Towards Lightweight Applications: Asymmetric Enroll-Verify Structure for Speaker Verification |
2696 | TOWARDS MEASURING FAIRNESS IN SPEECH RECOGNITION: CASUAL CONVERSATIONS DATASET TRANSCRIPTIONS |
1085 | Towards Practical and Efficient Long Video Summary |
3199 | TOWARDS REDUCING THE NEED FOR SPEECH TRAINING DATA TO BUILD SPOKEN LANGUAGE UNDERSTANDING SYSTEMS |
4669 | TOWARDS ROBUST SPEECH-TO-TEXT ADVERSARIAL ATTACK |
4604 | Towards Robust Visual Transformer Networks via K-Sparse Attention |
4940 | TOWARDS SPEAKER AGE ESTIMATION WITH LABEL DISTRIBUTION LEARNING |
8843 | TOWARDS TRANSFERABLE SPEECH EMOTION REPRESENTATION: ON LOSS FUNCTIONS FOR CROSS-LINGUAL LATENT REPRESENTATIONS |
3633 | Towards Using Clothes Style Transfer for Scenario-aware Person Video Generation |
5586 | TPARN: Triple-path attentive recurrent network for time-domain multichannel speech enhancement |
4803 | TP-VIT: A TWO-PATHWAY VISION TRANSFORMER FOR VIDEO ACTION RECOGNITION |
4947 | TRACKING THE DIMENSIONS OF LATENT SPACES OF GAUSSIAN PROCESS LATENT VARIABLE MODELS |
9323 | TRADE-OFFS IN DECENTRALIZED MULTI-ANTENNA ARCHITECTURES: THE WAX DECOMPOSITION |
4881 | TRAINING PRIVACY-PRESERVING VIDEO ANALYTICS PIPELINES BY SUPPRESSING FEATURES THAT REVEAL INFORMATION ABOUT PRIVATE ATTRIBUTES |
3465 | TRAINING ROBUST ZERO-SHOT VOICE CONVERSION MODELS WITH SELF-SUPERVISED FEATURES |
5114 | TRAINING STABLE GRAPH NEURAL NETWORKS THROUGH CONSTRAINED LEARNING |
4916 | TRAINING STRATEGIES FOR AUTOMATIC SONG WRITING: A UNIFIED FRAMEWORK PERSPECTIVE |
5963 | Training Strategies For Improved Lip-reading |
1630 | TRANSCRIBE-TO-DIARIZE: NEURAL SPEAKER DIARIZATION FOR UNLIMITED NUMBER OF SPEAKERS USING END-TO-END SPEAKER-ATTRIBUTED ASR |
1891 | Transducer-Based Streaming Deliberation For Cascaded Encoders |
3350 | TRANSDUCTIVE CLIP WITH CLASS-CONDITIONAL CONTRASTIVE LEARNING |
3330 | Transformer-based Domain Adaptation for Event Data Classification |
4211 | Transformer-Based Estimation of Spoken Sentences using Electrocorticography |
4851 | TRANSFORMER-BASED MULTI-ASPECT MULTI-GRANULARITY NON-NATIVE ENGLISH SPEAKER PRONUNCIATION ASSESSMENT |
3793 | TRANSFORMER-BASED PERSON SEARCH MODEL WITH SYMMETRIC ONLINE INSTANCE MATCHING |
4656 | TRANSFORMER-BASED STREAMING ASR WITH CUMULATIVE ATTENTION |
4168 | TRANSFORMER-S2A: ROBUST AND EFFICIENT SPEECH-TO-ANIMATION |
3334 | TRANSIENT ANALYSIS OF CLUSTERED MULTITASK DIFFUSION RLS ALGORITHM |
2236 | TRANSIENT DETECTION WITH UNKNOWN STATISTICS VIA SOURCE CODING |
2488 | Transmit Beamforming with Fixed Covariance for Integrated MIMO Radar and Multiuser Communications |
4231 | TranSTL: Spatial-Temporal Localization Transformer for Multi-Label Video Classification |
1917 | TRIBYOL: TRIPLET BYOL FOR SELF-SUPERVISED REPRESENTATION LEARNING |
9284 | Triply Complementary Priors for Image Restoration |
1840 | T-SVD BASED BROADBAND NON-SYNCHRONOUS MEASUREMENTS |
4014 | Tts4pretrain 2.0: Advancing the use of text and speech in ASR pretraining with consistency and contrastive losses |
3485 | TUNET: A BLOCK-ONLINE BANDWIDTH EXTENSION MODEL BASED ON TRANSFORMERS AND SELF-SUPERVISED PRETRAINING |
1347 | TURN-TO-DIARIZE: ONLINE SPEAKER DIARIZATION CONSTRAINED BY TRANSFORMER TRANSDUCER SPEAKER TURN DETECTION |
3569 | TWO STRATEGIES TOWARD LIGHTWEIGHT IMAGE SUPER-RESOLUTION |
3009 | TWO-PATH GMM-RESNET AND GMM-SENET FOR ASV SPOOFING DETECTION |
4373 | Two-snapshot DOA Estimation via Hankel-structured Matrix Completion |
7896 | TYPE-AWARE MEDICAL VISUAL QUESTION ANSWERING |
3219 | UBILUNG: MULTI-MODAL PASSIVE-BASED LUNG HEALTH ASSESSMENT |
9128 | UBIQUITOUS PHYSIOLOGICAL PREDICTION OF SUD PATIENTS’ WELLNESS STATE USING MEMORY-BASED CONVOLUTIONAL MODELS |
1775 | UFORMER: A UNET BASED DILATED COMPLEX & REAL DUAL-PATH CONFORMER NETWORK FOR SIMULTANEOUS SPEECH ENHANCEMENT AND DEREVERBERATION |
2278 | U-GAT-VC: Unsupervised Generative Attentional Networks for Non-parallel Voice Conversion |
4527 | UNCERTAINTY ESTIMATION WITH A VAE-CLASSIFIER HYBRID MODEL |
2587 | UNCERTAINTY IN DATA-DRIVEN KALMAN FILTERING FOR PARTIALLY KNOWN STATE-SPACE MODELS |
9305 | Underdetermined Direction-of-Arrival Estimation Using Sparse Circular Arrays on a Rotating Platform |
3632 | UNDERDETERMINED TWO-DIMENSIONAL LOCALIZATION FOR WIDEBAND SOURCES BASED ON DISTRIBUTED SENSOR ARRAY NETWORKS |
3921 | UNDERWATER IMAGE ENHANCEMENT VIA LEARNING WATER TYPE DESENSITIZED REPRESENTATIONS |
5020 | UNDERWATER SMALL TARGET DETECTION BASED ON DEFORMABLE CONVOLUTIONAL PYRAMID |
2913 | UNDERWATER STEREO MATCHING VIA UNSUPERVISED APPEARANCE AND FEATURE ADAPTATION NETWORKS |
1166 | UNET-TTS: IMPROVING UNSEEN SPEAKER AND STYLE TRANSFER IN ONE-SHOT VOICE CLONING |
3566 | UNFOLDING MODEL-BASED BEAMFORMING FOR HIGH QUALITY ULTRASOUND IMAGING |
5290 | UNIFIED MATRIX CODING FOR NN ORIGINATED MIP IN H.266/VVC |
5796 | Unified Multimodal Punctuation Restoration Framework for Mixed-Modality Corpus |
3290 | UNIFIED SPECULATION, DETECTION, AND VERIFICATION KEYWORD SPOTTING |
8894 | UNIMODULAR WAVEFORM DESIGN WITH LOW CORRELATION LEVELS: A FAST ALGORITHM DEVELOPMENT TO SUPPORT LARGE-SCALE CODE LENGTHS |
2884 | UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING |
1379 | UNIVERSAL EFFICIENT VARIABLE-RATE NEURAL IMAGE COMPRESSION |
2641 | UNIVERSAL PARALINGUISTIC SPEECH REPRESENTATIONS USING SELF-SUPERVISED CONFORMERS |
9074 | UNLIMITED SAMPLING WITH LOCAL AVERAGES |
4468 | UNLIMITED SAMPLING WITH SPARSE OUTLIERS: EXPERIMENTS WITH IMPULSIVE AND JUMP OR RESET NOISE |
4319 | UNROLLING PARTICLES: UNSUPERVISED LEARNING OF SAMPLING DISTRIBUTIONS |
2777 | UNSUPERVISED AND UNTRAINED UNDERWATER IMAGE RESTORATION BASED ON PHYSICAL IMAGE FORMATION MODEL |
2451 | UNSUPERVISED ANOMALY DETECTION FOR CONTAINER CLOUD VIA BILSTM-BASED VARIATIONAL AUTO-ENCODER |
3634 | UNSUPERVISED AUDIO-CAPTION ALIGNING LEARNS CORRESPONDENCES BETWEEN INDIVIDUAL SOUND EVENTS AND TEXTUAL PHRASES |
5195 | UNSUPERVISED CLUSTERING AND ANALYSIS OF CONTRACTION-DEPENDENT FETAL HEART RATE SEGMENTS |
8957 | UNSUPERVISED CONTRASTIVE HASHING FOR CROSS-MODAL RETRIEVAL IN REMOTE SENSING |
6273 | UNSUPERVISED DATA SELECTION FOR SPEECH RECOGNITION WITH CONTRASTIVE LOSS RATIOS |
4973 | UNSUPERVISED DEEP LEARNING NETWORK FOR DEFORMABLE FUNDUS IMAGE REGISTRATION |
1503 | UNSUPERVISED HIERARCHICAL TRANSLATION-BASED MODEL FOR MULTI-MODAL MEDICAL IMAGE REGISTRATION |
3475 | UNSUPERVISED MODEL ADAPTATION FOR END-TO-END ASR |
4325 | UNSUPERVISED SPEECH ENHANCEMENT WITH SPEECH RECOGNITION EMBEDDING AND DISENTANGLEMENT LOSSES |
3004 | UNSUPERVISED WORD-LEVEL PROSODY TAGGING FOR CONTROLLABLE SPEECH SYNTHESIS |
4030 | Upmixing via style transfer: a variational autoencoder for disentangling spatial images and musical content |
4706 | URBAN SOUND & SIGHT: DATASET AND BENCHMARK FOR AUDIO-VISUAL URBAN SCENE UNDERSTANDING |
2041 | USER SCHEDULING USING GRAPH NEURAL NETWORKS FOR RECONFIGURABLE INTELLIGENT SURFACE ASSISTED MULTIUSER DOWNLINK COMMUNICATIONS |
1561 | USING A SINGLE INPUT TO FORECAST HUMAN ACTION KEYSTATES IN EVERYDAY PICK AND PLACE ACTIONS |
3925 | USING ACOUSTIC DEEP NEURAL NETWORK EMBEDDINGS TO DETECT MULTIPLE SCLEROSIS FROM SPEECH |
5154 | USING MULTIPLE REFERENCE AUDIOS AND STYLE EMBEDDING CONSTRAINTS FOR SPEECH SYNTHESIS |
5679 | USING SPECTRAL SEQUENCE-TO-SEQUENCE AUTOENCODERS TO ASSESS MILD COGNITIVE IMPAIRMENT |
5434 | USTED: IMPROVING ASR WITH A UNIFIED SPEECH AND TEXT ENCODER-DECODER |
4734 | VADOI: VOICE-ACTIVITY-DETECTION OVERLAPPING INFERENCE FOR END-TO-END LONG-FORM SPEECH RECOGNITION |
2035 | VARARRAY: ARRAY-GEOMETRY-AGNOSTIC CONTINUOUS SPEECH SEPARATION |
3015 | VARIABLE SPAN TRADE-OFF FILTER FOR SOUND ZONE CONTROL WITH KERNEL INTERPOLATION WEIGHTING |
2667 | VARIANCE REDUCTION-BOOSTED BYZANTINE ROBUSTNESS IN DECENTRALIZED STOCHASTIC OPTIMIZATION |
2896 | VarianceFlow: High-quality and Controllable Text-to-Speech Using Variance Information via Normalizing Flow |
3568 | VARIATIONAL BAYESIAN FRAMEWORK FOR ADVANCED IMAGE GENERATION WITH DOMAIN-RELATED VARIABLES |
5716 | VARIATIONAL BAYESIAN GRAPH CONVOLUTIONAL NETWORK FOR ROBUST COLLABORATIVE FILTERING |
1649 | VARIATIONAL BAYESIAN TENSOR NETWORKS WITH STRUCTURED POSTERIORS |
2882 | VCD: VIEW-CONSTRAINT DISENTANGLEMENT FOR ACTION RECOGNITION |
9142 | VCVTS: MULTI-SPEAKER VIDEO-TO-SPEECH SYNTHESIS VIA CROSS-MODAL KNOWLEDGE TRANSFER FROM VOICE CONVERSION |
2569 | VIDEO ANOMALY DETECTION VIA PREDICTION NETWORK WITH ENHANCED SPATIO-TEMPORAL MEMORY EXCHANGE |
1922 | VIDEO FRAME INTERPOLATION VIA LOCAL LIGHTWEIGHT BIDIRECTIONAL ENCODING WITH CHANNEL ATTENTION CASCADE |
5171 | VIOLINIST IDENTIFICATION USING NOTE-LEVEL TIMBRE FEATURE DISTRIBUTIONS |
3963 | VISINGER: VARIATIONAL INFERENCE WITH ADVERSARIAL LEARNING FOR END-TO-END SINGING VOICE SYNTHESIS |
1683 | VISION TRANSFORMER EQUIPPED WITH NEURAL RESIZER ON FACIAL EXPRESSION RECOGNITION TASK |
5683 | VISION TRANSFORMER-BASED RETINA VESSEL SEGMENTATION WITH DEEP ADAPTIVE GAMMA CORRECTION |
4393 | VISUAL REPRESENTATION LEARNING WITH SELF-SUPERVISED ATTENTION FOR LOW-LABEL HIGH-DATA REGIME |
4197 | VISUALTTS: TTS WITH ACCURATE LIP-SPEECH SYNCHRONIZATION FOR AUTOMATIC VOICE OVER |
4584 | VOCALSOUND: A DATASET FOR IMPROVING HUMAN VOCAL SOUNDS RECOGNITION |
5297 | VOCBENCH: A NEURAL VOCODER BENCHMARK FOR SPEECH SYNTHESIS |
4428 | VOICE FILTER: FEW-SHOT TEXT-TO-SPEECH SPEAKER ADAPTATION USING VOICE CONVERSION AS A POST-PROCESSING MODULE |
3088 | VQA-BC: ROBUST VISUAL QUESTION ANSWERING VIA BIDIRECTIONAL CHAINING |
8800 | VR-FAM: VARIANCE-REDUCED ENCODER WITH NONLINEAR TRANSFORMATION FOR FACIAL ATTRIBUTE MANIPULATION |
1052 | VSEGAN: VISUAL SPEECH ENHANCEMENT GENERATIVE ADVERSARIAL NETWORK |
5318 | VU-BERT: A UNIFIED FRAMEWORK FOR VISUAL DIALOG |
7434 | W-ART: ACTION RELATION TRANSFORMER FOR WEAKLY-SUPERVISED TEMPORAL ACTION LOCALIZATION |
5082 | Wasserstein Cross-lingual Alignment for Named Entity Recognition |
1209 | WASSERTRAIN: AN ADVERSARIAL TRAINING FRAMEWORK AGAINST WASSERSTEIN ADVERSARIAL ATTACKS |
3205 | WATERMARKING IMAGES IN SELF-SUPERVISED LATENT SPACES |
2600 | WAV2CLIP: LEARNING ROBUST AUDIO REPRESENTATIONS FROM CLIP |
3222 | WAV2VEC-SWITCH: CONTRASTIVE LEARNING FROM ORIGINAL-NOISY SPEECH PAIRS FOR ROBUST SPEECH RECOGNITION |
9291 | WAVE DIGITAL MODELING AND IMPLEMENTATION OF NONLINEAR AUDIO CIRCUITS WITH NULLORS |
9060 | WAVEBENDER GAN: AN ARCHITECTURE FOR PHONETICALLY MEANINGFUL SPEECH MANIPULATION |
4478 | WAVE-DOMAIN APPROACH FOR CANCELLING NOISE ENTERING OPEN WINDOWS |
1195 | WAVEFORM OPTIMIZATION FOR WIRELESS POWER TRANSFER WITH POWER AMPLIFIER AND ENERGY HARVESTER NON-LINEARITIES |
2597 | WAVELET-BASED UNSUPERVISED LABEL-TO-IMAGE TRANSLATION |
3659 | WEAK TARGET DETECTION IN MASSIVE MIMO RADAR VIA AN IMPROVED REINFORCEMENT LEARNING APPROACH |
1643 | Weakly Supervised Point Cloud Upsampling via Optimal Transport |
6607 | WEARABLE SELD DATASET: DATASET FOR SOUND EVENT LOCALIZATION AND DETECTION USING WEARABLE DEVICES AROUND HEAD |
3847 | WEIGHTED GRAPH EMBEDDED LOW-RANK PROJECTION LEARNING FOR FEATURE EXTRACTION |
1115 | WEIGHTED WAVELET-BASED SPECTRAL-SPATIAL TRANSFORMS FOR CFA-SAMPLED RAW CAMERA IMAGE COMPRESSION CONSIDERING IMAGE FEATURES |
5156 | WENETSPEECH: A 10000+ HOURS MULTI-DOMAIN MANDARIN CORPUS FOR SPEECH RECOGNITION |
2478 | What is the Patient Looking at? Robust Gaze-Scene Intersection under free-viewing conditions |
1523 | When BERT Meets Quantum Temporal Convolution Learning for Text Classification in Heterogeneous Computing |
5338 | WHEN DOES BACKDOOR ATTACK SUCCEED IN IMAGE RECONSTRUCTION? A STUDY OF HEURISTICS VS. BI-LEVEL SOLUTION |
3476 | WIDE-SENSE STATIONARITY AND SPECTRAL ESTIMATION FOR GENERALIZED GRAPH SIGNAL |
4627 | wikiTAG: Wikipedia-based knowledge embeddings towards improved acoustic event classification |
4608 | Win the Lottery Ticket via Fourier Analysis: Frequencies Guided Network Pruning |
8833 | WISHART LOCALIZATION PRIOR ON SPATIAL COVARIANCE MATRIX IN AMBISONIC SOURCE SEPARATION USING NON-NEGATIVE TENSOR FACTORIZATION |
1436 | WLINKER: MODELING RELATIONAL TRIPLET EXTRACTION AS WORD LINKING |
4375 | WLS DESIGN OF ARMA GRAPH FILTERS USING ITERATIVE SECOND-ORDER CONE PROGRAMMING |
9159 | WORD ORDER DOES NOT MATTER FOR SPEECH RECOGNITION |
4451 | WORDMARKOV: A NEW PASSWORD PROBABILITY MODEL OF SEMANTICS |
2009 | Zero-shot Cross-lingual Transfer using multi-stream encoder and efficient speaker representation |
8847 | ZEROTH-ORDER RANDOMIZED SUBSPACE NEWTON METHODS |