ICIP 2021 will showcase 12 special sessions on the following topics.
With a number of breakthroughs in autonomous system technology over the past decade, the race to commercialize self-driving cars has become fiercer than ever. The integration of advanced sensing, computer vision, signal/image processing, and machine/deep learning into autonomous vehicles enables them to perceive the environment intelligently and navigate safely. Autonomous driving systems must ensure safe, reliable, and efficient automated mobility in complex, uncontrolled real-world environments. Applications range from automated transportation and farming to public safety and environmental exploration. Visual perception is a critical component of autonomous driving. Enabling technologies include: a) affordable sensors that can acquire useful data under varying environmental conditions, b) reliable simultaneous localization and mapping, c) machine learning that can effectively handle varying real-world conditions and unforeseen events, as well as “machine-learning friendly” signal processing to enable more effective classification and decision making, d) hardware and software co-design for efficient real-time performance, e) resilient and robust platforms that can withstand adversarial attacks and failures, and f) end-to-end system integration of sensing, computer vision, signal/image processing, and machine/deep learning. The special session will cover all of these areas. Research papers are solicited in, but not limited to, the following topics:
Due to recent advances in visual capture technology, point clouds (PCs) have been recognized as a crucial data structure for 3D content. PCs are essential for numerous applications such as virtual and mixed reality, 3D content production, sensing for autonomous vehicle navigation, architecture, and cultural heritage. Point clouds are sets of 3D points identified by their coordinates, which constitute the geometry of the point cloud. In addition, each point can be associated with attributes such as colors, normals, and reflectance. Point clouds can contain a massive number of points, especially in high-precision or large-scale captures, leading to huge storage and transmission costs. As a result, efficient PC coding schemes, as well as quality metrics to assess their coding performance, have recently been studied in both the research and standardization communities. As an example, MPEG recently started activities towards the second generation of geometry-based PC coding. Despite the advances in this field in the past few years, several challenges still need to be solved, e.g., designing efficient schemes to deal with low-density point clouds; constructing large-scale, subjectively annotated datasets to train and benchmark perceptual quality metrics; and learning good signal representations for PCs using deep convolutional neural networks. In this special session, we aim to gather and discuss some of the most recent and significant results on methods for coding and quality prediction of 3D point clouds. Topics of interest include, but are not limited to:
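As a minimal illustration of this data structure, the sketch below builds a toy point cloud in NumPy and applies the coarse voxelization of geometry from which geometry-based coding schemes typically start; all array names and the voxel size are illustrative, not part of any standard.

    import numpy as np

    # A point cloud: N points, each with XYZ geometry plus per-point attributes.
    num_points = 100_000
    geometry = np.random.rand(num_points, 3).astype(np.float32)  # x, y, z coordinates
    colors = np.random.rand(num_points, 3).astype(np.float32)    # per-point RGB in [0, 1]

    # Coarse voxelization of the geometry: snap each coordinate to a regular
    # grid, so duplicate points within a voxel collapse to a single position.
    voxel_size = 0.01
    quantized = np.round(geometry / voxel_size).astype(np.int32)
    occupied = np.unique(quantized, axis=0)
    print(f"{num_points} points -> {len(occupied)} occupied voxels")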
The emergence of 360°-video capture devices, light field camera arrays, and head-mounted displays (HMDs) has created new opportunities for content creators to deliver truly immersive experiences to audiences. The field has become even more relevant with the recent pandemic, which has boosted remote working, so immersive imaging research will remain an active topic for the foreseeable future. Handling the sheer amount of data in all processing steps of light field imaging, 360° video, and volumetric video, from capture to display, is a major challenge. In particular, streaming this data to audiences at high quality is still an unsolved problem. Furthermore, technical limitations of the capture devices reduce the quality of experience on the consumer side: incorrect 3D-to-2D mapping and optical distortions in omnidirectional image acquisition, low spatial and angular resolution of light field images, and segmentation or time-consistent 3D reconstruction in dynamic volumetric video. To overcome these limitations, novel models and representations need to be proposed for immersive imaging technologies. This special session will address these issues and contribute solutions based on highly advanced immersive imaging technologies. The special session contributes to the following aspects in particular (but is not limited to them):
New-generation video coding standards, e.g., VVC, AVS3, and AV1, have been finalized in recent years, offering up to 50% bit-rate savings over the preceding HEVC standard at the same perceptual quality. These coding efficiency gains come at the cost of a significant complexity increase for both encoder and decoder. For example, the encoding time of the VVC reference software VTM is reported to be more than 10× that of the HEVC reference software HM under the same encoder configuration. To make the latest video coding standards applicable to real-life products, more effort must be devoted to codec optimization techniques for complexity reduction as well as quality improvement. On the other hand, pixel-wise distortion metrics, e.g., peak signal-to-noise ratio (PSNR), are still dominantly used in these emerging standards for performance optimization. However, the human visual system (HVS) is the ultimate receiver of video content, and pixel-wise objective metrics such as PSNR are not well correlated with it. Therefore, this special session seeks technical submissions that improve the performance of VVC/AVS3/AV1 codecs, both objectively and perceptually. Topics of interest for this special session include, but are not limited to, complexity optimization and perceptual compression.
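For reference, PSNR is computed from the mean squared error (MSE) between an original M × N frame I and its reconstruction Î, where MAX_I is the peak pixel value (255 for 8-bit content):

    \mathrm{MSE} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\bigl(I(i,j)-\hat{I}(i,j)\bigr)^{2},
    \qquad
    \mathrm{PSNR} = 10\,\log_{10}\frac{\mathrm{MAX}_I^{2}}{\mathrm{MSE}}

Because MSE weights every pixel error equally regardless of its visibility, two reconstructions with identical PSNR can differ noticeably in perceived quality, which is exactly the gap between objective and perceptual optimization that this session targets.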
Imaging Science, with its emphasis on computational imaging, has had a great impact on a number of fields such as Radiology, Biology, and National Security. Since going digital, Microscopy has also benefited from adopting modern methods of computational imaging. Materials Science, with its microscopy-intensive approach, is now poised to make similar advances. With automated acquisition, Materials Science is expanding beyond its traditional strengths in optical and electron microscopy into areas such as tomography, hyperspectral methods, dynamic sensing, and numerous diffraction modalities. Large-volume data is routinely acquired with laboratory-based equipment as well as at larger user centers with bright synchrotron X-ray and neutron sources. Additionally, Materials Science has a rich, physics-based understanding of the origins of structures, which promises to move beyond expert opinion towards unbiased assessments of computational results based on physical principles. While tremendous opportunities exist, the reality is that methods for image analysis in Materials Science lag behind other fields, relying mainly on classical image processing. Recently, some materials researchers have begun to adopt modern optimization- and learning-based approaches, making significant advances over the state of the art in their field. Cross-disciplinary collaborations like these between materials scientists and imaging scientists are expected to continue to grow and to lead to further significant progress. This forum will bring together researchers engaged in computational imaging methods with those engaged in materials science applications of modern imaging science, with an eye toward advancing both fields.
Deep learning has become a widely used ingredient of many methods, particularly in signal processing, computer vision, audio analysis, and related fields. While impressive progress has been shown in academic research, putting these methods into practical applications brings new challenges, such as the size of the trained neural networks to be deployed, the size of features (compared to the size of the media content from which they were extracted), restricted target-device capabilities, and the fact that end-to-end learning-based approaches produce similar yet non-interoperable features for each particular task. While efficient deep learning has been the subject of workshops and special sessions in recent years, these more application-focused aspects, which also raise questions of interoperability, have only been marginally covered.
This special session will present novel research on compact/compressed representations of neural networks, interoperable and compact deep features, and related areas. In particular, we are interested in bridging the gap between academic research and standardization efforts in this area, such as the MPEG activities on compression of neural networks for multimedia content description and analysis, and on video coding for machines. This special session will thus bring together researchers and practitioners from application domains such as signal processing and computer vision, as well as from machine learning and data compression.
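As one concrete, hedged example of the kind of compact representation in scope, the sketch below applies PyTorch's post-training dynamic quantization, which stores the weights of selected layer types as 8-bit integers; the two-layer model is a stand-in, not a method discussed in the session.

    import torch
    import torch.nn as nn

    # Stand-in network; any model containing Linear layers works the same way.
    model = nn.Sequential(
        nn.Linear(512, 256),
        nn.ReLU(),
        nn.Linear(256, 10),
    )

    # Post-training dynamic quantization: weights of the listed module types
    # are stored as int8 and dequantized on the fly during inference.
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    x = torch.randn(1, 512)
    print(quantized(x).shape)  # torch.Size([1, 10]), with ~4x smaller Linear weights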
While many traditional approaches have been investigated to provide information security, machine learning, as the driving force of the current wave of AI, offers powerful solutions to many real-world technical and scientific challenges. Multimedia security applications have recently received a new boost, particularly due to powerful methods built on advances in deep learning (DL). On the other hand, AI has raised the problem of counterfeiting multimedia data to an unprecedented level. High-quality fake videos and audio generated by AI algorithms (deepfakes) have started to challenge the status of video and audio as definitive evidence of events. Together with concerns about the inherent security of deep networks themselves, this poses serious new threats of vulnerability and fragility, with a consequent need for advanced systems capable of working under more challenging conditions. This special session aims to draw the attention of researchers towards new challenges posed by the use of AI in multimedia security, including deepfake generation and detection, susceptibility to adversarial attacks, the need for huge amounts of labeled training data, the risk of overfitting with consequent failures in the presence of unforeseen situations at test time, etc. We plan to bring together papers that focus on deepfake generation and detection, exploit DL methods for various multimedia security applications, investigate the more general problem of systematically developing secure AI tools, and develop novel approaches capable of overcoming the limitations of state-of-the-art DL methods while keeping the superiority of modern AI tools.
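To make the adversarial-attack concern concrete, here is a minimal sketch of the well-known fast gradient sign method (FGSM) in PyTorch; the classifier, label tensor, and epsilon value are placeholders for illustration only.

    import torch
    import torch.nn.functional as F

    def fgsm_attack(model, image, label, epsilon=0.03):
        """Fast gradient sign method: perturb the input in the direction
        that maximally increases the classification loss."""
        image = image.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(image), label)
        loss.backward()
        # Take one signed-gradient step, then clamp to the valid pixel range.
        adversarial = image + epsilon * image.grad.sign()
        return adversarial.clamp(0.0, 1.0).detach()

A perturbation this small is typically imperceptible to a human viewer, yet it can flip the prediction of an undefended classifier.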
Advances in computing techniques, graphics hardware, and networks have enabled the wide application of 3D data in domains such as 3D graphics, entertainment, the medical industry, and 3D model design. The proliferation of such applications has produced large-scale 3D visual data, while effective 3D processing tools to manipulate these data are still in their infancy. How to effectively and efficiently perceive and understand such 3D visual data has become an urgent but challenging task in recent years. LiDAR, as well as monocular and conventional cameras, have played important roles here. To facilitate practical applications, it remains important to further exploit advanced and hybrid mechanisms for 3D visual perception. Recent years have also witnessed rapid progress of deep neural networks in 3D visual analysis, including 3D visual representation, recognition, reconstruction, and content understanding, with wide applications in autonomous driving, medical diagnosis assistance, and virtual reality. However, there is still a long way to go towards effective 3D semantic understanding, especially when confronting multi-modal 3D data and complex application scenarios. The primary objective of this special session is to foster focused attention on the latest research progress in 3D visual perception and understanding, and to seek original contributions addressing the challenges of 3D data acquisition, representation, recognition, and semantic analysis, with applications in areas such as autonomous driving and medicine.
Recent advancements in Deep Neural Networks (DNNs) have raised critical concerns about trusting decisions made by such advanced Machine Learning (ML) models. Explainability of DNNs for image/video processing tasks refers to revealing what has compelled the model to make a specific decision. Explainability not only helps improve a DNN's performance by exposing what exactly happens inside the network, but also facilitates detecting the model's failure points. No matter how powerful DNNs are, they will not be used in practice unless they can be interpreted and related to the image landmarks used by humans. Explainable Artificial Intelligence (XAI) becomes even more important for image/video processing in specific application domains such as medical imaging, where not a single mistake can be tolerated: potential mistakes may lead to irreparable loss or injury, and knowing the logic behind the model's outcome is the key for image-based prognostics and diagnostics. Although there have been significant advancements in improving the explainability of DNNs, it is still only heuristically understood, and more reliable explanations need to be developed. The objective of this special session is to collect novel ideas and experiments on how to enhance the explainability of DNNs and address the black-box problem, which is a barrier to using these models in real-world applications.
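As a simple illustration of one family of explanation methods, the sketch below computes a vanilla gradient saliency map, i.e., how sensitive the predicted class score is to each input pixel; the model is a placeholder, and gradient saliency is only one of many XAI techniques relevant to this session.

    import torch

    def saliency_map(model, image):
        """Vanilla gradient saliency: |d(top class score) / d(input pixel)|."""
        model.eval()
        image = image.clone().detach().requires_grad_(True)  # shape (1, C, H, W)
        scores = model(image)
        top_class = scores.argmax(dim=1).item()
        scores[0, top_class].backward()
        # Max over color channels gives one importance value per pixel.
        return image.grad.abs().max(dim=1).values.squeeze(0)

Bright regions of the returned map indicate pixels whose change would most affect the decision, which can then be compared against the landmarks a human expert would use.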
Deep learning approaches have attracted considerable attention over the last few years in various image and video processing applications. While these approaches have been widely used in fields like classification and recognition, comparatively little research has been devoted to image and video coding. Recent deep-learning-based coding methods have shown promising results and led to significant improvements over compression standards such as JPEG 2000 and HEVC. Despite this recent progress, some challenges still need to be addressed. For example, instead of the conventional transforms (the Discrete Cosine Transform in HEVC and the Discrete Wavelet Transform in JPEG 2000), nonlinear transforms should be further investigated. Moreover, while most neural-network-based coding methods are optimized using a mean-squared-error loss function, other loss functions that better reflect human perception should be considered. This special session aims to address these and other challenges related to the use of deep learning in the design of optimized image and video coding schemes. Topics of interest include (but are not limited to): learning wavelet filters, designing optimized nonlinear transform coding techniques, training with new loss functions, new techniques for intra/inter prediction, rate-distortion optimization, and entropy coding.
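As an illustration of the training objective behind most learned codecs, the hedged sketch below combines an estimated rate with a distortion term; replacing the MSE distortion with a perceptual measure (e.g., MS-SSIM or a learned metric) is exactly the kind of change this session encourages. All names are illustrative, and the entropy model supplying the bit estimate is assumed to exist elsewhere.

    import torch.nn.functional as F

    def rate_distortion_loss(x, x_hat, bits_estimate, lam=0.01):
        """Standard learned-compression objective: L = R + lambda * D.

        x              original image batch, shape (B, C, H, W), values in [0, 1]
        x_hat          reconstruction produced by the decoder
        bits_estimate  estimated bits for the latent (e.g., from an entropy model)
        lam            trade-off weight: larger lam favors quality over rate
        """
        num_pixels = x.shape[0] * x.shape[2] * x.shape[3]
        rate = bits_estimate / num_pixels      # bits per pixel
        distortion = F.mse_loss(x_hat, x)      # a perceptual loss could go here
        return rate + lam * distortion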
Medical imaging (MI), in its diversity and richness, has propelled the development of a large pool of automated computer-aided diagnosis pipelines. Such automated solutions have witnessed a technological outburst with the emergence of advanced artificial intelligence (AI) techniques, in particular deep learning (DL). So far, remarkable strides have mapped the field of medical imaging, from MI data analysis to prediction (e.g., predicting missing imaging modalities), assisting clinicians in performing AI-assisted diagnosis and eventually improving disease prognosis. However, existing AI solutions for healthcare using MI still face formidable challenges, including model reproducibility across multi-source datasets, scalability to large-scale medical images, generalizability across a wide spectrum of clinical applications, the scarcity of medical images for rare diseases, and the noisy, poor quality of clinical imaging datasets, to name a few. In this special session, we aim to draw the attention of researchers towards these unsolved challenges by designing new solutions to specific MI problems that are reliable enough for clinical translation. We encourage submissions proposing advanced and novel AI techniques which exploit machine learning and DL methods for a wide variety of MI applications. The designed technical solutions (pipelines) are expected to overcome the limitations of state-of-the-art methods and to offer reproducible, scalable, and generalizable AI tools for MI.
With advances in Earth observation techniques, researchers now have much easier access to massive airborne and spaceborne remote sensing data. These data provide a bird’s-eye view for understanding our planet. However, with such a huge number of remote sensing images, critical data-processing challenges need to be addressed. Manually processing the data is time-consuming and labor-intensive; a more efficient way is through computer vision and machine learning methods. This special session aims to draw the attention of researchers toward new challenges posed by the use of artificial intelligence (AI) in remote sensing applications. Topics of interest include, but are not limited to, the following: remote sensing image classification, object detection, semantic segmentation, anomaly detection, change detection, hyperspectral image band selection, dimension reduction, and time series/video analysis. Contributions to this special session should propose advanced AI technologies which explore computer vision and machine/deep learning methods for a wide variety of remote sensing applications. The proposed approaches may be transferred from the processing of other types of images, or be methods designed specifically for remote sensing data. Innovative algorithms capable of overcoming the limitations of state-of-the-art machine/deep learning methods when applied to remote sensing images are also welcome.
Special session papers can be submitted through the normal paper submission system. The deadline for submitting special session papers is 14 February 2021.