The Visual Innovation Award was initiated to recognize pioneers of transformative technologies and business models in areas related to visual processing, visual communications, computer vision, and visual-based applications. The Award showcases innovations that have had great impact on human experiences with technology or are anticipated to do so in the near future. An Award Committee was formed consisting of well-known industrial executives, visionary entrepreneurs, and scholars, to vote for the finalists of the Award, who will be presenting at IEEE ICIP 2016 in a plenary session. Information about the Award Committee can be found here.
From the nominations received, seven finalists were selected through voting by the Award Committee. The seven finalists will be competing on-site at IEEE ICIP 2016 for the Visual Innovation Award on Tuesday, September 27th. Conference attendees can vote for the top winner using the IEEE ICIP 2016 App; all votes must be cast before 4pm on Tuesday, September 27th. The top-voted finalist will be announced and recognized at the IEEE ICIP 2016 Award Dinner & Show that same evening.
The seven finalists selected by the Award Committee to enter the final competition are listed below (in alphabetical order).
The IEEE ICIP 2016 Innovation Program Chairs arranged a half-day Innovation Program with prominent Keynote Speakers to share with attendees their insights on future innovations. The Speakers are: Hanno Basse (CTO, 20th Century Fox), Bo Begole (VP, Huawei), Achin Bhowmik (VP, Intel), Bill Dally (SVP, nVidia), Michael Antonov (Co-Founder, Oculus), C.-C. Jay Kuo (Dean’s Professor, University of Southern California), John Harding (VP, Google), Tim Milliron (VP, Lytro), Anthony Park (VP, Netflix), Jamie Shotton (Co-Inventor of Kinect, Microsoft), Raj Talluri (SVP, Qualcomm), and Susie Wee (VP and CTO, Cisco). You can download a flyer about finalists and keynotes here.
Vision Innovation Program, IEEE ICIP 2016
September 27 (Tuesday) 12:30-17:10, Phoenix
Plenary Forum – The Impact of Visual Innovations – Award Competition
Time: 12:30-14:55, Sept. 27 (Tuesday)
Chair: Haohong Wang, General Manager, TCL Research America, USA
Keynote Speakers: Michael Antonov, Co-Founder, Oculus, USA; Achin Bhowmik, VP, Intel, USA; Bill Dally, SVP, nVidia, USA; John Harding, VP, Google, USA; Tim Milliron, VP, Lytro, USA; Anthony Park, VP, Netflix, USA; Jamie Shotton, Co-Inventor of Kinect, Microsoft, USA
Bio: Michael’s professional career started after meeting Brendan at the University of Maryland and co-founding Scaleform, a user interface technology company for games. As Scaleform CTO, Michael led software development, working on GPU accelerated vector graphics and integrating them into 3D engines. By the time Scaleform was sold to Autodesk in 2011, it was the leading game UI solution, shipping in hundreds of titles.
Michael fell in love with virtual reality when he met Palmer in 2012 and became the Chief Software Architect of Oculus VR. There, he put together the Oculus software team, led development of the DK1/DK2 software stack, and focused on the challenge of stable, low-cost positional tracking, as well as interaction between tracking, sensor fusion, and optimized rendering to achieve lowest latency and the greatest feeling of presence.
Keynote Title: Bringing People Closer Through Virtual Reality
Abstract: Virtual reality lets us experience anything, anywhere. This unique potential makes VR set to become the next major computing platform. Now, with an increased prevalence in 360 cameras, immersive videos and 360 photo experiences are accessible to more consumers around the world. The latest advancements in image capturing hardware and high-end VR headsets—for both PC and Mobile—make it possible for people everywhere to connect in powerful new ways.
Oculus is poised to expand its mission of true immersion and human connectivity over the next several years. Michael Antonov, Chief Software Architect for Oculus, will walk you through some of the technical challenges and solutions we've encountered on our VR journey so far, as well as share some details and thoughts about what's next for VR hardware, its capabilities, and the future of what "social" means for VR.
Bio: Dr. Achin Bhowmik is vice president and general manager of the perceptual computing group at Intel, where he leads the development and deployment of Intel® RealSense™ Technology. His responsibilities include creating and growing new businesses in the areas of interactive computing systems, immersive virtual reality devices, autonomous robots and unmanned aerial vehicles.
Previously, he served as the chief of staff of the personal computing group, Intel’s largest business unit with over $30B revenues. Prior to that, he led the development of advanced video and display processing technologies for Intel’s computing products. His prior work includes liquid-crystal-on-silicon microdisplay technology and integrated electro-optical devices. As an adjunct and guest professor, Dr. Bhowmik has advised graduate research and taught courses at the Liquid Crystal Institute of the Kent State University, Stanford University, University of California, Berkeley, Kyung Hee University, Seoul, and the Indian Institute of Technology, Gandhinagar. He has >100 publications including two books and >100 granted and pending patents. He is a Fellow of the Society for Information Display (SID), and serves on the board of directors for OpenCV, the organization behind the open source computer vision library.
Keynote Title: Intel® RealSense™ Technology: Adding Human-Like Sensing and Interactions to Devices
Abstract: The world of intelligent and interactive systems is undergoing a revolutionary transformation. With rapid advances in natural sensing and perceptual computing technologies, devices are being endowed with abilities to “sense”, “understand”, and “interact” with us and the physical world. This keynote will describe and demonstrate the Intel® RealSense™ Technology, which is enabling a new class of applications based on real-time 3D-sensing, including interactive computing devices, autonomous machines such as robots and drones, as well as immersive mixed-reality devices, blurring the border between the real and the virtual worlds.
Bio: Bill Dally joined NVIDIA in January 2009 as chief scientist, after spending 12 years at Stanford University, where he was chairman of the computer science department. Dally and his Stanford team developed the system architecture, network architecture, signaling, routing and synchronization technology that is found in most large parallel computers today. Dally was previously at the Massachusetts Institute of Technology from 1986 to 1997, where he and his team built the J-Machine and the M-Machine, experimental parallel computer systems that pioneered the separation of mechanism from programming models and demonstrated very low overhead synchronization and communication mechanisms. From 1983 to 1986, he was at California Institute of Technology (CalTech), where he designed the MOSSIM Simulation Engine and the Torus Routing chip, which pioneered “wormhole” routing and virtual-channel flow control. He is a member of the National Academy of Engineering, a Fellow of the American Academy of Arts & Sciences, a Fellow of the IEEE and the ACM, and has received the IEEE Seymour Cray Award and the ACM Maurice Wilkes award. He has published over 200 papers, holds over 50 issued patents, and is an author of two textbooks. Dally received a bachelor's degree in Electrical Engineering from Virginia Tech, a master’s in Electrical Engineering from Stanford University and a Ph.D. in Computer Science from CalTech. He is a cofounder of Velio Communications and Stream Processors.
Keynote Title: GPU Computing from CUDA to Deep Learning
Abstract: The CUDA programming system enables programmers to harness the tremendous computational power of GPUs for a variety of tasks. Enabled by CUDA, GPUs now power the fastest supercomputers in the US and Europe and have enabled the recent revolution in deep learning. This talk will trace the history of CUDA from stream processing research at Stanford to the present.
Bio: John Harding is the VP of Engineering for Emerging Experiences at YouTube where he leads the Engineering efforts for Emerging Markets, Gaming, Kids, Living Room, Music, and VR. He joined YouTube shortly after it was acquired by Google, and has worked on most aspects of the product over the years. Prior to Google, John worked at Microsoft on Internet Explorer and Xbox.
Keynote Title: The Promise of YouTube
Abstract: In 2005, two friends stood in front of the elephant pen at the San Francisco Zoo and filmed the very first clip that would appear on their new video-hosting website. The video was utterly unremarkable: 19 seconds of unsteady footage shot on a camcorder in low definition. But after the video went live on their new site called YouTube, media and entertainment would never be the same. All of a sudden, anyone in the world could share a video with everyone in the world. You didn’t have to audition at a casting call; you didn’t have to pitch a screenplay to an executive; you didn’t have to beam a signal into people’s homes; and you didn’t need a budget. With YouTube, you suddenly had access to free and instant global distribution. With over a billion viewers around the world visiting YouTube every single month, it's taken unbelievable feats of engineering to keep that promise alive. But every year more content is uploaded (400 hours every single minute), higher quality and more complex formats are supported (4K, HDR, 360º, VR) and more video is served to more users around the world. And we're just getting started.
Bio: Tim Milliron is Vice President of Engineering at Lytro, where he leverages his broad experience in computer graphics and cloud computing to drive engineering across hardware and software.
Tim began his career at Pixar, where he first specialized in large procedural set pieces and character rigging for films like Toy Story 2, Monsters, Inc., and Finding Nemo. He led the characters and crowds group for Cars, which built hundreds of unique characters for the film as well as the software used to animate and simulate them. After Cars, Tim led software development for Pixar’s next-generation character articulation, animation, and simulation systems, first used on Brave and now used studio-wide. Most recently, Tim spent four and a half years at Twilio, serving in senior leadership roles in engineering and product and tackling the scaling challenges of a hypergrowth cloud startup. During his tenure, Tim helped grow Twilio’s team, revenue, and infrastructure more than 10x.
Keynote Title: How Light Field will Revolutionize Image Capture and Playback
Abstract: For over 100 years, imaging technology has been flat. We've captured and displayed the color and brightness of light with a myriad of technologies: film, print, digital, still and moving - but always as two-dimensional images. Light Field technology has the promise to revolutionize this most basic aspect of imaging. Very soon, we will view and capture imagery in three dimensions - capturing the play of light and depth in any scene. In this talk, I will discuss how this transformation from flat to volumetric imaging will impact media, consumer experiences, and ultimately the way we think about imagery itself.
Bio: Anthony Park is VP of Engineering at Netflix and is responsible for video streaming on consumer devices like smart TVs, set-top boxes, phones, laptops, and game consoles. With over 20 years of software engineering experience, he's spent the last eight years at Netflix implementing and improving video streaming on a variety of devices. Recently, Anthony has helped bring innovations like 4K and HDR to millions of Netflix customers around the world. Anthony has a Master of Engineering (MEng) and a Master of Business Administration (MBA) from Arizona State University.
Keynote Title: Netflix - Inventing Internet TV
Abstract: Entertainment and technology are continuing to transform each other as they have been doing for over a hundred years. Netflix has been a pioneer in inventing Internet TV over the last decade. We can now put consumers across the world in the driver’s seat when it comes to how, when, and where they watch. In this talk, I will discuss some of the technology Netflix has brought to millions of consumers to make Internet TV a reality, including new user experiences, video streaming, and personalized recommendations.
Bio: Jamie Shotton leads the Machine Intelligence & Perception group at Microsoft Research Cambridge. He studied Computer Science at the University of Cambridge, where he remained for his PhD in computer vision and machine learning for visual object recognition. He joined Microsoft Research in 2008 where he is now a Principal Researcher. His research focuses on the intersection of computer vision, AI, machine learning, and graphics, with particular emphasis on systems that allow people to interact naturally with computers. He has received multiple Best Paper and Best Demo awards at top academic conferences. His work on machine learning for Kinect was awarded the Royal Academy of Engineering's gold medal MacRobert Award 2011, and he shares Microsoft's Outstanding Technical Achievement Award for 2012 with the Kinect product team. In 2014 he received the PAMI Young Researcher Award, and in 2015 the MIT Technology Review Innovator Under 35 Award ("TR35").
Keynote Title: The Impact of Visual Innovations: Kinect
Abstract: Microsoft launched Kinect for Xbox 360 in 2010. Kinect combined a depth sensing camera with novel machine learning algorithms to bring full-body controller-free motion gaming to the living room. Kinect broadened the world of Xbox gaming to a new consumer audience, and garnered a Guinness World Record for the fastest selling consumer electronics device in history. But the creation of a market for consumer-grade depth cameras has arguably had even greater impact: Kinect and other depth sensors have become indispensable tools in widespread use in companies and research labs across the world. This short talk will tell the story of the original Kinect product and highlight some of the exciting advances that Kinect is enabling in 3D scanning, mixed reality, healthcare, and more.
Keynote Talk – Data-Driven Perceptual Coding: A Collaborative Example between Academia and Industry
Time: 15:00-15:20, Sept. 27 (Tuesday)
Chair: Khaled El-Maleh, Sr. Director, Qualcomm, USA
Keynote Speaker: C.-C. Jay Kuo, Dean’s Professor, University of Southern California, USA
Bio: Dr. C.-C. Jay Kuo received his Ph.D. degree from the Massachusetts Institute of Technology in 1987. He is now with the University of Southern California (USC) as Director of the Media Communications Laboratory and Dean’s Professor in Electrical Engineering-Systems. His research interests are in the areas of digital media processing, compression, communication and networking technologies. Dr. Kuo was the Editor-in-Chief for the IEEE Trans. on Information Forensics and Security in 2012-2014. He was the Editor-in-Chief for the Journal of Visual Communication and Image Representation in 1997-2011, and served as Editor for more than 10 other international journals. Dr. Kuo was the recipient of the Electronic Imaging Scientist of the Year Award in 2010 and the holder of the 2010-2011 Fulbright-Nokia Distinguished Chair in Information and Communications Technologies. He also received the USC Associates Award for Excellence in Teaching, the IEEE Computer Society Taylor L. Booth Education Award, the IEEE Circuits and Systems Society John Choma Education Award, and the IS&T Raymond C. Bowman Award in 2016. Dr. Kuo is a Fellow of AAAS, IEEE and SPIE. Dr. Kuo has guided 134 students to their Ph.D. degrees and supervised 25 postdoctoral research fellows. He is a co-author of about 250 journal papers, 900 conference papers and 14 books.
Keynote Title: Data-Driven Perceptual Coding: A Collaborative Example between Academia and Industry
Abstract: There has been significant progress in image/video coding in the last 50 years, and many visual coding standards have been established in the last three decades, including JPEG, MPEG-1, MPEG-2, H.264/AVC and H.265. The visual coding research field has reached a mature stage, and the question “is there anything left for image/video coding?” has arisen in recent years. To address this question, we need to examine the visual coding problem from a new angle: a data-driven approach based on human subjective test results. In particular, I will describe a new methodology that uses a just-noticeable-difference (JND) approach to measure the subjective visual experience and takes a statistical approach to characterize joint visual experiences of a test group. This new methodology builds a bridge between the traditional visual coding problem and modern big data analytics. I have collaborated with a couple of companies on solving this problem together, and will use this example to talk about challenges and tips for academia-industry collaboration for visual innovation.
Plenary Forum – The Future of Visual Innovations
Time: 15:50-17:10, Sept. 27 (Tuesday)
Moderator: Jeff Bier, President, BDTI & Embedded Vision Alliance, USA
Bio: Hanno Basse, chief technology officer (CTO) at 20th Century Fox Film Corp., oversees technology strategy and engineering, including home entertainment, theatrical distribution, and postproduction. At Fox, Hanno and his team of engineers are developing new distribution methods, are working on next generation entertainment technologies like High Dynamic Range and Ultra-HD as well as interactive platforms, and are involved with many other initiatives, including Content Protection, Immersive Audio etc. He earlier spent more than 14 years at DIRECTV, ultimately as senior vice president of broadcast systems engineering, with accomplishments including the 2005 successful launch of the largest HD channel rollout to date and the 2009 implementation of DIRECTV’s video-on-demand infrastructure, as well as significant contributions to DIRECTV’s broadcast infrastructure and construction of its Los Angeles Broadcast Center. Hanno began his career in 1991 as a scientist-engineer at the Institut für Rundfunktechnik (IRT) in Munich, Germany, and worked as a systems engineer at ProSieben Media AG, also in Germany. He has been awarded 22 patents and was named a Fellow of the Society of Motion Picture and Television Engineers in 2014. Hanno currently serves as the president and chairman of the board of the UHD Alliance, an organization that brings together major content, consumer electronics and distribution companies with the goal of defining a next generation premium audio-visual experience.
Keynote Title: Market implementation of HDR technology – a case study
Abstract: 20th Century Fox worked with Samsung and other leading CE companies to introduce displays with High Dynamic Range capability into the consumer market. This presentation will discuss how studio and CE representatives collaborated to develop display as well as content mastering requirements. It also describes the benefits of starting such collaboration at a very early stage, in order to ensure that CE product development and the creation of matching content are aligned and products and content are introduced to the market at the same time.
Bio: Dr. Bo Begole is VP and Global Head of Huawei Technologies’ Media Lab whose mission is to create the future of networked media technologies and user experiences through innovations in ultra-high-efficiency compression, computer vision/hearing, augmented/virtual reality, full field communications and personalized, responsive media. Previously, he was a Sr. Director at Samsung Electronics’ User Experience Center America where he directed a team to develop new contextually intelligent services for wearable, mobile and display devices. Prior to that, he was a Principal Scientist and Area Manager at Xerox PARC where he directed the Ubiquitous Computing research program creating behavior-modeling technologies, responsive media and intelligent mobile agents. An inventor of 30 issued patents, he is also the author of Ubiquitous Computing for Business (FT Press, 2011) and dozens of peer-reviewed research papers. Dr. Begole is an ACM Distinguished Scientist, active in many research conferences and was co-Chair of the 2015 ACM conference on human factors in computing systems (CHI 2015) in Seoul, Korea. Dr. Begole received a Ph.D. in computer science from Virginia Tech in 1998.
Keynote Title: Responsive Media in the Future of Thinking Machines
Abstract: Perception and Cognition technologies have evolved to a point where systems need not simply react to user input; they can now proactively deliver personalized media that responds dynamically to the users' attention, engagement and context: Responsive Media. Media experiences will be dramatically changed by the next generation of these technologies embedded into smartphones, VR goggles, robots, smart homes and vehicles so that they not only sense the audience's engagement in real time, but they can also predict disengagement and prevent it by dynamically shifting the content to appeal to an individual's preferences, emotional state and situation. Media technologies no longer simply deliver entertainment: imagine robots that can sense a child's frustration and actively assist in the homework, digital assistants that do not interrupt inappropriately, semi-autonomous vehicles that use media to maximize driver engagement, and other intelligent media experiences. Responsive media will be more like an engaging conversation among humans, rather than just passive consumption. This talk will paint a picture and challenge the audience to identify the remaining technology barriers, architectures, business ecosystems, threats, and yes, killer applications.
Bio: Raj Talluri serves as senior vice president of product management for Qualcomm, where he is currently responsible for managing IoT, mobile computing and Qualcomm Snapdragon Sense ID 3D fingerprint technology businesses. Prior to this role, he was responsible for product management of Qualcomm Snapdragon application processor technologies. Talluri has more than 20 years of experience spanning business management, strategic marketing, and engineering management. He has published more than 35 journal articles, papers, and book chapters in many leading electrical engineering publications. Raj Talluri was chosen as No. 5 in Fast Company's list of 100 Most Creative People in business in 2014.
Keynote Title: Future Innovations in Visual Processors for Embedded Vision Applications
Abstract: In the last couple of decades we have seen tremendous advances in processors for visual computing. This has led to an explosion in the use of computer vision in many embedded applications, including self-driving cars, virtual reality headsets, smart cameras, and autonomous robots. This talk will highlight some of the key innovations in the area of visual processors and drill deeper into what future innovations to expect and the impact of these processing innovations on future vision applications.
Bio: Susie is the Vice President and Chief Technology Officer of Networked Experiences and DevNet at Cisco Systems. She is the founder and lead of DevNet, Cisco's developer program, which aims to make the evolving Internet an innovation platform for the developer ecosystem. Susie and her team are developing UX and technology innovations that improve the operational experience, end user experience, and developer experience with the network. They are developing technologies and systems for the Internet of Things, software-defined networking, augmented collaboration and co-creation, and network visualization. Prior to this, Susie was the Vice President and Chief Technology and Experience Officer of Cisco’s Collaboration Technology Group where she was responsible for driving innovation and experience design in Cisco's collaboration products and software services, including unified communications, telepresence, web and video conferencing, and cloud collaboration. Susie received Technology Review’s Top 100 Young Innovators award, ComputerWorld's Top 40 Innovators under 40 award, the INCITs Technical Excellence award, the Women In Technology International Hall of Fame award, and was on the Forbes Most Powerful Women list. She is an IEEE Fellow for her contributions in multimedia technology and has over 50 international publications and over 45 granted patents. Susie received her B.S., M.S., and Ph.D. degrees from the Massachusetts Institute of Technology.
Keynote Title: The Next Wave of Visual Innovation with Visual Microservices and the Internet of Things
Abstract: Visual technologies have made tremendous advances over the last few decades: from QCIF and CIF resolution video conferencing in the 1990s to video streaming and HDTV in the 2000s and to widespread use of mobile video and 4K video, high-dynamic range video, and augmented and virtual reality now in the 2010s. Each of these innovations required technology advancements in the full stack, including video capture and display, compression, streaming, and in the network itself. With a proper deployment of today’s technologies, it is feasible that within the decade the 7 billion people in the world will be able to create and consume video. In parallel to advances in video technologies, there have also been tremendous advancements in the development and deployment of software with open source, app stores, virtualization, devops, and more recently containers and microservices.
The next wave of visual innovation will be driven by the need to capture, deliver, and analyze video that exceeds what 7 billion people can consume. Video is no longer captured just for the purpose of viewing by people, but it will be captured for the purpose of extracting information and making intelligent decisions. With advances in the Internet of Things and machine-to-machine communication, video will be increasingly used for sensing, automation, surveillance, and event detection. Analytics will be used to extract intelligence from captured video streams to make decisions that reach far beyond the video application itself. These applications require the global network to carry and process not billions of video streams, but trillions of video streams. This next chapter of visual innovation once again requires full stack technology advancements and a network architecture that allows video to be captured and processed at the edge of the network, along with an application framework that allows the flexible deployment of visual microservices.