Novel view synthesis of dynamic scenes with globally coherent depths from a monocular camera

Jae Shin Yoon; Kihwan Kim; Orazio Gallo; Hyun Soo Park; Jan Kautz

doi:10.1109/CVPR42600.2020.00538

Novel view synthesis of dynamic scenes with globally coherent depths from a monocular camera

Jae Shin Yoon, Kihwan Kim, Orazio Gallo, Hyun Soo Park, Jan Kautz

Computer Science and Engineering

Research output: Contribution to journal › Conference article › peer-review

62 Scopus citations

Abstract

This paper presents a new method to synthesize an image from arbitrary views and times given a collection of images of a dynamic scene. A key challenge for the novel view synthesis arises from dynamic scene reconstruction where epipolar geometry does not apply to the local motion of dynamic contents. To address this challenge, we propose to combine the depth from single view (DSV) and the depth from multi-view stereo (DMV), where DSV is complete, i.e., a depth is assigned to every pixel, yet view-variant in its scale, while DMV is view-invariant yet incomplete. Our insight is that although its scale and quality are inconsistent with other views, the depth estimation from a single view can be used to reason about the globally coherent geometry of dynamic contents. We cast this problem as learning to correct the scale of DSV, and to refine each depth with locally consistent motions between views to form a coherent depth estimation. We integrate these tasks into a depth fusion network in a self-supervised fashion. Given the fused depth maps, we synthesize a photorealistic virtual view in a specific location and time with our deep blending network that completes the scene and renders the virtual view. We evaluate our method of depth estimation and view synthesis on diverse real-world dynamic scenes and show the outstanding performance over existing methods.

Original language	English (US)
Article number	9156445
Pages (from-to)	5335-5344
Number of pages	10
Journal	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
DOIs	https://doi.org/10.1109/CVPR42600.2020.00538
State	Published - 2020
Event	2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 - Virtual, Online, United States Duration: Jun 14 2020 → Jun 19 2020

Bibliographical note

Funding Information:
Acknowledgement This work was partly supported by the NSF under IIS 1846031 and CNS 1919965.

Publisher Copyright:
© 2020 IEEE

Access

10.1109/CVPR42600.2020.00538

OpenUrl availability

Full text

Cite this

@article{a8dae508ec764934b496a987b1c2a6eb,

title = "Novel view synthesis of dynamic scenes with globally coherent depths from a monocular camera",

abstract = "This paper presents a new method to synthesize an image from arbitrary views and times given a collection of images of a dynamic scene. A key challenge for the novel view synthesis arises from dynamic scene reconstruction where epipolar geometry does not apply to the local motion of dynamic contents. To address this challenge, we propose to combine the depth from single view (DSV) and the depth from multi-view stereo (DMV), where DSV is complete, i.e., a depth is assigned to every pixel, yet view-variant in its scale, while DMV is view-invariant yet incomplete. Our insight is that although its scale and quality are inconsistent with other views, the depth estimation from a single view can be used to reason about the globally coherent geometry of dynamic contents. We cast this problem as learning to correct the scale of DSV, and to refine each depth with locally consistent motions between views to form a coherent depth estimation. We integrate these tasks into a depth fusion network in a self-supervised fashion. Given the fused depth maps, we synthesize a photorealistic virtual view in a specific location and time with our deep blending network that completes the scene and renders the virtual view. We evaluate our method of depth estimation and view synthesis on diverse real-world dynamic scenes and show the outstanding performance over existing methods.",

author = "Yoon, {Jae Shin} and Kihwan Kim and Orazio Gallo and Park, {Hyun Soo} and Jan Kautz",

note = "Funding Information: Acknowledgement This work was partly supported by the NSF under IIS 1846031 and CNS 1919965. Publisher Copyright: {\textcopyright} 2020 IEEE; 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 ; Conference date: 14-06-2020 Through 19-06-2020",

year = "2020",

doi = "10.1109/CVPR42600.2020.00538",

language = "English (US)",

pages = "5335--5344",

journal = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",

issn = "1063-6919",

publisher = "IEEE Computer Society",

}

TY - JOUR

T1 - Novel view synthesis of dynamic scenes with globally coherent depths from a monocular camera

AU - Yoon, Jae Shin

AU - Kim, Kihwan

AU - Gallo, Orazio

AU - Park, Hyun Soo

AU - Kautz, Jan

PY - 2020

Y1 - 2020

N2 - This paper presents a new method to synthesize an image from arbitrary views and times given a collection of images of a dynamic scene. A key challenge for the novel view synthesis arises from dynamic scene reconstruction where epipolar geometry does not apply to the local motion of dynamic contents. To address this challenge, we propose to combine the depth from single view (DSV) and the depth from multi-view stereo (DMV), where DSV is complete, i.e., a depth is assigned to every pixel, yet view-variant in its scale, while DMV is view-invariant yet incomplete. Our insight is that although its scale and quality are inconsistent with other views, the depth estimation from a single view can be used to reason about the globally coherent geometry of dynamic contents. We cast this problem as learning to correct the scale of DSV, and to refine each depth with locally consistent motions between views to form a coherent depth estimation. We integrate these tasks into a depth fusion network in a self-supervised fashion. Given the fused depth maps, we synthesize a photorealistic virtual view in a specific location and time with our deep blending network that completes the scene and renders the virtual view. We evaluate our method of depth estimation and view synthesis on diverse real-world dynamic scenes and show the outstanding performance over existing methods.

AB - This paper presents a new method to synthesize an image from arbitrary views and times given a collection of images of a dynamic scene. A key challenge for the novel view synthesis arises from dynamic scene reconstruction where epipolar geometry does not apply to the local motion of dynamic contents. To address this challenge, we propose to combine the depth from single view (DSV) and the depth from multi-view stereo (DMV), where DSV is complete, i.e., a depth is assigned to every pixel, yet view-variant in its scale, while DMV is view-invariant yet incomplete. Our insight is that although its scale and quality are inconsistent with other views, the depth estimation from a single view can be used to reason about the globally coherent geometry of dynamic contents. We cast this problem as learning to correct the scale of DSV, and to refine each depth with locally consistent motions between views to form a coherent depth estimation. We integrate these tasks into a depth fusion network in a self-supervised fashion. Given the fused depth maps, we synthesize a photorealistic virtual view in a specific location and time with our deep blending network that completes the scene and renders the virtual view. We evaluate our method of depth estimation and view synthesis on diverse real-world dynamic scenes and show the outstanding performance over existing methods.

UR - http://www.scopus.com/inward/record.url?scp=85094319041&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85094319041&partnerID=8YFLogxK

U2 - 10.1109/CVPR42600.2020.00538

DO - 10.1109/CVPR42600.2020.00538

M3 - Conference article

AN - SCOPUS:85094319041

SN - 1063-6919

SP - 5335

EP - 5344

JO - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

JF - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

M1 - 9156445

T2 - 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020

Y2 - 14 June 2020 through 19 June 2020

ER -

Novel view synthesis of dynamic scenes with globally coherent depths from a monocular camera

Abstract

Bibliographical note

Access

OpenUrl availability

Other files and links

Fingerprint

Cite this