Learning Motion-Dependent Appearance for High-Fidelity Rendering of Dynamic Humans from a Single Camera

Jae Shin Yoon; Duygu Ceylan; Tuanfeng Y. Wang; Jingwan Lu; Jimei Yang; Zhixin Shu; Hyun Soo Park

doi:10.1109/CVPR52688.2022.00340

Computer Science and Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

The appearance of dressed humans undergoes a complex geometric transformation induced not only by the static pose but also by its dynamics, i.e., there exist a number of cloth geometric configurations for a given pose, depending on the way it has moved. Such appearance modeling conditioned on motion has been largely neglected in existing human rendering methods, resulting in rendering of physically implausible motion. A key challenge of learning the dynamics of the appearance lies in the requirement of a prohibitively large number of observations. In this paper, we present a compact motion representation by enforcing equivariance: a representation is expected to be transformed in the way that the pose is transformed. We model an equivariant encoder that can generate the generalizable representation from the spatial and temporal derivatives of the 3D body surface. This learned representation is decoded by a compositional multi-task decoder that renders high-fidelity time-varying appearance. Our experiments show that our method can generate a temporally coherent video of dynamic humans for unseen body poses and novel views given a single-view video.
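
The equivariance property named in the abstract can be illustrated with a toy check. The sketch below is not the paper's encoder: it assumes 2D points, a planar rotation as the pose transformation, and a hypothetical finite-difference "encoder" (temporal_derivative) standing in for the temporal derivative of the body surface. All function names are illustrative. It tests whether encoding a transformed input gives the same result as transforming the encoded input.

```python
import math

def rotate(points, angle):
    """Rotate a list of 2D points about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def temporal_derivative(frame_prev, frame_next, dt=1.0):
    """Toy encoder (assumption): per-point finite-difference velocity."""
    return [((xn - xp) / dt, (yn - yp) / dt)
            for (xp, yp), (xn, yn) in zip(frame_prev, frame_next)]

def abs_velocity(frame_prev, frame_next, dt=1.0):
    """Counter-example: taking absolute values breaks rotation equivariance."""
    return [(abs(vx), abs(vy))
            for vx, vy in temporal_derivative(frame_prev, frame_next, dt)]

def is_equivariant(encoder, frame_prev, frame_next, angle, tol=1e-9):
    """Check encoder(T(x)) == T(encoder(x)) for a rotation T."""
    lhs = encoder(rotate(frame_prev, angle), rotate(frame_next, angle))
    rhs = rotate(encoder(frame_prev, frame_next), angle)
    return all(math.isclose(a, b, abs_tol=tol)
               for p, q in zip(lhs, rhs) for a, b in zip(p, q))

# Two hypothetical frames of three surface points each.
frame_t0 = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
frame_t1 = [(0.1, 0.0), (1.2, 0.1), (0.9, 2.3)]

print(is_equivariant(temporal_derivative, frame_t0, frame_t1, math.pi / 3))
print(is_equivariant(abs_velocity, frame_t0, frame_t1, math.pi / 3))
```

The finite-difference encoder commutes with rotation because both operations are linear, whereas the absolute-value variant does not. A learned encoder would typically only satisfy such a property approximately, for example when it is encouraged by a training loss.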

Original language	English (US)
Title of host publication	Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Publisher	IEEE Computer Society
Pages	3397-3407
Number of pages	11
ISBN (Electronic)	9781665469463
DOIs	https://doi.org/10.1109/CVPR52688.2022.00340
State	Published - 2022
Event	2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 - New Orleans, United States
Duration	Jun 19 2022 → Jun 24 2022

Publication series

Name	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume	2022-June
ISSN (Print)	1063-6919

Conference

Conference	2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Country/Territory	United States
City	New Orleans
Period	6/19/22 → 6/24/22

Bibliographical note

Funding Information:
We would like to thank Julien Philip for providing useful feedback on our paper draft. Jae Shin Yoon is supported by a Doctoral Dissertation Fellowship from the University of Minnesota. This work is partially supported by NSF CNS-1919965.

Publisher Copyright:
© 2022 IEEE.

Keywords

3D from single images
Image and video synthesis and generation

Cite this

Yoon, J. S., Ceylan, D., Wang, T. Y., Lu, J., Yang, J., Shu, Z., & Park, H. S. (2022). Learning Motion-Dependent Appearance for High-Fidelity Rendering of Dynamic Humans from a Single Camera. In Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 (pp. 3397-3407). (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition; Vol. 2022-June). IEEE Computer Society. https://doi.org/10.1109/CVPR52688.2022.00340

@inproceedings{557bc7c70e024e3798a2ba28462a9c97,
  title = "Learning Motion-Dependent Appearance for High-Fidelity Rendering of Dynamic Humans from a Single Camera",
  abstract = "The appearance of dressed humans undergoes a complex geometric transformation induced not only by the static pose but also by its dynamics, i.e., there exist a number of cloth geometric configurations for a given pose, depending on the way it has moved. Such appearance modeling conditioned on motion has been largely neglected in existing human rendering methods, resulting in rendering of physically implausible motion. A key challenge of learning the dynamics of the appearance lies in the requirement of a prohibitively large number of observations. In this paper, we present a compact motion representation by enforcing equivariance: a representation is expected to be transformed in the way that the pose is transformed. We model an equivariant encoder that can generate the generalizable representation from the spatial and temporal derivatives of the 3D body surface. This learned representation is decoded by a compositional multi-task decoder that renders high-fidelity time-varying appearance. Our experiments show that our method can generate a temporally coherent video of dynamic humans for unseen body poses and novel views given a single-view video.",
  keywords = "3D from single images, Image and video synthesis and generation",
  author = "Yoon, {Jae Shin} and Duygu Ceylan and Wang, {Tuanfeng Y.} and Jingwan Lu and Jimei Yang and Zhixin Shu and Park, {Hyun Soo}",
  note = "Funding Information: We would like to thank Julien Philip for providing useful feedback on our paper draft. Jae Shin Yoon is supported by a Doctoral Dissertation Fellowship from the University of Minnesota. This work is partially supported by NSF CNS-1919965. Publisher Copyright: {\textcopyright} 2022 IEEE.; 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 ; Conference date: 19-06-2022 Through 24-06-2022",
  year = "2022",
  doi = "10.1109/CVPR52688.2022.00340",
  language = "English (US)",
  series = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",
  publisher = "IEEE Computer Society",
  pages = "3397--3407",
  booktitle = "Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022",
}

TY - GEN

T1 - Learning Motion-Dependent Appearance for High-Fidelity Rendering of Dynamic Humans from a Single Camera

AU - Yoon, Jae Shin

AU - Ceylan, Duygu

AU - Wang, Tuanfeng Y.

AU - Lu, Jingwan

AU - Yang, Jimei

AU - Shu, Zhixin

AU - Park, Hyun Soo

N1 - Funding Information: We would like to thank Julien Philip for providing useful feedback on our paper draft. Jae Shin Yoon is supported by a Doctoral Dissertation Fellowship from the University of Minnesota. This work is partially supported by NSF CNS-1919965. Publisher Copyright: © 2022 IEEE.

PY - 2022

Y1 - 2022

AB - The appearance of dressed humans undergoes a complex geometric transformation induced not only by the static pose but also by its dynamics, i.e., there exist a number of cloth geometric configurations for a given pose, depending on the way it has moved. Such appearance modeling conditioned on motion has been largely neglected in existing human rendering methods, resulting in rendering of physically implausible motion. A key challenge of learning the dynamics of the appearance lies in the requirement of a prohibitively large number of observations. In this paper, we present a compact motion representation by enforcing equivariance: a representation is expected to be transformed in the way that the pose is transformed. We model an equivariant encoder that can generate the generalizable representation from the spatial and temporal derivatives of the 3D body surface. This learned representation is decoded by a compositional multi-task decoder that renders high-fidelity time-varying appearance. Our experiments show that our method can generate a temporally coherent video of dynamic humans for unseen body poses and novel views given a single-view video.

KW - 3D from single images

KW - Image and video synthesis and generation

UR - http://www.scopus.com/inward/record.url?scp=85141815615&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85141815615&partnerID=8YFLogxK

U2 - 10.1109/CVPR52688.2022.00340

DO - 10.1109/CVPR52688.2022.00340

M3 - Conference contribution

AN - SCOPUS:85141815615

T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

SP - 3397

EP - 3407

BT - Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022

PB - IEEE Computer Society

T2 - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022

Y2 - 19 June 2022 through 24 June 2022

ER -
