When participants engage in a video call, a mismatch between the direction from which a participant looks at other participants and the perspective from which those participants are displayed can lead to a less than satisfactory experience. This disclosure describes techniques to select a subset of available cameras based on head and/or eye movements of participants in a video call, so that the rendered view matches the direction from which each participant is actually looking. Cameras capture images of a viewing user who is looking at a display that shows a three-dimensional (3D) video of a second user (e.g., generated from images captured by a subset of a plurality of cameras of the second user's device). Per techniques of this disclosure, with user permission, the head and/or eye movements of the viewing user are tracked based on the captured images of the viewing user. A relationship between the tracked movements of the viewing user and the view shown on the viewing user's display is determined. The view on that display is then updated to render a 3D video based on the subset of individual cameras of the second user's device that matches the movement (which corresponds to an updated viewing position) of the viewing user, so that the second user is shown from the corresponding perspective. The techniques can provide a more accurate viewing experience during a 3D video conference. Suitable techniques such as machine learning can be utilized to predict the viewing user's movement and adjust the view accordingly.
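
The camera-selection step described above can be illustrated with a minimal Python sketch. It is not the implementation from the disclosure. It assumes that the tracked head position of the viewing user is available as an offset from the center of the display (in meters), that each camera of the second user's device has a known angular placement, and that the cameras closest in angle to the viewing direction are the ones that "match" the movement. The names `Camera`, `viewing_angles`, and `select_cameras`, the sign convention for the angles, and the example camera layout are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Camera:
    """A camera on the second user's device (hypothetical layout description)."""
    cam_id: str
    azimuth_deg: float    # horizontal angle relative to the display normal
    elevation_deg: float  # vertical angle relative to the display normal


def viewing_angles(head_x: float, head_y: float, head_z: float) -> tuple[float, float]:
    """Convert a tracked head position into a viewing direction.

    Assumed convention: the position is in meters relative to the center of
    the viewing user's display, with head_z the distance from the display.
    Returns (azimuth, elevation) in degrees.
    """
    if head_z <= 0:
        raise ValueError("head_z must be positive (viewer in front of the display)")
    azimuth = math.degrees(math.atan2(head_x, head_z))
    elevation = math.degrees(math.atan2(head_y, head_z))
    return azimuth, elevation


def select_cameras(cameras: list[Camera], azimuth: float, elevation: float,
                   k: int = 2) -> list[Camera]:
    """Return the k cameras whose placement is closest to the viewing direction."""
    def angular_gap(cam: Camera) -> float:
        # Simple planar distance in angle space; adequate for small angles.
        return math.hypot(cam.azimuth_deg - azimuth, cam.elevation_deg - elevation)

    return sorted(cameras, key=angular_gap)[:k]


# Hypothetical four-camera row on the second user's device.
CAMERAS = [
    Camera("left_outer", -30.0, 0.0),
    Camera("left_inner", -10.0, 0.0),
    Camera("right_inner", 10.0, 0.0),
    Camera("right_outer", 30.0, 0.0),
]

if __name__ == "__main__":
    # Viewer has moved 0.3 m to the right while sitting 0.6 m from the display.
    az, el = viewing_angles(0.3, 0.0, 0.6)
    subset = select_cameras(CAMERAS, az, el, k=2)
    print([cam.cam_id for cam in subset])
```

In this sketch the selected subset would be handed to whatever component generates the 3D video, and the selection would be repeated whenever tracking reports a new head position. A predicted position, for example from a machine learning model as the disclosure suggests, could be passed to `viewing_angles` in place of the measured one.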

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.