The present disclosure is generally directed to systems and methods that provide a more immersive video conferencing experience. More particularly, aspects of the present disclosure include a computer-implemented method for immersive video conferencing. The method can include displaying, on a display device, a plurality of display elements corresponding to a plurality of listening participants. The method can also include obtaining information indicative of a speaking participant’s gaze. The method can further include determining, based on the speaking participant’s gaze, which of the plurality of display elements the speaking participant is focused on. Additionally, the method can include providing information indicating which of the plurality of listening participants the speaking participant is focused on. In response to the information, a computing system can modify at least one of: a volume of the speaking participant, a reverberation effect of the speaking participant, or a quality of an audio and/or video stream.
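
As a purely illustrative sketch of the determining and modifying steps described above, the following Python example maps a gaze point to a display element and derives per-listener audio settings. It is not the disclosed implementation: the rectangular tile model, the names `DisplayElement`, `focused_participant`, and `audio_settings_for`, the use of screen-pixel gaze coordinates, and the specific volume and reverberation values are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class DisplayElement:
    """Rectangular on-screen tile for one listening participant (assumed layout model)."""
    participant_id: str
    x: float       # left edge, in screen pixels
    y: float       # top edge, in screen pixels
    width: float
    height: float

    def contains(self, gx: float, gy: float) -> bool:
        """Return True if the gaze point (gx, gy) falls inside this tile."""
        return (self.x <= gx < self.x + self.width
                and self.y <= gy < self.y + self.height)


def focused_participant(elements: Iterable[DisplayElement],
                        gaze: Tuple[float, float]) -> Optional[str]:
    """Return the id of the listener whose tile contains the gaze point, if any."""
    gx, gy = gaze
    for element in elements:
        if element.contains(gx, gy):
            return element.participant_id
    return None


def audio_settings_for(listener_id: str,
                       focused_id: Optional[str]) -> Dict[str, float]:
    """Choose how the speaker's audio is rendered for one listener.

    The numeric values are illustrative assumptions, not values from the disclosure.
    """
    if focused_id is None or listener_id == focused_id:
        # No detected focus, or this listener is the one being looked at:
        # play the speaker at full volume with no added reverberation.
        return {"volume": 1.0, "reverb": 0.0}
    # Other listeners hear the speaker quieter and with some reverberation.
    return {"volume": 0.6, "reverb": 0.3}


if __name__ == "__main__":
    tiles = [
        DisplayElement("alice", 0, 0, 640, 360),
        DisplayElement("bob", 640, 0, 640, 360),
    ]
    focus = focused_participant(tiles, (700.0, 120.0))
    for tile in tiles:
        print(tile.participant_id, audio_settings_for(tile.participant_id, focus))
```

In this sketch, a gaze point that falls outside every tile yields no focused participant, and every listener then receives the unmodified settings. A real system could instead adjust stream quality, which the example does not model.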

