So, I found a solution. It's not the best, but I think it's the only one available.
I managed to send an ajax event to my application whenever the user plays a media file in the conversation (video or audio). While the media is playing, I put newly arrived messages in a queue; when the media stops playing, I insert the queued messages and scroll down.
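The queueing part can be sketched in standard C++ as below. This is a minimal illustration, not my actual code: `MessageBuffer` and its methods are hypothetical names, it assumes only one media element plays at a time, and it assumes the caller renders whatever the methods return and then scrolls down. `mediaPlaying()` would be called for the "MediaPlaying" ajax event, and `mediaStopped()` for "MediaPaused" or "MediaEnded". Calling it twice is harmless, since a pause is also reported when playback reaches the end.

```cpp
#include <deque>
#include <string>
#include <vector>

// Hypothetical helper: holds back chat messages while media is playing.
// Assumes only one media element plays at a time.
class MessageBuffer {
public:
    // Call on the "MediaPlaying" ajax event.
    void mediaPlaying() { playing_ = true; }

    // Call on "MediaPaused" or "MediaEnded"; returns the queued messages
    // (oldest first) that should now be inserted before scrolling down.
    std::vector<std::string> mediaStopped() {
        playing_ = false;
        return flush();
    }

    // Call when a message arrives; returns what to insert right away,
    // which is empty while media is playing.
    std::vector<std::string> messageArrived(const std::string& html) {
        pending_.push_back(html);
        return playing_ ? std::vector<std::string>{} : flush();
    }

    bool isPlaying() const { return playing_; }

private:
    // Moves every pending message out of the queue in arrival order.
    std::vector<std::string> flush() {
        std::vector<std::string> out(pending_.begin(), pending_.end());
        pending_.clear();
        return out;
    }

    bool playing_ = false;
    std::deque<std::string> pending_;
};
```

Using a plain flag instead of a counter keeps a pause followed by an end event from leaving the buffer in a bad state. Supporting several simultaneous players would require the ajax events to also send the element id.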
If anyone has found a better solution, please let me know.
Here is the code I inject when an audio message arrives:
// Quote the id attribute. The { } block keeps `aud` local: a bare top-level `let aud`
// in a second injected message may fail with "Identifier 'aud' has already been declared".
data = "<audio id=\"" + iid + "\" src=\"data:audio/" + ext.LowerCase() + ";base64," + b64 + "\" style=\"width:90%\" controls>" + fnameOnly + "</audio>";
data += "<script>{ const aud = document.getElementById(\"" + iid + "\");";
data += "aud.onplaying = function() { ajaxRequest(" + compName + ", \"MediaPlaying\", []); };";
data += "aud.onended = function() { ajaxRequest(" + compName + ", \"MediaEnded\", []); };";
data += "aud.onpause = function() { ajaxRequest(" + compName + ", \"MediaPaused\", []); };";
data += "}</script>";
iid is generated by CreateGUID.
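If you are not on C++Builder's `UnicodeString`, roughly the same snippet can be built with standard C++. This is a sketch under assumptions: `buildMediaHtml`, `toLowerAscii` and `escapeHtmlText` are hypothetical names, `tag` is "audio" or "video" (so the same builder covers both media types), `id` is assumed unique (e.g. a GUID string), and `compName` is the JavaScript reference passed to `ajaxRequest`, inserted verbatim.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Lower-cases an ASCII file extension, e.g. "MP3" -> "mp3".
std::string toLowerAscii(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Minimal escaping for text placed between HTML tags (the fallback file name).
std::string escapeHtmlText(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            default:  out += c;
        }
    }
    return out;
}

// Hypothetical builder for an <audio>/<video> message plus the script that
// reports play/pause/end back to the server through ajaxRequest.
std::string buildMediaHtml(const std::string& tag, const std::string& id,
                           const std::string& ext, const std::string& b64,
                           const std::string& fileName, const std::string& compName) {
    std::string html = "<" + tag + " id=\"" + id + "\" src=\"data:" + tag + "/" +
                       toLowerAscii(ext) + ";base64," + b64 +
                       "\" style=\"width:90%\" controls>" + escapeHtmlText(fileName) +
                       "</" + tag + ">";
    // The { } block keeps `el` local to this script.
    html += "<script>{ const el = document.getElementById(\"" + id + "\");";
    const char* events[][2] = {{"onplaying", "MediaPlaying"},
                               {"onended", "MediaEnded"},
                               {"onpause", "MediaPaused"}};
    for (const auto& e : events) {
        html += std::string("el.") + e[0] + " = function() { ajaxRequest(" + compName +
                ", \"" + e[1] + "\", []); };";
    }
    html += "}</script>";
    return html;
}
```

Passing "video" as `tag` produces the same handlers for video messages, so both media types raise the same three ajax events.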