In a telepresence scenario where remote users discuss a document, it can be difficult to follow which parts are being discussed. One way to address this is to show each user’s hand position on the document, which also enables expressive gestural communication. An important practical problem is how to capture and transmit hand movements efficiently alongside high-resolution document images. We propose a two-channel tabletop system that integrates document capture with a 4K video camera and hand tracking with a webcam; the document image and hand skeleton data are transmitted at different rates and handled by a lightweight Web browser client at remote sites. To enhance the rendering, we employ velocity-based smoothing and ephemeral motion traces. We tested our prototype over long distances, from the USA to Japan and to Italy, and report on latency and jitter performance. Our system achieves relatively low latency over a long distance in comparison with a tele-immersive system that transmits mesh data over much shorter distances.
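
As an illustration of the smoothing idea mentioned above, the following is a minimal sketch of one common form of velocity-dependent smoothing: an exponential filter whose blend factor grows with the speed of the tracked point, so that slow movements are heavily damped (reducing jitter) while fast movements are followed closely (reducing lag). This is not necessarily the filter used in the system described here. The function name `make_velocity_smoother` and the parameters `slow_alpha`, `fast_alpha`, and `speed_scale`, including their default values, are assumptions made for this example.

```python
import math


def make_velocity_smoother(slow_alpha=0.2, fast_alpha=0.9, speed_scale=500.0):
    """Return a stateful filter for one 2D point, e.g. a single hand joint.

    All names and default values are hypothetical, chosen for illustration:
      slow_alpha  -- blend factor approached when the point is nearly still
      fast_alpha  -- blend factor approached when the point moves quickly
      speed_scale -- speed (units per second) at which the blend factor
                     sits halfway between slow_alpha and fast_alpha
    """
    state = {"pos": None, "t": None}

    def smooth(x, y, t):
        # The first sample initializes the filter and is returned unchanged.
        if state["pos"] is None:
            state["pos"], state["t"] = (x, y), t
            return (x, y)

        px, py = state["pos"]
        dt = max(t - state["t"], 1e-6)  # guard against zero or negative dt

        # Speed of the raw sample relative to the last smoothed position.
        speed = math.hypot(x - px, y - py) / dt

        # Map speed to a weight in [0, 1), then to a blend factor.
        w = speed / (speed + speed_scale)
        alpha = slow_alpha + (fast_alpha - slow_alpha) * w

        # Exponential step toward the raw sample.
        nx = px + alpha * (x - px)
        ny = py + alpha * (y - py)
        state["pos"], state["t"] = (nx, ny), t
        return (nx, ny)

    return smooth
```

In a setup like the one described, a receiving client could keep one such filter per skeleton joint and feed it each incoming sample with its timestamp. The filter state is per point, so separate joints do not influence one another.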