We present a system for capturing ink strokes written with ordinary pen and paper using a fast camera with a frame rate comparable to a stylus digitizer. From the video frames, ink strokes are extracted and used as input to an online handwriting recognition engine. A key component in our system is a pen up/down detection model for detecting the contact of the pen-tip with the paper in the video frames. The proposed model consists of feature representation with convolutional neural networks and classification with a recurrent neural network. We also use a high speed tracker with kernelized correlation filters to track the pen-tip. For training and evaluation, we collected labeled video data of users writing English and Japanese phrases from public datasets, and we report on character accuracy scores for different frame rates in the two languages.