To research the security impact of side-channel keylogging attacks, we need suitable datasets containing both the sensor data and the pressed keys. However, when a side channel targets the user through acceleration, EMG, or other wearable sensors, we may also want additional ground truth about the user's activity, e.g., a representation of which finger was used to type a certain key. Such data makes it possible to directly correlate the sensor readings with the activity that caused them, which could help develop more accurate and robust keylogging models. Previous work in this area has focused on stand-alone virtual input devices that do not reflect real-world keyboards, or has required expensive motion-tracking hardware to track finger positions. In this thesis, we design, implement, and evaluate a system that infers finger usage from a monocular RGB video of a user typing on an unmodified keyboard. Our evaluation shows that our implementation can correctly label the hand used for over 96 % of keystrokes and the finger used for over 97 % of keystrokes. As such, our system can be a helpful aid in the creation of new datasets for research into keylogging side channels.