Network penetration testing relies on sophisticated techniques that require careful consideration of environment-specific parameters and thorough planning. Penetration testers should focus on novel vulnerabilities and devote their attention to the interrelations between possible threats and risks rather than losing time on repetitive tasks. Reinforcement Learning (RL) is a key approach for making autonomous penetration testing practically applicable in real-world computer networks. The existing literature describes attack path generation that relies on a priori knowledge of the environment, simulation-only approaches that do not transfer to real-world computer networks, or emulation-only approaches without RL integration. This thesis optimizes, trains and evaluates RL agents on four benchmark scenarios of increasing size, complexity and host heterogeneity, and a Proof of Concept (PoC) demonstrates the transferability of a simulation environment into an emulation environment. Creating a realistic emulation environment in which RL agents can apply the knowledge learned in the fast simulation environment allows repetitive tasks to be delegated to the trained agent and lets penetration testers focus on the novel and individual aspects of the target network.
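To illustrate the core idea of learning attack paths in a fast simulation before applying them elsewhere, the following is a minimal, self-contained sketch: a toy chain-topology network and a tabular Q-learning loop. The topology, actions, rewards, and the use of tabular Q-learning are illustrative assumptions and do not correspond to the thesis's benchmark scenarios or agent implementations.

```python
import random
from collections import defaultdict


class ToyNetworkSim:
    """Toy chain-topology network simulation (assumption, not a thesis scenario):
    host i becomes attackable once host i-1 is compromised; host 0 is internet-facing."""

    def __init__(self, num_hosts=4):
        self.num_hosts = num_hosts

    def reset(self):
        self.compromised = [False] * self.num_hosts
        return tuple(self.compromised)

    def step(self, action):
        # An attack succeeds only if the target is reachable and not yet compromised.
        reachable = action == 0 or self.compromised[action - 1]
        if reachable and not self.compromised[action]:
            self.compromised[action] = True
            reward = 10.0   # successful exploit of a new host (assumed reward value)
        else:
            reward = -1.0   # wasted action: unreachable or already compromised host
        done = all(self.compromised)
        return tuple(self.compromised), reward, done


def train_q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning over the toy simulation; only a sketch of the
    train-in-simulation step, not the agents evaluated in the thesis."""
    q = defaultdict(lambda: [0.0] * env.num_hosts)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            if random.random() < epsilon:                      # explore
                action = random.randrange(env.num_hosts)
            else:                                              # exploit current estimate
                action = max(range(env.num_hosts), key=lambda a: q[state][a])
            next_state, reward, done = env.step(action)
            best_next = max(q[next_state])
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    q_table = train_q_learning(ToyNetworkSim())
    # Greedy policy from the initial state should attack the internet-facing host first.
    initial_state = (False, False, False, False)
    print(max(range(4), key=lambda a: q_table[initial_state][a]))  # expected: 0
```

In the same spirit, the policy learned in such a fast simulation would then be executed against an emulated network of real hosts, which is the transfer step the PoC demonstrates.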