This paper expands on the SeqGAN framework to explore enhancements to sequence generation using reinforcement learning and GANs. SeqGAN, designed for generating the discrete token sequences common in natural language processing, models the generator as a stochastic policy. By framing generation as a reinforcement learning problem, it sidesteps the non-differentiability of discrete sampling and performs policy gradient updates directly. Notably, the reward signal comes from a GAN discriminator that scores complete sequences, and this feedback is propagated to intermediate state-action pairs via Monte Carlo search. We explore further enhancements with WGAN, whose Earth Mover's distance may provide smoother guidance for the generator than traditional divergence-based objectives such as the KL divergence. Additionally, we consider proximal policy optimization (PPO) as a potentially more stable update rule for the generator.
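The SeqGAN-style update described above can be sketched with a toy example. Everything below (the 4-token vocabulary, the stand-in discriminator that rewards sequences containing token 0, the rollout count, and the learning rate) is an illustrative assumption rather than the paper's actual implementation; the point is to show how Monte Carlo rollouts supply rewards for intermediate steps so that a REINFORCE-style policy gradient can be applied without differentiating through discrete sampling.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, SEQ_LEN, N_ROLLOUTS = 4, 5, 16  # toy sizes, chosen for illustration

# Toy stochastic policy: a single softmax over tokens (position-independent).
logits = np.zeros(VOCAB)

def policy_probs(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def discriminator_reward(seq):
    # Stand-in for a trained GAN discriminator: here "real-looking"
    # sequences are simply those containing many copies of token 0.
    return float(np.mean(np.asarray(seq) == 0))

def rollout_value(prefix, logits):
    # Monte Carlo search: complete the prefix N_ROLLOUTS times under the
    # current policy and average the discriminator's score on the full
    # sequences -- this gives a reward for an intermediate state.
    p = policy_probs(logits)
    total = 0.0
    for _ in range(N_ROLLOUTS):
        tail = rng.choice(VOCAB, size=SEQ_LEN - len(prefix), p=p)
        total += discriminator_reward(list(prefix) + list(tail))
    return total / N_ROLLOUTS

def reinforce_step(logits, lr=0.5):
    # Sample one sequence token by token and accumulate a REINFORCE-style
    # policy gradient, using the rollout value of each partial sequence as
    # the per-step reward. No gradient flows through the discrete samples.
    p = policy_probs(logits)
    seq, grad = [], np.zeros_like(logits)
    for _ in range(SEQ_LEN):
        tok = rng.choice(VOCAB, p=p)
        seq.append(tok)
        q = rollout_value(seq, logits)      # intermediate-state reward
        grad += q * (np.eye(VOCAB)[tok] - p)  # q * grad of log pi(tok)
    return logits + lr * grad

for _ in range(50):
    logits = reinforce_step(logits)

# After training, the policy should have shifted mass toward token 0,
# since the stand-in discriminator rewards it.
print(policy_probs(logits))
```

In a WGAN variant, `discriminator_reward` would be replaced by a critic's (unbounded) score trained under the Earth Mover's objective, and a PPO variant would replace the plain REINFORCE update with a clipped surrogate objective computed from probability ratios between the old and new policies.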

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.