This paper builds on the SeqGAN framework and introduces targeted enhancements to improve its efficacy. SeqGAN, a pivotal algorithm for generating discrete token sequences, is widely used across NLP applications. By modeling the generator as a stochastic policy in reinforcement learning (RL), SeqGAN sidesteps the generator-differentiation problem that GANs face on discrete data, updating the generator directly via policy gradients. The RL reward signal comes from a GAN discriminator that evaluates complete sequences; this reward is propagated back to intermediate state-action pairs through Monte Carlo search. The enhancements studied here include incorporating WGAN, which uses the Earth Mover's distance rather than the Jensen-Shannon divergence implicit in the standard GAN objective to provide better guidance to the generator, and adopting proximal policy optimization (PPO) to further refine generator training.
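The core SeqGAN training loop described above can be illustrated with a minimal sketch: a softmax policy generates tokens, a discriminator scores only complete sequences, and Monte Carlo rollouts estimate per-step rewards that drive a REINFORCE-style policy-gradient update. Everything below is a toy illustration, not the paper's implementation; the vocabulary size, rollout count, and the stand-in discriminator (which simply rewards sequences containing many copies of token 0) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, SEQ_LEN, ROLLOUTS = 5, 4, 8

# Toy stochastic policy: one logit vector per position; softmax gives P(token | state).
theta = np.zeros((SEQ_LEN, VOCAB))

def policy_probs(t):
    z = np.exp(theta[t] - theta[t].max())
    return z / z.sum()

def sample_sequence():
    return [rng.choice(VOCAB, p=policy_probs(t)) for t in range(SEQ_LEN)]

# Stand-in discriminator (assumption for the sketch): scores a complete
# sequence in [0, 1], here by the fraction of tokens equal to 0.
def discriminator(seq):
    return seq.count(0) / len(seq)

def mc_reward(prefix, t):
    """Estimate Q(state, action) as the average discriminator score over
    Monte Carlo completions of the prefix, as in SeqGAN's rollout step."""
    total = 0.0
    for _ in range(ROLLOUTS):
        completion = [rng.choice(VOCAB, p=policy_probs(u))
                      for u in range(t + 1, SEQ_LEN)]
        total += discriminator(prefix + completion)
    return total / ROLLOUTS

def reinforce_step(lr=0.5):
    """One policy-gradient update on a single sampled sequence."""
    seq = sample_sequence()
    for t in range(SEQ_LEN):
        q = mc_reward(seq[:t + 1], t)
        probs = policy_probs(t)
        grad = -probs               # gradient of log pi(a|s) w.r.t. logits
        grad[seq[t]] += 1.0
        theta[t] += lr * q * grad   # ascend Q-weighted log-likelihood

before = np.mean([discriminator(sample_sequence()) for _ in range(200)])
for _ in range(300):
    reinforce_step()
after = np.mean([discriminator(sample_sequence()) for _ in range(200)])
```

After a few hundred updates the generator's average discriminator score rises, since the rollout rewards steer the policy toward token-0-heavy sequences. The WGAN and PPO variants explored in the paper would replace, respectively, the discriminator's objective and this plain REINFORCE update.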

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.