This paper builds upon the SeqGAN framework, aiming to explore and implement strategic enhancements. SeqGAN generates discrete token sequences, a common requirement in NLP applications, by modeling the generator as a stochastic policy in reinforcement learning (RL); applying policy gradient updates directly sidesteps the differentiation problem that discrete outputs pose for GAN generators. The RL reward is provided by a GAN discriminator that evaluates complete sequences, and this feedback is propagated back to intermediate state-action pairs through Monte Carlo search. Key improvements considered include the adoption of WGAN, which replaces the JS-divergence-based GAN objective with the Earth Mover's distance, offering potentially more effective guidance for generator refinement. Further, the application of proximal policy optimization (PPO) is examined to enhance generator training.
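The reward mechanism described above can be sketched in simplified form. The snippet below is a minimal toy illustration, not the paper's implementation: the per-position logit table stands in for an RNN policy, and the parity-based `discriminator` is a hypothetical stand-in for a trained GAN discriminator. It shows the core SeqGAN loop of completing each prefix with Monte Carlo rollouts, scoring the completed sequences with the discriminator, and applying a REINFORCE-style policy gradient update.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, SEQ_LEN, N_ROLLOUTS = 5, 4, 8

# Toy generator: one logit vector per position (a stand-in for an RNN policy).
logits = rng.normal(size=(SEQ_LEN, VOCAB))

def policy_probs(t):
    # Softmax over the vocabulary at position t.
    e = np.exp(logits[t] - logits[t].max())
    return e / e.sum()

def sample_token(t):
    return int(rng.choice(VOCAB, p=policy_probs(t)))

def discriminator(seq):
    # Hypothetical stand-in: real discriminators score full sequences;
    # here we reward sequences whose token sum is even.
    return 1.0 if sum(seq) % 2 == 0 else 0.0

def mc_reward(prefix, t):
    # Monte Carlo search: complete the prefix N_ROLLOUTS times with the
    # current policy and average the discriminator's score, yielding a
    # reward estimate for the state-action pair at position t.
    total = 0.0
    for _ in range(N_ROLLOUTS):
        rollout = list(prefix)
        for u in range(t + 1, SEQ_LEN):
            rollout.append(sample_token(u))
        total += discriminator(rollout)
    return total / N_ROLLOUTS

def policy_gradient_step(lr=0.1):
    # REINFORCE-style update: sample a full sequence, then credit each
    # intermediate state-action pair with its Monte Carlo reward.
    seq = [sample_token(t) for t in range(SEQ_LEN)]
    for t in range(SEQ_LEN):
        r = mc_reward(seq[: t + 1], t)
        probs = policy_probs(t)
        grad = -probs
        grad[seq[t]] += 1.0         # d log pi(a_t | s_t) / d logits
        logits[t] += lr * r * grad  # ascend the expected reward
    return seq
```

Replacing the discriminator's probability output with a WGAN critic score, or wrapping the update in PPO's clipped surrogate objective, would slot into this same loop at the reward and update steps respectively.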

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.