This was (is) my attempt on building a Transformer Architecture (almost) from scratch using PyTorch, and training it on the entirety of shakespeare's work.
after resolving all the bugs and getting the model to actually train, I achieved a Cross Entropy loss of 1.847, which is not that bad, but still not low enough. The text generated by this model looked like the following:
""" Her ofere their wret corns own to reall.
PRISOMPERD: Thy beaars fratheres back; when, he should dreadson, And, Man long-how I headful to; if retlesive, All boty thou to the dive him to are touch the eed. How is for here is mine feal seend feellf, To him offor doud shall, the seempls wo do musp; be hind; But mother will stick my self.
GRET:
No, my lob, as forwell on myselver: Well, I'll deerelly so sending we would fext: Thou art it Marcined, evemberlause I award you tellfs op this shand at vo """
Coming Soon...