A user used the MySentences class to extract sentences from all files in a directory and then trained a word2vec model on those sentences. The dataset is unlabeled. The code is below:
import os
import gensim

class MySentences(object):
    def __init__(self, dirname):
        self.dirname = dirname

    def __iter__(self):
        for fname in os.listdir(self.dirname):
            # 'with' closes each file once it has been read
            with open(os.path.join(self.dirname, fname)) as f:
                for line in f:
                    yield line.split()  # one tokenized sentence per line

sentences = MySentences('wos_abstracts')  # a memory-friendly iterator
model = gensim.models.Word2Vec(sentences)
However, running this code raises the following error:
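This problem can be solved with TaggedLineDocument, a class (not a function) in gensim.models.doc2vec that prepares input for Doc2Vec, the gensim model that learns a vector for each sentence or document. It reads a single file in which each line is one document, with words already preprocessed and separated by whitespace, and yields one TaggedDocument per line, tagged automatically with its line number. Because the tags are generated for you, this works even when the dataset is unlabeled.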
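TaggedLineDocument reads one file, while MySentences walks a whole directory. The sketch below is a hypothetical directory-wide variant (the class name `TaggedDirectoryDocuments` and the stand-in `TaggedDoc` are illustrative, not part of gensim): it splits every line of every file on whitespace and gives each line a running integer tag, mimicking what TaggedLineDocument does per file. `TaggedDoc` has the same two fields as gensim's TaggedDocument (`words`, `tags`), which should be enough for Doc2Vec to consume it; the UTF-8 encoding is an assumption about the input files.

```python
import os
from collections import namedtuple

# Stand-in with the same fields as gensim's TaggedDocument ('words', 'tags'),
# defined here so this sketch runs without gensim installed.
TaggedDoc = namedtuple("TaggedDoc", ["words", "tags"])


class TaggedDirectoryDocuments:
    """Hypothetical helper: tag every line of every file in a directory.

    Each line is split on whitespace and tagged with a running integer,
    so an unlabeled corpus still gets one unique tag per sentence.
    """

    def __init__(self, dirname):
        self.dirname = dirname

    def __iter__(self):
        tag = 0
        # sorted() keeps the tag numbering identical on every pass
        for fname in sorted(os.listdir(self.dirname)):
            path = os.path.join(self.dirname, fname)
            if not os.path.isfile(path):
                continue  # skip subdirectories
            with open(path, encoding="utf-8") as handle:
                for line in handle:
                    yield TaggedDoc(line.split(), [tag])
                    tag += 1
```

Because it is a class with `__iter__` rather than a one-shot generator, it can be iterated repeatedly, which matters since Doc2Vec makes one pass to build the vocabulary and further passes to train.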
Now we can train the model.