How can sentences extracted from all files in a directory with a MySentences class be used to train a word2vec model?

Asked by SumikoLacoste in Data Science on Dec 23, 2019

My dataset is unlabeled. Below is the code:

    import os
    import gensim

    class MySentences(object):
        def __init__(self, dirname):
            self.dirname = dirname

        def __iter__(self):
            # Stream every file in the directory, one tokenized line at a time
            for fname in os.listdir(self.dirname):
                for line in open(os.path.join(self.dirname, fname)):
                    yield line.split()

    sentences = MySentences('wos_abstracts')  # a memory-friendly iterator
    model = gensim.models.Word2Vec(sentences)

But running this code raises an error.

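This problem can be solved with TaggedLineDocument, a class added to the library in gensim.models.doc2vec. It reads a file that contains one sentence per line and yields each line as a tagged document, with the line number as its tag. That is the input gensim's Doc2Vec model uses to transform sentences to vectors.

Now we can train the model.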
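Since the error message itself is not shown, the following is only a minimal sketch of one way to read this answer, not a confirmed fix. TaggedLineDocument expects a single file with one sentence per line, so the sketch first merges the directory into such a file and then trains Doc2Vec on it. The helper names write_one_sentence_per_line and train_doc2vec, the output file name, and the UTF-8 encoding are assumptions made for illustration.

```python
import os


def write_one_sentence_per_line(dirname, out_path):
    """Merge every regular file in dirname into out_path, one sentence per line.

    Hypothetical helper. Assumes the files are UTF-8 text and that out_path
    lies outside dirname. Returns the number of sentences written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for fname in sorted(os.listdir(dirname)):
            path = os.path.join(dirname, fname)
            if not os.path.isfile(path):
                continue  # skip subdirectories and other non-file entries
            with open(path, encoding="utf-8") as f:
                for line in f:
                    tokens = line.split()
                    if tokens:  # skip blank lines
                        out.write(" ".join(tokens) + "\n")
                        count += 1
    return count


def train_doc2vec(corpus_path):
    """Train a Doc2Vec model on a one-sentence-per-line file (hypothetical helper)."""
    # Imported here so the helper above can be used without gensim installed.
    from gensim.models.doc2vec import Doc2Vec, TaggedLineDocument

    # Each line becomes one tagged document; the tag is its line number.
    documents = TaggedLineDocument(corpus_path)
    # min_count=1 keeps rare words; an assumption suited to small corpora.
    return Doc2Vec(documents, min_count=1)
```

With the directory from the question, this could be used as write_one_sentence_per_line('wos_abstracts', 'corpus.txt') followed by model = train_doc2vec('corpus.txt'). A vector for a new sentence can then be obtained with model.infer_vector(sentence.split()). Note that this trains Doc2Vec rather than Word2Vec, so it applies when sentence vectors are the goal.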
