A user wants to compute tf-idf on a very large dataset and write a single column in CSV format that contains each term with its tf-idf score, in non-decreasing order. How can this be done?

Asked by shivangiMehta in Data Science on Dec 26, 2019

The code above works only on small inputs and crashes on a large document collection.

To solve this problem, do not coerce the TDM to a dense matrix. A dense matrix needs one cell for every term-document pair, so with this many documents the coercion will most likely cause an integer overflow. The tm package uses the slam package to represent TDMs and DTMs as sparse matrices, and slam provides functions for row-wise and column-wise operations that work without coercing to a dense matrix.
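As a rough illustration of that point, here is a minimal sketch using slam on its own. The toy matrix, its values, and the variable names are made up for the example; only `simple_triplet_matrix()` and `row_sums()` come from slam.

```r
library(slam)

# Toy term-document matrix in sparse triplet form (i = row, j = column, v = value):
# 3 terms (rows) x 4 documents (columns). Only the non-zero cells are stored.
tdm <- simple_triplet_matrix(
  i = c(1L, 1L, 2L, 3L, 3L),
  j = c(1L, 3L, 2L, 1L, 4L),
  v = c(0.5, 0.25, 1.0, 0.75, 0.25),
  nrow = 3, ncol = 4,
  dimnames = list(c("apple", "banana", "cherry"), paste0("d", 1:4))
)

# Row-wise sums computed directly on the sparse representation;
# as.matrix() is never called, so no dense copy is allocated.
term_totals <- row_sums(tdm)
```

The same call should work on a `TermDocumentMatrix` built by tm, since that class is stored as a slam triplet matrix.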

The following code should fix the problem.
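A minimal sketch of what such a fix could look like, not necessarily the answerer's exact code. It assumes the documents are available as a character vector (the hypothetical `docs` below stands in for the large dataset), that a term's score is the sum of its tf-idf weights across all documents, and that the output file is named `tfidf.csv`.

```r
library(tm)
library(slam)

# Hypothetical input: one document per element (stand-in for the large dataset).
docs <- c("the cat sat on the mat",
          "the dog chased the cat",
          "dogs and cats are pets")

corpus <- VCorpus(VectorSource(docs))

# Build the term-document matrix with tf-idf weighting; it stays sparse.
tdm <- TermDocumentMatrix(corpus, control = list(weighting = weightTfIdf))

# Assumption: aggregate each term's tf-idf by summing across documents.
# row_sums() works on the sparse matrix, so as.matrix() is never needed.
tfidf <- sort(row_sums(tdm))  # sort() defaults to non-decreasing order

# One row per term, already ordered by score.
out <- data.frame(term = names(tfidf), tfidf = unname(tfidf))
write.csv(out, "tfidf.csv", row.names = FALSE)
```

This writes the term and its score as two fields per row. If a single combined column is required, `paste(out$term, out$tfidf)` could be written instead.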
