I used DeepLearning4j to train a word2vec model. Then I had to save the dictionary (every word together with its vector) to CSV so I could run some clustering algorithms on it.
It sounded like a simple task, but it took a while. Here is the code that does it:
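// Imports assume opencsv 3.x or later and DL4J's NLP module;
// package names may differ in other versions.
import com.opencsv.CSVWriter;
import org.deeplearning4j.models.word2vec.VocabWord;
import org.deeplearning4j.models.word2vec.Word2Vec;
import org.deeplearning4j.models.word2vec.wordstore.VocabCache;

import java.io.FileWriter;
import java.io.IOException;
import java.util.Collection;

private void writeIndexToCsv(String csvFileName, Word2Vec model) {
    // try-with-resources closes the writer even if a row fails, and avoids
    // calling methods on a null writer when the file cannot be opened.
    try (CSVWriter writer = new CSVWriter(new FileWriter(csvFileName))) {
        VocabCache<VocabWord> vocabCache = model.vocab();
        Collection<VocabWord> words = vocabCache.vocabWords();
        for (VocabWord vocabWord : words) {
            String word = vocabWord.getWord();
            System.out.println("Looking into the word: " + word);
            double[] wordVector = model.getWordVector(word);
            // One row per word: the word itself, then each vector component.
            // Filling the array directly (instead of joining with "," and
            // splitting again) keeps a word that contains a comma in one column.
            String[] row = new String[wordVector.length + 1];
            row[0] = word;
            for (int i = 0; i < wordVector.length; i++) {
                row[i + 1] = String.valueOf(wordVector[i]);
            }
            // false = quote only the fields that need it
            writer.writeNext(row, false);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}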
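
For the clustering step, the file has to be read back in. Below is a minimal sketch of that, assuming each row has the layout written above (the word first, then the vector components) and that no word contains a comma or a quote, so a plain split is enough. `VectorCsvReader` and `parseRows` are hypothetical names made up for this example; they are not part of DL4J or opencsv.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class VectorCsvReader {
    // Hypothetical helper: turns CSV lines of the form "word,v1,v2,..."
    // into a map from word to vector, preserving the file order.
    // Assumption: words contain no commas or quotes.
    static Map<String, double[]> parseRows(List<String> lines) {
        Map<String, double[]> vectors = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // skip blank lines, e.g. a trailing newline
            }
            String[] fields = line.split(",");
            double[] vector = new double[fields.length - 1];
            for (int i = 1; i < fields.length; i++) {
                vector[i - 1] = Double.parseDouble(fields[i]);
            }
            vectors.put(fields[0], vector);
        }
        return vectors;
    }
}
```

It could be called with something like `VectorCsvReader.parseRows(Files.readAllLines(Paths.get(csvFileName)))`. If the vocabulary may contain commas or quotes, a real CSV parser is the safer choice.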