Remove duplicate lines from a file in Scala

How do you remove duplicate lines from a CSV or TXT file?

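The answer is straightforward: read the file with a BufferedReader, collect the lines in a set, and write the unique lines back with a BufferedWriter. Because the file is read line by line, this also works well for large files, provided the distinct lines fit in memory.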

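import java.io.{BufferedReader, BufferedWriter, FileReader, FileWriter}
import scala.collection.mutable

def removeDuplicatesFromFile(fileName: String): Unit = {
  // Read every line into a set; the set silently drops repeated lines.
  // Note: mutable.HashSet does not preserve the original line order.
  val lines = mutable.HashSet[String]()
  val reader = new BufferedReader(new FileReader(fileName))
  try {
    var line = reader.readLine()
    while (line != null) {
      lines.add(line)
      line = reader.readLine()
    }
  } finally reader.close()

  // Overwrite the same file with the unique lines only.
  val writer = new BufferedWriter(new FileWriter(fileName))
  try {
    for (unique <- lines) {
      writer.write(unique)
      writer.newLine()
    }
  } finally writer.close()
}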
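
The function above rewrites the file from a hash set, so the output order is not guaranteed to match the input, which matters for a CSV file with a header row. Below is a minimal sketch of an order-preserving variant. The object name DedupeLines and the method removeDuplicatesKeepingOrder are hypothetical and not part of the original post. The sketch assumes the file is UTF-8 encoded and that the set of distinct lines fits in memory. It writes to a temporary file in the same directory and then moves it over the original, so the input is never truncated while it is still being read.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardCopyOption}
import scala.collection.mutable

object DedupeLines {
  // Hypothetical helper (not from the original post): streams the file once,
  // writes each line the first time it is seen, and keeps the original order.
  // Assumes UTF-8 input.
  def removeDuplicatesKeepingOrder(file: Path): Unit = {
    val seen = mutable.HashSet[String]()
    // Temporary sibling file, so the move at the end stays on one filesystem.
    val tmp = Files.createTempFile(file.toAbsolutePath.getParent, "dedupe", ".tmp")
    val reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)
    try {
      val writer = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)
      try {
        var line = reader.readLine()
        while (line != null) {
          // add returns true only when the line was not already in the set.
          if (seen.add(line)) {
            writer.write(line)
            writer.newLine()
          }
          line = reader.readLine()
        }
      } finally writer.close()
    } finally reader.close()
    // Replace the original file with the deduplicated copy.
    Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING)
  }
}
```

A call such as DedupeLines.removeDuplicatesKeepingOrder(java.nio.file.Paths.get("data.csv")) would rewrite data.csv in place; the file name here is only an example. Only the first occurrence of each line is kept, so a header row stays on the first line.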