Category Archives: Spark – scala

Spark quick commands – Scala

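Save file to HDFS with custom delimiter in Spark:

    import spark.sql

    val df = sql(""" select * from test_db.test_table1 """)

    // Write pipe-delimited CSV, with one output directory per year/month partition.
    // The save mode must be a double-quoted String in Scala.
    df.write
      .format("csv")
      .partitionBy("year", "month")
      .mode("overwrite")
      .option("delimiter", "|")
      .save("/user/cloudera/project/workspace/test/test_table1")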
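To check the result, the pipe-delimited output can be read back with the same delimiter option. The following is only a minimal sketch: `readDelimited` is a hypothetical helper name that does not appear in the original post, and it assumes the files were written without a header row, as in the command above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper (not from the original post): load CSV files that were
// written with a custom delimiter. The default "|" matches the write above.
def readDelimited(spark: SparkSession, path: String, delimiter: String = "|"): DataFrame =
  spark.read
    .format("csv")
    .option("delimiter", delimiter) // must match the delimiter used when writing
    .load(path)
```

In spark-shell, where `spark` is already defined, `readDelimited(spark, "/user/cloudera/project/workspace/test/test_table1").printSchema()` should list the data columns under Spark's default names (`_c0`, `_c1`, and so on), followed by the `year` and `month` columns, which Spark recovers from the partition directory names.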

Spark – Hadoop – Programs in Scala

Let's say our input file contains the data below:

$ cat sample1.txt
1,aa
2,bb
3,cc

Upload this file to HDFS:

$ hadoop fs -put sample1.txt /user/puneetha/

Program – 1: Read the text file from HDFS.

$ spark-shell
scala> val myfile = sc.textFile("sample1.txt")
14/09/12 16:23:19 INFO MemoryStore: ensureFreeSpace(126588) called with curMem=0, maxMem=308713881
14/09/12 16:23:19 INFO MemoryStore: Block broadcast_0 stored…
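As a possible next step after loading the file, each comma-separated line can be split into an (id, value) pair. This is a rough sketch rather than the post's own program: `parseLine` and `readPairs` are hypothetical names, and the code assumes every non-empty line has the `id,value` shape of sample1.txt (a line without a comma would fail with a MatchError).

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper: turn a line such as "1,aa" into the pair (1, "aa").
// Assumes exactly one comma-separated id followed by a value.
def parseLine(line: String): (Int, String) = {
  val Array(id, value) = line.split(",", 2)
  (id.trim.toInt, value.trim)
}

// Hypothetical helper: read a text file, skip blank lines, and parse the rest.
def readPairs(sc: SparkContext, path: String): RDD[(Int, String)] =
  sc.textFile(path)
    .filter(_.trim.nonEmpty)
    .map(parseLine)
```

In spark-shell, `readPairs(sc, "/user/puneetha/sample1.txt").collect().foreach(println)` should print one pair per line of the sample file.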