Python

text = sc.textFile("hdfs/nfs/filename.txt", minPartitions=10, use_unicode=True)

Keeping each line as a UTF-8 encoded string is faster and more compact than decoding it to unicode. By default, use_unicode is True. If we are certain that the file has no non-ASCII characters, we should set use_unicode=False.
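To make the trade-off concrete, here is a minimal plain-Python sketch (no Spark needed) of the per-line choice that use_unicode appears to control: decode each raw line to text, or hand back the UTF-8 bytes untouched. The helper read_lines and the sample data are hypothetical, for illustration only; this is not Spark's own code.

```python
# Hypothetical stand-in for the decode step; not part of the Spark API.
def read_lines(raw_lines, use_unicode=True):
    """Return decoded text when use_unicode is True, else the raw UTF-8 bytes."""
    if use_unicode:
        return [line.decode("utf-8") for line in raw_lines]
    return list(raw_lines)


# Sample input: one ASCII line and one line with a non-ASCII character.
raw_lines = [b"hello world", "jó reggelt".encode("utf-8")]

decoded = read_lines(raw_lines, use_unicode=True)   # text strings
raw = read_lines(raw_lines, use_unicode=False)      # bytes, no decoding work done

print(decoded)
print(raw)
```

With use_unicode=False the non-ASCII line stays as bytes, so any later string handling has to decode it explicitly.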

numbers = sc.parallelize(range(20))
numbers.collect()  # pulls all data to the driver program
numbers.sum()
numbers.take(3)
numbers.saveAsTextFile("ourNumbers.out")

small_numbers = sc.parallelize(range(5))
numbers.toDebugString()  # describes the RDD's lineage
combined = numbers.union(small_numbers)
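One thing worth knowing about union: it does not remove duplicates, so values present in both RDDs appear twice (calling distinct() afterwards removes them). A small plain-Python sketch of that behaviour, with lists standing in for the two RDDs as an assumption for illustration:

```python
# Plain-Python stand-ins for the two RDDs; lists replace distributed data.
numbers = list(range(20))
small_numbers = list(range(5))

# union behaves like concatenation: shared elements are kept twice.
combined = numbers + small_numbers

# A set plays the role of distinct() here.
unique = sorted(set(combined))

print(len(combined))
print(len(unique))
```

Here the values 0 to 4 appear in both inputs, so combined is longer than the list of unique values.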

numbers = sc.parallelize(range(1000))
result = numbers.map(lambda x: x**5)
result.saveAsTextFile("firstFormat.txt")
result.map(lambda x: "number:" + str(x)).saveAsTextFile("secondFormat.txt")
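The two map steps can be tried out without a cluster. The sketch below uses Python's built-in map on a small range; the function names to_fifth and label are hypothetical helpers, and the same functions could be passed to an RDD's map in place of the lambdas.

```python
def to_fifth(x):
    """Raise a number to the fifth power."""
    return x ** 5


def label(x):
    """Prefix a number with a text label."""
    # str(x) is needed: "number:" + x raises TypeError when x is an int.
    return "number:" + str(x)


powers = list(map(to_fifth, range(5)))
labelled = list(map(label, powers))

print(powers)
print(labelled)
```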

languagy

jó reggelt ("good morning" in Hungarian; pronounced "yoo reggelt")
