Sonny不讀不行: Scala Parallel Programming筆記 4 - Parallel Sorting and Parallel Collection

Parallel Merge sort

1. 把array xs分成兩等分，每個等分recursively merge sort parallelly
2. sequentially merge to a auxilliary buffer ys
3. copy y to destination

sorting的implementation如下：

// Without the merge step, the sort phase parallelizes almost linearly.// This is because the memory pressure is much lower than during copying in the third step.def sort(from: Int, until: Int, depth: Int): Unit = {
  if (depth == maxDepth) {
    quickSort(xs, from, until - from)
  } else {
    val mid = (from + until) / 2    val right = task {
      sort(mid, until, depth + 1)
    }
    sort(from, mid, depth + 1)
    right.join()

    val flip = (maxDepth - depth) % 2 == 0    val src = if (flip) ys else xs
    val dst = if (flip) xs else ys
    merge(src, dst, from, mid, until)
  }
}
sort(0, xs.length, 0)

他用個depth int來控制recursion深度，在base case用了quicksort，因為base case可能還有相當的size，所以不能隨便用個bubble sort。

另外比較奇怪的是，他為了要有效率使用auxilliary array，變成要把xs ys 交替使用？這邊有點看不懂為何要這樣做。

這個平行版本的mergesort約比sequential快兩倍。

Collection parallelism

對一個collection做平行運算是很多時候必須面對的，在scala中，list的implementation不適合parallelism，因為找到middle element以及merge 2 lists都需要linear time。

collection mapping:

Parallel Map

map有兩個特性有助於parallelism:

function composition看起來就是一個可以拆解的點。

list的map的sequential definition如下：

這不適合parallelism，原因就是linear time的search middle以及merge。

如果用array? array可以O(1) time access memory:
sequential array map版本：

parallel版本跟之前類似：

這邊要注意threshold的值要夠大，要不然parallelism帶來的overhead將大於節省的時間，特別是當f的computation相當簡單的時候。

parallel map也可以作用在tree collection上，得到的加速效果跟array差不多：

parallelism on tree vs array的優缺點：

array可以快速access memory，而且memory位置集中，加速access時間，缺點是要小心寫道shared memory，以及merge results花費時間

immutable tree可以快速merge subtrees，不用擔心寫到shared memory，缺點就是memory overhead大，位置不集中，拖慢access時間。

Parallel Fold ＆ Reduce

fold很容易忘記的定義：

initial value的位置不是在極左(fold left)，就是在極右(fold right)，這樣就容易記了。

reduce是沒有initial value的fold:

要做parallel版本的fold/reduce，就要考慮operator associativity，也就是結合律，也就是
(a op b) op c = a op (b op c)

所以加法可以，減法就不行。associativity可以讓我們拆開operation順序來情形作業，否則就只能sequential（例如要等到a-b結果出來才能去減c)。

association relationship可以用tree來表示：

Reduce a Tree

假設有一個tree定義如下：

reduce的sequential implementation如下：

parallel版本如下：

associative operators保證parallelism會得到正確的結果，因為對每個associative operator（視為node) 組成的tree來說，不管怎樣的rotate，經過reduce / fold之後的結果都一樣，而且任何tree structure都能轉變成list-like：

所以把tree轉換成list，可以得到一個ordering，如果兩個tree轉換成list得到的ordering一樣的話，reduce/fold的結果也會一樣。

Sonny不讀不行

code

2017年7月28日星期五

Scala Parallel Programming筆記 4 - Parallel Sorting and Parallel Collection

Parallel Merge sort

Collection parallelism

Parallel Map

Parallel Fold ＆ Reduce

Reduce a Tree

沒有留言:

張貼留言

code

2017年7月28日 星期五

Scala Parallel Programming筆記 4 - Parallel Sorting and Parallel Collection

Parallel Merge sort

Collection parallelism

Parallel Map

Parallel Fold ＆ Reduce

Reduce a Tree

沒有留言:

張貼留言

2017年7月28日星期五