
Processing the same array, dask.array is too slow compared to numpy.array

Posted 2025-01-05 09:15

numpy version:

import BWStest as bws
import numpy as np
from skimage.measure import label
import dask.array
from tqdm import tqdm
CalWin = [7,25]
stack = []
thershold = 0.05
for i in range(5):
  image = np.random.rand(3000, 4000)
  dask_image = dask.array.from_array(image, chunks=(1000, 1000))
  stack.append(dask_image)
stacks = dask.array.stack(stack, axis=0)

npages,nlines,nwidths = np.shape(stacks)
RadiusRow=(CalWin[0] - 1) // 2
RadiusCol=(CalWin[1] -1 ) // 2
InitRow=(CalWin[0] + 1) // 2
InitCol=(CalWin[1] + 1) // 2
mlistack = dask.array.pad(dask.array.abs(stacks), ((0, 0), (RadiusRow, RadiusRow), (RadiusCol, RadiusCol)), mode='symmetric').compute()
meanmli = dask.array.mean(mlistack, axis=0)
nlines_EP,nwidths_EP = meanmli.shape
del meanmli
PixelInd = np.zeros((CalWin[0] * CalWin[1],nlines*nwidths), dtype=bool)  # plain bool; the np.bool alias was removed in NumPy 1.24
num = 0
compar_pixel = np.zeros((CalWin[0] * CalWin[1],nlines*nwidths))
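# The upper bounds get +1 so that all nlines*nwidths pixels are visited (range() excludes its stop value).
for kk in tqdm(range(InitCol, nwidths_EP-RadiusCol+1), desc="Processing columns", ascii=' ='):
    for ll in tqdm(range(InitRow, nlines_EP-RadiusRow+1), desc="Processing rows",
                  leave=False, ascii=' ='):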
        Matrix = mlistack[:,ll-RadiusRow-1:ll+RadiusRow,kk-RadiusCol-1:kk+RadiusCol]
        # print(Matrix)
        Ref = Matrix[:,InitRow-1,InitCol-1]
        Xarray = np.tile(Ref[:, np.newaxis], (1, CalWin[0] * CalWin[1]))
        Matrix_T = np.transpose(Matrix, (0, 2, 1))
        Yarray = np.reshape(Matrix_T, (Matrix_T.shape[0], CalWin[0] * CalWin[1]))
        T = bws.BWS(Xarray, Yarray, thershold)
        SeedPoint = np.transpose(np.reshape(~T, (CalWin[1], CalWin[0])))
        LL = label(SeedPoint, 2)
        LL_flat = np.transpose(LL).flatten()
        compar_pixel[:,num] = (LL_flat == LL[InitRow - 1, InitCol - 1])
        PixelInd[:,num] = (LL_flat == LL[InitRow - 1, InitCol - 1])
        num = num + 1

The processing speed of the numpy version is roughly 2400 (presumably iterations per second, as shown by the tqdm progress bar).
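
As a side note on the window extraction in the loops above, here is a minimal sketch, assuming NumPy 1.20 or newer, of how `numpy.lib.stride_tricks.sliding_window_view` can expose every 7 x 25 window as a view instead of slicing once per pixel. The small array sizes and the names `padded`, `windows` and `same_window` are illustrative and not from the original post.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

CalWin = [7, 25]
RadiusRow = (CalWin[0] - 1) // 2
RadiusCol = (CalWin[1] - 1) // 2
InitRow = (CalWin[0] + 1) // 2
InitCol = (CalWin[1] + 1) // 2

# Small stand-in for the 5 x 3000 x 4000 stack (sizes are illustrative).
stack = np.random.rand(5, 30, 40)
padded = np.pad(np.abs(stack),
                ((0, 0), (RadiusRow, RadiusRow), (RadiusCol, RadiusCol)),
                mode="symmetric")

# windows[:, r, c] is the (7, 25) neighbourhood of original pixel (r, c).
# It is a read-only view, so no window data is copied at this point.
windows = sliding_window_view(padded, (CalWin[0], CalWin[1]), axis=(1, 2))
print(windows.shape)  # (5, 30, 40, 7, 25)

# Compare with the slice the loop builds for one (ll, kk) pair.
ll, kk = InitRow + 10, InitCol + 20
Matrix = padded[:, ll - RadiusRow - 1:ll + RadiusRow,
                kk - RadiusCol - 1:kk + RadiusCol]
same_window = np.array_equal(Matrix, windows[:, ll - InitRow, kk - InitCol])
print(same_window)  # True
```

Reshaping all windows at once would copy the data 175 times over, so in practice one would probably still process a row or a block of pixels per batch rather than the whole image.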

dask version:

import BWStest as bws
import numpy as np
from skimage.measure import label
import dask.array
from tqdm import tqdm
from dask.distributed import Client, LocalCluster, progress
CalWin = [7,25]
stack = []
thershold = 0.05
for i in range(5):

    image = np.random.rand(3000, 4000)
    dask_image = dask.array.from_array(image, chunks=(1000, 1000))
    stack.append(dask_image) 
stacks = dask.array.stack(stack, axis=0)
npages,nlines,nwidths = np.shape(stacks)
RadiusRow=(CalWin[0] - 1) // 2
RadiusCol=(CalWin[1] -1 ) // 2
InitRow=(CalWin[0] + 1) // 2
InitCol=(CalWin[1] + 1) // 2
mlistack = dask.array.pad(dask.array.abs(stacks), ((0, 0), (RadiusRow, RadiusRow), (RadiusCol, RadiusCol)),mode='symmetric').compute()
meanmli = dask.array.mean(mlistack, axis=0)
nlines_EP,nwidths_EP = meanmli.shape
del meanmli

PixelInd = dask.array.zeros((CalWin[0] * CalWin[1],nlines*nwidths),
                                  chunks=(CalWin[0] * CalWin[1],2048*nlines),dtype=bool)
num = 0
compar_pixel = dask.array.zeros((CalWin[0] * CalWin[1],nlines*nwidths),
                                  chunks=(CalWin[0] * CalWin[1],2048*nlines),dtype=bool)
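# The upper bounds get +1 so that all nlines*nwidths pixels are visited (range() excludes its stop value).
for kk in tqdm(range(InitCol, nwidths_EP-RadiusCol+1), desc="Processing columns",ascii=' ='):
    for ll in tqdm(range(InitRow, nlines_EP-RadiusRow+1), desc="Processing rows",leave=False, ascii=' ='):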
         Matrix = mlistack[:,ll-RadiusRow-1:ll+RadiusRow,kk-RadiusCol-1:kk+RadiusCol]
         Ref = Matrix[:,InitRow-1,InitCol-1]
         Xarray = dask.array.tile(Ref[:, np.newaxis], (1, CalWin[0] * CalWin[1]))
         Matrix_T = dask.array.transpose(Matrix, (0, 2, 1))
         Yarray = dask.array.reshape(Matrix_T, (Matrix_T.shape[0], CalWin[0] * CalWin[1]))
         T = dask.array.from_array(bws.BWS(Xarray.compute(), Yarray.compute(), thershold))
         SeedPoint = dask.array.transpose(dask.array.reshape(~T, (CalWin[1], CalWin[0])))
         LL = dask.array.from_array(label(SeedPoint.compute(), 2))
         LL_flat = dask.array.Array.flatten(dask.array.transpose(LL))
         compar_pixel[:,num] = (LL_flat == LL[InitRow - 1, InitCol - 1])
         PixelInd[:,num] = (LL_flat == LL[InitRow - 1, InitCol - 1])
         num = num + 1

The processing speed of the dask version is roughly 40, against roughly 2400 for the numpy version.
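
One plausible contributor to this gap (an assumption, not something measured in the original post) is that every `dask.array` call on these tiny windows builds a task graph and every `.compute()` runs a scheduler, which can cost far more than the NumPy operation itself. The sketch below times one such call in isolation; `repeats` and the variable names are illustrative.

```python
import time

import numpy as np
import dask.array as da

ref = np.random.rand(5)   # same length as Ref in the loops above (5 pages)
n_cols = 7 * 25           # CalWin[0] * CalWin[1]
repeats = 200

# Plain NumPy: tile the reference pixel across the window.
start = time.perf_counter()
for _ in range(repeats):
    np.tile(ref[:, np.newaxis], (1, n_cols))
numpy_seconds = time.perf_counter() - start

# Same operation through dask, computed immediately as in the dask loop.
start = time.perf_counter()
for _ in range(repeats):
    da.tile(da.from_array(ref)[:, np.newaxis], (1, n_cols)).compute()
dask_seconds = time.perf_counter() - start

print(f"numpy: {numpy_seconds / repeats * 1e6:.1f} us per call")
print(f"dask:  {dask_seconds / repeats * 1e6:.1f} us per call")
```

The absolute numbers depend on the machine and the dask version, so only the ratio between the two lines is of interest.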

From these results, dask processing appears to be very slow. Can you give me some suggestions? Thank you very much. This is the bws function:

import numpy as np
from scipy.stats import rankdata
def BWS(Xarray, Yarray, threshold):
  n, m = Xarray.shape
  rank = rankdata(np.vstack((Xarray, Yarray)), axis=0)
  xrank = np.sort(rank[0:n,:], axis=0)
  yrank = np.sort(rank[n::,:], axis=0)
  temp = np.arange(1, n+1)[:,np.newaxis]*np.ones((1,m))
  tempx = (xrank - 2.0*temp) ** 2
  tempy = (yrank - 2.0*temp) ** 2
  temp  = temp/(n+1) * (1-temp/(n+1)) * 2 * n 
  BX    = 1/n * np.sum(tempx / temp, axis=0)
  BY    = 1/n * np.sum(tempy / temp, axis=0)
  # test statistic
  B = 1/2*(BX + BY)
  if threshold ==0.05:
    if n == 5:
        b = 2.533
    elif n == 6:
        b = 2.552
    elif n == 7:
        b = 2.620
    elif n == 8:
        b = 2.564   
    elif n == 9:
        b = 2.575     
    elif n == 10:
        b = 2.583
    else:
        b = 2.493
  else:
    b = 3.880

  H = (B >=b)
  H = np.transpose(H)
  return H

As I will be dealing with very large matrices in the future, I have to use dask to store the data. However, as a beginner in dask, I would like to know how to achieve the same processing speed with dask as with numpy.
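
A common pattern for this kind of neighbourhood computation is to keep the per-pixel work in plain NumPy and let dask hand out whole chunks, with `map_overlap` supplying the halo each chunk needs. The following is a minimal sketch under that assumption, not a tested solution for this workload: `window_mean` is a hypothetical stand-in for the BWS and labelling step, and the array and chunk sizes are shrunk for illustration.

```python
import numpy as np
import dask.array as da
from numpy.lib.stride_tricks import sliding_window_view

RADIUS_ROW, RADIUS_COL = 3, 12   # same radii as CalWin = [7, 25]


def window_mean(block):
    """Mean over the (7, 25) window around each pixel of one NumPy block.

    Hypothetical stand-in for the real per-pixel statistic. The result is
    padded back to the block's shape because map_overlap trims the halo.
    """
    win = sliding_window_view(block, (2 * RADIUS_ROW + 1, 2 * RADIUS_COL + 1))
    core = win.mean(axis=(-2, -1))
    return np.pad(core,
                  ((RADIUS_ROW, RADIUS_ROW), (RADIUS_COL, RADIUS_COL)),
                  mode="edge")


image = np.random.rand(300, 400)
dask_image = da.from_array(image, chunks=(100, 100))

# Each chunk arrives with RADIUS_ROW / RADIUS_COL extra pixels per side.
# boundary="reflect" mirrors the array at its outer edges; whether that
# matches np.pad's "symmetric" mode exactly should be checked separately.
result = dask_image.map_overlap(
    window_mean,
    depth=(RADIUS_ROW, RADIUS_COL),
    boundary="reflect",
    dtype=image.dtype,
    meta=np.array((), dtype=image.dtype),
).compute()

print(result.shape)  # (300, 400)
```

Here `window_mean` runs once per chunk (12 chunks in this example), so the scheduling cost is paid per chunk instead of several times per pixel.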


Solution


No answers yet




Author: 黑洞官方问答小能手

Link: https://www.pythonheidong.com/blog/article/2046782/8019e595facd5cdb26fc/

Source: python黑洞网
