Published 2019-08-06 11:18
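Python's built-in random module ends up with different seeds in different child processes, whereas with numpy.random every forked child inherits the same seed (the same RNG state) from the parent process. In PyTorch, each DataLoader worker process runs __getitem__() under a different torch seed, and that seed depends on the worker id (see the documentation of the worker_init_fn parameter). The three generators do not affect one another, so each must be handled separately. When you write your own data-preparation code and use NumPy's random functions, be sure to explicitly reseed NumPy in each child process, or use Python's random module to generate the seed.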
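A minimal sketch of the fork behaviour described above, assuming a Unix system (it explicitly requests the "fork" start method) and Python 3.7 or later, where the random module reseeds itself in a forked child. Each child reports one draw from Python's random, one draw from NumPy's inherited global state, and one NumPy draw taken after reseeding NumPy from Python's random. The names child and run_children are hypothetical helpers for illustration only.

```python
import multiprocessing as mp
import random

import numpy as np


def child(queue):
    # Python's random module reseeds itself after fork (Python 3.7+),
    # so this draw differs between children.
    py_draw = random.random()
    # NumPy's global RandomState was copied from the parent at fork time,
    # so this draw is the same in every child.
    np_inherited = float(np.random.random())
    # Remedy from the text: reseed NumPy explicitly, here using Python's random.
    np.random.seed(random.getrandbits(32))
    np_reseeded = float(np.random.random())
    queue.put((py_draw, np_inherited, np_reseeded))


def run_children(n_children=4):
    ctx = mp.get_context("fork")  # the fork start method is Unix-only
    queue = ctx.Queue()
    procs = [ctx.Process(target=child, args=(queue,)) for _ in range(n_children)]
    for p in procs:
        p.start()
    results = [queue.get() for _ in procs]  # drain the queue before joining
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    for row in run_children():
        print(row)
```

In each printed row, the middle value (the inherited NumPy draw) should be identical across children, while the first and last values differ.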
Example:
In the example below, random.uniform() and np.random.uniform() behave differently in the pool's worker processes.
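import numpy as np
import random
from multiprocessing import Pool

def Foo_np(seed=None):
    # np.random.seed(seed)  # uncomment to give each task its own seed
    return np.random.uniform(0, 1, 5)  # compare: [random.uniform(0, 1) for _ in range(5)]

if __name__ == "__main__":
    pool = Pool(processes=8)  # with the 'fork' start method, workers inherit the parent's NumPy state
    print(np.array(pool.map(Foo_np, range(20))))  # same usage as Python's built-in map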
# [[ 0.14463001 0.80273208 0.5559258 0.55629762 0.78814652] <-
# [ 0.14463001 0.80273208 0.5559258 0.55629762 0.78814652] <-
# [ 0.14463001 0.80273208 0.5559258 0.55629762 0.78814652] <-
# [ 0.14463001 0.80273208 0.5559258 0.55629762 0.78814652] <-
# [ 0.14463001 0.80273208 0.5559258 0.55629762 0.78814652] <-
# [ 0.14463001 0.80273208 0.5559258 0.55629762 0.78814652] <-
# [ 0.14463001 0.80273208 0.5559258 0.55629762 0.78814652] <-
# [ 0.64672339 0.99851749 0.8873984 0.42734339 0.67158796]
# [ 0.64672339 0.99851749 0.8873984 0.42734339 0.67158796]
# [ 0.64672339 0.99851749 0.8873984 0.42734339 0.67158796]
# [ 0.64672339 0.99851749 0.8873984 0.42734339 0.67158796]
# [ 0.64672339 0.99851749 0.8873984 0.42734339 0.67158796]
# [ 0.11283279 0.28180632 0.28365286 0.51190168 0.62864241]
# [ 0.11283279 0.28180632 0.28365286 0.51190168 0.62864241]
# [ 0.28917586 0.40997875 0.06308188 0.71512199 0.47386047]
# [ 0.11283279 0.28180632 0.28365286 0.51190168 0.62864241]
# [ 0.64672339 0.99851749 0.8873984 0.42734339 0.67158796]
# [ 0.11283279 0.28180632 0.28365286 0.51190168 0.62864241]
# [ 0.14463001 0.80273208 0.5559258 0.55629762 0.78814652] <-
# [ 0.11283279 0.28180632 0.28365286 0.51190168 0.62864241]]
You can see that groups of up to 8 worker processes were forked simultaneously with the same seed, giving me identical random sequences (I've marked the first group with arrows).
Calling np.random.seed() within a subprocess forces that process's global RNG instance to seed itself again from /dev/urandom or the wall clock, which will (probably) prevent you from seeing identical output from multiple subprocesses. Best practice is to explicitly pass a different seed (or numpy.random.RandomState instance) to each subprocess, e.g.:
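def Foo_np(seed=None):
    local_state = np.random.RandomState(seed)  # a private generator seeded per task
    return local_state.uniform(0, 1, 5)

if __name__ == "__main__":
    with Pool(processes=8) as pool:
        print(np.array(pool.map(Foo_np, range(20))))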
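On NumPy 1.17 or newer, a swapped-in alternative to hand-picked RandomState seeds is NumPy's SeedSequence together with the Generator API (np.random.default_rng): each task derives an independent, reproducible stream from one root seed plus its task index. This is a sketch under those assumptions, not the original author's code; task_rng and the (root_seed, task_id) argument format are hypothetical.

```python
import numpy as np


def task_rng(root_seed, task_id):
    # Same stream as np.random.SeedSequence(root_seed).spawn(n)[task_id]:
    # an independent child sequence identified by (root_seed, task_id).
    seq = np.random.SeedSequence(root_seed, spawn_key=(task_id,))
    return np.random.default_rng(seq)


def Foo_np(args):
    # args is a plain (root_seed, task_id) tuple, which pickles cheaply for Pool.map.
    root_seed, task_id = args
    return task_rng(root_seed, task_id).uniform(0, 1, 5)


if __name__ == "__main__":
    from multiprocessing import Pool

    with Pool(processes=8) as pool:
        print(np.array(pool.map(Foo_np, [(12345, i) for i in range(20)])))
```

Because each stream depends only on (root_seed, task_id), the output no longer depends on which worker process happens to run a given task.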
Solutions for loading random samples with multiple worker processes in a PyTorch DataLoader:
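Besides switching to Python's random module, another option is to add the following to the top of your main script (this requires Python 3). With the 'spawn' start method, each worker starts a fresh interpreter and re-imports NumPy, so NumPy's global RNG is seeded independently in every worker.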
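import torch
import torch.multiprocessing as mp

if __name__ == "__main__":
    mp.set_start_method('spawn')  # call once, before creating any DataLoader
    # ... build the Dataset/DataLoader and train here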
https://discuss.pytorch.org/t/does-getitem-of-dataloader-reset-random-seed/8097/7
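Another common approach, which keeps the default start method, is a worker_init_fn that reseeds NumPy and Python's random from the per-worker PyTorch seed. PyTorch's documentation notes that inside a worker this seed (base_seed + worker_id) is available via torch.initial_seed(). The sketch below follows that pattern; reseed_other_libraries is a hypothetical helper kept separate so it can be exercised without torch installed.

```python
import random

import numpy as np


def reseed_other_libraries(torch_seed):
    # np.random.seed only accepts seeds in [0, 2**32), while torch seeds are 64-bit.
    worker_seed = torch_seed % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)
    return worker_seed


def seed_worker(worker_id):
    # Intended for DataLoader(..., worker_init_fn=seed_worker). Inside a worker,
    # torch.initial_seed() already equals base_seed + worker_id, so it differs per worker.
    import torch  # imported lazily so the helper above works without torch

    reseed_other_libraries(torch.initial_seed())
```

Usage: DataLoader(dataset, num_workers=4, worker_init_fn=seed_worker). The worker_id argument is unused because torch.initial_seed() already incorporates it.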
Author: 就是不给你
Link: https://www.pythonheidong.com/blog/article/8279/cd066b15f8dbb6f1c98f/
Source: python黑洞网