Posted 2019-08-24 15:10
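Following on from an earlier post, I have boiled my problem down to its essentials. While migrating from Perl scripts to Python, I found a huge performance problem with slurping files in Python. I am running this on Ubuntu Server.
Note: this is not an X-Y question. I need to understand fundamentally what is going on, or whether I am doing something stupid.
I created my test data: 50,000 files of 10 KB each (this reflects the average file size I am dealing with):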
mkdir 1
cd 1
for i in {1..50000}; do dd if=/dev/zero of=$i.xml bs=1 count=10000; done
cd ..
cp -r 1 2
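For reference, the same test-data setup can be sketched in Python. This is an illustrative equivalent of the `dd` loop above, not part of the original post: `make_test_data` is a hypothetical helper, it assumes Python 3.6+, and it writes zero-filled files by default (or random bytes, like the later `/dev/urandom` variant). Like the `dd` loop, it creates the files in numeric order 1..N.

```python
import os


def make_test_data(directory, count=50000, size=10000, random_bytes=False):
    """Create count files named 1.xml..count.xml, each `size` bytes long.

    Zero-filled by default (like dd if=/dev/zero); random_bytes=True mimics
    dd if=/dev/urandom. Files are created in numeric order, as the dd loop does.
    """
    os.makedirs(directory, exist_ok=True)
    zeros = bytes(size)
    for i in range(1, count + 1):
        data = os.urandom(size) if random_bytes else zeros
        with open(os.path.join(directory, f'{i}.xml'), 'wb') as fh:
            fh.write(data)
```

Usage: `make_test_data('1')`, then `cp -r 1 2` exactly as above.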
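I created my two scripts as simply as possible:
Perl
foreach my $file (<$ARGV[0]/*.xml>) {
    my $fh;
    open($fh, '<', $file) or die "Cannot open $file: $!";
    # local $/ leaves the input record separator undefined, so <$fh> returns the whole file
    my $contents = do { local $/; <$fh> };
    close($fh);
}
Python
import glob, sys

# Slurp every .xml file in the directory given on the command line
for file in glob.iglob(sys.argv[1] + '/*.xml'):
    with open(file) as x:
        f = x.read()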
I then cleared the cache and ran my two slurp scripts, clearing the cache again between runs with:
sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
and monitored with the following to make sure everything was read from disk each time:
sudo iotop -a -u me
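I tried this on a physical machine with RAID 10 disks and on a brand-new VM set up on RAID 1 SSDs. I have only included the test runs from my VM, because the physical server was about the same speed.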
$ time python readFiles.py 1
real 5m2.493s
user 0m1.783s
sys 0m5.013s
$ time perl readFiles.pl 2
real 0m13.059s
user 0m1.690s
sys 0m2.471s
$ time perl readFiles.pl 2
real 0m13.313s
user 0m1.670s
sys 0m2.579s
$ time python readFiles.py 1
real 4m43.378s
user 0m1.772s
sys 0m4.731s
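In iotop I noticed that while Perl runs, DISK READ is about 45 M/s and IOWAIT about 70%, whereas while Python runs, DISK READ is 2 M/s and IOWAIT 97%. I have boiled the scripts down as simply as I can and am not sure where to go from here.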
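If you want a per-run number to set beside iotop's DISK READ figures, the read loop can time itself. This is a sketch assuming Python 3 (the post uses 2.7.6); `slurp_throughput` and `report` are hypothetical names. With a cold cache, total bytes divided by elapsed time approximates the read rate the process actually achieved from disk.

```python
import glob
import os
import time


def slurp_throughput(directory, pattern='*.xml'):
    """Read every matching file in full; return (file_count, total_bytes, seconds).

    Iterates in glob.iglob order, like the Python script above, and opens
    files in binary mode so no decoding happens.
    """
    files = 0
    total = 0
    start = time.perf_counter()
    for path in glob.iglob(os.path.join(directory, pattern)):
        with open(path, 'rb') as fh:
            total += len(fh.read())
        files += 1
    elapsed = time.perf_counter() - start
    return files, total, elapsed


def report(files, total, elapsed):
    """Format a one-line summary with throughput in MB/s (1 MB = 10**6 bytes)."""
    rate = total / elapsed / 1e6 if elapsed > 0 else float('inf')
    return f'{files} files, {total} bytes in {elapsed:.2f}s ({rate:.1f} MB/s)'
```

Usage, after dropping the caches: `print(report(*slurp_throughput('1')))`.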
In case it is relevant:
$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
$ perl -v
This is perl 5, version 18, subversion 2 (v5.18.2) built for x86_64-linux-gnu-thread-multi
Further information, as requested
I ran strace and grabbed the output for file 1000.xml, but both seem to be doing the same thing:
Perl
$ strace -f -T -o trace.perl.1 perl readFiles.pl 2
32303 open("2/1000.xml", O_RDONLY) = 3 <0.000020>
32303 ioctl(3, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fff7f6f7b90) = -1 ENOTTY (Inappropriate ioctl for device) <0.000016>
32303 lseek(3, 0, SEEK_CUR) = 0 <0.000016>
32303 fstat(3, {st_mode=S_IFREG|0664, st_size=10000, ...}) = 0 <0.000016>
32303 fcntl(3, F_SETFD, FD_CLOEXEC) = 0 <0.000017>
32303 fstat(3, {st_mode=S_IFREG|0664, st_size=10000, ...}) = 0 <0.000030>
32303 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192 <0.005323>
32303 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 1808 <0.000022>
32303 read(3, "", 8192) = 0 <0.000019>
32303 close(3) = 0 <0.000017>
Python
$ strace -f -T -o trace.python.1 python readFiles.py 1
32313 open("1/1000.xml", O_RDONLY) = 3 <0.000021>
32313 fstat(3, {st_mode=S_IFREG|0664, st_size=10000, ...}) = 0 <0.000017>
32313 fstat(3, {st_mode=S_IFREG|0664, st_size=10000, ...}) = 0 <0.000019>
32313 lseek(3, 0, SEEK_CUR) = 0 <0.000018>
32313 fstat(3, {st_mode=S_IFREG|0664, st_size=10000, ...}) = 0 <0.000018>
32313 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa18820a000 <0.000019>
32313 lseek(3, 0, SEEK_CUR) = 0 <0.000018>
32313 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192 <0.006795>
32313 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 1808 <0.000031>
32313 read(3, "", 4096) = 0 <0.000018>
32313 close(3) = 0 <0.000027>
32313 munmap(0x7fa18820a000, 4096) = 0 <0.000022>
One difference I noticed (not sure whether it is relevant) is that Perl seems to lstat every file before it starts opening them, whereas Python does not:
32303 lstat("2/1000.xml", {st_mode=S_IFREG|0664, st_size=10000, ...}) = 0 <0.000022>
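Another difference worth checking is the order in which the files are read. Perl's `<$ARGV[0]/*.xml>` goes through File::Glob, which sorts its results by default, whereas the Python `glob` documentation says that whether results are sorted depends on the file system (`iglob` yields names in whatever order the directory listing returns). The sketch below, with the hypothetical helper `read_files`, reads a directory either way and returns the order it used; Python's `sorted` gives ASCII order, which for these all-digit names matches Perl's default.

```python
import glob
import os


def read_files(directory, pattern='*.xml', sort=False):
    """Read every matching file; return the basenames in the order they were read.

    sort=False: glob.iglob order, i.e. whatever order the directory listing
    returns (what the Python script above does).
    sort=True: ascending name order, roughly what Perl's glob does by default.
    """
    paths = glob.iglob(os.path.join(directory, pattern))
    if sort:
        paths = sorted(paths)
    order = []
    for path in paths:
        with open(path, 'rb') as fh:
            fh.read()
        order.append(os.path.basename(path))
    return order
```

Comparing `read_files('1')[:4]` with `read_files('1', sort=True)[:4]` shows whether the two orders differ on a given filesystem; timing each variant with a cold cache is the interesting experiment.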
I also ran strace with -c (I only took the top few calls):
Perl
$ time strace -f -c perl readFiles.pl 2
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
44.07 3.501471 23 150018 read
12.54 0.996490 10 100011 fstat
9.47 0.752552 15 50000 lstat
7.99 0.634904 13 50016 open
6.89 0.547016 11 50017 close
6.19 0.491944 10 50008 50005 ioctl
6.12 0.486208 10 50014 3 lseek
6.10 0.484374 10 50001 fcntl
real 0m37.829s
user 0m6.373s
sys 0m25.042s
Python
$ time strace -f -c python readFiles.py 1
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
42.97 4.186173 28 150104 read
15.58 1.518304 10 150103 fstat
10.51 1.023681 20 50242 174 open
10.12 0.986350 10 100003 lseek
7.69 0.749387 15 50047 munmap
6.85 0.667576 13 50071 close
5.90 0.574888 11 50073 mmap
real 5m5.237s
user 0m7.278s
sys 0m30.736s
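I also parsed the -T output and calculated the time taken by the first 8192-byte read of each file, and it is clear that this is where the time goes. Below is the total time spent on the first read of all 50,000 files, followed by the average time per read: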
300.247128000002 (0.00600446220302379) - Python
11.6845620000003 (0.000233681892724297) - Perl
Not sure if that helps!
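The parse of the `-T` output described above could be done along these lines. This is a sketch, not the script actually used: it assumes `strace -f -T` output (PID prefix, elapsed time in `<...>` at the end of each line, no `<unfinished ...>` splits) and counts only the first `read()` on each descriptor after a successful `open()` or `openat()`. `first_read_times` is a hypothetical name.

```python
import re

# strace -f -T lines look like: 32303 open("2/1000.xml", O_RDONLY) = 3 <0.000020>
OPEN_RE = re.compile(r'^(\d+)\s+open(?:at)?\(.*\)\s+=\s+(\d+)\s+<([\d.]+)>$')
READ_RE = re.compile(r'^(\d+)\s+read\((\d+),.*=\s+(-?\d+)\s+<([\d.]+)>$')


def first_read_times(lines):
    """Return (total_seconds, count) for the first read() after each successful open().

    Failed opens ("= -1 ENOENT ...") do not match OPEN_RE and are ignored.
    """
    pending = set()  # (pid, fd) pairs opened but not yet read
    total = 0.0
    count = 0
    for line in lines:
        line = line.rstrip('\n')
        m = OPEN_RE.match(line)
        if m:
            pending.add((m.group(1), m.group(2)))
            continue
        m = READ_RE.match(line)
        if m and (m.group(1), m.group(2)) in pending:
            pending.discard((m.group(1), m.group(2)))
            total += float(m.group(4))
            count += 1
    return total, count
```

Usage: `with open('trace.python.1') as fh: total, count = first_read_times(fh)`, then `total / count` gives the average first-read time.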
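Update 2: I changed the Python code to use os.open and os.read and to read only the first 4096 bytes (that is enough for me, as the information I want is at the top of the file). This also eliminated all the other calls in the strace: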
18346 open("1/1000.xml", O_RDONLY) = 3 <0.000026>
18346 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 <0.007206>
18346 close(3) = 0 <0.000024>
$ time strace -f -c python readFiles.py 1
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
55.39 2.388932 48 50104 read
22.86 0.986096 20 50242 174 open
20.72 0.893579 18 50071 close
real 4m48.751s
user 0m3.078s
sys 0m12.360s
Total Time (avg read call)
282.28626 (0.00564290374812595)
Still no better... Next I'll create a VM on Azure and try another example!
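The Update 2 script itself was not posted; a reconstruction of what it describes (raw `os.open`/`os.read`, first 4096 bytes only) might look like this. `read_heads` is a hypothetical name. As the timings show, this change alone did not help, since the cost sits in the first read having to reach the disk.

```python
import glob
import os


def read_heads(directory, nbytes=4096, pattern='*.xml'):
    """Read only the first `nbytes` of each matching file using raw os.open/os.read.

    Skipping Python's buffered file object leaves just open/read/close in the
    strace, as described above; returns the total number of bytes read.
    """
    total = 0
    for path in glob.iglob(os.path.join(directory, pattern)):
        fd = os.open(path, os.O_RDONLY)
        try:
            total += len(os.read(fd, nbytes))
        finally:
            os.close(fd)
    return total
```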
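Update 3 - apologies for the size!!
OK, some interesting results using your (@JFSebastian) script on 3 setups. For brevity I have stripped the output at the start, and I have also removed all the tests that ran super fast from cache, which looked like this: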
0.23user 0.26system 0:00.50elapsed 99%CPU (0avgtext+0avgdata 9140maxresident)k
0inputs+0outputs (0major+2479minor)pagefaults 0swaps
Azure A2 Standard VM (2 cores, 3.5 GB RAM, disk unknown but slow)
$ uname -a
Linux servername 3.13.0-35-generic #62-Ubuntu SMP Fri Aug 15 01:58:42 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
$ perl -v
This is perl 5, version 18, subversion 2 (v5.18.2) built for x86_64-linux-gnu-thread-multi
(with 41 registered patches, see perl -V for more detail)
+ /usr/bin/time perl slurp.pl 1
1.81user 2.95system 3:11.28elapsed 2%CPU (0avgtext+0avgdata 9144maxresident)k
1233840inputs+0outputs (20major+2461minor)pagefaults 0swaps
+ clearcache
+ sync
+ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
+ /usr/bin/time python slurp.py 1
1.56user 3.76system 3:06.05elapsed 2%CPU (0avgtext+0avgdata 8024maxresident)k
1232232inputs+0outputs (14major+52273minor)pagefaults 0swaps
+ /usr/bin/time perl slurp.pl 2
1.90user 3.11system 6:02.17elapsed 1%CPU (0avgtext+0avgdata 9144maxresident)k
1233776inputs+0outputs (16major+2465minor)pagefaults 0swaps
Comparable first-slurp results for both; not sure what happened during the second Perl slurp?
My VMware Linux VM (2 cores, 8 GB RAM, RAID1 SSD disks)
$ uname -a
Linux servername 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
$ perl -v
This is perl 5, version 18, subversion 2 (v5.18.2) built for x86_64-linux-gnu-thread-multi
(with 41 registered patches, see perl -V for more detail)
+ /usr/bin/time perl slurp.pl 1
1.66user 2.55system 0:13.28elapsed 31%CPU (0avgtext+0avgdata 9136maxresident)k
1233152inputs+0outputs (20major+2460minor)pagefaults 0swaps
+ clearcache
+ sync
+ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
+ /usr/bin/time python slurp.py 1
2.10user 4.67system 4:45.65elapsed 2%CPU (0avgtext+0avgdata 8012maxresident)k
1232056inputs+0outputs (14major+52269minor)pagefaults 0swaps
+ /usr/bin/time perl slurp.pl 2
2.13user 4.11system 5:01.40elapsed 2%CPU (0avgtext+0avgdata 9140maxresident)k
1233264inputs+0outputs (16major+2463minor)pagefaults 0swaps
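This time, as before, Perl is faster on the first slurp. Not sure what happened on the second Perl slurp, though; I hadn't seen that behaviour before. I ran measure.sh again and the results were exactly the same, give or take a few seconds. I then did what any normal person would do and updated the kernel to match the Azure machine (3.13.0-35-generic), ran measure.sh again, and it made no difference to the results.
Out of curiosity, I then swapped the 1 and 2 arguments in measure.sh, and something strange happened: Perl slowed down and Python sped up!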
+ /usr/bin/time perl slurp.pl 2
1.78user 3.46system 4:43.90elapsed 1%CPU (0avgtext+0avgdata 9140maxresident)k
1234952inputs+0outputs (21major+2458minor)pagefaults 0swaps
+ clearcache
+ sync
+ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
+ /usr/bin/time python slurp.py 2
1.19user 3.09system 0:10.67elapsed 40%CPU (0avgtext+0avgdata 8012maxresident)k
1233632inputs+0outputs (14major+52269minor)pagefaults 0swaps
+ /usr/bin/time perl slurp.pl 1
1.36user 2.32system 0:13.40elapsed 27%CPU (0avgtext+0avgdata 9136maxresident)k
1232032inputs+0outputs (17major+2465minor)pagefaults 0swaps
That confuses me even more :-(
Physical server (32 cores, 132 GB RAM, RAID10 SAS disks)
$ uname -a
Linux servername 3.5.0-23-generic #35~precise1-Ubuntu SMP Fri Jan 25 17:13:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
$ python
Python 2.7.3 (default, Aug 1 2012, 05:14:39)
[GCC 4.6.3] on linux2
$ perl -v
This is perl 5, version 14, subversion 2 (v5.14.2) built for x86_64-linux-gnu-thread-multi
(with 55 registered patches, see perl -V for more detail)
+ /usr/bin/time perl slurp.pl 1
2.22user 2.60system 0:15.78elapsed 30%CPU (0avgtext+0avgdata 43728maxresident)k
1233264inputs+0outputs (15major+2984minor)pagefaults 0swaps
+ clearcache
+ sync
+ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
+ /usr/bin/time python slurp.py 1
2.51user 4.79system 1:58.53elapsed 6%CPU (0avgtext+0avgdata 34256maxresident)k
1234752inputs+0outputs (16major+52385minor)pagefaults 0swaps
+ /usr/bin/time perl slurp.pl 2
2.17user 2.95system 0:06.96elapsed 73%CPU (0avgtext+0avgdata 43744maxresident)k
1232008inputs+0outputs (14major+2987minor)pagefaults 0swaps
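Perl seems to win every time.
Confused.
Given the strangeness on my local VM when I swapped directories (and it is the machine I have the most control over), I am going to take a systematic approach: run every combination of Python vs Perl with 1 or 2 as the data directory, several times each for consistency. That will take a while and I'm going a bit mad, so I need a break first! All I want is consistency :-(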
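The plan above (every combination of interpreter and data directory, several runs each, cold cache every time) can be scripted. A sketch in Python 3; `run_matrix` and `drop_caches` are hypothetical names, `slurp.pl`/`slurp.py` are the scripts used in the measurements above, and `drop_caches` just wraps the same `sync` / `drop_caches` commands used throughout the post (it needs passwordless sudo or a root shell).

```python
import itertools
import subprocess
import time


def drop_caches():
    """Flush dirty pages and drop the page cache (Linux; needs sudo), as in the post."""
    subprocess.run(['sync'], check=True)
    subprocess.run(['sudo', 'sh', '-c', 'echo 3 > /proc/sys/vm/drop_caches'], check=True)


def run_matrix(commands, data_dirs, repeats=3, before_each=None):
    """Time every (command, data_dir) pair `repeats` times.

    commands maps a label to an argv list; the data directory is appended as
    the last argument. before_each (e.g. drop_caches) runs before every timing.
    Returns {(label, data_dir): [seconds, ...]}.
    """
    results = {}
    for (label, argv), data_dir in itertools.product(commands.items(), data_dirs):
        timings = []
        for _ in range(repeats):
            if before_each is not None:
                before_each()
            start = time.perf_counter()
            subprocess.run(argv + [data_dir], check=True)
            timings.append(time.perf_counter() - start)
        results[(label, data_dir)] = timings
    return results
```

Usage: `run_matrix({'perl': ['perl', 'slurp.pl'], 'python': ['python', 'slurp.py']}, ['1', '2'], before_each=drop_caches)`. Because every pair runs several times from a cold cache, a one-off slow run stands out instead of being mistaken for a Perl/Python difference.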
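Update 4 - Consistency
(The following was run on an ubuntu-14.04.1 server VM, kernel 3.13.0-35-generic #62-Ubuntu.)
I think I have found some consistency. Running the tests as Python/Perl slurps on data directories 1 and 2, I found the following:
So I looked at OS-level copying, and it looks like Ubuntu's 'cp' behaves the same way as Python: slow on the original files and fast on the copied files.
This is what I ran. I did it several times, on machines with a single SATA HD and with RAID10, with these results: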
$ mkdir 1
$ cd 1
$ for i in {1..50000}; do dd if=/dev/urandom of=$i.xml bs=1K count=10; done
$ cd ..
$ cp -r 1 2
$ sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time strace -f -c -o trace.copy2c cp -r 2 2copy
real 0m28.624s
user 0m1.429s
sys 0m27.558s
$ sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time strace -f -c -o trace.copy1c cp -r 1 1copy
real 5m21.166s
user 0m1.348s
sys 0m30.717s
The trace results show where the time is spent:
$ head trace.copy1c trace.copy2c
==> trace.copy1c <==
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
60.09 2.541250 25 100008 read
12.22 0.516799 10 50000 write
9.62 0.406904 4 100009 open
5.59 0.236274 2 100013 close
4.80 0.203114 4 50004 1 lstat
4.71 0.199211 2 100009 fstat
2.19 0.092662 2 50000 fadvise64
0.72 0.030418 608 50 getdents
==> trace.copy2c <==
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
47.86 0.802376 8 100008 read
13.55 0.227108 5 50000 write
13.02 0.218312 2 100009 open
7.36 0.123364 1 100013 close
6.83 0.114589 1 100009 fstat
6.31 0.105742 2 50004 1 lstat
3.38 0.056634 1 50000 fadvise64
1.62 0.027191 544 50 getdents
So it seems that copying the copies is much faster than copying the original files. My current guess is that when the files are copied they end up laid out on disk better than when they were originally created, which makes them more efficient to read?
Interestingly, 'rsync' and 'cp' seem to behave in opposite ways speed-wise, much like Perl and Python!
$ rm -rf 1copy 2copy; sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'; echo "Rsync 1"; /usr/bin/time rsync -a 1 1copy; sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'; echo "Rsync 2"; /usr/bin/time rsync -a 2 2copy
Rsync 1
3.62user 3.76system 0:13.00elapsed 56%CPU (0avgtext+0avgdata 5072maxresident)k
1230600inputs+1200000outputs (13major+2684minor)pagefaults 0swaps
Rsync 2
4.87user 6.52system 5:06.24elapsed 3%CPU (0avgtext+0avgdata 5076maxresident)k
1231832inputs+1200000outputs (13major+2689minor)pagefaults 0swaps
$ rm -rf 1copy 2copy; sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'; echo "Copy 1"; /usr/bin/time cp -r 1 1copy; sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'; echo "Copy 2"; /usr/bin/time cp -r 2 2copy
Copy 1
0.48user 6.42system 5:05.30elapsed 2%CPU (0avgtext+0avgdata 1212maxresident)k
1229432inputs+1200000outputs (6major+415minor)pagefaults 0swaps
Copy 2
0.33user 4.17system 0:11.13elapsed 40%CPU (0avgtext+0avgdata 1212maxresident)k
1230416inputs+1200000outputs (6major+414minor)pagefaults 0swaps
I will focus on only one of your examples, because the rest should be analogous.
What I think may matter in this situation is read-ahead (or perhaps another technique related to it).
Consider the following example.
I created 1000 xml files in directory "1" (named 1.xml to 1000.xml) with dd, as you did, and then copied the original directory 1 to directory 2:
$ mkdir 1
$ cd 1
$ for i in {1..1000}; do dd if=/dev/urandom of=$i.xml bs=1K count=10; done
$ cd ..
$ cp -r 1 2
$ sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time strace -f -c -o trace.copy2c cp -r 2 2copy
$ sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
$ time strace -f -c -o trace.copy1c cp -r 1 1copy
Next I traced the cp command with strace to find out in what order the data is copied.
cp copies in the following order (only the first 4 files are shown, because I saw that the second read from the original directory takes longer than the second read from the copied directory):
100.xml 150.xml 58.xml 64.xml ... (in my example)
Now take a look at the filesystem blocks used by these files (debugfs output, ext3 filesystem):
Original directory:
BLOCKS:
(0-9):63038-63047 100.xml
(0-9):64091-64100 150.xml
(0-9):57926-57935 58.xml
(0-9):60959-60968 64.xml
....
Copied directory:
BLOCKS:
(0-9):65791-65800 100.xml
(0-9):65801-65810 150.xml
(0-9):65811-65820 58.xml
(0-9):65821-65830 64.xml
....
As you can see, in the copied directory the blocks are adjacent. This means that while the first file, 100.xml, is being read, read-ahead (a controller or system setting) can also bring in the adjacent blocks that the next files occupy, which improves performance.
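The effect can be quantified from the debugfs extents above. This sketch (hypothetical helpers `parse_extent` and `layout_stats`) takes the files' block ranges in the order cp reads them, counts how many consecutive pairs are physically adjacent, and sums the block gaps between them. It assumes one extent per file, as in the listing, and the sample data is the four-file listing from this answer.

```python
import re

# Matches a debugfs extent such as "(0-9):63038-63047": logical blocks -> physical blocks
EXTENT_RE = re.compile(r'\((\d+)-(\d+)\):(\d+)-(\d+)')


def parse_extent(line):
    """Return (first_physical_block, last_physical_block) from a debugfs BLOCKS line."""
    m = EXTENT_RE.search(line)
    if m is None:
        raise ValueError(f'no extent in {line!r}')
    return int(m.group(3)), int(m.group(4))


def layout_stats(extents):
    """Return (adjacent_pairs, total_gap_blocks) for files listed in read order."""
    adjacent = 0
    gap = 0
    for (_, prev_end), (next_start, _) in zip(extents, extents[1:]):
        distance = next_start - (prev_end + 1)  # 0 means the next file starts right after
        if distance == 0:
            adjacent += 1
        gap += abs(distance)
    return adjacent, gap


ORIGINAL = ['(0-9):63038-63047 100.xml', '(0-9):64091-64100 150.xml',
            '(0-9):57926-57935 58.xml', '(0-9):60959-60968 64.xml']
COPIED = ['(0-9):65791-65800 100.xml', '(0-9):65801-65810 150.xml',
          '(0-9):65811-65820 58.xml', '(0-9):65821-65830 64.xml']
```

For these four files, `layout_stats([parse_extent(l) for l in ORIGINAL])` returns `(0, 10241)`: no adjacent pairs and over ten thousand blocks of jumping around. The copied directory gives `(3, 0)`: every file starts exactly where the previous one ended.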
dd creates the files in the order 1.xml to 1000.xml, but cp copies them in a different order (100.xml, 150.xml, 58.xml, 64.xml). So when you run:
cp -r 1 1copy
to copy this directory somewhere else, the blocks of the files being copied are not adjacent, so reading those files takes more time.
When you copy a directory that was itself produced by cp (so the files were not created by dd), the files' blocks are adjacent in the order cp reads them, so creating
cp -r 2 2copy
a copy of the copy is faster.
Summary: to test Python/Perl performance, you should use the same directory (or two directories that were both produced by cp). You can also open files with O_DIRECT to bypass the kernel's buffers and read data directly from disk.
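For the O_DIRECT suggestion: Python has no dedicated API for it, but `os.open` accepts `os.O_DIRECT` on Linux. The catch is that direct I/O generally requires the buffer, offset and length to be aligned to the device's logical block size. An anonymous `mmap` is page-aligned, so reading into one with `os.readv` is one way to satisfy that. This is a sketch assuming Linux, Python 3 and a 4096-byte alignment (an assumption; check your device); `read_direct` is a hypothetical name. Filesystems without direct I/O support fail the open with EINVAL.

```python
import mmap
import os

ALIGN = 4096  # assumed logical block size / alignment for O_DIRECT


def read_direct(path, max_bytes=64 * 1024):
    """Read up to max_bytes of path with O_DIRECT (Linux), bypassing the page cache."""
    length = -(-max_bytes // ALIGN) * ALIGN   # round up to a multiple of ALIGN
    buf = mmap.mmap(-1, length)               # anonymous mapping: page-aligned buffer
    try:
        fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
        try:
            n = os.readv(fd, [buf])           # reads into buf; returns bytes read
        finally:
            os.close(fd)
        return buf[:n]
    finally:
        buf.close()
```

Because each call bypasses the page cache, timings taken this way do not depend on whether caches were dropped beforehand, although the disk's or RAID controller's own cache may still apply.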
Please remember that results can differ with the kernel, system, disk controller, system settings, filesystem and so on.
Addendum:
[debugfs]
[root@dhcppc3 test]# debugfs /dev/sda1
debugfs 1.39 (29-May-2006)
debugfs: cd test
debugfs: stat test.xml
Inode: 24102 Type: regular Mode: 0644 Flags: 0x0 Generation: 3385884179
User: 0 Group: 0 Size: 4
File ACL: 0 Directory ACL: 0
Links: 1 Blockcount: 2
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x543274bf -- Mon Oct 6 06:53:51 2014
atime: 0x543274be -- Mon Oct 6 06:53:50 2014
mtime: 0x543274bf -- Mon Oct 6 06:53:51 2014
BLOCKS:
(0):29935
TOTAL: 1
debugfs:
Author: 黑洞官方问答小能手
Link: https://www.pythonheidong.com/blog/article/57098/1f59dd9a495ba6ff8217/
Source: python黑洞网