Why does repeated memory allocation slow down after using JayDeBeApi on Windows?

Posted 2025-01-02 16:57


My Python script reads large CSV files (~2GB) into memory and then sorts and merges the data. The same code also connects to a database through a JDBC driver, using the JayDeBeApi package, to run small queries. I noticed that after making a connection to the database (without even running a query), the reading/sorting/merging part of the program takes 10x as long to complete.

I convinced myself that it has something to do with memory allocation after I boiled my code down to the following two snippets:

First without the slowdown:

# Allocate a large list of lists, twice.
# They take the same amount of time.
mydata1 = [[None, None, None, None] for _ in range(10_000_000)] # takes 5s
mydata2 = [[None, None, None, None] for _ in range(10_000_000)] # takes 5s

Then add the db connection and observe the slowdown:

# Allocate a large list of lists, twice, but connect to a database after the 1st list.
# The 2nd one takes 10x as long.
import jaydebeapi
# java_classname, db_url, driver_args and jdbc_jar_files are placeholders for the real connection settings.
db_info = (java_classname, db_url, driver_args, jdbc_jar_files)
mydata1 = [[None, None, None, None] for _ in range(10_000_000)] # takes 5s
conn = jaydebeapi.connect(*db_info) # unpack the tuple: connect() takes these as separate arguments
mydata2 = [[None, None, None, None] for _ in range(10_000_000)] # takes 50s
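
One way to reproduce the timings quoted in the comments above, as a minimal sketch. The helper names `build_rows` and `time_two_allocations` are hypothetical, and the `between` callback only stands in for the `jaydebeapi.connect(...)` call, which is not made here. The row count is reduced so the sketch finishes quickly.

```python
import time

def build_rows(n):
    """Allocate a list of n small four-element lists, as in the snippets above."""
    return [[None, None, None, None] for _ in range(n)]

def time_two_allocations(n, between=None):
    """Time two identical allocations, optionally calling `between` in the middle.

    `between` is a hypothetical hook: pass a function that opens the
    database connection to compare against the no-connection baseline.
    """
    start = time.perf_counter()
    first = build_rows(n)
    t_first = time.perf_counter() - start

    if between is not None:
        between()

    # `first` is still referenced here, so both lists are alive at once,
    # matching the original snippets where mydata1 is kept.
    start = time.perf_counter()
    second = build_rows(n)
    t_second = time.perf_counter() - start
    return t_first, t_second

if __name__ == "__main__":
    t1, t2 = time_two_allocations(1_000_000)
    print(f"first: {t1:.2f}s, second: {t2:.2f}s")
```

Passing something like `between=lambda: jaydebeapi.connect(*db_info)` would give the second scenario. Measuring both scenarios in fresh interpreter processes keeps a JVM started in one run from affecting the other.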

My Actual Question: What is it about the JayDeBeApi package that causes this drastic difference when running on Windows? Does sharing memory between a JVM and a Python interpreter cause extra copies or operations on every allocation?

I read that JayDeBeApi starts up a JVM in the same process as the Python interpreter and that it shares memory between the two. Does shared memory always make memory allocation less efficient? If so, why?

Some Additional Context:

  • I tested on two different PCs running Windows 10, same outcome for both
  • I tested on Linux and there was still a slight slowdown, but quite modest in comparison
  • The slowness is only pronounced with repeated memory allocations, as when creating a long list of small lists. If I use [None for _ in range(40_000_000)] it's not a problem, compared with [[None, None, None, None] for _ in range(10_000_000)].
  • I found a wealth of low-level Windows information in this answer to a similar question, but I'm not sure how to connect that to what JayDeBeApi is doing and why it affects python code that doesn't use the JVM (even though the JVM is running alongside).
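
The flat-versus-nested comparison from the list above can be sketched as follows. The function names `time_flat` and `time_nested` are hypothetical, and the sizes are scaled down from the 40,000,000 and 10,000,000 used in the question; both cases hold the same number of `None` slots, but only the nested one creates a new object per row.

```python
import time

def time_flat(n_items):
    """Build one flat list of None references; no new object is created per item."""
    start = time.perf_counter()
    data = [None for _ in range(n_items)]
    elapsed = time.perf_counter() - start
    return elapsed, len(data)

def time_nested(n_rows, width=4):
    """Build n_rows separate small lists; each row is a newly allocated list object."""
    start = time.perf_counter()
    data = [[None] * width for _ in range(n_rows)]
    elapsed = time.perf_counter() - start
    return elapsed, len(data) * width

if __name__ == "__main__":
    flat_s, flat_slots = time_flat(4_000_000)
    nested_s, nested_slots = time_nested(1_000_000)
    print(f"flat:   {flat_slots} slots in {flat_s:.2f}s")
    print(f"nested: {nested_slots} slots in {nested_s:.2f}s")
```

If only the nested case slows down after the connection is made, that would be consistent with the cost sitting in the many small allocations rather than in the total amount of memory, though this sketch alone cannot show why.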

Solution


No answers yet



Author: 黑洞官方问答小能手

Link: https://www.pythonheidong.com/blog/article/2046699/b7c7bbb0a12e9ab84118/

Source: python黑洞网

Any reproduction, in any form, must credit the source. Infringement will be pursued under the law once discovered.
