Find the differences between lists and append them to each list, but for 40 different lists (Python)

Posted on 2020-05-29 22:08


Hi, it's hard to explain this properly in the title, so let me start by describing my data. I have 40 lists stored inside a list, in this form:

data[0] = [[value1 value2 value3,80],[value1,90],[value1 value3,60],[value2 value3,70]]
data[1] = [[value2,40],[value1 value2 value3,90]]
data[2] = [[value1 value2,80],[value1,50],[value1 value3,20]]
   .
   .
   .

Now, I expect output like this:

data[0] = [[value1 value2 value3,80],[value1,90],[value1 value3,60],[value2 value3,70],[value2,0],[value1 value2,0]]
data[1] = [[value2,40],[value1 value2 value3,90],[value1,0],[value1 value3,0],[value2 value3,0],[value1 value2,0]]
data[2] = [[value1 value2,80],[value1,50],[value1 value3,20],[value1 value2 value3,0],[value2 value3,0],[value2,0]]    

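I know this is a bit hard to read, but I wanted to give a clear illustration of the data. Basically, every list needs to contain every combination of values that appears in any of the lists. If a combination is not already present in a given list, it should be added with a frequency (the second field) of 0.

Thanks for your help. Keep in mind that this is a comparison across 40 different lists, so it needs to be fast and efficient. I'm not sure how best to do it...

Edit: I also don't know all the "values" in advance. For simplicity I only wrote 3 different values here (value1, value2, value3). In my project I don't know what the values are or how many there are (I know there are at least a few thousand).

Edit 2: Here is some real input data. I don't have the real output data, but I'll try to work it out: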

data[0] = [['destination_ip:10.32.0.100 destination_service:http destination_port:80 protocol:TCP syslog_priority:Info', '39.7769'], ['destination_ip:10.32.0.100 destination_service:http destination_port:80 protocol:TCP', '39.7769'], ['destination_ip:10.32.0.100 destination_service:http destination_port:80 syslog_priority:Info', '39.7769'], ['destination_ip:10.32.0.100 destination_service:http destination_port:80', '39.7769'], ['destination_ip:10.32.0.100 destination_service:http protocol:TCP syslog_priority:Info', '39.7769']]


data[1] = [['syslog_priority:Info', '100'], ['destination_ip:10.32.0.100 syslog_priority:Info destination_service:http destination_port:80 protocol:TCP', '43.8362'], ['destination_ip:10.32.0.100 syslog_priority:Info destination_service:http destination_port:80', '43.8362'], ['destination_ip:10.32.0.100 syslog_priority:Info destination_service:http protocol:TCP', '43.8362'], ['destination_ip:10.32.0.100 syslog_priority:Info destination_service:http', '43.8362']]


data[2] = [['destination_ip:10.32.0.100 destination_port:80 destination_service:http syslog_priority:Info protocol:TCP', '43.9506'], ['destination_ip:10.32.0.100 destination_port:80 destination_service:http syslog_priority:Info', '43.9506'], ['destination_ip:10.32.0.100 destination_port:80 destination_service:http protocol:TCP', '43.9506'], ['destination_ip:10.32.0.100 destination_port:80 destination_service:http', '43.9506'], ['destination_ip:10.32.0.100 destination_port:80 syslog_priority:Info protocol:TCP', '43.9506']]

Solution


Well, given your comments, I would use sets, as already suggested.

First, iterate over your lists to build a set of every possible string:

possible_strings = set()
for row in mydata:
    for item in row:
        possible_strings.add(item[0])  # item is [string, frequency]

So possible_strings now holds every string that occurs anywhere in your data.
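The same set can also be built in a single expression. This is only a minimal sketch: the toy `mydata` below is made up for illustration and simply mirrors the shape from the question (rows of `[string, frequency]` pairs).

```python
# Toy data (hypothetical): each row is a list of [string, frequency] pairs.
mydata = [
    [["a b c", 80], ["a", 90]],
    [["b", 40], ["a b c", 90]],
]

# Set comprehension: collect the first field of every item in every row.
possible_strings = {item[0] for row in mydata for item in row}
print(sorted(possible_strings))
```

Duplicates across rows collapse automatically, since a set stores each string only once.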

Next, check each row against that set, and append any string the row is missing with a frequency of 0:

my_new_data = []
for row in mydata:
    # Strings already present in this row
    row_strings = set(item[0] for item in row)
    # Strings that occur in other rows but not in this one
    missing_strings = possible_strings - row_strings
    for item in missing_strings:
        row.append([item, 0])  # add the missing string with frequency 0
    row.sort()  # optional; note that row is modified in place
    my_new_data.append(row)

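I use sets because you don't have to do any searching yourself, and because the items are strings they can be members of a set. There are ways to speed this up (or condense the code), but I like to lay things out so that it is easy to see what each step does. Unless I have made a typo (and I have already corrected 3 mistakes), this code works on my machine.

Here is the unsorted result: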
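As one possible condensed form, the two steps could be wrapped in a single function. This is a sketch rather than the answer's own code: the name `fill_missing` and its `default` parameter are hypothetical, and unlike the loop above it returns new rows instead of modifying the input in place.

```python
def fill_missing(rows, default=0):
    """Return new rows in which every string seen in any row appears in each row."""
    # Every string that occurs anywhere in the data.
    all_strings = {item[0] for row in rows for item in row}
    filled = []
    for row in rows:
        present = {item[0] for item in row}
        # sorted() only makes the order of the appended items deterministic.
        extra = [[s, default] for s in sorted(all_strings - present)]
        filled.append(list(row) + extra)
    return filled


# Toy data standing in for value1/value2/value3 from the question.
rows = [
    [["a b c", 80], ["a", 90], ["a c", 60], ["b c", 70]],
    [["b", 40], ["a b c", 90]],
    [["a b", 80], ["a", 50], ["a c", 20]],
]
result = fill_missing(rows)
for row in result:
    print(row)
```

Each row is scanned once and the set difference is proportional to the number of distinct strings, so the total work grows roughly with (number of rows) × (number of distinct strings), which should be manageable for 40 rows and a few thousand strings.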

newrow*************
['destination_ip:10.32.0.100 destination_service:http destination_port:80 protocol:TCP syslog_priority:Info', '39.7769']
['destination_ip:10.32.0.100 destination_service:http destination_port:80 protocol:TCP', '39.7769']
['destination_ip:10.32.0.100 destination_service:http destination_port:80 syslog_priority:Info', '39.7769']
['destination_ip:10.32.0.100 destination_service:http destination_port:80', '39.7769']
['destination_ip:10.32.0.100 destination_service:http protocol:TCP syslog_priority:Info', '39.7769']
['destination_ip:10.32.0.100 syslog_priority:Info destination_service:http destination_port:80', 0]
['destination_ip:10.32.0.100 syslog_priority:Info destination_service:http', 0]
['destination_ip:10.32.0.100 destination_port:80 syslog_priority:Info protocol:TCP', 0]
['destination_ip:10.32.0.100 destination_port:80 destination_service:http syslog_priority:Info protocol:TCP', 0]
['destination_ip:10.32.0.100 destination_port:80 destination_service:http syslog_priority:Info', 0]
['destination_ip:10.32.0.100 syslog_priority:Info destination_service:http destination_port:80 protocol:TCP', 0]
['syslog_priority:Info', 0]
['destination_ip:10.32.0.100 syslog_priority:Info destination_service:http protocol:TCP', 0]
['destination_ip:10.32.0.100 destination_port:80 destination_service:http protocol:TCP', 0]
['destination_ip:10.32.0.100 destination_port:80 destination_service:http', 0]
newrow*************
['syslog_priority:Info', '100']
['destination_ip:10.32.0.100 syslog_priority:Info destination_service:http destination_port:80 protocol:TCP', '43.8362']
['destination_ip:10.32.0.100 syslog_priority:Info destination_service:http destination_port:80', '43.8362']
['destination_ip:10.32.0.100 syslog_priority:Info destination_service:http protocol:TCP', '43.8362']
['destination_ip:10.32.0.100 syslog_priority:Info destination_service:http', '43.8362']
['destination_ip:10.32.0.100 destination_port:80 syslog_priority:Info protocol:TCP', 0]
['destination_ip:10.32.0.100 destination_service:http destination_port:80 protocol:TCP', 0]
['destination_ip:10.32.0.100 destination_service:http destination_port:80', 0]
['destination_ip:10.32.0.100 destination_port:80 destination_service:http syslog_priority:Info', 0]
['destination_ip:10.32.0.100 destination_service:http destination_port:80 protocol:TCP syslog_priority:Info', 0]
['destination_ip:10.32.0.100 destination_service:http protocol:TCP syslog_priority:Info', 0]
['destination_ip:10.32.0.100 destination_port:80 destination_service:http syslog_priority:Info protocol:TCP', 0]
['destination_ip:10.32.0.100 destination_port:80 destination_service:http protocol:TCP', 0]
['destination_ip:10.32.0.100 destination_port:80 destination_service:http', 0]
['destination_ip:10.32.0.100 destination_service:http destination_port:80 syslog_priority:Info', 0]
newrow*************
['destination_ip:10.32.0.100 destination_port:80 destination_service:http syslog_priority:Info protocol:TCP', '43.9506']
['destination_ip:10.32.0.100 destination_port:80 destination_service:http syslog_priority:Info', '43.9506']
['destination_ip:10.32.0.100 destination_port:80 destination_service:http protocol:TCP', '43.9506']
['destination_ip:10.32.0.100 destination_port:80 destination_service:http', '43.9506']
['destination_ip:10.32.0.100 destination_port:80 syslog_priority:Info protocol:TCP', '43.9506']
['destination_ip:10.32.0.100 syslog_priority:Info destination_service:http destination_port:80', 0]
['destination_ip:10.32.0.100 syslog_priority:Info destination_service:http', 0]
['destination_ip:10.32.0.100 destination_service:http destination_port:80 protocol:TCP', 0]
['destination_ip:10.32.0.100 destination_service:http destination_port:80', 0]
['destination_ip:10.32.0.100 destination_service:http destination_port:80 protocol:TCP syslog_priority:Info', 0]
['destination_ip:10.32.0.100 syslog_priority:Info destination_service:http destination_port:80 protocol:TCP', 0]
['destination_ip:10.32.0.100 destination_service:http protocol:TCP syslog_priority:Info', 0]
['syslog_priority:Info', 0]
['destination_ip:10.32.0.100 syslog_priority:Info destination_service:http protocol:TCP', 0]
['destination_ip:10.32.0.100 destination_service:http destination_port:80 syslog_priority:Info', 0]
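One thing visible in this output: strings that contain the same fields in a different order (for example `... destination_service:http destination_port:80` and `... destination_port:80 destination_service:http`) are treated as different entries, because the comparison is on the raw string. If field order is not meant to matter (an assumption; the question does not say), the strings could be normalized before building the set. The helper name `canonical` below is hypothetical.

```python
def canonical(s):
    """Hypothetical helper: sort the space-separated fields so their order no longer matters."""
    return " ".join(sorted(s.split()))


a = "destination_ip:10.32.0.100 destination_service:http destination_port:80"
b = "destination_ip:10.32.0.100 destination_port:80 destination_service:http"
print(canonical(a) == canonical(b))

# Normalize every row first, then apply the set-based approach to the result.
rows = [[[a, "39.7769"]], [[b, "43.9506"]]]
normalized = [[[canonical(s), freq] for s, freq in row] for row in rows]
keys = {item[0] for row in normalized for item in row}
print(len(keys))
```

After normalization a single row may end up with two entries that share the same key, so some rule for merging their frequencies would still need to be chosen.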

Author: 黑洞官方问答小能手

Link: https://www.pythonheidong.com/blog/article/397411/

Source: python黑洞网
