
Making a VOC2007-Format Dataset and Training on It (Implemented)

Published 2019-08-07 12:00


Background: py-faster-rcnn-master has already been compiled.

  1. Understand what the dataset contains.

Dataset information: a detailed guide by a GitHub user: https://github.com/EddyGao/make_VOC2007

Since that site can be slow to load, the relevant content is reproduced here.

The quoted text follows:

------------------------------------------------------------------ start of quoted text --------------------------------------------------------------------

If you are reading this you are probably working on deep learning. This dataset is made for object detection; please forgive any mistakes. My original blog post is at http://blog.csdn.net/gaohuazhao/article/details/60871886

Step 1: understand the layout of the VOC2007 dataset

1) JPEGImages folder

Contains all the training and test images, mixed together.

2) Annotations folder

Contains the XML label files; each XML file corresponds to one image in the JPEGImages folder.

3) ImageSets folder

Action holds human-action data; we do not need it here.

Layout holds human body-part data; we do not need it either.

Main holds the image sets for object detection, covering the 20 VOC classes (for a home-made dataset the classes are whatever you define). Main will contain four files, test.txt, train.txt, val.txt and trainval.txt, which we will generate later.

Segmentation holds the data used for segmentation.

4) The remaining folders are used for segmentation and so on, and are not covered here.

If you downloaded the VOC2007 dataset, unzip it, delete the contents of each folder and keep only the folder names. If you did not download it, simply create empty folders with the same layout (a scripted sketch follows below).
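If you would rather script the empty layout than create it by hand, here is a minimal sketch; the root path (taken from the paths used later in this post) and the exact set of subfolders are assumptions, so adjust them to your setup.

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
# Minimal sketch (an addition, not from the original post): create an empty
# VOC2007-style folder layout. The root path is an assumption; only
# JPEGImages, Annotations and ImageSets/Main are used in this walkthrough,
# so add the other subfolders only if your tooling expects them.
import os

VOC_ROOT = '/home/ubun/py-faster-rcnn-master/VOCdevkit/VOC2007'
SUBDIRS = ['JPEGImages', 'Annotations', 'ImageSets/Main']

for sub in SUBDIRS:
    path = os.path.join(VOC_ROOT, sub)
    if not os.path.isdir(path):
        os.makedirs(path)
        print 'created %s' % path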

Step 2: the JPEGImages folder

1) Put your images into JPEGImages. In VOC2007 the images are all named like 000001.jpg, so rename yours to the same format. If you have too many files to rename by hand, see my other post on batch renaming: http://blog.csdn.NET/gaohuazhao/article/details/60324715

Step 3: the Annotations folder

There are plenty of tutorials online, but I found them all cumbersome until I came across a tool someone else wrote: you draw the boxes by hand and it generates the XML label file for each image automatically.

1) Run labelImg.py from the labelImg-master directory included in this project.

2) Save the XML files into the Annotations folder, not anywhere else.

3) Draw the boxes one image at a time... a few hours later, move on to the next step.

Step 4: the four files in the Main folder of ImageSets

Here is a script for it: make_main_txt.py

OK, that's it. As the names suggest, the four txt files simply decide how many images go to training and how many go to testing.

More details to follow.

------------------------------------------------------------------ end of quoted text --------------------------------------------------------------------

2. Rename your images

Here is a snippet; adjust the path and the image extension to match your own data:

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
import os

class BatchRename():
    '''
    Batch-rename the image files in a folder to VOC-style
    zero-padded names such as 000001.bmp.
    '''
    def __init__(self):
        # Folder that holds the raw images; change this to your own path.
        self.path = '/home/ubun/labelImg-master/data/input'

    def rename(self):
        filelist = os.listdir(self.path)
        total_num = len(filelist)
        i = 1
        for item in filelist:
            # Only rename .bmp files; change the extension to match your images.
            if item.endswith('.bmp'):
                src = os.path.join(os.path.abspath(self.path), item)
                dst = os.path.join(os.path.abspath(self.path),
                                   str(i).zfill(6) + '.bmp')
                try:
                    os.rename(src, dst)
                    print 'converting %s to %s ...' % (src, dst)
                    i = i + 1
                except OSError:
                    continue
        print 'total %d files, renamed %d' % (total_num, i - 1)

if __name__ == '__main__':
    demo = BatchRename()
    demo.rename()

After renaming, put the images into the JPEGImages folder used for training.

3. Use labelImg to mark the object locations

See my other post: installing labelImg on Ubuntu.

Here is one of the resulting XML files, annotated:

<annotation>
	<folder>input</folder>
	<filename>000001.bmp</filename>
	<path>/home/ubun/labelImg-master/data/input/000001.bmp</path>
	<source>
		<database>Unknown</database>
	</source>
	<size>                            <!-- image dimensions -->
		<width>1536</width>
		<height>864</height>
		<depth>1</depth>          <!-- bit depth / number of channels -->
	</size>
	<segmented>0</segmented>
	<object>                          <!-- first object -->
		<name>bigerror</name>
		<pose>Unspecified</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>  <!-- 0 means not difficult -->
		<bndbox>                  <!-- box corners: top-left and bottom-right -->
			<xmin>1179</xmin>
			<ymin>2</ymin>
			<xmax>1292</xmax>
			<ymax>253</ymax>
		</bndbox>
	</object>
</annotation>

After labelling, put all the XML files into the Annotations folder. A quick way to sanity-check them is sketched right below.
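As an extra check (not part of the original write-up), this minimal sketch walks the Annotations folder with xml.etree.ElementTree, prints every label and box, and asserts that each box has positive width and height. The path is the same one used later in this post; adjust it to your setup.

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
# Sanity check sketch: parse every XML file in Annotations and print each
# object's label and box. The path below is an assumption.
import os
import xml.etree.ElementTree as ET

ANN_DIR = '/home/ubun/py-faster-rcnn-master/VOCdevkit/VOC2007/Annotations'

for fname in sorted(os.listdir(ANN_DIR)):
    if not fname.endswith('.xml'):
        continue
    tree = ET.parse(os.path.join(ANN_DIR, fname))
    for obj in tree.findall('object'):
        name = obj.find('name').text.strip()
        box = obj.find('bndbox')
        xmin = int(float(box.find('xmin').text))
        ymin = int(float(box.find('ymin').text))
        xmax = int(float(box.find('xmax').text))
        ymax = int(float(box.find('ymax').text))
        assert xmin < xmax and ymin < ymax, '%s: degenerate box' % fname
        print '%s  %s  (%d, %d, %d, %d)' % (fname, name, xmin, ymin, xmax, ymax)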

4. Generate the four txt files in the Main subfolder of ImageSets: test.txt, train.txt, val.txt and trainval.txt. Each line of these files is an image id without its extension (for example 000001).

Adjust the paths and file names to your own setup:

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
"""
Created on Sat Jun 15 11:05:26 2019

@author: ubun

Generate trainval.txt / train.txt / val.txt / test.txt in ImageSets/Main
from the XML files in Annotations.
"""

import os
import random

trainval_percent = 0.66   # fraction of images used for trainval (the rest is test)
train_percent = 0.5       # fraction of trainval used for train (the rest is val)
xmlfilepath = '/home/ubun/py-faster-rcnn-master/VOCdevkit/VOC2007/Annotations'
txtsavepath = '/home/ubun/py-faster-rcnn-master/VOCdevkit/VOC2007/ImageSets/Main'
total_xml = os.listdir(xmlfilepath)

num = len(total_xml)
indices = range(num)
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
trainval = random.sample(indices, tv)
train = random.sample(trainval, tr)

ftrainval = open(os.path.join(txtsavepath, 'trainval.txt'), 'w')
ftest = open(os.path.join(txtsavepath, 'test.txt'), 'w')
ftrain = open(os.path.join(txtsavepath, 'train.txt'), 'w')
fval = open(os.path.join(txtsavepath, 'val.txt'), 'w')

for i in indices:
    name = total_xml[i][:-4] + '\n'   # strip the .xml extension
    if i in trainval:
        ftrainval.write(name)
        if i in train:
            ftrain.write(name)
        else:
            fval.write(name)
    else:
        ftest.write(name)

ftrainval.close()
ftrain.close()
fval.close()
ftest.close()
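To confirm how the images were split, a short check of the generated files (same Main path assumption as the script above):

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
# Quick check (an addition, not from the original post): count the lines in
# each of the four generated split files.
import os

main_dir = '/home/ubun/py-faster-rcnn-master/VOCdevkit/VOC2007/ImageSets/Main'
for txt in ('trainval.txt', 'train.txt', 'val.txt', 'test.txt'):
    with open(os.path.join(main_dir, txt)) as f:
        print '%-13s %d images' % (txt, len(f.readlines()))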

5. Edit the related configuration files (there are quite a few steps; most of them just change the class counts):

I referred to parts of this post: training Faster R-CNN on your own dataset (VOC2007 format, Python version).

I am writing it down anyway; a written note beats a good memory.

(1) Download the pre-trained models and data

cd $FRCN_ROOT
./data/scripts/fetch_imagenet_models.sh
./data/scripts/fetch_selective_search_data.sh

(2) Edit the configuration files

# Edit both py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt/stage1_fast_rcnn_train.pt and stage2_fast_rcnn_train.pt

name: "ZF"
layer {
  name: 'data'
  type: 'Python'
 top: 'data'
top: 'rois'
top: 'labels'
 top: 'bbox_targets'
 top: 'bbox_inside_weights'
top: 'bbox_outside_weights'
python_param {
  module: 'roi_data_layer.layer'
  layer: 'RoIDataLayer'
  param_str: "'num_classes': 2" #按训练集类别改,该值为类别数+1
}
}
 
layer {
 name: "cls_score"
 type: "InnerProduct"
 bottom: "fc7"
 top: "cls_score"
 param { lr_mult: 1.0 }
 param { lr_mult: 2.0 }
inner_product_param {
    num_output: 2 #按训练集类别改,该值为类别数+1
 weight_filler {
   type: "gaussian"
   std: 0.01
 }
 bias_filler {
    type: "constant"
   value: 0
  }
 }
}
 
layer {
  name: "bbox_pred"
 type: "InnerProduct"
 bottom: "fc7"
top: "bbox_pred"
 param { lr_mult: 1.0 }
 param { lr_mult: 2.0 }
 inner_product_param {
   num_output: 8 #按训练集类别改,该值为(类别数+1)*4
   weight_filler {
     type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
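A tiny worked example of those two numbers (added for clarity, not from the original post): the snippet above is written for a single foreground class, hence num_classes: 2 and bbox_pred num_output: 8. In general, for N foreground classes plus the background, the RoI data layer and cls_score use N + 1 and bbox_pred uses 4 * (N + 1).

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
# Worked example: compute the prototxt values from a list of foreground labels.
# The label names are placeholders, not classes taken from the original post.
labels = ['your_label_1', 'your_label_2', 'your_label_3', 'your_label_4']

num_classes = len(labels) + 1            # foreground classes + background = 5
cls_score_num_output = num_classes       # value for cls_score num_output
bbox_pred_num_output = 4 * num_classes   # value for bbox_pred num_output = 20

print 'num_classes / cls_score num_output:', cls_score_num_output
print 'bbox_pred num_output:', bbox_pred_num_output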

# Edit both py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt/stage1_rpn_train.pt and stage2_rpn_train.pt

layer {
  name: 'input-data'
  type: 'Python'
  top: 'data'
  top: 'im_info'
  top: 'gt_boxes'
  python_param {
    module: 'roi_data_layer.layer'
    layer: 'RoIDataLayer'
    param_str: "'num_classes': 2" # change to match your dataset: number of classes + 1
  }
}

# Edit py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt/faster_rcnn_test.pt

layer {
  name: "cls_score"
  type: "InnerProduct"
  bottom: "fc7"
  top: "cls_score"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  inner_product_param {
    num_output: 2 # change to match your dataset: number of classes + 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
 
layer {
  name: "bbox_pred"
  type: "InnerProduct"
  bottom: "fc7"
  top: "bbox_pred"
  param { lr_mult: 1.0 }
  param { lr_mult: 2.0 }
  inner_product_param {
    num_output: 8 # change to match your dataset: (number of classes + 1) * 4
    weight_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}

# Edit py-faster-rcnn/lib/datasets/pascal_voc.py

self._classes = ('__background__', # always index 0
                 'your_label_1', 'your_label_2', 'your_label_3', 'your_label_4')

Note: if you only need to detect a single class out of the original 20 VOC classes ('aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor'), you can adapt the code below (which keeps only 'car'):
def _load_image_set_index(self):
        """
        Load the indexes listed in this dataset's image set file.
        """
        # Example path to image set file:
        # self._devkit_path + /VOCdevkit2007/VOC2007/ImageSets/Main/val.txt
        image_set_file = os.path.join(self._data_path, 'ImageSets', 'Main',
                                      self._image_set + '.txt')
        assert os.path.exists(image_set_file), \
                'Path does not exist: {}'.format(image_set_file)
        with open(image_set_file) as f:
            image_index = [x.strip() for x in f.readlines()]
        # NOTE: only modify this part if you want to detect just cars among the original 20 classes.
        # only load index with cars obj
        new_image_index = []
        for index in image_index:
            filename = os.path.join(self._data_path, 'Annotations', index + '.xml')
            tree = ET.parse(filename)
            objs = tree.findall('object')
            num_objs = 0
            for ix, obj in enumerate(objs):
                curr_name = obj.find('name').text.lower().strip()
                if curr_name == 'car':
                    num_objs += 1
                    break
            if num_objs > 0:
                new_image_index.append(index)
        return new_image_index
def _load_pascal_annotation(self, index):
        """
        Load image and bounding boxes info from XML file in the PASCAL VOC
        format.
        """
        filename = os.path.join(self._data_path, 'Annotations', index + '.xml')
        tree = ET.parse(filename)
        objs = tree.findall('object')
        if not self.config['use_diff']:
            # Exclude the samples labeled as difficult
            non_diff_objs = [
                obj for obj in objs if int(obj.find('difficult').text) == 0]
            # if len(non_diff_objs) != len(objs):
            #     print 'Removed {} difficult objects'.format(
            #         len(objs) - len(non_diff_objs))
            objs = non_diff_objs
        # NOTE: only modify this part if you want to detect just cars among the original 20 classes.
        # change num objs , only read car
        # num_objs = len(objs)
        num_objs = 0
        for ix, obj in enumerate(objs):
            curr_name = obj.find('name').text.lower().strip()
            if curr_name == 'car':
                num_objs += 1
        boxes = np.zeros((num_objs, 4), dtype=np.uint16)
        gt_classes = np.zeros((num_objs), dtype=np.int32)
        overlaps = np.zeros((num_objs, self.num_classes), dtype=np.float32)
        # "Seg" area for pascal is just the box area
        seg_areas = np.zeros((num_objs), dtype=np.float32)
        # NOTE: only modify this part if you want to detect just cars among the original 20 classes.
        # Load object bounding boxes into a data frame.
        tmp_ix = 0
        for ix, obj in enumerate(objs):
            bbox = obj.find('bndbox')
            # Make pixel indexes 0-based
            x1 = float(bbox.find('xmin').text) - 1
            y1 = float(bbox.find('ymin').text) - 1
            x2 = float(bbox.find('xmax').text) - 1
            y2 = float(bbox.find('ymax').text) - 1
            curr_name = obj.find('name').text.lower().strip()
            if curr_name != 'car':
                continue
            cls = self._class_to_ind[curr_name]
            boxes[tmp_ix, :] = [x1, y1, x2, y2]
            gt_classes[tmp_ix] = cls
            overlaps[tmp_ix, cls] = 1.0
            seg_areas[tmp_ix] = (x2 - x1 + 1) * (y2 - y1 + 1)
            tmp_ix += 1
        overlaps = scipy.sparse.csr_matrix(overlaps)
        return {'boxes' : boxes,
                'gt_classes': gt_classes,
                'gt_overlaps' : overlaps,
                'flipped' : False,
                'seg_areas' : seg_areas}

# Edit py-faster-rcnn/lib/datasets/imdb.py (the extra loop below clamps flipped boxes whose x-coordinates would otherwise come out inverted, which is what usually trips the assertion when an annotation has xmin = 0)

def append_flipped_images(self):
        num_images = self.num_images
        widths = [PIL.Image.open(self.image_path_at(i)).size[0]
                  for i in xrange(num_images)]
        for i in xrange(num_images):
            boxes = self.roidb[i]['boxes'].copy()
            oldx1 = boxes[:, 0].copy()
            oldx2 = boxes[:, 2].copy()
            boxes[:, 0] = widths[i] - oldx2 - 1
            boxes[:, 2] = widths[i] - oldx1 - 1
 
            for b in range(len(boxes)):
                if boxes[b][2] < boxes[b][0]:
                   boxes[b][0] = 0
 
            assert (boxes[:, 2] >= boxes[:, 0]).all()

# Change the image extension

The VOC2007 configuration assumes .jpg images by default; if your training images use a different format, edit ~/lib/datasets/pascal_voc.py:

self._image_ext = '.jpg'  

 

# Edit py-faster-rcnn/tools/train_faster_rcnn_alt_opt.py to change the iteration counts (recommended)

max_iters = [8000, 4000, 8000, 4000]

Tip: for the first run, use very small iteration counts such as max_iters = [8, 4, 8, 4], just to confirm that training works end to end.

Training runs in four stages (RPN stage 1, Fast R-CNN stage 1, RPN stage 2, Fast R-CNN stage 2); the list gives the iteration count for each stage, and you can change these to whatever you want.
If you change them, it is best to also edit the four corresponding solver files in py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt so that stepsize is smaller than the matching iteration count; stepsize is the number of iterations after which the learning rate is reduced (not strictly required).

# Delete the cached files (do this before every training run after changing the configuration)

Delete all the .pyc files under the py-faster-rcnn folder, the cache folder under data, and the annotations_cache folder under data/VOCdevkit2007. If the annotations are identical to those of the last successful training run you can skip deleting annotations_cache; otherwise training will still run, but the final model evaluation will fail. A cleanup sketch follows below.
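A small cleanup sketch (added here, not from the original post), assuming it is run from the py-faster-rcnn root; double-check the paths before deleting anything.

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
# Cleanup sketch: remove .pyc files and the cache folders mentioned above.
import os
import shutil

ROOT = '.'  # assumed to be the py-faster-rcnn root

# 1) remove all .pyc files
for dirpath, dirnames, filenames in os.walk(ROOT):
    for fname in filenames:
        if fname.endswith('.pyc'):
            os.remove(os.path.join(dirpath, fname))

# 2) remove the cached roidb and annotation caches
for cache_dir in ('data/cache', 'data/VOCdevkit2007/annotations_cache'):
    path = os.path.join(ROOT, cache_dir)
    if os.path.isdir(path):
        shutil.rmtree(path)
        print 'removed %s' % path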

 


6. Training command:

cd $FRCN_ROOT
./experiments/scripts/faster_rcnn_alt_opt.sh 0 ZF pascal_voc

7. Test command

For testing, manually copy the trained model into the ~/data/faster_rcnn_models folder downloaded earlier, edit the class names in demo.py as well as the names of the images to detect, and put those images into ~/data/demo/. Then run

./demo.py

to run the test. A sketch of the class-name edit follows below.
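For reference, a hypothetical sketch of what the demo.py edits might look like when the only label is the 'bigerror' class from the XML example above; this is an illustration, not the exact contents of demo.py.

# Hypothetical illustration only: in tools/demo.py, replace the 20-class tuple
# with your own labels, keeping '__background__' first. With the single
# 'bigerror' label from the XML example above:
CLASSES = ('__background__',
           'bigerror')

# The images to test live in data/demo/; their names are listed in im_names
# near the bottom of demo.py, for example:
im_names = ['000001.bmp', '000002.bmp']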

 

Finally: I got this working for the first time and wrote it down the same day, so nothing should be missing; the pitfalls along the way were filled in one by one through these steps, and I will add more next time.

Also attached: the code for dumping the intermediate feature maps:

#!/usr/bin/env python
 
# --------------------------------------------------------
# Faster R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
 
"""
Demo script showing detections in sample images.
See README.md for installation instructions before running.
"""
 
import _init_paths
from fast_rcnn.config import cfg
from fast_rcnn.test import im_detect
from fast_rcnn.nms_wrapper import nms
from utils.timer import Timer
import matplotlib.pyplot as plt
import numpy as np
import scipy.io as sio
import caffe, os, sys, cv2
import argparse
import math
 
CLASSES = ('__background__',
           'aeroplane', 'bicycle', 'bird', 'boat',
           'bottle', 'bus', 'car', 'cat', 'chair',
           'cow', 'diningtable', 'dog', 'horse',
           'motorbike', 'person', 'pottedplant',
           'sheep', 'sofa', 'train', 'tvmonitor')
 
NETS = {'vgg16': ('VGG16',
                  'VGG16_faster_rcnn_final.caffemodel'),
        'zf': ('ZF',
                  'ZF_faster_rcnn_final.caffemodel')}
 
 
def vis_detections(im, class_name, dets, thresh=0.5):
    """Draw detected bounding boxes."""
    inds = np.where(dets[:, -1] >= thresh)[0]
    if len(inds) == 0:
        return
 
    im = im[:, :, (2, 1, 0)]
    fig, ax = plt.subplots(figsize=(12, 12))
    ax.imshow(im, aspect='equal')
    for i in inds:
        bbox = dets[i, :4]
        score = dets[i, -1]
 
        ax.add_patch(
            plt.Rectangle((bbox[0], bbox[1]),
                          bbox[2] - bbox[0],
                          bbox[3] - bbox[1], fill=False,
                          edgecolor='red', linewidth=3.5)
            )
        ax.text(bbox[0], bbox[1] - 2,
                '{:s} {:.3f}'.format(class_name, score),
                bbox=dict(facecolor='blue', alpha=0.5),
                fontsize=14, color='white')
 
    ax.set_title(('{} detections with '
                  'p({} | box) >= {:.1f}').format(class_name, class_name,
                                                  thresh),
                  fontsize=14)
    plt.axis('off')
    plt.tight_layout()
    #plt.draw()
def save_feature_picture(data, name, image_name=None, padsize = 1, padval = 1):
    data = data[0]
    #print "data.shape1: ", data.shape
    n = int(np.ceil(np.sqrt(data.shape[0])))
    padding = ((0, n ** 2 - data.shape[0]), (0, 0), (0, padsize)) + ((0, 0),) * (data.ndim - 3)
    #print "padding: ", padding
    data = np.pad(data, padding, mode='constant', constant_values=(padval, padval))
    #print "data.shape2: ", data.shape
    
    data = data.reshape((n, n) + data.shape[1:]).transpose((0, 2, 1, 3) + tuple(range(4, data.ndim + 1)))
    #print "data.shape3: ", data.shape, n
    data = data.reshape((n * data.shape[1], n * data.shape[3]) + data.shape[4:])
    #print "data.shape4: ", data.shape
    plt.figure()
    plt.imshow(data,cmap='gray')
    plt.axis('off')
    #plt.show()
    if image_name is None:
        img_path = './data/feature_picture/' 
    else:
        img_path = './data/feature_picture/' + image_name + "/"
        check_file(img_path)
    plt.savefig(img_path + name + ".jpg", dpi = 400, bbox_inches = "tight")
def check_file(path):
    if not os.path.exists(path):
        os.mkdir(path)
def demo(net, image_name):
    """Detect object classes in an image using pre-computed object proposals."""
 
    # Load the demo image
    im_file = os.path.join(cfg.DATA_DIR, 'demo', image_name)
    im = cv2.imread(im_file)
 
    # Detect all object classes and regress object bounds
    timer = Timer()
    timer.tic()
    scores, boxes = im_detect(net, im)
    for k, v in net.blobs.items():
        if k.find("conv")>-1 or k.find("pool")>-1 or k.find("rpn")>-1:
            save_feature_picture(v.data, k.replace("/", ""), image_name)#net.blobs["conv1_1"].data, "conv1_1") 
    timer.toc()
    print ('Detection took {:.3f}s for '
           '{:d} object proposals').format(timer.total_time, boxes.shape[0])
 
    # Visualize detections for each class
    CONF_THRESH = 0.8
    NMS_THRESH = 0.3
    for cls_ind, cls in enumerate(CLASSES[1:]):
        cls_ind += 1 # because we skipped background
        cls_boxes = boxes[:, 4*cls_ind:4*(cls_ind + 1)]
        cls_scores = scores[:, cls_ind]
        dets = np.hstack((cls_boxes,
                          cls_scores[:, np.newaxis])).astype(np.float32)
        keep = nms(dets, NMS_THRESH)
        dets = dets[keep, :]
        vis_detections(im, cls, dets, thresh=CONF_THRESH)
 
def parse_args():
    """Parse input arguments."""
    parser = argparse.ArgumentParser(description='Faster R-CNN demo')
    parser.add_argument('--gpu', dest='gpu_id', help='GPU device id to use [0]',
                        default=0, type=int)
    parser.add_argument('--cpu', dest='cpu_mode',
                        help='Use CPU mode (overrides --gpu)',
                        action='store_true')
    parser.add_argument('--net', dest='demo_net', help='Network to use [vgg16]',
                        choices=NETS.keys(), default='vgg16')
 
    args = parser.parse_args()
 
    return args
 
def print_param(net):
    for k, v in net.blobs.items():
        print (k, v.data.shape)
    print ""
    for k, v in net.params.items():
        print (k, v[0].data.shape)
 
if __name__ == '__main__':
    cfg.TEST.HAS_RPN = True  # Use RPN for proposals
 
    args = parse_args()
 
    prototxt = os.path.join(cfg.MODELS_DIR, NETS[args.demo_net][0],
                            'faster_rcnn_alt_opt', 'faster_rcnn_test.pt')
    #print "prototxt: ", prototxt
    caffemodel = os.path.join(cfg.DATA_DIR, 'faster_rcnn_models',
                              NETS[args.demo_net][1])
 
    if not os.path.isfile(caffemodel):
        raise IOError(('{:s} not found.\nDid you run ./data/script/'
                       'fetch_faster_rcnn_models.sh?').format(caffemodel))
 
    if args.cpu_mode:
        caffe.set_mode_cpu()
    else:
        caffe.set_mode_gpu()
        caffe.set_device(args.gpu_id)
        cfg.GPU_ID = args.gpu_id
    net = caffe.Net(prototxt, caffemodel, caffe.TEST)
    
    #print_param(net)
 
    print '\n\nLoaded network {:s}'.format(caffemodel)
 
    # Warmup on a dummy image
    im = 128 * np.ones((300, 500, 3), dtype=np.uint8)
    for i in xrange(2):
        _, _= im_detect(net, im)
 
    im_names = ['000456.jpg', '000542.jpg', '001150.jpg',
                '001763.jpg', '004545.jpg', '000563.jpg']
    for im_name in im_names:
        print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'
        print 'Demo for data/demo/{}'.format(im_name)
        demo(net, im_name)
 
    #plt.show()

The above covers the ZF network; I will document VGG later.


