Batch-Modifying labelImg-Generated XML Files with Python
Overview
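After labeling my images with labelImg, I wanted to train on only a few of the classes rather than all of them, without relabeling and regenerating the .xml files. The approach I settled on is to delete the unwanted label classes, together with their attributes, directly from the .xml files.
Label names also ended up in mixed upper and lower case during labeling (an easy slip on a large job), so the program converts the label values to lower case as well, because my py-faster-rcnn training required every label to be lower case.
Take the following .xml file as an example; I deliberately added upper-case letters to the labels: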
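<annotation>
    <filename>test.jpg</filename>
    <path>C:\Users\yasin\Desktop\test</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>400</width>
        <height>300</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>People</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>80</xmin>
            <ymin>69</ymin>
            <xmax>144</xmax>
            <ymax>...</ymax>
        </bndbox>
    </object>
    <object>
        <name>CAT</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>40</xmin>
            <ymin>69</ymin>
            <xmax>143</xmax>
            <ymax>16</ymax>
        </bndbox>
    </object>
    <object>
        <name>dog</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>96</xmin>
            <ymin>82</ymin>
            <xmax>176</xmax>
            <ymax>87</ymax>
        </bndbox>
    </object>
</annotation>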
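Before filtering, it can help to list which labels an annotation file actually contains. The following is a minimal sketch, assuming the Pascal VOC style layout that labelImg writes, where each <object> is a direct child of the root and carries a <name> child. The helper name list_labels and the trimmed inline sample are illustrative only and are not part of the original script.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample in the layout labelImg writes (illustrative, not a full file).
SAMPLE = """<annotation>
  <filename>test.jpg</filename>
  <object><name>People</name></object>
  <object><name>CAT</name></object>
  <object><name>dog</name></object>
</annotation>"""


def list_labels(xml_text):
    """Return the label of every <object>, in document order (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return [obj.findtext("name", default="") for obj in root.findall("object")]


labels = list_labels(SAMPLE)
print(labels)  # ['People', 'CAT', 'dog']

# Lower-casing shows which distinct classes exist once case is ignored.
distinct = sorted({label.lower() for label in labels})
print(distinct)  # ['cat', 'dog', 'people']
```

Running this over a few files is a quick way to spot accidental spellings such as CAT or People before deciding which classes to keep.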
Implementation
Suppose we want to keep only the people and cat classes in the image and delete everything else. The code is as follows:
from os import path, walk
from xml.etree.ElementTree import ElementTree


def read_xml(in_path):
    # parse the xml file and return the tree
    tree = ElementTree()
    tree.parse(in_path)
    return tree


def write_xml(tree, out_path):
    tree.write(out_path, encoding="utf-8")


def find_nodes(tree, node_path):
    return tree.findall(node_path)


def del_node_by_target_classes(nodelist, target_classes_lower, tree_root):
    for parent_node in nodelist:
        if parent_node.tag != "object":
            continue
        # Element.getchildren() was removed in Python 3.9; list() replaces it
        children = list(parent_node)
        # the first child of <object> is expected to be <name>
        label = children[0].text.lower()
        if label not in target_classes_lower:
            tree_root.remove(parent_node)
        else:
            children[0].text = label


def get_fileNames(rootdir):
    data_path = []
    prefixs = []
    for root, dirs, files in walk(rootdir, topdown=True):
        for name in files:
            pre, ending = path.splitext(name)
            if ending != ".xml":
                continue
            data_path.append(path.join(root, name))
            prefixs.append(pre)
    return data_path, prefixs


if __name__ == "__main__":
    # get all the xml paths; the prefixes are not used here
    paths_xml, prefixs = get_fileNames("/home/yasin/old_labels/")
    target_classes = ["PEOPLE", "CAT"]  # classes to keep, in any case
    # make sure your targets are lower-case
    target_classes_lower = [name.lower() for name in target_classes]
    # print(target_classes_lower)
    for i in range(len(paths_xml)):
        # rename and save the corresponding xml
        tree = read_xml(paths_xml[i])
        # get tree node
        tree_root = tree.getroot()
        # get parent nodes (the direct children of the root)
        del_parent_nodes = find_nodes(tree, "./*")
        # get target classes and delete
        del_node_by_target_classes(del_parent_nodes, target_classes_lower, tree_root)
        # save output xml, e.g. 000001.xml (the output directory must already exist)
        write_xml(tree, "/home/yasin/new_labels/{}.xml".format("%06d" % i))
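The script relies on the label being the first child of each object element. As a variation, the same filtering step could look the label up by tag name instead, so it does not depend on element order. This is only a sketch under that assumption; filter_objects is a hypothetical helper, not part of the original script, and the inline XML is a shortened stand-in for a real annotation file.

```python
import xml.etree.ElementTree as ET


def filter_objects(root, keep):
    """Drop <object> elements whose <name> is not in keep; lower-case the rest.

    keep holds class names, compared case-insensitively.
    Returns the number of removed objects. (Hypothetical helper.)
    """
    keep_lower = {k.lower() for k in keep}
    removed = 0
    # findall() returns a list, so removing from root inside the loop is safe.
    for obj in root.findall("object"):
        name = obj.find("name")
        label = (name.text or "").strip().lower() if name is not None else None
        if label is not None and label in keep_lower:
            name.text = label
        else:
            root.remove(obj)
            removed += 1
    return removed


root = ET.fromstring(
    "<annotation>"
    "<object><pose>Unspecified</pose><name>People</name></object>"
    "<object><name>CAT</name></object>"
    "<object><name>dog</name></object>"
    "</annotation>"
)
removed = filter_objects(root, ["PEOPLE", "cat"])
remaining = [obj.findtext("name") for obj in root.findall("object")]
print(removed, remaining)  # 1 ['people', 'cat']
```

Note that the first object deliberately puts pose before name, which an index-based lookup would misread as the label.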
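After running the code above, the example .xml becomes the following. Every class other than people and cat (that is, the dog class) has been deleted, and the labels of the kept classes have been changed to lower case: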
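<annotation>
    <filename>test.jpg</filename>
    <path>C:\Users\yasin\Desktop\test</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>400</width>
        <height>300</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>people</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>80</xmin>
            <ymin>69</ymin>
            <xmax>144</xmax>
            <ymax>...</ymax>
        </bndbox>
    </object>
    <object>
        <name>cat</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>40</xmin>
            <ymin>69</ymin>
            <xmax>143</xmax>
            <ymax>16</ymax>
        </bndbox>
    </object>
</annotation>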
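One way to double-check a converted directory is to scan it for any label that is still outside the expected lower-case set. The sketch below assumes the output files sit directly in one directory and follow the same object/name layout; find_unexpected_labels is a hypothetical helper, and the temporary directory with two tiny files only stands in for real output.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def find_unexpected_labels(xml_dir, allowed):
    """Map each XML file name to its labels that are not in allowed (case-sensitive)."""
    allowed = set(allowed)
    problems = {}
    for xml_path in sorted(Path(xml_dir).glob("*.xml")):
        root = ET.parse(xml_path).getroot()
        labels = [obj.findtext("name", default="") for obj in root.findall("object")]
        bad = [label for label in labels if label not in allowed]
        if bad:
            problems[xml_path.name] = bad
    return problems


# Demo on throwaway files: the second one still carries an upper-case label.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "000000.xml").write_text(
        "<annotation><object><name>people</name></object></annotation>",
        encoding="utf-8",
    )
    Path(tmp, "000001.xml").write_text(
        "<annotation><object><name>CAT</name></object></annotation>",
        encoding="utf-8",
    )
    report = find_unexpected_labels(tmp, {"people", "cat"})

print(report)  # {'000001.xml': ['CAT']}
```

An empty result means every remaining label is one of the kept classes and is already lower case.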