
Unsupervised Learning of Depth and Ego-Motion from Video: Reproducing the Monocular Depth Estimation Paper

  • 0. Paper walkthrough
  • Discussion
  • 1. Download the KITTI dataset
  • 2. Running the single-view depth demo
  • 3. Preparing training data
  • 4. Run test_kitti_depth.py
  • 5. Train

0. Paper walkthrough

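Projecting a scene point onto the image converts 3D coordinates into 2D coordinates. This is perspective projection: the object is projected onto a projection plane through a center of projection, giving a single-view image close to what the eye sees, in which near objects look large and distant objects look small. We again take the pinhole camera as the example: apart from its low image brightness, a pinhole forms the same image as a lens, but its light paths are simpler. As shown in Figure 2, the pinhole plane (camera coordinate system) lies between the image plane (image coordinate system) and the object plane (the checkerboard plane), and the resulting image is real and inverted.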
[Figure 2: pinhole imaging model]
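To make the mathematics more convenient, we swap the positions of the camera coordinate system and the image coordinate system, giving the arrangement in Figure 3 (this has no physical meaning; it only simplifies the computation):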
[Figure 3: image plane placed in front of the pinhole]
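Now let M be a point with coordinates (X_c, Y_c, Z_c) in the camera coordinate system. By similar triangles, its image point P in the ideal (distortion-free) image coordinate system has coordinates

$$x_p = f\,\frac{X_c}{Z_c}, \qquad y_p = f\,\frac{Y_c}{Z_c}$$

where f is the focal length (the distance from the optical center to the image plane). In homogeneous coordinates:

$$Z_c \begin{bmatrix} x_p \\ y_p \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} \tag{2}$$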
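Because the origin of the pixel coordinate system does not coincide with the origin of the image coordinate system, let (u_0, v_0) be the image-coordinate origin expressed in pixel coordinates, let dx and dy be the physical size of one pixel along the x and y axes of the image coordinate system, and let (x_c, y_c) be the image point in the actual image coordinate system. Its pixel coordinates are then

$$u = \frac{x_c}{dx} + u_0, \qquad v = \frac{y_c}{dy} + v_0$$

or, in homogeneous form (rectangular pixels, no skew):

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} 1/dx & 0 & u_0 \\ 0 & 1/dy & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_c \\ y_c \\ 1 \end{bmatrix} \tag{5}$$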
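The (x_p, y_p) in Eq. 2 and the (x_c, y_c) in Eq. 5 are the same quantity: coordinates in the image coordinate system. Ignoring lens distortion for now, multiplying the transformation matrices of Eq. 5 and Eq. 2 gives the intrinsic matrix M (often written K; not to be confused with the point M above):

$$M = \begin{bmatrix} 1/dx & 0 & u_0 \\ 0 & 1/dy & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} f/dx & 0 & u_0 \\ 0 & f/dy & v_0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = M \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}$$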
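It is called the intrinsic matrix because its entries depend only on the camera's internal parameters (focal length, pixel size, principal point) and do not change with the position of the object.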
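A minimal NumPy sketch of this chain, for illustration only: it builds the intrinsic matrix from f, dx, dy, u_0, v_0, projects a camera-frame point to pixel coordinates, and back-projects the pixel using its known depth Z_c. That inverse step is what depth-based view synthesis relies on. The camera numbers (4 mm focal length, 5 µm pixels, principal point at (320, 240)) are made up.

```python
import numpy as np

def intrinsic_matrix(f, dx, dy, u0, v0):
    """Intrinsic matrix: f/dx and f/dy are the focal lengths in pixels."""
    return np.array([[f / dx, 0.0, u0],
                     [0.0, f / dy, v0],
                     [0.0, 0.0, 1.0]])

def project(K, point_cam):
    """Camera-frame point (Xc, Yc, Zc) -> pixel coordinates (u, v)."""
    uvw = K @ np.asarray(point_cam, dtype=float)  # equals Zc * [u, v, 1]
    return uvw[:2] / uvw[2]

def back_project(K, uv, depth):
    """Pixel (u, v) with known depth Zc -> camera-frame point (Xc, Yc, Zc)."""
    pixel_h = np.array([uv[0], uv[1], 1.0])
    return depth * (np.linalg.inv(K) @ pixel_h)

# Hypothetical camera: f = 4 mm, 5 um square pixels, principal point (320, 240).
K = intrinsic_matrix(f=4e-3, dx=5e-6, dy=5e-6, u0=320.0, v0=240.0)
M = np.array([0.5, -0.25, 2.0])  # a point 2 m in front of the camera
uv = project(K, M)
print(uv)                         # pixel coordinates of M
print(back_project(K, uv, M[2]))  # should recover M
```

Here f/dx = 800 pixels, so the point lands at roughly (520, 140); back-projection with the same depth returns the original point.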
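The view-synthesis objective used as supervision in the paper rests on three assumptions:
1. the scene is static without moving objects;
2. there is no occlusion/disocclusion between the target view and the source views;
3. the surface is Lambertian, so that the photo-consistency error is meaningful. A Lambertian surface appears equally bright from every viewing direction under a fixed illumination distribution (an ideal Lambertian surface also absorbs none of the incident light). Lambertian reflection, also called diffuse reflection, scatters the incident illumination equally into all directions regardless of how the illumination is distributed, so the same amount of energy is seen from every direction.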
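To see why assumption 3 matters, the sketch below compares a Lambertian (diffuse) shading term, albedo·max(0, n·l), with a view-dependent specular term. The Phong specular model is brought in here only as a contrast; it is not part of the paper. Under the Lambertian term the same surface point has the same brightness from both viewpoints, which is what makes comparing pixel intensities between target and source views meaningful.

```python
import numpy as np

def _unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def lambertian(albedo, normal, light_dir):
    """Diffuse (Lambertian) shading: depends on the light, not on the viewer."""
    n, l = _unit(normal), _unit(light_dir)
    return albedo * max(0.0, float(n @ l))

def phong_specular(normal, light_dir, view_dir, shininess=20.0):
    """View-dependent specular term (Phong), used only as a contrast."""
    n, l, v = _unit(normal), _unit(light_dir), _unit(view_dir)
    r = 2.0 * float(n @ l) * n - l  # mirror direction of the light
    return max(0.0, float(r @ v)) ** shininess

normal = np.array([0.0, 0.0, 1.0])
light = np.array([0.0, 1.0, 1.0])
for view in (np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 1.0])):
    # The diffuse value is identical for both viewpoints; the specular one is not.
    print(lambertian(0.8, normal, light), phong_specular(normal, light, view))
```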

I read the paper a long time ago; I will expand this part when I have time.

Discussion

Why is the target frame at the center of the sequence?
issue48
Getting pose vector without the scale factor uncertainty?
issue39
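As a rough illustration of the two questions above (hypothetical helpers, not SfMLearner code): the first function splits a sequence into a center target and its sources, so with 3 frames each source is one frame away from the target instead of up to two if the target were the first frame. The second shows one common way to deal with the scale ambiguity of monocular pose when comparing with ground truth: fit a single factor s that minimizes ||s·pred − gt||², which gives s = Σ gt·pred / Σ pred².

```python
import numpy as np

def split_target_and_sources(frames):
    """Return (target, sources) with the target at the center of the sequence."""
    if len(frames) % 2 == 0:
        raise ValueError("expected an odd sequence length")
    mid = len(frames) // 2
    return frames[mid], list(frames[:mid]) + list(frames[mid + 1:])

def least_squares_scale(pred, gt):
    """Scale s minimizing ||s * pred - gt||^2 over all entries."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return float(np.sum(gt * pred) / np.sum(pred ** 2))

target, sources = split_target_and_sources(["f0", "f1", "f2"])
print(target, sources)

pred_t = np.array([[0.1, 0.0, 0.5], [0.2, 0.0, 1.0]])  # up-to-scale translations
gt_t = 3.0 * pred_t                                     # made-up ground truth
print(least_squares_scale(pred_t, gt_t))                # close to 3.0
```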

1. Download the KITTI dataset

For KITTI, first download the raw data using the raw_data_downloader.sh script provided on the official KITTI website:

    sudo bash ./raw_data_downloader.sh

2、Running the single-view depth demo

The repository provides demo code for running the single-view depth prediction model. First, download the pre-trained model:

    bash ./models/download_depth_model.sh

Then run test.py:
test.py

    from __future__ import division
    import numpy as np
    import PIL.Image as pil
    import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.train.Saver)
    import matplotlib.pyplot as plt
    from SfMLearner import SfMLearner
    from utils import normalize_depth_for_display

    img_height = 128
    img_width = 416
    ckpt_file = 'models/model-190532'

    # Load the sample image and resize it to the network input size.
    # pil.LANCZOS is the filter formerly exposed as pil.ANTIALIAS (removed in Pillow 10).
    with open('misc/sample.png', 'rb') as fh:
        I = pil.open(fh).convert('RGB')
        I = I.resize((img_width, img_height), pil.LANCZOS)
    I = np.array(I)

    # Build the depth-inference graph and restore the pre-trained weights.
    sfm = SfMLearner()
    sfm.setup_inference(img_height,
                        img_width,
                        mode='depth')
    saver = tf.train.Saver([var for var in tf.model_variables()])
    with tf.Session() as sess:
        saver.restore(sess, ckpt_file)
        # Add a batch dimension: (1, H, W, 3).
        pred = sfm.inference(I[None, :, :, :], sess, mode='depth')

    plt.imshow(I)
    plt.show()
    plt.imshow(normalize_depth_for_display(pred['depth'][0, :, :, 0]))
    plt.show()

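`normalize_depth_for_display` is imported from the repository's utils.py. The helper below is a hypothetical stand-in written for illustration, not the repository's code. It shows the usual way to visualize predicted depth: convert depth to inverse depth so near objects appear bright, divide by a high percentile so a few very close pixels do not compress the rest of the range, and clip to [0, 1].

```python
import numpy as np

def depth_to_display(depth, percentile=95.0, eps=1e-6):
    """Hypothetical helper: map a depth map to [0, 1] via normalized inverse depth."""
    disp = 1.0 / (np.asarray(depth, dtype=float) + eps)  # near -> large
    scale = np.percentile(disp, percentile)
    return np.clip(disp / max(scale, eps), 0.0, 1.0)

depth = np.array([[1.0, 2.0], [4.0, 80.0]])  # made-up depths in metres
vis = depth_to_display(depth)
print(vis)
```

The result can be passed to `plt.imshow` with any colormap, for example `plt.imshow(vis, cmap='plasma')`.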

3、Preparing training data

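In order to train the model using the provided code, the data needs to be reformatted: each training example becomes one image that places seq_length consecutive frames side by side, plus a `_cam.txt` file with the camera intrinsics, and all examples are randomly split into train.txt and val.txt lists.
To do this, run prepare_train_data.py (adapted here to write images with cv2, because newer SciPy releases no longer provide scipy.misc.imsave):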

prepare_train_data.py

    from __future__ import division
    import argparse
    import os
    from glob import glob

    import cv2  # used instead of scipy.misc.imsave
    import numpy as np
    from joblib import Parallel, delayed

    # The default paths are the author's; change them to match your machine.
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset_dir", type=str, default="/home/ross/PycharmProjects/SfMLearner-master/raw/kitti/dataset/", help="where the dataset is stored")
    parser.add_argument("--dataset_name", type=str, default="kitti_raw_eigen", choices=["kitti_raw_eigen", "kitti_raw_stereo", "kitti_odom", "cityscapes"])
    parser.add_argument("--dump_root", type=str, default="/home/ross/PycharmProjects/SfMLearner-master/raw/kitti/resulting/formatted/data/", help="Where to dump the data")
    parser.add_argument("--seq_length", type=int, default=3, help="Length of each training sequence")
    parser.add_argument("--img_height", type=int, default=128, help="image height")
    parser.add_argument("--img_width", type=int, default=416, help="image width")
    parser.add_argument("--num_threads", type=int, default=4, help="number of threads to use")
    args = parser.parse_args()

    def concat_image_seq(seq):
        """Stack the frames of one sequence side by side into a single wide image."""
        return np.hstack(list(seq))

    def dump_example(n, args):
        if n % 2000 == 0:
            print('Progress %d/%d....' % (n, data_loader.num_train))
        example = data_loader.get_train_example_with_idx(n)
        if example is False:  # the loader returns False for unusable indices
            return
        image_seq = concat_image_seq(example['image_seq'])
        intrinsics = example['intrinsics']
        fx = intrinsics[0, 0]
        fy = intrinsics[1, 1]
        cx = intrinsics[0, 2]
        cy = intrinsics[1, 2]
        dump_dir = os.path.join(args.dump_root, example['folder_name'])
        # Parallel workers may race to create the same folder; ignore "already exists".
        try:
            os.makedirs(dump_dir)
        except OSError:
            if not os.path.isdir(dump_dir):
                raise
        dump_img_file = os.path.join(dump_dir, '%s.jpg' % example['file_name'])
        # Note: cv2.imwrite assumes BGR channel order. If the loader returns RGB
        # frames (as scipy.misc.imread did), convert first with
        # cv2.cvtColor(image_seq, cv2.COLOR_RGB2BGR), or red and blue get swapped.
        cv2.imwrite(dump_img_file, image_seq.astype(np.uint8))
        dump_cam_file = os.path.join(dump_dir, '%s_cam.txt' % example['file_name'])
        # Flattened 3x3 intrinsic matrix in row-major order: fx,0,cx,0,fy,cy,0,0,1
        with open(dump_cam_file, 'w') as f:
            f.write('%f,0.,%f,0.,%f,%f,0.,0.,1.' % (fx, cx, fy, cy))

    def main():
        if not os.path.exists(args.dump_root):
            os.makedirs(args.dump_root)

        # Global so that dump_example, run by joblib, can reach the loader.
        global data_loader
        if args.dataset_name == 'kitti_odom':
            from kitti.kitti_odom_loader import kitti_odom_loader
            data_loader = kitti_odom_loader(args.dataset_dir,
                                            img_height=args.img_height,
                                            img_width=args.img_width,
                                            seq_length=args.seq_length)

        if args.dataset_name == 'kitti_raw_eigen':
            from kitti.kitti_raw_loader import kitti_raw_loader
            data_loader = kitti_raw_loader(args.dataset_dir,
                                           split='eigen',
                                           img_height=args.img_height,
                                           img_width=args.img_width,
                                           seq_length=args.seq_length)

        if args.dataset_name == 'kitti_raw_stereo':
            from kitti.kitti_raw_loader import kitti_raw_loader
            data_loader = kitti_raw_loader(args.dataset_dir,
                                           split='stereo',
                                           img_height=args.img_height,
                                           img_width=args.img_width,
                                           seq_length=args.seq_length)

        if args.dataset_name == 'cityscapes':
            from cityscapes.cityscapes_loader import cityscapes_loader
            data_loader = cityscapes_loader(args.dataset_dir,
                                            img_height=args.img_height,
                                            img_width=args.img_width,
                                            seq_length=args.seq_length)

        Parallel(n_jobs=args.num_threads)(
            delayed(dump_example)(n, args) for n in range(data_loader.num_train))

        # Randomly split the dumped frames: about 10% to val.txt, the rest to train.txt.
        np.random.seed(8964)
        subfolders = os.listdir(args.dump_root)
        with open(os.path.join(args.dump_root, 'train.txt'), 'w') as train_f:
            with open(os.path.join(args.dump_root, 'val.txt'), 'w') as val_f:
                for s in subfolders:
                    if not os.path.isdir(os.path.join(args.dump_root, s)):
                        continue
                    imfiles = glob(os.path.join(args.dump_root, s, '*.jpg'))
                    frame_ids = [os.path.basename(fi).split('.')[0] for fi in imfiles]
                    for frame in frame_ids:
                        if np.random.random() < 0.1:
                            val_f.write('%s %s\n' % (s, frame))
                        else:
                            train_f.write('%s %s\n' % (s, frame))

    if __name__ == '__main__':
        main()
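To check the dumped data, the sketch below reads it back. These are hypothetical helpers, not the repository's training data loader: one parses a `_cam.txt` line into the 3×3 intrinsic matrix, one splits a concatenated strip back into seq_length frames, and one shows how fx, cx and fy, cy scale when an image is resized. Since the frames are stored at 416×128, the saved intrinsics must already correspond to that resolution; the numbers in the usage lines are made up.

```python
import numpy as np

def parse_cam_line(line):
    """Parse 'fx,0.,cx,0.,fy,cy,0.,0.,1.' into a 3x3 intrinsic matrix."""
    values = [float(v) for v in line.strip().split(',')]
    if len(values) != 9:
        raise ValueError("expected 9 comma-separated values")
    return np.array(values).reshape(3, 3)

def split_strip(strip, seq_length):
    """Undo concat_image_seq: (H, W*seq_length, C) -> list of (H, W, C) frames."""
    return np.split(strip, seq_length, axis=1)

def scale_intrinsics(K, sx, sy):
    """Adjust intrinsics when resizing width by sx and height by sy."""
    K = K.copy()
    K[0, 0] *= sx  # fx
    K[0, 2] *= sx  # cx
    K[1, 1] *= sy  # fy
    K[1, 2] *= sy  # cy
    return K

line = '%f,0.,%f,0.,%f,%f,0.,0.,1.' % (241.67, 204.17, 246.28, 59.0)
K = parse_cam_line(line)
frames = split_strip(np.zeros((128, 416 * 3, 3), dtype=np.uint8), 3)
print(K)
print([f.shape for f in frames])
```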

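4. Run test_kitti_depth.py

This script runs the pre-trained depth model on every image listed in data/kitti/test_files_eigen_copy.txt and saves all predicted depth maps to one .npy file named after the checkpoint.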

test_kitti_depth.py

    from __future__ import division
    import os

    import numpy as np
    import PIL.Image as pil
    import tensorflow as tf  # TensorFlow 1.x API
    from SfMLearner import SfMLearner

    flags = tf.app.flags
    flags.DEFINE_integer("batch_size", 4, "The size of a sample batch")
    flags.DEFINE_integer("img_height", 128, "Image height")
    flags.DEFINE_integer("img_width", 416, "Image width")
    # The default paths are the author's; change them to match your machine.
    flags.DEFINE_string("dataset_dir", "/media/ross/DF091746BF0C6DD4/Dataset/kitti/", "Dataset directory")
    flags.DEFINE_string("output_dir", "/media/ross/DF091746BF0C6DD4/Dataset/kitti/formatted/", "Output directory")
    flags.DEFINE_string("ckpt_file", "/home/ross/PycharmProjects/SfMLearner-master/models/model-190532", "checkpoint file")
    FLAGS = flags.FLAGS

    def main(_):
        # Each line of the test list is an image path relative to dataset_dir.
        with open('data/kitti/test_files_eigen_copy.txt', 'r') as f:
            test_files = [os.path.join(FLAGS.dataset_dir, t.strip())
                          for t in f if t.strip()]
        if not os.path.exists(FLAGS.output_dir):
            os.makedirs(FLAGS.output_dir)
        basename = os.path.basename(FLAGS.ckpt_file)
        # np.save appends '.npy', e.g. model-190532.npy
        output_file = os.path.join(FLAGS.output_dir, basename)
        sfm = SfMLearner()
        sfm.setup_inference(img_height=FLAGS.img_height,
                            img_width=FLAGS.img_width,
                            batch_size=FLAGS.batch_size,
                            mode='depth')
        saver = tf.train.Saver([var for var in tf.model_variables()])
        config = tf.ConfigProto()
        config.gpu_options.allow_growth = True  # allocate GPU memory on demand
        with tf.Session(config=config) as sess:
            saver.restore(sess, FLAGS.ckpt_file)
            pred_all = []
            for t in range(0, len(test_files), FLAGS.batch_size):
                if t % 100 == 0:
                    print('processing %s: %d/%d' % (basename, t, len(test_files)))
                # The last batch may be partly filled; unused slots stay zero.
                inputs = np.zeros(
                    (FLAGS.batch_size, FLAGS.img_height, FLAGS.img_width, 3),
                    dtype=np.uint8)
                for b in range(FLAGS.batch_size):
                    idx = t + b
                    if idx >= len(test_files):
                        break
                    with open(test_files[idx], 'rb') as fh:
                        raw_im = pil.open(fh)
                        # pil.LANCZOS replaces pil.ANTIALIAS (removed in Pillow 10).
                        scaled_im = raw_im.resize((FLAGS.img_width, FLAGS.img_height), pil.LANCZOS)
                    inputs[b] = np.array(scaled_im)
                pred = sfm.inference(inputs, sess, mode='depth')
                # Keep only the predictions for real (non-padding) images.
                for b in range(FLAGS.batch_size):
                    idx = t + b
                    if idx >= len(test_files):
                        break
                    pred_all.append(pred['depth'][b, :, :, 0])
            np.save(output_file, pred_all)

    if __name__ == '__main__':
        tf.app.run()
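The script above only saves raw predictions; the repository scores them against LiDAR ground truth in a separate evaluation step. As a hedged sketch of what such an evaluation computes, the function below implements the standard KITTI depth metrics (Abs Rel, Sq Rel, RMSE, RMSE log, and the δ < 1.25^k accuracies) after per-image median scaling, which compensates for the unknown monocular scale. It assumes `gt` and `pred` already have the same resolution and that pixels without ground truth are 0; the 1e-3 to 80 m range is a common convention, used here as an assumption.

```python
import numpy as np

def depth_errors(gt, pred, min_depth=1e-3, max_depth=80.0):
    """Standard depth metrics after median scaling (a sketch, not the repo's eval script)."""
    gt = np.asarray(gt, dtype=float)
    pred = np.asarray(pred, dtype=float)
    mask = (gt > min_depth) & (gt < max_depth)  # drop pixels without ground truth
    gt, pred = gt[mask], pred[mask]

    # Monocular predictions are only defined up to scale: align the medians.
    pred = pred * np.median(gt) / np.median(pred)
    pred = np.clip(pred, min_depth, max_depth)

    thresh = np.maximum(gt / pred, pred / gt)
    return {
        "abs_rel": np.mean(np.abs(gt - pred) / gt),
        "sq_rel": np.mean((gt - pred) ** 2 / gt),
        "rmse": np.sqrt(np.mean((gt - pred) ** 2)),
        "rmse_log": np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2)),
        "a1": np.mean(thresh < 1.25),
        "a2": np.mean(thresh < 1.25 ** 2),
        "a3": np.mean(thresh < 1.25 ** 3),
    }

gt = np.array([[5.0, 10.0], [0.0, 20.0]])  # 0 marks a pixel without a LiDAR return
pred = gt * 0.1                            # correct up to an unknown scale
print(depth_errors(gt, pred))
```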

5. Train

train.py

    from __future__ import division
    import os
    import pprint
    import random

    import numpy as np
    import tensorflow as tf  # TensorFlow 1.x API
    from SfMLearner import SfMLearner

    flags = tf.app.flags
    # dataset_dir should point at the dump_root written by prepare_train_data.py
    # (the folder containing train.txt and val.txt).
    flags.DEFINE_string("dataset_dir", "./data/kitti/processed/formatted", "Dataset directory")
    flags.DEFINE_string("checkpoint_dir", "./checkpoints/", "Directory name to save the checkpoints")
    flags.DEFINE_string("init_checkpoint_file", None, "Specific checkpoint file to initialize from")
    flags.DEFINE_float("learning_rate", 0.0002, "Learning rate for Adam")
    flags.DEFINE_float("beta1", 0.9, "Momentum term of Adam")
    flags.DEFINE_float("smooth_weight", 0.5, "Weight for smoothness")
    flags.DEFINE_float("explain_reg_weight", 0.0, "Weight for explainability regularization")
    flags.DEFINE_integer("batch_size", 4, "The size of a sample batch")
    flags.DEFINE_integer("img_height", 128, "Image height")
    flags.DEFINE_integer("img_width", 416, "Image width")
    flags.DEFINE_integer("seq_length", 3, "Sequence length for each example")
    flags.DEFINE_integer("max_steps", 200000, "Maximum number of training iterations")
    flags.DEFINE_integer("summary_freq", 100, "Log summaries every summary_freq iterations")
    flags.DEFINE_integer("save_latest_freq", 5000,
                         "Save the latest model every save_latest_freq iterations (overwrites the previous latest model)")
    flags.DEFINE_boolean("continue_train", False, "Continue training from previous checkpoint")
    flags.DEFINE_integer("num_source", 2, "num_source")
    flags.DEFINE_integer("num_scales", 4, "num_scales")
    FLAGS = flags.FLAGS

    def main(_):
        # Fix all random seeds for reproducibility.
        seed = 8964
        tf.set_random_seed(seed)
        np.random.seed(seed)
        random.seed(seed)

        pp = pprint.PrettyPrinter()
        pp.pprint(flags.FLAGS.__flags)

        if not os.path.exists(FLAGS.checkpoint_dir):
            os.makedirs(FLAGS.checkpoint_dir)

        sfm = SfMLearner()
        sfm.train(FLAGS)

    if __name__ == '__main__':
        tf.app.run()
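The smooth_weight and explain_reg_weight flags weight the terms of the training objective. The following is a simplified, single-scale NumPy sketch of how such an objective combines them; it is not the repository's TensorFlow implementation, which works over num_scales scales and num_source warped source views. The terms are: an L1 photometric error between the target and each warped source image, optionally weighted by an explainability mask; an L1 penalty on second-order differences of the predicted disparity (inverse depth) map; and a cross-entropy term that pushes the mask toward 1 so the network cannot simply mask every pixel out. The warped images are taken as given here; producing them requires the projection and bilinear sampling step.

```python
import numpy as np

def photometric_loss(target, warped, mask=None):
    """Mean L1 difference; an optional (H, W) mask down-weights unexplainable pixels."""
    diff = np.abs(target - warped)
    if mask is not None:
        diff = diff * mask[..., None]
    return diff.mean()

def second_order_smoothness(disp):
    """Mean absolute second differences of the disparity along x and y."""
    d2x = np.diff(disp, n=2, axis=1)
    d2y = np.diff(disp, n=2, axis=0)
    return np.abs(d2x).mean() + np.abs(d2y).mean()

def explainability_reg(mask, eps=1e-6):
    """Cross-entropy against a constant label of 1, i.e. mean of -log(mask)."""
    return -np.log(np.clip(mask, eps, 1.0)).mean()

def total_loss(target, warped_sources, disp, masks=None,
               smooth_weight=0.5, explain_reg_weight=0.0):
    """Single-scale objective: photometric + weighted regularizers."""
    loss = 0.0
    for i, warped in enumerate(warped_sources):
        mask = None if masks is None else masks[i]
        loss += photometric_loss(target, warped, mask)
        if mask is not None:
            loss += explain_reg_weight * explainability_reg(mask)
    return loss + smooth_weight * second_order_smoothness(disp)

rng = np.random.default_rng(0)
target = rng.random((8, 8, 3))
ramp = np.tile(np.linspace(0.1, 1.0, 8), (8, 1))  # planar disparity
# Perfect warps and a planar disparity give a loss of (almost) zero.
print(total_loss(target, [target, target], ramp))
```

A linear disparity ramp has zero second-order differences, which is why a second-order penalty does not push the prediction toward fronto-parallel (constant-depth) surfaces.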