
UNet: semantic segmentation with PyTorch

A customized implementation of the U-Net model in PyTorch for Kaggle's Carvana Image Masking Challenge, applied to high-resolution images.

This model was trained from scratch on 5,000 images (with no data augmentation) and achieved a Dice coefficient of 0.988423 (511 out of 735 on the challenge leaderboard) on over 100k test images. This score could likely be improved with more training, more data augmentation, finer tuning of the model, tuning the CRF post-processing parameters, and weighting edge pixels more heavily.
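For reference, the Dice coefficient reported above measures the overlap between the predicted and ground-truth masks. A minimal NumPy sketch of the metric (the function name and smoothing term are illustrative, not taken from the repository):

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2*|A ∩ B| / (|A| + |B|) for binary masks; 1.0 means perfect overlap."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    # eps avoids division by zero when both masks are empty
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 1, 0], [0, 0, 0]])
print(dice_coefficient(a, b))  # 2*2 / (3 + 2) ≈ 0.8
```

Identical masks score 1.0 and disjoint masks score approximately 0.0, which is why a value of 0.988423 indicates near-perfect segmentation.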

The Carvana data is available on the Kaggle website.

Usage

Note: Use Python 3

Prediction

You can easily test the output masks on your images via the CLI.

To predict a single image and save it:

python predict.py -i image.jpg -o output.jpg

To predict multiple images and show them without saving:

python predict.py -i image1.jpg image2.jpg --viz --no-save

python predict.py -h

usage: predict.py [-h] [--model FILE] --input INPUT [INPUT ...]
                  [--output INPUT [INPUT ...]] [--viz] [--no-save]
                  [--mask-threshold MASK_THRESHOLD] [--scale SCALE]

Predict masks from input images

optional arguments:
  -h, --help            show this help message and exit
  --model FILE, -m FILE
                        Specify the file in which the model is stored
                        (default: MODEL.pth)
  --input INPUT [INPUT ...], -i INPUT [INPUT ...]
                        Filenames of input images (default: None)
  --output INPUT [INPUT ...], -o INPUT [INPUT ...]
                        Filenames of output images (default: None)
  --viz, -v             Visualize the images as they are processed
                        (default: False)
  --no-save, -n         Do not save the output masks (default: False)
  --mask-threshold MASK_THRESHOLD, -t MASK_THRESHOLD
                        Minimum probability value to consider a mask pixel
                        white (default: 0.5)
  --scale SCALE, -s SCALE
                        Scale factor for the input images (default: 0.5)
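The --mask-threshold option corresponds to a simple per-pixel cut on the probabilities the network outputs: anything above the threshold becomes a white mask pixel. A hedged sketch of the idea (the function name is illustrative, not the repository's API):

```python
import numpy as np

def probs_to_mask(probs: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Turn a map of per-pixel probabilities into a binary mask image.

    Pixels whose probability exceeds the threshold become white (255),
    all others become black (0).
    """
    return np.where(probs > threshold, 255, 0).astype(np.uint8)

probs = np.array([[0.9, 0.4],
                  [0.51, 0.1]])
print(probs_to_mask(probs))
# [[255   0]
#  [255   0]]
```

Raising the threshold makes the mask more conservative (fewer pixels marked as foreground); lowering it does the opposite.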

You can specify which model file to use with --model MODEL.pth.

Training

python train.py -h

usage: train.py [-h] [-e E] [-b [B]] [-l [LR]] [-f LOAD] [-s SCALE] [-v VAL]

Train the UNet on images and target masks

optional arguments:
  -h, --help            show this help message and exit
  -e E, --epochs E      Number of epochs (default: 5)
  -b [B], --batch-size [B]
                        Batch size (default: 1)
  -l [LR], --learning-rate [LR]
                        Learning rate (default: 0.1)
  -f LOAD, --load LOAD  Load model from a .pth file (default: False)
  -s SCALE, --scale SCALE
                        Downscaling factor of the images (default: 0.5)
  -v VAL, --validation VAL
                        Percent of the data that is used as validation (0-100)
                        (default: 15.0)

By default, the scale factor is 0.5. If you want better segmentation quality (at the cost of more memory), set it to 1.
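The scale factor simply shrinks both image dimensions before they reach the network, so memory and compute grow with the square of the scale. A back-of-envelope sketch (the function name is illustrative):

```python
def scaled_size(width: int, height: int, scale: float = 0.5) -> tuple:
    """Dimensions after applying the downscaling factor to both axes."""
    return int(width * scale), int(height * scale)

# Carvana images are 1918x1280; at the default scale of 0.5
# the network sees a quarter of the original pixels.
print(scaled_size(1918, 1280))  # (959, 640)
```

Going from scale 0.5 to 1 therefore quadruples the pixel count, which is why the full-resolution setting needs noticeably more memory.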

The input images and target masks must be placed within the data/imgs and data/masks directories, respectively.
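If you want to sanity-check your dataset layout before training, a small script can pair images with their masks. This sketch assumes masks share the image's base filename, which may not match the repository's exact naming convention (Carvana masks, for instance, carry a suffix):

```python
from pathlib import Path

def paired_files(img_dir: str = "data/imgs", mask_dir: str = "data/masks"):
    """Yield (image, mask) path pairs matched by filename.

    Assumes each mask uses the same filename as its image, which is an
    illustrative convention, not necessarily the repository's.
    """
    mask_root = Path(mask_dir)
    for img in sorted(Path(img_dir).glob("*")):
        mask = mask_root / img.name
        if mask.exists():
            yield img, mask
```

Any image without a matching mask is silently skipped here; in practice you would likely want to report such files instead.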

Tensorboard

Using TensorBoard, you can enable real-time visualization of both training and testing losses alongside model predictions.

tensorboard --logdir=runs

Notes on memory

The model was trained from scratch on a GTX 970M with 3GB of VRAM. Predicting images of size 1918x1280 takes about 1.5GB of memory; training takes approximately 3GB, so if you are slightly short on VRAM, try disabling all graphical output. This setup assumes the use of bilinear interpolation instead of transposed convolutions.
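For context, the raw input tensor is only a small fraction of that memory; the bulk goes to intermediate activations inside the network. A back-of-envelope calculation (the function name is illustrative):

```python
def tensor_megabytes(height: int, width: int, channels: int = 3,
                     bytes_per_value: int = 4) -> float:
    """Raw storage for one float32 image tensor, in MiB."""
    return height * width * channels * bytes_per_value / 1024**2

# A single 1918x1280 RGB float32 input is only ~28 MiB...
print(round(tensor_megabytes(1280, 1918), 1))  # 28.1
```

...so the ~1.5GB needed for prediction is dominated by the feature maps computed at each layer, not by the image itself. This is also why halving the scale factor helps so much: every activation map shrinks with it.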

The original paper by Olaf Ronneberger, Philipp Fischer, and Thomas Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation", is available on arXiv (1505.04597).
