YOLO Main Functions:

class yolo_help.GroundTruthDataset(N=256, M=64, nclasses=3, reproducible=False)

Bases: Dataset

This dataset generates a set of random images of size N x N, each containing a Poisson-distributed number of cells with mean M.

Parameters for Initialization:

N : int

The number of images to generate for this simulated dataset; Default - 256

M : int

The mean number of cells appended to each image, sampled from a Poisson distribution; Default - 64

nclasses : int

The number of classes to categorize this collection of simulated cells; Default - 3

reproducible : bool

If True, set np.random.seed() to the integer index supplied

Returns:

out_arr : 3-element array
  • out_arr[0] is a 1 x N x N array defining the base image

  • out_arr[1] is a X x 4 array containing the bbox info for X cells

  • out_arr[2] is a X x 1 array containing the categorical label for the cell within the corresponding bbox; 0 - smooth boundary, 1 - sharp boundary, 2 - bumpy boundary
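The layout of the returned items can be sketched as follows. This is not the actual GroundTruthDataset implementation; `simulate_labels` and the cell size range are illustrative stand-ins for how a Poisson-distributed set of cells and their labels might be generated.

```python
import numpy as np

def simulate_labels(N=256, M=64, nclasses=3, seed=0):
    """Sketch of the ground-truth label layout: a Poisson-distributed
    number of cells, each with a [cx, cy, w, h] bbox and a class id.
    Illustrative only; not the yolo_help implementation."""
    rng = np.random.default_rng(seed)
    X = rng.poisson(M)                      # number of cells in this image
    image = np.zeros((1, N, N))             # out_arr[0]: 1 x N x N base image
    bboxes = np.column_stack([
        rng.uniform(0, N, size=X),          # cx
        rng.uniform(0, N, size=X),          # cy
        rng.uniform(4, 16, size=X),         # w (illustrative size range)
        rng.uniform(4, 16, size=X),         # h
    ])                                      # out_arr[1]: X x 4
    labels = rng.integers(0, nclasses, size=(X, 1))  # out_arr[2]: X x 1
    return image, bboxes, labels

image, bboxes, labels = simulate_labels()
```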

class yolo_help.Net(nclasses=3)

Bases: Module

A neural network using the YOLO framework, with a batch size of 1 and an input layer capable of accepting inputs of variable shape.

Parameters:

nclasses : int

Default - 3; The number of distinct classes in the dataset

Returns:

Net : torch.nn.Module

A neural network which can take input images with any number of channels

forward(x)

Forward method for the yolo network.

Inputs

x : torch.Tensor

Object is of size 1 x CH x ROW x COL. The number of channels, rows, and columns is arbitrary. This differs from the original YOLO paper, which requires a fixed number of channels, rows, and columns.

Outputs

x : torch.Tensor

Object is of size (5 * bbox_per_cell + num_classes) x ROW x COL (13 x ROW x COL with the defaults bbox_per_cell = 2 and num_classes = 3), where ROW and COL are equal to the input image size divided by the network’s stride (8). The five numbers per bounding box are [cx, cy, scalex, scaley, confidence].
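The output-shape arithmetic above can be checked with a small helper. `yolo_output_shape` is an illustrative function, not part of yolo_help; it assumes the stated values bbox_per_cell = 2, num_classes = 3, and stride = 8.

```python
def yolo_output_shape(rows, cols, bbox_per_cell=2, num_classes=3, stride=8):
    """Shape of the forward-pass output for an input of size rows x cols,
    assuming the defaults stated in the docs. Illustrative helper."""
    channels = 5 * bbox_per_cell + num_classes  # [cx, cy, sx, sy, conf] per box
    return (channels, rows // stride, cols // stride)

yolo_output_shape(256, 256)  # -> (13, 32, 32)
```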

class yolo_help.VariableInputConv2d(M)

Bases: Module

The assumption here is that we have a batch size of one, so we are free to use the batch dimension. This version adds a few extra convolutions.

Step 1: move the channel dimension to the batch dimension. Now we have N samples, with one channel each.

Step 2: apply some convolutions and ReLUs to end up with an N x M array of images, where M is fixed and N is variable.

Step 3: take a softmax of the result over the N channels. Now matrix-multiply the N x M array with the N x 1 softmax array to get an M x 1 array (where M is fixed).

Step 4: return the result, which has a fixed number of channels.

What’s nice is that this is permutation invariant: whether we input an RGB image or a BGR image, the result would be exactly the same. We don’t need to have the channels in any specific order.

TODO: change this so it supports a batch dimension. This requires every image in a batch to have the same number of channels, but that should be okay.
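Steps 1 through 4 can be sketched in numpy. This is not the VariableInputConv2d source: the per-channel feature map here is a random linear map standing in for the shared convolutions of step 2, and `channel_pool` is a hypothetical name. It does, however, demonstrate the permutation invariance claimed above.

```python
import numpy as np

def softmax(z, axis=0):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_pool(x, M=4, seed=0):
    """Sketch of steps 1-4: collapse a variable number of input channels N
    down to a fixed M, invariant to channel order. The shared linear map W
    is a stand-in for the convolutions of step 2."""
    N = x.shape[0]                          # step 1: channels act as the batch
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((x.shape[1] * x.shape[2], M))
    feats = x.reshape(N, -1) @ W            # step 2: N x M features, shared weights
    weights = softmax(feats.mean(axis=1))   # step 3: one softmax weight per channel
    return weights @ feats                  # step 4: fixed-size M-vector

rgb = np.random.default_rng(1).standard_normal((3, 8, 8))
bgr = rgb[::-1]                             # reversed channel order
assert np.allclose(channel_pool(rgb), channel_pool(bgr))  # permutation invariant
```

Because every channel passes through the same weights and the softmax-weighted sum is symmetric in the channel index, reordering channels reorders the rows of `feats` and the entries of `weights` identically, leaving the pooled result unchanged.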

forward(x)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

yolo_help.bbox_to_rectangles(bbox, **kwargs)

This function converts a set of bounding boxes of the form [cx,cy,w,h] to a set of rectangle objects for visualization purposes.

Parameters:

bbox : N x 4 array[numpy.float64]

Contains the bbox info for N bboxes of the form [cx,cy,w,h]

Returns:

out : matplotlib.collections.PolyCollection

Contains N rectangles to be plotted later
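The core of this conversion is moving from center form to corner form, which is what a matplotlib Rectangle or PolyCollection patch needs. `bbox_corners` is an illustrative helper, not the yolo_help implementation.

```python
import numpy as np

def bbox_corners(bbox):
    """Convert [cx, cy, w, h] boxes to [x0, y0, x1, y1] corner form.
    Illustrative helper for the conversion bbox_to_rectangles performs."""
    cx, cy, w, h = bbox[:, 0], bbox[:, 1], bbox[:, 2], bbox[:, 3]
    return np.column_stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])

bbox_corners(np.array([[10.0, 20.0, 4.0, 6.0]]))  # -> [[8., 17., 12., 23.]]
```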

yolo_help.convert_data(out, B, stride)

Convert the outputs from the YOLO neural network into bounding boxes for performance quantification. Note that this operation is not differentiable. It also outputs the raw data, reformatted into a list instead of an image of grid cells. These outputs are differentiable. The last of these outputs is the score (data[…,-1]). Note that class probabilities are NOT output by this function.

Parameters:

out : torch.Tensor

The output object from the primary YOLO neural network

B : int

The number of bounding boxes per block that the model will output. (Automatically defined when initializing the network)

stride : int

The number of pixels to pass over when convolving the input image in each layer (Automatically defined when initializing the network)

Returns:

bboxes : torch.Tensor of size N x 4

N is the number of bounding boxes output from the network; Bounding boxes are of the form [cx,cy,scalex,scaley]. There are no gradient computations done here. These bounding boxes are for visualization or downstream analysis.

data : torch.Tensor of size N x 5

N is the number of bounding boxes output from the network; Bounding boxes are of the form [cx,cy,w,h,conf]. There are gradient computations done here. These are for our loss function and, potentially, other downstream analysis.
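The grid-to-list reshape this function performs can be sketched as follows. This is an assumption about the channel layout (B groups of 5 bbox channels followed by C class channels), not the yolo_help source; `grid_to_list` is a hypothetical name.

```python
import numpy as np

def grid_to_list(out, B=2):
    """Sketch of convert_data's reshape: an image of grid cells of shape
    (5*B + C) x ROW x COL becomes a flat (ROW*COL*B) x 5 list of
    [cx, cy, w, h, conf] rows. The class-probability channels are dropped,
    as noted above. Channel ordering here is an assumption."""
    ch, rows, cols = out.shape
    boxes = out[: 5 * B]                        # keep only the bbox channels
    boxes = boxes.reshape(B, 5, rows, cols)     # split into per-box groups
    return boxes.transpose(2, 3, 0, 1).reshape(-1, 5)

grid_to_list(np.zeros((13, 4, 4))).shape  # -> (32, 5)
```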

yolo_help.get_assignment_inds(bboxes, bbox, shape, stride, B)

Assigns each training bounding box to a specific cell, and picks the bounding box from that cell with the best iou.

Parameters:

bboxes : torch.Tensor of size [N, 4]

N is the number of bounding boxes computed for a single image and is equal to (np.shape(I)[0] / stride) * (np.shape(I)[0] / stride) * net.B

bbox : torch.Tensor of size [X, 4]

X is the true number of bboxes associated with image I

shape : 4-dimensional torch.Tensor

The shape of the data directly output from the YOLO neural network

stride : int

The number of pixels to pass over when convolving the input image in each layer (Automatically defined when initializing the network)

B : int

The number of bboxes generated by the YOLO neural network at each block

Returns:

assignment_inds : numpy array of size X

The index of the bounding box from the ‘bboxes’ parameter corresponding to the ith bounding box from the ‘bbox’ parameter

ious : numpy array of size X

The IOU between the ith bounding box from the ‘bbox’ parameter and the corresponding bbox from the ‘bboxes’ parameter
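The cell-assignment step can be sketched as follows. The row-major, B-boxes-per-cell indexing of the flat ‘bboxes’ list is an assumption about yolo_help's layout; `assign_cell` is a hypothetical name.

```python
import numpy as np

def assign_cell(bbox, img_size, stride, B):
    """Sketch of the assignment logic: each ground-truth box belongs to the
    grid cell containing its center. Returns the flat index of that cell's
    first box, assuming row-major cells with B consecutive boxes each
    (a layout assumption, not lifted from yolo_help)."""
    ncols = img_size // stride
    col = (bbox[:, 0] // stride).astype(int)   # cell column from cx
    row = (bbox[:, 1] // stride).astype(int)   # cell row from cy
    return (row * ncols + col) * B             # index of the cell's first box

assign_cell(np.array([[20.0, 9.0, 4.0, 4.0]]), img_size=64, stride=8, B=2)
# -> array([20])
```

The IOU of each of the B candidate boxes at that index would then be computed against the ground-truth box, and the best one kept.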

yolo_help.get_best_bounding_box_per_cell(bboxes, scores, B)

Get the best bounding box for each cell output from the YOLO neural network.

Parameters:

bboxes : torch.Tensor of size [N, 4]

N is the number of bounding boxes computed for a single image and is equal to (np.shape(I)[0] / stride) * (np.shape(I)[0] / stride) * net.B

scores : torch.Tensor of size [N, 1]

The confidence for each corresponding box in the ‘bboxes’ parameter

B : int

Default - 2; The number of bboxes generated by the YOLO neural network at each cell

Returns:

bboxes_out : torch.Tensor of size [N/B, 4]

A list of the best bounding boxes for each cell

scores_out : torch.Tensor of size [N/B, 1]

A list of the confidence for the corresponding bounding boxes in ‘bboxes_out’
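The selection logic reduces to an argmax over each cell's group of B boxes. The grouping of B consecutive rows per cell is an assumption about the layout of ‘bboxes’; `best_box_per_cell` is an illustrative helper, not the yolo_help source.

```python
import numpy as np

def best_box_per_cell(bboxes, scores, B=2):
    """Sketch: for each cell's group of B consecutive boxes, keep the one
    with the highest confidence. Consecutive-row grouping is an assumption."""
    grouped = scores.reshape(-1, B)                # (N/B, B) confidences
    best = grouped.argmax(axis=1)                  # winner within each group
    idx = np.arange(len(best)) * B + best          # flat index of each winner
    return bboxes[idx], scores.reshape(-1)[idx, None]

boxes = np.arange(4 * 4, dtype=float).reshape(4, 4)  # N=4 boxes, B=2 -> 2 cells
conf = np.array([0.1, 0.9, 0.7, 0.2])
best_boxes, best_conf = best_box_per_cell(boxes, conf)
```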

yolo_help.get_reg_targets(assignment_inds, bbox, B, shape, stride)

Compute the true bounding box parameters that we want to predict.

Parameters:

assignment_inds : numpy array of size X

The idx at position i corresponds to the bounding box output from the YOLO framework which most accurately encompasses bounding box ‘i’ in the ‘bbox’ (ground truth labels) parameter

bbox : torch.Tensor of size [X, 4]

X is the true number of bboxes associated with image I

B : int

The number of bboxes generated by the YOLO neural network at each block

shape : 4-dimensional torch.Tensor

The shape of the data directly output from the YOLO neural network

stride : int

The number of pixels to pass over when convolving the input image in each layer (Automatically defined when initializing the network)

Returns:

shiftx : numpy array of shape [X,]

Defines the shift in x needed to align bbox[i] with the corresponding best estimated bounding box

shifty : numpy array of shape [X,]

Defines the shift in y needed to align bbox[i] with the corresponding best estimated bounding box

scalex : numpy array of shape [X,]

Defines the scale in x needed to align bbox[i] with the corresponding best estimated bounding box

scaley : numpy array of shape [X,]

Defines the scale in y needed to align bbox[i] with the corresponding best estimated bounding box
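One plausible parameterization of these targets is a center offset plus a width/height ratio, sketched below. This is a hedged guess at the scheme; the exact parameterization used by yolo_help.get_reg_targets may differ, and `reg_targets` is a hypothetical name. Boxes are [cx, cy, w, h].

```python
import numpy as np

def reg_targets(bbox_true, bbox_pred):
    """Illustrative regression targets: shift of the true center relative to
    the assigned predicted box, and the ratio of true to predicted size.
    An assumption about the parameterization, not the yolo_help source."""
    shiftx = bbox_true[:, 0] - bbox_pred[:, 0]
    shifty = bbox_true[:, 1] - bbox_pred[:, 1]
    scalex = bbox_true[:, 2] / bbox_pred[:, 2]
    scaley = bbox_true[:, 3] / bbox_pred[:, 3]
    return shiftx, shifty, scalex, scaley

sx, sy, kx, ky = reg_targets(np.array([[12., 10., 8., 8.]]),
                             np.array([[10., 10., 4., 8.]]))
# sx = 2 (shift right), kx = 2 (twice as wide), sy = 0, ky = 1
```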

yolo_help.imshow(I, ax, **kwargs)

This function normalizes the image I and plots it on the axes object ax.

Parameters:

I : 1 x N x N array[numpy.float64]

An image

ax : matplotlib.axes._axes.Axes object

An empty axis onto which I will be plotted

Returns:

None
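The normalization step presumably rescales the image into a fixed display range before plotting; a min-max version is sketched below. The exact scheme in yolo_help.imshow may differ, and `normalize` is a hypothetical name.

```python
import numpy as np

def normalize(I):
    """Min-max normalize an image into [0, 1] for display. An assumption
    about the normalization imshow applies; constant images map to zeros."""
    I = np.asarray(I, dtype=float)
    lo, hi = I.min(), I.max()
    return (I - lo) / (hi - lo) if hi > lo else np.zeros_like(I)

normalize(np.array([[0.0, 5.0], [10.0, 2.5]]))
# -> [[0. , 0.5 ], [1. , 0.25]]
```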

yolo_help.iou(bbox0, bbox1, nopairwise=False)

Calculate the pairwise IOU between a set of estimated bounding boxes (bbox0) and the set of ground truth bounding boxes (bbox1).

Parameters:

bbox0 : torch.Tensor of size [N, 4]

N is the number of bounding boxes

bbox1 : torch.Tensor of size [N, 4]

N is the number of bounding boxes

nopairwise : bool

If True, compute pointwise, not pairwise, IOU

Returns:

out : array of length N

An array containing the pairwise (or pointwise) IOU values between the elements of bbox0 and bbox1
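The pointwise case (nopairwise=True) can be sketched in numpy for boxes in [cx, cy, w, h] form. This is an illustrative reimplementation of the standard intersection-over-union formula, not the yolo_help source.

```python
import numpy as np

def iou_pointwise(bbox0, bbox1):
    """Pointwise IOU between matched pairs of [cx, cy, w, h] boxes.
    Illustrative helper; assumes bbox0 and bbox1 have the same length."""
    def corners(b):
        return (b[:, 0] - b[:, 2] / 2, b[:, 1] - b[:, 3] / 2,
                b[:, 0] + b[:, 2] / 2, b[:, 1] + b[:, 3] / 2)
    x00, y00, x01, y01 = corners(bbox0)
    x10, y10, x11, y11 = corners(bbox1)
    # Overlap width/height, clipped at zero for disjoint boxes
    iw = np.clip(np.minimum(x01, x11) - np.maximum(x00, x10), 0, None)
    ih = np.clip(np.minimum(y01, y11) - np.maximum(y00, y10), 0, None)
    inter = iw * ih
    union = bbox0[:, 2] * bbox0[:, 3] + bbox1[:, 2] * bbox1[:, 3] - inter
    return inter / union

iou_pointwise(np.array([[5., 5., 4., 4.]]),
              np.array([[5., 5., 4., 4.]]))  # -> array([1.])
```

The pairwise case would broadcast the same computation over all N x N box pairs rather than matched rows.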

yolo_help.train_yolo_model(nepochs, lr, cls_loss, outdir, modelname, optimizername, lossname, verbose=False, resume=False, J_path=None)

Train a neural network defined by the YOLO framework and other provided hyperparameters using the simulated dataset ‘groundtruth’.

Parameters:

nepochs : int

The number of epochs used to train the model

lr : float

The learning rate used to train the model

outdir : str

The output directory where all files will be saved during training, or where all files were saved if the model has already been trained.

modelname : str

The file name of the model used during training

optimizername : str

The file name of the optimizer used during training

lossname : str

The file name of the losses computed during training

resume : bool

Default - False; If True, resume the training of model ‘outdir/modelname’ or load the pretrained model saved at ‘outdir/modelname’

J_path : str

Default - None; The file path to the image to be used during the validation portion of training

Returns:

net : torch.nn.Module

A neural network which has been trained on a simulated dataset