mindvideo.utils¶

EvalLossMonitor¶

class mindvideo.utils.EvalLossMonitor(model)

Monitor for loss in validation.

base: Callback

Parameters:

model(str): The model to monitor.

Return:

None

def mindvideo.utils.EvalLossMonitor.epoch_begin(run_context)

Record time at the beginning of epoch.

Parameters:

run_context (RunContext): Context of the process running.

Return:

None

def mindvideo.utils.EvalLossMonitor.epoch_end(run_context)

Print training info at the end of epoch.

Parameters:

run_context (RunContext): Context of the process running.

Return:

None

def mindvideo.utils.EvalLossMonitor.step_begin(run_context)

Record time at the beginning of step.

Parameters:

run_context (RunContext): Context of the process running.

Return:

None

def mindvideo.utils.EvalLossMonitor.step_end(run_context)

Print training info at the end of step.

Parameters:

run_context (RunContext): Context of the process running.

Return:

None

ValAccMonitor¶

class mindvideo.utils.ValAccMonitor(model: ms.Model, dataset_val: ms.dataset, num_epochs: int, interval: int = 1, eval_start_epoch: int = 1, save_best_ckpt: bool = True, ckpt_directory: str = "./", best_ckpt_name: str = "best.ckpt", metric_name: str = "Accuracy", dataset_sink_mode: bool = True)

Monitors the training loss and the validation accuracy. After each epoch, it saves the best checkpoint file, the one with the highest validation accuracy.

base: Callback

Parameters:

model (ms.Model): The model to monitor.
dataset_val (ms.dataset): The dataset that the model needs.
num_epochs (int): The number of epochs.
interval (int): Number of epochs between validation runs and log output. Default: 1.
eval_start_epoch (int): The epoch at which validation starts. Default: 1.
save_best_ckpt (bool): Whether to save the best-performing checkpoint file. Default: True.
ckpt_directory (str): The directory in which to save checkpoint files. Default: './'.
best_ckpt_name (str): The file name of the best-performing checkpoint file. Default: 'best.ckpt'.
metric_name (str): The name of the metric used for model evaluation. Default: 'Accuracy'.
dataset_sink_mode (bool): Whether to use the dataset sinking mode. Default: True.

Raises:

ValueError: If interval is less than 1.

Return:

None

def mindvideo.utils.ValAccMonitor.apply_eval()

Evaluate the model and return the validation accuracy.

Parameters:

None

Return:

Validation accuracy.

def mindvideo.utils.ValAccMonitor.epoch_end(run_context)

After each epoch, print the training loss and validation accuracy, and save the checkpoint file with the highest validation accuracy.

Parameters:

run_context (RunContext): Context of the process running.

Return:

None

def mindvideo.utils.ValAccMonitor.end(run_context)

Print the best validation accuracy after network training.

Parameters:

run_context (RunContext): Context of the process running.

Return:

None

SaveCallback¶

class mindvideo.utils.SaveCallback(eval_model, ds_eval)

Callback for checkpoint saving.

base: Callback

Parameters:

eval_model (ms.Model): The model to evaluate.
ds_eval (ms.dataset): The dataset used for evaluation.

Return:

None

def mindvideo.utils.SaveCallback.step_end(run_context)

At the end of each step, save the maximum accuracy checkpoint.

Parameters:

run_context (RunContext): Context of the process running.

Return:

None

LossMonitor¶

class mindvideo.utils.LossMonitor(lr_init: Optional[Union[float, Iterable]] = None, per_print_times: int = 1)

Loss Monitor for classification.

base: Callback

Parameters:

lr_init (Union[float, Iterable], optional): The learning rate schedule. Default: None.
per_print_times (int): Number of steps between log prints. Default: 1.

Return:

None

def mindvideo.utils.LossMonitor.epoch_begin(run_context)

Record time at the beginning of epoch.

Parameters:

run_context (RunContext): Context of the process running.

Return:

None

def mindvideo.utils.LossMonitor.epoch_end(run_context)

Print training info at the end of epoch.

Parameters:

run_context (RunContext): Context of the process running.

Return:

None

def mindvideo.utils.LossMonitor.step_begin(run_context)

Record time at the beginning of step.

Parameters:

run_context (RunContext): Context of the process running.

Return:

None

def mindvideo.utils.LossMonitor.step_end(run_context)

Print training info at the end of step.

Parameters:

run_context (RunContext): Context of the process running.

Return:

None

ClassFactory¶

class mindvideo.utils.ClassFactory()

Module class factory for builder.

Parameters:

None

Return:

None

def mindvideo.utils.ClassFactory.register(cls, module_type=ModuleType.GENERAL, alias=None)

Register class into registry.

Parameters:

module_type (ModuleType): Module type name. Default: ModuleType.GENERAL.
alias (str): Class alias. Default: None.

Returns:

Wrapper.

def mindvideo.utils.ClassFactory.wrapper(register_class)

Register class with wrapper function.

Parameters:

register_class: The class to be registered.

Returns:

Wrapper of register_class.

def mindvideo.utils.ClassFactory.register_cls(cls, register_class, module_type=ModuleType.GENERAL, alias=None)

Register class with type name into registry.

Parameters:

register_class: The class to be registered.
module_type (ModuleType): Module type name. Default: ModuleType.GENERAL.
alias (str): Class alias name. Default: None.

Returns:

register_class.

def mindvideo.utils.ClassFactory.is_exist(cls, module_type, class_name=None)

Determine whether class name is in the current type group.

Parameters:

module_type (ModuleType): Module type.
class_name (str): Class name.

Returns:

Bool.

def mindvideo.utils.ClassFactory.get_cls(cls, module_type, class_name=None)

Get class.

Parameters:

module_type (ModuleType): Module type.
class_name (str): Class name.

Returns:

register_class.

def mindvideo.utils.ClassFactory.get_instance_from_cfg(cls, cfg, module_type=ModuleType.GENERAL, default_args=None)

Get an instance from a config.

Parameters:

cfg (dict): Config dict, which should at least contain the key "type".
module_type (ModuleType): Module type. Default: ModuleType.GENERAL.
default_args (dict, optional): Default initialization arguments.

Returns:

obj: The constructed object.

def mindvideo.utils.ClassFactory.get_instance(cls, module_type=ModuleType.GENERAL, obj_type=None, args=None)

Get instance by ModuleType with object type.

Parameters:

module_type (ModuleType): Module type. Default: ModuleType.GENERAL.
obj_type (str): Class type.
args (dict): Object arguments.

Returns:

obj: The constructed object.
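
The register-then-build pattern described above can be sketched in a few lines. This is only an illustration, not the mindvideo implementation: `MiniFactory`, `TinyModel`, and the use of plain strings in place of `ModuleType` members are all hypothetical, and how default arguments are merged is an assumption.

```python
from collections import defaultdict


class MiniFactory:
    """Hypothetical stand-in for a class registry (not mindvideo's ClassFactory)."""

    # Maps module type -> {class name or alias -> class}.
    registry = defaultdict(dict)

    @classmethod
    def register(cls, module_type="general", alias=None):
        """Return a decorator that stores the decorated class in the registry."""
        def wrapper(register_class):
            name = alias if alias is not None else register_class.__name__
            cls.registry[module_type][name] = register_class
            return register_class
        return wrapper

    @classmethod
    def is_exist(cls, module_type, class_name=None):
        """Check whether a type group, or a name inside it, is registered."""
        if class_name is None:
            return module_type in cls.registry
        return class_name in cls.registry.get(module_type, {})

    @classmethod
    def get_instance_from_cfg(cls, cfg, module_type="general", default_args=None):
        """Build an object from a dict whose "type" key names a registered class."""
        args = dict(cfg)  # copy so the caller's config is left untouched
        obj_type = args.pop("type")
        # Assumption: defaults only fill in keys the config does not set.
        for key, value in (default_args or {}).items():
            args.setdefault(key, value)
        return cls.registry[module_type][obj_type](**args)


@MiniFactory.register("model", alias="tiny")
class TinyModel:
    def __init__(self, depth=1, width=8):
        self.depth = depth
        self.width = width


model = MiniFactory.get_instance_from_cfg(
    {"type": "tiny", "depth": 3}, module_type="model", default_args={"width": 16}
)
```

Because the decorator returns the class unchanged, registration has no effect on how the class behaves when it is used directly.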

recur_list2tuple¶

def mindvideo.utils.recur_list2tuple(d)

Transform list data in dict into tuple recursively.

Parameters:

d (list): The data whose nested lists are transformed into tuples.

Returns:

Tuple.
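
A recursive list-to-tuple conversion could look roughly like the sketch below. The helper name `list_to_tuple` is hypothetical, and its handling of dicts (converted value by value and returned as a dict) is an assumption rather than the documented behavior of `recur_list2tuple`.

```python
def list_to_tuple(d):
    """Recursively replace lists with tuples inside nested dicts and lists."""
    if isinstance(d, dict):
        # Assumption: dicts keep their keys and only their values are converted.
        return {key: list_to_tuple(value) for key, value in d.items()}
    if isinstance(d, (list, tuple)):
        return tuple(list_to_tuple(item) for item in d)
    return d


converted = list_to_tuple({"size": [224, 224], "stages": [[1, 2], {"k": [3]}]})
```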

Config¶

class mindvideo.utils.Config(*args, **kwargs)

The Config class inherits from dict. It can parse arguments from a YAML config file or from a dict.

base: dict

Parameters:

args (list): Config file names.
kwargs (dict): Config dictionary list.

Returns:

None

def mindvideo.utils.Config.__getattr__(key)

Get an object attribute by key.

Parameters:

key (str): The name of the object attribute.

Returns:

The attribute of the object whose name is key.

def mindvideo.utils.Config.__setattr__(key, value)

Set the object attribute named key to value.

Parameters:

key (str): The name of the object attribute.
value: The value to assign to the target attribute.

Returns:

None

def mindvideo.utils.Config.__delattr__(key)

Delete an object attribute by its key.

Parameters:

key (str): The name of the object attribute.

Returns:

None
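
The three attribute hooks above amount to routing attribute access through dict item access. A minimal sketch follows; `AttrDict` is a hypothetical name, and raising `AttributeError` for a missing key is an assumption, not confirmed behavior of `Config`.

```python
class AttrDict(dict):
    """Hypothetical dict subclass whose keys are also readable as attributes."""

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails.
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None

    def __setattr__(self, key, value):
        self[key] = value

    def __delattr__(self, key):
        try:
            del self[key]
        except KeyError:
            raise AttributeError(key) from None


cfg = AttrDict(lr=0.01)
cfg.epochs = 10   # stored as cfg["epochs"]
del cfg.lr        # removes cfg["lr"]
```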

def mindvideo.utils.Config.merge_from_dict(options)

Merge options into config file.

Parameters:

options(dict): dict of configs to merge from.

Returns:

None

def mindvideo.utils.Config._merge_into(a, b)

Merge dict a into dict b. Values in a overwrite those in b.

Parameters:

a (dict): The source dict to be merged into b.
b (dict): The original dict, which receives the keys from a.

Returns:

dict: The dict b, modified using a.
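
One way to read "values in a overwrite b" is sketched below. The function name `merge_into` is hypothetical, and the recursive handling of nested dicts is an assumption; the source only states that values from a take precedence.

```python
def merge_into(a, b):
    """Merge dict a into dict b in place; values from a win on conflicts."""
    for key, value in a.items():
        if isinstance(value, dict) and isinstance(b.get(key), dict):
            # Assumption: nested dicts are merged key by key, not replaced whole.
            merge_into(value, b[key])
        else:
            b[key] = value
    return b


base = {"optimizer": {"type": "sgd", "lr": 0.1}, "epochs": 10}
merged = merge_into({"optimizer": {"lr": 0.01}, "epochs": 20}, base)
```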

def mindvideo.utils.Config._file2dict(file_name=None)

Convert config file to dictionary.

Parameters:

file_name(str): Config file.

Returns:

dict

def mindvideo.utils.Config._dict2config(config, dic)

Convert dictionary to config.

Parameters:

config: Config object.
dic(dict): dictionary.

Returns:

None

ActionDict¶

class mindvideo.utils.ActionDict()

Argparse action that splits an option of the form KEY=VALUE on the first = and appends the result to a dictionary. List options can be passed as comma-separated values,

e.g. 'KEY=Val1,Val2,Val3', or with explicit brackets, e.g. 'KEY=[Val1,Val2,Val3]'.

base: Action

Parameters:

None

Returns:

None

def mindvideo.utils.ActionDict._parse_int_float_bool(val)

Convert the string val to an int, a float, or a bool, or leave it unchanged.

Parameters:

val (str): Value string.

Returns:

Int or float or bool or str.
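
The conversion order can be sketched as "try int, then float, then bool, else keep the string". The helper name `parse_scalar` and the case-insensitive matching of 'true' and 'false' are assumptions made for illustration.

```python
def parse_scalar(val):
    """Convert a string to int, float, or bool when possible, else return it as-is."""
    for cast in (int, float):
        try:
            return cast(val)
        except ValueError:
            pass
    # Assumption: booleans are matched case-insensitively.
    if val.lower() in ("true", "false"):
        return val.lower() == "true"
    return val
```

Trying `int` before `float` keeps '3' as an integer instead of turning it into 3.0.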

def mindvideo.utils.ActionDict.find_next_comma(val_str)

Find the position of next comma in the string.

Note:

'(' and ')' or '[' and ']' must appear in pairs or not at all.

Parameters:

val_str (str): Value string.

Returns:

Int.
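
Finding the next comma while skipping commas nested inside brackets can be done with a depth counter. In this sketch the name `next_top_level_comma` is hypothetical, and returning the string length when no top-level comma exists is an assumption.

```python
def next_top_level_comma(val_str):
    """Return the index of the first comma outside brackets, or len(val_str)."""
    depth = 0
    for index, char in enumerate(val_str):
        if char in "([":
            depth += 1
        elif char in ")]":
            depth -= 1
        elif char == "," and depth == 0:
            return index
    # Assumption: no top-level comma means the whole string is one item.
    return len(val_str)
```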

def mindvideo.utils.ActionDict._parse_value_iter(val)

Convert a string formatted as a list or tuple into a Python list or tuple object.

Parameters:

val (str): Value string.

Returns:

List or tuple.

Examples:

>>> ActionDict._parse_value_iter('1,2,3')
[1, 2, 3]
>>> ActionDict._parse_value_iter('[1,2,3]')
[1, 2, 3]
>>> ActionDict._parse_value_iter('(1,2,3)')
(1, 2, 3)
>>> ActionDict._parse_value_iter('[1,[1,2],(1,2,3)]')
[1, [1, 2], (1, 2, 3)]

parse_args¶

def mindvideo.utils.parse_args()

Parse arguments from yaml config file.

Parameters:

None

Returns:

object: arg parse object.

gaussian_radius¶

def mindvideo.utils.gaussian_radius(det_size, min_overlap=0.7)

Compute the Gaussian radius within which label values are set for a ground-truth (gt) bounding box. The reasons for using a Gaussian radius are given in the paper: https://arxiv.org/abs/1808.01244.

Parameters:

det_size (tuple[int]): Size of the ground-truth bounding box.
min_overlap (float): IoU threshold between the gt bbox and any bbox that lies within the radius. Default: 0.7.

Returns:

Minimum radius that meets the overlap condition.

gaussian2d¶

def mindvideo.utils.gaussian2d(shape, sigma=1)

Gaussian2d heatmap.

Parameters:

shape (tuple[int]): x, y radius of the Gaussian distribution.
sigma (int, float): Standard deviation of the Gaussian distribution. Default: 1.

Returns:

Gaussian heatmap mask.

draw_umich_gaussian¶

def mindvideo.utils.draw_umich_gaussian(heatmap, center, radius, k=1)

Draw umich gaussian, apply gaussian distribution to heatmap.

Parameters:

heatmap (numpy.ndarray): Heatmap.
center (sequence[int]): Center of gaussian mask.
radius (int, float): Radius of gaussian mask.
k (int, float): Multiplier for gaussian mask values.

Returns:

Heatmap.

draw_msra_gaussian¶

def mindvideo.utils.draw_msra_gaussian(heatmap, center, sigma)

Draw msra gaussian, apply gaussian distribution to heatmap.

Parameters:

heatmap (numpy.ndarray): Heatmap.
center (sequence[int]): Center of gaussian mask.
sigma (int, float): Standard deviation of the Gaussian distribution.

Returns:

Heatmap.

compute_mask¶

def mindvideo.utils.compute_mask(depth, height, width, window_size, shift_size)

Calculate attention mask for SW-MSA.

Parameters:

depth, height, width (int): Numbers of depth, height, width dimensions.
window_size (tuple[int]): Input window size.
shift_size (tuple[int]): Input shift size.

Returns:

Tensor, attention mask.

get_mask¶

def mindvideo.utils.get_mask(tensor)

Get image masks.

Parameters:

tensor (Tensor): Input tensor.

Returns:

Tensor.

_max_by_axis¶

def mindvideo.utils._max_by_axis(the_list)

Parameters:

the_list (List[List[int]]).

Returns:

List[int].
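
Judging by the name and the signature, this helper takes the element-wise maximum over a list of equal-length integer lists, for example to find a padded shape that fits every image in a batch. The sketch below illustrates that reading only; `max_by_axis` is a hypothetical name and the interpretation itself is an assumption.

```python
def max_by_axis(the_list):
    """Return the element-wise maximum across equal-length lists of ints."""
    maxes = list(the_list[0])  # copy so the first sublist is not mutated
    for sublist in the_list[1:]:
        for index, item in enumerate(sublist):
            maxes[index] = max(maxes[index], item)
    return maxes


# Largest extent along each axis of two (C, H, W) shapes.
padded_shape = max_by_axis([[3, 224, 200], [3, 180, 256]])
```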

nested_tensor_from_tensor_list¶

def mindvideo.utils.nested_tensor_from_tensor_list(tensor_list, split=True)

Normalize the input image data.

Parameters:

tensor_list (Tensor)
split (bool)

Returns:

Two tensors.

cal_for_frames¶

def mindvideo.utils.cal_for_frames(video_path)

Calculate optical flow using a list of frames.

Parameters:

video_path (string): Path to video.

Returns:

List.

cal_for_video¶

def mindvideo.utils.cal_for_video(video_path)

Calculate optical flow of a video.

Parameters:

video_path (string): Path to video.

Returns:

List.

compute_tvl1¶

def mindvideo.utils.compute_tvl1(prev, curr, bound=20)

Compute the TV-L1 optical flow.

Parameters:

prev: previous frame.
curr: current frame.

Returns:

array

save_flow¶

def mindvideo.utils.save_flow(video_flows, flow_path, save_format='jpg')

Save video flows in the specified format.

Parameters:

video_flows (obj): Object of video flow.
flow_path (str): The path where the optical flow is saved.
save_format (str): Optical flow save format, either 'npy' or 'jpg'. Default: 'jpg'.

Returns:

None

extract_flow¶

def mindvideo.utils.extract_flow(video_path, flow_path, save_format='jpg')

Extract flow from video frames.

Parameters:

video_path (str): The path of the video. If video_path is a directory, the function extracts optical flow from the JPEG images in that directory. If video_path is a video file, it extracts optical flow frame by frame.
flow_path (str): The path where the optical flow is saved.
save_format (str): Optical flow save format, either 'npy' or 'jpg'. Default: 'jpg'.

Returns:

None

Example:

>>> vpath = "./path_to_video"
>>> save_path = "./path_to_saved_flow"
>>> extract_flow(vpath, save_path)

round_width¶

def mindvideo.utils.round_width(width, multiplier, min_width=8, divisor=8)

Round width of filters based on width multiplier.

Parameters:

width (int): The channel dimensions of the input.
multiplier (float): The multiplication factor.
min_width (int): The minimum width after multiplication.
divisor (int): The new width should be divisible by divisor.

Returns:

Int: The rounded width of the filters.
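
A common way to implement this kind of rounding, used in several channel-scaling networks, is to scale the width, snap it to the nearest multiple of the divisor, and bump it up if the result lost more than 10% of the scaled value. The sketch below follows that convention. Whether mindvideo applies exactly these rules, including the 10% check and the early return for a zero multiplier, is an assumption, and `round_width_sketch` is a hypothetical name.

```python
def round_width_sketch(width, multiplier, min_width=8, divisor=8):
    """Scale width by multiplier and round to a multiple of divisor."""
    if not multiplier:
        # Assumption: a zero or missing multiplier leaves the width unchanged.
        return width
    width *= multiplier
    min_width = min_width or divisor
    # Round to the nearest multiple of divisor, but never below min_width.
    width_out = max(min_width, int(width + divisor / 2) // divisor * divisor)
    # Assumption: avoid shrinking by more than 10% of the scaled width.
    if width_out < 0.9 * width:
        width_out += divisor
    return int(width_out)
```

For instance, a width of 24 with a multiplier of 2.0 stays at the exact multiple 48, while a width of 10 with a multiplier of 1.5 is rounded from 15 to 16.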

drop_path¶

def mindvideo.utils.drop_path(x: Tensor, drop_prob: float = 0.0, training: bool = False)

Stochastic Depth per sample.

Parameters:

x (Tensor): Input feature.
drop_prob (float): The probability of dropping. Default: 0.0.
training (bool): Whether the model is in training mode. Default: False.

Returns:

Tensor

reisze_mean¶

def mindvideo.utils.reisze_mean(data_dir, save_dir=None, height=240, width=320, interpolation='bilinear', norm=True)

Calculate mean of resized video frames.

Parameters:

data_dir (str): The directory of videos. The file structure should look like this:

|-- data_dir
    |-- class1
        |-- video1-1
        |-- video1-2
        ...
    |-- class2
        |-- video2-1
        |-- video2-2

save_dir (Union[str, None]): The directory in which to save the resized mean. If None, the result is not saved to disk.
height (int): Height of the resized video frames.
width (int): Width of the resized video frames.
interpolation (str): Method used to resize the frames. It can be 'bilinear', 'nearest', 'linear', or 'bicubic'. Default: 'bilinear'.
norm (bool): Whether to normalize the resized frames. If True, the resized mean is divided by 255.

Returns:

resized mean (numpy.ndarray): Mean of the resized video frames, with shape (height, width, 3).

Example:

>>> vmean = reisze_mean(data_dir="/home/publicfile/UCF101/train",
...                     save_dir="./",
...                     height=128,
...                     width=128)
>>> print(vmean.shape)

six_padding¶

def mindvideo.utils.six_padding(padding)

Convert a padding list into a tuple of 6 integers.
If padding is an int, returns (padding, padding, padding, padding, padding, padding).
If padding's length is 3, returns (padding[0], padding[0], padding[1], padding[1], padding[2], padding[2]).
If padding's length is 6, returns (padding[0], padding[1], padding[2], padding[3], padding[4], padding[5]).

Parameters:

padding (Union[int, tuple, list]): Padding list that has a length of 1, 3 or 6.

Returns:

Tuple of shape (6,).
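
The three cases described above translate directly into code. This is a sketch of the documented behavior, not the library source: `six_padding_sketch` is a hypothetical name, and the treatment of a length-1 sequence and the error raised for other lengths are assumptions.

```python
def six_padding_sketch(padding):
    """Expand an int or a sequence of length 1, 3, or 6 into a 6-tuple."""
    if isinstance(padding, int):
        return (padding,) * 6
    if len(padding) == 1:
        # Assumption: a length-1 sequence behaves like a single int.
        return (padding[0],) * 6
    if len(padding) == 3:
        # Each value is used for both sides of its dimension.
        return tuple(p for p in padding for _ in range(2))
    if len(padding) == 6:
        return tuple(padding)
    # Assumption: any other length is rejected.
    raise ValueError("padding must be an int or have length 1, 3, or 6")
```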

TaskAccuracy¶

class mindvideo.utils.TaskAccuracy(label_format='one_hot')

Calculates the accuracy for classification and multilabel data. The accuracy class has two local variables, the correct number and the total number of samples, that are used to compute the frequency with which y_pred matches y. This frequency is ultimately returned as the accuracy: an idempotent operation that simply divides the correct number by the total number.

Parameters:

eval_type (str): The metric used to calculate the accuracy over a dataset. Supports 'classification' and 'multilabel'. 'classification' means each sample has a single label. 'multilabel' means each sample has multiple labels. Default: 'classification'.
label_format (str): The format of the output label. Default: 'one_hot'.

Return:

None

Examples:

>>> import numpy as np
>>> import mindspore
>>> from mindspore import nn, Tensor
>>>
>>> x = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]), mindspore.float32)
>>> y = Tensor(np.array([[1, 0], [1, 0], [0, 1]]), mindspore.float32)
>>> metric = nn.Accuracy('one_hot')
>>> metric.clear()
>>> metric.update(x, y)
>>> accuracy = metric.eval()

def mindvideo.utils.TaskAccuracy.update(*inputs)

Updates the local variables. For 'classification', the prediction is correct if the index of the maximum predicted value matches the label. For 'multilabel', the prediction is correct if the predicted value matches the label.

Parameters:

inputs: Logits and labels. y_pred stands for logits, y stands for labels. y_pred and y must be a Tensor, a list or an array. For the 'one_hot' evaluation type, y_pred is a list of floating-point numbers in the range [0, 1] with shape (1, N, C) in most cases (not strictly), where N is the number of cases and C is the number of categories. y must be in one-hot format with shape (1, N, C), or be transformable to one-hot format from shape (N,). For the 'single' evaluation type, y is not in one-hot format and has shape (1, N).

Raises:

ValueError: If the number of the inputs is not 2.
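
The bookkeeping described above (a correct count and a total count, divided at the end) can be sketched without any tensor library. `SimpleAccuracy` is a hypothetical class that covers only the single-label case with class-index labels; it is not the mindvideo or MindSpore implementation.

```python
class SimpleAccuracy:
    """Hypothetical accuracy metric for single-label classification."""

    def __init__(self):
        self.clear()

    def clear(self):
        self.correct = 0
        self.total = 0

    def update(self, *inputs):
        if len(inputs) != 2:
            raise ValueError("update expects exactly two inputs: y_pred and y")
        y_pred, y = inputs
        for scores, label in zip(y_pred, y):
            # The predicted class is the index of the largest score.
            predicted = max(range(len(scores)), key=scores.__getitem__)
            self.correct += int(predicted == label)
            self.total += 1

    def eval(self):
        return self.correct / self.total


metric = SimpleAccuracy()
# Same scores as the example above, with labels given as class indices.
metric.update([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]], [0, 0, 1])
accuracy = metric.eval()
```

Calling `eval()` again without a new `update()` returns the same value, since it only divides the stored counts.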

limit_window_size¶

def mindvideo.utils.limit_window_size(input_size, window_size, shift_size)

Limit the window size and shift size for W-MSA and SW-MSA. If the window size is larger than the input size, the windows are neither partitioned nor shifted.

Parameters:

input_size (tuple[int]): Input size of features. E.g. (16, 56, 56).
window_size (tuple[int]): Target window size. E.g. (8, 7, 7).
shift_size (tuple[int]): Target shift size. E.g. (4, 3, 3).

Returns:

Tuple[int], limited window size and shift size.
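
The clamping rule can be sketched per dimension: when the input is no larger than the window along an axis, the window shrinks to the input size and the shift becomes zero. `limit_window_size_sketch` is a hypothetical name; the use of `<=` in the comparison and the `(window, shift)` return layout are assumptions.

```python
def limit_window_size_sketch(input_size, window_size, shift_size):
    """Clamp window and shift sizes so windows never exceed the input."""
    use_window = list(window_size)
    use_shift = list(shift_size)
    for i, size in enumerate(input_size):
        # Assumption: an input that exactly fits one window is not shifted either.
        if size <= window_size[i]:
            use_window[i] = size
            use_shift[i] = 0
    return tuple(use_window), tuple(use_shift)


# Depth 4 is smaller than the depth window 8, so that axis is clamped.
window, shift = limit_window_size_sketch((4, 56, 56), (8, 7, 7), (4, 3, 3))
```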

window_partition¶

def mindvideo.utils.window_partition(features, window_size)

Window partition function for Swin Transformer.

Parameters:

features: Original features of shape (B, D, H, W, C).
window_size (tuple[int]): Window size.

Returns:

Tensor of shape (B * num_windows, window_size * window_size, C).

window_reverse¶

def mindvideo.utils.window_reverse(windows, window_size, batch_size, depth, height, width)

Window reverse function for Swin Transformer.

Parameters:

windows: Partitioned features of shape (B*num_windows, window_size, window_size, C).
window_size (tuple[int]): Window size.
batch_size (int): Batch size of video.
depth (int): depth of video.
height (int): Height of video.
width (int): Width of video.

Returns:

Tensor of shape (B, D, H, W, C).