Crate ocl_convolution

OpenCL-accelerated 2D convolutions.

Convolution is a fundamental building block in signal processing. This crate focuses on 2D convolutions (i.e., the signal is a still image) in the context of deep learning (more precisely, convolutional neural networks). In this context, a convolution layer may contain many filters (on the order of hundreds), and the input may contain many channels (on the order of hundreds or thousands) rather than the traditional 3 or 4. Computing such convolutions is computationally heavy and can be accelerated effectively with OpenCL.
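To give a sense of scale, the following sketch counts multiply-accumulate (MAC) operations for a single convolution layer. The helper `conv_macs` and the layer sizes are illustrative assumptions, not part of this crate: a small 3-channel input with 2 filters is compared to a 3x3 layer with 256 input channels and 256 filters on a 56x56 map, a size typical of mid-network layers in image-classification CNNs.

```rust
/// Hypothetical helper: MAC count for one 2D convolution layer (batch size 1).
/// Each output element needs `k * k * (in_channels / groups)` MACs,
/// and there are `out_h * out_w * filters` output elements.
fn conv_macs(
    out_h: u64,
    out_w: u64,
    filters: u64,
    in_channels: u64,
    groups: u64,
    k: u64,
) -> u64 {
    let macs_per_output = k * k * (in_channels / groups);
    out_h * out_w * filters * macs_per_output
}

fn main() {
    // Traditional 3-channel input: 4x4 output, 2 filters of size 3x3.
    let small = conv_macs(4, 4, 2, 3, 1, 3);
    // Deep-learning-sized layer: 56x56 output, 256 -> 256 channels, 3x3 filters.
    let large = conv_macs(56, 56, 256, 256, 1, 3);
    println!("small: {small} MACs, large: {large} MACs");
}
```

The large layer needs roughly 1.85 billion MACs for a single image, about two million times more than the small one.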

Features

The crate implements convolutions on two numerical formats:

  • Single-precision floats (f32)
  • Signed 8-bit integers with a 32-bit multiply-add accumulator (this format is frequently denoted int8/32 in the deep learning literature). Quantization parameters are applied uniformly to the entire layer.
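The int8/32 scheme can be sketched on the CPU as below. This is a hypothetical illustration of a common fixed-point requantization pattern, not necessarily the exact formula used by this crate's kernels. The helper names (`to_fixed_point`, `requantize`), the convention that biases are added to the operands before multiplication, and the rounding behavior are all assumptions; the parameter names only mirror the fields of `I8Params`. Products of 8-bit values are summed in an `i32` accumulator, which is then multiplied by a fixed-point scale (`multiplier / 2^bit_shift`), offset, and clamped back to `i8`.

```rust
/// Hypothetical: encode a float scale as a fixed-point multiplier,
/// so that `scale ≈ multiplier / 2^bit_shift`.
fn to_fixed_point(bit_shift: u8, scale: f32) -> i32 {
    (scale * (1_i64 << bit_shift) as f32).round() as i32
}

/// Hypothetical int8/32 dot product followed by requantization to i8.
fn requantize(
    signal: &[i8],
    filter: &[i8],
    signal_bias: i32,
    filter_bias: i32,
    multiplier: i32,
    bit_shift: u8,
    output_bias: i32,
) -> i8 {
    // 8-bit operands are widened to i32 and accumulated in 32 bits
    // (assumed convention: biases are added to each operand first).
    let acc: i32 = signal
        .iter()
        .zip(filter)
        .map(|(&s, &f)| (i32::from(s) + signal_bias) * (i32::from(f) + filter_bias))
        .sum();
    // Multiply in 64 bits to avoid overflow, then shift right to apply the
    // fractional scale (arithmetic shift, i.e., rounding toward -infinity).
    let scaled = (i64::from(acc) * i64::from(multiplier)) >> bit_shift;
    let out = scaled + i64::from(output_bias);
    // Saturate to the i8 range.
    out.clamp(i64::from(i8::MIN), i64::from(i8::MAX)) as i8
}

fn main() {
    // scale = 0.5 with 16 fractional bits.
    let multiplier = to_fixed_point(16, 0.5);
    // acc = 1*4 + 2*5 + 3*6 = 32; 32 * 0.5 = 16; 16 - 10 = 6.
    let out = requantize(&[1, 2, 3], &[4, 5, 6], 0, 0, multiplier, 16, -10);
    println!("{out}"); // prints 6
}
```

Keeping the scale as an integer multiplier plus a shift lets the whole computation stay in integer arithmetic, which is the usual motivation for int8/32 formats.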

In both cases, dilated and grouped convolutions are supported.
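The output spatial size of a dilated (and possibly strided and padded) convolution, and the channel bookkeeping of a grouped one, follow the standard definitions sketched below. The helper names are hypothetical, and the assumption that each filter sees `in_channels / groups` channels reflects the common convention rather than anything stated on this page. For a 6x6 input with 3x3 filters, unit strides and dilation, and no padding, the formula gives a 4x4 output.

```rust
/// Output length along one spatial axis (standard deep-learning definition).
fn output_len(
    input: usize,
    kernel: usize,
    stride: usize,
    dilation: usize,
    pad_before: usize,
    pad_after: usize,
) -> usize {
    // A dilated kernel of size k spans dilation * (k - 1) + 1 input elements.
    let effective_kernel = dilation * (kernel - 1) + 1;
    let padded = input + pad_before + pad_after;
    assert!(padded >= effective_kernel, "kernel does not fit into padded input");
    (padded - effective_kernel) / stride + 1
}

/// In a grouped convolution, input channels and filters are split into
/// `groups` equal parts; each filter only sees `in_channels / groups` channels.
fn channels_per_filter(in_channels: usize, filters: usize, groups: usize) -> Option<usize> {
    if groups == 0 || in_channels % groups != 0 || filters % groups != 0 {
        None
    } else {
        Some(in_channels / groups)
    }
}

fn main() {
    // 6x6 input, 3x3 kernel: 4x4 output.
    println!("plain: {}", output_len(6, 3, 1, 1, 0, 0));
    // Dilation 2 makes the 3x3 kernel span 5 input pixels: 2x2 output.
    println!("dilated: {}", output_len(6, 3, 1, 2, 0, 0));
    // 8 channels, 4 filters, 2 groups: each filter has 4 channels.
    println!("grouped: {:?}", channels_per_filter(8, 4, 2));
}
```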

Implementation details

The implementation uses an output-stationary workflow (see, e.g., this paper for the definition); that is, each element of the output tensor is computed in a single run of the OpenCL kernel. This minimizes memory overhead, but may not be the fastest algorithm.
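To make the term concrete, here is a naive CPU sketch of output-stationary computation: each output element has its own accumulator, which is filled from the signal and the filters and written to memory exactly once. This illustrates the dataflow only and is not this crate's OpenCL kernel. The function name, the `[filters, height, width, channels]` filter layout, and the restrictions (batch size 1, no padding, unit stride and dilation, a single group) are simplifying assumptions.

```rust
/// Naive output-stationary 2D convolution (f32, NHWC signal with batch size 1,
/// filters laid out as [f, k, k, c]). Returns an NHWC output of shape
/// [h - k + 1, w - k + 1, f].
fn conv2d_output_stationary(
    signal: &[f32],
    h: usize,
    w: usize,
    c: usize,
    filters: &[f32],
    f: usize,
    k: usize,
) -> Vec<f32> {
    assert_eq!(signal.len(), h * w * c);
    assert_eq!(filters.len(), f * k * k * c);
    let (out_h, out_w) = (h - k + 1, w - k + 1);
    let mut output = vec![0.0; out_h * out_w * f];
    // The three outer loops enumerate output elements.
    for oy in 0..out_h {
        for ox in 0..out_w {
            for of in 0..f {
                // Accumulator stays local; the output is written once.
                let mut acc = 0.0;
                for ky in 0..k {
                    for kx in 0..k {
                        for ch in 0..c {
                            let s = signal[((oy + ky) * w + (ox + kx)) * c + ch];
                            let wgt = filters[((of * k + ky) * k + kx) * c + ch];
                            acc += s * wgt;
                        }
                    }
                }
                output[(oy * out_w + ox) * f + of] = acc;
            }
        }
    }
    output
}

fn main() {
    // 3x3 single-channel signal with values 1..=9 and one 2x2 filter of ones.
    let signal: Vec<f32> = (1..=9).map(|v| v as f32).collect();
    let filters = vec![1.0; 4];
    let out = conv2d_output_stationary(&signal, 3, 3, 1, &filters, 1, 2);
    println!("{out:?}"); // prints [12.0, 16.0, 24.0, 28.0]
}
```

In an OpenCL version, the three outer loops would naturally become the global work range, with one work item per output element.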

Examples

Floating-point convolution

use ndarray::Array4;
use rand::{Rng, thread_rng};
use ocl_convolution::{Convolution, FeatureMap, Params};

let convolution = Convolution::f32(3)?.build(Params {
    strides: [1, 1],
    pads: [0; 4],
    dilation: [1, 1],
    groups: 1,
})?;

// Generate a random signal with 6x6 spatial dims and 3 channels.
let mut rng = thread_rng();
let signal = Array4::from_shape_fn([1, 6, 6, 3], |_| rng.gen_range(-1.0..=1.0));
// Construct two 3x3 spatial filters.
let filters = Array4::from_shape_fn([2, 3, 3, 3], |_| rng.gen_range(-1.0..=1.0));
// Perform the convolution. The output must have 4x4 spatial dims
// and contain 2 channels (one per filter). The output layout will
// be the same as that of the signal.
let output = convolution.compute(
    // `FeatureMap` wraps `ArrayView4` with information about
    // memory layout (which is "channels-last" / NHWC in this case).
    FeatureMap::nhwc(&signal),
    &filters,
)?;
assert_eq!(output.shape(), [1, 4, 4, 2]);

// For increased efficiency, we may pin filter memory.
// This is especially useful when the same filters are convolved
// with multiple signals.
let convolution = convolution.with_filters(&filters)?;
let new_output = convolution.compute(FeatureMap::nhwc(&signal))?;
assert_eq!(output, new_output);
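`FeatureMap::nhwc` records that the signal is stored channels-last. The sketch below shows what that means for element addressing in a contiguous buffer, contrasted with the channels-first (NCHW) layout. The `MemoryLayout` enum and `offset` function are hypothetical illustrations, not this crate's API.

```rust
/// Hypothetical memory layouts for a 4D feature map.
enum MemoryLayout {
    /// Channels-last: all channels of one pixel are adjacent in memory.
    Nhwc,
    /// Channels-first: each channel is a contiguous H x W plane.
    Nchw,
}

/// Linear offset of element (n, y, x, ch) in a contiguous buffer,
/// given the logical shape [N, H, W, C].
fn offset(layout: MemoryLayout, [_, h, w, c]: [usize; 4], [n, y, x, ch]: [usize; 4]) -> usize {
    match layout {
        MemoryLayout::Nhwc => ((n * h + y) * w + x) * c + ch,
        MemoryLayout::Nchw => ((n * c + ch) * h + y) * w + x,
    }
}

fn main() {
    let shape = [1, 6, 6, 3]; // N, H, W, C, as in the signal above
    // NHWC: neighboring channels of the same pixel are 1 element apart.
    let nhwc = offset(MemoryLayout::Nhwc, shape, [0, 1, 2, 1])
        - offset(MemoryLayout::Nhwc, shape, [0, 1, 2, 0]);
    // NCHW: neighboring channels are a whole 6x6 plane (36 elements) apart.
    let nchw = offset(MemoryLayout::Nchw, shape, [0, 1, 2, 1])
        - offset(MemoryLayout::Nchw, shape, [0, 1, 2, 0]);
    println!("channel stride: NHWC = {nhwc}, NCHW = {nchw}"); // 1 and 36
}
```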

Quantized convolution

use ndarray::Array4;
use rand::{Rng, thread_rng};
use ocl_convolution::{Convolution, I8Params, FeatureMap, Params};

const BIT_SHIFT: u8 = 16;
let params = I8Params {
    common: Params::default(),
    // In practice, these params are found by profiling; here, they are
    // chosen arbitrarily.
    bit_shift: BIT_SHIFT,
    scale: I8Params::convert_scale(BIT_SHIFT, 0.1),
    output_bias: -10,
    signal_bias: 20,
    filter_bias: -5,
};
let convolution = Convolution::i8(3)?.build(params)?;

// Generate a random signal with 6x6 spatial dims and 3 channels.
let mut rng = thread_rng();
let signal = Array4::from_shape_fn([1, 6, 6, 3], |_| rng.gen_range(-127..=127));
// Construct two 3x3 spatial filters.
let filters = Array4::from_shape_fn([2, 3, 3, 3], |_| rng.gen_range(-127..=127));
// Perform the convolution. The output must have 4x4 spatial dims
// and contain 2 channels (one per filter).
let output = convolution.compute(
    FeatureMap::nhwc(&signal),
    &filters,
)?;
assert_eq!(output.shape(), [1, 4, 4, 2]);

Structs

  • Convolution without pinned memory.
  • Convolution builder. The same builder can be used to create multiple Convolutions which share the same spatial size.
  • Feature map, i.e., a signal or output of the convolution operation.
  • Shape of a FeatureMap.
  • Convolution with pinned filter memory. Pinning memory increases efficiency at the cost of making the convolution less flexible.
  • Params for the quantized convolution.
  • General convolution parameters.
  • Convolution with pinned memory for filters, signal and output. Pinning memory increases efficiency at the cost of making the convolution less flexible.

Enums

Traits