Crate ocl_convolution
OpenCL-accelerated 2D convolutions.
Convolution is a fundamental building block in signal processing. This crate focuses on 2D convolutions (i.e., the signal is a still image) in the context of deep learning (more precisely, convolutional neural networks). The deep learning context means that a single convolution may apply many filters (on the order of hundreds), and that the input may contain many channels (on the order of hundreds or thousands) rather than the traditional 3 or 4. Such convolutions are computationally heavy and can be effectively accelerated with the help of OpenCL.
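To get a feel for the cost, the number of multiply-add operations in a dense (non-grouped) convolution can be estimated as the number of output elements times the work per element. The helper below is a back-of-the-envelope sketch; the name `multiply_adds` and the layer sizes used with it are illustrative and not part of the crate.

```rust
/// Estimates the number of multiply-add operations for a dense (groups = 1)
/// 2D convolution with a square filter.
/// Hypothetical helper for illustration; not part of `ocl_convolution`.
fn multiply_adds(
    out_height: u64,
    out_width: u64,
    filter_count: u64,
    kernel_size: u64,
    in_channels: u64,
) -> u64 {
    // Each output element sums `kernel_size * kernel_size * in_channels` products.
    let per_output = kernel_size * kernel_size * in_channels;
    // There is one output element per spatial position and per filter.
    out_height * out_width * filter_count * per_output
}
```

For a 4x4 output with 2 filters of size 3x3 over 3 channels, `multiply_adds(4, 4, 2, 3, 3)` gives 864. For an assumed layer with a 56x56 output, 256 filters of size 3x3 and 256 input channels, the same estimate is about 1.85 billion multiply-adds per image.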
Features
The crate implements convolutions on two numerical formats:
- Single-precision floats (f32)
- Signed 8-bit integers with 32-bit multiply-add accumulator (this format is frequently denoted int8/32 in deep learning literature). Quantization parameters are applied uniformly to the entire layer.
In both cases, dilated and grouped convolutions are supported.
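As a rough illustration of what dilation and grouping mean, the sketch below uses the conventional definitions from deep learning frameworks. The function names are hypothetical, and it is an assumption (not confirmed by this page) that the crate follows exactly these conventions, including how padding is split between the two ends of a dimension.

```rust
use std::ops::Range;

/// Output size along one spatial dimension, using the conventional formula.
/// Returns `None` if the stride is zero or the dilated filter does not fit
/// into the padded input. Hypothetical helper, not part of the crate.
fn output_size(
    input: u32,
    kernel: u32,
    stride: u32,
    pad_start: u32,
    pad_end: u32,
    dilation: u32,
) -> Option<u32> {
    if stride == 0 {
        return None;
    }
    // A dilated filter spans `dilation * (kernel - 1) + 1` input elements.
    let effective_kernel = dilation * kernel.checked_sub(1)? + 1;
    let padded = input + pad_start + pad_end;
    let span = padded.checked_sub(effective_kernel)?;
    Some(span / stride + 1)
}

/// Input channels seen by filter `filter_idx` in a grouped convolution.
/// Assumes `filter_count` and `in_channels` are both divisible by `groups`.
fn input_channels_for_filter(
    filter_idx: u32,
    filter_count: u32,
    in_channels: u32,
    groups: u32,
) -> Range<u32> {
    let filters_per_group = filter_count / groups;
    let channels_per_group = in_channels / groups;
    let group = filter_idx / filters_per_group;
    group * channels_per_group..(group + 1) * channels_per_group
}
```

Under these definitions, a 6-element dimension convolved with a 3-element filter (stride 1, no padding) yields 4 outputs without dilation and 2 outputs with dilation 2. With 2 groups, 4 filters and 8 input channels, filter 3 only reads channels 4 through 7.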
Implementation details
The implementation uses an output-stationary workflow (see, e.g., this paper for the definition); that is, each element of the output tensor is computed in a single run of the OpenCL kernel. This minimizes memory overhead, but may not be the fastest algorithm.
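The output-stationary idea can be sketched on the CPU: one function call computes one output element in full, keeping the accumulator local, much like a single work item would. This is a simplified illustration (single sample, stride 1, no padding, dilation or groups), not the crate's OpenCL kernel; the function names and the flat channels-last layout are assumptions made for the example.

```rust
/// Computes a single output element at spatial position `(y, x)` for filter `k`.
/// Signal layout: [row, col, channel]; filter layout: [filter, row, col, channel].
/// Illustrative only; this is not the crate's kernel.
fn output_element(
    signal: &[f32],
    filters: &[f32],
    width: usize,
    channels: usize,
    kernel: usize,
    (y, x, k): (usize, usize, usize),
) -> f32 {
    // The accumulator stays local until the element is complete.
    let mut acc = 0.0;
    for dy in 0..kernel {
        for dx in 0..kernel {
            for c in 0..channels {
                let s = signal[((y + dy) * width + (x + dx)) * channels + c];
                let f = filters[((k * kernel + dy) * kernel + dx) * channels + c];
                acc += s * f;
            }
        }
    }
    acc
}

/// Runs `output_element` once per output element.
/// `dims` is `[height, width, channels]` of the signal.
fn convolve(
    signal: &[f32],
    filters: &[f32],
    dims: [usize; 3],
    kernel: usize,
    filter_count: usize,
) -> Vec<f32> {
    let [height, width, channels] = dims;
    let (out_h, out_w) = (height - kernel + 1, width - kernel + 1);
    let mut output = Vec::with_capacity(out_h * out_w * filter_count);
    for y in 0..out_h {
        for x in 0..out_w {
            for k in 0..filter_count {
                // On a GPU, each iteration would map to an independent work item.
                let value = output_element(signal, filters, width, channels, kernel, (y, x, k));
                output.push(value);
            }
        }
    }
    output
}
```

Because no partial sums are written to memory, the only buffers needed are the signal, the filters and the output, which matches the low memory overhead mentioned above.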
Examples
Floating-point convolution
use ndarray::Array4;
use rand::{Rng, thread_rng};
use ocl_convolution::{Convolution, FeatureMap, Params};

let convolution = Convolution::f32(3)?.build(Params {
    strides: [1, 1],
    pads: [0; 4],
    dilation: [1, 1],
    groups: 1,
})?;

// Generate random signal with 6x6 spatial dims and 3 channels.
let mut rng = thread_rng();
let signal = Array4::from_shape_fn([1, 6, 6, 3], |_| rng.gen_range(-1.0..=1.0));
// Construct two 3x3 spatial filters.
let filters = Array4::from_shape_fn([2, 3, 3, 3], |_| rng.gen_range(-1.0..=1.0));
// Perform the convolution. The output must have 4x4 spatial dims
// and contain 2 channels (1 per each filter). The output layout will
// be the same as in the signal.
let output = convolution.compute(
    // `FeatureMap` wraps `ArrayView4` with information about
    // memory layout (which is "channels-last" / NHWC in this case).
    FeatureMap::nhwc(&signal),
    &filters,
)?;
assert_eq!(output.shape(), [1, 4, 4, 2]);

// For increased efficiency, we may pin filter memory.
// This is especially useful when the same filters are convolved
// with multiple signals.
let convolution = convolution.with_filters(&filters)?;
let new_output = convolution.compute(FeatureMap::nhwc(&signal))?;
assert_eq!(output, new_output);
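The "channels-last" (NHWC) layout mentioned in the example determines where each element sits in the underlying flat buffer. The sketch below contrasts it with the "channels-first" (NCHW) layout using the standard row-major offset formulas; the function names are hypothetical and are not part of the crate's API.

```rust
/// Flat offset of element `(n, y, x, c)` in a channels-last (NHWC) buffer.
/// `dims` is `[batch, height, width, channels]`. Hypothetical helper.
fn nhwc_offset(dims: [usize; 4], (n, y, x, c): (usize, usize, usize, usize)) -> usize {
    let [_, height, width, channels] = dims;
    // Channels of one pixel are adjacent in memory.
    ((n * height + y) * width + x) * channels + c
}

/// Flat offset of the same logical element in a channels-first (NCHW) buffer.
fn nchw_offset(dims: [usize; 4], (n, y, x, c): (usize, usize, usize, usize)) -> usize {
    let [_, height, width, channels] = dims;
    // Each channel forms a contiguous `height * width` plane.
    ((n * channels + c) * height + y) * width + x
}
```

For the 1x6x6x3 signal above, the element at row 1, column 2, channel 1 sits at offset 25 in NHWC and at offset 44 in NCHW.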
Quantized convolution
use ndarray::Array4;
use rand::{Rng, thread_rng};
use ocl_convolution::{Convolution, I8Params, FeatureMap, Params};

const BIT_SHIFT: u8 = 16;

let params = I8Params {
    common: Params::default(),
    // These params are found by profiling; here, they are
    // chosen randomly.
    bit_shift: BIT_SHIFT,
    scale: I8Params::convert_scale(BIT_SHIFT, 0.1),
    output_bias: -10,
    signal_bias: 20,
    filter_bias: -5,
};
let convolution = Convolution::i8(3)?.build(params)?;

// Generate random signal with 6x6 spatial dims and 3 channels.
let mut rng = thread_rng();
let signal = Array4::from_shape_fn([1, 6, 6, 3], |_| rng.gen_range(-127..=127));
// Construct two 3x3 spatial filters.
let filters = Array4::from_shape_fn([2, 3, 3, 3], |_| rng.gen_range(-127..=127));
// Perform the convolution. The output must have 4x4 spatial dims
// and contain 2 channels (1 per each filter).
let output = convolution.compute(
    FeatureMap::nhwc(&signal),
    &filters,
)?;
assert_eq!(output.shape(), [1, 4, 4, 2]);
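To make the roles of `bit_shift`, `scale` and the bias parameters more concrete, here is one plausible reading of int8/32 arithmetic: products are accumulated in `i32`, then multiplied by a fixed-point scale, shifted right, offset and saturated to `i8`. This is a sketch under assumptions. The exact order of operations, rounding and saturation used by the crate are not stated on this page, and all three functions below (including `to_fixed_point`, a stand-in for a fixed-point conversion) are hypothetical.

```rust
/// Hypothetical fixed-point conversion: `scale * 2^bit_shift`, rounded.
/// It is an assumption that the crate's conversion behaves similarly.
fn to_fixed_point(bit_shift: u8, scale: f32) -> i32 {
    ((1_i64 << bit_shift) as f32 * scale).round() as i32
}

/// Accumulates one quantized dot product in a 32-bit accumulator.
/// Biases are added before multiplication (an assumed convention).
fn dot_i8(signal: &[i8], filter: &[i8], signal_bias: i32, filter_bias: i32) -> i32 {
    signal
        .iter()
        .zip(filter)
        .map(|(&s, &f)| (i32::from(s) + signal_bias) * (i32::from(f) + filter_bias))
        .sum()
}

/// Converts a 32-bit accumulator back to `i8` with saturation.
fn requantize(acc: i32, scale: i32, bit_shift: u8, output_bias: i32) -> i8 {
    // Widen to i64 so that `acc * scale` cannot overflow.
    let scaled = (i64::from(acc) * i64::from(scale)) >> bit_shift;
    (scaled + i64::from(output_bias)).clamp(-128, 127) as i8
}
```

With the parameters from the example (`bit_shift = 16`, scale 0.1, `signal_bias = 20`, `filter_bias = -5`, `output_bias = -10`), the signal values `[10, -30]` and filter values `[7, 3]` accumulate to 80 in this sketch, which requantizes to -2.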
Structs
- Convolution without pinned memory.
- Convolution builder. The same builder can be used to create multiple Convolutions which share the same spatial size.
- Feature map, i.e., a signal or output of the convolution operation.
- Shape of a FeatureMap.
- Convolution with pinned filters memory. Pinning memory increases efficiency at the cost of making the convolution less flexible.
- Params for the quantized convolution.
- General convolution parameters.
- Convolution with pinned memory for filters, signal and output. Pinning memory increases efficiency at the cost of making the convolution less flexible.
Enums
- Memory layout of a FeatureMap.
Traits
- Supported element types for convolutions.