Sorry, but

> + size_t p = i * vf->priv->outw / 4 + j / 2,
> +        q = i * vf->priv->outw + j;

removing vf->priv-> here will significantly improve performance; the filter will then be less than 5% slower for planar YUV than the current one.
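A minimal sketch of what that suggestion could look like, assuming the two index computations sit inside nested loops over rows and columns. The struct names, the function names, the loop bounds, and the summing of indices are all hypothetical and only there to make the example self-contained; only `outw` and the `p`/`q` expressions come from the patch. The idea is to read `vf->priv->outw` once into a local, since the compiler may otherwise reload it on every iteration when it cannot rule out aliasing with stores made in the loop body.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the filter structures; only outw is from the patch. */
struct priv_sketch { size_t outw; };
struct vf_sketch   { struct priv_sketch *priv; };

/* As in the quoted patch: outw is read through two pointers in the inner loop. */
static size_t index_sum_deref(const struct vf_sketch *vf, size_t h)
{
    size_t sum = 0;
    for (size_t i = 0; i < h; i++) {
        for (size_t j = 0; j < vf->priv->outw; j++) {
            size_t p = i * vf->priv->outw / 4 + j / 2,
                   q = i * vf->priv->outw + j;
            sum += p + q;   /* placeholder for the real per-pixel work */
        }
    }
    return sum;
}

/* Suggested form: outw is loaded once and kept in a local variable. */
static size_t index_sum_hoisted(const struct vf_sketch *vf, size_t h)
{
    const size_t outw = vf->priv->outw;
    size_t sum = 0;
    for (size_t i = 0; i < h; i++) {
        for (size_t j = 0; j < outw; j++) {
            size_t p = i * outw / 4 + j / 2,
                   q = i * outw + j;
            sum += p + q;   /* placeholder for the real per-pixel work */
        }
    }
    return sum;
}
```

Both functions compute the same indices; any speed difference depends on the compiler and on what the real loop body writes to, so it is worth measuring rather than assuming.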