[FFmpeg-devel] [Ffmpeg-devel] PATCH: Blackfin Accelerated CSC (Color Space Converter)

Thu May 3 03:49:06 CEST 2007

Hi

Michael Niedermayer writes:
 > Hi
 > 
 > On Sat, Apr 28, 2007 at 09:23:48AM -0400, Marc Hoffman wrote:
 > Content-Description: message body text
 > > Marc Hoffman writes:
 > >  > Michael Niedermayer writes:
 > >  >  > Hi
 > >  >  > 
 > >  >  > On Thu, Apr 26, 2007 at 08:52:16AM -0400, Marc Hoffman wrote:
 > >  >  > Content-Description: message body text
 > >  >  > > 
 > >  >  > > Marc Hoffman writes:
 > >  >  > >  > 
 > >  >  > >  > Please review, and let me know how this is comming along?
 > > 
 > > Review. Mods made per Michael.
 > > 
 > 
 > [...]
 > > +static
 > > +int core_yuv420_rgb (SwsContext *c,
 > > +                     unsigned char **in, int *instrides,
 > > +                     int srcSliceY, int srcSliceH,
 > > +                     unsigned char **oplanes, int *outstrides, ltransform_t lcscf, int rgb)
 > > +{
 > > +    unsigned char *py,*pu,*pv,*op;
 > > +    int w  = instrides[0];
 > > +    int w2 = w/2;
 > > +    int h2 = srcSliceH>>1;
 > > +    int i;
 > > +
 > > +    py = in[0];
 > > +    pv = in[1+(0^rgb)];
 > > +    pu = in[1+(1^rgb)];
 > > +    op = oplanes[0] + srcSliceY*outstrides[0];
 > > +
 > > +    if (rgb) {
 > > +      c->cgu = c->ugCoeff&0xffffffffU;
 > > +      c->cgv = c->vgCoeff&0xffffffffU;
 > > +    } else {
 > > +      c->cgu = c->vgCoeff&0xffffffffU;
 > > +      c->cgv = c->ugCoeff&0xffffffffU;
 > > +    }
 > 
 > the &0xff... is unneeded
 > indention is inconsistent
 > and instead of uint32_t cgu you could change ugCoeff to uint32_t
 > for blackfin per #ifdef, that would greatly simplify the code
 > same for the other variables

Done it makes sense actually I simplified the whole thing based on your recomendations.  Is this better?

static
int core_yuv420_rgb (SwsContext *c,
                     unsigned char **in, int *instrides,
                     int srcSliceY, int srcSliceH,
                     unsigned char **oplanes, int *outstrides, ltransform_t lcscf, int rgb)
{
    unsigned char *py,*pu,*pv,*op;
    int w  = instrides[0];
    int w2 = w/2;
    int h2 = srcSliceH>>1;
    int i;

    if (rgb) {
        c->crv = c->vrCoeff;
        c->cbu = c->ubCoeff;
        c->cgu = c->ugCoeff;
        c->cgv = c->vgCoeff;
    } else {
        c->crv = c->ubCoeff;
        c->cbu = c->vrCoeff;
        c->cgu = c->vgCoeff;
        c->cgv = c->ugCoeff;
    }

    py = in[0];
    pu = in[1+(1^rgb)];
    pv = in[1+(0^rgb)];

    op = oplanes[0] + srcSliceY*outstrides[0];

    for (i=0;i<h2;i++) {
        lcscf (py,pu,pv,op,w,&c->oy);

        py += instrides[0];
        op += outstrides[0];

        lcscf (py,pu,pv,op,w,&c->oy);

        py += instrides[0];
        pu += instrides[1];
        pv += instrides[2];
        op += outstrides[0];
    }
    return srcSliceH;
}

I removed the expander and integrated it directly into the core
converter, its a bit cleaner and simpler to follow now.  I want to put
some controls for DMA movement here with a little double buffering
scheme is that ok?  the Blackfin linux provides dma_memcpy, 

 > 
 > 
 > [...]
 > > +static
 > > +int bfin_yuv420_rgb565 (SwsContext *c,
 > > +                         unsigned char **in, int *instrides,
 > > +                         int srcSliceY, int srcSliceH,
 > > +                         unsigned char **oplanes, int *outstrides)
 > > +{
 > > +    c->rmask = 0x001f * 0x00010001U;
 > > +    c->gmask = 0x07e0 * 0x00010001U;
 > > +    c->bmask = 0xf800 * 0x00010001U;
 > 
 > wouldnt static const variables like mmx uses be simpler and avoid wasting
 > space in the context? or does bfin have problems with accessing statics?
 > 

Sorry, I could do it but it would add complexity to the optimized
assembler codes, the trick here is that I take advantage of the DSP
architectures compute, load, load infastructure otherwise I would need
to insert full machine cycles to load the mask actually it would take
2 machine clocks to do this.

 > 
 > [...]
 > > +static void *ff_bfin_maybe_alloc_l1 (int amt, int where)
 > > +{
 > > +    void *b = sram_alloc (amt, where);
 > > +    if (b == 0)
 > > +	b = av_malloc (amt);
 > 
 > tab

I'm getting closer to getting the next patch for review ready, does
the group have a problem with me putting the patch in and then adding
dma or prefetch logic to the module after the fact or is it better to
do it all in one sweep?

Also, when I get sometime I plan to fix that Altivec code if no one
opposes.  I don't think that code actually works as well as what the
group pushed me to do this time around.