I'm developing a face detection app on the Android platform using OpenCL. The face detection algorithm is based on the Viola-Jones algorithm. I tried to write the kernel code for the cascade classification step. I copy the classifier data for cascade stage 1 into local memory (__local), because that data is used by all work-items.
However, when I profile the kernels, the version without local memory (which I think of as the global memory version, although classifierMem is declared __constant) is faster than the version with local memory.
Version using local memory:
__kernel void CASCADE(__read_only image2d_t input_image,
                      __write_only image2d_t output_image,
                      __constant float* classifierMem)
{
    // Row width of the work-group, used to flatten the 2D local ID.
    int cascadeLocalSize = get_local_size(0);
    __local float localStage1[5];
    int localIdx = get_local_id(1) * cascadeLocalSize + get_local_id(0);

    // The first 5 work-items of each work-group copy the stage 1 data.
    if (localIdx < 5)
    {
        // idxNumValStageArray is declared outside this snippet.
        int stage1Idx = localIdx + idxNumValStageArray[0] + 4;
        localStage1[localIdx] = classifierMem[stage1Idx];
    }
    // Wait until the copy is visible to the whole work-group.
    barrier(CLK_LOCAL_MEM_FENCE);

    int gx = get_global_id(0);
    int gy = get_global_id(1);

    int featureIndex = 0;
    float featureThres = localStage1[featureIndex++];
    float succVal      = localStage1[featureIndex++];
    float failVal      = localStage1[featureIndex++];
    float regionValue  = localStage1[featureIndex++];
    float stageThres   = localStage1[featureIndex];

    // Accumulated feature value for this stage (a single feature here).
    float featureValue = 0.0f;
    featureValue += (regionValue < featureThres) ? failVal : succVal;

    if (featureValue < stageThres)
        write_imagef(output_image, (int2)(gx, gy), (float4)(0.1f));
}
Version without local memory (original version):
__kernel void CASCADE(__read_only image2d_t input_image,
                      __write_only image2d_t output_image,
                      __constant float* classifierMem)
{
    int gx = get_global_id(0);
    int gy = get_global_id(1);

    // Every work-item reads the five stage 1 values directly from classifierMem.
    int featureIndex = 0;
    float featureThres = classifierMem[featureIndex++];
    float succVal      = classifierMem[featureIndex++];
    float failVal      = classifierMem[featureIndex++];
    float regionValue  = classifierMem[featureIndex++];
    float stageThres   = classifierMem[featureIndex];

    // Accumulated feature value for this stage (a single feature here).
    float featureValue = 0.0f;
    featureValue += (regionValue < featureThres) ? failVal : succVal;

    if (featureValue < stageThres)
        write_imagef(output_image, (int2)(gx, gy), (float4)(0.1f));
}
Why is the version that uses local memory slower?