This is not a full solution, but it may lead to something better: by converting your data from BGR (or RGB) to CIE-L*a*b*, you can process a grayscale image as the weighted sum of the colour channels a* and b*.

The image is far too blurry because of video compression, so even preprocessing the image to improve quality may not get the image quality high enough for accurate OCR. If you are set on OCR, one approach you could try:

1. Binarize the image to get the non-red text in white and the background black, as in your binarized image:

```python
from PIL import Image

# `im` is your input image; `threshold` is a grey-level cut-off you choose.
image = im.convert('L')  # convert the image to monochrome
bin_im = image.point(lambda p: 255 if p > threshold else 0)
```

2. Extract only the red text values with a filter, then binarize the result:

```python
import cv2
import matplotlib.pyplot as plt

# `mask` should select the red pixels (e.g. built with cv2.inRange
# over the red range); its construction is not shown here.
red_binarized = cv2.bitwise_and(im, im, mask=mask)
plt.imshow(cv2.cvtColor(red_binarized, cv2.COLOR_BGR2RGB))
```

However, even with this filtering, it still doesn't extract the red well.

3. Add the images obtained in (1.) and (2.):

```python
combined_image = binarized + red_binarized
```
I would advise you to try some of PIL's built-in filters, such as the sharpness filter. Apply sharpness and contrast to the RGB image, then binarise it. Perhaps use Image.split() and Image.merge() to binarise each colour separately and then bring them back together. Also, if you do not have a monochrome background, consider performing some background subtraction. Or convert your image to YUV and then use just the Y channel for further processing.

I can only offer a butcher's solution, potentially a nightmare to maintain. In my own, very limited scenario, it worked like a charm where several other OCR engines either failed or had unacceptable running times.

- I knew exactly in which area of the screen the text was going to go.
- I knew exactly which fonts and colors were going to be used.
- The text was semitransparent, so the underlying image interfered, and it was a variable image to boot.
- I could not reliably detect text changes to average frames and reduce the interference.

I measured the kerning width of each character; I only had A-Za-z0-9 and a bunch of punctuation characters to worry about. The program would start at position (0,0), measure the average color to determine the text color, then access the whole set of bitmaps generated from characters in all available fonts in that color. Then it would determine which rectangle was closest to the corresponding rectangle on the screen, and advance to the next one. (Months later, needing more performance, I added a varying probability matrix to test the most likely characters first.)

In the end, the resulting C program was able to read the subtitles out of the video stream with 100% accuracy in real time.
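The matching loop described above can be sketched like this (in Python for readability; the original was C). The tiny 4x3 glyph bitmaps and the `read_line` helper are hypothetical stand-ins for bitmaps pre-rendered from the real fonts:

```python
import numpy as np

# Toy glyph set: 4x3 binary bitmaps for a few characters (hypothetical;
# the real program pre-rendered every character of the known fonts).
GLYPHS = {
    "A": np.array([[0, 1, 0], [1, 0, 1], [1, 1, 1], [1, 0, 1]], dtype=float),
    "B": np.array([[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 1, 0]], dtype=float),
    "C": np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0], [0, 1, 1]], dtype=float),
}

def read_line(strip):
    """Scan a binary text strip left to right: at each position pick the
    glyph whose bitmap is closest, emit it, and advance by its width."""
    text, x = [], 0
    h, w = strip.shape
    while x < w:
        best, best_err = None, None
        for ch, g in GLYPHS.items():
            gw = g.shape[1]
            if x + gw > w:
                continue
            err = np.sum((strip[:, x:x + gw] - g) ** 2)  # squared difference
            if best_err is None or err < best_err:
                best, best_err = ch, err
        if best is None:
            break  # nothing fits in the remaining width
        text.append(best)
        x += GLYPHS[best].shape[1]  # advance by the matched glyph's width
    return "".join(text)

# Compose a strip spelling "CAB" and read it back.
strip = np.concatenate([GLYPHS["C"], GLYPHS["A"], GLYPHS["B"]], axis=1)
print(read_line(strip))  # prints "CAB"
```

The probability-matrix refinement mentioned above would simply reorder the `GLYPHS` iteration so the most likely characters are tried first, with an early exit on a good-enough match.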