This is not a full solution, but it may lead to something better: by converting your data from BGR (or RGB) to CIE-L*a*b*, you can process a grayscale image as the weighted sum of the colour channels a* and b*.

The image is far too blurry because of video compression, so even preprocessing the image to improve quality may not get the image quality high enough for accurate OCR. If you are set on OCR, one approach you could try:

1. Binarize the image to get the non-red text in white and the background black, as in your binarized image:

```python
from PIL import Image

# `im` is your input image; `threshold` is a grey-level cut-off you choose.
image = im.convert('L')  # convert the image to monochrome
bin_im = image.point(lambda p: 255 if p > threshold else 0)
```

2. Extract only the red text values with a filter, then binarize the result:

```python
import cv2
import matplotlib.pyplot as plt

# `mask` should select the red pixels (e.g. built with cv2.inRange
# over the red range); its construction is not shown here.
red_binarized = cv2.bitwise_and(im, im, mask=mask)
plt.imshow(cv2.cvtColor(red_binarized, cv2.COLOR_BGR2RGB))
```

However, even with this filtering, it still doesn't extract the red well.

3. Add the images obtained in (1.) and (2.):

```python
combined_image = binarized + red_binarized
```
I would advise you to try some of PIL's built-in filters, such as the sharpness filter. Apply sharpness and contrast to the RGB image, then binarise it. Perhaps use Image.split() and Image.merge() to binarise each colour separately and then bring them back together. Also, if you do not have a monochrome background, consider performing some background subtraction. Or convert your image to YUV and then use just the Y channel for further processing.

I can only offer a butcher's solution, potentially a nightmare to maintain. In my own, very limited scenario, it worked like a charm where several other OCR engines either failed or had unacceptable running times.

- I knew exactly in which area of the screen the text was going to go.
- I knew exactly which fonts and colors were going to be used.
- The text was semitransparent, so the underlying image interfered, and it was a variable image to boot.
- I could not reliably detect text changes to average frames and reduce the interference.

I measured the kerning width of each character; I only had A-Za-z0-9 and a bunch of punctuation characters to worry about. The program would start at position (0,0), measure the average color to determine the text color, then access the whole set of bitmaps generated from characters in all available fonts in that color. Then it would determine which rectangle was closest to the corresponding rectangle on the screen, and advance to the next one. (Months later, needing more performance, I added a varying probability matrix to test the most likely characters first.)

In the end, the resulting C program was able to read the subtitles out of the video stream with 100% accuracy in real time.
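The matching loop described above can be sketched like this (in Python for readability; the original was C). The tiny 4x3 glyph bitmaps and the `read_line` helper are hypothetical stand-ins for bitmaps pre-rendered from the real fonts:

```python
import numpy as np

# Toy glyph set: 4x3 binary bitmaps for a few characters (hypothetical;
# the real program pre-rendered every character of the known fonts).
GLYPHS = {
    "A": np.array([[0, 1, 0], [1, 0, 1], [1, 1, 1], [1, 0, 1]], dtype=float),
    "B": np.array([[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 1, 0]], dtype=float),
    "C": np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0], [0, 1, 1]], dtype=float),
}

def read_line(strip):
    """Scan a binary text strip left to right: at each position pick the
    glyph whose bitmap is closest, emit it, and advance by its width."""
    text, x = [], 0
    h, w = strip.shape
    while x < w:
        best, best_err = None, None
        for ch, g in GLYPHS.items():
            gw = g.shape[1]
            if x + gw > w:
                continue
            err = np.sum((strip[:, x:x + gw] - g) ** 2)  # squared difference
            if best_err is None or err < best_err:
                best, best_err = ch, err
        if best is None:
            break  # nothing fits in the remaining width
        text.append(best)
        x += GLYPHS[best].shape[1]  # advance by the matched glyph's width
    return "".join(text)

# Compose a strip spelling "CAB" and read it back.
strip = np.concatenate([GLYPHS["C"], GLYPHS["A"], GLYPHS["B"]], axis=1)
print(read_line(strip))  # prints "CAB"
```

The probability-matrix refinement mentioned above would simply reorder the `GLYPHS` iteration so the most likely characters are tried first, with an early exit on a good-enough match.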