how to discover all images within an ebook and determine what they are
Posted: Wed Oct 31, 2018 8:22 pm
Happy Halloween everyone! I hope all of you are getting a chance to enjoy this great holiday.
I would just like to pass along some tips on how to discover and work with all images present in an EPUB file, especially if no text descriptions are available. I have come across a couple of books whose images have a completely blank alt attribute. By default, most screen readers won't detect these pictures unless you're specifically searching for them. The text of the book might not give you any indication that an unlabeled image is nearby. Worse, the image may contain meaningful text content that blind readers would miss out on. Alternatively, an undescribed image may simply be a decorative icon that makes the book more visually appealing.
As part of my testing workflow, I make extensive efforts to find out about all images used in the books I read. There are a few ways of going about this.
If you use JAWS for Windows, you can easily configure how graphical information is displayed within the Quick Settings dialog by typing the word "graphic" into the search field. You can choose to have JAWS show all graphics, only those that have been tagged (with alt attributes), or none. JAWS can give you descriptions of an image using its alt attribute, longest available text info, title, OnMouseOver or custom search criterion. For testing purposes, I have JAWS set to show all graphics. This option detects most images. If I find an image that is not well described, I will set JAWS to recognize graphics by the longest string of textual information available.
I have recently begun using a feature in NVDA called Windows 10 OCR. If you have this version of Windows or if you've installed an OCR plugin, you can use this feature to obtain the text inside an image. There may be spelling errors, but the text you receive back is usually high quality. You can use Windows 10 OCR while reading a book in browse mode. Simply place your cursor on an image, then press NVDAKey+R to use OCR on it. Alternatively, unzip the EPUB file, find the location where the book's images are stored, then open a picture in the Photos app. Next, just press NVDAKey+R to OCR the image and extract all the text from it. The content you receive back from the OCR scan will be in a separate document that can be copied to your clipboard.
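If unzipping by hand gets tedious, gathering the pictures can be scripted. What follows is only a minimal Python sketch, not a definitive tool: the function name extract_images, the list of extensions, and the example names book.epub and book_images are all assumptions made for illustration. It relies on the fact that an EPUB file is a ZIP archive, and it copies every file with an image extension into one folder so the pictures can be opened one by one for OCR.

```python
import zipfile
from pathlib import Path, PurePosixPath

# Extensions treated as images; this list is an assumption, extend it as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".svg", ".webp"}


def extract_images(epub_path, out_dir):
    """Copy every image file found in the EPUB into out_dir; return the saved paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(epub_path) as book:
        for name in book.namelist():
            member = PurePosixPath(name)
            if member.suffix.lower() not in IMAGE_EXTENSIONS:
                continue
            # Flatten the archive path into a single file name so that two
            # images with the same name in different folders do not collide.
            target = out / "_".join(member.parts)
            target.write_bytes(book.read(name))
            saved.append(target)
    return saved
```

Calling extract_images("book.epub", "book_images") would return the list of saved files, each named after its original location inside the book (for example, a name beginning with the folder it came from).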
To get a thorough overview of how and where undescribed images are being used, I unzip an EPUB file and look at its underlying code and structure. If an undescribed image is always present at the beginning or ending of a chapter, and if its filename implies that the image is an icon of some sort, I can determine that the image is probably visually decorative.
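The "repeated and undescribed" pattern can also be checked with a short script. This is only a sketch under stated assumptions: it uses Python's standard library, treats files ending in .xhtml, .html, or .htm as the book's content files, looks only at img tags, and the names UndescribedImages and tally_undescribed are made up for this example. For each image file name, it counts how many content files use that image without a description. A high count hints at a decorative ornament, but it does not prove it.

```python
import zipfile
from collections import Counter
from html.parser import HTMLParser
from pathlib import PurePosixPath


class UndescribedImages(HTMLParser):
    """Collect the file name of every <img> whose alt is missing or blank."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        if (attributes.get("alt") or "").strip():
            return  # the image has a description, so skip it
        src = attributes.get("src") or "(no src)"
        # Keep only the file name, because the same picture can be reached
        # through different relative paths from different chapters.
        self.names.append(PurePosixPath(src).name)


def tally_undescribed(epub_path):
    """Map image file name -> number of content files that use it undescribed."""
    counts = Counter()
    with zipfile.ZipFile(epub_path) as book:
        for name in book.namelist():
            if not name.lower().endswith((".xhtml", ".html", ".htm")):
                continue
            parser = UndescribedImages()
            parser.feed(book.read(name).decode("utf-8", errors="replace"))
            parser.close()
            # Count each image once per file, however often it repeats there.
            counts.update(set(parser.names))
    return counts
```

Looping over tally_undescribed("book.epub").most_common() would list the most frequently repeated undescribed images first; book.epub is a placeholder name.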
I usually search each HTML file within an unzipped EPUB document for image tags. For those of you who may not know HTML, images are inserted into a document in the following way: <img src="ImageName.jpg" alt="Sample image description">. This is a simplified example, but all image tags should contain these essential elements. The src attribute contains the location of the image within the EPUB's structure and the alt attribute contains a description of the image. If you see alt="" in the image tag, you might not even detect this image with your screen reader. Some images may not contain the alt attribute at all. When seeking out images in an HTML file, I search for the keyword "<img".
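For those who would rather get a report than read through markup, the same search can be sketched in Python. Treat this as an illustration under assumptions rather than a finished checker: it uses only the standard library, reads the EPUB directly as a ZIP archive, treats .xhtml, .html, and .htm files as content, and the names ImgCollector and audit_images are hypothetical. It does not look at other ways of embedding pictures, such as SVG image elements or CSS backgrounds.

```python
import zipfile
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Record the src and alt status of every <img> tag that is fed in."""

    def __init__(self):
        super().__init__()
        self.images = []  # list of (src, status) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        if "alt" not in attributes:
            status = "missing alt"      # no alt attribute at all
        elif not (attributes["alt"] or "").strip():
            status = "empty alt"        # alt="" or only whitespace
        else:
            status = "described"
        self.images.append((attributes.get("src"), status))


def audit_images(epub_path):
    """Return (content_file, src, status) for every <img> in the book."""
    report = []
    with zipfile.ZipFile(epub_path) as book:
        for name in sorted(book.namelist()):
            if not name.lower().endswith((".xhtml", ".html", ".htm")):
                continue
            parser = ImgCollector()
            parser.feed(book.read(name).decode("utf-8", errors="replace"))
            parser.close()
            for src, status in parser.images:
                report.append((name, src, status))
    return report
```

Printing each tuple returned by audit_images("book.epub"), where book.epub is a placeholder, gives one line per image with the file it appears in, its src, and whether its alt attribute is described, empty, or missing.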
Please note: The unzipped EPUB books I have examined have an images directory that contains all pictures used in the book, although the EPUB format does not require a folder with that name, so the location may vary. Thus far, I have found that the image filenames publishers use can sometimes be ambiguous and not very descriptive.
I hope these tips will assist you in your work. If you have any questions, please let me know. I wish all of you a great evening!
Heidi