Images, Words, and SEO

Images, Words, and SEO

I saved this post from the Google Webmaster Blog a few weeks back because I thought it also worthwhile sharing here. These days people get to ‘hyped’ about what they should or shouldn’t be doing as far as on-page search engine optimisation goes. I’ve posted a few times on the topic myself recently. But what I’m trying to to do is help you see *through* the clutter that is “internet marketing SEO”… because straight up, at least HALF of it is bullshit! And sadly, hundreds of people daily are getting suckered into buying products that may for a “time” improve their rankings, but then some time later get them blown completely out of the water.

I say bring on the Penguin. Bring on the Panda! Google only wants for itself what it wants for it’s CUSTOMERS… good quality, relevant results. And you can do that without having to go over board. There’s still room for creativity. There’s still room for “more”… but the real secret to growing your traffic is to offer CONTENT that once people see it, they will come back time and time again to see MORE.

Look, creativity is an important aspect of our lives and can enrich nearly everything we do. Say I’d like to make my teammate a cup of cool-looking coffee, but my creative batteries are empty; this would be (and is!) one of the many times when I look for inspiration on Google Images.

blog_coffeeart

The images you see in in the search results come from publishers of all sizes — bloggers, media outlets, stock photo sites — who have embedded these images in their HTML pages. Google can index image types formatted as BMP, GIF, JPEG, PNG and WebP, as well as SVG.

But how does Google know that the images are about coffee and not about tea? When their algorithms index images, they look at the textual content on the page the image was found on to learn more about the image. They also look at the page’s title and its body; they might also learn more from the image’s filename, anchor text that points to it, and its “alt text;” they may use computer vision to learn more about the image and may also use the caption provided in the Image Sitemap if that text also exists on the page.

To help Google index your images, make sure that they can:

  • crawl both the HTML page the image is embedded in, and the image itself;
  • and that the image is in one of their supported formats: BMP, GIF, JPEG, PNG, WebP or SVG.

Additionally, they recommend:

  • that the image filename is related to the image’s content;
  • that the alt attribute of the image describes the image in a human-friendly way;
  • and finally, it also helps if the HTML page’s textual contents as well as the text near the image are related to the image.

Now some answers to questions they’ve seen many times:

Q: Why do I sometimes see Googlebot crawling my images, rather than Googlebot-Image?
A: Generally this happens when it’s not clear that a URL will lead to an image, so they crawl the URL with Googlebot first. If they find the URL leads to an image, they’ll usually revisit with Googlebot-Image. Because of this, it’s generally a good idea to allow crawling of your images and pages by both Googlebot and Googlebot-Image.

Q: Is it true that there’s a maximum file size for the images?
A: No. They’re happy to index images of any size; there’s no file size restriction.

blog_exifQ: What happens to the EXIF, XMP and other metadata my images contain?
A: They may use any information they find to help their users find what they’re looking for more easily. Additionally, information like EXIF data may be displayed in the right-hand sidebar of the interstitial page that appears when you click on an image.

Q: Should I really submit an Image Sitemap? What are the benefits?
A: Yes! Image Sitemaps help Google learn about your new images and may also help them learn what the images are about.

Q: I’m using a CDN to host my images; how can I still use an Image Sitemap?
A: Cross-domain restrictions apply only to the Sitemaps’ tag. In Image Sitemaps, the tag is allowed to point to a URL on another domain, so using a CDN for your images is fine. They also encourage you to verify the CDN’s domain name in Webmaster Tools so that they can inform you of any crawl errors that we might find.

Q: Is it a problem if my images can be found on multiple domains or subdomains I own — for example, CDNs or related sites?
A: Generally, the best practice is to have only one copy of any type of content. If you’re duplicating your images across multiple hostnames, our algorithms may pick one copy as the canonical copy of the image, which may not be your preferred version. This can also lead to slower crawling and indexing of your images.

Q: We sometimes see the original source of an image ranked lower than other sources; why is this?
A: Keep in mind that Google uses the textual content of a page when determining the context of an image. For example, if the original source is a page from an image gallery that has very little text, it can happen that a page with more textual context is chosen to be shown in search. If you feel you’ve identified very bad search results for a particular query, feel free to share your example in the Webmaster Help Forum.

SafeSearch

Google algorithms use a great variety of signals to decide whether an image — or a whole page, if they’re talking about Web Search — should be filtered from the results when the user’s SafeSearch filter is turned on. In the case of images some of these signals are generated using computer vision, but the SafeSearch algorithms also look at simpler things such as where the image was used previously and the context in which the image was used.

One of the strongest signals, however, is self-marked adult pages. They recommend that webmasters who publish adult content mark up their pages with one of the following meta tags:

<meta name=”rating” content=”adult” />
<meta name=”rating” content=”RTA-5042-1996-1400-1577-RTA” />

Many users prefer not to have adult content included in their search results (especially if kids use the same computer). When a webmaster provides one of these meta tags, it helps to provide a better user experience because users don’t see results which they don’t want to or expect to see.

As with all algorithms, sometimes it may happen that SafeSearch filters content inadvertently. If you think your images or pages are mistakenly being filtered by SafeSearch, please let them know using the support form.

If you need more information about how Google index images, please check out the section of their Help Center dedicated to images, read the SEO Starter Guide which contains lots of useful information, and if you have more questions please post them in the Webmaster Help Forum.