Can Lemmy instances make content of their sub-Lemmys indexable by search engines?

labbbb@thelemmy.club · 10 months ago

Can Lemmy instances make content of their sub-Lemmys indexable by search engines?

themoonisacheese@sh.itjust.works · edit-2 10 months ago

Since communities are viewable by anyone without an account, including search engine crawlers, this is the case by default. It is then up to search engines to crawl them and rank them appropriately.

A major problem right now is that search engines massively down-rank pages with duplicate content, and that’s the case with most Lemmy instances because of federation. If the fediverse ever becomes large enough to matter, they may change that, but currently finding things on the fediverse is not exactly a good time.

Edit: Kagi (a paid search engine) has recently announced a “search on the fediverse” feature. Neat.

Björn Tantau@swg-empire.de · 10 months ago

Duplicate content shouldn’t be a problem as every post has a source URL. This is linked in the HTML head as the canonical URL. That way search engines know where something is from and that only that one is the true source.
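
For illustration, here is a minimal sketch of how a crawler might read that canonical link from a page's head, using only the Python standard library. The sample markup, the `origin.example` URL, and the names `CanonicalLinkParser` and `find_canonical_url` are hypothetical and not copied from a real Lemmy page; real search engines likely do considerably more than this.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalLinkParser(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only <link> elements matter, and only the first canonical one.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # rel is a space-separated token list, e.g. rel="canonical".
        rel_tokens = attr_map.get("rel", "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonical = attr_map["href"]


def find_canonical_url(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if absent."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical


# Hypothetical page as a federated copy might serve it (assumed markup).
SAMPLE_PAGE = """<html><head>
<title>Example post</title>
<link rel="canonical" href="https://origin.example/post/123">
</head><body>...</body></html>"""

print(find_canonical_url(SAMPLE_PAGE))
```

If every federated copy of a post declares the same canonical URL, a crawler that honors the annotation can treat the copies as duplicates of that one source instead of as competing pages.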

themoonisacheese@sh.itjust.works · 10 months ago

I mean, do they? Do the search engines do that? I don’t know that they do. They could, but why spend the time making that?

Björn Tantau@swg-empire.de · 10 months ago

That’s standard HTML stuff that has been available for decades.

themoonisacheese@sh.itjust.works · edit-2 10 months ago

Semantic HTML is largely ignored by search engines. If you’re talking about the source tag, it does not syndicate, at least on Google.

If you’re talking about iframes, Lemmy does not use them. The content appears as though your home instance hosts it (hence why images need to be moderated off-instance so badly).

Björn Tantau@swg-empire.de · 10 months ago

Google supports rel canonical link annotations as described in RFC 6596.

From https://developers.google.com/search/docs/crawling-indexing/consolidate-duplicate-urls?hl=en#rel-canonical-link-method