hugo-to-gemini/testdata/general_text.md
mntn ddb3943359
Strip HTML tags but keep content while rendering
This makes the renderer print the content of informational
HTML tags while stripping the tags themselves.

Tags like script, iframe, style, etc, which are unlikely to
ever hold presentable content, are exempt from this, and
their content is skipped from rendering as well as the tags
themselves.

<br>, a hard-break tag, is supported as a Markdown
hard-break replacement (the two spaces before newline).

This also adds tests for this behavior inside general_text.md.

Fixes #6, a longstanding issue with inline HTML in
blockquotes.
2021-10-02 15:07:46 +05:00

General text

Paragraphs are printed verbatim in gmnhg.

Single newlines (like in this multi-line paragraph) are replaced by a space, as the Gemini specification (p. 5.4.1) recommends so that clients can soft-wrap the text.

Inline formatting bits (like this bold text, emphasized text, strikethrough text, preformatted text) are kept to make sure Gemini readers still have the stylistic context of your text.

Adding two spaces at the end of a line will insert a hard
break. You can also create a hard break using a backslash at the end
of a line. Hard breaks at the end of a paragraph are ignored.
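
The soft-wrap and hard-break rules above can be sketched roughly as follows. This is a minimal illustration and not gmnhg's actual renderer: `renderParagraph` is a hypothetical name, and the sketch works on raw paragraph text instead of a parsed Markdown tree.

```go
package main

import (
	"fmt"
	"strings"
)

// renderParagraph is a hypothetical helper, not part of gmnhg.
// Single newlines become spaces; a line ending in two spaces or a
// backslash produces a hard break instead.
func renderParagraph(src string) string {
	lines := strings.Split(src, "\n")
	var b strings.Builder
	for i, line := range lines {
		hard := strings.HasSuffix(line, "  ") || strings.HasSuffix(line, "\\")
		// Drop the hard-break marker itself from the output.
		text := strings.TrimRight(strings.TrimSuffix(line, "\\"), " ")
		b.WriteString(text)
		if i == len(lines)-1 {
			break // a hard break at the end of the paragraph is ignored
		}
		if hard {
			b.WriteByte('\n')
		} else {
			b.WriteByte(' ')
		}
	}
	return b.String()
}

func main() {
	fmt.Println(renderParagraph("Adding two spaces  \ninserts a hard\nbreak.\\"))
}
```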

Blockquotes

Newlines in blockquote paragraphs, unlike usual paragraphs, aren't replaced with a space. This facilitates appending authorship information to the quote, or using blockquotes to write poems.

"Never trouble another for what you can do yourself" — Thomas Jefferson, 3rd president of the US

"Wow, writing comprehensive test suites is hard!" — Timur Demin, while writing this very test file

"Somehow I know these two paragraphs will be broken into two separate blockquotes by gmnhg. I think my knowledge of that comes from being the author of this program."

— also Timur Demin, in the process of writing this test file

Hard breaks are also supported in blockquotes,
for compatibility. Hard breaks at the end of a blockquote are ignored.
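
As a rough sketch of the blockquote behavior described in this section, the function below keeps every source line on its own Gemtext quote line. `renderBlockquote` is a hypothetical name chosen for illustration; the real renderer may structure this differently.

```go
package main

import (
	"fmt"
	"strings"
)

// renderBlockquote is a hypothetical helper, not part of gmnhg.
// Unlike regular paragraphs, newlines are preserved: each source line
// becomes its own "> " quote line.
func renderBlockquote(paragraph string) string {
	lines := strings.Split(paragraph, "\n")
	out := make([]string, 0, len(lines))
	for _, line := range lines {
		// Trailing spaces (a Markdown hard break) carry no extra meaning
		// here, since every line already ends with a newline.
		out = append(out, "> "+strings.TrimRight(line, " "))
	}
	return strings.Join(out, "\n")
}

func main() {
	fmt.Println(renderBlockquote("Roses are red,\nviolets are blue."))
}
```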

Code

gmnhg will use Gemtext preformatted blocks for that. Markdown alt-text for preformatted blocks is supported, and is used to render alt-text as specified by Gemini spec p. 5.4.3.

package main

func main() {
    println("gmnhg is awesome!")
}
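
For illustration, a code block like the one above could be emitted as a Gemtext preformatted block along these lines. This is only a sketch: `preformatted` is a hypothetical helper, and the alt-text is assumed to be the Markdown info string (for example `go`).

```go
package main

import (
	"fmt"
	"strings"
)

// preformatted is a hypothetical helper, not part of gmnhg.
// It wraps body in a Gemtext preformatted block and places altText
// right after the opening fence.
func preformatted(altText, body string) string {
	fence := strings.Repeat("`", 3) // the preformatting toggle line
	return fence + altText + "\n" + strings.TrimRight(body, "\n") + "\n" + fence
}

func main() {
	fmt.Println(preformatted("go", "package main\n"))
}
```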

Preformatted Markdown of course isn't rendered:

# I am a test Markdown document

I contain text in **bold**.

gmnhg supports links, images, and footnotes. Links are a very interesting topic in themselves; see a separate document for those.

Lists

Definition lists, numbered and unordered lists are all supported in gmnhg. There's also a separate document displaying those.

Tables

Markdown tables are supported in gmnhg, and are better displayed by a separate document.

Headings

The Gemini specification allows up to three heading levels, with an optional space after the last heading symbol, #. With Markdown, you get 6; gmnhg will simply print the relevant number of #-s, leaving it up to the client to parse the extra heading levels while keeping the context of the source document.
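
A minimal sketch of that heading rule, assuming a hypothetical `heading` helper that receives the Markdown heading level:

```go
package main

import (
	"fmt"
	"strings"
)

// heading is a hypothetical helper, not part of gmnhg.
// It prints one # per Markdown heading level, even beyond the three
// levels Gemtext defines.
func heading(level int, text string) string {
	if level < 1 {
		level = 1
	}
	if level > 6 {
		level = 6 // Markdown has no headings deeper than H6
	}
	return strings.Repeat("#", level) + " " + text
}

func main() {
	for level := 3; level <= 6; level++ {
		fmt.Println(heading(level, fmt.Sprintf("Heading %d", level)))
	}
}
```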

Since clients like Lagrange treat the fourth and the rest of #-s as heading content, it's best to avoid using H4-H6 in Gemini-aware Markdown entirely. Headings from H3 to H6 are provided below so you can test how your client handles that.

Heading 3

Heading 4

Heading 5

Heading 6

HTML

Inline HTML tags are currently stripped, but their contents remain on-screen. This may change in the future. HTML tags can be escaped with \ as in <span></span> or enclosed in backticks (a code span).

HTML tags are stripped from HTML blocks. (Note that HTML blocks must begin and end with a supported HTML block tag, and must have blank lines before and after the block.)
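
The idea of stripping tags while keeping their contents can be sketched with a regular expression. This is a deliberately naive illustration, not gmnhg's implementation: `stripTags` and `tagPattern` are hypothetical names, and a regular expression will mishandle edge cases (such as `>` inside attribute values) that a real HTML tokenizer would cover.

```go
package main

import (
	"fmt"
	"regexp"
)

// tagPattern matches an opening or closing HTML tag.
// Naive by design: it assumes no ">" inside attribute values.
var tagPattern = regexp.MustCompile(`</?[a-zA-Z][^>]*>`)

// stripTags is a hypothetical helper, not part of gmnhg.
// It removes the tags but keeps the text between them.
func stripTags(s string) string {
	return tagPattern.ReplaceAllString(s, "")
}

func main() {
	fmt.Println(stripTags("Inline <span>HTML</span> is <b>stripped</b>."))
}
```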

Break tags

Hard breaks
using <br> are supported.

Hard breaks using <br> are supported
inside HTML blocks.
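
Treating break tags as hard breaks could look roughly like this. `hardBreaks` is a hypothetical helper, and accepting the `<br/>` and `<br />` spellings is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"regexp"
)

// brPattern matches <br>, <br/> and <br />, case-insensitively.
var brPattern = regexp.MustCompile(`(?i)<br\s*/?>`)

// hardBreaks is a hypothetical helper, not part of gmnhg.
// It replaces every break tag with a newline.
func hardBreaks(s string) string {
	return brPattern.ReplaceAllString(s, "\n")
}

func main() {
	fmt.Println(hardBreaks("Hard breaks<br>using break tags are supported."))
}
```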

HTML entities

HTML escaped entities like & and < are unescaped, even when they show up inside an inline HTML section. Escaping them with a leading backslash is possible outside of HTML blocks: &amp;, &lt;. Any escaped characters inside a code span (such as &lt; or &gt;) will not be unescaped.

HTML escaped entities like < and > are also unescaped inside HTML blocks. Backslash escapes have no effect: \&.
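
In Go, entity unescaping of this kind is available in the standard library through `html.UnescapeString`. Whether gmnhg uses that exact call is an assumption here; `unescapeEntities` is just a hypothetical wrapper for the example.

```go
package main

import (
	"fmt"
	"html"
)

// unescapeEntities is a hypothetical wrapper, not part of gmnhg.
// It turns entities such as &amp; and &lt; back into plain characters.
func unescapeEntities(s string) string {
	return html.UnescapeString(s)
}

func main() {
	fmt.Println(unescapeEntities("HTML escaped entities like &amp; and &lt; are unescaped."))
}
```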

Forbidden tags

Tags whose contents cannot be rendered as Gemini-compatible text are completely removed from the output, together with their contents.

Fieldset blocks are not rendered. Form blocks are not rendered.

Canvas blocks are not rendered.

Dialog blocks are not rendered.

Progress blocks are not rendered.

Note that the contents of "forbidden" tags will be rendered if they are placed inline, although the tags themselves will be stripped. Placing HTML block elements inline in this manner violates the spec of common Markdown flavors, but gmnhg handles it the best it can.
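
Dropping a forbidden block together with its contents might be sketched as below. The tag list is illustrative only (taken from the examples in this section), `dropForbidden` is a hypothetical name, and the sketch covers only the block-level case: it does not reproduce the inline exception described above, nor nested tags of the same name.

```go
package main

import (
	"fmt"
	"regexp"
)

// forbidden is an illustrative list, not gmnhg's actual one.
var forbidden = []string{"fieldset", "form", "canvas", "dialog", "progress"}

// dropForbidden is a hypothetical helper, not part of gmnhg.
// It removes each forbidden element along with everything inside it.
func dropForbidden(s string) string {
	for _, tag := range forbidden {
		// Go's regexp has no backreferences, so build one pattern per tag.
		re := regexp.MustCompile(`(?is)<` + tag + `\b[^>]*>.*?</` + tag + `\s*>`)
		s = re.ReplaceAllString(s, "")
	}
	return s
}

func main() {
	fmt.Println(dropForbidden("before <canvas id=\"c\">Canvas blocks are not rendered.</canvas>after"))
}
```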

HTML in blockquotes

HTML spans are stripped from inside blockquotes.

Non HTML block text before the block.

HTML blocks are stripped from inside blockquotes.

Non HTML block text after the block.

Standalone blockquoted HTML blocks
are also stripped of their tags.

Misc

---

The Markdown horizontal line above is rendered as triple dashes.