Importing feeds into Obsidian with Simple RSS

May 12 2024

This post is about how to import RSS feeds into Obsidian using the Simple RSS plugin. It's aimed at total beginners and up. If you're not a beginner, you may want to skim down to the part where I show the data.json files for Simple RSS, or download a copy of them from the links at the end of the article.

FYI: Obsidian is a note-making app that is free for personal and non-profit use and fully customisable through plugins, many of which are open source. Simple RSS is an Obsidian community plugin developed by Monnier Antoine that creates a new Obsidian markdown file for each post in an RSS feed. As far as I know, this is the only RSS plugin that will do this for Obsidian.

Context: I'm using two different feeds, one from my Grav website and one from my main Mastodon account. My Grav feed is generated using the 'Grav Feed Plugin'. My Mastodon feed is captured using the https://server.social/@myaccountname.rss url format. Simple RSS is a great way to archive my website posts as a second backup, and is also very useful for archiving post content from Mastodon accounts. I haven't tested the plugin with other Fediverse feeds, but I suspect it would work very similarly, though that depends on the structure of the RSS output.
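As a small illustration of that url format, here is a minimal Python sketch that builds the feed address from a server name and an account name. The function name is my own invention for the example, and it only follows the https://server.social/@myaccountname.rss pattern described above; it does not check that the account exists.

```python
def mastodon_rss_url(server: str, account: str) -> str:
    """Build the public RSS url for a Mastodon account (hypothetical helper)."""
    server = server.strip().rstrip("/")
    # Accept "https://mastodon.social" as well as plain "mastodon.social".
    if server.startswith(("http://", "https://")):
        server = server.split("://", 1)[1]
    # Accept "@DrPen" as well as plain "DrPen".
    account = account.strip().lstrip("@")
    return f"https://{server}/@{account}.rss"


print(mastodon_rss_url("mastodon.social", "DrPen"))
# prints https://mastodon.social/@DrPen.rss
```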

The main challenge is how to get posts into Obsidian that capture all the available fields you would want or need, and be able to get the post into your preferred layout and design. I may add to this post as I learn more about how best to achieve things. This is as far as I've got as of Feb/Mar 2024.

I'll cover:

  • The core use-case procedure for Simple RSS
  • The RSS structure of each type of feed I'm using, going through each item field in brief detail.
  • For each type of feed I'll detail how I've created templates and feed types in the Simple RSS plugin.
  • Provide downloads of the feeds in raw form, plus a download of the template code I'm using for each feed. You can use that to get you going with your own feed templates.

Things to beware of

  • If you sync Obsidian, whether to cloud or locally, I'd turn that off while you're working.
  • Be careful to note that sometimes Simple RSS will seem to lose your data.json config file and return a blank, or the icon on the left sidebar will disappear even though the plugin is activated. I think this is probably due to some kind of caching issue on your device.
  • I keep regular backup copies of the template I'm developing.
  • Delete the entries you've already fetched with the plugin before testing, otherwise refreshing the feed won't apply your new template layout.
  • Close Obsidian every time you make changes, delete all entries, save the json file, then open Obsidian, refresh the feed.
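The backup habit in the list above can be scripted. This is a minimal Python sketch, not part of Simple RSS: it copies data.json into a backup folder with a timestamp in the file name. The plugin path is the one mentioned later in this post (.obsidian/plugins/simple-rss/); the function name, the backup folder and the file naming are my own assumptions.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_data_json(vault: Path, backup_dir: Path) -> Path:
    """Copy the Simple RSS data.json into backup_dir with a timestamped name."""
    source = vault / ".obsidian" / "plugins" / "simple-rss" / "data.json"
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"simple-rss-data-{stamp}.json"
    shutil.copy2(source, target)  # raises FileNotFoundError if data.json is missing
    return target


# Example call (both paths are placeholders, adjust them to your own setup):
# backup_data_json(Path.home() / "MyVault", Path.home() / "simple-rss-backups")
```

Running it with Obsidian closed, before each round of edits, gives you a copy to fall back on if the plugin returns a blank config.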

Simple RSS

Install the plugin in Obsidian!

  • Make a note of all the fields you want to capture into your posts
  • Decide if you want Simple RSS to generate posts in a dedicated folder in your Obsidian folder/file tree (this is preferable). If so, specify the file path to a specific folder.
  • In Simple RSS settings, create a custom feed type and give it a name.
  • Put some feed fields and item fields into the panel in the Simple RSS settings interface in Obsidian.
  • After you've done this, I highly recommend moving to your favourite code editor (e.g. Sublime) and opening the data.json file located in .../vault/.obsidian/plugins/simple-rss/. If you can't see this folder in your vault, turn on 'view hidden files' in your finder/explorer. JSON isn't hard to work with, and it's easy to see where the template code for your custom feed lives. I'll show this in more detail below.
  • You can now refine your field structure, and if you'd like, add HTML elements to your layout.
  • Experiment with using CSS classes that are in your CSS snippets folder.
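Because hand-editing JSON makes it easy to leave a stray comma or quote behind, it can help to check that the file still parses before reopening Obsidian. The sketch below is a generic Python check, not part of Simple RSS. It makes no assumption about what keys the plugin stores; it simply loads the file and lists the top-level keys with their types. The function name is hypothetical.

```python
import json
from pathlib import Path


def summarise_config(path: Path) -> dict:
    """Load a JSON config file and return {top-level key: type name}."""
    with path.open(encoding="utf-8") as handle:
        config = json.load(handle)  # raises json.JSONDecodeError if an edit broke the file
    if not isinstance(config, dict):
        raise ValueError("expected a JSON object at the top level")
    return {key: type(value).__name__ for key, value in config.items()}


# Example call (the vault path is a placeholder):
# summarise_config(Path.home() / "MyVault/.obsidian/plugins/simple-rss/data.json")
```

If the call raises an error, the message includes the line and column of the problem, which is usually enough to find the typo.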

The Mastodon Feed

Here is the structure of the Mastodon RSS output. I have commented sections of the code to highlight what is happening. A plain .txt file is available in the downloads at the end of this article to see the full feed structure.

The Mastodon feed uses a more unusual structure. The main difference is that it uses media:content items, rather than the more standard img url in an HTML element embedded in the main description field. This makes the url attribute harder to target, but Simple RSS can fetch it.

The top level fields

This is at the top of the RSS feed document model, and tells us the declared document type (XML), what type of RSS it is and what version is being used (RSS v2). It also tells us the compatibility of the feed. MRSS is Media RSS and is what the Mastodon feed format is using.

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:webfeeds="http://webfeeds.org/rss/1.0" xmlns:media="http://search.yahoo.com/mrss/">

The opening channel field and child fields of channel

This is the title of the channel (the actual feed) and other details about it, like the short description, link, last update and what generated it. There is also a webfeeds:icon, which is the avatar of the feed, the same avatar image that image.url fetches.

<channel>
    <title>Dr Pen</title>
    <description>Public posts from @DrPen@mastodon.social</description>
    <link>https://mastodon.social/@DrPen</link>
    <image>
      <url>https://files.mastodon.social/accounts/avatars/110/128/000/908/920/430/original/a62af531dc3efd4c.jpg</url>
      <title>Dr Pen</title>
      <link>https://mastodon.social/@DrPen</link>
    </image>
    <lastBuildDate>Wed, 31 Jan 2024 12:18:42 +0000</lastBuildDate>
    <webfeeds:icon>https://files.mastodon.social/accounts/avatars/110/128/000/908/920/430/original/a62af531dc3efd4c.jpg</webfeeds:icon>
    <generator>Mastodon v4.3.0-nightly.2024-02-01</generator>
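To make the channel structure concrete, here is a minimal Python sketch that reads those fields with the standard library XML parser. It is only an illustration of the feed layout, not how Simple RSS works internally. The sample is a trimmed copy of the header above with the closing tags added, and the function name is my own.

```python
import xml.etree.ElementTree as ET

AVATAR = ("https://files.mastodon.social/accounts/avatars/110/128/000/908/920/430/"
          "original/a62af531dc3efd4c.jpg")

# Trimmed copy of the channel header, closed off so that it parses on its own.
FEED = f"""<rss version="2.0"
     xmlns:webfeeds="http://webfeeds.org/rss/1.0"
     xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>Dr Pen</title>
    <description>Public posts from @DrPen@mastodon.social</description>
    <link>https://mastodon.social/@DrPen</link>
    <image>
      <url>{AVATAR}</url>
      <title>Dr Pen</title>
      <link>https://mastodon.social/@DrPen</link>
    </image>
    <lastBuildDate>Wed, 31 Jan 2024 12:18:42 +0000</lastBuildDate>
    <webfeeds:icon>{AVATAR}</webfeeds:icon>
  </channel>
</rss>"""

# Prefixed elements such as webfeeds:icon need the namespace url to be looked up.
NS = {"webfeeds": "http://webfeeds.org/rss/1.0"}


def read_channel(xml_text: str) -> dict:
    """Return the main channel-level fields of an RSS 2.0 feed."""
    channel = ET.fromstring(xml_text).find("channel")
    return {
        "title": channel.findtext("title"),
        "link": channel.findtext("link"),
        "avatar": channel.findtext("image/url"),
        "icon": channel.findtext("webfeeds:icon", namespaces=NS),
    }


info = read_channel(FEED)
print(info["title"])                   # prints Dr Pen
print(info["avatar"] == info["icon"])  # prints True
```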

The opening item fields

The channel fields precede the item field(s). The channel encloses all the items, a bit like an article or section might in html5. The item field is where post content is held. Let's look at this in more detail.

The guid is the permalink, the permanent url of the individual post. The link is the same as the permalink. The pubDate is the date of publication, when the post was made.

  <item>
     <guid isPermaLink="true">https://mastodon.social/@DrPen/111832201685713509</guid>
     <link>https://mastodon.social/@DrPen/111832201685713509</link>
     <pubDate>Sun, 28 Jan 2024 06:39:35 +0000</pubDate>
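If you ever post-process these values outside Obsidian (for file names, sorting and so on), the pubDate format is the standard email-style date, which Python can parse directly. This is a small sketch with hypothetical helper names; it is separate from anything Simple RSS does.

```python
from email.utils import parsedate_to_datetime
from urllib.parse import urlparse


def iso_date(pub_date: str) -> str:
    """Turn an RSS pubDate into a YYYY-MM-DD string."""
    return parsedate_to_datetime(pub_date).date().isoformat()


def post_id(link: str) -> str:
    """Return the last path segment of a post url, which is the post's id."""
    return urlparse(link).path.rstrip("/").rsplit("/", 1)[-1]


print(iso_date("Sun, 28 Jan 2024 06:39:35 +0000"))
# prints 2024-01-28
print(post_id("https://mastodon.social/@DrPen/111832201685713509"))
# prints 111832201685713509
```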

The item.description field

The item description holds all the post content, including post links, tags and text. It does NOT have the image url included. This code snippet indicates what you'll see in the description item type.

<description> 
    The description content starts with the post text content and contains some ASCII HTML name characters like `&lt;p&gt;` and odd HTML renders like e.g. `... translate="no"&gt;&lt;span class="invisible"&gt;https://www.&lt;/span&gt;&lt;span class="ellipsis"&gt;`. This is to escape the opening and closing HTML element tags (< > /), apostrophes ... or other special characters ...

The post text content is followed by the Mastodon hashtags rendered as links, again with ASCII HTML name characters:
href="https://mastodon.social/tags/academicchatter" class="mention hashtag" rel="tag"&gt;#&lt;span&gt;academicchatter&lt;/span&gt;&lt;/a&gt; &lt;a 

// ... multiple other tags will follow this in one long string ... // 

The closing description field tag:

</description>
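The escaping is less of an obstacle than it looks: an XML parser turns those name characters back into ordinary HTML when it reads the element. The Python sketch below shows this with a shortened description (the word "Hello" is placeholder post text), and also shows html.unescape doing the same job on a raw string. It is an illustration only.

```python
import html
import xml.etree.ElementTree as ET

# Shortened description element; "Hello" stands in for the real post text.
RAW = (
    "<description>&lt;p&gt;Hello &lt;a "
    'href="https://mastodon.social/tags/academicchatter" class="mention hashtag" '
    'rel="tag"&gt;#&lt;span&gt;academicchatter&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;'
    "</description>"
)

# Reading the element text decodes &lt; and &gt; back into < and >.
decoded = ET.fromstring(RAW).text
print(decoded)

# The same decoding, applied by hand to a raw escaped fragment.
print(html.unescape("&lt;p&gt;Hello&lt;/p&gt;"))
# prints <p>Hello</p>
```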

The media:content field

The media:content field follows description. It is a bit like the media:group field (used, for example, in YouTube feeds), but with one complication: the image url is not in a separate child element. It is an attribute of the media:content element itself.

     <media:content url="https://files.mastodon.social/media_attachments/files/111/832/170/886/342/281/original/6cae376ed484e250.png" type="image/png" fileSize="165089" medium="image">
       <media:rating scheme="urn:simple">nonadult</media:rating>
       <media:description type="plain">The alt text of the image</media:description>
     </media:content>
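To show the difference between an attribute and a child element, here is a minimal Python sketch that reads both from the snippet above: the url comes from the attribute, the alt text from the media:description child. It uses the standard library parser and is not how Simple RSS does it; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

ITEM = """<item xmlns:media="http://search.yahoo.com/mrss/">
  <media:content url="https://files.mastodon.social/media_attachments/files/111/832/170/886/342/281/original/6cae376ed484e250.png" type="image/png" fileSize="165089" medium="image">
    <media:rating scheme="urn:simple">nonadult</media:rating>
    <media:description type="plain">The alt text of the image</media:description>
  </media:content>
</item>"""

MEDIA_NS = {"media": "http://search.yahoo.com/mrss/"}


def markdown_images(item_xml: str) -> list:
    """Return one markdown image line per media:content element in the item."""
    item = ET.fromstring(item_xml)
    lines = []
    for media in item.findall("media:content", MEDIA_NS):
        url = media.get("url")  # attribute of the element itself
        alt = media.findtext("media:description", default="", namespaces=MEDIA_NS)  # child element
        lines.append(f"![{alt}]({url})")
    return lines


for line in markdown_images(ITEM):
    print(line)
```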

The item.category field

The category field follows the closed description field. The Mastodon hashtags are also rendered as categories: you get one category element per hashtag, e.g.:

     <category>academicchatter</category>
     <category>fediverse</category>
     <category>mastodon</category>

The closing item field tag looks like this:

   </item>

NB There is an item field for each post, a total of twenty in the feed. I don't think it is possible to increase this number. After the last item field, the closing channel and rss tags end the feed.

Simple RSS and Mastodon Feeds

Simple RSS instructions are on the GitHub readme.

  • Enter a few fields into your custom feed type, e.g. "Masto 1"
  • Open the data.json file
  • Flesh out your template code to add more layout elements
  • Save, refresh the fetched entries (replace with new entries)

Something of note: the big challenge was capturing the media:content url attribute. I was stuck, and it seemed very difficult. After some tests, frustration and puzzling, plus a post on the Obsidian forum that yielded no replies, I went to look at the issues listed on the Simple RSS GitHub. One of the closed issues was about capturing the description of a Mastodon feed. Though the issue was closed, and wasn't the one I was stuck on, it was a recent entry. I replied, saying I'd had to add description to the core default feed type because description wasn't being called at all (in either feed). This drew the attention of the plugin dev, and the point ended up being added to the plugin ReadMe. Description is not a default item field in the RSS Parser library that Simple RSS uses, so you need to add it to a custom feed to call it.

Then I asked about what I was actually stuck on, fetching the media:content url attribute. To their complete credit, the dev person responded in a couple of hours. With their expert guidance and instruction, I solved my problem. To call the url attribute, use the following code:

item.media:content.$.url
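To unpack that path: the `$` segment matches the convention of xml2js-style parsers, which collect an element's attributes under a `$` key, and I am assuming that is what happens here. The Python sketch below imitates the idea with a plain dictionary. The shape of the parsed item is a simplified assumption and the resolve helper is hypothetical; it is not Simple RSS's own code.

```python
# Assumed, simplified shape of a parsed item: attributes live under "$".
parsed_item = {
    "media:content": {
        "$": {
            "url": "https://files.mastodon.social/media_attachments/files/111/832/170/886/342/281/original/6cae376ed484e250.png",
            "type": "image/png",
            "medium": "image",
        },
    },
}


def resolve(path: str, context: dict):
    """Walk a dotted path such as 'item.media:content.$.url' through nested dicts."""
    current = context
    for key in path.split("."):
        current = current[key]  # raises KeyError if a segment is missing
    return current


print(resolve("item.media:content.$.url", {"item": parsed_item}))
print(resolve("item.media:content.$.type", {"item": parsed_item}))
# second line prints image/png
```

Read this way, the path says: take the item, go to its media:content element, step into that element's attributes, and pick the one called url.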

The Simple RSS feed template interface for Mastodon RSS

The dropdown selects which feed type you are using. Then you see the path to the folder for the generated posts, then the title and template code. This is the setup for my Mastodon feed.

screenshot of Mastodon RSS template in Simple RSS

I decided to lay out the categories as an unordered list for simplicity, but as you can see below, they can also be transformed into hashtags that in read view would be converted into Obsidian tags and added to your tag list.
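The two layouts differ only in how each category is written out. As a plain illustration (this is Python, not Simple RSS template syntax, and the function names are my own), the same list of categories can be turned into either form like this. Obsidian tags cannot contain spaces, so the hashtag version swaps any spaces for hyphens.

```python
def as_bullets(categories: list) -> str:
    """Render categories as a markdown unordered list."""
    return "\n".join(f"- {name}" for name in categories)


def as_hashtags(categories: list) -> str:
    """Render categories as Obsidian-style tags on one line."""
    return " ".join("#" + name.strip().replace(" ", "-") for name in categories)


tags = ["academicchatter", "fediverse", "mastodon"]
print(as_bullets(tags))
print(as_hashtags(tags))
# last line prints #academicchatter #fediverse #mastodon
```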

A download file is provided below to see the full data.json file.


I'll now move on to how to work with a Grav feed and Simple RSS. This is simpler and included to show a different kind of feed structure and template.

The Grav Feed

The Grav feed is more straightforward, but still needs a custom feed type to capture all the available fields. The template is more basic at the time of writing but will get more complex. One thing it's not doing at the moment is capturing inline images in slideshows, or images that otherwise use Grav-style markdown image features.

The top document type, RSS type (RSS 2.0 with the Atom namespace declared, in this case) and opening channel fields. I run the plugin with json compatibility enabled.

<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
    <channel>
        <title>Penworks.Net</title>
        <link>https://penworks.net/blog</link>
        <atom:link href="https://penworks.net/blog.rss" rel="self" type="application/rss+xml"/>
        <description>Internet Gardener</description>
        <language>en</language>
        <lastBuildDate>Wed, 22 Nov 2023 18:35:17 +0100</lastBuildDate>

The item field (with description field). The link to the post is included, along with pubDate, and permalink guid. The description item field uses the CDATA (character data) tag and does not include any ASCII HTML name and number characters like those you see in the Mastodon feed. CDATA allows for 'normal' HTML characters to be used that won't break the XML. The description item field therefore includes the img url as an HTML element, and other HTML styling and lists. Like the Mastodon feed, tags are included as separate categories.

<item>
<title>Possible futures of microblogging</title>
<link>https://penworks.net/blog/possible-futures-of-microblogging</link>
<guid>https://penworks.net/blog/possible-futures-of-microblogging</guid>
<pubDate>Thu, 27 Jul 2023 11:10:00 +0200</pubDate>
<description>
<![CDATA[

<img alt="" src="https://penworks.net/images/8/f/9/f/d/8f9fde5c7ffbedec139b4d201a3578872bdf81ad-twitter-17956521920.jpg" />
<p>It's going to be very interesting what happens to <a href="https://www.techtarget.com/searchmobilecomputing/definition/microblogging">microblogging </a> now, with the apparent chaotic meltdown of 'X', formerly known as Twitter, and the rapid, planned rise of Threads. While I expect Mastodon or other microblogging federated platforms will very likely continue to draw in the older, more educated and specialist communities (check the demographics), I believe Threads will come to dominate fairly/very quickly. It seems logical ...

[...] 

<ul>
<li>DSA - Digital Services Act</li>
<li>EDA - European Data Act</li>
<li>EDPS - European Data Protection Supervisor</li>
<li>FOSS - Free and Open Source Software</li>
<li>WEI - Web Environment Integrity</li>
</ul>

]]><!--end of CDATA-->
</description>

        <category>social-media</category>
        <category>datasociety</category>

    </item>
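Because the Grav description is CDATA, a parser hands back the HTML untouched, and the img url can be pulled out of it with an ordinary HTML parser. Here is a minimal Python sketch using only the standard library. The sample is a shortened version of the item above, and the class name is my own; it illustrates the structure rather than anything Simple RSS does.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Shortened version of the Grav description shown above.
DESCRIPTION = """<description><![CDATA[
<img alt="" src="https://penworks.net/images/8/f/9/f/d/8f9fde5c7ffbedec139b4d201a3578872bdf81ad-twitter-17956521920.jpg" />
<p>It's going to be very interesting what happens to microblogging now ...</p>
<ul>
<li>DSA - Digital Services Act</li>
</ul>
]]></description>"""


class ImageCollector(HTMLParser):
    """Collect the src attribute of every img element."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img ... /> tags are also routed through this method.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


html_text = ET.fromstring(DESCRIPTION).text  # CDATA content arrives as plain HTML
collector = ImageCollector()
collector.feed(html_text)
print(collector.sources)
```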

In Grav, several things pose further challenges that I'm currently working on handling with better templating. The main one is how to tidy up inline images included in a Grav post: either show them, or hide them when they have relative urls and so can't be displayed. The other attributes in the img element come from the Grav lightbox, gallery and resizing features.

NB only images with an absolute url are visible in the Obsidian feed, i.e. linked to the original host site.

<p id="1745288826">
    <a href="/user/pages/blog/cans-festival-leake-street/01p1080028_2471103213_o.jpg" class="glightbox-1745288826">

        <img width="267" height="200" title="01p1080028_2471103213_o" alt="01p1080028_2471103213_o" src="/user/pages/blog/cans-festival-leake-street/01p1080028_2471103213_o.jpg" srcset="/images/9/3/8/9/d/9389dd1caaf13d04a4f722aa8bd7c7892db07dcd-01p10800282471103213o534w.jpg 534w, /user/pages/blog/cans-festival-leake-street/01p1080028_2471103213_o.jpg 2048w" sizes="1px" />

    </a>
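One way to think about the relative url problem is as a post-processing step: a relative src can be joined to the site address to make it absolute. The Python sketch below shows the idea with the standard library. It is not something Simple RSS does, the function names are hypothetical, and I am assuming https://penworks.net/ (the feed's own link) is the right base and that the image is publicly reachable at the resulting address.

```python
from urllib.parse import urljoin, urlparse

BASE = "https://penworks.net/"  # assumed site root, taken from the feed's link field


def is_absolute(url: str) -> bool:
    """True when the url carries its own scheme and host."""
    parts = urlparse(url)
    return bool(parts.scheme and parts.netloc)


def absolutise(src: str, base: str = BASE) -> str:
    """Resolve a relative image src against the site root; absolute urls pass through."""
    return urljoin(base, src)


relative = "/user/pages/blog/cans-festival-leake-street/01p1080028_2471103213_o.jpg"
print(is_absolute(relative))  # prints False
print(absolutise(relative))
# prints https://penworks.net/user/pages/blog/cans-festival-leake-street/01p1080028_2471103213_o.jpg
```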

The Simple RSS feed template interface for Grav RSS

screenshot of Grav RSS template in Simple RSS

Rounding up

I've been testing this for several weeks with both the Grav and the Mastodon accounts and it works very well. I now have a full archive of posts that can be searched in Obsidian, and exported into PDF collections or other formats if I want. In time I'll design up the posts nicely, and add my other Mastodon account. I hope this post is a useful guide to others if they want to capture their Mastodon or other website posts into Obsidian.

Check the thumbnail for a large image of what a single Mastodon post looks like in Obsidian.


The 'designed' post as it looks in Obsidian: a screenshot of how a Mastodon post looks when imported into Obsidian as a separate file.


Links & Downloads

Downloads

