How to Effortlessly Create Markdown With TextSoap

Difficulty: Beginner | Length: Medium

TextSoap is an invaluable utility that I find increasingly useful. It comes with special text cleaners -- little routines for processing text.

In this tutorial, I will introduce TextSoap and create a custom text cleaner for processing an article written in Markdown to make it ready for adding into WordPress.


The Use Case

When I'm writing, I like to see the words. When writing for the web, you often need some type of markup to style your text. Thankfully, Markdown lets you mark up the text without hiding it.

Many content management systems (CMSs), such as WordPress, do not accept Markdown directly. Therefore, a lot of translating, adding, and other tasks are required to get the text ready for publication. That can be a time-consuming mess, especially if you have to do it often.

Whilst working on some articles for Tuts+, I decided to automate the text conversion process. That is where the power of TextSoap really shines.


Getting Started

When you start TextSoap, you will be presented with the main screen.

It contains the work area on the left and the list of text cleaners on the right. If there is already text on your clipboard, it is automatically placed in the work area.

TextSoap Main Window

Press Edit Cleaners to open the custom cleaner editor. This is the working area for creating a custom cleaner. It shows the contents of the last edited cleaner. As you can see, I have many custom cleaners.

TextSoap Edit Custom Cleaner Window

Pressing the + button at the bottom left creates a new cleaner. Name it Markdown to Article. The next step is to build the cleaner. The middle area is the list of cleaners to be applied. It's currently empty, but not for long. The right-hand side has all of the available actions that you can add to the cleaner.

Creating a New Cleaner

You can add an action by dragging it from the list on the right to where you want it in the middle. The actions placed in the middle area are applied one at a time, from the top of the screen to the bottom. Therefore, each time TextSoap processes a block in the cleaner, it goes completely through all of the text.

The yellow area under the middle work area is for adding comments. When you add a cleaner to the middle area, you can add a more readable description of what you are doing with that cleaner. This makes it easier to follow the cleaner and to edit it in the future!

Since every block goes through every line of text to be processed, the more blocks you use in your cleaner, the longer the cleaning process will take. Therefore, it is best to keep the number of blocks to the minimum required.
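
The top-to-bottom behavior can be pictured outside TextSoap with a short Python sketch. This is only an illustration of sequential passes, not how TextSoap is implemented; the two rules and the function name `run_cleaner` are made up for the example.

```python
import re

# Each "block" is a (pattern, replacement) pair. These two rules are
# illustrative only and are not part of the cleaner built in this tutorial.
BLOCKS = [
    (r"[ \t]+$", ""),     # strip trailing whitespace on every line
    (r"\n{3,}", "\n\n"),  # collapse runs of blank lines into one
]

def run_cleaner(text, blocks=BLOCKS):
    """Apply every block to the whole text, in order, top to bottom."""
    for pattern, replacement in blocks:
        # One full pass over the text per block, so more blocks mean more passes.
        text = re.sub(pattern, replacement, text, flags=re.MULTILINE)
    return text

print(repr(run_cleaner("one  \n\n\n\ntwo\t\n")))
```

Because each rule makes its own full pass, the order of the rules matters, and every extra rule adds another pass over the text.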

Adding Markdown Cleaner

The first thing to do is to convert the Markdown text to HTML. Look at the list of cleaners on the right-hand side. Right above the list is a search box. Type mark and a cleaner named Markdown Text will show up. Drag that one into the middle area. That does it for converting from Markdown to HTML.

That cleaner, unfortunately, does not produce the exact format of HTML that is needed. That is what the rest of the cleaner will do.

Removing HTML Paragraph Tags

Since WordPress does not want you to put in paragraph tags, they need to be removed. Drag the Regex Search and Replace Text cleaner to the middle, just after the Markdown Text block. In the first text box, enter the search pattern: \<[/]*p\>. This is a regular expression for detecting an opening or closing paragraph tag. The second text box holds the text that replaces each match. Just leave it blank to delete the tags.

Tip: If you are not confident with regular expressions, have a read of You Don’t Know Anything About Regular Expressions: A Complete Guide.

Every Regex Search and Replace block needs to be set to ignore case. Therefore, click the Options button and check ignore case. You will see an i placed just after the button.
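
For readers who want to try the pattern outside TextSoap, here is a minimal Python sketch of the same search and replace. The function name `strip_paragraph_tags` is hypothetical, the backslashes before < and > are dropped because Python does not need them, and `re.IGNORECASE` stands in for the ignore case option.

```python
import re

# Same idea as the tutorial's pattern: "<", an optional "/", "p", then ">".
P_TAG = re.compile(r"<[/]*p>", re.IGNORECASE)

def strip_paragraph_tags(html):
    """Delete opening and closing paragraph tags, keeping their contents."""
    return P_TAG.sub("", html)

print(strip_paragraph_tags("<p>Hello</p>\n<P>World</P>"))
```

Tags that merely start with p, such as <pre>, are left alone because the pattern requires the closing > right after the p.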

Adding Classes to Figures

Next, all of the figure tags need to have the tutorial-image class associated with them. So, get another Regex Search and Replace Text block and place it under the last one. In the first text box, place \<figure\>. In the second text box, place <figure class='tutorial-image'>. Don’t forget about the ignore case setting!
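
As a rough equivalent in Python (a sketch only, with the hypothetical function name `add_figure_class`), the same replacement looks like this:

```python
import re

def add_figure_class(html):
    """Give every bare <figure> tag the tutorial-image class."""
    return re.sub(
        r"<figure>",
        "<figure class='tutorial-image'>",
        html,
        flags=re.IGNORECASE,  # mirrors the ignore case option
    )

print(add_figure_class('<figure><img src="main.png"></figure>'))
```

Only the opening tag is changed; the closing </figure> does not match the pattern because of its slash.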

Fixing H2 tags: Adding <hr /> tag before <h2>

The writing standard for Tuts+ is to have an <hr /> tag before every header. To do this, add another Regex Search and Replace Text block, with the search text box containing (<h2) and the replace text box containing <hr />$1. This simply searches for every <h2 tag fragment and places an <hr /> tag in front of it.
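
The capture group is what lets the replacement put the matched text back. A small Python sketch of the same step follows; note that Python writes the group reference as \1 where the replace box uses $1, and the function name `add_rule_before_h2` is made up for the example.

```python
import re

def add_rule_before_h2(html):
    """Insert an <hr /> tag in front of every opening <h2 fragment."""
    # (<h2) captures the fragment; \1 writes it back after the new tag.
    return re.sub(r"(<h2)", r"<hr />\1", html, flags=re.IGNORECASE)

print(add_rule_before_h2('<h2 id="summary">Summary</h2>'))
```

The closing </h2> is untouched, since "<h2" never appears inside "</h2".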

Fixing H2 tags: Removing ids

The Markdown converter always puts an id attribute in the header tag. But the standard says no IDs! They have to go. Once again, get a Regex Search and Replace Text block, with <h2[^>]*> in the find text field and <h2> in the replace text field. This searches for every opening <h2> tag, no matter what attributes it carries, and replaces it with just a plain <h2> tag.
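
Here is a minimal Python sketch of the id removal, using the character class [^>] to mean "anything up to the closing bracket". The function name `strip_h2_attributes` is hypothetical.

```python
import re

def strip_h2_attributes(html):
    """Replace every opening <h2 ...> tag with a plain <h2>."""
    # [^>]* matches any run of characters that is not the closing ">".
    return re.sub(r"<h2[^>]*>", "<h2>", html, flags=re.IGNORECASE)

print(strip_h2_attributes('<h2 id="getting-started">Getting Started</h2>'))
```

Be aware that this drops every attribute on the tag, not only the id, which is fine here because the standard wants plain <h2> tags.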

Fixing Image Source Address

When you upload images in WordPress, it always places them in a specific directory, followed by a four-digit year and a two-digit month, before the name of the actual file. Since the location is always predictable, a search and replace can be used to set this up. Before you do the text cleaning, make sure to set the year and month to match when you uploaded the pictures for the article.

This time, place a Find and Replace Text block in the middle section, with the first text field containing <img src=" and the second text field containing <img src="http://cdn.tutsplus.com/mac/uploads/2013/10/. This finds each image tag and puts the proper web path in front of the file name.

There is one problem with this implementation: the month and year have to be changed each time the month or year changes for your articles. Since TextSoap does not have a dynamic memory system, they have to be changed manually each time.
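
Outside TextSoap, the year and month could be supplied as a parameter instead of being typed in by hand. The Python sketch below only illustrates that idea; the names `upload_prefix` and `fix_image_sources` are hypothetical, and the base address is the one used in this tutorial.

```python
import datetime

BASE = "http://cdn.tutsplus.com/mac/uploads"

def upload_prefix(when):
    """Build the upload directory for a date: four-digit year, two-digit month."""
    return f"{BASE}/{when.year:04d}/{when.month:02d}/"

def fix_image_sources(html, when):
    """Put the upload directory in front of every image file name."""
    # A plain string replacement, like the Find and Replace Text block.
    return html.replace('<img src="', '<img src="' + upload_prefix(when))

print(fix_image_sources('<img src="main.png" alt="Main">', datetime.date(2013, 10, 1)))
```

Like the TextSoap block, this assumes the sources are bare file names; running it on text that already has full addresses would add the prefix a second time.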

Fixing Misc. Anchor Tags

When you have an anchor tag that downloads something from the media area of WordPress, you need to correct those addresses as well. Since the pattern is a regular expression with a capture group, add another Regex Search and Replace Text block, with the first text area containing \<a href\=\"[^h][^t][^t][^p](.*)\" and the second text area containing <a href="http://cdn.tutsplus.com/mac/uploads/2013/10/$1.
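
A note for anyone porting this step elsewhere: as printed, the four negated classes each consume one character of the address outside the capture group, and the closing quote is matched but not written back, so a literal translation would lose them. The Python sketch below is a variation, not the pattern from this tutorial. It uses a negative lookahead to skip addresses that already start with http and writes the closing quote back. Whether TextSoap accepts the same lookahead syntax is not something this sketch assumes; the function name `fix_download_links` is hypothetical.

```python
import re

PREFIX = "http://cdn.tutsplus.com/mac/uploads/2013/10/"

# (?!http) rejects addresses that already begin with http without consuming
# any characters; ([^"]*) then captures the whole address up to the quote.
LOCAL_HREF = re.compile(r'<a href="(?!http)([^"]*)"', re.IGNORECASE)

def fix_download_links(html):
    """Prefix bare file names in anchor tags with the upload directory."""
    return LOCAL_HREF.sub(lambda m: f'<a href="{PREFIX}{m.group(1)}"', html)

print(fix_download_links('<a href="cleaner.zip">Download</a>'))
```

This version would also prefix addresses such as mailto: links or in-page anchors, so it only suits articles where every non-http link is an uploaded file.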


Titles

The last thing that needs fixing is the titles. Tuts+ requires all titles to be title cased. But not all title cases are the same. There is a small web app, designed specifically for Tuts+ articles, for making the headers title case. Examining its JavaScript code reveals that the following words should always be lowercase: a, an, and, as, at, but, by, en, for, if, in, of, on, or, the, to, vs, vs., and via. The exception is when any of these is the first word in a title; then it needs to be capitalized.

Fixing Title Case: Regular Expression

Select an If Text Matches block and drag it to the center as above. In the text box, place this regular expression: \<h2.*\>(.*)\<\/h2\>. This matches every <h2></h2> tag set and passes it to the next block. It does this for every line that matches. Set the Match capture group: to $1. That sends just the text inside the tags to the cleaner(s) inside the block and puts the results back in between the header tags. That is a lot of work made easy! Remember to set the options to ignore case.

Next, grab a Title Case with Options block and drag it to the middle, in between the If Text Matches block and the end conditional block. Since the default list of small words is the same as the web app's, nothing needs to be added. Best of all, it is smart enough to make sure the first word is always capitalized, no matter what word it is. If another word needs to be lowercase, it can be added to the large text box under Default: a,…. If you want some acronyms to be left unchanged, you can place them in the second large text box under Default: AT&T…. I have added HTML CSS PHP because those should always be completely uppercase.

The option Treat: vs vs. v v. as small words needs to be checked.
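
The two blocks together (match the header, then title-case only the captured text) can be approximated in Python. This is a simplified sketch of the rules described above, not a reproduction of TextSoap's Title Case with Options; the names `title_case` and `title_case_headers` are hypothetical, and it assumes each header sits on one line.

```python
import re

# Small words from the Tuts+ list; lowercase unless first in the title.
SMALL = {"a", "an", "and", "as", "at", "but", "by", "en", "for", "if",
         "in", "of", "on", "or", "the", "to", "vs", "vs.", "via"}
# Acronyms that should always be fully uppercase.
KEEP_UPPER = {"HTML", "CSS", "PHP"}

def title_case(text):
    """Capitalize each word, except small words after the first position."""
    out = []
    for i, word in enumerate(text.split()):
        if word.upper() in KEEP_UPPER:
            out.append(word.upper())
        elif i > 0 and word.lower() in SMALL:
            out.append(word.lower())
        else:
            out.append(word[:1].upper() + word[1:])
    return " ".join(out)

# Three groups: opening tag, header text, closing tag.
H2 = re.compile(r"(<h2[^>]*>)(.*?)(</h2>)", re.IGNORECASE)

def title_case_headers(html):
    """Title-case only the text between <h2> and </h2>."""
    return H2.sub(lambda m: m.group(1) + title_case(m.group(2)) + m.group(3), html)

print(title_case_headers("<h2>a guide to css and php</h2>"))
# prints: <h2>A Guide to CSS and PHP</h2>
```

Only the middle group is rewritten, which mirrors setting the match capture group so that just the text inside the tags reaches the title-case step.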


Summary

Now you have a TextSoap cleaner that will take any Markdown text and convert it to HTML that can be pasted directly into WordPress for publication. No more adjustments needed. It's now easy enough to work only in Markdown and paste it into WordPress just to publish.

For example, I wrote this article in Sublime Text using Markdown, copied the text to the clipboard, used my TextSoap Alfred workflow to run this cleaner to convert the Markdown to HTML in the clipboard, pasted the results into WordPress, and uploaded my pictures. It is that simple!

A word of caution: paste your HTML into the Text tab of the WordPress editor!

Have you created any unique TextSoap cleaners? Let me know in the comments!
