TextSoap is an invaluable utility that I find increasingly useful. It comes with special text cleaners -- little routines for processing text.
The Use Case
When I'm writing I like to see the words. When writing for the web, you often need some type of markup to style your text. Thankfully, Markdown lets you mark up the text without hiding it.
Many content management systems (CMSs), such as WordPress, do not accept Markdown directly. Therefore, there's a lot of translating, adding, and other work required to get the text ready for publication. That can be a time-consuming mess, especially if you have to do it often.
Whilst working on some articles for Tuts+, I decided to automate the text conversion process. That is where the power of TextSoap really shines.
When you start TextSoap, you will be presented with the main screen.
It contains the work area to the left, and the list of text cleaners to the right. If you have stuff in your clipboard already, it will automatically be placed in the work area.
Press Edit Cleaners to open the custom cleaner editor. This is the working area for creating a custom cleaner. It will show the contents of the last edited cleaner. As you can see, I have many custom cleaners.
Pressing the + button at the bottom left will create a new cleaner. Name it Markdown to Article. The next step is to build the cleaner. The middle area is the list of cleaners to be applied. It's currently empty, but not for long. The right-hand side has all of the available actions that you can add to the cleaner.
You can add an action by dragging it from the list on the right to where you want it in the middle. The actions placed in the middle area are applied one at a time, from the top of the screen to the bottom. Each block in the cleaner therefore makes its own complete pass through all of the text.
Since every block goes through every line of text to be processed, the more blocks you use in your cleaner, the longer the cleaning process will take. Therefore, it is best to keep the number of blocks to the minimum required.
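The top-to-bottom processing described above can be pictured with a small Python sketch. It is only an analogy, not TextSoap's actual implementation: `Block`, `run_cleaner`, and the two toy blocks are names made up for illustration.

```python
from typing import Callable, List

# A "block" here is any function that takes the whole text and returns new text.
# This type and the names below are hypothetical, chosen only for this sketch.
Block = Callable[[str], str]


def run_cleaner(text: str, blocks: List[Block]) -> str:
    """Apply each block to the full text, in order from first to last."""
    for block in blocks:
        text = block(text)  # every block makes one complete pass over the text
    return text


# Two toy blocks: trim the ends, then collapse double spaces.
blocks = [str.strip, lambda t: t.replace("  ", " ")]
result = run_cleaner("  hello  world  ", blocks)
print(result)  # hello world
```

Because each block is a full pass, the cost grows with the number of blocks, which is why a short list is preferable.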
The first thing to do is to convert the Markdown text to HTML. Look at the list of cleaners on the right-hand side. Right above the list is a search box. Type mark and a cleaner named Markdown Text will show up. Drag that one into the middle area. That does it for converting from Markdown to HTML.
That cleaner, unfortunately, does not produce the exact format of HTML that is needed. That is what the rest of the cleaner will do.
Since WordPress does not want you to put in paragraph tags, they need to be removed. Drag the Regex Search and Replace Text cleaner to the middle just after the Markdown Text block. In the first text box, place what is being searched for: \<[/]*p\>. This is a regular expression for detecting an opening or closing paragraph tag. The second text box holds the replacement for the matching string. Just leave it blank to delete the tags.
Every Regex Search and Replace block needs to be set to ignore case. Therefore, click the Options button and check ignore case. You will see an i placed just after the button.
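To see what this block does outside TextSoap, here is a minimal Python sketch using the same search pattern with case ignored. The function name `strip_paragraph_tags` and the sample text are made up for illustration.

```python
import re

# Same pattern as in the article; re.IGNORECASE plays the role of the
# "ignore case" option, so <P> and </P> are matched as well.
P_TAG = re.compile(r"\<[/]*p\>", re.IGNORECASE)


def strip_paragraph_tags(html: str) -> str:
    """Delete opening and closing paragraph tags, leaving their contents."""
    return P_TAG.sub("", html)


sample = "<p>First paragraph.</p>\n<P>Second paragraph.</P>"
stripped = strip_paragraph_tags(sample)
print(stripped)
```

Note that the pattern only matches a bare tag, so a tag such as `<pre>` or a paragraph tag carrying attributes is left alone.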
Next, every figure tag needs to have the tutorial-image class. So, get another Regex Search and Replace Text block and place it under the last one. In the first text box, place \<figure\>. In the second text box, place <figure class='tutorial-image'>. Don't forget about the ignore case setting!
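The same replacement, sketched in Python for comparison. `add_figure_class` is a hypothetical helper name, and plain straight quotes are used around the class name.

```python
import re


def add_figure_class(html: str) -> str:
    """Give every bare <figure> tag the tutorial-image class."""
    return re.sub(
        r"\<figure\>",
        "<figure class='tutorial-image'>",
        html,
        flags=re.IGNORECASE,  # mirrors the ignore case option
    )


figure_html = add_figure_class('<figure><img src="main.png"></figure>')
print(figure_html)
```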
The writing standard for Tuts+ is to have an <hr /> tag before every header. To do this, add another Regex Search and Replace Text block with the search text box containing (<h2) and the replace text box containing <hr />$1. This will simply search for every <h2 tag fragment and place an <hr /> tag in front of it.
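Here is a short Python sketch of the capture-group replacement. One difference to keep in mind: Python writes the back-reference as `\1` where the TextSoap block uses `$1`. The function name `add_rule_before_headers` is made up for this example.

```python
import re


def add_rule_before_headers(html: str) -> str:
    """Insert <hr /> in front of every opening <h2 fragment."""
    # (<h2) captures the fragment; \1 puts it back after the new tag.
    return re.sub(r"(<h2)", r"<hr />\1", html, flags=re.IGNORECASE)


ruled = add_rule_before_headers('<h2 id="intro">Intro</h2>')
print(ruled)
```

The closing `</h2>` is untouched because it starts with `</`, not `<h2`.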
The Markdown converter will always put an id attribute in the header tag. But, the standard says no IDs! They have to go. Once again, get a Regex Search and Replace Text block with <h2[^>]*> in the find text field and <h2> in the replace text field. This will search for every opening <h2> tag, no matter what attributes it carries, and replace it with just a plain <h2> tag.
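The attribute stripping can be checked with a few lines of Python. In this sketch the pattern is written with no spaces inside the brackets, so the character class excludes only `>` and the match runs from `<h2` to the end of the opening tag. `strip_h2_attributes` is a hypothetical name.

```python
import re


def strip_h2_attributes(html: str) -> str:
    """Replace any opening h2 tag, with or without attributes, by a plain <h2>."""
    # [^>]* consumes everything up to the first closing angle bracket.
    return re.sub(r"<h2[^>]*>", "<h2>", html, flags=re.IGNORECASE)


plain = strip_h2_attributes('<h2 id="the-use-case">The Use Case</h2>')
print(plain)
```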
When you upload images in WordPress, it always places them in a specific directory, followed by a four-digit year and a two-digit month, before the name of the actual file. Since the location is always predictable, a search and replace can be used to set this up. Before you do the text cleaning, make sure the year and month in the cleaner match when you uploaded the pictures for the article.
This time, place a Find and Replace Text block in the middle section with the first text field containing <img src=" and the second text field containing <img src="http://cdn.tutsplus.com/mac/uploads/2013/10/. This finds the start of each image tag and puts the proper web path in front of the file name.
There is one problem with this implementation: the year and month have to be updated whenever they change for your articles. Since TextSoap does not have a dynamic memory system, this has to be done manually each time.
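For comparison, this is how the same replacement might look in a script where the year and month are passed in instead of being typed into the cleaner. It is a sketch outside TextSoap: `upload_prefix` and `fix_image_paths` are hypothetical names, and only the base URL comes from the article.

```python
from datetime import date

# Base URL taken from the article; everything after it is built from the date.
BASE = "http://cdn.tutsplus.com/mac/uploads"


def upload_prefix(uploaded: date) -> str:
    """Build the year/month directory prefix, e.g. .../2013/10/."""
    return f"{BASE}/{uploaded.year:04d}/{uploaded.month:02d}/"


def fix_image_paths(html: str, uploaded: date) -> str:
    """Put the upload prefix in front of every image file name."""
    # Plain text replacement, like the Find and Replace Text block.
    return html.replace('<img src="', '<img src="' + upload_prefix(uploaded))


fixed = fix_image_paths('<img src="main.png" alt="Main screen">', date(2013, 10, 1))
print(fixed)
```

Like the TextSoap block, this prefixes every image tag it finds, so it assumes the Markdown source only uses bare file names for images.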
When you have an anchor tag that downloads something from the media area of WordPress, you will need to correct those addresses as well. Because this search needs a regular expression, add a Regex Search and Replace Text block with the first text area containing \<a href\=\"([^h][^t][^t][^p].*)\" and the second text area containing <a href="http://cdn.tutsplus.com/mac/uploads/2013/10/$1". The capture group holds the whole original address, so the complete file name is carried over after the new path.
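A Python sketch of the same link fix follows. It swaps the four negated character classes for a negative lookahead, `(?!http)`, which is a substitution made for this example and not the article's pattern; whether TextSoap's regex engine accepts lookaheads is an assumption that would need checking. The lookahead skips only addresses that really start with http, whereas the character classes also skip a name like notes.pdf, whose third letter is a t. `fix_download_links` and the sample links are made up.

```python
import re

PREFIX = "http://cdn.tutsplus.com/mac/uploads/2013/10/"

# (?!http) rejects addresses that already start with "http";
# [^"]* stops at the closing quote, so only one href is captured per match.
RELATIVE_HREF = re.compile(r'<a href="(?!http)([^"]*)"', re.IGNORECASE)


def fix_download_links(html: str) -> str:
    """Prefix relative download links with the upload directory."""
    return RELATIVE_HREF.sub(
        lambda m: f'<a href="{PREFIX}{m.group(1)}"', html
    )


links = '<a href="notes.zip">Notes</a> <a href="http://example.com">Site</a>'
fixed_links = fix_download_links(links)
print(fixed_links)
```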
The last step is to title-case the text of every header. Select an If Text Matches block and drag it to the center as above. In the text box, place this regular expression string: \<h2.*\>(.*)\<\/h2\>. This will match every <h2></h2> tag set and pass it to the next block. It will do this for every line that matches. Set the Match capture group: to $1. That will send just the text inside the tags to the cleaner(s) inside the block and put the results back in between the header tags. That is a lot of work made easy! Remember to set the options to ignore case.
Next, grab a Title Case with Options block and drag it to the middle in between the If Text Matches block and the end conditional block. Since the default list of words to keep small is the same as the web app's, nothing needs to be added. Best of all, it is smart enough to make sure the first word is always capitalized, no matter what word it is. If another word needs to be lower case, it can be added to the large text box under Default: a,…. If you want some acronyms to be left unchanged, you can place them in the second large text box under Default: AT&T…. I have added HTML CSS PHP because those should always be completely uppercase.
The Treat: vs vs. v v. as small words option needs to be checked.
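The conditional block plus title casing can be sketched in Python as a replacement that only touches the text between the header tags. This is a simplified illustration, not TextSoap's algorithm: `SMALL_WORDS` is a short made-up list (the real default list is not reproduced here), `ACRONYMS` holds the three from the article, and the helper names are hypothetical.

```python
import re

# Illustrative small-word list only; TextSoap's default list is longer.
SMALL_WORDS = {"a", "an", "and", "the", "of", "to", "in", "vs", "vs.", "v", "v."}
ACRONYMS = {"HTML", "CSS", "PHP"}  # the acronyms added in the article

# Group 1: opening tag, group 2: header text, group 3: closing tag.
H2 = re.compile(r"(<h2[^>]*>)(.*?)(</h2>)", re.IGNORECASE)


def title_case(text: str) -> str:
    """Very simplified title casing with small words and acronyms."""
    out = []
    for i, word in enumerate(text.split()):
        if word.upper() in ACRONYMS:
            out.append(word.upper())          # acronyms stay fully uppercase
        elif i > 0 and word.lower() in SMALL_WORDS:
            out.append(word.lower())          # small words, except the first word
        else:
            out.append(word[:1].upper() + word[1:].lower())
    return " ".join(out)


def title_case_headers(html: str) -> str:
    """Title-case only the text inside h2 tags, leaving the tags as they are."""
    return H2.sub(lambda m: m.group(1) + title_case(m.group(2)) + m.group(3), html)


headed = title_case_headers("<h2>the power of html and css</h2>")
print(headed)
```

Text outside the header tags never reaches `title_case`, which is the role the If Text Matches block plays in the cleaner.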
Now you have a TextSoap cleaner that will take any Markdown text and convert it to HTML that can be pasted directly into WordPress for publication. No more adjustments needed. It's now easy enough to work only in Markdown and paste it into WordPress just to publish.
For example, I wrote this article in Sublime Text using Markdown, copied the text to the clipboard, used my TextSoap Alfred workflow to run this cleaner to convert the Markdown to HTML in the clipboard, pasted the results into WordPress, and uploaded my pictures. It is that simple!
A word of caution: paste your HTML into the Text tab of the WordPress editor!
Have you created any unique TextSoap cleaners? Let me know in the comments!