Converting Markdown to HTML in PHP

Jan 12, 2015

Table of contents:

What are we going to build?
Choosing the right Markdown parser
Writing the Domain Service Interface
Writing the Implementation
Bootstrapping the CommonMark package
Rendering Markdown as HTML
Auto-linking @mentions
Extracting @mentions
Conclusion

Markdown support is a common feature of new web applications aimed at a particular crowd. Markdown is a lightweight syntax for adding formatting to plain text that can then be converted to HTML.

A lot of big websites like GitHub, Stack Overflow and Reddit use variations of the original Markdown syntax to provide their users with the ability to format their text.

Cribbb will most definitely be aimed at the early adopter technology crowd, so I want to be able to offer Markdown support.

I also want to be able to extract @mentions to send user notifications as well as auto-link user profiles.

In today’s tutorial we’re going to look at how to create a service that parses Markdown and renders it as HTML, and how to extract @mentions from a body of text.

What are we going to build?

Before I jump into the code, first I will explain what we’re going to be building today to solve our problem.

When a user creates a new post in Cribbb, the body of the post will be in Markdown. We need a way to:

Convert Markdown to HTML so we can save it in the database
Auto-link @mentions to user profiles
Extract the @mentions to an array so we can send notifications whenever a user is mentioned in a post

To do this we will need to write a new Domain Service that can accept the raw Markdown, convert it to HTML and extract the @mentions.

You might be thinking, “why is this a Domain Service and not just a method on the Post object?”.

The process of converting the raw Markdown to HTML is not part of the Ubiquitous Language of the application because it is simply a technical detail. Theoretically we could use an alternative to Markdown or an alternative to the Markdown parser. The Domain Object really doesn’t care about this process so we’re not stealing any logic away.

To read more about the distinction between Domain Service and Domain Objects, take a look at Creating Domain Services.

Choosing the right Markdown parser

When I’m required to write functionality for generic jobs like parsing Markdown, I’m very thankful that I work in an industry that advocates for Open Source software.

There’s absolutely no point in me writing my own Markdown parser as that would be a massive waste of my time. Instead I can turn to the beautiful world of Open Source software to select a package that I can just drop into my application.

Markdown parsing is one of those problems that people seem to love to solve. Within the PHP ecosystem, there is definitely not a shortage when it comes to dealing with Markdown.

However to narrow my search down, I’ll need a list of requirements that the package will have to cover:

Support for the majority of the standard Markdown specification
Can be extended to add additional functionality (I really like GitHub flavoured Markdown)
Can autolink @mentions

The Markdown parser I ended up choosing was thephpleague/commonmark. This package seems to be well maintained, covers my requirements and I doubt that it will suddenly just drop off the radar.

Writing the Domain Service Interface

So the first thing we need to do is to write the Domain Service Interface. This interface will define the methods that we require the Service to implement.

By relying on an interface within our Domain code, we protect ourselves from depending on any particular concrete implementation. This means we can swap out implementations arbitrarily without disrupting the inner domain.

Create a new namespace for Discussion under the existing Domain\Services namespace and create the following Parser.php file:

<?php namespace Cribbb\Domain\Services\Discussion;

interface Parser
{
    /**
     * Render a string of text
     *
     * @param string $text
     * @return string
     */
    public function render($text);

    /**
     * Extract the users from the text
     *
     * @param string $text
     * @return array
     */
    public function users($text);
}

This interface defines the two methods we’re going to need on this service.

The render() method accepts a string of text in a given format and should return a string of HTML.

The users() method should accept a string of text in a given format and return the usernames of the users that were @mentioned.
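Because domain code only depends on the Parser interface, it can be exercised without CommonMark at all. The sketch below is hypothetical: PlainTextParser and PostBody are illustrative names that are not part of Cribbb, the stub escapes HTML instead of rendering Markdown, and it uses a deliberately simplified mention rule. The interface is redeclared in the global namespace so the snippet runs on its own.

```php
<?php

// Copy of the Parser contract, redeclared so this sketch is self-contained.
interface Parser
{
    public function render($text);

    public function users($text);
}

// Hypothetical stand-in implementation: no Markdown, just escaped text in a paragraph.
class PlainTextParser implements Parser
{
    public function render($text)
    {
        return '<p>' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . '</p>';
    }

    public function users($text)
    {
        // Simplified rule for this sketch: "@" at the start or after whitespace.
        preg_match_all('/(?<!\S)@(\w+)/', $text, $matches);

        return $matches[1];
    }
}

// Hypothetical domain-side consumer that only knows about the interface.
class PostBody
{
    private $html;
    private $mentions;

    public function __construct(Parser $parser, $markdown)
    {
        $this->html = $parser->render($markdown);
        $this->mentions = $parser->users($markdown);
    }

    public function html()
    {
        return $this->html;
    }

    public function mentions()
    {
        return $this->mentions;
    }
}

$body = new PostBody(new PlainTextParser(), 'Thanks @philipbrown & @jack');

echo $body->html(), PHP_EOL;
print_r($body->mentions());
```

Swapping PlainTextParser for the CommonMark-backed implementation requires no change to PostBody, which is the point of typing against the interface.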

Writing the Implementation

With the interface in place, we can now write the implementation. As in previous tutorials, the actual implementation of this service will live in the Infrastructure namespace.

Create a new Discussion namespace under the existing Infrastructure\Services namespace and create a new file called CommonMarkParser.php:

<?php namespace Cribbb\Infrastructure\Services\Discussion;

use Cribbb\Domain\Services\Discussion\Parser;

class CommonMarkParser implements Parser
{
}

As you can see, the first thing we need to do is to write the class definition and ensure that it implements the Parser interface.

At this stage we can also create the test file for this class:

<?php namespace Cribbb\Tests\Infrastructure\Services\Discussion;

class CommonMarkParserTest extends \PHPUnit_Framework_TestCase
{
}

Bootstrapping the CommonMark package

Now that we have our Service class set up, we need to inject it with the CommonMark package so that we can actually parse and render the incoming Markdown.

Add the CommonMark package to your project by running the following command in the terminal:

$ composer require league/commonmark

This command will automatically update your composer.json and add the package as a dependency to your project.

Next we need to inject the package into our Service class. If we have a look at the documentation you will see that we can do this in one of two ways.

I’m going to inject the DocParser, Environment and HtmlRenderer objects individually because I want to extend the package with my own functionality:

<?php namespace Cribbb\Infrastructure\Services\Discussion;

use League\CommonMark\DocParser;
use League\CommonMark\Environment;
use League\CommonMark\HtmlRenderer;
use Cribbb\Domain\Services\Discussion\Parser;

class CommonMarkParser implements Parser
{
    /**
     * @var Environment
     */
    private $environment;

    /**
     * @var DocParser
     */
    private $parser;

    /**
     * @var HtmlRenderer
     */
    private $renderer;

    /**
     * Create a new CommonMarkParser
     *
     * @param Environment $environment
     * @param DocParser $parser
     * @param HtmlRenderer $renderer
     * @return void
     */
    public function __construct(
        Environment $environment,
        DocParser $parser,
        HtmlRenderer $renderer
    ) {
        $this->environment = $environment;
        $this->parser = $parser;
        $this->renderer = $renderer;
    }
}

I can also use the setUp() method of the test file to set up the class before each test:

<?php namespace Cribbb\Tests\Infrastructure\Services\Discussion;

use League\CommonMark\DocParser;
use League\CommonMark\Environment;
use League\CommonMark\HtmlRenderer;
use Cribbb\Infrastructure\Services\Discussion\CommonMarkParser;

class CommonMarkParserTest extends \PHPUnit_Framework_TestCase
{
    /** @var CommonMarkParser */
    private $service;

    public function setUp()
    {
        $environment = Environment::createCommonMarkEnvironment();
        $parser = new DocParser($environment);
        $renderer = new HtmlRenderer($environment);
        $this->service = new CommonMarkParser($environment, $parser, $renderer);
    }
}

Rendering Markdown as HTML

The first method we will look at will be to render Markdown as HTML. Fortunately the CommonMark package deals with all of this complexity for us, so we can simply pass the string of text in and then return the output:

/**
 * Render a string of text
 *
 * @param string $text
 * @return string
 */
public function render($text)
{
    $document = $this->parser->parse($text);

    return $this->renderer->renderBlock($document);
}

We can test this by passing in a sample chunk of Markdown and ensuring that the correct HTML is returned:

/** @test */
public function should_render_markdown_as_html()
{
    $text = $this->service->render('Cribbb is **awesome**');

    $this->assertEquals('<p>Cribbb is <strong>awesome</strong></p>', rtrim($text));
}

We don’t need to test that the CommonMark package can convert every type of Markdown input because that is really not our concern.

Auto-linking @mentions

The next bit of functionality we need to implement is the ability to auto-link @mentions. Unfortunately the CommonMark package does not have this functionality out-of-the-box.

To illustrate what I’m trying to achieve, we can write the test first:

/** @test */
public function should_render_mention()
{
    $text = $this->service->render('You should follow @philipbrown!');

    $this->assertEquals(
        '<p>You should follow <a href="https://cribbb.com/philipbrown">@philipbrown</a>!</p>',
        rtrim($text)
    );
}

As you can see, when a string of text contains @philipbrown, it should be rendered as a link.

Fortunately we have to do very little to get this functionality working as the good people who created the CommonMark package have built their package around extensibility.

We can very easily include our own inline parsing routine to pick out @mentions and convert them to HTML links. In fact, if we take a look at the documentation, we can see that an example has already been provided for us!

I’m going to create a new namespace for the CommonMark extensions that I’m going to be using to keep them separate from the main Service class. Here is the MentionParser.php extension:

<?php namespace Cribbb\Infrastructure\Services\Discussion\CommonMarkExtensions;

use League\CommonMark\ContextInterface;
use League\CommonMark\InlineParserContext;
use League\CommonMark\Inline\Element\Link;
use League\CommonMark\Inline\Parser\AbstractInlineParser;

class MentionParser extends AbstractInlineParser
{
    /**
     * @return array
     */
    public function getCharacters()
    {
        return ["@"];
    }

    /**
     * Parse @mentions from a string of text
     * https://github.com/thephpleague/commonmark/blob/gh-pages/customization/inline-parsing.md#example
     *
     * @param ContextInterface $context
     * @param InlineParserContext $inlineContext
     * @return bool
     */
    public function parse(
        ContextInterface $context,
        InlineParserContext $inlineContext
    ) {
        $cursor = $inlineContext->getCursor();

        $previousChar = $cursor->peek(-1);

        if ($previousChar !== null && $previousChar !== " ") {
            return false;
        }

        $previousState = $cursor->saveState();

        $cursor->advance();

        $handle = $cursor->match("/^\w+/");

        if (empty($handle)) {
            $cursor->restoreState($previousState);

            return false;
        }

        $profileUrl = "https://cribbb.com/" . $handle;

        $inlineContext->getInlines()->add(new Link($profileUrl, "@" . $handle));

        return true;
    }
}

This is essentially copied and pasted directly from the documentation. You will notice that I’ve hardcoded the https://cribbb.com/ base URL. This will be updated once I’ve actually written the routes of the application.
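To see the rule that MentionParser applies without booting CommonMark, here is a hypothetical plain-string sketch. The linkMentions() function is not part of the CommonMark API; it mirrors the "@ at the start or after a space, then \w+" check and takes the base URL as a parameter, which is one possible way to remove the hardcoded URL later. Unlike the real extension it works on raw text, so it knows nothing about Markdown structure such as code spans.

```php
<?php

/**
 * Hypothetical helper: link @mentions in a plain string using the same rule
 * as MentionParser ("@" at the start of the text or after a space, then \w+).
 */
function linkMentions($text, $baseUrl = 'https://cribbb.com/')
{
    // (?<![^ ]) means "not preceded by a character other than a space",
    // i.e. the start of the string or a space, like the peek(-1) check.
    return preg_replace_callback('/(?<![^ ])@(\w+)/', function ($matches) use ($baseUrl) {
        $handle = $matches[1];

        return '<a href="' . $baseUrl . $handle . '">@' . $handle . '</a>';
    }, $text);
}

echo linkMentions('You should follow @philipbrown!'), PHP_EOL;
```

An email address such as philip@example.com is left alone, because the character before the @ is not a space.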

In the test file we can update the setUp method to include the extension:

<?php namespace Cribbb\Tests\Infrastructure\Services\Discussion;

use League\CommonMark\DocParser;
use League\CommonMark\Environment;
use League\CommonMark\HtmlRenderer;
use Cribbb\Infrastructure\Services\Discussion\CommonMarkParser;
use Cribbb\Infrastructure\Services\Discussion\CommonMarkExtensions\MentionParser;

class CommonMarkParserTest extends \PHPUnit_Framework_TestCase
{
    /** @var CommonMarkParser */
    private $service;

    public function setUp()
    {
        $environment = Environment::createCommonMarkEnvironment();
        $environment->addInlineParser(new MentionParser());
        $parser = new DocParser($environment);
        $renderer = new HtmlRenderer($environment);
        $this->service = new CommonMarkParser($environment, $parser, $renderer);
    }
}

Now if you run the previous test again, you should see that the service will automatically link @mentions!

/** @test */
public function should_render_mention()
{
    $text = $this->service->render('You should follow @philipbrown!');

    $this->assertEquals(
    '<p>You should follow <a href="https://cribbb.com/philipbrown">@philipbrown</a>!</p>',
    rtrim($text));
}

Extracting @mentions

Finally I want to be able to extract the @mentions from a post so I can send a notification to the user to inform them that they were mentioned.

The test for this bit of functionality looks like this:

/** @test */
public function should_extract_users()
{
    $users = $this->service->users('You should follow @philipbrown and @jack');

    $this->assertEquals(['philipbrown', 'jack'], $users);
}

The users() method should return an array of usernames without the @ (because we need to search the database for the username without the @).

To implement this functionality we can use a regex that will search the text and return the correct result:

/**
 * Extract the users from the text
 *
 * @param string $text
 * @return array
 */
public function users($text)
{
    $pattern = '/(^|[^a-z0-9_])[@＠]([a-z0-9_]{1,20})([@＠\xC0-\xD6\xD8-\xF6\xF8-\xFF]?)/iu';

    preg_match_all($pattern, $text, $result);

    return $result[2];
}

I hate trying to write regex patterns and so I turned once again to the beautiful world of Open Source to find this example.
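A few properties of this regex are easy to miss, so the sketch below wraps the same pattern in a standalone function to check them. The name extractUsers() is hypothetical, and the array_unique() step is an addition that is not in the users() method above, so that a user mentioned twice would only be notified once. Note that the pattern is looser than MentionParser about the preceding character: "(@jack)" is extracted here but would not be linked by the renderer. Handles longer than 20 characters are also cut to their first 20 characters rather than rejected.

```php
<?php

/**
 * Hypothetical standalone version of users() with de-duplication added.
 */
function extractUsers($text)
{
    // Same pattern as users(): "@" (or fullwidth "＠") not preceded by a letter,
    // digit or underscore, followed by 1 to 20 letters, digits or underscores.
    $pattern = '/(^|[^a-z0-9_])[@＠]([a-z0-9_]{1,20})([@＠\xC0-\xD6\xD8-\xF6\xF8-\xFF]?)/iu';

    preg_match_all($pattern, $text, $result);

    // Group 2 holds the username without the "@"; drop repeats and reindex.
    return array_values(array_unique($result[2]));
}

print_r(extractUsers('You should follow @philipbrown and @jack, right @jack?'));
```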

Now if you run all the tests that we’ve written today, you should see them all pass green!

Conclusion

Markdown is a really great way of allowing your users to format their text whilst making it easy for you to convert it to HTML to be rendered back. I really love how widespread Markdown is becoming as a standard across websites and applications.

In consumer web applications, @mentions have become table stakes: a default feature of any type of application that involves interaction between users.

In today’s tutorial we looked at creating a Domain Service for parsing text into the appropriate format. When dealing with a generic service that is not important to the Domain, it’s usually better to implement it as a Domain Service rather than adding it to a Domain Object.

We also saw how it is good practice to always write an interface when creating a Domain Service. Even if you only ever plan to have a single implementation it’s important to insulate your code to prevent it being coupled to a certain implementation.

The interface protects the domain first and foremost. The ability to swap the implementation is really just a side benefit.

As we progress in the development of Cribbb we will more than likely return to processing posts and adding Markdown extensions. But for now we’ve got the foundation in place to deal with basic formatting and rendering input to HTML.

This is a series of posts on building an entire Open Source application called Cribbb. All of the tutorials will be free on the web, and all of the code is available on GitHub.