Migrating from GoDaddy/IIS/WCF/C#/MySQL to DigitalOcean/Apache/Flask/Python/SQLite

In a span of a few days I have moved this site off of GoDaddy/IIS/WCF/C#/MySQL and onto DigitalOcean/Apache/Flask/Python/SQLite to gain better performance, consolidate all my websites, and to give a good refactoring.

Reasons

This site is essentially a collection of static pages, and so should be blazingly fast under all circumstances. However, the site never seemed as fast as it should be. I decided to investigate. Numbers were fairly easy to come by thanks to Google Webmaster Tools, which reported average page load was 436ms with a range from 61ms to 854ms. Static sites should not take this long! For background, this site is hosted on GoDaddy specifically as a website, so it should be able to pump through requests like nothing. The webserver is located on the west coast, and since I’m in Michigan, a response takes longer than it would if the server was located on the east coast. Thus, from a testing standpoint it is paramount to take into account this consideration. Fiddler allows this functionality by breaking the request timeline into a series of steps, and the ones that turned out to be particularly interesting were ServerGotRequest and ServerBeginResponse. With this functionality I saw that a request to the home page has a delay of 200ms between the time the server got the response and when the server started processing the response. For comparison, I have a website hosted on DigitalOcean and it has a delay of 30ms! Since the speedup was nearly an order of magnitude in difference that reason is good enough for me to switch.

Another reason is cost saving. If I were to stop my GoDaddy site hosting, I would be saving six dollars a month and I would consolidate all my sites onto a single server.

I also feel more in control. I know that this might not be necessarily true in actuality, as I’m sure DigitalOcean has the right to swoop in and confiscate all my data. However, I like the feeling that I have a shell and sudo access on this box. With GoDaddy I could interact with the webserver through FTP or a mediocre web interface. If something didn’t work out, I felt pretty powerless. Now it is trivial to debug and make changes. A great example of this is when I was debugging my application and why some of the urls weren’t working. Since I am completely new to apache, I heavily relied on logs to let me know what was going wrong. Thankfully, I have a local apache webserver, so I was able to test out any changes before pushing them to production, but there were still several issues that were uncovered once the site was pushed to production. Being able to investigate the error logs helped to fix bugs that slipped through testing.

What’s Migrating

Migrating from MySQL to SQLite with database changes.

Migrating from MySQL to SQLite was only done because SQLite is baked into python and I didn’t feel like installing drivers for MySQL. Besides, I will freely admit that I am not the biggest fan of how I laid out my database as far as naming and column order is concerned. Thus I wanted to preserve all the current data, but rename the columns.

I first downloaded all my data from mysql and stored it in tab delimited files.

echo "SELECT * FROM Comments" | mysql -B -h xxx -u xxx -p db-name > comments.tsv
echo "SELECT * FROM Content" | mysql -B -h xxx -u xxx -p db-name > content.tsv
echo "SELECT * FROM User" | mysql -B -h xxx -u xxx -p db-name > users.tsv

Now the data needs to be put into the SQlite database. If the data is small then it may be possible to manually craft SQL statements, but for any data of significant size, an automated route would be prepared. My first attempt was to use several unix utilities to create the necessary sql statements.

# Since the tabbed delimited data has headers, start at the second line.
tail -n +2 users.tsv |\
awk -F '\t' '{print "INSERT INTO users(userId, name) VALUES("$1",\""$2"\")";}'

It should be immediately obvious that this is not secure and is slow. The insecurity comes from the fact that if a user contained any special characters there could be drastic side effects. SQL injection is not something to be taken lightly. The performance would also suffer because each insert would mean a transaction, so SQLite would write to disk for each inserted record. It is much better to group related inserts into a single transaction.

The best solution is to take advantage of SQlite’s .import function, which will import data from a file into a specified table. This function will take care of renaming of columns, but I also needed to reorder the comments table, which awk -F '\t' '{print $1 "\t" $2 "\t" $3 "\t" $5 "\t" $4;}' took care of nicely. I then dropped the headers from the data files used the following script to load all the data.

.separator '\t'
.import nusers.tsv users
.import ncontent.tsv content
.import ncomments.tsv comments

For reference, for those that want to know what my database schema looks like I have copied it down below. It is definitely debatable whether or not this is optimal.

CREATE TABLE users (
    userId INTEGER PRIMARY KEY AUTOINCREMENT,
    name VARCHAR(100) NOT NULL,
    UNIQUE(name)
);

CREATE TABLE content (
    contentId INTEGER PRIMARY KEY AUTOINCREMENT,
    name VARCHAR(100) NOT NULL,
    UNIQUE(name)
);

CREATE TABLE comments (
    comemntId INTEGER PRIMARY KEY AUTOINCREMENT,
    userId INTEGER NOT NULL,
    contentId INTEGER NOT NULL,
    timestamp DATETIME NOT NULL DEFAULT (datetime('now')),
    comment TEXT NOT NULL,
    FOREIGN KEY(userId) REFERENCES users(userId)
    FOREIGN KEY(contentId) REFERENCES content(contentId)
);

I also opted to convert my stored procedures to prepared statement in code, for no other reason then it makes more semantic sense to me. The stored procedures were simply SELECT and INSERT statements.

Migrating from WCF to Flask

When I first started this site I had WCF handle serving the comments via JSON. I chose WCF because I had to use it for work and it was the only technology that I was familiar with at the time that could implement this simple REST requirement. Fast forward a couple years and it is safe to say that I would never use WCF for a web app on the .NET platform instead of Nancy or ASP.NET Web API. But then why use Flask and make the switch from C# to python? The biggest reason is familiarity. I have already deployed several Flask applications behind Apache and the results have been impressive, so I’m going to continue the trend. And not that it is a huge issue, but I’m not entirely comfortable developing on Mono. There are some discrepancies between Microsoft’s .NET and Mono’s, and I’m overly paranoid that I would hit a snag, as I prefer developing C# on a Windows machine, whereas I prefer python development on linux.

Probably the hardest part of the migration was re-evaluating how the site’s architecture. The WCF implementation would serve the comments after the static content via a request from the client. The benefits to this approach is that comments and the content are completely decoupled. If, for some reason, the WCF application went down, the static content would still be served and the site would be functional. The consequence to this approach is slightly longer time for the page to be completely loaded as there has to be two round trips made to California, one to get the content, and the other to get the comments. This was especially noticeable with the GoDaddy hosting. Contrast the previous architecture with implementing content as templates. Instead of being static, content now have template placeholders that are rendered on the serverside. In total time, this is the quicker method as only a single request is made from the client to get the content and comments. However, there is a possibility that this site might attacked by 10,000 simultaneous requests, which would overload the commenting system and thus the content wouldn’t get served until the database caught up, leaving the site effectively useless. Having content and comments decoupled also allows clients to cache the content, so on their next visit, the site loads even quicker. After much deliberation, I decided to keep the same architecture because comments are not an integral part of the content.

Migrating from IIS to Apache

Honestly, this was the part that I was most worried about. I had a 130 line Web.config file for IIS and I had to port it over to apache. I’m unfamiliar with both technologies so I had to understand what the IIS config was accomplishing and try to find an equivalent mechanism. These are the requirements in a webserver that makes this website run smoothly:

The combination of Google, debugging, and perseverance resulted in a 30 line configuration, and that includes comments and blank lines! It’s so concise that I’ve decided to paste the configuration in its entirety.

<VirtualHost *:80>
    # Redirect the www subdomain to the actual canonical domain
    ServerName www.nbsoftsolutions.com
    Redirect permanent / http://nbsoftsolutions.com/
</VirtualHost>

<VirtualHost *:80>
    ServerName nbsoftsolutions.com
    ServerAlias 192.241.253.241
    DocumentRoot /var/www/nbsoftsolutions
    ErrorDocument 404 /WhereArtThouContent.html
    LogLevel info

    # Route everything with the API request to the flask application
    WSGIScriptAlias /api /usr/local/www/wsgi-scripts/nbsoftsolutions.wsgi
    <Directory /usr/local/www/wsgi-scripts>
        Order allow,deny
        Allow from all
    </Directory>

    # If a url comes in and there exists an html file with the same name,
    # serve that html file
    RewriteEngine On
    RewriteCond /var/www/nbsoftsolutions/$1.html -f
    RewriteRule (.*) $1.html

    # remap example.com to example.com/home
    RedirectMatch ^/$ /home
</VirtualHost>

Migrating from plain text comments to Markdown

I’m a huge fan of markdown, using it repeatedly more and utilizing it for significant projects. I took this migration as the perfect opportunity to migrate away from how comments worked. Even thought the comments worked fine, they were somewhat limited as they were restricted to plain text. The result was that all links had to be pasted in their entirety, and maybe more importantly there is no context formatting as in markdown. A good example of this is a code snippet. A code snippet looks wrong in this font and style. Markdown can provide this alternative.

The amount of code needed to accomplish was quite minimal. The tedious part was making sure that the backend and frontend markdown renders converged to the same format. The reason why there are two renderers is that the frontend is an approximation what the backend will render. If the backend accepted arbitrary rendered markdown from the frontend, it would be easy for someone to inject javascript into all visiting browsers. Thus the backend doesn’t trust the frontend and so re-renders the mark up before storing it in the database.

For the record, I ended going with marked and highlight.js on the frontend and misaka and pygments on the backend with Houdini.py providing the escaping.

Results

10/10. Would migrate from GoDaddy to DigitalOcean again.

Page latency: Old vs New Solution

Page latency: Old vs New Solution

Comments

If you'd like to leave a comment, please email [email protected]