Security Musings


Microsoft Word Considered Harmful to HTML


Stephen Northcutt, Brian Corcoran, and Sean Carolan


Summary:
For years Stephen Northcutt and other content providers at the SANS Institute had written their content in Microsoft Word and sent the .doc files to the webmasters for posting. Since Microsoft Word had a translate-to-HTML function, surely it was not a big deal to convert from Word to Web. Or was it? It turns out that using a WYSIWYG HTML editor can save your organization time and money.


Learning the truth about converting from Word to Web

As one of the notes was being posted, Sean Carolan wrote with the following information:
"Would it be possible to convince our authors to use the nvu editor to create content? The process of converting a Word or PDF document to valid xHTML is a bit time-consuming, whereas if we used a WYSIWYG editor such as Nvu, articles could be posted almost immediately with very little time to clean them up. nvu has all the formatting such as bold, italic, center, lists, etc. that MS Word has, but in the end the document will require no extra work to convert for web. Stephen, if you are open to this idea I am willing to work with you and whoever else will be creating content for the site on how to use nvu to create content. I find it actually *easier* than MS Word because it's a lot less cluttered. See the attached sample document - it's completely ready to go for web publishing, and only took a few minutes to write up."[1]

Go Ahead and Try It

The SANS Institute has standards like any other organization, but since we work in the field of information security, you know, the industry with the zero day vulnerabilities, we have to be able to adapt, improvise and overcome[2] adversity at the speed our industry requires. We have implemented an organizational policy called a GAATI, for Go Ahead and Try It.


A GAATI works like this:

First, validate you have a real problem that is worth solving
Second, quickly research potential solutions
Third, select a solution that appears likely to solve the problem
Fourth, test fly the solution
Fifth, document the results of the test

In order to maintain an innovation program like GAATI over the long term in any organization, it is important for everyone involved to realize that failure is as valuable as success. Failing forward gives you a list of things that do not work, and that is valuable. However, knowing what does work is far more valuable. So we can use this scenario to demonstrate a GAATI in action.


First, validate you have a real problem that is worth solving

We typed "problems converting microsoft word to html" into Google and 1,050,000 documents were returned. There are a number of utilities available to do the translation, but we have tried these in the past and they certainly have their issues. With over 1 million documents on the subject, it is fairly clear there is a valid problem worth solving.

Second, quickly research potential solutions

We typed "wysiwyg html editors" into Google and, in addition to three AdWords ads at the top of the page, the right-hand side was also full of ads. Clearly there is a market for HTML editors, and this also validates that there is a real problem. The solution Sean recommended, nvu, was number nine on Google. Many times the solution you want will be in the top three, so we followed up with searches like "wysiwyg html editors experience" and "wysiwyg html editors great". It quickly became clear that one popular source of information was the website webdesign.about.com.

They had a page on Linux/Unix which listed nvu first:
http://webdesign.about.com/od/htmleditors/tp/aatpwyslinux.htm

And also one on Windows, which listed nvu eighth:
http://webdesign.about.com/od/htmleditors/tp/aatpwyswindows.htm

Since nvu was free, we decided to give it a shot.

Third, select a solution that appears likely to solve the problem

Brian Corcoran is in charge of web tools and standards, so he took a quick look. He replied by email: "I had not heard of this but based on a quick look, I'd agree with Sean and the code it makes is 1000% better than the mess Microsoft Word creates, and I think this is a good idea."[3] So everyone agreed it was worth a test.

Fourth, test fly the solution

Testing has two major aspects: whether the tool works well with our technical environment and whether it is usable by content providers like Stephen Northcutt. Brian and the web team did the technical review. Some issues were found, but they were configuration issues, not problems with the nvu tool. "Users would have to be sure to go into preferences and set it to use XHTML, not HTML, and be sure it is set to Transitional, not Strict. In preferences we would also want them to set the Special characters option, under Advanced, to use the &#... notation for all special characters. We would also want to be sure that the users are using the paragraph designations and not just using enter everywhere, which should be easy enough to do."[3]
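To make the last two configuration points concrete, here is a minimal Python sketch of what they amount to in the markup. It is not part of nvu or of the SANS workflow. It assumes that the "&#..." notation in Brian's email means numeric character references (for example &#233; for an accented e), and the function names `to_numeric_entities` and `wrap_paragraphs` are hypothetical, chosen only for illustration.

```python
import html


def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference.

    Assumption: this is what the "&#..." notation in the email refers to.
    """
    # "xmlcharrefreplace" is a standard Python codec error handler that
    # emits &#NNN; for any character the target encoding cannot represent.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def wrap_paragraphs(text: str) -> str:
    """Wrap each blank-line-separated block of plain text in <p>...</p>.

    Illustrates "paragraph designations" as opposed to bare line breaks.
    """
    blocks = [block.strip() for block in text.split("\n\n") if block.strip()]
    paragraphs = []
    for block in blocks:
        # Escape &, < and > first so the text is safe inside markup,
        # then convert any remaining non-ASCII characters.
        escaped = html.escape(block, quote=False)
        paragraphs.append(f"<p>{to_numeric_entities(escaped)}</p>")
    return "\n".join(paragraphs)


if __name__ == "__main__":
    sample = "Caf\u00e9 prices & hours\n\nOpen daily"
    print(wrap_paragraphs(sample))
```

Run as a script, this prints two `<p>` elements, with the accented character written as a numeric reference and the ampersand escaped. An editor configured as described above would presumably produce comparable output without any manual step.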

In terms of usability, Stephen Northcutt was able to download the tool and immediately start using it. He struggled a bit with the paragraph issue mentioned above, and years of using Microsoft Word's automatic spelling correction had weakened his ability to spell words, but with time both of those issues should pass.

Fifth, document the results of the test

And of course, that is exactly what an internal copy of this note is intended to do. The bottom line: if you are producing web content for your organization and you are using Microsoft Word, take some time and talk with your web team. Switching to a WYSIWYG HTML editor like nvu may save time and money. Nvu's web site is located at:
http://www.nvu.com/

  1. Email December 20 from Sean Carolan to Stephen Northcutt
  2. US Marine Poster, circa 1960
  3. Email December 20 from Brian Corcoran to Stephen Northcutt