Detecting Malicious Microsoft Office Macro Documents

For the past few months I have been looking into macro-enabled Office documents, and during that time I have detected hundreds of malicious documents. This post highlights what to look out for, which may help if you are deciding whether to notify on or quarantine mail in your environment. I’ve also done a quick analysis of a Word 2010 formatted document I received last week.

So what are Macros?
Macros are a series of commands that can be run automatically to perform a task. Macro code is written in a programming language known as Visual Basic for Applications (VBA) and is embedded in Office documents. Macros can be used maliciously to drop malware, download malware, etc. Malicious macros usually arrive in Word documents or Excel spreadsheets; other formats do exist, though I have never encountered them. Once a malicious document is opened, only a single click is required for the macro code to run.

Automating Macros
Visual Basic has reserved names for launching code when documents are opened. These names are the key to detecting possibly malicious code. They are sometimes used for legitimate purposes, but generally we should consider them dangerous. For Word the reserved names that could be used maliciously are AutoOpen() and Document_Open(), and for Excel the reserved names are Auto_Open() and Workbook_Open(). These days malicious documents are using AutoOpen() and Auto_Open(), but Document_Open() and Workbook_Open() could also be used.
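
As a rough sketch of how a mail filter might look for these names, the Python below scans raw file bytes for the four reserved names. The function name find_auto_run_names is hypothetical, and the case-insensitive match is an assumption based on VBA identifiers not being case-sensitive. A raw scan like this only makes sense where the names sit uncompressed in the file; for zipped formats the embedded macro project would have to be extracted first.

```python
import re

# Reserved auto-run names mentioned in the post. Matching is case-insensitive
# (an assumption: VBA identifiers are not case-sensitive).
AUTO_RUN_NAMES = ("AutoOpen", "Auto_Open", "Document_Open", "Workbook_Open")


def find_auto_run_names(data: bytes) -> list:
    """Return the reserved names that appear anywhere in the raw bytes."""
    found = []
    for name in AUTO_RUN_NAMES:
        pattern = re.escape(name.encode("ascii"))
        if re.search(pattern, data, re.IGNORECASE):
            found.append(name)
    return found
```

A hit is only a reason to look closer, since these names are also used for legitimate purposes.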

Below is an example of a Word document where the AutoOpen() subroutine is set in Modules-NewMacros

The macros could also be added in the “ThisDocument” section, in which case the NewMacros section is not really required

Similarly, in Excel the subroutine Workbook_Open() would be in the “ThisWorkbook” section and the Module1 section is not required

What to look for
Below is a table of the kind of strings to search for based on the extension and file format.

Format       Reserved Names   Embedded in      Extensions
Word 2003    AutoOpen         n/a              doc dot*
Excel 2003   Auto_Open        n/a              xls xlt
Word 2010    AutoOpen         vbaProject.bin   docm dotm* doc (renamed)
Excel 2010   Auto_Open        vbaProject.bin   xls xlsb xltm

*Only applies when using Document_Open name.

Word 2003 also supports saving macro-enabled documents as XML files, which are able to run on Word 2010. XML files can also be renamed to a doc extension. The macro code in XML is stored as base64, and the string to search for is w:macrosPresent="yes"
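
A minimal sketch of that check in Python follows. The function name xml_has_macros is hypothetical, and tolerating single quotes or whitespace around the equals sign is my assumption about how such files might vary, not something I have observed.

```python
import re

# The attribute named in the post. Accepting single quotes and optional
# whitespace around "=" is an assumption, not an observed variation.
MACROS_PRESENT = re.compile(
    rb'w:macrosPresent\s*=\s*["\']yes["\']', re.IGNORECASE
)


def xml_has_macros(data: bytes) -> bool:
    """True if the raw bytes contain the w:macrosPresent="yes" attribute."""
    return MACROS_PRESENT.search(data) is not None
```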

The Office 2010 format is not a binary format like that of Office 2003 documents. Office 2010 documents use the Office Open XML (OOXML) format, which was introduced with Microsoft Office 2007. Office Open XML is a zipped, XML-based file format, so the string “vbaProject.bin” needs to be searched for in the initial file. The reserved subroutine names will be found within this vbaProject.bin file.
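
Instead of a raw string search, the same check could be done by opening the file as a ZIP archive and looking at the member names. This is a sketch using Python's standard zipfile module; the function name ooxml_has_vba_project is hypothetical.

```python
import io
import zipfile


def ooxml_has_vba_project(data: bytes) -> bool:
    """True if the bytes are a ZIP archive with a vbaProject.bin member."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return False
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as archive:
        # The member usually sits in a subfolder, so match on the file name.
        return any(
            name.lower().endswith("vbaproject.bin")
            for name in archive.namelist()
        )
```

Because this looks at the content and not the extension, it would also catch a docm that has been renamed to doc.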

A couple of months ago a new kind of macro-based document was seen in the wild. These were web page formatted documents saved as MHT files (Single File Web Page) and then renamed to doc. Strings you could search for are MIME-Version, Content-Location and x-mso. I have not seen the xls extension being used in the wild, most likely because it adds another warning when opened.
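
A quick way to test for those three strings in Python might look like this. The function name looks_like_mht_macro_doc is hypothetical, and requiring all three markers at once is my assumption; a looser rule may suit some environments better.

```python
# Markers listed in the post, lowercased for a case-insensitive comparison.
MHT_MARKERS = (b"mime-version", b"content-location", b"x-mso")


def looks_like_mht_macro_doc(data: bytes) -> bool:
    """True if all three MHT-related markers appear in the raw bytes."""
    lowered = data.lower()
    # Assumption: require every marker to be present to reduce false hits.
    return all(marker in lowered for marker in MHT_MARKERS)
```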

When macro-based documents are saved as HTML files (Web Page), the file extension can be renamed from html to doc or xls. The editdata.mso file is a zlib-compressed file which contains the macros. The mso file could be called anything, so detection should not depend on this name. If the mso file were dropped by some other means, the macro document contents would look like this:

<link rel=Edit-Time-Data href="C:/Temp/editdata.mso">
<body>Will open Windows Calculator to test macros</body>

If the mso file were to be downloaded remotely, an extra warning would be given.

<link rel=Edit-Time-Data href="">
<body>Will open Windows Calculator to test macros</body>
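
Since the mso file name cannot be relied on, a scanner could key on the Edit-Time-Data link itself and pull out whatever it points to. The following is a sketch only: the function name edit_time_data_hrefs is hypothetical, and the regular expressions assume the href value is double-quoted as in the snippets above.

```python
import re

# Matches a <link> tag whose rel attribute is Edit-Time-Data.
EDIT_TIME_LINK = re.compile(
    rb'<link\b[^>]*\brel\s*=\s*["\']?Edit-Time-Data["\']?[^>]*>',
    re.IGNORECASE,
)
# Assumption: the href value is double-quoted, as in the snippets above.
HREF = re.compile(rb'href\s*=\s*"([^"]*)"', re.IGNORECASE)


def edit_time_data_hrefs(data: bytes) -> list:
    """Return the href value of every Edit-Time-Data link in the bytes."""
    hrefs = []
    for link in EDIT_TIME_LINK.finditer(data):
        match = HREF.search(link.group(0))
        if match:
            hrefs.append(match.group(1))
    return hrefs
```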


Malicious Word 2010 Document “email_message.doc” Analysis
I had never detected an Office 2010 formatted document until now. Pretty much every document I see is in Word 2003 format. Below is some quick analysis I did to highlight its unusual properties.

The “email_message.doc” I detected last week was sent with a doc extension, whereas Office 2010 macro-enabled Word documents take a docm extension by default. Once this particular malicious document has been opened you’ll see this content

Looking into the macros we see a technique used to obfuscate the code that I have not seen before (as far as I know). In the “NewMacros” section the code can clearly be seen dropping the payload and then executing it.

We also see pretty much the same code in the “ThisDocument” section.

The line of code of real importance is

dll = Base64Decode(UserForm1.TextBox1)

Here it reads the base64-encoded string from UserForm1.TextBox1 and decodes it before writing it to disk and executing it.
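
If you pull the text out of the form control by hand, the decoding step can be reproduced safely in Python without running the macro. This is a sketch under assumptions: the function names are hypothetical, and the "MZ" check assumes the payload is a Windows executable or DLL, which the variable name dll hints at but does not prove.

```python
import base64
import binascii
from typing import Optional


def decode_payload(encoded_text: str) -> Optional[bytes]:
    """Decode base64 text, ignoring whitespace; return None if it is invalid."""
    compact = "".join(encoded_text.split())
    try:
        return base64.b64decode(compact, validate=True)
    except (binascii.Error, ValueError):
        return None


def looks_like_pe(payload: bytes) -> bool:
    """Assumption: a dropped Windows binary starts with the 'MZ' signature."""
    return payload[:2] == b"MZ"
```

For example, decode_payload(text) followed by looks_like_pe(...) on the result gives a quick indication of whether the text box is hiding a binary.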

Even though the same macro code is in both the “ThisDocument” and “NewMacros” sections, the code in “NewMacros” will not work because it uses the reserved subroutine name “Document_Open”, which only works when used in the “ThisDocument” section.

The final part of the macro code in the malicious document runs a subroutine, ClearDocPasteText(“”), which clears the document contents, so you end up viewing a blank document.

When I uploaded the Word document to VirusTotal yesterday it was detected by 33/55 engines, and the dropped binary file by 38/55.

Finally, some strings in the binary stand out which suggest this malware spams out emails.

00027EB1   0042A2B1      0   MailAddr
00027EBE   0042A2BE      0
00027ED2   0042A2D2      0   SendInBackgr
00027EE0   0042A2E0      0   MailAsSmtpServer
00027EF2   0042A2F2      0   MailAsSmtpClient
00027F04   0042A304      0   UploadViaHttp 
00027F13   0042A313      0   MailViaMapi 
00027F20   0042A320      0   MailViaMailto
00027F2F   0042A32F      0   SmtpServer
00027F3F   0042A33F      0   SmtpPort
00027F4D   0042A34D      0   SmtpAccount
00027F5E   0042A35E      0   SmtpPassword
00027F70   0042A370      0   HttpServer
00027F7F   0042A37F      0
00027F9B   0042A39B      0   HttpPort
00027FA9   0042A3A9      0   HttpAccount
00027FBA   0042A3BA      0   HttpPassword
00027FCC   0042A3CC      0   AttachBugRep 
00027FDA   0042A3DA      0   AttachBugRepFile 
00027FEC   0042A3EC      0   DelBugRepFile 
00027FFB   0042A3FB      0   BugRepSendAs
0002800C   0042A40C      0   bugreport.txt BugRepZip
00028029   0042A429      0   ScrShotDepth
0002803B   0042A43B      0   ScrShotAppOnly 
0002804B   0042A44B      0   ScrShotSendAs
0002805D   0042A45D      0   screenshot.png
0002806C   0042A46C      0   ScrShotZip
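
A listing like the one above can be approximated with a few lines of Python that pull printable ASCII runs out of a binary together with their file offsets. This is a sketch: the function name ascii_strings and the minimum length of 6 are my own choices, and it reports file offsets only, not the memory addresses shown in the second column.

```python
import re


def ascii_strings(data: bytes, min_length: int = 6):
    """Yield (file_offset, text) for each run of printable ASCII characters."""
    # Printable ASCII is the byte range 0x20 to 0x7e inclusive.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")
```

Filtering the results for words such as Smtp, Mail or Http would surface the entries that stood out here.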



  1. Man, you haven’t even begun scratching the surface of the horrendous crap that is the world of Office macros…

    While macros are most often used in Word and Excel, just about every Office application supports them. PowerPoint, Access, Visio, you name it…

    Stuff like Document_Open MUST be in the ThisDocument module – it won’t work outside of it. Regular stuff like AutoOpen can be anywhere.

    While AutoOpen and Document_Open have the advantage that they are automatically executed when the document is opened, there is a HUGE number of “auto” macro names that can be executed in various “convenient” circumstances. Names like AutoClose and Document_Close execute when the document is closed, that’s obvious. But there is much, much more. “Auto” macros can be executed when a new document is created (AutoNew, Document_New). Macros can be executed when specific actions are performed – e.g., a macro named FileClose will execute when the document is closed via File/Close from the menu – but not when the document is closed by clicking on the [x] button. You can “intercept” ANY action invoked via the menus like this – and many of the actions invoked via any of the buttons.

    And don’t even let me get started on the awful mess that is the internal representation of the macros…

    You see, an Office macro module has THREE different “code” areas, each containing the complete functionality of the macro and each being “executable” under different conditions.

    If a macro has ever been executed (or even started to be executed by single-stepping into it with the debugger), its code is compiled into “execodes” streams. If you look at the OLE2 stream tree, these streams have the name “__Exe”, followed by digits (e.g., “__Exe0”, “__Exe1”, etc.). This is a kind of “compiled and linked p-code” (the exact format is not known to me) that can be executed directly by the Office application. This part of the code is run ONLY if these streams exist AND are created by the exact same version of the Office application as the one that is opening them. In practice this happens rarely (e.g., only on your own machine and only with macros you’ve already used).

    Normally, the macro text you enter into the VBA Editor is compiled into some sort of p-code (like Java bytecode, but different, of course). Unlike the execodes, it resides in the stream named after the module in the OLE2 file (e.g., “ThisDocument”). Interestingly, what the VBA Editor displays is the de-compiled p-code. There is a one-to-one relationship – modifying a line of code with the VBA Editor results in it being immediately compiled into p-code and modifying the p-code would result in different source being displayed by the VBA Editor.

    Usually, the p-code is what is being executed (unless proper execodes are present, as explained above, but this is rare) UNLESS you open the document with an Office application that has a different major version of VBA than the one that has created the document (e.g., VBA6 vs VBA5). Then some different shit happens.

    You see, the OLE2 stream containing the macro module also contains the source of that module – compressed trivially and attached to the end of the stream. Normally, this source code is not used – not even by the VBA Editor. But, if you open the document with an Office application that sports a different major VBA version from the one that has created the document, the source code is used to re-compile the p-code and then the new p-code is executed.

    This is because when changing VBA versions Microsoft, in their infinite wisdom, inserted a bunch of new p-code opcodes in the middle of the existing ones (instead of at the end), so the opcodes of a large number of p-code instructions suddenly changed. The re-compilation is necessary in order to achieve VBA portability across major VBA versions.

    So, in practice, you can have 3 different programs in the same macro, each being executed under different circumstances. (You can’t achieve this from the Office application; you’d have to doctor the OLE2 image containing the macros.)

    Oh, and speaking of Excel, not sure about the latest versions, but the previous ones also support a completely different sort of macros, called Excel Formula macros. The support for running them continued even when the support for creating them was dropped (in Excel97).

    For some other macro stuff (somewhat trivial and somewhat biased toward viruses rather than malware in general), take a look at some of my papers:
