Skip to content

Ianhelle/tar slip did protection 2026 04 27#889

Open
ianhelle wants to merge 6 commits intomainfrom
ianhelle/tar-slip-did-protection-2026-04-27
Open

Ianhelle/tar slip did protection 2026 04 27#889
ianhelle wants to merge 6 commits intomainfrom
ianhelle/tar-slip-did-protection-2026-04-27

Conversation

@ianhelle
Copy link
Copy Markdown
Contributor

Some defense-in-depth fixes for handling zip/tar files.
For msticpy all of the files used are secured on https endpoints (trusted github repos - Azure-Sentinel and OTRF - and Maxmind) so there is no opportunity for an attacker to inset malicious payloads.
However, the code change to validate target paths for extracted files is minimal so have do that here.

Also updating some pickle files used in tests to updated numpy/pandas version to avoid deprecation warning.

Add safe archive extraction utilities that validate member paths
before extraction to prevent path traversal attacks (tar-slip/zip-slip).

- Add msticpy/common/archive_utils.py with validate_archive_member_path(),
  safe_tar_extract(), and safe_zip_extract() functions
- Reject absolute paths, parent directory references (..), and resolved
  paths that escape the destination directory
- Reject symlinks/hardlinks pointing outside the destination (defense-in-depth)
- Patch geoip.py _extract_to_folder() to use safe_tar_extract()
- Patch mordor_driver.py _extract_zip_file_to_df() to use safe_zip_extract()
- Patch sentinel_query_reader.py download to use safe_zip_extract()
- Add 14 unit tests covering valid paths, traversal rejection, symlink
  validation, and both tar/zip variants
Re-serialized all 22 .pkl files with current pandas/numpy versions.
Old pickles stored dtype align=0 (int), NumPy 2.4+ expects align=False
(bool), causing VisibleDeprecationWarning in 5 test files.

The one file that could not be re-saved (tests/testdata/ip_entities.pkl)
references a removed module (msticpy.nbtools.entityschema) and is not
used by any current tests.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds defense-in-depth protections against zip/tar path traversal (“zip slip” / “tar slip”) by introducing centralized safe extraction helpers and updating existing download/extraction code paths to use them. It also refreshes a set of pickled pandas test (and notebook) artifacts to match newer numpy/pandas versions and reduce deprecation warnings.

Changes:

  • Added msticpy.common.archive_utils with path validation plus safe_zip_extract/safe_tar_extract.
  • Updated Sentinel GitHub query download and Mordor zip extraction to use safe zip member extraction.
  • Updated GeoIP (MaxMind) tar.gz extraction to use safe tar member extraction and refreshed various .pkl fixtures.

Reviewed changes

Copilot reviewed 5 out of 27 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
msticpy/common/archive_utils.py New safe archive extraction utilities used across the codebase.
msticpy/data/drivers/sentinel_query_reader.py Uses safe zip extraction when unpacking the Azure-Sentinel repo archive.
msticpy/data/drivers/mordor_driver.py Uses safe zip extraction when unpacking Mordor dataset archives.
msticpy/context/geoip.py Uses safe tar extraction when unpacking the GeoLite database archive.
tests/common/test_archive_utils.py New unit tests covering safe extraction/path validation behaviors.
tests/testdata/sent_incidents.pkl Refreshed pandas pickle test fixture.
tests/testdata/mordor/mitre_tact_cache.pkl Refreshed pandas pickle cache fixture.
tests/testdata/localdata/processes_on_host.pkl Refreshed pandas pickle fixture.
tests/testdata/localdata/process_tree.pkl Refreshed pandas pickle fixture.
tests/testdata/localdata/host_logons.pkl Refreshed pandas pickle fixture.
tests/testdata/localdata/failed_logons.pkl Refreshed pandas pickle fixture.
tests/testdata/localdata/az_whois.df.pkl Refreshed pandas pickle fixture.
tests/testdata/localdata/az_net_comms_df.pkl Refreshed pandas pickle fixture.
tests/testdata/localdata/alerts_list.pkl Refreshed pandas pickle fixture.
tests/testdata/azmondata/query_response.pkl Refreshed pickle fixture for Azure Monitor query response tests.
docs/notebooks/data/sent_incidents.pkl Refreshed notebook data pickle.
docs/notebooks/data/ip_entities.pkl Refreshed notebook data pickle.

Comment thread msticpy/common/archive_utils.py Outdated
Comment thread msticpy/common/archive_utils.py Outdated
Comment thread msticpy/data/drivers/sentinel_query_reader.py Outdated
- Replace startswith string checks with Path.is_relative_to() for
  robust containment validation (handles root destinations correctly)
- Restrict tar member types to regular files and directories; reject
  device files, FIFOs, and other special types
- Add tarfile filter='data' on Python >= 3.12 to avoid restoring
  unsafe metadata/permissions
- Fix unclosed ZipFile resource leak in sentinel_query_reader.py
  by wrapping extraction loop in a context manager
- Add test for device file member rejection
@ianhelle ianhelle requested a review from FlorianBracq April 27, 2026 20:33
implicit bytes-to-str conversion.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants