Open
Conversation
Add safe archive extraction utilities that validate member paths before extraction to prevent path traversal attacks (tar-slip/zip-slip). - Add msticpy/common/archive_utils.py with validate_archive_member_path(), safe_tar_extract(), and safe_zip_extract() functions - Reject absolute paths, parent directory references (..), and resolved paths that escape the destination directory - Reject symlinks/hardlinks pointing outside the destination (defense-in-depth) - Patch geoip.py _extract_to_folder() to use safe_tar_extract() - Patch mordor_driver.py _extract_zip_file_to_df() to use safe_zip_extract() - Patch sentinel_query_reader.py download to use safe_zip_extract() - Add 14 unit tests covering valid paths, traversal rejection, symlink validation, and both tar/zip variants
Re-serialized all 22 .pkl files with current pandas/numpy versions. Old pickles stored dtype align=0 (int), NumPy 2.4+ expects align=False (bool), causing VisibleDeprecationWarning in 5 test files. The one file that could not be re-saved (tests/testdata/ip_entities.pkl) references a removed module (msticpy.nbtools.entityschema) and is not used by any current tests.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR adds defense-in-depth protections against zip/tar path traversal (“zip slip” / “tar slip”) by introducing centralized safe extraction helpers and updating existing download/extraction code paths to use them. It also refreshes a set of pickled pandas test (and notebook) artifacts to match newer numpy/pandas versions and reduce deprecation warnings.
Changes:
- Added
msticpy.common.archive_utilswith path validation plussafe_zip_extract/safe_tar_extract. - Updated Sentinel GitHub query download and Mordor zip extraction to use safe zip member extraction.
- Updated GeoIP (MaxMind) tar.gz extraction to use safe tar member extraction and refreshed various
.pklfixtures.
Reviewed changes
Copilot reviewed 5 out of 27 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
msticpy/common/archive_utils.py |
New safe archive extraction utilities used across the codebase. |
msticpy/data/drivers/sentinel_query_reader.py |
Uses safe zip extraction when unpacking the Azure-Sentinel repo archive. |
msticpy/data/drivers/mordor_driver.py |
Uses safe zip extraction when unpacking Mordor dataset archives. |
msticpy/context/geoip.py |
Uses safe tar extraction when unpacking the GeoLite database archive. |
tests/common/test_archive_utils.py |
New unit tests covering safe extraction/path validation behaviors. |
tests/testdata/sent_incidents.pkl |
Refreshed pandas pickle test fixture. |
tests/testdata/mordor/mitre_tact_cache.pkl |
Refreshed pandas pickle cache fixture. |
tests/testdata/localdata/processes_on_host.pkl |
Refreshed pandas pickle fixture. |
tests/testdata/localdata/process_tree.pkl |
Refreshed pandas pickle fixture. |
tests/testdata/localdata/host_logons.pkl |
Refreshed pandas pickle fixture. |
tests/testdata/localdata/failed_logons.pkl |
Refreshed pandas pickle fixture. |
tests/testdata/localdata/az_whois.df.pkl |
Refreshed pandas pickle fixture. |
tests/testdata/localdata/az_net_comms_df.pkl |
Refreshed pandas pickle fixture. |
tests/testdata/localdata/alerts_list.pkl |
Refreshed pandas pickle fixture. |
tests/testdata/azmondata/query_response.pkl |
Refreshed pickle fixture for Azure Monitor query response tests. |
docs/notebooks/data/sent_incidents.pkl |
Refreshed notebook data pickle. |
docs/notebooks/data/ip_entities.pkl |
Refreshed notebook data pickle. |
- Replace startswith string checks with Path.is_relative_to() for robust containment validation (handles root destinations correctly) - Restrict tar member types to regular files and directories; reject device files, FIFOs, and other special types - Add tarfile filter='data' on Python >= 3.12 to avoid restoring unsafe metadata/permissions - Fix unclosed ZipFile resource leak in sentinel_query_reader.py by wrapping extraction loop in a context manager - Add test for device file member rejection
implicit bytes-to-str conversion.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Some defense-in-depth fixes for handling zip/tar files.
For msticpy all of the files used are secured on https endpoints (trusted github repos - Azure-Sentinel and OTRF - and Maxmind) so there is no opportunity for an attacker to inset malicious payloads.
However, the code change to validate target paths for extracted files is minimal so have do that here.
Also updating some pickle files used in tests to updated numpy/pandas version to avoid deprecation warning.