[Core] CI step added to check external github url#9831
[Core] CI step added to check external github url#9831msarfraz wants to merge 3 commits intoAzure:mainfrom
Conversation
️✔️Azure CLI Extensions Breaking Change Test
|
|
Hi @msarfraz, |
|
Thank you for your contribution! We will review the pull request and get back to you soon. |
|
The git hooks are available for azure-cli and azure-cli-extensions repos. They could help you run required checks before creating the PR. Please sync the latest code with latest dev branch (for azure-cli) or main branch (for azure-cli-extensions). pip install azdev --upgrade
azdev setup -c <your azure-cli repo path> -r <your azure-cli-extensions repo path>
|
|
There was a problem hiding this comment.
Pull request overview
Adds a CI guardrail to prevent introducing new raw.githubusercontent.com dependencies (problematic in network-isolated environments) by scanning PR diffs for forbidden external GitHub raw URLs and maintaining an exclusions list for known/intentional occurrences.
Changes:
- Added
scripts/ci/validate_external_source_urls.pyto scan added diff lines for forbiddenraw.githubusercontent.comURLs. - Added
scripts/ci/external_url_exclusions.jsonto configure path-based exclusions for the validator. - Added a new
CheckExternalUrlsAzure Pipelines job to run the validator in PR builds.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| scripts/ci/validate_external_source_urls.py | New diff-based validator that fails CI when forbidden raw GitHub URLs are added. |
| scripts/ci/external_url_exclusions.json | Exclusions configuration consumed by the validator to skip docs/tests/etc. |
| azure-pipelines.yml | New pipeline job to execute the validator during CI. |
| "scripts/*" | ||
| ], | ||
| "_justification": "CI and tooling scripts are maintained separately from extension source content and are excluded to avoid self-flagging the validator configuration." |
There was a problem hiding this comment.
The exclusion rule "scripts/*" effectively disables validation for all scripts (including CI tooling). That makes it easy for new raw.githubusercontent.com usages to be introduced in scripts without being flagged. Consider narrowing this exclusion to only the specific files that need to self-contain the checker (e.g., the validator script/config), rather than excluding the entire scripts/ tree.
| "scripts/*" | |
| ], | |
| "_justification": "CI and tooling scripts are maintained separately from extension source content and are excluded to avoid self-flagging the validator configuration." | |
| "scripts/ci/external_url_exclusions.json" | |
| ], | |
| "_justification": "The external URL validator configuration is excluded so it can self-contain the allowlist and justification text without self-flagging." |
| def _run_diff(src: str, tgt: str, cached: bool = False) -> str: | ||
| cmd = ["git", "diff", "--unified=0", "--no-color"] | ||
| if cached: | ||
| cmd.append("--cached") | ||
| else: | ||
| cmd.append(f"{tgt}...{src}") | ||
|
|
||
| proc = subprocess.run( | ||
| cmd, | ||
| stdout=subprocess.PIPE, | ||
| stderr=subprocess.PIPE, | ||
| text=True, | ||
| check=False, | ||
| ) | ||
| if proc.returncode != 0: | ||
| raise RuntimeError(proc.stderr.strip() or "git diff failed") | ||
| return proc.stdout |
There was a problem hiding this comment.
git diff {tgt}...{src} relies on computing a merge-base. In CI with shallow fetches, the merge-base often isn't present and this form can fail (or produce an incomplete diff). Either fetch sufficient history/unshallow so the merge-base is available, or switch to a diff form that doesn't require merge-base (e.g., tgt..src) depending on the intended semantics.
| - job: CheckExternalUrls | ||
| displayName: "Check External Source URLs" | ||
| pool: |
There was a problem hiding this comment.
This job runs for non-PR builds as well, and then exits early in the script. To avoid consuming an extra agent/job on every branch build, add a job-level condition similar to other PR-only jobs (e.g. eq(variables['Build.Reason'], 'PullRequest')) instead of relying on an in-script check.
| # If CI is set to shallow fetch, target branch should be explicitly fetched. | ||
| # External URL exclusions are maintained in scripts/ci/external_url_exclusions.json. | ||
| git fetch origin --depth=1 $(System.PullRequest.TargetBranch) |
There was a problem hiding this comment.
The pipeline fetches the target branch with --depth=1, but the validator uses a git diff form that needs merge-base history. With shallow history this can cause the validation step to fail unexpectedly. Consider increasing fetch depth (or unshallow) enough to compute the merge base, or adjust the validator to avoid relying on merge-base diffs.
| # If CI is set to shallow fetch, target branch should be explicitly fetched. | |
| # External URL exclusions are maintained in scripts/ci/external_url_exclusions.json. | |
| git fetch origin --depth=1 $(System.PullRequest.TargetBranch) | |
| # If CI is set to shallow fetch, ensure enough history is available | |
| # for validators that may need to compute a merge base. | |
| if [[ "$(git rev-parse --is-shallow-repository)" == "true" ]]; then | |
| git fetch origin --unshallow | |
| fi | |
| # External URL exclusions are maintained in scripts/ci/external_url_exclusions.json. | |
| git fetch origin "$(System.PullRequest.TargetBranch):refs/remotes/origin/$(System.PullRequest.TargetBranch)" |
| parser = argparse.ArgumentParser(description="Check diff for forbidden raw github URL usage.") | ||
| parser.add_argument("--src", default="HEAD", help="Source ref/commit for git diff.") | ||
| parser.add_argument("--tgt", default="HEAD~1", help="Target ref/commit for git diff.") | ||
| parser.add_argument("--cached", action="store_true", help="Check staged changes in git index.") | ||
| args = parser.parse_args() | ||
|
|
||
| try: | ||
| _get_excluded_path_patterns() | ||
| diff_text = _run_diff(src=args.src, tgt=args.tgt, cached=args.cached) | ||
| except Exception as ex: # pylint: disable=broad-except | ||
| if args.cached: | ||
| print(f"Unable to evaluate staged diff: {ex}", file=sys.stderr) | ||
| else: | ||
| print(f"Unable to evaluate diff between '{args.tgt}' and '{args.src}': {ex}", file=sys.stderr) | ||
| return 1 | ||
|
|
||
| violations = _find_violations(diff_text) | ||
| if not violations: | ||
| print("No forbidden external github URL found in added lines.") | ||
| return 0 |
There was a problem hiding this comment.
User-facing output and argparse description use inconsistent capitalization ("github" vs "GitHub"). Since this is a CI validation message, it’s worth standardizing to "GitHub" throughout for clarity/professionalism.
| - job: CheckExternalUrls | ||
| displayName: "Check External Source URLs" |
There was a problem hiding this comment.
PR description mentions migrating VM image alias URLs from raw.githubusercontent.com to Azure Blob Storage, but the changes in this PR appear to only add the CI validator + exclusions/pipeline job. Either update the PR description to match the actual scope, or include the alias migration changes here if they’re intended to be part of this PR.
CodeGen Tools Feedback CollectionThank you for using our CodeGen tool. We value your feedback, and we would like to know how we can improve our product. Please take a few minutes to fill our codegen survey |
This reverts commit cdc1547.
Description
This PR removes the dependency on GitHub (raw.githubusercontent.com) VM image aliases, replacing it with Azure Blob Storage (azcliprod.blob.core.windows.net). This change enables Azure CLI to work properly in network isolated environments where GitHub access is blocked.
In addition, new validation added in CI pipeline to flag if any raw.githubusercontent.com URL is used in the code.
Background
In enterprise environments with strict network isolation policies, access to raw.githubusercontent.com is not allowed.
Changes
VM Image Alias Migration
Before:
https://raw.githubusercontent.com/Azure/azure-rest-api-specs/master/arm-compute/quickstart-templates/aliases.json
After:
https://azcliprod.blob.core.windows.net/cli/vm/aliases_master.json
Before:
https://raw.githubusercontent.com/Azure/azure-rest-api-specs/main/arm-compute/quickstart-templates/aliases.json
After:
https://azcliprod.blob.core.windows.net/cli/vm/aliases.json
This checklist is used to make sure that common guidelines for a pull request are followed.
Related command
General Guidelines
azdev style <YOUR_EXT>locally? (pip install azdevrequired)python scripts/ci/test_index.py -qlocally? (pip install wheel==0.30.0required)For new extensions:
About Extension Publish
There is a pipeline to automatically build, upload and publish extension wheels.
Once your pull request is merged into main branch, a new pull request will be created to update
src/index.jsonautomatically.You only need to update the version information in file setup.py and historical information in file HISTORY.rst in your PR but do not modify
src/index.json.