Large-Scale Reconnaissance Methodology¶
Overview¶
Large-scale reconnaissance is essential for analyzing enterprise targets with extensive digital infrastructure. This methodology covers systematic approaches to discovering and analyzing attack surfaces spanning hundreds or thousands of assets.
Pre-Reconnaissance Planning¶
1. Scope Definition¶
# Define primary targets and scope boundaries
echo "Primary Target: nba.com" > scope.txt
echo "Include: *.nba.com, *.wnba.com, *.gleague.nba.com" >> scope.txt
echo "Exclude: Social media accounts, third-party CDNs" >> scope.txt
2. Resource Planning¶
- Time Allocation: 2-4 hours for initial enumeration
- Storage Requirements: 50-100MB for output files per 1000 subdomains
- Compute Resources: Parallel processing capabilities
- Rate Limiting: Respect target infrastructure limits
3. Tool Preparation¶
# Ensure all reconnaissance tools are available and updated
export PATH=$PATH:/home/pierce/go/bin
subfinder -version
assetfinder --help
httpx -version
nuclei -version
Phase 1: Subdomain Discovery¶
1. Multi-Source Enumeration¶
# Primary subdomain discovery with subfinder
subfinder -d nba.com -all -recursive -o nba_subdomains_subfinder.txt
# Supplementary discovery with assetfinder
assetfinder --subs-only nba.com > nba_subdomains_assetfinder.txt
# Combine and deduplicate results
cat nba_subdomains_*.txt | sort -u > nba_all_subdomains.txt
2. Advanced Discovery Techniques¶
# Certificate transparency logs
curl -s "https://crt.sh/?q=%.nba.com&output=json" | jq -r '.[].name_value' | sed 's/\*\.//g' | sort -u > nba_crt_subdomains.txt
# DNS bruteforcing for additional discovery
amass enum -passive -d nba.com -o nba_amass_subdomains.txt
# Merge all sources
cat nba_*_subdomains.txt | sort -u > nba_final_subdomains.txt
3. Statistics and Analysis¶
# Generate discovery statistics
echo "Total subdomains discovered: $(wc -l < nba_final_subdomains.txt)"
echo "Subfinder results: $(wc -l < nba_subdomains_subfinder.txt)"
echo "Assetfinder results: $(wc -l < nba_subdomains_assetfinder.txt)"
echo "Certificate transparency: $(wc -l < nba_crt_subdomains.txt)"
Phase 2: Service Discovery and Fingerprinting¶
1. HTTP Service Discovery¶
# Discover live web services
httpx -l nba_final_subdomains.txt -o nba_live_services.txt
# Enhanced fingerprinting with additional data
httpx -l nba_final_subdomains.txt \
-title -tech-detect -status-code -content-length \
-o nba_web_services.json -json
2. Service Categorization¶
# Extract different service types from httpx output
jq -r 'select(.tech != null) | "\(.url) - \(.tech[])"' nba_web_services.json > nba_technology_stack.txt
# Categorize by response codes
jq -r 'select(.status_code == 200) | .url' nba_web_services.json > nba_200_services.txt
jq -r 'select(.status_code == 403) | .url' nba_web_services.json > nba_403_services.txt
jq -r 'select(.status_code == 404) | .url' nba_web_services.json > nba_404_services.txt
3. Pattern Analysis¶
# Identify service patterns
grep -E "(api|dev|staging|test|admin)" nba_live_services.txt > nba_interesting_services.txt
grep -E "(cms|portal|dashboard|admin)" nba_live_services.txt > nba_admin_services.txt
grep -E "(cdn|static|assets|media)" nba_live_services.txt > nba_cdn_services.txt
Phase 3: Detailed Analysis¶
1. Infrastructure Mapping¶
# Network analysis
nmap -sn --top-ports 1000 -iL nba_ip_addresses.txt > nba_network_scan.txt
# Service port scanning for critical assets
nmap -sV -sC --top-ports 1000 $(head -20 nba_critical_ips.txt) > nba_detailed_scan.txt
2. Technology Stack Analysis¶
# Extract and analyze technology patterns
grep -i "nginx\|apache\|cloudflare\|akamai" nba_technology_stack.txt | sort | uniq -c
grep -i "react\|angular\|vue\|wordpress\|drupal" nba_technology_stack.txt | sort | uniq -c
3. Geographic Distribution¶
# Analyze IP geolocation (if needed)
while read ip; do
curl -s "http://ip-api.com/json/$ip" | jq -r '"\(.query) - \(.country) - \(.org)"'
done < nba_unique_ips.txt > nba_geolocation.txt
Phase 4: Vulnerability Discovery¶
1. Automated Vulnerability Scanning¶
# Comprehensive nuclei scan
nuclei -l nba_live_services.txt \
-tags exposure,config,debug,files,logs \
-severity medium,high,critical \
-o nba_vulnerabilities.txt
# Specific scans for high-value targets
nuclei -l nba_admin_services.txt \
-tags auth-bypass,sqli,xss,ssrf \
-severity low,medium,high,critical \
-o nba_admin_vulnerabilities.txt
2. Manual Investigation Priorities¶
# Create prioritized target lists
echo "=== HIGH PRIORITY TARGETS ===" > nba_investigation_priorities.txt
grep -E "(admin|api|dev|staging)" nba_live_services.txt >> nba_investigation_priorities.txt
echo -e "\n=== TECHNOLOGY-SPECIFIC TARGETS ===" >> nba_investigation_priorities.txt
grep -E "(graphql|api|cms)" nba_technology_stack.txt >> nba_investigation_priorities.txt
echo -e "\n=== ERROR RESPONSES ===" >> nba_investigation_priorities.txt
head -10 nba_403_services.txt >> nba_investigation_priorities.txt
3. Client-Side Analysis¶
# Extract JavaScript files from live services
while read url; do
curl -s "$url" | grep -oE "https?://[^\"']*\.js" >> nba_js_files.txt
done < nba_200_services.txt
# Analyze JavaScript files for sensitive information
sort -u nba_js_files.txt | while read js_url; do
echo "Analyzing: $js_url"
curl -s "$js_url" | grep -iE "(api|key|secret|token|project.*id)" && echo "FOUND: $js_url"
done > nba_js_analysis.txt
NBA Case Study Results¶
Discovery Statistics¶
- Total Subdomains: 1,132 domains discovered
- Live Services: 779 active HTTP/HTTPS services
- Technology Diversity: 15+ different technology stacks identified
- Geographic Distribution: Primary US hosting with global CDN presence
Critical Findings Distribution¶
Service Category | Count | High-Value Targets
======================== | ===== | ==================
API Endpoints | 47 | 12 critical
Admin/CMS Interfaces | 23 | 8 restricted access
Development Environments | 15 | 7 potentially exposed
Static/CDN Services | 234 | 3 misconfigurations
Team-Specific Domains | 78 | 15 worthy investigation
Technology Stack Analysis¶
Technology | Prevalence | Security Implications
============== | ========== | ====================
Akamai CDN | 45% | Bot protection, caching
CloudFront | 23% | AWS integration
React.js | 18% | Client-side vulnerabilities
WordPress | 8% | Plugin vulnerabilities
Custom APIs | 12% | Business logic flaws
Automation and Scaling¶
1. Automated Pipeline¶
#!/bin/bash
# Large-scale reconnaissance automation script
TARGET_DOMAIN=$1
OUTPUT_DIR="recon_${TARGET_DOMAIN}_$(date +%Y%m%d)"
mkdir -p "$OUTPUT_DIR"
cd "$OUTPUT_DIR"
# Phase 1: Subdomain Discovery
echo "[+] Starting subdomain discovery..."
subfinder -d "$TARGET_DOMAIN" -all -o subdomains_subfinder.txt
assetfinder --subs-only "$TARGET_DOMAIN" > subdomains_assetfinder.txt
cat subdomains_*.txt | sort -u > all_subdomains.txt
# Phase 2: Service Discovery
echo "[+] Discovering live services..."
httpx -l all_subdomains.txt -o live_services.txt
httpx -l all_subdomains.txt -json -o web_services.json
# Phase 3: Vulnerability Scanning
echo "[+] Running vulnerability scans..."
nuclei -l live_services.txt -tags exposure,config -o vulnerabilities.txt
# Phase 4: Reporting
echo "[+] Generating summary report..."
echo "Reconnaissance Summary for $TARGET_DOMAIN" > summary.txt
echo "=======================================" >> summary.txt
echo "Total subdomains: $(wc -l < all_subdomains.txt)" >> summary.txt
echo "Live services: $(wc -l < live_services.txt)" >> summary.txt
echo "Vulnerabilities found: $(wc -l < vulnerabilities.txt)" >> summary.txt
echo "[+] Reconnaissance complete. Results in $OUTPUT_DIR"
2. Parallel Processing¶
# Split large subdomain lists for parallel processing
split -l 100 all_subdomains.txt subdomain_chunk_
# Process chunks in parallel
for chunk in subdomain_chunk_*; do
httpx -l "$chunk" -o "results_$(basename $chunk).txt" &
done
wait
# Combine results
cat results_subdomain_chunk_*.txt > all_results.txt
3. Resource Management¶
# Monitor resource usage during large scans
#!/bin/bash
while true; do
echo "$(date): CPU: $(top -bn1 | grep "Cpu(s)" | awk '{print $2}') Memory: $(free -h | grep Mem | awk '{print $3}')"
sleep 60
done > resource_usage.log &
# Rate limiting for respectful scanning
export HTTPX_RATE_LIMIT=50 # requests per second
export NUCLEI_RATE_LIMIT=30
Analysis and Reporting¶
1. Statistical Analysis¶
# Generate comprehensive statistics
python3 << EOF
import json
# Load httpx results
with open('web_services.json') as f:
services = [json.loads(line) for line in f]
# Analyze status codes
status_codes = {}
for service in services:
code = service.get('status_code', 'unknown')
status_codes[code] = status_codes.get(code, 0) + 1
print("Status Code Distribution:")
for code, count in sorted(status_codes.items()):
print(f" {code}: {count}")
# Analyze technologies
technologies = {}
for service in services:
tech_list = service.get('tech', [])
for tech in tech_list:
technologies[tech] = technologies.get(tech, 0) + 1
print("\nTop Technologies:")
for tech, count in sorted(technologies.items(), key=lambda x: x[1], reverse=True)[:10]:
print(f" {tech}: {count}")
EOF
2. Visualization¶
# Create visual representations of findings
python3 << EOF
import matplotlib.pyplot as plt
import json
# Load data and create charts
# (Implementation depends on specific visualization needs)
EOF
3. Report Generation¶
# Generate markdown report
cat << EOF > reconnaissance_report.md
# Large-Scale Reconnaissance Report
**Target**: $TARGET_DOMAIN
**Date**: $(date)
## Executive Summary
- **Total Subdomains**: $(wc -l < all_subdomains.txt)
- **Live Services**: $(wc -l < live_services.txt)
- **Vulnerabilities**: $(wc -l < vulnerabilities.txt)
## Key Findings
$(head -10 vulnerabilities.txt)
## Technology Stack
$(grep -i "nginx\|apache\|cloudflare" web_services.json | head -5)
## Recommendations
1. Address identified vulnerabilities
2. Review exposed development environments
3. Implement consistent security controls
EOF
Lessons Learned from NBA Analysis¶
1. Scale Considerations¶
- Large Attack Surface: 1,000+ subdomains require systematic approach
- Resource Management: Parallel processing essential for efficiency
- Rate Limiting: Respectful scanning prevents blocking
- Storage Planning: Significant disk space needed for comprehensive logs
2. Discovery Effectiveness¶
- Multi-Source Enumeration: Different tools find different assets
- Certificate Transparency: Valuable for historical subdomain discovery
- Technology Fingerprinting: Essential for prioritizing targets
- Pattern Recognition: Automated categorization saves time
3. Analysis Priorities¶
- Development Environments: Often have weaker security controls
- API Endpoints: High-value targets for business logic flaws
- Admin Interfaces: Critical for privilege escalation
- Third-Party Integrations: Expanded attack surface
4. Automation Benefits¶
- Consistency: Systematic approach prevents missed assets
- Scalability: Handles enterprise-level infrastructure
- Reproducibility: Can be repeated for monitoring changes
- Documentation: Automatic generation of comprehensive reports
Best Practices¶
1. Ethical Considerations¶
- Always respect target infrastructure limits
- Use appropriate rate limiting
- Focus on discovery, not exploitation
- Follow responsible disclosure practices
2. Technical Excellence¶
- Validate tool configurations before large scans
- Implement proper error handling
- Monitor resource usage during execution
- Maintain detailed logs for analysis
3. Continuous Improvement¶
- Regular tool updates and capability expansion
- Documentation of lessons learned
- Methodology refinement based on results
- Integration of new discovery techniques
Classification: Internal Methodology
Last Updated: September 2025
Validated Against: NBA enterprise infrastructure (1,132 subdomains)