Large-Scale Reconnaissance Methodology

Overview

Large-scale reconnaissance is essential for analyzing enterprise targets with extensive digital infrastructure. This methodology covers systematic approaches to discovering and analyzing attack surfaces spanning hundreds or thousands of assets.

Pre-Reconnaissance Planning

1. Scope Definition

# Define primary targets and scope boundaries
echo "Primary Target: nba.com" > scope.txt
echo "Include: *.nba.com, *.wnba.com, *.gleague.nba.com" >> scope.txt
echo "Exclude: Social media accounts, third-party CDNs" >> scope.txt

2. Resource Planning

  • Time Allocation: 2-4 hours for initial enumeration
  • Storage Requirements: 50-100MB for output files per 1000 subdomains
  • Compute Resources: Parallel processing capabilities
  • Rate Limiting: Respect target infrastructure limits
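
The storage figure above can be turned into a quick pre-run check. A minimal sketch, assuming the upper bound of the 50-100MB-per-1,000-subdomains estimate (~100 KB per subdomain); the expected count here is a planning assumption, not a measured value:

```shell
# Rough pre-run storage estimate. Assumes ~100 KB of output per subdomain
# (upper bound of the 50-100MB per 1,000 subdomains figure above).
expected=1200                          # assumed subdomain count for planning
need_mb=$(( expected * 100 / 1024 ))   # KB per host -> total MB
echo "Plan for roughly ${need_mb} MB of output"
```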

3. Tool Preparation

# Ensure all reconnaissance tools are available and updated
export PATH="$PATH:$HOME/go/bin"
subfinder -version
assetfinder --help
httpx -version
nuclei -version

Phase 1: Subdomain Discovery

1. Multi-Source Enumeration

# Primary subdomain discovery with subfinder
subfinder -d nba.com -all -recursive -o nba_subdomains_subfinder.txt

# Supplementary discovery with assetfinder
assetfinder --subs-only nba.com > nba_subdomains_assetfinder.txt

# Combine and deduplicate results
cat nba_subdomains_*.txt | sort -u > nba_all_subdomains.txt

2. Advanced Discovery Techniques

# Certificate transparency logs
curl -s "https://crt.sh/?q=%.nba.com&output=json" | jq -r '.[].name_value' | sed 's/\*\.//g' | sort -u > nba_crt_subdomains.txt

# Passive enumeration with amass for additional coverage
amass enum -passive -d nba.com -o nba_amass_subdomains.txt

# Merge all sources
cat nba_*_subdomains.txt | sort -u > nba_final_subdomains.txt

3. Statistics and Analysis

# Generate discovery statistics
echo "Total subdomains discovered: $(wc -l < nba_final_subdomains.txt)"
echo "Subfinder results: $(wc -l < nba_subdomains_subfinder.txt)"
echo "Assetfinder results: $(wc -l < nba_subdomains_assetfinder.txt)"
echo "Certificate transparency: $(wc -l < nba_crt_subdomains.txt)"

Phase 2: Service Discovery and Fingerprinting

1. HTTP Service Discovery

# Discover live web services
httpx -l nba_final_subdomains.txt -o nba_live_services.txt

# Enhanced fingerprinting with additional data
httpx -l nba_final_subdomains.txt \
    -title -tech-detect -status-code -content-length \
    -o nba_web_services.json -json

2. Service Categorization

# Extract different service types from httpx output
jq -r 'select(.tech != null) | "\(.url) - \(.tech[])"' nba_web_services.json > nba_technology_stack.txt

# Categorize by response codes
jq -r 'select(.status_code == 200) | .url' nba_web_services.json > nba_200_services.txt
jq -r 'select(.status_code == 403) | .url' nba_web_services.json > nba_403_services.txt
jq -r 'select(.status_code == 404) | .url' nba_web_services.json > nba_404_services.txt

3. Pattern Analysis

# Identify service patterns
grep -E "(api|dev|staging|test|admin)" nba_live_services.txt > nba_interesting_services.txt
grep -E "(cms|portal|dashboard|admin)" nba_live_services.txt > nba_admin_services.txt
grep -E "(cdn|static|assets|media)" nba_live_services.txt > nba_cdn_services.txt

Phase 3: Detailed Analysis

1. Infrastructure Mapping

# Host discovery across resolved IP addresses (-sn skips port scanning)
nmap -sn -iL nba_ip_addresses.txt > nba_network_scan.txt

# Version and script scanning for the top critical assets
nmap -sV -sC --top-ports 1000 -iL <(head -20 nba_critical_ips.txt) > nba_detailed_scan.txt

2. Technology Stack Analysis

# Extract and analyze technology patterns
grep -i "nginx\|apache\|cloudflare\|akamai" nba_technology_stack.txt | sort | uniq -c
grep -i "react\|angular\|vue\|wordpress\|drupal" nba_technology_stack.txt | sort | uniq -c

3. Geographic Distribution

# Analyze IP geolocation (ip-api.com's free tier allows roughly 45 requests/minute)
while read ip; do
    curl -s "http://ip-api.com/json/$ip" | jq -r '"\(.query) - \(.country) - \(.org)"'
    sleep 1.5  # stay under the free-tier rate limit
done < nba_unique_ips.txt > nba_geolocation.txt

Phase 4: Vulnerability Discovery

1. Automated Vulnerability Scanning

# Comprehensive nuclei scan
nuclei -l nba_live_services.txt \
    -tags exposure,config,debug,files,logs \
    -severity medium,high,critical \
    -o nba_vulnerabilities.txt

# Specific scans for high-value targets
nuclei -l nba_admin_services.txt \
    -tags auth-bypass,sqli,xss,ssrf \
    -severity low,medium,high,critical \
    -o nba_admin_vulnerabilities.txt

2. Manual Investigation Priorities

# Create prioritized target lists
echo "=== HIGH PRIORITY TARGETS ===" > nba_investigation_priorities.txt
grep -E "(admin|api|dev|staging)" nba_live_services.txt >> nba_investigation_priorities.txt

echo -e "\n=== TECHNOLOGY-SPECIFIC TARGETS ===" >> nba_investigation_priorities.txt
grep -E "(graphql|api|cms)" nba_technology_stack.txt >> nba_investigation_priorities.txt

echo -e "\n=== ERROR RESPONSES ===" >> nba_investigation_priorities.txt
head -10 nba_403_services.txt >> nba_investigation_priorities.txt

3. Client-Side Analysis

# Extract JavaScript files from live services
while read url; do
    curl -s "$url" | grep -oE "https?://[^\"']*\.js" >> nba_js_files.txt
done < nba_200_services.txt

# Analyze JavaScript files for sensitive information
sort -u nba_js_files.txt | while read js_url; do
    echo "Analyzing: $js_url"
    curl -s "$js_url" | grep -iE "(api|key|secret|token|project.*id)" && echo "FOUND: $js_url"
done > nba_js_analysis.txt

NBA Case Study Results

Discovery Statistics

  • Total Subdomains: 1,132 domains discovered
  • Live Services: 779 active HTTP/HTTPS services
  • Technology Diversity: 15+ different technology stacks identified
  • Geographic Distribution: Primary US hosting with global CDN presence

Critical Findings Distribution

Service Category          | Count | High-Value Targets
======================== | ===== | ==================
API Endpoints            |    47 | 12 critical
Admin/CMS Interfaces     |    23 | 8 restricted access
Development Environments |    15 | 7 potentially exposed
Static/CDN Services      |   234 | 3 misconfigurations
Team-Specific Domains    |    78 | 15 worth investigating
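
Counts like those in the table can be reproduced mechanically from the live-services list. A sketch using a toy input file (`live.txt` is an illustrative stand-in for `nba_live_services.txt`, and the patterns mirror the ones used in Phase 2):

```shell
# Count services per category with the same grep patterns used in Phase 2.
# live.txt is a toy stand-in for nba_live_services.txt.
printf 'https://api.example.com\nhttps://cdn.example.com\nhttps://admin.example.com\n' > live.txt
for pattern in 'api' 'admin|cms|portal' 'cdn|static|assets'; do
    printf '%-20s %d\n' "$pattern" "$(grep -cE "$pattern" live.txt)"
done
```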

Technology Stack Analysis

Technology     | Prevalence | Security Implications
============== | ========== | ====================
Akamai CDN     |      45%   | Bot protection, caching
CloudFront     |      23%   | AWS integration
React.js       |      18%   | Client-side vulnerabilities
WordPress      |       8%   | Plugin vulnerabilities
Custom APIs    |      12%   | Business logic flaws

Automation and Scaling

1. Automated Pipeline

#!/bin/bash
# Large-scale reconnaissance automation script

TARGET_DOMAIN="${1:?Usage: $0 <domain>}"
OUTPUT_DIR="recon_${TARGET_DOMAIN}_$(date +%Y%m%d)"

mkdir -p "$OUTPUT_DIR"
cd "$OUTPUT_DIR" || exit 1

# Phase 1: Subdomain Discovery
echo "[+] Starting subdomain discovery..."
subfinder -d "$TARGET_DOMAIN" -all -o subdomains_subfinder.txt
assetfinder --subs-only "$TARGET_DOMAIN" > subdomains_assetfinder.txt
cat subdomains_*.txt | sort -u > all_subdomains.txt

# Phase 2: Service Discovery
echo "[+] Discovering live services..."
httpx -l all_subdomains.txt -o live_services.txt
httpx -l all_subdomains.txt -json -o web_services.json

# Phase 3: Vulnerability Scanning
echo "[+] Running vulnerability scans..."
nuclei -l live_services.txt -tags exposure,config -o vulnerabilities.txt

# Phase 4: Reporting
echo "[+] Generating summary report..."
echo "Reconnaissance Summary for $TARGET_DOMAIN" > summary.txt
echo "=======================================" >> summary.txt
echo "Total subdomains: $(wc -l < all_subdomains.txt)" >> summary.txt
echo "Live services: $(wc -l < live_services.txt)" >> summary.txt
echo "Vulnerabilities found: $(wc -l < vulnerabilities.txt)" >> summary.txt

echo "[+] Reconnaissance complete. Results in $OUTPUT_DIR"

2. Parallel Processing

# Split large subdomain lists for parallel processing
split -l 100 all_subdomains.txt subdomain_chunk_

# Process chunks in parallel
for chunk in subdomain_chunk_*; do
    httpx -l "$chunk" -o "results_$(basename "$chunk").txt" &
done
wait

# Combine results
cat results_subdomain_chunk_*.txt > all_results.txt

3. Resource Management

#!/bin/bash
# Monitor resource usage during large scans
while true; do
    echo "$(date): CPU: $(top -bn1 | grep "Cpu(s)" | awk '{print $2}') Memory: $(free -h | awk '/Mem/ {print $3}')"
    sleep 60
done > resource_usage.log &

# Rate limiting for respectful scanning: both tools expose a -rate-limit flag
httpx -l all_subdomains.txt -rate-limit 50 -o live_services.txt
nuclei -l live_services.txt -rate-limit 30 -o vulnerabilities.txt

Analysis and Reporting

1. Statistical Analysis

# Generate comprehensive statistics
python3 << EOF
import json

# Load httpx results
with open('web_services.json') as f:
    services = [json.loads(line) for line in f]

# Analyze status codes
status_codes = {}
for service in services:
    code = service.get('status_code', 'unknown')
    status_codes[code] = status_codes.get(code, 0) + 1

print("Status Code Distribution:")
for code, count in sorted(status_codes.items(), key=lambda kv: str(kv[0])):
    print(f"  {code}: {count}")

# Analyze technologies
technologies = {}
for service in services:
    tech_list = service.get('tech', [])
    for tech in tech_list:
        technologies[tech] = technologies.get(tech, 0) + 1

print("\nTop Technologies:")
for tech, count in sorted(technologies.items(), key=lambda x: x[1], reverse=True)[:10]:
    print(f"  {tech}: {count}")
EOF

2. Visualization

# Create visual representations of findings, e.g. a status-code bar chart
python3 << EOF
import json
import matplotlib
matplotlib.use("Agg")  # headless backend for server-side rendering
import matplotlib.pyplot as plt

# Chart the status-code distribution from the httpx JSON output
codes = {}
with open('web_services.json') as f:
    for line in f:
        code = str(json.loads(line).get('status_code', 'unknown'))
        codes[code] = codes.get(code, 0) + 1

plt.bar(list(codes), list(codes.values()))
plt.title('HTTP Status Code Distribution')
plt.savefig('status_code_distribution.png')
EOF

3. Report Generation

# Generate markdown report
cat << EOF > reconnaissance_report.md
# Large-Scale Reconnaissance Report
**Target**: $TARGET_DOMAIN
**Date**: $(date)

## Executive Summary
- **Total Subdomains**: $(wc -l < all_subdomains.txt)
- **Live Services**: $(wc -l < live_services.txt)
- **Vulnerabilities**: $(wc -l < vulnerabilities.txt)

## Key Findings
$(head -10 vulnerabilities.txt)

## Technology Stack
$(grep -i "nginx\|apache\|cloudflare" web_services.json | head -5)

## Recommendations
1. Address identified vulnerabilities
2. Review exposed development environments
3. Implement consistent security controls
EOF

Lessons Learned from NBA Analysis

1. Scale Considerations

  • Large Attack Surface: 1,000+ subdomains require a systematic approach
  • Resource Management: Parallel processing essential for efficiency
  • Rate Limiting: Respectful scanning prevents blocking
  • Storage Planning: Significant disk space needed for comprehensive logs

2. Discovery Effectiveness

  • Multi-Source Enumeration: Different tools find different assets
  • Certificate Transparency: Valuable for historical subdomain discovery
  • Technology Fingerprinting: Essential for prioritizing targets
  • Pattern Recognition: Automated categorization saves time
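
The multi-source point can be quantified: running `comm` on two sorted result files isolates what each tool found that the other missed. A self-contained sketch (the file contents here are illustrative, not real discovery output):

```shell
# Quantify per-tool unique finds with comm (inputs must be sorted).
printf 'a.example.com\nb.example.com\n' > tool_a.txt
printf 'b.example.com\nc.example.com\n' > tool_b.txt
comm -23 tool_a.txt tool_b.txt > only_a.txt   # found only by tool A
comm -13 tool_a.txt tool_b.txt > only_b.txt   # found only by tool B
echo "A-only: $(wc -l < only_a.txt)  B-only: $(wc -l < only_b.txt)"
```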

3. Analysis Priorities

  • Development Environments: Often have weaker security controls
  • API Endpoints: High-value targets for business logic flaws
  • Admin Interfaces: Critical for privilege escalation
  • Third-Party Integrations: Expanded attack surface
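
These priorities can be encoded as a simple scoring pass over the live-services list. A hedged sketch: the keyword weights below are arbitrary illustrations, not values used in the NBA assessment, and `targets.txt` is a toy input:

```shell
# Score each URL by high-value keywords and sort descending.
# Weights are illustrative assumptions, not assessment-derived values.
printf 'https://cdn.example.com\nhttps://dev-api.example.com\nhttps://admin.example.com\n' > targets.txt
awk '{
    score = 0
    if ($0 ~ /admin/)        score += 3   # privilege-escalation surface
    if ($0 ~ /api/)          score += 2   # business logic surface
    if ($0 ~ /dev|staging/)  score += 2   # weaker controls expected
    print score, $0
}' targets.txt | sort -rn > prioritized.txt
head -1 prioritized.txt
```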

4. Automation Benefits

  • Consistency: Systematic approach prevents missed assets
  • Scalability: Handles enterprise-level infrastructure
  • Reproducibility: Can be repeated for monitoring changes
  • Documentation: Automatic generation of comprehensive reports

Best Practices

1. Ethical Considerations

  • Always respect target infrastructure limits
  • Use appropriate rate limiting
  • Focus on discovery, not exploitation
  • Follow responsible disclosure practices

2. Technical Excellence

  • Validate tool configurations before large scans
  • Implement proper error handling
  • Monitor resource usage during execution
  • Maintain detailed logs for analysis
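
The error-handling and logging points can be combined in a small scaffold. A minimal sketch, not the pipeline script above: fail fast on the first error and record where the run stopped:

```shell
# Minimal error-handling scaffold for long recon scripts.
set -euo pipefail
trap 'echo "[!] failed at line $LINENO" >&2' ERR
log() { echo "[+] $(date +%H:%M:%S) $*"; }

log "subdomain discovery step complete"
# any failing command from here on fires the ERR trap and aborts the run
log "service discovery step complete"
status="ok"
```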

3. Continuous Improvement

  • Regular tool updates and capability expansion
  • Documentation of lessons learned
  • Methodology refinement based on results
  • Integration of new discovery techniques

Classification: Internal Methodology
Last Updated: September 2025
Validated Against: NBA enterprise infrastructure (1,132 subdomains)