
File Upload and Storage

Overview

File upload and storage systems are critical infrastructure components that enable users to persist and retrieve documents, images, videos, and other binary data. Modern applications must handle file uploads securely, efficiently, and at scale while managing storage costs, access controls, and performance requirements.

This guide covers best practices for implementing file upload workflows, integrating with cloud storage providers, optimizing file delivery through CDNs, and managing file lifecycle from upload through archival or deletion.

Core Principles

  1. Never store files on application servers - Use dedicated cloud storage services (S3, Azure Blob, Google Cloud Storage) for scalability, durability, and cost efficiency
  2. Validate at multiple layers - Perform client-side pre-validation for UX, server-side validation for security, and virus scanning for safety
  3. Use presigned URLs for direct uploads - Bypass your application servers for large files to reduce infrastructure costs and improve upload performance
  4. Implement proper access controls - Files often contain sensitive data; enforce authentication, authorization, and time-limited access
  5. Optimize for delivery - Use CDNs for serving frequently accessed files, implement proper caching headers, and transform images/videos on-demand

File Upload Best Practices

Upload Size and Type Validation

File validation must occur at multiple layers to balance user experience with security. Client-side validation provides immediate feedback, while server-side validation enforces security boundaries that cannot be bypassed.

Client-side validation improves UX by catching issues before upload begins:

// React file input with client-side validation
const FileUploader: React.FC = () => {
const [error, setError] = useState<string | null>(null);

const MAX_FILE_SIZE = 10 * 1024 * 1024; // 10MB
const ALLOWED_TYPES = ['image/jpeg', 'image/png', 'application/pdf'];

const handleFileChange = (event: React.ChangeEvent<HTMLInputElement>) => {
const file = event.target.files?.[0];
if (!file) return;

setError(null);

// Validate file size
if (file.size > MAX_FILE_SIZE) {
setError('File size must be less than 10MB');
return;
}

// Validate MIME type
if (!ALLOWED_TYPES.includes(file.type)) {
setError('Only JPEG, PNG, and PDF files are allowed');
return;
}

uploadFile(file);
};

return (
<div>
<input type="file" onChange={handleFileChange} accept=".jpg,.jpeg,.png,.pdf" />
{error && <span className="error">{error}</span>}
</div>
);
};

The accept attribute on the file input provides a hint to the browser's file picker, but it does not provide security - users can easily bypass this restriction. This is why server-side validation is mandatory.

Server-side validation enforces security boundaries that cannot be circumvented:

// Spring Boot file upload validation
@RestController
@RequestMapping("/api/files")
public class FileUploadController {

private static final long MAX_FILE_SIZE = 10 * 1024 * 1024; // 10MB
private static final Set<String> ALLOWED_CONTENT_TYPES = Set.of(
"image/jpeg",
"image/png",
"application/pdf"
);

@PostMapping(consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
public ResponseEntity<FileUploadResponse> uploadFile(
@RequestParam("file") MultipartFile file) {

// Validate file is present
if (file.isEmpty()) {
throw new InvalidFileException("File cannot be empty");
}

// Validate file size
if (file.getSize() > MAX_FILE_SIZE) {
throw new InvalidFileException("File size exceeds 10MB limit");
}

// Validate content type - use detected type, not client-provided
String contentType = detectContentType(file);
if (!ALLOWED_CONTENT_TYPES.contains(contentType)) {
throw new InvalidFileException("File type not allowed: " + contentType);
}

// Validate file extension matches content type
String filename = file.getOriginalFilename();
if (!isValidFileExtension(filename, contentType)) {
throw new InvalidFileException("File extension does not match content type");
}

String fileId = fileStorageService.store(file);
return ResponseEntity.ok(new FileUploadResponse(fileId));
}

private String detectContentType(MultipartFile file) {
// Use Apache Tika or similar to detect actual content type
// Never trust the Content-Type header from the client
try {
Tika tika = new Tika();
return tika.detect(file.getInputStream());
} catch (IOException e) {
throw new InvalidFileException("Could not detect file type");
}
}

private boolean isValidFileExtension(String filename, String contentType) {
String extension = filename.substring(filename.lastIndexOf('.') + 1).toLowerCase();
return switch (contentType) {
case "image/jpeg" -> extension.equals("jpg") || extension.equals("jpeg");
case "image/png" -> extension.equals("png");
case "application/pdf" -> extension.equals("pdf");
default -> false;
};
}
}

Why content type detection matters: Attackers can upload malicious files (e.g., executable scripts) disguised as images by setting a fake Content-Type header. Content type detection using libraries like Apache Tika examines the file's binary structure to determine its true type, preventing this attack vector.
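To illustrate what such detection does under the hood, here is a minimal magic-byte sniffer covering only the three types this guide allows; the class and method names are illustrative, and a real system should rely on Apache Tika or libmagic, which handle thousands of signatures and edge cases.

```java
import java.util.Arrays;

// Minimal magic-byte sniffing for the three allowed types above.
// Illustrative only - use Apache Tika or libmagic in production.
public class MagicBytes {

    private static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};
    private static final byte[] PNG  = {(byte) 0x89, 0x50, 0x4E, 0x47}; // \x89PNG
    private static final byte[] PDF  = {0x25, 0x50, 0x44, 0x46};        // %PDF

    /** Detects the content type from the first bytes of the file. */
    public static String detect(byte[] header) {
        if (startsWith(header, JPEG)) return "image/jpeg";
        if (startsWith(header, PNG))  return "image/png";
        if (startsWith(header, PDF))  return "application/pdf";
        return "application/octet-stream"; // unknown: treat as opaque binary
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

Note that a fake Content-Type header has no effect here: only the actual bytes of the upload are consulted.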

Virus and Malware Scanning

For applications handling user-generated files, virus scanning is essential to prevent malware distribution. Cloud storage providers often offer integrated scanning solutions, or you can use dedicated services.

// Spring Boot integration with ClamAV or cloud-based scanning
@Service
public class FileStorageService {

private final VirusScannerClient virusScanner;
private final S3Client s3Client;

public String store(MultipartFile file) throws IOException {
// Scan file before storing
ScanResult scanResult = virusScanner.scan(file.getInputStream());

if (!scanResult.isClean()) {
log.warn("Virus detected in uploaded file: {}", scanResult.getDetails());
throw new VirusDetectedException("File contains malicious content");
}

// Generate unique file key
String fileKey = generateFileKey(file.getOriginalFilename());

// Upload to S3
s3Client.putObject(PutObjectRequest.builder()
.bucket(bucketName)
.key(fileKey)
.contentType(file.getContentType()) // prefer the content type detected during validation over this client-supplied value
.build(),
RequestBody.fromInputStream(file.getInputStream(), file.getSize()));

return fileKey;
}
}

Virus scanning adds latency to the upload process. For better user experience, consider asynchronous scanning: upload files to a quarantine bucket, scan asynchronously, then move clean files to the production bucket. This allows the API to return quickly while scanning happens in the background.

Chunked and Resumable Uploads

For large files (videos, backups, datasets), chunked uploads improve reliability by allowing resume on failure and providing upload progress feedback.

Multipart upload workflow:

  1. Initialize multipart upload to get an upload ID
  2. Split file into chunks (typically 5-100MB per chunk)
  3. Upload chunks in parallel
  4. Complete the multipart upload to assemble chunks

// Frontend chunked upload implementation
interface UploadChunk {
partNumber: number;
etag: string;
}

async function uploadLargeFile(file: File) {
const CHUNK_SIZE = 10 * 1024 * 1024; // 10MB chunks
const totalChunks = Math.ceil(file.size / CHUNK_SIZE);

// Step 1: Initialize multipart upload
const { uploadId, fileKey } = await fetch('/api/files/initiate', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
filename: file.name,
contentType: file.type,
size: file.size
})
}).then(res => res.json());

// Step 2: Upload chunks in parallel (limit concurrency)
const uploadedParts: UploadChunk[] = [];

for (let i = 0; i < totalChunks; i++) {
const start = i * CHUNK_SIZE;
const end = Math.min(start + CHUNK_SIZE, file.size);
const chunk = file.slice(start, end);

// Get presigned URL for this chunk
const { presignedUrl } = await fetch(`/api/files/${fileKey}/chunk-url`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ uploadId, partNumber: i + 1 })
}).then(res => res.json());

// Upload chunk directly to S3
const response = await fetch(presignedUrl, {
method: 'PUT',
body: chunk
});

// Note: the bucket's CORS configuration must expose the ETag header
// (ExposeHeaders), or this read returns null for cross-origin uploads
const etag = response.headers.get('ETag');
uploadedParts.push({ partNumber: i + 1, etag: etag! });

// Update progress
const progress = Math.round(((i + 1) / totalChunks) * 100);
console.log(`Upload progress: ${progress}%`);
}

// Step 3: Complete the multipart upload
await fetch(`/api/files/${fileKey}/complete`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ uploadId, parts: uploadedParts })
});

return fileKey;
}

The backend orchestrates the multipart upload by generating presigned URLs for each chunk:

// Spring Boot multipart upload coordination
@Service
public class MultipartUploadService {

private final S3Client s3Client;
private final S3Presigner s3Presigner;

public InitiateUploadResponse initiateUpload(String filename, String contentType) {
String fileKey = generateFileKey(filename);

CreateMultipartUploadResponse response = s3Client.createMultipartUpload(
CreateMultipartUploadRequest.builder()
.bucket(bucketName)
.key(fileKey)
.contentType(contentType)
.build()
);

return new InitiateUploadResponse(fileKey, response.uploadId());
}

public String generateChunkPresignedUrl(String fileKey, String uploadId, int partNumber) {
UploadPartRequest uploadPartRequest = UploadPartRequest.builder()
.bucket(bucketName)
.key(fileKey)
.uploadId(uploadId)
.partNumber(partNumber)
.build();

UploadPartPresignRequest presignRequest = UploadPartPresignRequest.builder()
.signatureDuration(Duration.ofMinutes(15))
.uploadPartRequest(uploadPartRequest)
.build();

PresignedUploadPartRequest presigned = s3Presigner.presignUploadPart(presignRequest);
return presigned.url().toString();
}

public void completeUpload(String fileKey, String uploadId, List<UploadPartInfo> parts) {
List<CompletedPart> completedParts = parts.stream()
.map(part -> CompletedPart.builder()
.partNumber(part.partNumber())
.eTag(part.etag())
.build())
.toList();

s3Client.completeMultipartUpload(
CompleteMultipartUploadRequest.builder()
.bucket(bucketName)
.key(fileKey)
.uploadId(uploadId)
.multipartUpload(CompletedMultipartUpload.builder()
.parts(completedParts)
.build())
.build()
);
}
}

Why multipart uploads are important: They solve three key problems: (1) Reliability - if the connection drops during upload, only the current chunk needs to be retried, not the entire file; (2) Performance - chunks can be uploaded in parallel using multiple connections; (3) User experience - progress can be reported accurately, and users can pause/resume uploads.
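The chunk-boundary arithmetic from the frontend loop above can be isolated into a pure helper (names are illustrative). Note also that S3 multipart uploads require every part except the last to be at least 5MB.

```java
import java.util.ArrayList;
import java.util.List;

// Pure helper mirroring the chunk-boundary arithmetic in the upload loop above.
public class Chunker {

    /** Half-open byte range [start, end) for one part. */
    public record Range(long start, long end) {}

    public static List<Range> split(long fileSize, long chunkSize) {
        List<Range> ranges = new ArrayList<>();
        for (long start = 0; start < fileSize; start += chunkSize) {
            // The last chunk is clamped to the file size and may be smaller
            ranges.add(new Range(start, Math.min(start + chunkSize, fileSize)));
        }
        return ranges;
    }
}
```

Keeping this logic pure makes it trivial to unit-test the edge cases (empty files, exact multiples of the chunk size, a short final chunk) separately from any network code.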

Cloud Storage Integration

Choosing a Storage Provider

Modern applications typically use object storage services rather than file systems or block storage. The three major providers offer similar capabilities with different pricing models and regional availability:

  • AWS S3 - Most mature ecosystem, extensive integration options, 11 9's durability (99.999999999%), storage classes for different access patterns
  • Azure Blob Storage - Strong integration with Azure ecosystem, hierarchical namespace support, three access tiers (Hot, Cool, Archive)
  • Google Cloud Storage - Unified API across storage classes, automatic transitions, strong consistency guarantees

All three provide:

  • Near-unlimited scalability
  • Built-in redundancy and durability
  • Versioning and lifecycle policies
  • Access control and encryption
  • Event notifications for file changes

Storage class selection impacts both cost and access latency:

Access Pattern    | AWS S3 Class   | Use Case                            | Cost
------------------|----------------|-------------------------------------|-----
Frequent access   | S3 Standard    | Active user uploads, profile photos | $$$
Infrequent access | S3 Standard-IA | Monthly reports, backups            | $$
Archive           | S3 Glacier     | Compliance records, old documents   | $

Use lifecycle policies to automatically transition objects between storage classes as they age, optimizing costs without manual intervention.
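Such a policy, in the JSON shape accepted by `aws s3api put-bucket-lifecycle-configuration`, might look like the following sketch; the `uploads/` prefix and the day thresholds are illustrative:

```json
{
  "Rules": [
    {
      "ID": "age-out-user-uploads",
      "Status": "Enabled",
      "Filter": { "Prefix": "uploads/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 365, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 2555 }
    }
  ]
}
```

Objects under the prefix move to Standard-IA after 30 days, to Glacier after a year, and are deleted after roughly seven years.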

Storage Configuration

Configure storage buckets with appropriate security settings from the start. Retroactively fixing security misconfigurations is difficult and risky.

// Spring Boot S3 configuration
@Configuration
public class S3Config {

@Bean
public S3Client s3Client(
@Value("${aws.region}") String region,
@Value("${aws.accessKeyId}") String accessKeyId,
@Value("${aws.secretAccessKey}") String secretAccessKey) {

AwsBasicCredentials credentials = AwsBasicCredentials.create(
accessKeyId,
secretAccessKey
);

return S3Client.builder()
.region(Region.of(region))
.credentialsProvider(StaticCredentialsProvider.create(credentials))
.build();
}

@Bean
public S3Presigner s3Presigner(
@Value("${aws.region}") String region,
@Value("${aws.accessKeyId}") String accessKeyId,
@Value("${aws.secretAccessKey}") String secretAccessKey) {

AwsBasicCredentials credentials = AwsBasicCredentials.create(
accessKeyId,
secretAccessKey
);

return S3Presigner.builder()
.region(Region.of(region))
.credentialsProvider(StaticCredentialsProvider.create(credentials))
.build();
}
}

Never hardcode credentials in application code. Use environment variables, AWS IAM roles (when running on EC2/ECS/Lambda), or secret management services. See Secrets Management for detailed guidance.

Bucket security checklist:

  • Block all public access unless explicitly required
  • Enable versioning for critical data
  • Enable server-side encryption (SSE-S3, SSE-KMS, or SSE-C)
  • Configure CORS only for specific origins
  • Enable access logging for audit trails
  • Use bucket policies to enforce encryption in transit (require HTTPS)
  • Enable object lock for compliance requirements

File Metadata Management

Store file metadata separately from the files themselves in your application database. This enables efficient searching, filtering, and access control without querying the storage service.

// File metadata entity
@Entity
@Table(name = "files")
public class FileMetadata {

@Id
private UUID id;

@Column(nullable = false)
private String storageKey; // S3 object key

@Column(nullable = false)
private String originalFilename;

@Column(nullable = false)
private String contentType;

@Column(nullable = false)
private Long sizeBytes;

@Column(nullable = false)
private String uploadedBy; // User ID

@Column(nullable = false)
private Instant uploadedAt;

@Enumerated(EnumType.STRING)
@Column(nullable = false)
private FileStatus status; // lifecycle state; see FileStatus enum below

private String virusScanResult;

@ElementCollection
@CollectionTable(name = "file_tags")
private Set<String> tags = new HashSet<>();

// Image-specific metadata
private Integer imageWidth;
private Integer imageHeight;

// Document-specific metadata
private Integer pageCount;
}

public enum FileStatus {
UPLOADING, // Presigned URL issued; upload not yet confirmed
SCANNING, // Virus scan in progress
AVAILABLE, // Ready for download
REJECTED, // Failed virus scan
ARCHIVED, // Moved to cold storage
DELETED // Soft deleted
}
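The FileStatus values (plus the UPLOADING state used later in the presigned-URL flow) imply a small state machine. Encoding the legal transitions explicitly makes lifecycle bugs, such as serving a file that is still being scanned, easy to catch; the transition set below is an assumption for illustration:

```java
import java.util.Map;
import java.util.Set;

// Illustrative state machine over the file lifecycle statuses.
public class FileLifecycle {

    public enum Status { UPLOADING, SCANNING, AVAILABLE, REJECTED, ARCHIVED, DELETED }

    // Assumed legal transitions; adjust to your workflow.
    private static final Map<Status, Set<Status>> LEGAL = Map.of(
        Status.UPLOADING, Set.of(Status.SCANNING),
        Status.SCANNING,  Set.of(Status.AVAILABLE, Status.REJECTED),
        Status.AVAILABLE, Set.of(Status.ARCHIVED, Status.DELETED),
        Status.ARCHIVED,  Set.of(Status.AVAILABLE, Status.DELETED),
        Status.REJECTED,  Set.of(Status.DELETED),
        Status.DELETED,   Set.of()
    );

    public static boolean canTransition(Status from, Status to) {
        return LEGAL.getOrDefault(from, Set.of()).contains(to);
    }
}
```

A service method can then reject illegal updates (e.g. un-deleting a file) with a single guard clause instead of scattered if-checks.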

Storing metadata in your database allows you to:

  • Query files by uploader, upload date, tags, or content type
  • Implement access control without hitting the storage service
  • Track file lifecycle status (scanning, available, archived)
  • Generate reports on storage usage by user or department
  • Implement soft delete (mark as deleted without removing from storage)

When a file is uploaded, first create the metadata record, then upload to storage:

@Service
@Transactional
public class FileService {

private final FileMetadataRepository fileRepository;
private final S3Client s3Client;
private final VirusScannerService virusScanner;

public FileMetadata uploadFile(MultipartFile file, String userId) {
// Create metadata record
FileMetadata metadata = new FileMetadata();
metadata.setId(UUID.randomUUID());
metadata.setStorageKey(generateStorageKey());
metadata.setOriginalFilename(file.getOriginalFilename());
metadata.setContentType(file.getContentType());
metadata.setSizeBytes(file.getSize());
metadata.setUploadedBy(userId);
metadata.setUploadedAt(Instant.now());
metadata.setStatus(FileStatus.SCANNING);

fileRepository.save(metadata);

// Upload to S3
s3Client.putObject(
PutObjectRequest.builder()
.bucket(bucketName)
.key(metadata.getStorageKey())
.contentType(metadata.getContentType())
.build(),
RequestBody.fromInputStream(file.getInputStream(), file.getSize())
);

// Trigger async virus scan
virusScanner.scanAsync(metadata.getId());

return metadata;
}

private String generateStorageKey() {
// Use UUID to avoid filename collisions and path traversal attacks
return UUID.randomUUID().toString();
}
}
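UUID keys keep path traversal out of storage, but the original filename still resurfaces in places like the Content-Disposition download header. A conservative sanitizer for those cases might look like this sketch (class and method names are illustrative):

```java
// Illustrative sanitizer for user-supplied filenames that must be echoed back,
// e.g. in Content-Disposition. Storage keys themselves should remain UUIDs.
public class Filenames {

    public static String sanitize(String name) {
        // Drop any directory components a malicious client may have embedded
        String base = name.replace('\\', '/');
        base = base.substring(base.lastIndexOf('/') + 1);
        // Allow a conservative character set; replace everything else
        base = base.replaceAll("[^A-Za-z0-9._-]", "_");
        // Guard against empty or dot-only results
        return (base.isBlank() || base.chars().allMatch(c -> c == '.')) ? "file" : base;
    }
}
```

This strips both Unix and Windows path separators, so inputs like `../../etc/passwd` collapse to a bare filename.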

Presigned URLs for Direct Uploads

Presigned URLs allow clients to upload files directly to cloud storage, bypassing your application servers. This architecture reduces bandwidth costs, improves upload performance, and eliminates the need to scale application servers for file upload traffic.

Generating Presigned URLs

A presigned URL contains authentication information in the URL itself, allowing temporary access without requiring AWS credentials on the client.
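The mechanism can be illustrated with a toy signer: embed an expiry in the query string plus an HMAC over the object key and expiry, then verify both on use. Real S3 presigned URLs use AWS Signature Version 4, which is considerably more involved; this sketch only conveys the idea, and all names are hypothetical.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.HexFormat;

// Toy illustration of URL signing. NOT AWS SigV4 - conceptual only.
public class ToySigner {

    private final byte[] secret;

    public ToySigner(String secret) {
        this.secret = secret.getBytes(StandardCharsets.UTF_8);
    }

    /** Appends an expiry and a signature over (key, expiry) to the URL. */
    public String sign(String objectKey, long expiresEpochSeconds) {
        String payload = objectKey + "\n" + expiresEpochSeconds;
        return objectKey + "?expires=" + expiresEpochSeconds + "&sig=" + hmac(payload);
    }

    /** Accepts the request only if the URL is unexpired and the signature matches. */
    public boolean verify(String objectKey, long expiresEpochSeconds, String sig, long nowEpochSeconds) {
        if (nowEpochSeconds > expiresEpochSeconds) return false; // expired
        return hmac(objectKey + "\n" + expiresEpochSeconds).equals(sig);
    }

    private String hmac(String payload) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return HexFormat.of().formatHex(mac.doFinal(payload.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the expiry is covered by the signature, a client cannot extend its own access window by editing the query string.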

// Backend: Generate presigned URL for upload
@RestController
@RequestMapping("/api/files")
public class FileUploadController {

private final S3Presigner s3Presigner;
private final S3Client s3Client;
private final FileMetadataRepository fileRepository;
private final VirusScannerService virusScannerService;

@PostMapping("/upload-url")
public ResponseEntity<PresignedUploadResponse> getUploadUrl(
@RequestBody UploadUrlRequest request,
@AuthenticationPrincipal UserDetails user) {

// Validate file metadata
validateUploadRequest(request);

// Create file metadata record
FileMetadata metadata = new FileMetadata();
metadata.setId(UUID.randomUUID());
metadata.setStorageKey(generateStorageKey());
metadata.setOriginalFilename(request.filename());
metadata.setContentType(request.contentType());
metadata.setSizeBytes(request.size());
metadata.setUploadedBy(user.getUsername());
metadata.setUploadedAt(Instant.now());
metadata.setStatus(FileStatus.UPLOADING);

fileRepository.save(metadata);

// Generate presigned URL valid for 15 minutes
PutObjectRequest objectRequest = PutObjectRequest.builder()
.bucket(bucketName)
.key(metadata.getStorageKey())
.contentType(request.contentType())
.contentLength(request.size())
.build();

PutObjectPresignRequest presignRequest = PutObjectPresignRequest.builder()
.signatureDuration(Duration.ofMinutes(15))
.putObjectRequest(objectRequest)
.build();

PresignedPutObjectRequest presignedRequest = s3Presigner.presignPutObject(presignRequest);

return ResponseEntity.ok(new PresignedUploadResponse(
metadata.getId(),
presignedRequest.url().toString(),
presignedRequest.expiration()
));
}

@PostMapping("/{fileId}/complete")
public ResponseEntity<Void> completeUpload(
@PathVariable UUID fileId,
@AuthenticationPrincipal UserDetails user) {

FileMetadata metadata = fileRepository.findById(fileId)
.orElseThrow(() -> new FileNotFoundException(fileId));

// Verify the user who initiated the upload is completing it
if (!metadata.getUploadedBy().equals(user.getUsername())) {
throw new ForbiddenException("Cannot complete upload for another user");
}

// Verify file exists in S3 (headObject throws NoSuchKeyException if it does not)
HeadObjectResponse response = s3Client.headObject(
HeadObjectRequest.builder()
.bucket(bucketName)
.key(metadata.getStorageKey())
.build()
);

// Update metadata
metadata.setStatus(FileStatus.SCANNING);
fileRepository.save(metadata);

// Trigger virus scan
virusScannerService.scanAsync(fileId);

return ResponseEntity.noContent().build();
}
}

Frontend usage of presigned URLs:

// Client-side direct upload to S3
async function uploadFileDirectly(file: File) {
// Step 1: Request presigned URL
const uploadResponse = await fetch('/api/files/upload-url', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
filename: file.name,
contentType: file.type,
size: file.size
})
});

const { fileId, presignedUrl, expiration } = await uploadResponse.json();

// Step 2: Upload directly to S3 using presigned URL
const uploadResult = await fetch(presignedUrl, {
method: 'PUT',
headers: {
'Content-Type': file.type,
'Content-Length': file.size.toString()
},
body: file
});

if (!uploadResult.ok) {
throw new Error('Upload to S3 failed');
}

// Step 3: Notify backend that upload is complete
await fetch(`/api/files/${fileId}/complete`, {
method: 'POST'
});

return fileId;
}

Security considerations for presigned URLs:

  1. Short expiration times (5-15 minutes) limit the window for URL abuse
  2. Include content-type and size in signature to prevent clients from uploading different files than declared
  3. Validate completion by checking the file exists in S3 before marking it available
  4. Rate limit presigned URL generation to prevent abuse
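Point 4 can be sketched as a fixed-window counter per user. This in-memory version is illustrative; production deployments usually back the counter with a shared store such as Redis so the limit holds across application instances.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative fixed-window rate limiter for presigned-URL generation.
public class UploadUrlRateLimiter {

    private final int maxPerWindow;
    private final long windowSeconds;
    // userId -> {windowStart, count}
    private final Map<String, long[]> windows = new ConcurrentHashMap<>();

    public UploadUrlRateLimiter(int maxPerWindow, long windowSeconds) {
        this.maxPerWindow = maxPerWindow;
        this.windowSeconds = windowSeconds;
    }

    /** Returns true if the request is allowed within the current window. */
    public synchronized boolean tryAcquire(String userId, long nowEpochSeconds) {
        long windowStart = nowEpochSeconds - (nowEpochSeconds % windowSeconds);
        long[] w = windows.get(userId);
        if (w == null || w[0] != windowStart) {
            windows.put(userId, new long[]{windowStart, 1}); // new window
            return true;
        }
        if (w[1] >= maxPerWindow) return false; // over the limit
        w[1]++;
        return true;
    }
}
```

The endpoint that issues presigned URLs would call `tryAcquire` before creating the metadata record, returning HTTP 429 on refusal.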

Image Optimization and Transformation

Images often represent the bulk of web application bandwidth. Optimizing image delivery improves page load times, reduces bandwidth costs, and enhances user experience, especially on mobile networks.

On-the-Fly Image Transformation

Rather than storing multiple sizes of each image, transform images on-demand based on request parameters. This reduces storage costs and eliminates the need to pre-generate variants for every conceivable use case.

Common transformations:

  • Resize: Generate thumbnails or fit images to specific dimensions
  • Format conversion: Convert to WebP or AVIF for better compression
  • Quality adjustment: Reduce quality for thumbnails
  • Cropping: Extract specific regions (e.g., square profile photos from rectangular uploads)

Many CDNs provide built-in image transformation capabilities (Cloudflare Images, Cloudinary, imgix). For AWS, use Lambda@Edge or CloudFront Functions:

// CloudFront Function for basic image resizing
function handler(event) {
var request = event.request;
var uri = request.uri;

// Extract transformation parameters from query string
var params = request.querystring;
var width = params.w ? params.w.value : null;
var height = params.h ? params.h.value : null;
var quality = params.q ? params.q.value : '85';

// Rewrite URI to include transformation parameters
// This allows caching different sizes separately
if (width || height) {
var transformKey = [];
if (width) transformKey.push('w' + width);
if (height) transformKey.push('h' + height);
transformKey.push('q' + quality);

// Insert transformation key into path
var parts = uri.split('/');
var filename = parts.pop();
parts.push(transformKey.join('-'));
parts.push(filename);
request.uri = parts.join('/');
}

return request;
}

For more complex transformations, use Lambda@Edge with image processing libraries:

// Lambda@Edge for advanced image processing
const sharp = require('sharp');
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

exports.handler = async (event) => {
const response = event.Records[0].cf.response;
const request = event.Records[0].cf.request;

// If origin returned 404, check if we need to generate the image
if (response.status === '404') {
const uri = request.uri;
const params = parseTransformParams(uri);

if (params.originalKey) {
// Fetch original image from S3
const originalImage = await s3.getObject({
Bucket: 'my-bucket',
Key: params.originalKey
}).promise();

// Transform image using Sharp
let transform = sharp(originalImage.Body);

if (params.width || params.height) {
transform = transform.resize(params.width, params.height, {
fit: 'inside',
withoutEnlargement: true
});
}

if (params.format === 'webp') {
transform = transform.webp({ quality: params.quality });
}

const transformedImage = await transform.toBuffer();

// Upload transformed image to S3 for caching
await s3.putObject({
Bucket: 'my-bucket',
Key: uri.substring(1),
Body: transformedImage,
ContentType: `image/${params.format || 'jpeg'}`
}).promise();

// Return transformed image
return {
status: '200',
headers: {
'content-type': [{ value: `image/${params.format || 'jpeg'}` }],
'cache-control': [{ value: 'public, max-age=31536000' }]
},
body: transformedImage.toString('base64'),
bodyEncoding: 'base64'
};
}
}

return response;
};

Caching transformed images is critical for performance. The first request triggers transformation, stores the result in S3, and subsequent requests serve the cached version directly from CloudFront without invoking Lambda.

Responsive Images

Serve appropriately sized images based on device screen size and pixel density. This prevents mobile devices from downloading massive desktop-sized images unnecessarily.

<!-- Responsive image using srcset and sizes -->
<img
src="https://cdn.example.com/photo.jpg?w=800"
srcset="
https://cdn.example.com/photo.jpg?w=400 400w,
https://cdn.example.com/photo.jpg?w=800 800w,
https://cdn.example.com/photo.jpg?w=1200 1200w,
https://cdn.example.com/photo.jpg?w=1600 1600w
"
sizes="(max-width: 640px) 100vw, (max-width: 1024px) 50vw, 800px"
alt="Description"
loading="lazy"
/>

The sizes attribute tells the browser which image width to download based on the viewport:

  • Mobile (≤640px): Use full viewport width (100vw)
  • Tablet (≤1024px): Use half viewport width (50vw)
  • Desktop: Use fixed 800px width

The browser automatically selects the optimal image from the srcset based on the sizes calculation and device pixel density.
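When these URLs are generated server-side, the srcset value can be built from a list of candidate widths using the same `?w=` convention as the markup above; a small helper with illustrative naming:

```java
// Builds a srcset attribute value from a base CDN URL and candidate widths,
// matching the ?w= query-parameter convention used in the markup above.
public class Srcset {

    public static String build(String baseUrl, int[] widths) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < widths.length; i++) {
            if (i > 0) sb.append(", ");
            sb.append(baseUrl).append("?w=").append(widths[i])
              .append(' ').append(widths[i]).append('w');
        }
        return sb.toString();
    }
}
```

Centralizing this keeps the width ladder (400, 800, 1200, 1600) consistent across every template that renders images.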

Modern Image Formats

WebP and AVIF provide significantly better compression than JPEG and PNG while maintaining visual quality. Serve modern formats to supporting browsers with fallbacks for older browsers.

<!-- Using picture element for format fallbacks -->
<picture>
<source srcset="image.avif" type="image/avif" />
<source srcset="image.webp" type="image/webp" />
<img src="image.jpg" alt="Description" />
</picture>

For dynamic image URLs, use content negotiation:

// Backend content negotiation for image formats
@GetMapping("/api/files/{fileId}/image")
public ResponseEntity<byte[]> getImage(
@PathVariable UUID fileId,
@RequestHeader(value = "Accept", defaultValue = "*/*") String accept) {

FileMetadata metadata = fileMetadataRepository.findById(fileId)
.orElseThrow(() -> new FileNotFoundException(fileId));

// Determine best format based on Accept header
String format = "jpeg"; // default
if (accept.contains("image/avif")) {
format = "avif";
} else if (accept.contains("image/webp")) {
format = "webp";
}

// Generate signed URL with format parameter
String imageUrl = generateSignedImageUrl(metadata.getStorageKey(), format);

// Redirect to CDN URL
return ResponseEntity.status(HttpStatus.FOUND)
.location(URI.create(imageUrl))
.build();
}

Compression savings: WebP typically achieves 25-35% smaller file sizes than JPEG at the same visual quality. AVIF can achieve 50% savings but has slower encoding times and limited browser support (Safari 16+, Chrome 85+, Firefox 93+).

Access Control and Signed URLs

Files often contain sensitive data requiring access controls. Rather than proxying file downloads through your application (which wastes bandwidth and compute resources), use signed URLs to grant temporary access directly from the storage service.

Signed URL Generation

Signed URLs work similarly to presigned upload URLs but for downloads:

// Generate signed URL for downloading a file
@Service
public class FileAccessService {

private final S3Presigner s3Presigner;
private final FileMetadataRepository fileRepository;

public String generateDownloadUrl(UUID fileId, String userId) {
FileMetadata metadata = fileRepository.findById(fileId)
.orElseThrow(() -> new FileNotFoundException(fileId));

// Check access permissions
if (!canUserAccessFile(userId, metadata)) {
throw new ForbiddenException("User does not have access to this file");
}

// Generate signed URL valid for 1 hour
GetObjectRequest objectRequest = GetObjectRequest.builder()
.bucket(bucketName)
.key(metadata.getStorageKey())
.responseContentType(metadata.getContentType())
.responseContentDisposition("attachment; filename=\"" + metadata.getOriginalFilename() + "\"")
.build();

GetObjectPresignRequest presignRequest = GetObjectPresignRequest.builder()
.signatureDuration(Duration.ofHours(1))
.getObjectRequest(objectRequest)
.build();

PresignedGetObjectRequest presignedRequest = s3Presigner.presignGetObject(presignRequest);

return presignedRequest.url().toString();
}

private boolean canUserAccessFile(String userId, FileMetadata metadata) {
// Implement your access control logic
// Examples:
// - User uploaded the file
// - User is in the same organization
// - File is marked as public
// - User has been granted explicit access
return metadata.getUploadedBy().equals(userId)
|| metadata.isPublic()
|| hasExplicitAccess(userId, metadata);
}
}

API endpoint for file downloads:

@RestController
@RequestMapping("/api/files")
public class FileDownloadController {

private final FileAccessService fileAccessService;

@GetMapping("/{fileId}/download")
public ResponseEntity<Void> downloadFile(
@PathVariable UUID fileId,
@AuthenticationPrincipal UserDetails user) {

// Generate signed URL with access control check
String signedUrl = fileAccessService.generateDownloadUrl(fileId, user.getUsername());

// Redirect client to signed URL
return ResponseEntity.status(HttpStatus.FOUND)
.location(URI.create(signedUrl))
.build();
}
}

Frontend usage:

// Download file via signed URL
async function downloadFile(fileId: string) {
// Request signed URL; fetch follows the 302 redirect automatically
const response = await fetch(`/api/files/${fileId}/download`, {
method: 'GET'
});

if (response.redirected) {
// Navigate to the signed URL so the browser saves the file. Note that the
// fetch above already transferred the body once; if double transfer matters,
// have the API return the signed URL as JSON instead of a redirect.
window.location.href = response.url;
}
}

Access Control Patterns

Different applications require different access control models:

1. User-owned files - Only the uploader can access:

public boolean canUserAccessFile(String userId, FileMetadata metadata) {
return metadata.getUploadedBy().equals(userId);
}

2. Organization-scoped files - All members of an organization can access:

public boolean canUserAccessFile(String userId, FileMetadata metadata) {
User user = userRepository.findById(userId).orElseThrow();
User uploader = userRepository.findById(metadata.getUploadedBy()).orElseThrow();
return user.getOrganizationId().equals(uploader.getOrganizationId());
}

3. Explicit sharing - Granular access granted to specific users or groups:

@Entity
@Table(name = "file_access_grants")
public class FileAccessGrant {

@Id
private UUID id;

@ManyToOne
private FileMetadata file;

private String grantedToUserId;

@Enumerated(EnumType.STRING)
private AccessLevel accessLevel; // READ, WRITE, ADMIN

private Instant grantedAt;
private Instant expiresAt;
}

public boolean canUserAccessFile(String userId, FileMetadata metadata) {
return metadata.getUploadedBy().equals(userId)
|| fileAccessGrantRepository.existsByFileIdAndGrantedToUserIdAndExpiresAtAfter(
metadata.getId(),
userId,
Instant.now()
);
}

4. Public files with authentication - Files are public but require a logged-in user:

public boolean canUserAccessFile(String userId, FileMetadata metadata) {
return metadata.isPublic() && userId != null;
}

For detailed authorization patterns, see Authorization.

CDN Integration

Content Delivery Networks (CDNs) cache files at edge locations worldwide, reducing latency and bandwidth costs for frequently accessed files.

CDN Configuration

Configure your CDN to cache different content types with appropriate TTLs:

// CloudFront cache behavior configuration (illustrative JSON; in the actual
// API, cache policies are separate resources referenced by CachePolicyId)
{
"PathPattern": "/files/*",
"TargetOriginId": "S3-my-bucket",
"ViewerProtocolPolicy": "redirect-to-https",
"AllowedMethods": ["GET", "HEAD"],
"CachedMethods": ["GET", "HEAD"],
"Compress": true,
"CachePolicyId": "custom-policy",
"CachePolicy": {
"MinTTL": 0,
"MaxTTL": 31536000, // 1 year
"DefaultTTL": 86400, // 1 day
"ParametersInCacheKeyAndForwardedToOrigin": {
"EnableAcceptEncodingGzip": true,
"EnableAcceptEncodingBrotli": true,
"HeadersConfig": {
"HeaderBehavior": "whitelist",
"Headers": ["Accept"] // For content negotiation
},
"QueryStringsConfig": {
"QueryStringBehavior": "whitelist",
"QueryStrings": ["w", "h", "q", "format"] // Image transform params
}
}
}
}

Cache-Control headers from your origin control CDN caching behavior:

// Set cache headers when uploading to S3
s3Client.putObject(
PutObjectRequest.builder()
.bucket(bucketName)
.key(fileKey)
.contentType(contentType)
.cacheControl("public, max-age=31536000, immutable") // Cache for 1 year
.build(),
RequestBody.fromInputStream(inputStream, size)
);

Immutable content: Files with content-addressed keys (UUIDs or content hashes) never change, so they can be cached indefinitely. Use the immutable directive to tell browsers they never need to revalidate.
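This policy can be encoded in a small helper that picks a Cache-Control value based on whether the storage key is content-addressed. A sketch; the one-year and five-minute TTLs are illustrative choices, not fixed rules:

```java
public class CacheHeaderPolicy {

    // Content-addressed keys (UUIDs, content hashes) never change, so the
    // response can be cached for a year and marked immutable. Mutable keys
    // get a short TTL so updates propagate quickly. Files requiring
    // authentication must not be cached by shared caches such as CDN edges.
    public static String cacheControlFor(boolean contentAddressed, boolean requiresAuth) {
        if (requiresAuth) {
            return "private, no-store";
        }
        if (contentAddressed) {
            return "public, max-age=31536000, immutable"; // 1 year
        }
        return "public, max-age=300"; // 5 minutes
    }
}
```

The returned string would be passed to the `cacheControl(...)` setter shown in the upload snippet above.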

Cache Invalidation

When files are updated or deleted, invalidate CDN cache to prevent serving stale content:

// Invalidate CloudFront cache after file deletion
@Service
public class FileService {

private final S3Client s3Client;
private final CloudFrontClient cloudFrontClient;

@Value("${cloudfront.distributionId}")
private String distributionId;

public void deleteFile(UUID fileId) {
FileMetadata metadata = fileRepository.findById(fileId)
.orElseThrow(() -> new FileNotFoundException(fileId));

// Delete from S3
s3Client.deleteObject(DeleteObjectRequest.builder()
.bucket(bucketName)
.key(metadata.getStorageKey())
.build());

// Invalidate CDN cache
cloudFrontClient.createInvalidation(CreateInvalidationRequest.builder()
.distributionId(distributionId)
.invalidationBatch(InvalidationBatch.builder()
.paths(Paths.builder()
.items("/files/" + metadata.getStorageKey())
.quantity(1)
.build())
.callerReference(UUID.randomUUID().toString())
.build())
.build());

// Mark as deleted in database
metadata.setStatus(FileStatus.DELETED);
fileRepository.save(metadata);
}
}

Cost of invalidations: CloudFront includes 1,000 free invalidation paths per month; additional paths are billed individually. For high-volume updates, consider using versioned URLs (e.g., /files/abc123?v=2) instead of invalidations: changing the URL automatically bypasses the cache.
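A versioned URL can be derived from a content hash, so any change to the bytes changes the URL and the stale cache entry is simply never requested again. A sketch, assuming the `/files/` path prefix used in the examples above:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class VersionedUrls {

    // Build /files/<key>?v=<short content hash>. Re-uploading different
    // content produces a different URL, bypassing the CDN cache without
    // issuing any invalidation request.
    public static String versionedUrl(String storageKey, byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 8; i++) { // 8 bytes is plenty for cache busting
                hex.append(String.format("%02x", digest[i] & 0xff));
            }
            return "/files/" + storageKey + "?v=" + hex;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

The old object can then be left to expire naturally under the lifecycle policies described below, rather than being invalidated eagerly.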

CDN Performance Optimization

Enable compression for text-based files (CSS, JavaScript, JSON):

s3Client.putObject(
PutObjectRequest.builder()
.bucket(bucketName)
.key(fileKey)
.contentType("application/json")
.contentEncoding("gzip") // Pre-compress before uploading
.build(),
RequestBody.fromInputStream(gzipInputStream, size)
);

Or configure CloudFront to compress responses automatically using the Compress setting shown above.

HTTP/2 and HTTP/3 improve performance by multiplexing multiple file requests over a single connection. Ensure your CDN supports these protocols (CloudFront enables HTTP/2 by default, HTTP/3 can be enabled).

Prefetch DNS and preconnect to CDN domains for faster initial connections:

<link rel="dns-prefetch" href="https://cdn.example.com" />
<link rel="preconnect" href="https://cdn.example.com" crossorigin />

File Lifecycle Management

Files have lifecycles: they're created, accessed frequently, accessed less over time, and eventually archived or deleted. Implement lifecycle policies to optimize storage costs.

Lifecycle Policies

Automatically transition files between storage classes or delete them based on age:

// Configure S3 lifecycle policy
@Configuration
public class S3LifecycleConfig {

@Bean
public CommandLineRunner configureLifecyclePolicy(S3Client s3Client) {
return args -> {
LifecycleRule rule = LifecycleRule.builder()
.id("archive-old-files")
.status(ExpirationStatus.ENABLED)
.filter(LifecycleRuleFilter.builder()
.prefix("files/")
.build())
.transitions(
// Move to Infrequent Access after 30 days
Transition.builder()
.days(30)
.storageClass(TransitionStorageClass.STANDARD_IA)
.build(),
// Move to Glacier after 90 days
Transition.builder()
.days(90)
.storageClass(TransitionStorageClass.GLACIER)
.build()
)
.expiration(LifecycleExpiration.builder()
.days(365) // Delete after 1 year
.build())
.build();

s3Client.putBucketLifecycleConfiguration(
PutBucketLifecycleConfigurationRequest.builder()
.bucket("my-bucket")
.lifecycleConfiguration(BucketLifecycleConfiguration.builder()
.rules(rule)
.build())
.build()
);
};
}
}

Use cases for lifecycle policies:

  • Logs and backups: Delete after retention period (e.g., 90 days for logs)
  • User uploads: Move to cheaper storage after initial access period
  • Temporary files: Auto-delete after short period (e.g., 24 hours for preview files)
  • Compliance archives: Move to Glacier for long-term retention at minimal cost
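The temporary-file case can also be expressed directly as an S3 lifecycle configuration document instead of going through the SDK. A sketch, assuming temporary uploads share a temp/ key prefix (the prefix and rule ID are illustrative):

```json
{
  "Rules": [
    {
      "ID": "expire-temp-files",
      "Filter": { "Prefix": "temp/" },
      "Status": "Enabled",
      "Expiration": { "Days": 1 }
    }
  ]
}
```

Applied with aws s3api put-bucket-lifecycle-configuration --bucket my-bucket --lifecycle-configuration file://lifecycle.json.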

Soft Delete

Rather than immediately deleting files, mark them as deleted and remove them after a grace period:

@Service
@Transactional
public class FileService {

public void softDeleteFile(UUID fileId, String userId) {
FileMetadata metadata = fileRepository.findById(fileId)
.orElseThrow(() -> new FileNotFoundException(fileId));

// Check permissions
if (!metadata.getUploadedBy().equals(userId)) {
throw new ForbiddenException("Cannot delete another user's file");
}

// Mark as deleted
metadata.setStatus(FileStatus.DELETED);
metadata.setDeletedAt(Instant.now());
fileRepository.save(metadata);

// Schedule permanent deletion after 30 days
schedulePermanentDeletion(fileId);
}

public void restoreFile(UUID fileId, String userId) {
FileMetadata metadata = fileRepository.findById(fileId)
.orElseThrow(() -> new FileNotFoundException(fileId));

if (metadata.getStatus() != FileStatus.DELETED) {
throw new IllegalStateException("File is not deleted");
}

if (!metadata.getUploadedBy().equals(userId)) {
throw new ForbiddenException("Cannot restore another user's file");
}

// Restore file
metadata.setStatus(FileStatus.AVAILABLE);
metadata.setDeletedAt(null);
fileRepository.save(metadata);
}

@Scheduled(cron = "0 0 2 * * *") // Run at 2 AM daily
public void permanentlyDeleteExpiredFiles() {
Instant cutoff = Instant.now().minus(30, ChronoUnit.DAYS);

List<FileMetadata> expiredFiles = fileRepository
.findByStatusAndDeletedAtBefore(FileStatus.DELETED, cutoff);

for (FileMetadata metadata : expiredFiles) {
// Delete from S3
s3Client.deleteObject(DeleteObjectRequest.builder()
.bucket(bucketName)
.key(metadata.getStorageKey())
.build());

// Delete metadata
fileRepository.delete(metadata);
}
}
}

Soft delete provides a safety net against accidental deletions and simplifies compliance with data retention requirements.

Security Best Practices

Prevent Path Traversal

Never use user-provided filenames directly as storage keys. Attackers can use path traversal sequences (../) to overwrite or access arbitrary files:

// VULNERABLE - DO NOT USE
public void uploadFile(MultipartFile file) {
String filename = file.getOriginalFilename(); // Could be "../../etc/passwd"
s3Client.putObject(
PutObjectRequest.builder()
.bucket(bucketName)
.key("uploads/" + filename) // Path traversal vulnerability
.build(),
RequestBody.fromInputStream(file.getInputStream(), file.getSize())
);
}

// SECURE - Use generated keys
public void uploadFile(MultipartFile file) {
String storageKey = UUID.randomUUID().toString(); // Random, no user input
s3Client.putObject(
PutObjectRequest.builder()
.bucket(bucketName)
.key("uploads/" + storageKey)
.build(),
RequestBody.fromInputStream(file.getInputStream(), file.getSize())
);
}

Store the original filename in metadata, not in the storage key.
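When that stored original filename is later echoed back in a download response, it must be sanitized before being placed in the Content-Disposition header, since raw filenames can carry quotes or control characters that corrupt the header. A minimal sketch using RFC 5987 filename* encoding (the helper name is illustrative):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class DownloadHeaders {

    // Build a Content-Disposition value that is safe for arbitrary stored
    // filenames: an ASCII-only fallback plus an RFC 5987 filename* parameter
    // carrying the percent-encoded UTF-8 original.
    public static String attachmentHeader(String originalFilename) {
        // ASCII fallback: replace quotes, backslashes, control chars, non-ASCII.
        String fallback = originalFilename
                .replaceAll("[\"\\\\\\p{Cntrl}]", "_")
                .replaceAll("[^\\x20-\\x7E]", "_");

        // RFC 5987 value: percent-encode UTF-8 bytes; URLEncoder emits '+'
        // for spaces, which header parsers do not accept, so swap it back.
        String encoded = URLEncoder.encode(originalFilename, StandardCharsets.UTF_8)
                .replace("+", "%20");

        return "attachment; filename=\"" + fallback + "\"; filename*=UTF-8''" + encoded;
    }
}
```

Modern browsers prefer the filename* parameter and fall back to filename when it is absent or unsupported.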

Prevent MIME Type Confusion

Browsers decide how to handle content based on its MIME type. If an attacker uploads an HTML file and your application serves it back with the client-claimed content type, the browser will render it, enabling stored XSS attacks:

// VULNERABLE - Trusts client-provided MIME type
public void serveFile(UUID fileId, HttpServletResponse response) {
FileMetadata metadata = fileRepository.findById(fileId).orElseThrow();
response.setContentType(metadata.getContentType()); // From client upload
// ... serve file
}

// SECURE - Detect actual content type
public void serveFile(UUID fileId, HttpServletResponse response) {
FileMetadata metadata = fileRepository.findById(fileId).orElseThrow();

// Download file from S3 (the v2 SDK returns a ResponseInputStream, not an S3Object)
ResponseInputStream<GetObjectResponse> s3Stream = s3Client.getObject(GetObjectRequest.builder()
.bucket(bucketName)
.key(metadata.getStorageKey())
.build());

// Detect actual MIME type; buffer the stream so Tika can mark/reset it
// and the bytes can still be served afterwards
Tika tika = new Tika();
InputStream body = new BufferedInputStream(s3Stream);
String actualContentType = tika.detect(body);

// Force download for executable types
if (isDangerous(actualContentType)) {
response.setContentType("application/octet-stream");
response.setHeader("Content-Disposition", "attachment");
} else {
response.setContentType(actualContentType);
}

// ... serve file
}

private boolean isDangerous(String contentType) {
return contentType.equals("text/html")
|| contentType.equals("application/javascript")
|| contentType.equals("image/svg+xml") // SVG can embed scripts
|| contentType.equals("application/x-sh");
}

For more on XSS prevention, see Input Validation.

Content Security Policy for User Uploads

If serving user-uploaded content from the same domain as your application, use Content Security Policy (CSP) to prevent execution of malicious scripts:

@Configuration
public class SecurityHeadersConfig {

@Bean
public FilterRegistrationBean<ContentSecurityPolicyFilter> cspFilter() {
FilterRegistrationBean<ContentSecurityPolicyFilter> registration = new FilterRegistrationBean<>();
registration.setFilter(new ContentSecurityPolicyFilter());
registration.addUrlPatterns("/files/*");
return registration;
}
}

public class ContentSecurityPolicyFilter implements Filter {

@Override
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
throws IOException, ServletException {

HttpServletResponse httpResponse = (HttpServletResponse) response;

// Prevent scripts from executing in user-uploaded content
httpResponse.setHeader("Content-Security-Policy", "default-src 'none'; style-src 'unsafe-inline'; sandbox");
httpResponse.setHeader("X-Content-Type-Options", "nosniff");
httpResponse.setHeader("X-Frame-Options", "DENY");

chain.doFilter(request, response);
}
}

Better approach: Serve user uploads from a separate domain (e.g., uploads.example.com vs. app.example.com). This creates a security boundary - even if an attacker uploads malicious content, it cannot access cookies or local storage from your main application domain.

Monitoring and Cost Optimization

Storage Metrics

Track file storage usage to identify cost optimization opportunities:

@Service
public class StorageMetricsService {

private final FileMetadataRepository fileRepository;
private final MeterRegistry meterRegistry;

@Scheduled(fixedRate = 300000) // Every 5 minutes
public void recordStorageMetrics() {
// Total file count
long totalFiles = fileRepository.count();
meterRegistry.gauge("storage.files.total", totalFiles);

// Total storage size
Long totalBytes = fileRepository.sumSizeBytes();
meterRegistry.gauge("storage.bytes.total", totalBytes != null ? totalBytes : 0);

// Files by status
Map<FileStatus, Long> countByStatus = fileRepository.countByStatus();
countByStatus.forEach((status, count) ->
meterRegistry.gauge("storage.files.by_status", Tags.of("status", status.name()), count)
);

// Average file size (guard against a null sum when the table is empty)
double avgSize = (totalFiles > 0 && totalBytes != null) ? (double) totalBytes / totalFiles : 0;
meterRegistry.gauge("storage.bytes.average", avgSize);
}
}

For more on observability, see Metrics.

Cost Optimization Strategies

  1. Use lifecycle policies to automatically move old files to cheaper storage classes
  2. Enable intelligent tiering for S3 to automatically optimize storage class based on access patterns
  3. Compress files before uploading (especially logs, backups, and text documents)
  4. Delete unused files - implement retention policies and soft delete with automatic cleanup
  5. Use CDN caching to reduce bandwidth costs from storage egress
  6. Monitor large files - identify and review unusually large uploads that may indicate abuse or inefficient storage

// Alert on unusually large files
@Service
public class FileUploadService {

private static final long LARGE_FILE_THRESHOLD = 100 * 1024 * 1024; // 100MB

public FileMetadata uploadFile(MultipartFile file, String userId) {
if (file.getSize() > LARGE_FILE_THRESHOLD) {
log.warn("Large file upload detected: user={}, filename={}, size={}MB",
userId,
file.getOriginalFilename(),
file.getSize() / 1024 / 1024);

// Could trigger manual review or additional validation
notificationService.notifyAdmins("Large file upload",
"User " + userId + " uploaded " + file.getSize() / 1024 / 1024 + "MB file");
}

return storeFile(file, userId);
}
}

Further Reading