System Context
Overview
The infrastructure supports two primary product offerings under a unified brand:
- Public Infrastructure: Hosting risu.tech and public-facing applications.
- Private Home Cloud: A secure environment for lab, family, and personal services, accessible only via LAN or secure VPN.
Goals and Constraints
The system is designed around the following core requirements:
- Unified Naming Scheme: Consistent use of subdomains across both public and private contexts.
- Secure Boundary Enforcement: Implementation of a “hard boundary” to prevent accidental exposure of private services to the public internet.
- Frictionless Remote Access: Remote connectivity must provide an experience identical to local network access (the “couch Wi-Fi” experience).
- High Availability: Resilient workload management across multiple nodes, ensuring data consistency and service uptime.
- Accessibility: A low-friction user experience (UX) tailored for non-technical users.
Role Definitions
This section defines the core architectural roles required to realize the homelab’s vision and requirements. Each role represents a logical area of responsibility and governs specific aspects of the system’s design and operation.
Identity and Access
- Identity & Access: Centralized authentication and authorization.
- Remote Access: Secure extension of the home network.
Traffic and Connectivity
- Domain & Naming: Split-horizon DNS and naming conventions.
- Edge & Boundary: Security demarcation and traffic governance.
- Public Ingress: External traffic routing and TLS termination.
- Internal Ingress: Internal traffic routing and authentication enforcement.
Compute and Data
- Workload Placement & Scheduling: Orchestration and self-healing applications.
- Data & Storage: Data integrity, availability, and replication.
- Backup & Disaster Recovery: Recoverability and long-term data protection.
Operations and Governance
- Observability: Health monitoring, logging, and alerting.
- Configuration & Documentation: Source of truth, change control, and runbooks.
Role: Identity & Access
Purpose: Provide secure, centralized, and user-friendly authentication and authorization across all infrastructure services.
Responsibilities:
- Maintain a centralized Identity Provider (IdP) for accounts, groups, and multi-factor authentication (MFA).
- Implement Single Sign-On (SSO) to provide a unified login experience.
- Manage granular authorization policies for resource access.
- Handle session management, including timeouts and credential revocation.
Guarantees:
- A unified login experience is provided across all supported services.
- Multi-factor authentication is enforced for sensitive and external access.
- Unauthorized access attempts are blocked at the identity layer.
Out of Scope:
- Network-level access control (VPN/Firewall boundaries).
- Application-specific business logic authorization.
- Management of physical access tokens or hardware keys.
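To make the session-management responsibility concrete, here is a minimal Python sketch of idle and absolute session timeouts combined with per-user revocation. It does not reflect how any particular IdP implements sessions; the `SessionStore` class and both timeout values are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Session:
    user: str
    created: float    # epoch seconds at login
    last_seen: float  # epoch seconds of the last validated request


@dataclass
class SessionStore:
    # Hypothetical policy values; real values belong in the IdP configuration.
    idle_timeout: float = 30 * 60        # log out after 30 minutes of inactivity
    absolute_timeout: float = 12 * 3600  # force re-login after 12 hours regardless
    sessions: Dict[str, Session] = field(default_factory=dict)

    def create(self, session_id: str, user: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self.sessions[session_id] = Session(user, now, now)

    def revoke_user(self, user: str) -> None:
        # Credential revocation: drop every live session belonging to the user.
        self.sessions = {sid: s for sid, s in self.sessions.items() if s.user != user}

    def validate(self, session_id: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        s = self.sessions.get(session_id)
        if s is None:
            return False
        expired_idle = now - s.last_seen > self.idle_timeout
        expired_total = now - s.created > self.absolute_timeout
        if expired_idle or expired_total:
            del self.sessions[session_id]
            return False
        s.last_seen = now  # sliding idle window
        return True
```

The two timeouts address different risks: the idle timeout limits exposure on an unattended device, while the absolute timeout bounds how long a stolen session token remains useful.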
Role: Remote Access
Purpose: Provide a secure and seamless extension of the home network for remote devices.
Responsibilities:
- Manage secure device enrollment and hardware trust.
- Authenticate users during connection attempts via integrated identity services.
- Enforce network access policies for remote clients.
- Oversee the lifecycle of remote access credentials and revocation procedures.
Guarantees:
- Remote devices experience connectivity identical to local network access.
- Only verified devices and authenticated users are permitted to join the network.
- Encrypted communication channels are maintained for all remote traffic.
Out of Scope:
- Maintenance of physical ISP connections or external networking hardware.
- Routing policies for internal service-to-service communication.
- Application-level authorization.
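The guarantee that only verified devices and authenticated users may join can be read as two independent checks that must both pass. The Python sketch below models exactly that; `RemoteAccessGate` is hypothetical, and `user_authenticated` merely stands in for the result of an IdP login. Actual VPN products implement enrollment with their own key exchange.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class RemoteAccessGate:
    # Hypothetical registry: device public key -> owning user.
    enrolled: Dict[str, str] = field(default_factory=dict)
    revoked: Set[str] = field(default_factory=set)

    def enroll(self, device_key: str, owner: str) -> None:
        self.enrolled[device_key] = owner
        self.revoked.discard(device_key)

    def revoke(self, device_key: str) -> None:
        self.revoked.add(device_key)

    def admit(self, device_key: str, user: str, user_authenticated: bool) -> bool:
        # Both factors are required: a trusted, unrevoked device AND a user
        # that the identity service has just authenticated.
        owner = self.enrolled.get(device_key)
        device_ok = owner is not None and device_key not in self.revoked
        return device_ok and owner == user and user_authenticated
```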
Role: Domain & Naming
Purpose: Ensure that *.risu.tech domain names resolve to the correct endpoints based on the user’s network context.
Responsibilities:
- Manage DNS records for public-facing services through the public registrar.
- Maintain a private DNS authority for internal service resolution.
- Implement split-horizon DNS to provide context-aware resolution for LAN and VPN clients.
- Standardize naming conventions across public and private environments.
Guarantees:
- Service names resolve to correct IP addresses based on the requester’s network origin.
- A unified naming scheme is maintained across all infrastructure components.
- Internal service names and metadata are not exposed to the public internet.
Out of Scope:
- IP address assignment and DHCP management.
- Traffic routing beyond the DNS resolution layer.
- Registration and management of domains other than risu.tech.
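Split-horizon resolution boils down to choosing a record set (a "view") by the client's source address. The Python sketch below shows that selection under stated assumptions: the LAN and VPN subnets, the service names, and all addresses are hypothetical (203.0.113.0/24 is a documentation range), and a real deployment would configure views in the DNS server rather than in application code.

```python
import ipaddress
from typing import Dict, Optional

# Hypothetical LAN and VPN client ranges; substitute the real subnets.
INTERNAL_NETS = [
    ipaddress.ip_network("192.168.1.0/24"),  # LAN
    ipaddress.ip_network("10.8.0.0/24"),     # VPN clients
]

# Public view: only public endpoints exist.
PUBLIC_VIEW: Dict[str, str] = {
    "risu.tech": "203.0.113.10",
    "www.risu.tech": "203.0.113.10",
}

# Private view: public names map to internal addresses, plus internal-only names.
PRIVATE_VIEW: Dict[str, str] = {
    "risu.tech": "192.168.1.10",
    "www.risu.tech": "192.168.1.10",
    "photos.risu.tech": "192.168.1.20",
}


def resolve(name: str, client_ip: str) -> Optional[str]:
    """Answer from the view matching the client's origin; None stands for NXDOMAIN."""
    addr = ipaddress.ip_address(client_ip)
    internal = any(addr in net for net in INTERNAL_NETS)
    view = PRIVATE_VIEW if internal else PUBLIC_VIEW
    return view.get(name.lower().rstrip("."))
```

Because internal-only names are absent from the public view entirely, an outside client gets no answer for them, which is what keeps internal names and metadata off the public internet. Resolving public names to internal addresses in the private view is an optional design choice that avoids hairpinning traffic through the router.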
Role: Edge & Boundary
Purpose: Enforce the demarcation between the public internet and the private internal network to ensure secure traffic flow.
Responsibilities:
- Govern inbound traffic policies from the public internet.
- Enforce security policies for VPN-connected devices.
- Implement routing, firewall, and network segmentation rules for the local network (LAN).
- Prevent unauthorized access and indexing of private services by external entities.
Guarantees:
- Private services are not reachable from the public internet.
- Traffic is strictly isolated according to defined boundaries and security levels.
- The boundary remains resilient against common external scanning and discovery attempts.
Out of Scope:
- Application-level authentication (handled by Identity & Access).
- Service-to-service traffic encryption within the secure boundary.
- Hardware maintenance of physical networking equipment.
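A common way to realize these boundary guarantees is an ordered, default-deny rule set between network zones. The Python sketch below evaluates such a rule set; the zone names (including the "dmz" for the public ingress host) and the rules themselves are hypothetical, chosen only to show that anything not explicitly allowed is dropped.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    src: str             # source zone
    dst: str             # destination zone
    port: Optional[int]  # None matches any port
    action: str          # "allow" or "deny"


# Hypothetical zones: "wan" (internet), "lan", "vpn", and "dmz" (public ingress host).
RULES = [
    Rule("wan", "dmz", 443, "allow"),   # only HTTPS to the public ingress
    Rule("vpn", "lan", None, "allow"),  # VPN clients behave like LAN clients
    Rule("lan", "dmz", None, "allow"),
    Rule("lan", "lan", None, "allow"),
]


def evaluate(src: str, dst: str, port: int) -> str:
    for rule in RULES:  # first matching rule wins
        if rule.src == src and rule.dst == dst and rule.port in (None, port):
            return rule.action
    return "deny"       # default deny: anything not listed is dropped
```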
Role: Public Ingress
Purpose: Route HTTPS traffic from the public internet to designated public-facing applications.
Responsibilities:
- Terminate TLS for *.risu.tech public domains.
- Route traffic exclusively to public-facing backend services.
- Implement basic security headers and rate limiting for public endpoints.
- Provide logging and observability for external traffic patterns.
Guarantees:
- Public services are reachable via standard HTTPS protocols.
- Hostname-based routing is deterministic and reliable.
- Traffic is never routed to internal-only services.
Out of Scope:
- Authentication for private services.
- Internal-only traffic routing.
- User identity storage.
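One way to make "traffic is never routed to internal-only services" hold by construction is to build the public routing table only from services marked public, so internal backends are never candidates. The Python sketch below assumes a hypothetical service registry with an `exposure` field and hypothetical backend addresses.

```python
from typing import Optional

# Hypothetical service registry; the "exposure" field is an assumption.
SERVICES = [
    {"host": "risu.tech", "backend": "http://10.0.0.10:8080", "exposure": "public"},
    {"host": "blog.risu.tech", "backend": "http://10.0.0.11:8080", "exposure": "public"},
    {"host": "photos.risu.tech", "backend": "http://10.0.0.20:2342", "exposure": "internal"},
]

# Built once from public entries only, so an internal backend can never be selected.
PUBLIC_ROUTES = {s["host"]: s["backend"] for s in SERVICES if s["exposure"] == "public"}


def route(host_header: str) -> Optional[str]:
    # Normalize the Host header (strip port, case, trailing dot) for exact matching.
    host = host_header.split(":", 1)[0].lower().rstrip(".")
    return PUBLIC_ROUTES.get(host)  # None -> the ingress answers 404
```

Exact-match lookup keeps routing deterministic: the same Host header always selects the same backend, and unknown hosts fall through to a 404 rather than a default backend.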
Role: Internal Ingress
Purpose: Route HTTPS traffic to internal services accessible only from LAN or VPN.
Responsibilities:
- Terminate TLS for *.risu.tech internal domains.
- Enforce authentication before routing.
- Reject traffic originating from the public internet.
Guarantees:
- No internal service is reachable without LAN/VPN presence.
- Hostname-based routing is deterministic.
Out of Scope:
- User identity storage.
- Application-level authorization.
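The order of checks matters for the internal ingress: source network first, then authentication, then hostname routing. The Python sketch below encodes that order; the allowed subnets, route table, and login URL are all hypothetical, and `has_valid_session` stands in for whatever the SSO layer reports.

```python
import ipaddress
from typing import Optional, Tuple

# Hypothetical LAN and VPN source ranges.
ALLOWED_SOURCES = [ipaddress.ip_network("192.168.1.0/24"), ipaddress.ip_network("10.8.0.0/24")]

# Hypothetical internal-only routes.
INTERNAL_ROUTES = {
    "photos.risu.tech": "http://10.0.0.20:2342",
    "grafana.risu.tech": "http://10.0.0.30:3000",
}

LOGIN_URL = "https://auth.risu.tech/login"  # hypothetical IdP login endpoint


def handle(client_ip: str, host_header: str, has_valid_session: bool) -> Tuple[int, Optional[str]]:
    addr = ipaddress.ip_address(client_ip)
    # 1. Reject anything that is not from the LAN or VPN.
    if not any(addr in net for net in ALLOWED_SOURCES):
        return 403, None
    # 2. Require authentication before any backend is chosen.
    if not has_valid_session:
        return 302, LOGIN_URL
    # 3. Deterministic hostname routing.
    host = host_header.split(":", 1)[0].lower().rstrip(".")
    backend = INTERNAL_ROUTES.get(host)
    return (200, backend) if backend else (404, None)
```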
Role: Workload Placement & Scheduling
Purpose: Orchestrate the deployment and lifecycle of applications across the compute cluster to ensure high availability and resource efficiency.
Responsibilities:
- Distribute application workloads to optimize resource utilization across the cluster.
- Provide self-healing capabilities by automatically rescheduling workloads upon node failure.
- Enforce resource governance through CPU and RAM limits and reservations.
- Facilitate dynamic service discovery for inter-service communication.
Guarantees:
- Applications remain highly available across individual node failures.
- Cluster resources are utilized efficiently according to defined priorities.
- Service endpoints are dynamically updated and discoverable.
Out of Scope:
- Internal application business logic and code quality.
- Provisioning and maintenance of physical server hardware or operating systems.
- Management of persistent data volumes (handled by Data & Storage).
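Reservation-aware placement and self-healing can be illustrated with a toy scheduler. The Python sketch below uses a simple "least-allocated" heuristic (place on the healthy node with the most free CPU); this is a swapped-in illustration, not the algorithm of any specific orchestrator, and the node and workload names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Workload:
    name: str
    cpu: float  # reserved cores
    mem: int    # reserved MiB


@dataclass
class Node:
    name: str
    cpu: float  # allocatable cores
    mem: int    # allocatable MiB
    healthy: bool = True
    placed: List[str] = field(default_factory=list)


def free(node: Node, workloads: Dict[str, Workload]) -> Tuple[float, int]:
    used_cpu = sum(workloads[w].cpu for w in node.placed)
    used_mem = sum(workloads[w].mem for w in node.placed)
    return node.cpu - used_cpu, node.mem - used_mem


def place(w: Workload, nodes: List[Node], workloads: Dict[str, Workload]) -> Optional[str]:
    # Only healthy nodes whose unreserved capacity covers the reservation qualify.
    fits = [n for n in nodes if n.healthy
            and free(n, workloads)[0] >= w.cpu and free(n, workloads)[1] >= w.mem]
    if not fits:
        return None  # unschedulable: the workload stays pending
    # Least-allocated heuristic: most free CPU wins, ties broken by node name.
    best = min(fits, key=lambda n: (-free(n, workloads)[0], n.name))
    best.placed.append(w.name)
    return best.name


def reschedule_failed(nodes: List[Node], workloads: Dict[str, Workload]) -> Dict[str, Optional[str]]:
    # Self-healing: evict everything from unhealthy nodes and place it again.
    moved: Dict[str, Optional[str]] = {}
    for n in nodes:
        if not n.healthy:
            evicted, n.placed = n.placed, []
            for name in evicted:
                moved[name] = place(workloads[name], nodes, workloads)
    return moved
```

Spreading by free capacity leaves headroom on every node, which is what makes it likely that evicted workloads still fit somewhere after a node fails.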
Role: Data & Storage
Purpose: Maintain the integrity, availability, and performance of data across the infrastructure.
Responsibilities:
- Provide standardized storage interfaces (e.g., CSI) for application persistence.
- Manage data replication and physical distribution across failure domains.
- Enforce data consistency models suitable for various workload types.
- Oversee data lifecycle management, including snapshots and retention.
Guarantees:
- Data remains available and consistent despite individual hardware failures.
- Protection is provided against silent data corruption through checksums and scrubbing.
- Storage performance is maintained according to application requirements.
Out of Scope:
- Application-specific database schemas and indexing.
- Long-term offsite archival and backup (handled by Backup & Disaster Recovery).
- Management of transient, non-persistent application data.
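Checksum-based scrubbing is what backs the guarantee against silent corruption: each replica is re-read, compared against the checksum recorded at write time, and bad copies are rewritten from a good one. The Python sketch below shows the idea on in-memory byte strings; real storage systems do this per block, and the replica names here are hypothetical.

```python
import hashlib
from typing import Dict, List


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def scrub(replicas: Dict[str, bytes], expected: str) -> List[str]:
    """Compare every replica with the checksum recorded at write time,
    overwrite corrupted copies from an intact one, and return the repaired names."""
    intact = [name for name, data in replicas.items() if sha256(data) == expected]
    if not intact:
        # Every copy is bad: replication cannot help, so escalate to backups.
        raise RuntimeError("no intact replica; restore from backup")
    good = replicas[intact[0]]
    repaired = []
    for name in list(replicas):
        if sha256(replicas[name]) != expected:
            replicas[name] = good
            repaired.append(name)
    return repaired
```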
Role: Backup & Disaster Recovery
Purpose: Ensure the recoverability of critical data and services from various failure modes, including hardware failure and security incidents.
Responsibilities:
- Maintain immutable or offline copies of critical system and application data.
- Regularly perform and document restoration verification procedures.
- Establish and adhere to defined Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO).
- Coordinate the long-term archival of historical data.
Guarantees:
- Critical data is recoverable following a catastrophic site or cluster failure.
- Backup integrity is systematically verified through periodic restore testing.
- Recovery procedures are documented and actionable during an incident.
Out of Scope:
- High availability and real-time failover (handled by the Workload Placement & Scheduling and Data & Storage roles).
- Real-time data synchronization.
- Backup of non-critical or transient application data.
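Adhering to an RPO can be checked mechanically: a dataset is out of compliance when its newest verified backup is older than its RPO, or when it has none at all. The Python sketch below performs that check; the dataset names and RPO values are hypothetical placeholders for whatever targets are eventually defined.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Hypothetical per-dataset Recovery Point Objectives.
RPO = {
    "photos": timedelta(hours=24),
    "documents": timedelta(hours=24),
    "config": timedelta(hours=1),
}


def rpo_violations(last_backup: Dict[str, datetime], now: datetime) -> List[str]:
    """Return datasets whose newest verified backup is older than its RPO, or missing."""
    late = []
    for dataset, target in sorted(RPO.items()):
        ts = last_backup.get(dataset)
        if ts is None or now - ts > target:
            late.append(dataset)
    return late
```

Feeding this only with timestamps of backups that passed restore verification ties the RPO check to the periodic restore testing guarantee.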
Role: Observability
Purpose: Provide actionable insights into the health, performance, and security of the entire infrastructure.
Responsibilities:
- Collect, aggregate, and visualize system and application performance metrics.
- Centralize logging from all nodes, ingresses, and applications for analysis.
- Implement intelligent alerting and notification routing to ensure timely incident response.
- (Optional) Maintain distributed tracing for complex service interactions.
Guarantees:
- Full visibility is maintained into the health of all critical infrastructure components.
- Alerts are delivered to the correct stakeholders with sufficient context for resolution.
- Historical data is available for performance trending and capacity planning.
Out of Scope:
- Automated incident remediation or self-healing (handled by Workload Placement & Scheduling).
- Business-level analytics and reporting.
- Real-time user session tracking for marketing purposes.
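Alert routing is typically a top-to-bottom match of alert labels against routes, with a catch-all at the end so nothing is silently dropped (Prometheus Alertmanager's routing tree works on a similar label-matching principle, though with more features). The Python sketch below shows that idea; the label names and receivers are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical routing table, evaluated top to bottom; receiver names are placeholders.
ROUTES: List[Tuple[Dict[str, str], str]] = [
    ({"severity": "critical"}, "admin-pager"),
    ({"audience": "family"}, "family-chat"),
    ({}, "admin-email"),  # catch-all so no alert is silently dropped
]


def route_alert(labels: Dict[str, str]) -> str:
    for matchers, receiver in ROUTES:
        # A route matches when every one of its label matchers is satisfied.
        if all(labels.get(key) == value for key, value in matchers.items()):
            return receiver
    raise LookupError("no matching route")  # unreachable while the catch-all exists
```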
Role: Configuration & Documentation
Purpose: Ensure the infrastructure is reproducible, auditable, and well-documented through a centralized source of truth.
Responsibilities:
- Maintain a central repository for all configurations, architecture decisions (ADRs), and diagrams.
- Establish and maintain an automated documentation delivery system.
- Formalize the change control process for infrastructure and application updates.
- Develop and maintain actionable runbooks for maintenance and troubleshooting.
- Establish and enforce policies for secure secrets storage and rotation.
Guarantees:
- The entire infrastructure configuration is reproducible from the source of truth.
- A permanent record of significant architectural decisions and changes is maintained.
- Maintenance procedures are clearly documented and accessible to all operators.
Out of Scope:
- Manual configuration of non-automated hardware.
- Code-level documentation for individual third-party applications.
- Procurement of hardware assets.
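A secrets rotation policy becomes enforceable once each secret's last rotation time is recorded. The Python sketch below flags secrets older than their maximum age; the secret names and ages are hypothetical, and in practice the timestamps would come from whatever secrets store is adopted.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Hypothetical rotation policy: per-secret maximum age, with a default for the rest.
MAX_AGE = {
    "ssh-deploy-key": timedelta(days=90),
    "db-password": timedelta(days=180),
}
DEFAULT_MAX_AGE = timedelta(days=365)


def secrets_due(rotated_at: Dict[str, datetime], now: datetime) -> List[str]:
    """Return, sorted by name, the secrets whose last rotation exceeds the policy."""
    return sorted(
        name for name, ts in rotated_at.items()
        if now - ts > MAX_AGE.get(name, DEFAULT_MAX_AGE)
    )
```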
Non-Functional Requirements
This document details the non-functional requirements (NFRs) that govern the design, implementation, and operation of the homelab infrastructure.
Security
- Secure Boundary Enforcement: Private services must be strictly isolated to prevent accidental exposure to the public internet.
- Identity & Access Management: A centralized identity provider must be utilized, supporting multi-factor authentication (MFA).
- Secrets Governance: All credentials and sensitive data must be managed through defined storage and rotation policies.
- Network Segmentation: Traffic flow between services must be restricted according to clearly defined security policies.
Connectivity & Networking
- Seamless Remote Access: Remote devices must maintain an experience identical to local network connectivity via secure VPN.
- Naming Consistency: A unified naming scheme (*.risu.tech) must be maintained across both public and private services using split-horizon DNS.
Availability & Reliability
- High Availability (HA): The system must remain operational across multiple nodes, ensuring service continuity and data consistency.
- Workload Rescheduling: Applications must automatically relocate to healthy nodes in the event of hardware or software failure.
- Data Persistence: The storage fabric must guarantee data consistency and replication across failure domains.
Data Protection
- Resilient Backup: Critical data must be protected through immutable and offline copies.
- Disaster Recovery: Restoration procedures must meet defined Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO).
- Restore Verification: Backup integrity must be regularly validated through systematic restore testing.
Usability
- Low-Friction UX: The infrastructure must provide an intuitive and accessible experience for non-technical users.
- Single Sign-On (SSO): Authentication must be streamlined to minimize login prompts through a unified session.
Maintainability
- Advanced Observability: Centralized logging and metrics must be implemented to facilitate rapid troubleshooting and performance analysis.
- Reproducibility: The entire infrastructure configuration must be defined within a central source-of-truth repository.
- Documentation: Maintenance tasks must be supported by clear, actionable runbooks.
- Automated Documentation Delivery: The source of truth for documentation must be automatically built and deployed to ensure accessibility and consistency.
Pipelines
Pipelines are defined in the .forgejo/workflows/ directory of the source code repository and run on Forgejo Actions.
- docs_deploy: Builds the mdBook documentation and deploys the static HTML to the internal documentation server via rsync/SSH.
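The two steps of docs_deploy can be sketched in Python as an ordered list of commands. This is an illustration, not the actual workflow file: it assumes the book root is doc/ (so `mdbook build` writes to doc/book/ by default), and the deploy user, host, and key path in the usage example are hypothetical. `--delete` removes pages on the server that no longer exist in the build.

```python
import shlex
import subprocess
from typing import List


def deploy_commands(book_dir: str, ssh_key: str, target: str) -> List[List[str]]:
    """Return the build and sync commands in execution order."""
    return [
        # mdBook writes HTML to <book_dir>/book unless book.toml overrides it.
        ["mdbook", "build", book_dir],
        # Trailing slash: sync the *contents* of the output directory.
        ["rsync", "-az", "--delete", "-e", f"ssh -i {shlex.quote(ssh_key)}",
         f"{book_dir}/book/", target],
    ]


def run(commands: List[List[str]]) -> None:
    for cmd in commands:
        subprocess.run(cmd, check=True)  # stop at the first failing step
```

Usage, with hypothetical values: `run(deploy_commands("doc", "/run/secrets/deploy_key", "deploy@docs.internal:/var/www/doc/"))`.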
Architecture Decision Records
This directory contains a historical log of significant architectural decisions made throughout the evolution of the homelab project. Each record details the context, decision, and resulting consequences to provide transparency and rationale for the system’s design.
Records Index
- ADR 0001: Record Architecture Decisions
- ADR 0002: Split-Horizon DNS for Unified Naming
- ADR 0003: Documentation Delivery System
ADR 0001: Record Architecture Decisions
Status
Accepted
Context
A formal mechanism is required to document architectural decisions made during the development and evolution of the homelab project. This ensures long-term consistency, provides critical context for future modifications, and facilitates knowledge transfer.
Decision
The project will utilize Architecture Decision Records (ADRs) to document significant architectural choices. These records will be maintained within the doc/src/adr/ directory, following a sequential numbering scheme.
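The sequential numbering scheme can be applied mechanically when creating a new record. The Python sketch below assumes a hypothetical filename convention of `NNNN-title-slug.md` inside doc/src/adr/; the actual file names in the repository may differ.

```python
import re
from typing import Iterable

# Hypothetical convention: four-digit number, hyphenated lowercase slug, .md extension.
ADR_PATTERN = re.compile(r"^(\d{4})-[a-z0-9-]+\.md$")


def next_adr_filename(existing: Iterable[str], title: str) -> str:
    numbers = []
    for name in existing:
        m = ADR_PATTERN.match(name)
        if m:  # ignore README.md, SUMMARY.md and other non-ADR files
            numbers.append(int(m.group(1)))
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{max(numbers, default=0) + 1:04d}-{slug}.md"
```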
Consequences
- Enhanced Transparency: Provides clear visibility into the reasoning behind key architectural choices.
- Historical Context: Establishes a permanent record of the system’s evolution.
- Sustainable Maintenance: Facilitates easier onboarding and long-term system maintenance by preserving intent.
ADR 0002: Split-Horizon DNS for Unified Naming
Status
Accepted
Context
The project requires a unified naming scheme (*.risu.tech) that functions seamlessly across both public and private services. Key requirements include maintaining strict isolation for private services and providing a frictionless remote access experience that mirrors local network connectivity.
Decision
We will implement a split-horizon DNS architecture:
- Public DNS Authority: Resolves records exclusively for public-facing endpoints.
- Private DNS Authority: Resolves records for internal services and serves as the primary authority for LAN and VPN clients.
- Context-Aware Routing: Ingress controllers will enforce hostname-based routing determined by the traffic’s origin (public vs. private).
Consequences
- Unified User Experience: Users utilize consistent service names regardless of their physical or network location.
- Enhanced Security Profile: Internal service names and metadata are not exposed to public DNS.
- Operational Complexity: Requires the management and synchronization of two distinct sets of DNS records.
ADR 0003: Documentation Delivery System
Status
Accepted
Context
Infrastructure documentation must be easily accessible to all authorized users and updated automatically to reflect the current state of the repository. The documentation is authored in Markdown and built with mdBook. We need a robust pipeline to build this documentation and deliver it to a private destination (an internal server).
Decision
We will implement an automated documentation delivery system with the following components:
- Source of Truth: The homelab repository on Codeberg.
- Build Engine: Forgejo Actions (using Forgejo Runners), triggered on pushes to the main branch that change files within the doc/ directory, or manually via workflow_dispatch.
- Single-Target Delivery:
  - Private: Automated deployment to an internal server at /var/www/doc via SSH/rsync for local access.
- Security: SSH-based deployment will use a dedicated, restricted user and an SSH key stored as a secret in the CI environment.
- Serving: Nginx will be used to serve the static HTML output on the internal server.
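The trigger rule in the Build Engine bullet can be stated precisely as a predicate. In the workflow file this would normally be expressed declaratively with `push` branch/path filters and `workflow_dispatch`, syntax that Forgejo Actions mirrors from GitHub Actions; the Python sketch below only models the intended semantics, and its function name and arguments are hypothetical.

```python
from typing import Iterable


def should_deploy(event: str, ref: str, changed_paths: Iterable[str]) -> bool:
    # Manual runs always deploy, regardless of branch or changes.
    if event == "workflow_dispatch":
        return True
    # Pushes deploy only on main, and only when something under doc/ changed.
    if event == "push" and ref == "refs/heads/main":
        return any(path.startswith("doc/") for path in changed_paths)
    return False
```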
Consequences
- Automated Consistency: Documentation is guaranteed to be up-to-date with the repository’s main branch.
- Reduced Complexity: Focusing on a single, internal delivery target simplifies the pipeline and avoids dependency on external “best-effort” services.
- Standardized Process: Leverages Forgejo Actions, providing compatibility with GitHub Actions-style workflows and existing Runner infrastructure.
- Secret Management: Requires careful handling of SSH keys within the CI platform.