# Szymiserver Infrastructure Runbook

## Overview

This server runs a Kubernetes cluster (kind) hosting:

| Service | URL | Notes |
|---------|-----|-------|
| Forgejo | https://git.szymi.ddns.net | Git + CI/CD |
| ArgoCD | https://argocd.szymi.ddns.net | GitOps deployment |
| Registry | https://registry.szymi.ddns.net | Private Docker registry |
| Nextcloud | https://nextcloud.szymi.ddns.net | File storage |
| Draw.io | https://drawio.szymi.ddns.net | Diagrams |

All TLS certificates are automatically issued and renewed via Let's Encrypt (cert-manager).

Apps are deployed via **ArgoCD GitOps**: push to Forgejo and ArgoCD syncs automatically.

---

## Prerequisites

- Linux machine with Docker installed
- `kind` installed (`go install sigs.k8s.io/kind@latest`)
- `kubectl` installed
- Ports 80, 443, 2222 open on the firewall/router
- DNS A records pointing to server's public IP for all subdomains
- `/media/ssd` mounted (SSD for persistent data)
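
The local prerequisites above can be checked before creating the cluster. The following is a minimal sketch, not part of the repo: the function names are hypothetical, `mountpoint` is assumed to be available (util-linux), and the firewall and DNS items are not covered.

```bash
#!/usr/bin/env bash
# Preflight sketch: report which local prerequisites are missing.
# All function names are hypothetical helpers, not part of the repo.

# Print each command from the argument list that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Succeed only if the given path is a mount point (uses util-linux `mountpoint`).
is_mounted() {
  mountpoint -q "$1"
}

# Run all checks; returns non-zero if anything is missing.
preflight() {
  local missing status=0
  missing="$(missing_tools docker kind kubectl)"
  if [ -n "$missing" ]; then
    echo "missing tools: $(echo "$missing" | tr '\n' ' ')"
    status=1
  fi
  if ! is_mounted /media/ssd; then
    echo "/media/ssd is not a mount point"
    status=1
  fi
  return "$status"
}
```

Source the snippet and run `preflight`; it prints nothing and returns 0 when the tools are on `PATH` and the SSD is mounted.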

---

## Fresh Install From Scratch

### Step 1: Clone the repo

```bash
git clone https://git.szymi.ddns.net/szymi/szymiserver.git
cd szymiserver
```

### Step 2: Create the kind cluster

```bash
kind create cluster --config cluster-config.yml
```

This creates a cluster named `szymicluster` with:
- Ports 80 and 443 exposed (for ingress HTTP/HTTPS)
- Port 2222 exposed (for Forgejo SSH)
- `/media/ssd` mounted into the kind container (persistent volumes)
- `/var/run/docker.sock` mounted (for Forgejo runner)
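
For orientation, a kind config with these properties could look roughly like the output of the helper below. This is an illustrative sketch, not the repo's actual `cluster-config.yml`: the container-side ports and mount paths are assumptions, and `print_kind_config` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Print an example kind config matching the description above.
# Assumption: the real cluster-config.yml may differ (port targets, extra options).
print_kind_config() {
  cat <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: szymicluster
nodes:
  - role: control-plane
    extraPortMappings:
      - containerPort: 80      # ingress HTTP
        hostPort: 80
        protocol: TCP
      - containerPort: 443     # ingress HTTPS
        hostPort: 443
        protocol: TCP
      - containerPort: 2222    # Forgejo SSH (container-side port is an assumption)
        hostPort: 2222
        protocol: TCP
    extraMounts:
      - hostPath: /media/ssd   # persistent volumes (container path is an assumption)
        containerPath: /media/ssd
      - hostPath: /var/run/docker.sock
        containerPath: /var/run/docker.sock
EOF
}
```

Running `print_kind_config > /tmp/cluster-config.example.yml` writes the example to a scratch file so it can be compared with the real config without overwriting it.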

### Step 3: Deploy infrastructure

```bash
chmod +x k8s/deploy-infrastructure.sh
./k8s/deploy-infrastructure.sh
```

**What the script does, in order:**
1. Fixes kind container DNS (forces IPv4, which prevents ImagePullBackOff on servers with broken IPv6)
2. Creates required host directories on `/media/ssd`
3. Applies all infrastructure via kustomize (ingress-nginx, cert-manager, CoreDNS, registry, ArgoCD, Forgejo)
4. Waits for cert-manager to be ready, then applies the ClusterIssuer
5. Waits for ingress-nginx to be ready
6. Restarts CoreDNS to pick up internal DNS config
7. Applies ArgoCD Application definitions (deploys apps via GitOps)
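
The ordering above can be sketched as a dry-run script. This is not the actual `deploy-infrastructure.sh`. The `run` and `deploy_sketch` helpers, the `cert-manager` and `ingress-nginx` namespace names, the ClusterIssuer file path, and the use of a `resolv.conf` override for the DNS fix are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Sketch of the deployment order described above; NOT the real script.
# With DRY_RUN=1 (the default here) commands are printed instead of executed.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

deploy_sketch() {
  # 1. Pin a working resolver inside the kind node (stand-in for the script's DNS fix).
  run docker exec szymicluster-control-plane bash -c 'echo "nameserver 8.8.8.8" > /etc/resolv.conf'
  # 2. Host directories backing the persistent volumes.
  run mkdir -p /media/ssd/forgejo/forgejo-data /media/ssd/forgejo/runner-data \
    /media/ssd/registry /media/ssd/nextcloud /media/ssd/mariadb
  # 3. All infrastructure via kustomize.
  run kubectl apply -k k8s/infrastructure
  # 4. cert-manager must be available before the ClusterIssuer is applied.
  run kubectl wait --for=condition=Available deployment --all -n cert-manager --timeout=300s
  run kubectl apply -f k8s/infrastructure/cert-manager/cluster-issuer.yaml  # hypothetical path
  # 5. Wait for the ingress controller.
  run kubectl wait --for=condition=Available deployment --all -n ingress-nginx --timeout=300s
  # 6. Restart CoreDNS so it picks up the internal hosts.
  run kubectl rollout restart deployment coredns -n kube-system
  # 7. Hand the apps over to ArgoCD.
  run kubectl apply -f k8s/argocd-apps/
}
```

Calling `deploy_sketch` prints the eight commands in order; setting `DRY_RUN=0` would execute them, which only makes sense after checking each one against the real script.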

### Step 4: Wait for pods and certificates

```bash
# Watch pods come up (takes 2-5 min)
kubectl get pods -A -w

# Check TLS certificates (takes 1-3 min after pods are up)
kubectl get certificates -A
```

All certificates should show `READY = True`.
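
To spot stragglers quickly, the certificate listing can be filtered down to the entries that are not ready yet. A small sketch, assuming the default column layout `NAMESPACE NAME READY SECRET AGE`; `not_ready_certs` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# List certificates that are not yet Ready, given the output of
# `kubectl get certificates -A` on stdin.
# Assumes the default columns: NAMESPACE NAME READY SECRET AGE.
not_ready_certs() {
  # Skip the header row, then print namespace/name where READY is not "True".
  awk 'NR > 1 && $3 != "True" { print $1 "/" $2 }'
}
```

Usage: `kubectl get certificates -A | not_ready_certs`. Empty output means every certificate reports `True`.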

### Step 5: Get ArgoCD admin password

```bash
kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d && echo
```

Log in at https://argocd.szymi.ddns.net as `admin` with the password printed above.

---

## Manual Steps After Fresh Install

These cannot be automated and must be done once:

### Forgejo: Create admin account
1. Open https://git.szymi.ddns.net
2. Complete the setup wizard (first user becomes admin)
3. Create the repositories: `szymiserver`, `nextcloud`, `drawio`

### Forgejo Runner: Register
The runner pod expects `/media/ssd/forgejo/runner-data/` to contain a registered config.

```bash
# Get a runner registration token from Forgejo:
# Site Administration → Actions → Runners → Create new Runner

# Register the runner (exec into the pod):
kubectl exec -it -n forgejo deployment/forgejo-runner -- \
  forgejo-runner register \
  --instance https://git.szymi.ddns.net \
  --token <TOKEN_FROM_FORGEJO> \
  --name szymiserver \
  --no-interactive
```

---

## Day-to-Day Operations

### Deploying a new app via ArgoCD

1. Create Kubernetes manifests in your app repo (see the `drawio` or `nextcloud` repos for examples)
2. Add `newapp.szymi.ddns.net` to CoreDNS internal hosts:
   ```yaml
   # k8s/infrastructure/coredns/coredns-custom.yaml
   10.96.0.100 newapp.szymi.ddns.net
   ```
3. Create an ArgoCD Application in `k8s/argocd-apps/newapp.yaml`
4. Push to Forgejo and apply:
   ```bash
   kubectl apply -f k8s/infrastructure/coredns/coredns-custom.yaml
   kubectl rollout restart deployment coredns -n kube-system
   kubectl apply -f k8s/argocd-apps/newapp.yaml
   ```
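
For step 3, an ArgoCD Application manifest might look like the output of the helper below. This is a sketch under assumptions, not a copy of the existing `drawio.yaml` or `nextcloud.yaml`: the repo URL pattern, the `k8s` manifest path, the target namespace, and the sync options are guesses, and `print_argocd_app` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Print an example ArgoCD Application for the app named in $1.
# Assumptions: repo lives at szymi/<name>.git, manifests are under k8s/,
# and the app is deployed into a namespace named after the app.
print_argocd_app() {
  local name="$1"
  cat <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ${name}
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.szymi.ddns.net/szymi/${name}.git
    targetRevision: HEAD
    path: k8s
  destination:
    server: https://kubernetes.default.svc
    namespace: ${name}
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
EOF
}
```

For example, `print_argocd_app newapp > k8s/argocd-apps/newapp.yaml` produces a starting point to edit before committing.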

### After server restart (if pods have ImagePullBackOff)

This should no longer happen: `/etc/docker/daemon.json` is configured with `"dns": ["8.8.8.8"]` so that the router's broken IPv6 DNS is not injected into containers. The kind container also has `restart=always`, so it starts automatically with Docker.

If it happens anyway (e.g. `daemon.json` was reset), apply this quick fix:

```bash
docker exec szymicluster-control-plane bash -c 'echo "nameserver 8.8.8.8" > /etc/resolv.conf'
kubectl delete pods -A --field-selector=status.phase!=Running 2>/dev/null
```
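
Before reaching for the quick fix, it can help to confirm whether the DNS override is still present in the Docker daemon config. A rough sketch: `has_dns_override` is a hypothetical helper that does a plain-text match and does not validate the JSON.

```bash
#!/usr/bin/env bash
# Succeed if the given Docker daemon config lists 8.8.8.8 under "dns".
# Text match only: newlines are stripped so multi-line JSON is handled,
# but the file is not parsed as JSON.
has_dns_override() {
  local file="${1:-/etc/docker/daemon.json}"
  tr -d '\n' < "$file" |
    grep -Eq '"dns"[[:space:]]*:[[:space:]]*\[[^]]*"8\.8\.8\.8"'
}
```

Usage: `has_dns_override && echo "dns override present" || echo "dns override missing"`.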

### Checking overall health

```bash
# All pods status
kubectl get pods -A | grep -v Running | grep -v Completed

# Certificate status
kubectl get certificates -A

# ArgoCD app sync status
kubectl get applications -n argocd
```

### Forcing ArgoCD to re-sync

```bash
kubectl patch application <app-name> -n argocd \
  --type merge -p '{"metadata":{"annotations":{"argocd.argoproj.io/refresh":"hard"}}}'
```

---

## Architecture

```
Internet
    │
    ▼
Router (ports 80, 443, 2222 → server)
    │
    ▼
Linux Host (/media/ssd for persistent data)
    │
    ▼
kind container (szymicluster-control-plane)
    │
    ├── ingress-nginx   ← routes all HTTP/HTTPS traffic
    ├── cert-manager    ← issues Let's Encrypt TLS certs
    ├── CoreDNS         ← internal DNS (routes *.szymi.ddns.net to ingress)
    ├── ArgoCD          ← watches Forgejo, deploys apps automatically
    ├── Forgejo         ← git server + CI/CD runner
    ├── Registry        ← private Docker image registry
    ├── Nextcloud       ← deployed via ArgoCD
    └── Draw.io         ← deployed via ArgoCD
```

### Why CoreDNS has internal hosts
cert-manager verifies domain ownership with an HTTP-01 challenge and first checks the challenge URL from inside the cluster. There, `*.szymi.ddns.net` would resolve to the external IP, which doesn't route back in. CoreDNS overrides these names to point directly to ingress-nginx's fixed ClusterIP (`10.96.0.100`).
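
The override entries follow the same `IP hostname` shape as the `newapp` example in this runbook. As a sketch, the lines for the current services could be generated like this. `print_host_overrides` is a hypothetical helper, and how the lines are embedded in `coredns-custom.yaml` is not shown here.

```bash
#!/usr/bin/env bash
# Print hosts-style override lines that map each public hostname
# to the fixed ingress-nginx ClusterIP.
ingress_ip="10.96.0.100"
domain="szymi.ddns.net"

print_host_overrides() {
  local sub
  for sub in "$@"; do
    printf '%s %s.%s\n' "$ingress_ip" "$sub" "$domain"
  done
}
```

Usage: `print_host_overrides git argocd registry nextcloud drawio` prints one line per service.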

---

## File Structure

```
szymiserver/
├── cluster-config.yml              # kind cluster definition (run once)
├── RUNBOOK.md                      # this file
├── k8s/
│   ├── deploy-infrastructure.sh    # main deployment script
│   ├── infrastructure/
│   │   ├── kustomization.yaml      # applies all infrastructure
│   │   ├── coredns/                # internal DNS overrides
│   │   ├── ingress-nginx/          # ingress controller
│   │   ├── cert-manager/           # TLS certificates
│   │   ├── argocd/                 # GitOps controller
│   │   ├── forgejo/                # git server + runner
│   │   └── registry/               # private Docker registry
│   └── argocd-apps/
│       ├── drawio.yaml             # ArgoCD app for draw.io
│       └── nextcloud.yaml          # ArgoCD app for Nextcloud
```

---

## Persistent Data (on host at /media/ssd)

| Path | Used by |
|------|---------|
| `/media/ssd/forgejo/forgejo-data` | Forgejo repos, config, DB |
| `/media/ssd/forgejo/runner-data` | Runner registration + config |
| `/media/ssd/registry` | Docker image layers |
| `/media/ssd/nextcloud` | Nextcloud files |
| `/media/ssd/mariadb` | Nextcloud database |
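
A quick way to confirm that the directories in this table exist, for example after remounting the SSD, is sketched below. `missing_data_dirs` is a hypothetical helper; the optional argument for the base path is there so the check can be tried against a scratch directory.

```bash
#!/usr/bin/env bash
# Print every expected data directory that is missing under the base path.
# The base path defaults to /media/ssd and can be overridden via $1.
missing_data_dirs() {
  local base="${1:-/media/ssd}" rel
  for rel in forgejo/forgejo-data forgejo/runner-data registry nextcloud mariadb; do
    [ -d "$base/$rel" ] || echo "$base/$rel"
  done
}
```

Usage: `missing_data_dirs` prints nothing when all five directories are present.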