- Published on
GitLab在docker和Kubernetes之间折腾
- Authors
- Name
- 老杨的博客
GitLab在docker和Kubernetes之间折腾
@[toc]
1. 概述
最近用上了Kubernetes,刚好又要求Gitlab AutoDev配合Kubernetes,所以将旧的Gitlab升级下,并迁移成了helm版本。
但是在使用过程中,发现并不如docker版本稳定,特别是pod在重新分配后,在节点上pull image失败问题,即使配置了镜像加速,虽然有办法解决(我是多个节点去pull,某个成功后,tag,push到私有仓库,再到pod分配节点pull,tag)。
另外,helm版本的gitlab组件全部分离,相互依赖,非常复杂。在不熟悉gitlab内部原理和通信的情况下,排查问题非常困难。
于是,又将helm版本迁移到docker版本,反而是遇到了坑。本想放弃迁移回docker版本,谁想在2018年12月11日,git突然push失败了,而且问题排查半天,也没发现有用的信息。转而解决在docker版本下恢复helm版本的gitlab数据对象存储问题,好在解决了,虽然也有些问题,至少能用了。
2. Gitlab从docker迁移到Kubernetes
Gitlab docker: GitLab-CE 9.5.4 Gitlab Kubernetes: GitLab-CE 11.4.3
2.1 备份恢复过程
进入gitlab docker:
docker exec -it mygitlab /bin/bash
执行备份:
gitlab-rake gitlab:backup:create & # 避免超时自动退出
helm安装gitlab:
helm upgrade --install gitlab --timeout 600 --namespace devops . \
--set gitlab.migrations.image.repository=registry.gitlab.com/gitlab-org/build/cng/gitlab-rails-ce \
--set gitlab.sidekiq.image.repository=registry.gitlab.com/gitlab-org/build/cng/gitlab-sidekiq-ce \
--set gitlab.unicorn.image.repository=registry.gitlab.com/gitlab-org/build/cng/gitlab-unicorn-ce \
--set gitlab.unicorn.workhorse.image=registry.gitlab.com/gitlab-org/build/cng/gitlab-workhorse-ce \
--set gitlab.task-runner.image.repository=registry.gitlab.com/gitlab-org/build/cng/gitlab-task-runner-ce \
--set certmanager-issuer.email=29ygq@sina.com \
--set gitlab.migrations.initialRootPassword.key="Git@domain.com" \
--set glabal.initialRootPassword.key="Git@domain.com" \
--set global.hosts.domain=domain.com \
--set global.smtp.password.secret=163mail \
--set global.time_zone="Asia/Shanghai" \
--set gitlab.gitaly.persistence.storageClass=ceph-rbd \
--set gitlab.gitaly.persistence.size=20Gi \
--set gitlab.gitaly.persistence.storageClass=ceph-rbd \
--set postgresql.persistence.size=5Gi \
--set postgresql.persistence.storageClass=ceph-rbd \
--set minio.persistence.size=5Gi \
--set minio.persistence.storageClass=ceph-rbd \
--set redis.persistence.size=1Gi \
--set redis.persistence.storageClass=ceph-rbd \
--set nginx-ingress.enabled=false \
--set prometheus.install=false \
--set certmanager.install=false \
--set gitlab.gitlab-shell.service.externalPort=2222 \
--set gitlab.gitlab-shell.service.internalPort=2222 \
--set gitlab.gitlab-runner.rbac.clusterWideAccess=true \
--set gitlab.gitlab-runner.rbac.create=true \
--set gitlab.gitlab-runner.runners.privileged=true
将备份文件拷出来,并同步到Kubernetes master节点:
docker cp mygitlab:/var/opt/gitlab/backups/1541148206_2018_11_02_9.5.4_gitlab_backup.tar /tmp/
rsync -avzP /tmp/1541148206_2018_11_02_9.5.4_gitlab_backup.tar root@master_ip:/tmp/
进入Kubernetes上的gitlab task-runner pod中:
kubectl exec <task-runner pod name> -it /bin/bash
执行恢复:
backup-utility --restore -f file:///tmp/1541148206_2018_11_02_9.5.4_gitlab_backup.tar &
2.2 恢复失败解决
恢复中断:
[root@lab1 tmp]# kubectl describe pod gitlab-task-runner-6cfc59d65d-zg64k -n devops
Name: gitlab-task-runner-6cfc59d65d-zg64k
Namespace: devops
Priority: 0
PriorityClassName: <none>
Node: lab1/
Start Time: Wed, 31 Oct 2018 17:04:30 +0800
Labels: app=task-runner
pod-template-hash=6cfc59d65d
release=gitlab
Annotations: checksum/config: dcbb3fa285b4af4cc7d76913713f826335d0826672270a02beeda74bdac8f528
cluster-autoscaler.kubernetes.io/safe-to-evict: true
Status: Failed
Reason: Evicted
Message: The node was low on resource: ephemeral-storage. Container task-runner was using 11874956Ki, which exceeds its request of 0.
Gitlab上有issue说明,但是当前官方版本还没有修复。 https://gitlab.com/charts/gitlab/issues/705 https://github.com/kubernetes/enhancements/issues/361
我们按他的想法,给task-runner添加持久化存储。
pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: gitlab-task-runner
namespace: devops
spec:
accessModes:
- ReadWriteOnce
storageClassName: ceph-rbd
resources:
requests:
storage: 20Gi
kubectl apply -f pvc.yaml
kubectl edit deploy gitlab-task-runner -n devops
找到相关地方,添加挂载点和存储点:
volumeMounts:
- mountPath: /srv/gitlab/tmp
name: data-storage-tmp
volumes:
- name: data-storage-tmp
persistentVolumeClaim:
claimName: gitlab-task-runner
待POD重新启来,再将备份文件拷进去,重新恢复:
kubectl cp /tmp/1541148206_2018_11_02_9.5.4_gitlab_backup.tar devops/gitlab-task-runner-5d859b8c8d-gbggx:/srv/gitlab/tmp/
kubectl exec -it gitlab-task-runner-5d859b8c8d-gbggx -n devops -- backup-utility --restore -f file:///srv/gitlab/tmp/1541148206_2018_11_02_9.5.4_gitlab_backup.tar
恢复后,登录系统,发现仓库为空。
尝试将目录repositories
同步过去恢复。
docker目录为/var/opt/gitlab/git-data/repositories
k8s里目录为/home/git/repositories
恢复后数据可用。
3. Gitlab从Kubernetes迁移到docker
Gitlab docker: GitLab-CE 11.5.2 Gitlab chart: 1.3.2 GitLab-CE 11.5.2
3.1 备份恢复过程
因为有好几G的数据量了,为了加快备份速度,不备份repositories,而是手动打包repositories。
kubectl exec <task-runner pod name> -i /bin/bash backup-utility --skip repositories
备份好的文件类似下面文件名: 1544529772_2018_12_11_11.5.2_gitlab_backup.tar
提前安装docker版gitlab,当时我装的时候版本为11.5.2。
sudo docker run --detach \
--hostname gitlab.domain.com \
--publish 443:443 --publish 80:80 --publish 10022:22 \
--name gitlab \
--restart always \
--volume /data/gitlab/config:/etc/gitlab \
--volume /data/gitlab/logs:/var/log/gitlab \
--volume /data/gitlab/data:/var/opt/gitlab \
gitlab/gitlab-ce:latest
因为恢复过程中,它会寻找/data/gitlab/data/backups/
目录下的备份打包文件,因此,我们可以手动准备已经解压好的备份文件,然后打包一个小文件为1544529772_2018_12_11_11.5.2_gitlab_backup.tar
以减少恢复过程中的解压时间(特别对于多次恢复)。
恢复命令: docker exec -it gitlab gitlab-rake gitlab:backup:restore
直接将备份恢复后,在页面上随便都会报500错误,日志提示'Object Storage is not enabled'
,需要按下文开启lfs功能再恢复数据。
###3.2 500错误 * 情况一:
上文提到的'Object Storage is not enabled'
,gitlab.rb
需要开启lfs功能。
### Job Artifacts
# gitlab_rails['artifacts_enabled'] = true
# gitlab_rails['artifacts_path'] = "/var/opt/gitlab/gitlab-rails/shared/artifacts"
####! Job artifacts Object Store
####! Docs: https://docs.gitlab.com/ee/administration/job_artifacts.html#using-object-storage
gitlab_rails['artifacts_object_store_enabled'] = true
gitlab_rails['artifacts_object_store_direct_upload'] = true
gitlab_rails['artifacts_object_store_background_upload'] = true
gitlab_rails['artifacts_object_store_proxy_download'] = true
gitlab_rails['artifacts_object_store_remote_directory'] = "artifacts"
gitlab_rails['artifacts_object_store_connection'] = {
'provider' => 'AWS',
'region' => 'eu-west-1',
'aws_access_key_id' => 'Z1Xh28dpKo0Oc9Xjjq35n0lCceGYxHmGwpibz2WQ9acLtiUTBHftVTKxcLiISSld',
'aws_secret_access_key' => 'ebRmMNRHh9R9ve869SkspkC3xMOyPBmo0FGhud4JqBZu7zjuiMCu36xn7aEVNEeT',
# # The below options configure an S3 compatible host instead of AWS
'aws_signature_version' => 4, # For creation of signed URLs. Set to 2 if provider does not support v4.
'endpoint' => 'http://127.0.0.1:9000', # default: nil - Useful for S3 compliant services such as DigitalOcean Spaces
'host' => 'localhost',
'path_style' => true # Use 'host/bucket_name/object' instead of 'bucket_name.host/object'
}
### Git LFS
gitlab_rails['lfs_enabled'] = true
gitlab_rails['lfs_storage_path'] = "/var/opt/gitlab/gitlab-rails/shared/lfs-objects"
gitlab_rails['lfs_object_store_enabled'] = true
gitlab_rails['lfs_object_store_direct_upload'] = true
gitlab_rails['lfs_object_store_background_upload'] = true
gitlab_rails['lfs_object_store_proxy_download'] = true
gitlab_rails['lfs_object_store_remote_directory'] = "lfs-objects"
gitlab_rails['lfs_object_store_connection'] = {
'provider' => 'AWS',
'region' => 'eu-west-1',
'aws_access_key_id' => 'Z1Xh28dpKo0Oc9Xjjq35n0lCceGYxHmGwpibz2WQ9acLtiUTBHftVTKxcLiISSld',
'aws_secret_access_key' => 'ebRmMNRHh9R9ve869SkspkC3xMOyPBmo0FGhud4JqBZu7zjuiMCu36xn7aEVNEeT#',
# # The below options configure an S3 compatible host instead of AWS
'aws_signature_version' => 4, # For creation of signed URLs. Set to 2 if provider does not support v4.
# 'endpoint' => 'https://s3.amazonaws.com', # default: nil - Useful for S3 compliant services such as DigitalOcean Spaces
'host' => 'localhost',
'endpoint' => 'http://127.0.0.1:9000',
'path_style' => true
# # 'path_style' => false # Use 'host/bucket_name/object' instead of 'bucket_name.host/object'
}
### GitLab uploads
###! Docs: https://docs.gitlab.com/ee/administration/uploads.html
gitlab_rails['uploads_storage_path'] = "/var/opt/gitlab/gitlab-rails/public"
gitlab_rails['uploads_base_dir'] = "uploads/-/system"
gitlab_rails['uploads_object_store_enabled'] = true
gitlab_rails['uploads_object_store_direct_upload'] = true
gitlab_rails['uploads_object_store_background_upload'] = true
gitlab_rails['uploads_object_store_proxy_download'] = true
gitlab_rails['uploads_object_store_remote_directory'] = "uploads"
gitlab_rails['uploads_object_store_connection'] = {
'provider' => 'AWS',
'region' => 'eu-west-1',
'aws_access_key_id' => 'Z1Xh28dpKo0Oc9Xjjq35n0lCceGYxHmGwpibz2WQ9acLtiUTBHftVTKxcLiISSld',
'aws_secret_access_key' => 'ebRmMNRHh9R9ve869SkspkC3xMOyPBmo0FGhud4JqBZu7zjuiMCu36xn7aEVNEeT',
# # # The below options configure an S3 compatible host instead of AWS
'host' => 'localhost',
'aws_signature_version' => 4, # For creation of signed URLs. Set to 2 if provider does not support v4.
'endpoint' => 'http://127.0.0.1:9000', # default: nil - Useful for S3 compliant services such as DigitalOcean Spaces
'path_style' => true # Use 'host/bucket_name/object' instead of 'bucket_name.host/object'
}
恢复备份数据后,gitlab的docker中执行以下命令迁移数据至对象存储:
# Avatars
gitlab-rake "gitlab:uploads:migrate[AvatarUploader, Project, :avatar]"
gitlab-rake "gitlab:uploads:migrate[AvatarUploader, Group, :avatar]"
gitlab-rake "gitlab:uploads:migrate[AvatarUploader, User, :avatar]"
# Attachments
gitlab-rake "gitlab:uploads:migrate[AttachmentUploader, Note, :attachment]"
gitlab-rake "gitlab:uploads:migrate[AttachmentUploader, Appearance, :logo]"
gitlab-rake "gitlab:uploads:migrate[AttachmentUploader, Appearance, :header_logo]"
# Markdown
gitlab-rake "gitlab:uploads:migrate[FileUploader, Project]"
gitlab-rake "gitlab:uploads:migrate[PersonalFileUploader, Snippet]"
gitlab-rake "gitlab:uploads:migrate[NamespaceFileUploader, Snippet]"
gitlab-rake "gitlab:uploads:migrate[FileUploader, MergeRequest]"
然后再将repositories
覆盖到/data/gitlab/data/git-data/repositories/
,并进gitlab docker把目录属主改为git。 在gitlab UI界面,部分项目可能提示为空项目,可以用高权限的维护者,把仓库pull下来,修改下,再push,即可解决项目为空问题。
- 情况二: 有可能为DB数据关系错误,需要升级数据库关系
输入以下指令查看数据升级状态
gitlab-rake db:migrate:status
果然发现有一些显示为Down,显示为Up即表示正常同,再执行数据库关系升级
gitlab-rake db:migrate
执行完成再重复重建、重启命令,问题解决。
4. helm版本问题记录
最后,记录下helm版本,push失败问题。日后再研究。
$ git push
warning: redirecting to https://gitlab.domain.com/devops/docs.git/
Enumerating objects: 10, done.
Counting objects: 100% (10/10), done.
Compressing objects: 100% (6/6), done.
Writing objects: 100% (7/7), 3.65 KiB | 1.22 MiB/s, done.
Total 7 (delta 2), reused 0 (delta 0)
remote: GitLab: Failed to authorize your Git request: internal API unreachable
To https://gitlab.domain.com/devops/docs
! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'https://gitlab.domain.com/devops/docs'
参考资料: [1] https://docs.gitlab.com/omnibus/README.html [2] https://docs.gitlab.com/ce/raketasks/backup\_restore.html [3] https://docs.gitlab.com/omnibus/settings/backups.html [4] https://docs.gitlab.com/ce/workflow/lfs/lfs\_administration.html [5] https://docs.gitlab.com/ce/administration/uploads.html [6] https://docs.gitlab.com/ce/administration/raketasks/check.html