Deploy Arvados on GKE

I have been trying to deploy Arvados on GKE and ran into the following load balancer error from one of the Arvados services (arvados-workbench, whose external IP stays at <pending>). How can I fix this problem?

cibin@cibins-beast-13-9380:~/EBI/arvados-k8s/charts/arvados$ kubectl get svc
NAME                         TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)                       AGE
arvados-api-server           LoadBalancer   10.88.12.90    34.89.54.152   444:31588/TCP                 31m
arvados-keep-proxy           LoadBalancer   10.88.11.130   34.89.54.152   25107:31630/TCP               31m
arvados-keep-store           ClusterIP      None           <none>         25107/TCP                     31m
arvados-keep-web             LoadBalancer   10.88.5.66     34.89.54.152   9002:32663/TCP                31m
arvados-postgres             ClusterIP      10.88.12.232   <none>         5432/TCP                      31m
arvados-slurm-compute        ClusterIP      None           <none>         6818/TCP                      31m
arvados-slurm-controller-0   ClusterIP      10.88.14.128   <none>         6817/TCP                      31m
arvados-workbench            LoadBalancer   10.88.8.200    <pending>      443:30734/TCP,445:32051/TCP   31m
arvados-ws                   LoadBalancer   10.88.5.207    34.89.54.152   9003:30153/TCP                31m
kubernetes                   ClusterIP      10.88.0.1      <none>         443/TCP                       22h
cibin@cibins-beast-13-9380:~/EBI/arvados-k8s/charts/arvados$ kubectl describe service/arvados-workbench
Name:                     arvados-workbench
Namespace:                default
Labels:                   app=arvados
                          app.kubernetes.io/managed-by=Helm
                          chart=arvados-0.1.0
                          heritage=Helm
                          release=arvados
Annotations:              cloud.google.com/neg: {"ingress":true}
                          meta.helm.sh/release-name: arvados
                          meta.helm.sh/release-namespace: default
Selector:                 app=arvados-workbench
Type:                     LoadBalancer
IP Families:              <none>
IP:                       10.88.8.200
IPs:                      10.88.8.200
IP:                       34.89.54.152
Port:                     wb2  443/TCP
TargetPort:               443/TCP
NodePort:                 wb2  30734/TCP
Endpoints:                10.84.2.18:443
Port:                     wb  445/TCP
TargetPort:               445/TCP
NodePort:                 wb  32051/TCP
Endpoints:                10.84.2.18:445
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type     Reason                  Age                   From                Message
  ----     ------                  ----                  ----                -------
  Normal   EnsuringLoadBalancer    2m38s (x11 over 28m)  service-controller  Ensuring load balancer
  Warning  SyncLoadBalancerFailed  2m34s (x11 over 28m)  service-controller  Error syncing load balancer: failed to ensure load balancer: failed to create forwarding rule for load balancer (ae0291ffb3043451580fc197edd8a34e(default/arvados-workbench)): googleapi: Error 400: Invalid value for field 'resource.IPAddress': '34.89.54.152'. Specified IP address is in-use and would result in a conflict., invalid

Hi cibinsb, I’m going to have a look but haven’t had time yet.

Any progress on this issue?

Yeah, I’m still working on it, but I think we are running into the “All-ports” behavior of the GKE controller. It is described at https://cloud.google.com/kubernetes-engine/docs/how-to/service-parameters:

All-ports

The GKE controller automatically sets the allPorts field in the forwarding rule if there are 5 or more ports in the service spec in GKE versions 1.20.6 and later or versions 1.21 and later.

Our Helm chart defines 6 ports forwarded to 5 different pods. The last service to come up loses out: number 5 was rewritten to use allPorts, so number 6 fails with the conflict error you saw, e.g.:

Error syncing load balancer: failed to ensure load balancer: failed to create forwarding rule for load balancer (a8d425e2449f445a8a04b3952a776333(default/arvados-workbench)): googleapi: Error 400: Invalid value for field 'resource.IPAddress': 'x.x.x.x'. Specified IP address is in-use and would result in a conflict., invalid
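
To double-check the port count against the service listing posted earlier in this thread, here is a small Python sketch. It only parses the LoadBalancer rows copied from that `kubectl get svc` output; the helper name `ports_per_service` is made up for illustration, and nothing here queries GKE or shows how the controller actually builds its forwarding rules.

```python
# Rows copied from the `kubectl get svc` output earlier in this thread
# (LoadBalancer services only).
SVC_OUTPUT = """\
arvados-api-server           LoadBalancer   10.88.12.90    34.89.54.152   444:31588/TCP                 31m
arvados-keep-proxy           LoadBalancer   10.88.11.130   34.89.54.152   25107:31630/TCP               31m
arvados-keep-web             LoadBalancer   10.88.5.66     34.89.54.152   9002:32663/TCP                31m
arvados-workbench            LoadBalancer   10.88.8.200    <pending>      443:30734/TCP,445:32051/TCP   31m
arvados-ws                   LoadBalancer   10.88.5.207    34.89.54.152   9003:30153/TCP                31m
"""


def ports_per_service(text):
    """Return {service name: [service ports]} for LoadBalancer rows.

    Hypothetical helper: it assumes the default six-column layout of
    `kubectl get svc` (NAME, TYPE, CLUSTER-IP, EXTERNAL-IP, PORT(S), AGE).
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[1] != "LoadBalancer":
            continue
        name, port_spec = fields[0], fields[4]
        # Each entry looks like "443:30734/TCP"; keep the service port only.
        result[name] = [int(p.split(":")[0]) for p in port_spec.split(",")]
    return result


ports = ports_per_service(SVC_OUTPUT)
total = sum(len(p) for p in ports.values())
print(len(ports), "LoadBalancer services,", total, "ports in total")
```

With the rows above this reports 5 LoadBalancer services and 6 ports in total, with arvados-workbench being the only service that exposes two ports (443 and 445).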

I haven’t found a way to disable this behavior yet. It may be time to add an ingress to the chart to work around this problem.