nomad简明教程

k8s其实太复杂,对于小团队来说,光是hold住k8s就要花好几个人力,而且k8s更新迭代太快了,有种上了贼船就下不来的感觉。 nomad就简单多了,就是一个调度器,啥也不带,显得有点简陋,不过,对于中小型团队来说,完全足够。

下面是 k8s 和 nomad 的简单对比:

对比项 k8s nomad
简单易用 复杂 简单
概念复杂度 复杂 中等
更新迭代速度 太快了 合理
安装复杂度 复杂 简单
功能丰富程度 丰富 简单

手动安装 nomad 集群

安装之前要了解一些术语:https://www.nomadproject.io/docs/internals/architecture.html

  • 首先需要把二进制文件下载过来,wget https://releases.hashicorp.com/nomad/0.9.1/nomad_0.9.1_linux_amd64.zip
  • 然后解压,把二进制文件放到 /usr/local/bin
  • 改用户 sudo chown root:root /usr/local/bin/nomad
  • 编辑 /etc/systemd/system/nomad.service,内容是:

    [Unit]
    Description=Nomad
    Documentation=https://nomadproject.io/docs/
    Wants=network-online.target
    After=network-online.target
    
    # When using Nomad with Consul it is not necessary to start Consul first. These
    # lines start Consul before Nomad as an optimization to avoid Nomad logging
    # that Consul is unavailable at startup.
    #Wants=consul.service
    #After=consul.service
    
    [Service]
    ExecReload=/bin/kill -HUP $MAINPID
    ExecStart=/usr/local/bin/nomad agent -config /etc/nomad.d
    KillMode=process
    KillSignal=SIGINT
    LimitNOFILE=infinity
    LimitNPROC=infinity
    Restart=on-failure
    RestartSec=2
    StartLimitBurst=3
    StartLimitIntervalSec=10
    TasksMax=infinity
    
    [Install]
    WantedBy=multi-user.target
    
  • 创建nomad用于存储数据用的文件夹 sudo mkdir -p /opt/nomad/

  • 创建nomad用于存储配置用的文件夹 sudo mkdir -p /etc/nomad.d/

  • 创建公共配置,路径是 /etc/nomad.d/nomad.hcl,内容是:

datacenter就是一个数据中心,data_dir就是刚才我们创建的nomad存储数据用的

datacenter = "home"
data_dir = "/opt/nomad"
  • 创建server的配置,路径是 /etc/nomad.d/server.hcl,内容是:

server_join 里把我们的三个节点都加进去。也可以只加自己,之后用命令来加

server {
  enabled = true
  bootstrap_expect = 3

  server_join {
    retry_join = ["192.168.122.116:4648", "192.168.122.221:4648", "192.168.122.20:4648"]
  }
}

把上面的步骤改成一个脚本,每台机器上执行一下即可:

#!/bin/bash

sudo chown root:root /usr/local/bin/nomad
cat > /etc/systemd/system/nomad.service << EOF
[Unit]
Description=Nomad
Documentation=https://nomadproject.io/docs/
Wants=network-online.target
After=network-online.target

# When using Nomad with Consul it is not necessary to start Consul first. These
# lines start Consul before Nomad as an optimization to avoid Nomad logging
# that Consul is unavailable at startup.
#Wants=consul.service
#After=consul.service

[Service]
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/local/bin/nomad agent -config /etc/nomad.d
KillMode=process
KillSignal=SIGINT
LimitNOFILE=infinity
LimitNPROC=infinity
Restart=on-failure
RestartSec=2
StartLimitBurst=3
StartLimitIntervalSec=10
TasksMax=infinity

[Install]
WantedBy=multi-user.target
EOF

sudo mkdir -p /opt/nomad/
sudo mkdir -p /etc/nomad.d/

cat > /etc/nomad.d/nomad.hcl << EOF
datacenter = "home"
data_dir = "/opt/nomad"
EOF

cat > /etc/nomad.d/server.hcl << EOF
server {
  enabled = true
  bootstrap_expect = 5

  server_join {
    retry_join = ["192.168.122.116:4648", "192.168.122.221:4648", "192.168.122.20:4648"]
  }
}
EOF
  • 创建client的配置,路径是 /etc/nomad.d/client.hcl,内容是:

把server地址写进去,写一个即可,因为他们会同步的

client {
  enabled = true
  servers = ["192.168.122.116:4647"]
}

注意,要先创建好这些,然后再启动nomad。不能启动之后,再复制虚拟机。如果这样干的话,就会发现,nomad server members 有输出,但是 nomad node status 却只有一个。解决方案是把nomad停止服务,把 /opt/nomad 的数据删掉,再把 nomad 重新启动。

启动之后,就发现搭好了,这比 k8s 简单了几个数量级啊:

jiajun@ubuntu1:~$ nomad server members
Name            Address          Port  Status  Leader  Protocol  Build  Datacenter  Region
ubuntu1.global  192.168.122.116  4648  alive   true    2         0.9.1  home        global
ubuntu2.global  192.168.122.221  4648  alive   false   2         0.9.1  home        global
ubuntu3.global  192.168.122.20   4648  alive   false   2         0.9.1  home        global
jiajun@ubuntu1:~$ nomad node status
ID        DC    Name     Class   Drain  Eligibility  Status
1e1a0cb2  home  ubuntu3  <none>  false  eligible     ready
3afa1c4e  home  ubuntu2  <none>  false  eligible     ready
880defee  home  ubuntu1  <none>  false  eligible     ready

如果发现死都选不出leader,那么重来一遍即可。

起一个job

终于搭好了一个nomad集群,然后跑一个job:

job "httpecho" {
    region = "global"
    datacenters = ["home"]
    type = "service"

    group "web" {
        count = 2

        task "server" {
            driver = "docker"

            config {
                image = "hashicorp/http-echo"
                args  = ["-text", "hello world"]
                port_map = {
                    http = 5678
                }
            }

            resources {
                network {
                    mbits = 250

                    port "http" {}
                }
            }
        }
    }
}

执行一下:

jiajun@ubuntu1:~$ nomad job run httpecho.nomad
==> Monitoring evaluation "ef18e4d2"
    Evaluation triggered by job "httpecho"
    Allocation "20515739" created: node "1e1a0cb2", group "web"
    Allocation "bfa03c3c" created: node "3afa1c4e", group "web"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "ef18e4d2" finished with status "complete"
jiajun@ubuntu1:~$ logout
Connection to 192.168.122.116 closed.
jiajun@idea  ~ $ ansible ubuntu -a 'docker ps -a'
ubuntu3 | CHANGED | rc=0 >>
CONTAINER ID        IMAGE                 COMMAND                  CREATED             STATUS              PORTS                                                            NAMES
e0c56865dd11        hashicorp/http-echo   "/http-echo -text 'h…"   5 seconds ago       Up 5 seconds        192.168.122.20:27191->5678/tcp, 192.168.122.20:27191->5678/udp   server-20515739-01f9-e7d4-ffff-07b4c64df802

ubuntu2 | CHANGED | rc=0 >>
CONTAINER ID        IMAGE                 COMMAND                  CREATED             STATUS              PORTS                                                              NAMES
0c3c7a9ad015        hashicorp/http-echo   "/http-echo -text 'h…"   7 seconds ago       Up 6 seconds        192.168.122.221:21704->5678/tcp, 192.168.122.221:21704->5678/udp   server-bfa03c3c-64f3-ca87-4b28-e157ea980181

ubuntu1 | CHANGED | rc=0 >>
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

jiajun@idea  ~ $ http 192.168.122.221:21704
HTTP/1.1 200 OK
Content-Length: 12
Content-Type: text/plain; charset=utf-8
Date: Fri, 31 May 2019 06:34:49 GMT
X-App-Name: http-echo
X-App-Version: 0.2.3

hello world

大功告成!

总结

初步体验下来,nomad比k8s易用性上简直好太多,但是有一个阻碍中大型团队使用的问题:开源版本木有namespace支持。。。可惜啊可惜, 眼看容器调度的天下都要被k8s吃完了,还不快放开功能来抢一抢份额。


参考资料:


更多文章
  • 使用 PostgreSQL 搭建 JuiceFS
  • PostgreSQL 配置优化和日志分析
  • 有GitHub Copilot?那就可以搭建你的ChatGPT4服务
  • 窗口函数的使用(以PG为例)
  • 读《为什么学生不喜欢上学》
  • OpenAI Prompt Engineering 摘录和总结
  • 读《打造真正的新产品》
  • 2023年终总结
  • VueJS 总结
  • Linux 自动挂载 alist 提供的webdav
  • FreeBSD 使用 vm-bhyve 安装Debian虚拟机
  • FreeBSD 和 Linux 网卡聚合实现提速
  • GPT 帮我搞定了时区转换问题
  • 长任务系统如何处理?
  • macOS/Linux 编译 InputLeap