Job and CronJob Overview

In day-to-day work we often run into batch data processing and analysis tasks, as well as work that must run on a schedule. Kubernetes provides two resource objects, Job and CronJob, to cover these needs.

A Job handles a one-off task: it runs once and guarantees that one or more Pods of the batch task terminate successfully. A CronJob is simply a Job with time-based scheduling layered on top.

Job

First, write a Job resource object:

[root@pool1 Job_CronJob]# vi Job.yaml 
apiVersion: batch/v1
kind: Job
metadata:
  name: job-demo
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: counter
        image: centos:7
        imagePullPolicy: IfNotPresent
        command:
        - "bin/sh"
        - "-c"
        - "for i in 9 8 7 6 5 4 3 2 1; do echo $i; done && ls /"

restartPolicy only supports Never and OnFailure here; Always is not allowed. A Job is a one-shot task resource: once the task finishes, the Pod exits and nothing more should happen. With Always, the Pod would be restarted every time it completed and the Job would loop forever.
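For comparison, the same pod template with the other accepted policy might look like the following sketch (the container command here is purely illustrative; with OnFailure, a failed container is restarted in place rather than the whole Pod being recreated):

```yaml
# Hypothetical variant of the template above, using the other
# valid restart policy for a Job.
spec:
  template:
    spec:
      restartPolicy: OnFailure   # Never and OnFailure are the only values a Job accepts
      containers:
      - name: counter
        image: centos:7
        command: ["/bin/sh", "-c", "echo done"]
```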

Create the Job object:

[root@pool1 Job_CronJob]# kubectl apply -f Job.yaml 
job.batch/job-demo created
[root@pool1 Job_CronJob]# kubectl get pod
NAME             READY   STATUS      RESTARTS   AGE
job-demo-nzjfk   0/1     Completed   0          4s
[root@pool1 Job_CronJob]# kubectl describe pod job-demo-nzjfk
Name:         job-demo-nzjfk
Namespace:    default
Priority:     0
Node:         pool3/10.99.2.162
Start Time:   Thu, 16 Dec 2021 17:18:58 +0800
Labels:       controller-uid=8a008103-d977-49ec-a3d0-6e90c2b044d0
              job-name=job-demo
Annotations:  cni.projectcalico.org/containerID: cb5a471e2f45c49fe325b2c26eeb6c237e104abe5c56d987d90a31f28a1c6e02
              cni.projectcalico.org/podIP:
              cni.projectcalico.org/podIPs:
Status:       Succeeded
IP:           10.244.206.27
IPs:
  IP:  10.244.206.27
Controlled By:  Job/job-demo
Containers:
  counter:
    Container ID:  docker://46f3453f2ca5a188b3e6b14a9a34bb87b8b3a1499fa811c9f6698f9ebd207827
    Image:         centos:7
    Image ID:      docker://sha256:5e35e350aded98340bc8fcb0ba392d809c807bc3eb5c618d4a0674d98d88bccd
    Port:          <none>
    Host Port:     <none>
    Command:
      bin/sh
      -c
      for i in 9 8 7 6 5 4 3 2 1; do echo $i; done && ls /
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 16 Dec 2021 17:18:59 +0800
      Finished:     Thu, 16 Dec 2021 17:18:59 +0800
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-nzxhw (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-nzxhw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-nzxhw
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  16s   default-scheduler  Successfully assigned default/job-demo-nzjfk to pool3
  Normal  Pulled     15s   kubelet            Container image "centos:7" already present on machine
  Normal  Created    15s   kubelet            Created container counter
  Normal  Started    15s   kubelet            Started container counter

The Pod's status is Completed, because it finished its task and exited normally.

View the Job's logs:

[root@pool1 Job_CronJob]# kubectl logs job-demo-nzjfk
9
8
7
6
5
4
3
2
1
anaconda-post.log
bin
dev
etc
home
lib
lib64
media
mnt
opt
proc
root
run
sbin
srv
sys
tmp
usr
var

If the task fails and we have defined restartPolicy=Never, the Job controller keeps creating new Pods to retry it. These retries cannot go on forever, of course: the spec.backoffLimit field on the Job object caps the number of retries (it defaults to 6). Note also that the interval at which the Job controller recreates Pods grows exponentially: the next Pod is created after 10s, then 20s, then 40s, and so on.
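A retry cap can be set directly on the Job spec. A minimal sketch (the name and the deliberately failing command are contrived for illustration):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-fail-demo          # hypothetical name
spec:
  backoffLimit: 3              # stop retrying after 3 failures (default: 6)
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: fail
        image: centos:7
        command: ["/bin/sh", "-c", "exit 1"]   # always fails, to trigger retries
```

Once the limit is reached, the Job is marked Failed with reason BackoffLimitExceeded and no further Pods are created.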

If instead we define restartPolicy=OnFailure, the Job controller does not create new Pods when the task fails; the failed containers inside the existing Pod are restarted in place, over and over.
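As noted at the start, a CronJob is just a Job with a schedule attached. A minimal sketch that stamps out the counter task once per minute (the name and schedule are illustrative; batch/v1 for CronJob requires Kubernetes v1.21 or newer, older clusters use batch/v1beta1):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cronjob-demo           # hypothetical name
spec:
  schedule: "*/1 * * * *"      # standard cron syntax: here, once every minute
  jobTemplate:                 # an ordinary Job spec, created on each tick
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: counter
            image: centos:7
            command: ["/bin/sh", "-c", "date; echo hello from CronJob"]
```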