LINUX.ORG.RU

Help setting up a Proxmox cluster

 



There are two Proxmox servers; both run

Kernel Version: Linux 4.4.49-1-pve #1 SMP PVE 4.4.49-86
A cluster was created from the nodes noda0 and noda1.

noda1 drops out of the cluster after a couple of minutes:

Apr 07 13:17:50 noda1 corosync[1289]: [TOTEM ] Retransmit List: 4dd
Apr 07 13:17:50 noda1 corosync[1289]: [TOTEM ] Retransmit List: 4dc
Apr 07 13:17:50 noda1 corosync[1289]: [TOTEM ] Retransmit List: 4dd
Apr 07 13:17:50 noda1 corosync[1289]: [TOTEM ] FAILED TO RECEIVE
Apr 07 13:17:51 noda1 pmxcfs[1230]: [dcdb] notice: members: 2/1230
Apr 07 13:17:51 noda1 pmxcfs[1230]: [status] notice: members: 2/1230
Apr 07 13:17:51 noda1 pmxcfs[1230]: [status] notice: node lost quorum
Apr 07 13:17:51 noda1 pmxcfs[1230]: [dcdb] crit: received write while not quorate - trigger resync
Apr 07 13:17:51 noda1 pmxcfs[1230]: [dcdb] crit: leaving CPG group
Apr 07 13:17:51 noda1 pve-ha-lrm[1348]: unable to write lrm status file - unable to open file '/etc/pve/nodes/noda1/lrm_status.tmp.1348' - Permission denied
Apr 07 13:17:51 noda1 pmxcfs[1230]: [dcdb] notice: start cluster connection
Apr 07 13:17:51 noda1 pmxcfs[1230]: [dcdb] notice: members: 2/1230
Apr 07 13:17:51 noda1 pmxcfs[1230]: [dcdb] notice: all data is up to date
Apr 07 13:18:00 noda1 pvedaemon[1919]: stop VM 200: UPID:noda1:0000077F:00007911:58E76758:qmstop:200:root@pam:
Apr 07 13:18:00 noda1 pvedaemon[1329]: <root@pam> starting task UPID:noda1:0000077F:00007911:58E76758:qmstop:200:root@pam:
Apr 07 13:18:00 noda1 kernel: vmbr0: port 2(tap200i0) entered disabled state
On the first node:
systemctl status corosync.service
[MAIN  ] Completed service synchronization, ready to provide service.
Apr 07 13:26:18 noda0 corosync[1244]: [TOTEM ] A new membership (192.168.40.250:13004) was formed. Members
Apr 07 13:26:18 noda0 corosync[1244]: [QUORUM] Members[1]: 1
Apr 07 13:26:18 noda0 corosync[1244]: [MAIN  ] Completed service synchronization, ready to provide service.
Apr 07 13:26:20 noda0 corosync[1244]: [TOTEM ] A new membership (192.168.40.250:13008) was formed. Members
Apr 07 13:26:20 noda0 corosync[1244]: [QUORUM] Members[1]: 1
Apr 07 13:26:20 noda0 corosync[1244]: [MAIN  ] Completed service synchronization, ready to provide service.
Apr 07 13:26:21 noda0 corosync[1244]: [TOTEM ] A new membership (192.168.40.250:13012) was formed. Members
Apr 07 13:26:21 noda0 corosync[1244]: [QUORUM] Members[1]: 1

On the second node:

systemctl status corosync.service
Apr 07 13:17:50 noda1 corosync[1289]: [TOTEM ] Retransmit List: 4dd
Apr 07 13:17:50 noda1 corosync[1289]: [TOTEM ] Retransmit List: 4dc
Apr 07 13:17:50 noda1 corosync[1289]: [TOTEM ] Retransmit List: 4dd
Apr 07 13:17:50 noda1 corosync[1289]: [TOTEM ] Retransmit List: 4dc
Apr 07 13:17:50 noda1 corosync[1289]: [TOTEM ] Retransmit List: 4dd
Apr 07 13:17:50 noda1 corosync[1289]: [TOTEM ] Retransmit List: 4dc
Apr 07 13:17:50 noda1 corosync[1289]: [TOTEM ] Retransmit List: 4dd
Apr 07 13:17:50 noda1 corosync[1289]: [TOTEM ] Retransmit List: 4dc
Apr 07 13:17:50 noda1 corosync[1289]: [TOTEM ] Retransmit List: 4dd
Apr 07 13:17:50 noda1 corosync[1289]: [TOTEM ] FAILED TO RECEIVE

On the first node:

pvecm status
Quorum information
------------------
Date:             Fri Apr  7 13:28:37 2017
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          1/13388
Quorate:          No

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      1
Quorum:           2 Activity blocked
Flags:

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.40.250 (local)
 pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 noda0 (local)

On the second node:

 pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         2          1 noda1 (local)
root@noda1:/home/kresh#  pvecm status
Quorum information
------------------
Date:             Fri Apr  7 13:30:00 2017
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000002
Ring ID:          2/11580
Quorate:          No

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      1
Quorum:           2 Activity blocked
Flags:

Membership information
----------------------
    Nodeid      Votes Name
0x00000002          1 192.168.40.240 (local)

Please help me get this configured.


tail -f /var/log/syslog
Apr  7 13:58:42 noda1 pmxcfs[5307]: [status] notice: update cluster info (cluster name  kell, version = 2)
Apr  7 13:58:42 noda1 pmxcfs[5307]: [dcdb] notice: members: 2/5307
Apr  7 13:58:42 noda1 pmxcfs[5307]: [dcdb] notice: all data is up to date
Apr  7 13:58:42 noda1 pmxcfs[5307]: [status] notice: members: 2/5307
Apr  7 13:58:42 noda1 pmxcfs[5307]: [status] notice: all data is up to date
Apr  7 13:58:43 noda1 pve-ha-crm[1335]: ipcc_send_rec failed: Transport endpoint is not connected
Apr  7 13:58:43 noda1 pve-ha-crm[1335]: ipcc_send_rec failed: Connection refused
Apr  7 13:58:43 noda1 pve-ha-crm[1335]: ipcc_send_rec failed: Connection refused
Apr  7 13:58:43 noda1 systemd[1]: Started The Proxmox VE cluster filesystem.
Apr  7 13:58:45 noda1 pvestatd[1309]: ipcc_send_rec failed: Transport endpoint is not connected
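
The `pvecm status` output above already accounts for the "Activity blocked" line: with two expected votes, a majority means both votes, so either node on its own is not quorate. A minimal sketch of that arithmetic, assuming one vote per node as in the output (the function names `quorum_votes` and `is_quorate` are made up for illustration):

```shell
#!/bin/sh
# Majority quorum when every node carries one vote:
# strictly more than half of the expected votes must be present.
quorum_votes() {
    expected=$1
    echo $(( expected / 2 + 1 ))
}

# Succeeds (exit status 0) if a partition holding `present` votes
# out of `expected` is quorate.
is_quorate() {
    present=$1
    expected=$2
    [ "$present" -ge "$(quorum_votes "$expected")" ]
}

# Two-node cluster from the post: quorum is 2, one node alone has 1 vote.
quorum_votes 2
is_quorate 1 2 && echo "quorate" || echo "not quorate"
```

For the two-node case this prints `2` and `not quorate`, which matches `Quorum: 2 Activity blocked` on both nodes once they stop seeing each other.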
kresh1 (topic author)
Reply to: comment by Deleted

The time is identical on both nodes. I suspect the cause may be that I created the cluster first and only afterwards configured the bridge on the second node.

kresh1 (topic author)
Reply to: comment by kresh1

It is definitely the network settings. I created the cluster before configuring the network and it worked; as soon as I created a vmbr, the node dropped out; I reverted everything and it worked again.
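
If the bridge is the trigger, one plausible mechanism (an assumption, not something the logs in this thread prove) is multicast snooping on the bridge with no IGMP querier on the network: corosync's multicast traffic passes at first and is filtered once the group membership times out. A small sketch for inspecting the two relevant bridge flags; `bridge_mcast_state` and the `SYSFS_NET` override are hypothetical names used only here:

```shell
#!/bin/sh
# Print the multicast snooping and querier flags of a Linux bridge.
# SYSFS_NET is a hypothetical override so the function can be tried
# against a scratch directory; by default it reads the real sysfs tree.
bridge_mcast_state() {
    bridge=$1
    base="${SYSFS_NET:-/sys/devices/virtual/net}/$bridge/bridge"
    for flag in multicast_snooping multicast_querier; do
        if [ -r "$base/$flag" ]; then
            echo "$flag=$(cat "$base/$flag")"
        else
            echo "$flag=unavailable"
        fi
    done
}

# On a node, inspect the bridge named in this thread.
bridge_mcast_state vmbr0
```

A result of `multicast_snooping=1` together with `multicast_querier=0` would be consistent with a node that joins fine and falls out a few minutes later.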

kresh1 (topic author)
Reply to: comment by kresh1

I recreated the cluster with the same two nodes. Right after creation everything looks fine, but a few minutes later one of the nodes drops out. Log on the first node:

[TOTEM ] Retransmit List: 47d 47e 47f
Apr 10 11:46:03 noda0 corosync[3178]:  [TOTEM ] Retransmit List: 47d 47e 47f
Apr 10 11:46:03 noda0 corosync[3178]:  [TOTEM ] Retransmit List: 47d 47e 47f
Apr 10 11:46:03 noda0 corosync[3178]:  [TOTEM ] A new membership (192.168.40.250:30920) was formed. Members left: 2
Apr 10 11:46:03 noda0 corosync[3178]:  [TOTEM ] Failed to receive the leave message. failed: 2
Apr 10 11:46:03 noda0 pmxcfs[3191]: [dcdb] notice: members: 1/3191
Apr 10 11:46:03 noda0 corosync[3178]:  [QUORUM] This node is within the non-primary component and will NOT provide any services.
Apr 10 11:46:03 noda0 corosync[3178]:  [QUORUM] Members[1]: 1
Apr 10 11:46:03 noda0 corosync[3178]:  [MAIN  ] Completed service synchronization, ready to provide service.
Apr 10 11:46:03 noda0 pmxcfs[3191]: [status] notice: members: 1/3191
Apr 10 11:46:03 noda0 pmxcfs[3191]: [status] notice: node lost quorum
Log on the second node:
 ipcc_send_rec failed: Transport endpoint is not connected
Apr 10 11:42:03 noda01 pvedaemon[9702]: <root@pam> successful auth for user 'root@pam'
Apr 10 11:46:04 noda01 corosync[9657]:  [TOTEM ] FAILED TO RECEIVE
Apr 10 11:46:05 noda01 corosync[9657]:  [TOTEM ] A new membership (192.168.40.240:30920) was formed. Members left: 1
Apr 10 11:46:05 noda01 corosync[9657]:  [TOTEM ] Failed to receive the leave message. failed: 1
Apr 10 11:46:05 noda01 pmxcfs[9673]: [dcdb] notice: members: 2/9673
Apr 10 11:46:05 noda01 pmxcfs[9673]: [status] notice: members: 2/9673
Apr 10 11:46:05 noda01 corosync[9657]:  [QUORUM] This node is within the non-primary component and will NOT provide any services.
Apr 10 11:46:05 noda01 corosync[9657]:  [QUORUM] Members[1]: 2
Apr 10 11:46:05 noda01 corosync[9657]:  [MAIN  ] Completed service synchronization, ready to provide service.
Apr 10 11:46:05 noda01 pmxcfs[9673]: [status] notice: node lost quorum
Apr 10 11:46:05 noda01 pmxcfs[9673]: [dcdb] crit: received write while not quorate - trigger resync
Apr 10 11:46:05 noda01 pmxcfs[9673]: [dcdb] crit: leaving CPG group
Apr 10 11:46:05 noda01 pve-ha-lrm[1308]: unable to write lrm status file - unable to open file '/etc/pve/nodes/noda01/lrm_status.tmp.1308' - Permission denied
Apr 10 11:46:05 noda01 pmxcfs[9673]: [dcdb] notice: start cluster connection
On both nodes:

 journalctl -u corosync.service -u pve-cluster.service -b
-- Logs begin at Mon 2017-04-10 10:38:33 MSK, end at Mon 2017-04-10 11:46:05 MSK. --
Apr 10 10:38:52 noda01 systemd[1]: Starting The Proxmox VE cluster filesystem...
Apr 10 10:38:55 noda01 systemd[1]: Started The Proxmox VE cluster filesystem.
Apr 10 10:38:55 noda01 systemd[1]: Started Corosync Cluster Engine.
Apr 10 10:41:25 noda01 pmxcfs[1263]: [main] notice: teardown filesystem
Apr 10 10:41:25 noda01 systemd[1]: Stopping The Proxmox VE cluster filesystem...
Apr 10 10:41:35 noda01 systemd[1]: pve-cluster.service stop-sigterm timed out. Killing.
Apr 10 10:41:35 noda01 systemd[1]: pve-cluster.service: main process exited, code=killed, status=9/KILL
Apr 10 10:41:35 noda01 systemd[1]: Stopped The Proxmox VE cluster filesystem.
Apr 10 10:41:35 noda01 systemd[1]: Unit pve-cluster.service entered failed state.
Apr 10 10:41:36 noda01 systemd[1]: Starting The Proxmox VE cluster filesystem...
Apr 10 10:41:36 noda01 pmxcfs[1545]: [quorum] crit: quorum_initialize failed: 2
Apr 10 10:41:36 noda01 pmxcfs[1545]: [quorum] crit: can't initialize service
Apr 10 10:41:36 noda01 pmxcfs[1545]: [confdb] crit: cmap_initialize failed: 2
Apr 10 10:41:36 noda01 pmxcfs[1545]: [confdb] crit: can't initialize service
Apr 10 10:41:36 noda01 pmxcfs[1545]: [dcdb] crit: cpg_initialize failed: 2
Apr 10 10:41:36 noda01 pmxcfs[1545]: [dcdb] crit: can't initialize service
Apr 10 10:41:36 noda01 pmxcfs[1545]: [status] crit: cpg_initialize failed: 2
Apr 10 10:41:36 noda01 pmxcfs[1545]: [status] crit: can't initialize service
Apr 10 10:41:37 noda01 systemd[1]: Started The Proxmox VE cluster filesystem.
Apr 10 10:41:37 noda01 systemd[1]: Starting Corosync Cluster Engine...
Apr 10 10:41:38 noda01 corosync[1568]: [MAIN  ] Corosync Cluster Engine ('2.4.2'): started and ready to provide service.
Apr 10 10:41:38 noda01 corosync[1568]: [MAIN  ] Corosync built-in features: augeas systemd pie relro bindnow
If I run the following on both nodes:
systemctl restart corosync.service 
systemctl restart pve-cluster.service
systemctl restart pvedaemon.service
systemctl restart pveproxy.service

the cluster works until the first synchronization. A bridge is configured on the nodes.
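
To see how long the cluster holds after each restart, it may help to tally the TOTEM messages quoted in this thread instead of reading the journal by eye. A rough sketch, assuming the message wording shown in the logs above; `totem_summary` is a hypothetical helper, not part of corosync or Proxmox:

```shell
#!/bin/sh
# Count corosync TOTEM trouble indicators in a log read from stdin.
totem_summary() {
    awk '
        /\[TOTEM ?\] Retransmit List/   { retrans++ }
        /\[TOTEM ?\] FAILED TO RECEIVE/ { failed++ }
        /\[TOTEM ?\] A new membership/  { membership++ }
        END {
            printf "retransmits=%d failed_to_receive=%d membership_changes=%d\n",
                   retrans, failed, membership
        }'
}

# Demo on three lines taken from the first post.
totem_summary <<'EOF'
Apr 07 13:17:50 noda1 corosync[1289]: [TOTEM ] Retransmit List: 4dd
Apr 07 13:17:50 noda1 corosync[1289]: [TOTEM ] Retransmit List: 4dc
Apr 07 13:17:50 noda1 corosync[1289]: [TOTEM ] FAILED TO RECEIVE
EOF
```

On a node the same function can be fed from the journal, for example `journalctl -u corosync.service -b | totem_summary`. A growing retransmit count followed by `FAILED TO RECEIVE` points at lost cluster traffic between the nodes, not at the services themselves.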

kresh1 (topic author)
August 15, 2017
October 9, 2017
Reply to: comment by morfair

echo 1 > /sys/devices/virtual/net/vmbr0/bridge/multicast_querier
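
A slightly more defensive form of the same command, as a sketch: it writes the flag, reads it back, and fails loudly when the bridge does not exist or is not writable. The function name `enable_mcast_querier` and the `SYSFS_NET` override are hypothetical; the sysfs path is the one given in this comment.

```shell
#!/bin/sh
# Turn on the IGMP querier for a bridge and read the value back.
# SYSFS_NET is a hypothetical override for trying this outside a real host.
enable_mcast_querier() {
    bridge=$1
    file="${SYSFS_NET:-/sys/devices/virtual/net}/$bridge/bridge/multicast_querier"
    if [ ! -w "$file" ]; then
        echo "cannot write $file (no such bridge, or not root?)" >&2
        return 1
    fi
    echo 1 > "$file"
    echo "$bridge multicast_querier=$(cat "$file")"
}

# On a node (as root):
#   enable_mcast_querier vmbr0
#
# The sysfs value does not survive a reboot. One way to persist it,
# assuming the bridge is defined in /etc/network/interfaces, is a
# post-up line inside the vmbr0 stanza:
#   post-up ( echo 1 > /sys/devices/virtual/net/$IFACE/bridge/multicast_querier )
```

The setting needs to be applied on every node whose corosync traffic goes through a bridge; whether it resolves the dropouts described in this thread was not confirmed by the topic author.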

anonymous