rabbitmqctl join_cluster: Error on AMQP "broker forced connection closure with reason 'shutdown'"

 


I am installing AWX with the Ansible playbook https://github.com/nizarlazuardy/deploy_awx-rpm

The playbook fails with this error:

TASK [nodes_join : Create RabbitMQ cluster] ******************************************************************************************************************************************************************
fatal: [172.26.9.173]: FAILED! => {
    "changed": true, 
    "cmd": "rabbitmqctl stop_app\nrabbitmqctl join_cluster rabbit@\"172.26.9.172\"\n", 
    "delta": "0:00:01.755605", 
    "end": "2020-03-02 10:01:41.939947", 
    "rc": 69, 
    "start": "2020-03-02 10:01:40.184342"
}

STDOUT:

Stopping rabbit application on node rabbit@awx-apatsev-3 ...
Clustering node rabbit@awx-apatsev-3 with rabbit@172.26.9.172


STDERR:

Error:
{:badrpc_multi, {:EXIT, {{:function_clause, [{:gen, :do_for_proc, [{:rex, {:error, {:node_name, :short}}}, #Function<0.131893493/1 in :gen.call/4>], [file: 'gen.erl', line: 220]}, {:gen_server, :call, 3, [file: 'gen_server.erl', line: 219]}, {:rpc, :do_call, 3, [file: 'rpc.erl', line: 327]}, {:lists, :foldl, 3, [file: 'lists.erl', line: 1263]}, {:rabbit_mnesia, :discover_cluster, 1, [file: 'src/rabbit_mnesia.erl', line: 744]}, {:rabbit_mnesia, :join_cluster, 2, [file: 'src/rabbit_mnesia.erl', line: 233]}, {:rpc, :"-handle_call_call/6-fun-0-", 5, [file: 'rpc.erl', line: 197]}]}, {:gen_server, :call, [{:rex, {:error, {:node_name, :short}}}, {:call, :rabbit_mnesia, :cluster_status_from_mnesia, [], #PID<0.62.0>}, :infinity]}}}, [error: {:node_name, :short}]}


MSG:

non-zero return code

fatal: [172.26.9.176]: FAILED! => {
    "changed": true, 
    "cmd": "rabbitmqctl stop_app\nrabbitmqctl join_cluster rabbit@\"172.26.9.172\"\n", 
    "delta": "0:00:01.792304", 
    "end": "2020-03-02 10:01:41.975966", 
    "rc": 69, 
    "start": "2020-03-02 10:01:40.183662"
}

STDOUT:

Stopping rabbit application on node rabbit@awx-apatsev-4 ...
Clustering node rabbit@awx-apatsev-4 with rabbit@172.26.9.172


STDERR:

Error:
{:badrpc_multi, {:EXIT, {{:function_clause, [{:gen, :do_for_proc, [{:rex, {:error, {:node_name, :short}}}, #Function<0.131893493/1 in :gen.call/4>], [file: 'gen.erl', line: 220]}, {:gen_server, :call, 3, [file: 'gen_server.erl', line: 219]}, {:rpc, :do_call, 3, [file: 'rpc.erl', line: 327]}, {:lists, :foldl, 3, [file: 'lists.erl', line: 1263]}, {:rabbit_mnesia, :discover_cluster, 1, [file: 'src/rabbit_mnesia.erl', line: 744]}, {:rabbit_mnesia, :join_cluster, 2, [file: 'src/rabbit_mnesia.erl', line: 233]}, {:rpc, :"-handle_call_call/6-fun-0-", 5, [file: 'rpc.erl', line: 197]}]}, {:gen_server, :call, [{:rex, {:error, {:node_name, :short}}}, {:call, :rabbit_mnesia, :cluster_status_from_mnesia, [], #PID<0.62.0>}, :infinity]}}}, [error: {:node_name, :short}]}


MSG:

non-zero return code

Node awx-apatsev-2 shows no errors. Node awx-apatsev-4 fails with the same errors as awx-apatsev-3.

Searching the logs for errors:

cat rabbit@awx-apatsev-3.log  | grep -A 1 error
2020-03-02 09:59:47.078 [error] <0.509.0> Error on AMQP connection <0.509.0> (127.0.0.1:45876 -> 127.0.0.1:5672, vhost: '/', user: 'guest', state: running), channel 0:
 operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
2020-03-02 09:59:47.078 [error] <0.526.0> Error on AMQP connection <0.526.0> (127.0.0.1:45884 -> 127.0.0.1:5672, vhost: '/', user: 'guest', state: running), channel 0:
 operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
2020-03-02 09:59:47.078 [error] <0.529.0> Error on AMQP connection <0.529.0> (127.0.0.1:45886 -> 127.0.0.1:5672, vhost: '/', user: 'guest', state: running), channel 0:
 operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
--
2020-03-02 09:59:58.807 [error] <0.453.0> ** Connection attempt from disallowed node 'rabbitmqcli-19480-rabbit@awx-apatsev-3' ** 
2020-03-02 09:59:59.224 [error] <0.455.0> ** Connection attempt from disallowed node 'rabbitmqcli-19480-rabbit@awx-apatsev-3' ** 
2020-03-02 10:01:30.814 [info] <0.8.0> Log file opened with Lager
--
2020-03-02 10:01:41.094 [error] <0.416.0> Error on AMQP connection <0.416.0> (127.0.0.1:46218 -> 127.0.0.1:5672, vhost: '/', user: 'guest', state: running), channel 0:
 operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"

Errors in /var/log/messages:

2020-03-02 09:57:01,960 ERROR    awx.conf.settings Database settings are not available, using defaults.
Traceback (most recent call last):
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
psycopg2.errors.UndefinedTable: relation "conf_setting" does not exist
LINE 1: ...f_setting"."value", "conf_setting"."user_id" FROM "conf_sett...
^
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/awx/conf/settings.py", line 87, in _ctit_db_wrapper
yield
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/awx/conf/settings.py", line 415, in __getattr__
value = self._get_local(name)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/awx/conf/settings.py", line 331, in _get_local
self._preload_cache()
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/awx/conf/settings.py", line 293, in _preload_cache
for setting in Setting.objects.filter(key__in=settings_to_cache.keys(), user__isnull=True).order_by('pk'):
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/django/db/models/query.py", line 274, in __iter__
self._fetch_all()
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/django/db/models/query.py", line 1242, in _fetch_all
self._result_cache = list(self._iterable_class(self))
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/django/db/models/query.py", line 55, in __iter__
results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 1133, in execute_sql
cursor.execute(sql, params)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/django/db/backends/utils.py", line 67, in execute
return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/django/db/backends/utils.py", line 76, in _execute_with_wrappers
return executor(sql, params, many, context)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
django.db.utils.ProgrammingError: relation "conf_setting" does not exist
LINE 1: ...f_setting"."value", "conf_setting"."user_id" FROM "conf_sett...

And here is another error from /var/log/messages:

2020-03-02 10:02:01,191 WARNING  kombu.mixins Broker connection error, trying again in 10.0 seconds: ConnectionRefusedError(111, 'Connection refused').
Traceback (most recent call last):
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/amqp/transport.py", line 137, in _connect
host, port, family, socket.SOCK_STREAM, SOL_TCP)
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/socket.py", line 745, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -9] Address family for hostname not supported
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/kombu/utils/functional.py", line 344, in retry_over_time
return fun(*args, **kwargs)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/kombu/connection.py", line 283, in connect
return self.connection
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/kombu/connection.py", line 839, in connection
self._connection = self._establish_connection()
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/kombu/connection.py", line 794, in _establish_connection
conn = self.transport.establish_connection()
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/kombu/transport/pyamqp.py", line 130, in establish_connection
conn.connect()
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/amqp/connection.py", line 311, in connect
self.transport.connect()
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/amqp/transport.py", line 77, in connect
self._connect(self.host, self.port, self.connect_timeout)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/amqp/transport.py", line 148, in _connect
"failed to resolve broker hostname"))
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/amqp/transport.py", line 161, in _connect
self.sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
Traceback (most recent call last):
File "/opt/rh/rh-python36/root/usr/bin/awx-manage", line 11, in <module>
load_entry_point('awx==9.2.0', 'console_scripts', 'awx-manage')()
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/awx/__init__.py", line 152, in manage
execute_from_command_line(sys.argv)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
utility.execute()
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
self.execute(*args, **cmd_options)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/django/core/management/base.py", line 364, in execute
output = self.handle(*args, **options)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/awx/main/management/commands/run_dispatcher.py", line 58, in handle
reaper.reap()
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/awx/main/dispatch/reaper.py", line 38, in reap
(changed, me) = Instance.objects.get_or_register()
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/awx/main/managers.py", line 134, in get_or_register
return (False, self.me())
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/awx/main/managers.py", line 116, in me
raise RuntimeError("No instance found with the current cluster host id")
RuntimeError: No instance found with the current cluster host id

How can I fix this error?

If I try to join the node to the RabbitMQ cluster manually, I get this error:

rabbitmqctl join_cluster rabbit@"172.26.9.172"
Clustering node rabbit@awx-apatsev-3 with rabbit@172.26.9.172
Error:
{:badrpc_multi, {:EXIT, {{:function_clause, [{:gen, :do_for_proc, [{:rex, {:error, {:node_name, :short}}}, #Function<0.131893493/1 in :gen.call/4>], [file: 'gen.erl', line: 220]}, {:gen_server, :call, 3, [file: 'gen_server.erl', line: 219]}, {:rpc, :do_call, 3, [file: 'rpc.erl', line: 327]}, {:lists, :foldl, 3, [file: 'lists.erl', line: 1263]}, {:rabbit_mnesia, :discover_cluster, 1, [file: 'src/rabbit_mnesia.erl', line: 744]}, {:rabbit_mnesia, :join_cluster, 2, [file: 'src/rabbit_mnesia.erl', line: 233]}, {:rpc, :"-handle_call_call/6-fun-0-", 5, [file: 'rpc.erl', line: 197]}]}, {:gen_server, :call, [{:rex, {:error, {:node_name, :short}}}, {:call, :rabbit_mnesia, :cluster_status_from_mnesia, [], #PID<0.62.0>}, :infinity]}}}, [error: {:node_name, :short}]}

I added the node names from the inventory to the hosts file. The error went away.

https://gist.github.com/glennswest/6e43aa88f3de0a0cf4ecf00749a91fa1
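
For anyone hitting the same error, the hosts-file fix could look roughly like the sketch below. The pairs 172.26.9.173 / awx-apatsev-3 and 172.26.9.176 / awx-apatsev-4 come from the playbook output above; mapping 172.26.9.172 to awx-apatsev-2 is an assumption, and `hosts_entries` is only an illustrative helper name.

```shell
# Print /etc/hosts lines that map each node IP to its short hostname.
# ASSUMPTION: 172.26.9.172 is awx-apatsev-2; the other two pairs are
# taken from the playbook output.
hosts_entries() {
    cat <<'EOF'
172.26.9.172 awx-apatsev-2
172.26.9.173 awx-apatsev-3
172.26.9.176 awx-apatsev-4
EOF
}

# Show the entries so they can be reviewed first.
hosts_entries

# To apply on each node (as root), after reviewing the output:
#   hosts_entries >> /etc/hosts
```

The point is that every node should be able to resolve the short hostname of every other node, since that short name is what appears after `rabbit@` in the node names.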

Look: unless you configure it otherwise, RabbitMQ names its node "rabbit@<short hostname>" at startup.

And it is very attached to that name.

An attempt to join a cluster under any other name, including the FQDN, will fail.

Either rework the rules in the playbook,

or reconfigure RabbitMQ.

I would rework the playbook.
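
A minimal sketch of what the reworked join step might run, assuming the seed node's short hostname is awx-apatsev-2 (an assumption; only its IP, 172.26.9.172, appears in the output above). The `{:node_name, :short}` part of the error suggests the node runs with short names, where a host part containing dots, such as an IP address, is not accepted. The helper names `short_node_name` and `join_commands` are hypothetical.

```shell
# Build a default-style node name, rabbit@<short hostname>, from a hostname.
# Expects a hostname (short or FQDN), not an IP address.
short_node_name() {
    # ${1%%.*} drops everything from the first dot onward.
    printf 'rabbit@%s\n' "${1%%.*}"
}

# Print the rabbitmqctl commands for joining the local node to a seed node.
# Printing instead of executing lets you review them before running as root.
join_commands() {
    printf 'rabbitmqctl stop_app\n'
    printf 'rabbitmqctl join_cluster %s\n' "$(short_node_name "$1")"
    printf 'rabbitmqctl start_app\n'
}

# ASSUMPTION: awx-apatsev-2 is the hostname of the seed node.
join_commands awx-apatsev-2
```

The short name still has to resolve on the joining node, through DNS or the hosts file, for the join to succeed.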

P.S. There is also the cluster_partition_handling parameter; because of it, the second node can drop out when the first one is stopped.
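
As a rough illustration, in the new-style rabbitmq.conf format (RabbitMQ 3.7 and later) this parameter is a single line. The sketch below only prints that line for a chosen mode; the helper name `partition_handling_line` is hypothetical, `pause_minority` is picked purely as an example, and `pause_if_all_down` is left out because it needs additional settings.

```shell
# Print the rabbitmq.conf line for a given partition handling mode.
# Accepts only modes that need no extra settings; the default is "ignore".
partition_handling_line() {
    case "$1" in
        ignore|pause_minority|autoheal)
            printf 'cluster_partition_handling = %s\n' "$1"
            ;;
        *)
            echo "unsupported mode: $1" >&2
            return 1
            ;;
    esac
}

# Example only: which mode is appropriate depends on the cluster.
partition_handling_line pause_minority
```

The printed line would go into rabbitmq.conf (typically /etc/rabbitmq/rabbitmq.conf on Linux packages) on every node, followed by a restart of the node.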

pkuutn ()
Last edited by: pkuutn (2 edits in total)