Zone transfer to OSAS DNS broken from PHX

Description

I've been experiencing issues with hostnames showing outdated IPs recently in PHX.

It looks like zone transfer is broken between foreman.phx.ovirt.org and the OSAS DNS servers, specifically ns1.osci.io and ns2.osci.io, which are listed as authoritative for this zone.

Example:

dig staging-shift-int.phx.ovirt.org @foreman.phx.ovirt.org

;; ANSWER SECTION:
staging-shift-int.phx.ovirt.org. 86400 IN A 38.145.50.70

dig staging-shift-int.phx.ovirt.org @ns1.osci.io
;; ANSWER SECTION:
staging-shift-int.phx.ovirt.org. 86400 IN A 66.187.230.36

dig staging-shift-int.phx.ovirt.org @ns2.osci.io
;; ANSWER SECTION:
staging-shift-int.phx.ovirt.org. 86400 IN A 66.187.230.36

The correct IP is 38.145.50.70 in this case. The name server in PHX has already been restarted and a re-transfer initiated several times, yet the changes are still not visible on OSAS DNS, which is blocking us from rebuilding the staging OpenShift environment.

Activity

Emil Natan
January 14, 2020, 4:05 PM

I do not think zone transfer is broken. All 3 NSes return the same SOA for the zone, so it looks like they are in sync:

dig soa phx.ovirt.org @ns01.phx.ovirt.org. +short
ns01.phx.ovirt.org. admin.phx.ovirt.org. 8192168 28800 3600 604800 86400

dig soa phx.ovirt.org @ns1.osci.io +short
ns01.phx.ovirt.org. admin.phx.ovirt.org. 8192168 28800 3600 604800 86400

dig soa phx.ovirt.org @ns2.osci.io +short
ns01.phx.ovirt.org. admin.phx.ovirt.org. 8192168 28800 3600 604800 86400

You can try bumping the SOA serial on the master, check that it is replicated to both slaves (all should show the same new serial), and see whether the problem persists. If it does, then something is probably intercepting the query and answering from a faulty cache (and lying about the TTL and the AA flag).
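
The check described above could be scripted roughly as follows. This is only a sketch: the helper names (soa_serial, has_aa, compare_serials) are made up for illustration, and it assumes dig and awk are available on the machine running it.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any existing tooling.

# Extract the serial (third field) from one line of `dig soa ... +short` output.
soa_serial() {
    printf '%s\n' "$1" | awk '{ print $3 }'
}

# Succeed if a dig header line (";; flags: ...") carries the "aa" flag,
# i.e. the answer claims to be authoritative.
has_aa() {
    printf '%s\n' "$1" | grep -q 'flags:[^;]* aa[ ;]'
}

# Print "server serial" for every name server given after the zone name.
# Nothing is queried until this function is called.
compare_serials() {
    zone=$1
    shift
    for ns in "$@"; do
        line=$(dig soa "$zone" "@$ns" +short)
        printf '%s %s\n' "$ns" "$(soa_serial "$line")"
    done
}
```

For example, `compare_serials phx.ovirt.org ns01.phx.ovirt.org ns1.osci.io ns2.osci.io` should list the same serial three times once the bumped SOA has replicated. If the serials match but a record still differs, the flags line of a plain dig against each server can be passed to has_aa to see whether the answer is marked authoritative.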

Marc Dequènes (Duck)
January 14, 2020, 4:33 PM

Jan 14 10:23:44 polly.osci.io named[4118]: client @0x7f4623203d30 66.187.230.11#20729/key osci__ovirt_phx: received notify for zone 'phx.ovirt.org'
Jan 14 10:23:44 polly.osci.io named[4118]: zone phx.ovirt.org/IN: notify from 66.187.230.11#20729: zone is up to date

Now, from the outside I get 66.187.230.36 too. In the shell history I see “vi /etc/named/phx.ovirt.org.zone”, but this zone is only available internally; you need to update /etc/named/external-view/phx.ovirt.org.zone for the change to reach the rest of the universe.

Evgheni Dereveanchin
January 15, 2020, 10:13 AM

Some more background: BIND in PHX has two views, an external one and an internal one. This is because of PTR subnets: foreman can only update /24 PTR zones, and we’ve got a /25, which is addressed differently.

nsupdate runs against the internal view of phx.ovirt.org, which is then transferred to the external one. From there the transfer to OSAS is done.

I did some manual edits to the internal zone file, but they are already reflected in the external zone, so I am not sure why the data didn’t propagate to OSAS even though the zone serial seems to be the same.

Moreover, I’ve just run nsupdate to add a test record “staging-shift-test.phx.ovirt.org“ and it appeared on the OSAS servers immediately, while “staging-shift.phx.ovirt.org” and “staging-shift-int.phx.ovirt.org“ still show up as 66.187.230.36 even though I've already run them through nsupdate as well.

Evgheni Dereveanchin
January 15, 2020, 10:20 AM

OK, after some massaging of the zone using “nsupdate delete“ and “nsupdate add“, I think I was able to get the records to show up properly on about the third try. The main issue is resolved, but I’d still like to know how a zone transfer can be forced when manual edits are made to the zone.
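
For reference, one common way to handle manual edits on a dynamically updated BIND zone is to freeze it, edit the file and increase the SOA serial, thaw it, and then send a notify. A secondary can also be asked to re-fetch the zone with "rndc retransfer". The sketch below only prints the commands instead of running them. The function names are hypothetical and the view name is an assumption; it has to match whatever the views are called in the local named.conf.

```shell
#!/bin/sh
# Hypothetical dry-run helpers: they print rndc commands, they do not run them.

# Steps to run on the primary for a manual edit of a dynamic zone.
# $1 = zone name, $2 = view name (assumed; check named.conf for the real one).
manual_edit_steps() {
    zone=$1
    view=$2
    cat <<EOF
rndc freeze $zone IN $view
# edit the zone file by hand and increase the SOA serial
rndc thaw $zone IN $view
rndc notify $zone IN $view
EOF
}

# Command to run on a secondary to discard its copy and transfer the zone again.
secondary_retransfer_cmd() {
    printf 'rndc retransfer %s\n' "$1"
}
```

Running `manual_edit_steps phx.ovirt.org external` prints the sequence for the external view. Without a serial increase the secondaries will answer a notify with "zone is up to date", as in the log above, so the serial bump is the step that matters most.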

Marc Dequènes (Duck)
March 19, 2020, 2:51 PM

said it must be due to some manual manipulation that went wrong but things are fine now, thus closing.

Fixed

Assignee

Marc Dequènes (Duck)

Reporter

Evgheni Dereveanchin

Blocked By

None

Priority

Medium