[08:29:45] we had a spike of traffic on db1105 s1, maybe s2. We should keep an eye on that host in case it is a server/host issue
[08:30:12] https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1105&var-port=13311&from=1568349279686&to=1568363222394
[09:39:30] you probably are aware already, but just in case: there was a regression reported to the Debian BTS for something breaking MariaDB replication, introduced in 10.1.41 and 10.3.17
[09:39:32] https://jira.mariadb.org/browse/MDEV-20247
[09:39:37] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=939866
[09:39:41] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=939819
[09:40:16] not sure if this actually affects our setup, but they made a regression release apparently
[09:40:47] https://packages.qa.debian.org/m/mariadb-10.3/news/20190913T090048Z.html
[09:41:36] (I noticed that db1074.eqiad.wmnet has 10.1.41-1)
[09:42:13] thanks for the pinch, checking
[09:42:28] moritzm: we do very slow upgrades, as you know
[09:42:38] those are the reasons :-D
[09:43:06] that's a good thing! The bug in Debian is also quite fresh
[09:43:52] and https://jira.mariadb.org/browse/MDEV-20247 is actually reported from a Devuan system...
[09:44:07] it seems fixed on .42, according to devs
[09:44:40] moritzm: I know you are behind us on doing upgrades, but things like that are why we go slow, and I think you understand
[09:44:59] you can only break your data once :-)
[09:45:10] on the good side, our monitoring would have caught that
[09:45:53] I wonder if that is caused by the forced checks and upgrade that the Debian package does
[09:45:58] on start
[09:46:02] which we don't like
[09:46:07] and why we disable those
[09:46:48] moritzm: we have 10.3.17 too
[09:47:12] I will send an email to manuel so he is aware
[09:47:17] that was a nice ping, moritzm
[09:53:42] totally understood wrt upgrade frequency for sure!
the report to the JIRA is from a Devuan system and the reporter states "since you no longer provide an init script and expect everyone is running systemd, the init script from a previous version has been take", so I read that as affecting the init script provided by mariadb upstream and that the reporter doesn't use a deb, but a local installation
[17:39:58] DBA, AbuseFilter, MW-1.34-notes (1.34.0-wmf.1; 2019-04-16), Wikimedia-production-error: Read timeout reached while viewing AbuseLog - https://phabricator.wikimedia.org/T221357 (Daimona) Open→Resolved The long-term goal is T220791, this issue was resolved, hence closing.
[18:15:55] DBA, Phabricator, Release-Engineering-Team-TODO, Documentation, and 2 others: Make PHD run on the backup phabricator server (phab2001, currently) - https://phabricator.wikimedia.org/T232883 (mmodell)
[19:11:17] DBA, Phabricator, Release-Engineering-Team-TODO, Release-Engineering-Team (Development services): Switch phabricator production to codfw - https://phabricator.wikimedia.org/T164810 (Dzahn) Open→Stalled
[19:22:28] DBA, Phabricator, Release-Engineering-Team-TODO, Release-Engineering-Team (Development services): Switch phabricator production to codfw - https://phabricator.wikimedia.org/T164810 (Dzahn)
[21:18:25] DBA, Operations, ops-eqiad: db1114 crashed due to memory issues (server under warranty) - https://phabricator.wikimedia.org/T229452 (wiki_willy) @Cmjohnson or @Jclark-ctr - can one of you guys check this out early next week? Thanks, Willy