[06:23:27] DBA, Cloud-Services: Prepare and check storage layer for ngwikimedia - https://phabricator.wikimedia.org/T240772 (Marostegui) p:Triage→Normal Let us know when the wiki is created so we can sanitize its data
[06:37:53] DBA, Operations, Wikimedia-Incident: Disallow 'weight: 0' for MW db config in dbctl - https://phabricator.wikimedia.org/T239901 (Marostegui) >>! In T239901#5739472, @Krinkle wrote: > > Is there a use case for having a replica only listed in "general" with weight 0? (As opposed to the lowest weight o...
[06:41:01] Blocked-on-schema-change, DBA, Core Platform Team: Schema change for refactored actor and comment storage - https://phabricator.wikimedia.org/T233135 (Marostegui)
[07:06:36] DBA: db1130 BBU possible issues - https://phabricator.wikimedia.org/T240823 (Marostegui) p:Triage→Normal
[07:28:39] DBA: db1130 BBU possible issues - https://phabricator.wikimedia.org/T240823 (Marostegui) Operating on db[1126-1138].eqiad.wmnet ` [07:22:35] marostegui@cumin1001:~$ sudo cumin db11[26-38].eqiad.wmnet 'megacli -AdpBbuCmd -a0 | grep "Auto-Learn"' 13 hosts will be targeted: db[1126-1138].eqiad.wmnet Confirm t...
[07:37:09] DBA: db1130 BBU possible issues - https://phabricator.wikimedia.org/T240823 (Marostegui)
[07:41:24] DBA: db1130 BBU possible issues - https://phabricator.wikimedia.org/T240823 (Marostegui) Same on codfw: ` [07:37:39] marostegui@cumin1001:~$ sudo cumin db21[03-35].codfw.wmnet 'echo "autoLearnMode=1" > /tmp/disable_learn && sudo megacli -AdpBbuCmd -SetBbuProperties -f /tmp/disable_learn -a0' 33 hosts will be...
[07:41:53] DBA: Remove ar_comment from sanitarium triggers - https://phabricator.wikimedia.org/T234704 (Marostegui)
[08:53:19] DBA, Patch-For-Review: Remove ar_comment from sanitarium triggers - https://phabricator.wikimedia.org/T234704 (Marostegui)
[08:53:27] Blocked-on-schema-change, DBA, Core Platform Team: Schema change for refactored actor and comment storage - https://phabricator.wikimedia.org/T233135 (Marostegui)
[08:53:29] DBA, Patch-For-Review: Remove ar_comment from sanitarium triggers - https://phabricator.wikimedia.org/T234704 (Marostegui) Open→Resolved All done
[09:13:23] DBA, Data-Services: Reimport wikidatawiki.{pagelinks,page} on labsdb1010 - https://phabricator.wikimedia.org/T238399 (Marostegui) >>! In T238399#5743379, @Stashbot wrote: > {nav icon=file, name=Mentioned in SAL (#wikimedia-operations), href=https://tools.wmflabs.org/sal/log/AW8NjFK9fYQT6VcDIe0t} [2019-12...
[11:53:55] db1130 MegaRAID CRITICAL 2019-12-16 11:52:39 0d 0h 0m 24s 1/3 CRITICAL: 1 LD(s) must have write cache policy WriteBack, currently using: WriteThrough
[11:54:06] marostegui: important?
[11:55:39] I may depool it just in case
[11:58:09] i created a task for it
[11:58:10] already
[11:58:13] ah
[11:58:18] I am about to depool it
[11:58:22] will get to it later
[11:58:22] should I?
[11:58:30] "Enter y to confirm:"
[11:58:30] sure, just to be safe
[11:58:36] I will repool later if needed
[11:58:43] sure
[11:59:14] thanks
[11:59:28] will get to it later
[11:59:33] probably broken bbu
[11:59:33] see you!
[14:03:17] DBA: db1130 BBU possible issues - https://phabricator.wikimedia.org/T240823 (Marostegui) The host alerted again, but compared to the initial report, it looks like it is charging (I have done a few iterations of the command and the % keeps increasing): ` root@db1130:~# megacli -AdpBbuCmd -a0 BBU status for A...
[14:12:33] DBA, conftool: set min_replicas on database sections in dbctl - https://phabricator.wikimedia.org/T231019 (Marostegui) Resolved→Open We have to include x1
[14:12:56] DBA, conftool: set min_replicas on database sections in dbctl - https://phabricator.wikimedia.org/T231019 (Marostegui)
[14:13:39] DBA, conftool: specify group (api/vslow/etc) weights in terms of 0..100 instead of 0..1 - https://phabricator.wikimedia.org/T231018 (Marostegui)
[14:13:46] DBA, conftool: specify group (api/vslow/etc) weights in terms of 0..100 instead of 0..1 - https://phabricator.wikimedia.org/T231018 (Marostegui) Resolved→Open We have to include x1
[14:15:52] oh, actually for es too, no cdanis ?
[14:15:53] ^
[14:16:09] x1 and es1/2/3
[14:16:13] \o/
[14:16:15] are all dbctl-ified
[14:16:24] and I was about to send a patch to remove them from db-{eqiad,codfw}.php
[14:16:32] DBA, conftool: specify group (api/vslow/etc) weights in terms of 0..100 instead of 0..1 - https://phabricator.wikimedia.org/T231018 (Marostegui)
[14:16:32] cool
[14:16:42] and I think we can remove hostsByName as well
[14:16:48] maybe codfw first, and we can test with mwdebug?
[14:16:53] sure
[14:17:05] do you want to test removing hostsByName? :D
[14:17:14] sure!
[14:17:19] DBA, conftool: set min_replicas on database sections in dbctl - https://phabricator.wikimedia.org/T231019 (Marostegui)
[14:17:59] let me figure out how I did these diffs that I did before
[14:18:09] let's start with x1 and es?
[14:18:26] and then figure out about hostsByName?
[14:18:35] just in case we break something, we can isolate it to a given thing
[14:18:53] ok sounds good
[14:23:24] marostegui: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/558054
[14:23:53] checking
[14:39:42] DBA, conftool: set min_replicas on database sections in dbctl - https://phabricator.wikimedia.org/T231019 (Marostegui)
[14:43:42] DBA: db1130 BBU possible issues - https://phabricator.wikimedia.org/T240823 (Marostegui) As of now, BBU is at 100%: ` root@db1130:~# megacli -AdpBbuCmd -a0 BBU status for Adapter: 0 BatteryType: BBU Voltage: 3935 mV Current: 0 mA Temperature: 29 C Battery State: Optimal BBU Firmware Status: Charging Sta...
[14:49:10] ok let's remove hostsByName another day (I am thinking like Wednesday) -- there's one or two annoying things and I also want to do some more testing on debug
[14:49:25] sure :)
[14:49:50] it will feel very good to remove a few hundred lines that say "# do not remove or comment out" though :D
[14:49:56] hahaha
[14:50:21] the first time I had to remove one of those I asked j.ynus 200 times
[14:50:37] hahah
[14:50:51] "it just breaks the site horribly"
[14:52:12] have we confirmed that we just need the local DC ones?
[14:52:23] because that's what the new data will have AFAIUI
[15:58:26] DBA, conftool: set min_replicas on database sections in dbctl - https://phabricator.wikimedia.org/T231019 (Marostegui) ` # for i in es1 es2 es3; do echo $i; dbctl -s eqiad section $i get | grep min ; done es1 "min_replicas": 1, es2 "min_replicas": 1, es3 "min_replicas": 1, # for i...
[15:59:24] DBA, conftool: set min_replicas on database sections in dbctl - https://phabricator.wikimedia.org/T231019 (Marostegui)
[15:59:46] DBA, conftool: set min_replicas on database sections in dbctl - https://phabricator.wikimedia.org/T231019 (Marostegui) ` # for i in x1; do echo $i; dbctl -s codfw section $i get | grep min ; done x1 "min_replicas": 1, # for i in x1; do echo $i; dbctl -s eqiad section $i get | grep min ; done x1...
[16:00:05] DBA, conftool: set min_replicas on database sections in dbctl - https://phabricator.wikimedia.org/T231019 (Marostegui) Open→Resolved
[16:03:57] DBA, conftool: specify group (api/vslow/etc) weights in terms of 0..100 instead of 0..1 - https://phabricator.wikimedia.org/T231018 (Marostegui)
[16:37:34] DBA, conftool: specify group (api/vslow/etc) weights in terms of 0..100 instead of 0..1 - https://phabricator.wikimedia.org/T231018 (Marostegui)
[17:06:33] DBA, Operations, ops-eqiad: Degraded RAID on db1123 - https://phabricator.wikimedia.org/T240534 (Marostegui) >>! In T240534#5736762, @wiki_willy wrote: > @Jclark-ctr - looks like this one is still under warranty, so you should be able to just RMA it. Thanks, Willy If we could try to RMA it today, a...
[20:50:54] volans: I don't think so, I am going to do some testing but would love to hear for sure
[20:57:34] cdanis: the most "historical" comment I get on those lines was from jy.nus when I started, so you might want to check with him. I think we just need the local ones and might be a relic from the past to maybe support cross-dc queries in an emergency.
[20:57:50] adding in both DCs is also easy, like a one or two lines code change
[20:58:00] I know, but feels wrong
[22:06:44] I did not see anything bad in logs while testing this with mwdebug2001, and everything I tried (a few different wikis and projects) seemed to work
[22:11:01] (this==hostsByName from etcd)
[22:21:39] I also did the dbrepllag API for a wiki from each section, and that all looked proper (correct hosts, lag numbers filled in with something)
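
The last check above (dbrepllag for a wiki from each section, correct hosts, lag numbers filled in) can be sketched as a small script. This is only an illustration, not what was actually run during the test. It assumes the JSON shape returned by the MediaWiki API for `action=query&meta=siteinfo&siprop=dbrepllag&sishowalldb=1`, which is a `query.dbrepllag` list of `{"host", "lag"}` entries. The function name `check_dbrepllag`, the host names, and the rule that a non-numeric or negative lag counts as "not filled in" are all assumptions made for the example.

```python
from typing import Any, Dict, Iterable, List


def check_dbrepllag(response: Dict[str, Any], expected_hosts: Iterable[str]) -> List[str]:
    """Return a list of problems found in a parsed siteinfo dbrepllag response.

    An empty list means every expected host is present and every entry
    carries a usable lag value.
    """
    problems: List[str] = []
    entries = response.get("query", {}).get("dbrepllag", [])

    # Every host we expect for this section should be listed.
    seen = {entry.get("host") for entry in entries}
    for host in expected_hosts:
        if host not in seen:
            problems.append(f"missing host: {host}")

    # Every listed host should have a numeric, non-negative lag
    # (assumption: anything else means the value was not filled in).
    for entry in entries:
        lag = entry.get("lag")
        is_number = isinstance(lag, (int, float)) and not isinstance(lag, bool)
        if not is_number or lag < 0:
            problems.append(f"no usable lag value for {entry.get('host')}")

    return problems


if __name__ == "__main__":
    # Made-up response and host names, for illustration only.
    sample = {
        "query": {
            "dbrepllag": [
                {"host": "db-a.example", "lag": 0},
                {"host": "db-b.example", "lag": 1.5},
            ]
        }
    }
    print(check_dbrepllag(sample, ["db-a.example", "db-b.example"]))
```

Keeping the check as a pure function over the parsed response means it can be run against saved API output from each section without needing network access from the test host.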