[06:48:18] 10DBA, 10Community-Tech, 10cloud-services-team, 10Security: create production ip_changes table for RangeContributions - https://phabricator.wikimedia.org/T173891#3557553 (10ArielGlenn) >>! In T173891#3546484, @Bawolff wrote: >>>! In T173891#3546429, @kaldari wrote: >>>The table should not be in dumps - it...
[07:21:32] 10DBA, 10Operations, 10ops-eqiad: BBU issues on db1055, RAID cache on WriteThrough - https://phabricator.wikimedia.org/T174265#3557580 (10Marostegui) This happened again, we definitely need to change the BBU //cc @Cmjohnson ``` root@db1055:~# megacli -AdpBbuCmd -a0 BBU status for Adapter: 0 BatteryType:...
[07:28:27] 10DBA, 10Patch-For-Review: Productionize 11 new eqiad database servers - https://phabricator.wikimedia.org/T172679#3557602 (10Marostegui)
[07:35:02] 10DBA, 10Patch-For-Review: Productionize 11 new eqiad database servers - https://phabricator.wikimedia.org/T172679#3557606 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts: ``` ['db1099.eqiad.wmnet'] ``` The log can be found in `/var/log/wmf-auto-reim...
[07:35:49] 10DBA, 10Operations, 10ops-eqiad: BBU issues on db1055, RAID cache on WriteThrough - https://phabricator.wikimedia.org/T174265#3557608 (10Marostegui) After forcing the re-learn again: ``` ˜/icinga-wm 9:34> RECOVERY - MegaRAID on db1055 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy ``` Let's ju...
[07:50:41] 10DBA, 10Operations, 10ops-eqiad: BBU issues on db1055, RAID cache on WriteThrough - https://phabricator.wikimedia.org/T174265#3557621 (10Marostegui) p:05Triage>03High And failed again. Let's not spend more time on this and just replace it.
[07:55:50] 10DBA, 10Patch-For-Review: Productionize 11 new eqiad database servers - https://phabricator.wikimedia.org/T172679#3557631 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db1099.eqiad.wmnet'] ``` and were **ALL** successful.
[08:22:37] 10Blocked-on-schema-change, 10DBA, 10Patch-For-Review: Apply schema change to add 3D filetype for STL files - https://phabricator.wikimedia.org/T168661#3557672 (10Marostegui) db2051 - codfw master has been upgraded to 10.0.32
[11:18:47] volans: hey, let me know when you are around
[11:19:52] Amir1_: I was about to go for lunch
[11:21:11] volans: let me know when you're back :)
[11:21:13] Thanks
[11:21:30] Bon appetite
[11:21:34] sure, merci
[11:32:14] 10DBA, 10Analytics, 10Data-Services, 10Research, 10cloud-services-team (Kanban): Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#3558277 (10jcrespo) Duplicate of T156869? We should fix it for all users, not only datasets_p?
[12:10:31] 10DBA, 10Patch-For-Review: Productionize 11 new eqiad database servers - https://phabricator.wikimedia.org/T172679#3558326 (10Marostegui)
[12:33:30] 10DBA, 10Cloud-Services: Prepare and check storage layer for hi.wikivoyage - https://phabricator.wikimedia.org/T173027#3558381 (10MarcoAurelio) Hmm. I thought you had to check previously, I mean, before the DB is created, if there's enough disk space, etc.? In any case, we'll notify you again after the DB is cre...
[12:34:46] 10DBA, 10Cloud-Services: Prepare and check storage layer for hi.wikivoyage - https://phabricator.wikimedia.org/T173027#3558388 (10Marostegui) >>! In T173027#3558381, @MarcoAurelio wrote: > Hmm. I thought you had to check previously, I mean, before the DB is created, if there's enough disk space, etc.? In any cas...
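For reference on the db1055 BBU thread earlier in the log: a minimal sketch of the MegaCli invocations that cover the checks described there (status, forced re-learn, and the WriteBack/WriteThrough cache policy). The exact commands run on the host are not shown above beyond `megacli -AdpBbuCmd -a0`, so the flags below are assumptions based on MegaCli's documented BBU command set.

```
# Sketch only, flags assumed from MegaCli's standard command set:
megacli -AdpBbuCmd -GetBbuStatus -a0   # report battery type, charge and state
megacli -AdpBbuCmd -BbuLearn -a0       # force a battery re-learn cycle
megacli -LDGetProp -Cache -LAll -a0    # confirm WriteBack vs WriteThrough policy
```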
[12:36:03] 10DBA, 10Cloud-Services: Prepare and check storage layer for hi.wikivoyage - https://phabricator.wikimedia.org/T173027#3558396 (10MarcoAurelio) Okie dockie :) Everything is set up already I think. We just need a sysadmin willing to do it and a deployment window.
[12:55:57] 10DBA: Test reliability of RAID configuration/database hosts on single disk failure - https://phabricator.wikimedia.org/T174054#3558458 (10Marostegui) Any objection to use db1076 as a testing host?
[13:08:41] Amir1_: back now (sorry I was sidetracked)
[13:08:58] no worries
[13:09:05] Let me find a proper place
[13:10:56] 10DBA: Test reliability of RAID configuration/database hosts on single disk failure - https://phabricator.wikimedia.org/T174054#3558500 (10jcrespo) Ok to me, we just need to find the time.
[13:11:39] 10DBA: Test reliability of RAID configuration/database hosts on single disk failure - https://phabricator.wikimedia.org/T174054#3558503 (10Marostegui) @Cmjohnson let us know when you'd be available to do this test Thanks!
[13:15:53] okay, found it
[13:19:46] volans: okay, I tried to use your method but I couldn't
[13:19:59] which one?
[13:20:14] using sed
[13:20:33] I'm talking about this: https://gerrit.wikimedia.org/r/#/c/373854/
[13:20:35] sed? I proposed awk IIRC
[13:20:48] yeah, sorry, I confuse these two all the time
[13:21:09] so what's the issue?
[13:22:33] let me check
[13:23:22] I don't see any "Processed up to page" in non-gzipped files atm on terbium
[13:23:34] the IDs you're looking for are only in gzipped files
[13:23:54] volans: they are gzipped now because several days have passed since the last run
[13:24:48] so that won't be an issue
[13:25:17] that is a thing that could happen anyway for different reasons, so I think you might need to cycle gzip files too if you cannot find the ID in the plain ones
[13:25:48] volans: I'm not sure, this will be running just for twenty days or less
[13:25:59] and in that case we can manually find and add it
[13:27:39] I have no idea how important this is, or what breaks if the ID is not found or, worse, if a wrong ID is found ;)
[13:34:03] anyway, what was not working in the awk approach?
[13:37:15] I tried but it couldn't find anything
[13:38:07] and your grep+sed could? because right now, as I said, there are none ;)
[13:39:45] volans: yeah, I tried it on terbium several times
[13:40:24] volans: on that day, yes; right now they are gzipped
[13:41:26] what I pasted here was after testing it on terbium
[13:42:27] volans: okay, can you add some lines in one of the files so we test it and I fix my patch?
[13:42:38] in the log file
[13:42:39] ?
[13:43:12] yeah
[13:43:16] I don't have the proper rights
[13:44:37] done, and:
[13:44:38] ls /var/log/wikidata/rebuildTermSqlIndex.log{,*[0-9]} | sort -r | tail -n 100 | xargs -d "\n" awk '/Processed up to page (\d+?)/ { print $5 }'
[13:44:41] works for me
[13:44:53] although I still don't like the tail -n 100, which I don't get
[13:45:27] to make things faster?
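Since the thread above notes that the "Processed up to page" markers may only survive in the gzipped rotations, here is a minimal sketch of how those could be included in the search, assuming the same log format; this is not the command that ended up in the patch.

```
# Sketch only: include gzipped rotations, assuming the same
# "Processed up to page <id> (Q<item>)" line format.
# zcat -f decompresses .gz files and passes plain files through unchanged;
# ls -tr orders the files oldest-first so the last match is the newest ID.
ls -tr /var/log/wikidata/rebuildTermSqlIndex.log* \
  | xargs -d '\n' zcat -f \
  | awk '/Processed up to page/ { id = $5 } END { if (id) print id }'
```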
[13:46:29] you're tailing the list of files
[13:46:36] so it seems you expect more than 100 files there
[13:46:46] and grabbing only the last 100
[13:46:46] yeah, I think we can remove that part
[13:55:41] Amir1_: so the ID you probably want should be gathered by:
[13:55:42] ls /var/log/wikidata/rebuildTermSqlIndex.log{,*[0-9]} | sort -r | xargs -d "\n" awk '/Processed up to page (\d+?)/ { print $5 }' | tail -n1
[13:56:03] volans: yeah, already on it, I was checking whether we could combine awk and tail
[13:56:15] but it doesn't matter I guess
[13:58:25] let me try one thing
[13:59:10] this is better ;)
[13:59:11] ls /var/log/wikidata/rebuildTermSqlIndex.log{,*[0-9]} | sort | xargs -d "\n" tac | awk '/Processed up to page (\d+?)/ { print $5; exit }'
[14:00:09] mmmh still has something I don't like
[14:01:22] so, the sorting is broken by definition
[14:01:59] because the files have dates in them and the one you want first is the one without a date, so it comes first
[14:02:19] can we trust file modification time?
[14:02:35] volans: yeah
[14:03:40] then this should do:
[14:03:41] ls -t /var/log/wikidata/rebuildTermSqlIndex.log{,*[0-9]} | xargs -d '\n' tac | awk '/Processed up to page (\d+?)/ { print $5; exit }'
[14:04:56] okay
[14:05:43] order by time, newest first, pass to xargs, cat the files from the end all together, awk, match the first line, print $5 and exit
[14:06:03] ladsgroup@terbium:/var/log/wikidata$ ls -t /var/log/wikidata/rebuildTermSqlIndex.log{,*[0-9]} | xargs -d '\n' tac | awk "/Processed up to page (\d+?)/ { print $5; exit }"
[14:06:03] Processed up to page 1050413 (Q1103084)
[14:06:25] still not optimal that we're catting all matching files, but since we gzip them after 2 days there should be at most 2 files
[14:07:12] single quotes in awk
[14:07:22] otherwise $5 is replaced by bash with an empty string
[14:07:29] assuming it's not defined
[14:09:30] we can't do that in puppet
[14:10:24] just escape them: \'
[14:11:04] thanks
[14:11:23] (it should work, but you know, it's puppet... we can check with the compiler)
[14:12:56] volans: it seems it's not working, I remember trying that too
[14:13:17] you still have '\n', escape them too or use "" there
[14:13:30] okay
[14:13:31] ohh
[14:13:32] sorry
[14:14:04] otherwise we can trick puppet ;)
[14:16:41] volans: okay, it seems ready for merge, can you take a look?
[14:16:50] sure
[14:17:31] thank you
[14:17:44] Amir1_: one last thing, what happens if no ID is found? i.e. if --from-id is empty
[14:18:41] it will fail
[14:18:50] in that case we should fix it manually
[14:19:01] and how will we know?
[14:19:17] that it failed, I mean
[14:19:18] I monitor it
[14:19:25] if it stopped working
[14:20:09] ok, fair enough, manual icinga ;)
[14:20:41] I can scream louder :P
[14:22:22] Thanks
[14:22:38] Amir1_: merged, do you want me to run puppet on terbium?
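The single-versus-double-quote issue volans points out above is easy to reproduce; a small demonstration, using the sample line from the terbium output pasted earlier:

```
# With double quotes the shell expands $5 (unset here) to an empty string
# before awk ever sees it, so the action becomes "{ print ; exit }" and awk
# prints the whole matching line instead of just the page ID.
echo 'Processed up to page 1050413 (Q1103084)' \
  | awk "/Processed up to page/ { print $5; exit }"   # -> whole line
# With single quotes $5 reaches awk intact and field 5 (the ID) is printed.
echo 'Processed up to page 1050413 (Q1103084)' \
  | awk '/Processed up to page/ { print $5; exit }'   # -> 1050413
```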
[14:23:19] yeah
[14:23:21] thanks
[14:23:30] running
[14:25:01] crontab installed, will run in 5 minutes
[14:25:23] I have a meeting starting exactly at :30, will keep an eye on it, but better if you check it too ;)
[14:27:12] sure, I need to change my place soon but hopefully it won't be needed
[14:29:07] ack
[14:29:27] it's ok to start from that ID, I grab the last one in the gzipped file
[14:29:42] thanks
[14:35:22] volans: it seems it's not working, or it might not have started yet
[14:37:36] It's safe to say it's not working
[14:38:06] sorry, in the meeting, I can check in a bit if it's not super urgent
[14:42:15] yeah
[14:42:15] sure
[15:18:31] Amir1_: sorry, I'm back
[15:19:54] I can see the cron line in syslog
[15:20:36] and from the log it seems that it didn't find the ID, checking
[15:23:57] Amir1_: did the timeout ever work? (I'm wondering about PATH)
[15:28:37] 10DBA, 10Analytics, 10Data-Services, 10Research, 10cloud-services-team (Kanban): Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#3558849 (10bd808) >>! In T173511#3558277, @jcrespo wrote: > Duplicate of T156869? We should fix it for...
[15:48:23] ok, found the error, looking for a fix
[16:15:26] volans: thanks. I was afk
[16:15:36] no prob, the CR still doesn't fix it
[16:15:45] multiple stuff, I'm testing but also in another meeting :(
[16:15:59] the other issue is the escaped quote
[16:16:09] puppet prints them as escaped instead of normal
[16:16:23] shoot
[16:19:28] I'm trying to make puppet print what we want, testing with the compiler now
[16:24:46] I was also fooled by my own pebkac ofc :D
[16:26:31] latest PS is working, I'm running it with timeout 100s now
[16:27:51] I'm merging it
[16:28:29] the only real issue was the {} expansion, which is bash only, while cron runs in /bin/sh by default
[16:28:32] and I don't want to change that
[16:28:44] so I went for: /bin/ls -t /var/log/wikidata/rebuildTermSqlIndex.log /var/log/wikidata/rebuildTermSqlIndex.log*[0-9]
[16:28:54] and added the full path to all executables just to be sure
[16:36:15] Amir1_: merged and it is running right now
[16:37:14] (I forced the start manually because the :30 mark had passed, I'll keep an eye on it next hour too)
[17:18:36] volans: Thanks!
[17:20:17] Amir1_: yw, so far so good
[17:20:31] in a few minutes the timeout should trigger and then at :30 the cron should start
[17:22:12] I'll keep monitoring it
[17:26:50] great, thanks, timeout triggered, not running anymore
[17:27:09] (I had put a timeout of 3000 and it started around minute 35)
[17:30:16] and started again
[17:30:21] seems to work for me Amir1_ :D
[17:30:42] Awesome
[17:30:43] please remind us when it's time to remove it in a few weeks ;)
[17:30:45] Thank you
[17:30:48] sure
[17:30:49] cheers
[17:30:56] Prost
[17:39:27] * volans off
[17:44:41] 10DBA, 10MediaWiki-Parser, 10MediaWiki-Platform-Team, 10Performance-Team: WMF ParserCache disk space exhaustion - https://phabricator.wikimedia.org/T167784#3559476 (10Krinkle)
[19:01:36] 10DBA, 10Operations, 10Wiki-Setup (Create): Create elections committee private wiki - https://phabricator.wikimedia.org/T174370#3559677 (10KTC)
[19:27:17] 10DBA, 10Data-Services: Design a method for keeping user-created tables in sync across labsDBs - https://phabricator.wikimedia.org/T156869#2988296 (10bd808) I think we should stop allowing write access entirely on the wiki replica servers by Cloud Services users. I know this will cause some currently unspecifi...
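Putting volans' fixes above together, a hedged reconstruction of the /bin/sh-safe form of the command: the real entry lives in the merged puppet change, only the /bin/ls invocation is quoted from the log, and the absolute paths for xargs, tac and awk are assumptions.

```
#!/bin/sh
# Brace expansion like rebuildTermSqlIndex.log{,*[0-9]} is a bashism, so the
# two patterns are spelled out explicitly for cron's default /bin/sh, and the
# executables are referenced by full path rather than relying on PATH.
/bin/ls -t /var/log/wikidata/rebuildTermSqlIndex.log \
           /var/log/wikidata/rebuildTermSqlIndex.log*[0-9] \
  | /usr/bin/xargs -d '\n' /usr/bin/tac \
  | /usr/bin/awk '/Processed up to page/ { print $5; exit }'
```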
[19:29:09] 10DBA, 10Data-Services: Design a method for keeping user-created tables in sync across labsDBs - https://phabricator.wikimedia.org/T156869#3559949 (10Halfak) As the person who filed this task, I agree 100% with @bd808's assessment. Shall we close this "declined"?
[19:33:15] 10DBA, 10Data-Services: Design a method for keeping user-created tables in sync across labsDBs - https://phabricator.wikimedia.org/T156869#2988296 (10chasemp) @halfak +1. It was great to get this all on-task though as we will no doubt reference it in the future.
[19:35:06] 10DBA, 10Cloud-Services: Prepare and check storage layer for electcomwiki - https://phabricator.wikimedia.org/T174385#3559994 (10Framawiki)
[19:35:15] 10DBA, 10Cloud-Services: Prepare and check storage layer for electcomwiki - https://phabricator.wikimedia.org/T174385#3559994 (10Framawiki)
[19:37:24] 10DBA, 10Cloud-Services: Prepare and check storage layer for electcomwiki - https://phabricator.wikimedia.org/T174385#3559994 (10Reedy) Not sure a private task was specifically needed... https://gerrit.wikimedia.org/r/374384 adds it to private_wikis
[20:36:23] 10DBA, 10Cloud-Services: Prepare and check storage layer for electcomwiki - https://phabricator.wikimedia.org/T174385#3560260 (10Framawiki) >>! In T174385#3560023, @Reedy wrote: > Not sure a private task was specifically needed... > > https://gerrit.wikimedia.org/r/374384 adds it to private_wikis Task create...
[20:44:05] 10DBA, 10Cloud-Services: Prepare and check storage layer for electcomwiki - https://phabricator.wikimedia.org/T174385#3560299 (10Reedy) Bleugh. "private" should've been "subtask" There's less to do for a private wiki, but just needs doing first