[02:11:36] Hello, I'm at the SF wiki salon, and I'm interested in PAWS for mediawiki
[02:11:50] This page seems a little out of date: https://www.mediawiki.org/wiki/Manual:Pywikibot/PAWS
[02:12:02] I'd love to be bold and edit it, but I'm not sure what the new directions are
[02:13:00] specifically, user-config.py isn't present in my shell home directory on startup
[02:14:07] there's one in /srv/paws/user-config.py that I copied into the home directory
[02:14:26] but then I got a stack trace with `$ pwb.py login` which ended in https://paste.pound-python.org/show/NSkeXl7Pc5C3VukDaeg2/
[02:16:02] I'm pretty sure the rest of the article is useless if you can't get pywikibot to log in
[02:27:28] audiodude: you shouldn't copy that file from /srv/paws. You should just create the file from scratch and put stuff in there
[02:27:44] I see that wording is super confusing
[02:27:49] not sure who wrote that :)
[02:29:04] audiodude: I fixed it a little
[02:29:07] should work now?
[02:32:06] nope, but closer
[02:32:42] audiodude: what's your username on wiki? also what error are you seeing now?
[02:32:49] Audiodude
[02:33:18] WARNING: API error mwoauth-invalid-authorization-invalid-user: The authorization headers in your request are for a user that does not exist here
[02:33:52] audiodude: ah, that usually means you haven't ever visited testwiki.
[02:34:09] audiodude: can you go to test.wikipedia.org as your user so it can create an account for you on testwiki?
[02:34:19] I'm logged into testwiki
[02:34:44] okay, it works now
[02:34:51] once I visited the page in the browser
[02:34:59] audiodude: yup!
[02:35:09] audiodude: does pwb.py login work now?
[02:36:55] yes
[02:56:08] audiodude: awesome! do feel free to edit that page if you can :)
[02:58:12] Change on wikitech.wikimedia.org: a page Nova Resource:Tools/Access Request/Ben Creasy was created, changed by Ben Creasy link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Ben_Creasy edit summary: Created page with "{{Tools Access Request |Justification=Looking to make a gadget out of https://github.com/NickSto/wikihistory |Completed=false |User Name=Ben Creasy }}"
[03:23:23] hey folks, this is Ben Creasy aka shell instance username jcrben hoping for approval of tools access...
[03:36:18] hey shout-user
[03:36:28] shout-user: are you at the SF wikisalon?
[03:36:30] * yuvipanda does approve
[03:36:41] yep
[03:37:02] Change on wikitech.wikimedia.org: a page Nova Resource:Tools/Access Request/Ben Creasy was modified, changed by Yuvipanda link https://wikitech.wikimedia.org/w/index.php?diff=1754485 edit summary:
[03:37:25] shout-user: should be complete in about 30s.
[03:37:30] I won't be around for long tho
[03:37:36] thanks :) gergo is here helping me
[03:38:19] shout-user: ah awesome :)
[06:17:41] Change on wikitech.wikimedia.org: a page Nova Resource:Tools/Access Request/Eirtaza was created, changed by Eirtaza link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Eirtaza edit summary: Created page with "{{Tools Access Request |Justification=For my learning and research. |Completed=false |User Name=Eirtaza }}"
[06:38:40] PROBLEM - Puppet run on tools-exec-1411 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[06:59:43] PROBLEM - Puppet run on tools-exec-1405 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[07:10:07] Tool-Labs-tools-Xtools, Community-Tech-Sprint: Have Edit Counter use same architecture and front-end as the other pieces that have been re-written - https://phabricator.wikimedia.org/T160481#3142882 (Samwilson) Cool, okay, so I've got the top-sites list working so far by just looping through all the site...
[07:13:43] RECOVERY - Puppet run on tools-exec-1411 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:34:46] RECOVERY - Puppet run on tools-exec-1405 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:46:03] PAWS, MediaWiki-extensions-OAuth: OAuth for PAWS fails - presumably because of username change - https://phabricator.wikimedia.org/T161696#3143080 (FaFlo) so, is there a solution? sounds like a quick fix would be replacing the name in the paws record. I would also be fine with my user being deleted and t...
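The PAWS login fix above boils down to two things: write a fresh user-config.py instead of copying /srv/paws/user-config.py, and visit the target wiki once in a browser so the OAuth layer can find your account there. A minimal sketch of what such a file might contain — the field names follow pywikibot's conventions, but the wiki and username values here are placeholders, so treat the current Manual:Pywikibot/PAWS page as authoritative rather than this:

```python
# user-config.py -- minimal sketch for pywikibot on PAWS.
# Placeholder values, not an official template: adjust family, mylang,
# and the username to the wiki you actually target.

family = 'wikipedia'   # wiki family pywikibot should talk to
mylang = 'test'        # test.wikipedia.org, keeping experiments off production

# pywikibot pre-defines the `usernames` mapping when it loads this file:
usernames['wikipedia']['test'] = 'Audiodude'

# On PAWS, OAuth credentials are injected for you; with a plain pywikibot
# install you would configure authentication (e.g. a password file) here.
```

With that in place, `pwb.py login` should succeed once the account exists on the target wiki, which is what the `mwoauth-invalid-authorization-invalid-user` error above was really complaining about.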
[10:15:39] !log wikispeech Deploy latest from Git master: 64cbd96 (T159545, T159811, T159809)
[10:15:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikispeech/SAL
[10:15:45] T159809: Highlighting fails on bold/italicized utterance - https://phabricator.wikimedia.org/T159809
[10:15:45] T159811: Overlapping highlighting - https://phabricator.wikimedia.org/T159811
[10:15:45] T159545: Unicode characters increase length of highlighting - https://phabricator.wikimedia.org/T159545
[12:09:27] !log wikispeech Deploy latest from Git master: f79170e (T159669)
[12:09:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikispeech/SAL
[12:09:30] T159669: Empty elements created - https://phabricator.wikimedia.org/T159669
[13:26:35] hiya,
[13:26:48] I've added a password class to puppet private in prod
[13:26:56] how do I make it so there is a fake class with the same name in labs?
[13:29:55] ottomata: there is a labs/private repo that mirrors it with fake data
[13:30:04] iirc
[13:30:07] ahhhhh ya i see it
[13:30:08] ok thank you!
[13:30:17] if i just merge something there, it'll show up on the labs puppetmaster?
[13:31:17] ottomata: yes, it should
[13:39:29] (PS1) Ottomata: Add passwords::mysql::analytics_labsdb class to match prod [labs/private] - https://gerrit.wikimedia.org/r/345562
[13:39:52] (CR) Ottomata: [V: 2 C: 2] Add passwords::mysql::analytics_labsdb class to match prod [labs/private] - https://gerrit.wikimedia.org/r/345562 (owner: Ottomata)
[16:35:40] PROBLEM - Puppet run on tools-webgrid-lighttpd-1418 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[17:15:41] RECOVERY - Puppet run on tools-webgrid-lighttpd-1418 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:29:10] !log tools Disabled puppet across tools in prep for T136712
[17:29:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:29:15] T136712: Virtualenvs slow on tool labs NFS - https://phabricator.wikimedia.org/T136712
[17:29:19] chasemp: ^
[17:29:33] kk
[17:30:52] !log tools Updating tools project hiera config to add role::labs::nfsclient::lookupcache: all via Horizon (T136712)
[17:30:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:32:15] madhuvishy: I'll spot check tools-exec-1401
[17:32:23] when ready
[17:32:28] chasemp: okay, i'm doing exec-1402
[17:32:31] ready now
[17:32:43] off I go
[17:34:12] madhuvishy: https://phabricator.wikimedia.org/P5164
[17:34:19] definitely not silent
[17:34:26] that's the first run
[17:34:29] second ongoing
[17:34:36] interesting, 1402 went fine
[17:34:56] chasemp: and everything looks good there
[17:35:02] hm
[17:35:21] madhuvishy: could be the diff between a clean remount and a failed one, I imagine
[17:35:26] based on work happening there
[17:35:32] yeah
[17:35:44] i didn't drain exec
[17:35:59] me neither
[17:36:15] madhuvishy: I /think/ if there is an in-use conflict it will error out for a particular mount and add to /etc/fstab
[17:36:18] tested on tools-worker1001, also came up clean
[17:36:21] staging to be added on remount
[17:36:43] ya i guess
[17:36:55] let's depool - puppet - repool then?
[17:37:12] not a terribly big deal, other than it won't take effect now in those cases
[17:37:19] right
[17:37:42] I'm going to reboot 1401 just to be sure
[17:37:48] okay
[17:37:58] root@tools-exec-1401:~# cat /proc/mounts | grep home
[17:37:59] nfs-tools-project.svc.eqiad.wmnet:/project/tools/home /mnt/nfs/labstore-secondary-tools-home nfs4 rw,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.68.17.202,lookupcache=none,local_lock=none,addr=10.64.37.18 0 0
[17:38:12] cool
[17:38:21] !log tools reboot tools-exec-1401
[17:38:24] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[17:38:28] notice still none there, so we'll see post reboot
[17:38:35] right
[17:39:23] madhuvishy: tools-workers /already/ had this set to all, fyi
[17:39:26] so it's a true noop there
[17:39:28] via hiera
[17:39:28] ah yes
[17:40:09] nfs-tools-project.svc.eqiad.wmnet:/project/tools/home /mnt/nfs/labstore-secondary-tools-home nfs4 rw,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.68.17.202,local_lock=none,addr=10.64.37.18 0 0
[17:40:29] right, absence=all
[17:40:42] I believe lookupcache is all by default and those are only non-default params...
[17:40:47] yep
[17:40:50] there is a way to tell it to puke out every option applied
[17:40:54] and I can't remember it
[17:41:42] madhuvishy: 1401 is back to scheduling jobs so....
[17:41:52] chasemp: okay
[17:42:12] madhuvishy: do you want to do the depool and repool dance?
[17:42:16] to make it apply all over?
[17:42:50] madhuvishy: fyi I'm fairly sure the bastions already have the setting as well :)
[17:42:59] chasemp: yeah, the bastions are all done
[17:43:37] madhuvishy: so, roll out to tools and give it an hour or so?
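The spot checks above all amount to reading /proc/mounts and looking for an explicit `lookupcache=` option in the NFS mount's option field (absence means the kernel default applies, which is `all`, matching "right absence=all"). A small sketch of that check — the helper name is hypothetical, for illustration only:

```python
def lookupcache_mode(mounts_line):
    """Return the explicit lookupcache= value from one /proc/mounts line,
    or None when the option is absent (NFS then uses its default, 'all').
    /proc/mounts fields: device, mountpoint, fstype, options, dump, pass."""
    fields = mounts_line.split()
    if len(fields) < 4 or not fields[2].startswith('nfs'):
        return None  # not an NFS mount line
    for opt in fields[3].split(','):
        if opt.startswith('lookupcache='):
            return opt.split('=', 1)[1]
    return None
```

Run against the two mount lines pasted in the log, the pre-reboot one yields 'none' and the post-reboot one yields None, i.e. the default.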
[17:43:41] I really think it's fine tho
[17:43:46] i'm leaning towards rolling out to the execs
[17:43:57] okay
[17:44:11] yep
[17:44:13] makes sense
[17:44:32] this will be very nice to have consistent so we can stop thinking about it
[17:45:10] yeah
[17:45:27] after this looks fine, we can switch the class param default
[17:45:33] agreed
[17:45:35] and remove individual hiera settings
[17:45:43] that may be a pain to track down, idk
[17:45:45] but yep
[17:46:00] it exists in too many levels now: prefix for tools-worker, instance level for tools-bastions
[17:46:03] don't know what else
[17:46:04] right
[17:49:02] PROBLEM - Puppet run on tools-exec-1421 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[17:50:32] stopping shinken for a bit
[17:55:24] chasemp: so, status: puppet still fails on tools-exec-1421, i'm looking, may reboot
[17:55:29] succeeds everywhere else
[17:55:35] https://www.irccloud.com/pastebin/X0aYyhou/
[17:55:45] these 5 nodes still have lookupcache=none
[17:55:52] k
[17:57:00] 1421 is fine
[17:57:03] shinken is back
[17:59:04] RECOVERY - Puppet run on tools-exec-1421 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:02:21] rolling out to rest of tools
[18:03:41] RECOVERY - Puppet run on tools-exec-1404 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:04:20] RECOVERY - Puppet run on tools-exec-1410 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:04:20] RECOVERY - Puppet run on tools-exec-1412 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:14:07] RECOVERY - Puppet run on tools-exec-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:14:31] RECOVERY - Puppet run on tools-exec-1414 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:14:42] chasemp: rolled out to rest of tools, looks good - there are still the 5 execs and 1 webgrid node with lookupcache:none
[18:14:43] RECOVERY - Puppet run on tools-exec-1411 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:14:56] madhuvishy: want to reboot those?
[18:15:01] chasemp: yeah
[18:15:07] RECOVERY - Puppet run on tools-exec-1407 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:23:27] RECOVERY - Puppet run on tools-exec-1409 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:28:48] chasemp: gah, was grepping for /home in /proc/mounts - lots more showing up on /project
[18:29:07] clush -w @all "sudo cat /proc/mounts | grep lookupcache"
[18:29:12] that would make sense, but bummer
[18:29:29] yeah
[18:29:34] this includes grid-master
[18:29:58] that's not a terribly big deal I think, but I would hate to leave it for a later surprise
[18:30:17] hit the rest and we can reason about the gridmaster after?
[18:30:57] chasemp: yup
[19:06:13] PROBLEM - Host tools-webgrid-generic-1401 is DOWN: CRITICAL - Host Unreachable (10.68.18.51)
[19:14:42] PROBLEM - Host tools-webgrid-generic-1403 is DOWN: CRITICAL - Host Unreachable (10.68.18.52)
[19:15:10] hmmm
[19:17:16] madhuvishy: uh? yeah, may want to stop the auto rebooting
[19:17:36] 1403 at least seems down
[19:19:01] yeah, everything else seems fine
[19:19:05] just these two nodes
[19:19:14] huh
[19:19:53] i have like 8 nodes left to hit, but of 30 so far, these two went down and didn't come up
[19:20:47] RECOVERY - Host tools-webgrid-generic-1401 is UP: PING OK - Packet loss = 0%, RTA = 1.15 ms
[19:21:07] chasemp: did you do anything? ^
[19:21:26] * andrewbogott is here to investigate dead instances, if there are any
[19:21:38] madhuvishy: I did not
[19:21:42] okay
[19:21:48] so it came up on its own
[19:21:52] hopefully 1403 will too
[19:23:22] andrewbogott: don't think it's dead, and based on 1401 being back up - just taking a long time to reboot
[19:23:30] ok
[19:29:19] RECOVERY - Host tools-webgrid-generic-1403 is UP: PING OK - Packet loss = 0%, RTA = 2.46 ms
[19:32:00] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Temporarily Unavailable - string 'Magnus' not found on 'http://tools.wmflabs.org:80/' - 385 bytes in 0.002 second response time
[19:34:01] chasemp: ^
[19:37:46] andrewbogott yuvipanda chasemp is anything proxy related happening?
[19:37:51] all my reboots are done
[19:38:00] not as far as I know...
[19:38:07] not sure why tools.wmflabs.org is 503ing
[19:38:14] the tool didn't just die?
[19:38:43] Here's another tool that's fine… https://tools.wmflabs.org/openstack-browser/project/testlabs
[19:38:47] So the proxy isn't completely ruined
[19:39:02] okay
[19:39:11] restarted the admin tool
[19:39:32] oops, so did I :)
[19:39:33] not sure why it didn't get rescheduled when i probably depooled
[19:39:39] looks fine now anyway
[19:39:41] he he, okay
[19:39:45] anyway, it's back
[19:39:47] yep
[19:42:01] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 3670 bytes in 0.011 second response time
[19:50:02] PROBLEM - Host tools-webgrid-generic-1402 is DOWN: CRITICAL - Host Unreachable (10.68.18.50)
[19:57:03] madhuvishy: another reboot? (It's in state 'active' so maybe is on its way…)
[19:57:14] andrewbogott: yes, i rebooted it
[19:57:19] ok!
[19:57:19] waiting for it to come up
[19:57:26] * andrewbogott goes back to not worrying
[20:05:03] RECOVERY - Host tools-webgrid-generic-1402 is UP: PING OK - Packet loss = 0%, RTA = 1.00 ms
[20:21:16] I missed that excitement, seems ok
[20:29:32] !log tools stop grid-master temporarily & umount -fl project nfs & remount & start grid-master
[20:29:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[20:31:27] madhuvishy: so master is back up, but a job on the grid seems to queue waiting for me
[20:31:54] madhuvishy: lots of this:
[20:31:54] scheduling info: queue instance "continuous@tools-exec-1411.tools.eqiad.wmflabs" dropped because it is temporarily not available
[20:32:02] queue instance "task@tools-exec-1420.tools.eqiad.wmflabs" dropped because it is temporarily not available
[20:32:04] that is normal, i think
[20:32:40] I don't think this sitting in qw is normal
[20:33:11] hm
[20:33:17] still digging
[20:33:21] chasemp: what is in qw?
[20:33:28] i see things running
[20:33:42] sure, I'm mainly looking to test launching new things
[20:33:53] but I think it's an irregularity between qsub and jsub default params
[20:34:39] aah
[20:34:54] maybe from the precise removal
[20:34:58] https://www.irccloud.com/pastebin/KRgo5yHP/
[20:35:43] madhu: seems like somewhere qsub has defaults set, maybe for the whole grid
[20:35:49] and that includes '-l h_vmem=256M,release=precise'
[20:35:57] wah wah
[20:36:17] ouch
[20:36:35] https://www.irccloud.com/pastebin/dOndAAwe/
[20:36:51] yeah, that's off
[20:37:03] hm
[20:37:37] Hi, i'm wondering, would the maintenance you're doing be affecting wikibugs? Seems like it stopped working 2 hours ago
[20:37:38] https://phabricator.wikimedia.org/T161856
[20:37:43] madhuvishy: a normal task scheduled, but idk about a webservice, can you do some digging? fyi I have to step away for a while in an hour
[20:38:06] paladox: can you restart it?
[20:38:24] madhuvishy hi, nope, i have no access.
[20:38:30] !log wikibugs tools.wikibugs@tools-bastion-03:~$ webservice restart
[20:38:31] chasemp: Unknown project "wikibugs"
[20:38:31] chasemp: Did you mean to say "tools.wikibugs" instead?
[20:38:37] gah
[20:38:41] !log tools.wikibugs tools.wikibugs@tools-bastion-03:~$ webservice restart
[20:38:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[20:38:56] aren't there two components to this?
[20:38:57] 499433 0.53870 wb2-irc tools.wikibu Rr 03/30/2017 18:41:34 continuous@tools-exec-1404.eqi 1
[20:39:34] Maybe.
[20:39:34] https://www.mediawiki.org/wiki/Wikibugs
[20:40:04] it seems it's missing this one
[20:40:05] 4602105 0.39032 wb2-phab tools.wikibu r 03/24/2016 04:29:15 continuous@tools-exec-1401.eqi 1
[20:40:38] + i think the one that connects to gerrit too. Though that doesn't seem to be documented, and i am unsure if there is a component for it connecting to gerrit.
[20:42:28] madhuvishy: are webservices working?
[20:42:36] chasemp: there's some part running on kubernetes
[20:42:40] re: wikibugs
[20:42:42] yes
[20:43:01] ok, I just ran the two jsub commands here: https://www.mediawiki.org/wiki/Wikibugs
[20:43:10] !log tools.wikibugs manual jsub commands per https://www.mediawiki.org/wiki/Wikibugs
[20:43:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL
[20:43:40] ya, i see two webservices running
[20:43:46] this has a lot of stuff in qw
[20:43:46] 3173541 0.30012 veblenbot- tools.veblen qw 03/30/2017 19:30:03
[20:43:54] there's also
[20:43:56] https://www.irccloud.com/pastebin/Zn7hnRV2/
[20:44:02] idonteven
[20:44:18] qstat -u tools.veblenbot
[20:44:33] there are so many jobs!!
[20:44:47] 300+
[20:45:24] tools.veblenbot@tools-bastion-03:~$ qstat | grep -c qw
[20:45:24] 772
[20:45:35] that is something weird
[20:45:38] yes
[20:45:49] repeated scheduling of jobs on cron, maybe
[20:46:29] It seems wikibugs is still not reporting. did a rebase on https://gerrit.wikimedia.org/r/#/c/193434/ which should have reported in #wikimedia-dev
[20:46:32] tools.veblenbot:*:51108:cbm
[20:46:39] also thanks for running those commands
[20:47:10] paladox: wikibugs => tools.wikibugs:*:51894:twentyafterfour,danny_b,legoktm,valhallasw,reedy,krenair
[20:47:16] oh
[20:47:18] thanks
[20:48:07] root@tools-bastion-03:~# qstat -u '*' | grep qw | grep veblen -c
[20:48:08] 772
[20:48:08] root@tools-bastion-03:~# qstat -u '*' | grep qw | grep -v veblen -c
[20:48:10] 14
[20:48:34] 32 * * * * jlocal /data/project/veblenbot/veblen/VeblenBot/categories/qsub-category.sh
[20:48:34] 30 * * * * jlocal /data/project/veblenbot/veblen/VeblenBot/size_limit/qsub-size.sh
[20:48:43] yeah
[20:48:44] Wikibugs: WikiBugs stopped reporting 2 hours, 10 minutes, 51 seconds - https://phabricator.wikimedia.org/T161856#3145265 (Paladox)
[20:48:56] wikibugs works with phabricator
[20:49:56] chasemp: has been going on for days
[20:49:58] Wikibugs: WikiBugs stopped reporting 2 hours, 10 minutes, 51 seconds - https://phabricator.wikimedia.org/T161856#3145215 (Paladox)
[20:50:32] chasemp: should we just individually kill the jobs and then kill the cron?
[20:50:38] madhuvishy: I'm going to comment out these crons, make a note, and we can let the user know
[20:50:41] :)
[20:52:10] !log tools.veblenbot del qw jobs (all 770+ of them) as they are cluttering up the grid
[20:52:13] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.veblenbot/SAL
[20:52:18] chasemp: could you run the same command you did, but replace wb2-phab with wb2-grrrrit please?
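The `qstat | grep -c qw` counting above can be done per owner in one pass, which is handy when one tool (here tools.veblenbot) is flooding the queue. This sketch assumes the whitespace-separated qstat column layout visible in the pasted lines (job-ID, priority, name, owner, state, date, time, queue); that layout can vary between grid engine versions, so treat it as an illustration:

```python
from collections import Counter

def queued_per_owner(qstat_text):
    """Count jobs sitting in the 'qw' (queued, waiting) state per owner,
    given plain `qstat -u '*'` output. Assumes the column order seen in
    the log above: job-ID, priority, name, owner, state, ..."""
    counts = Counter()
    for line in qstat_text.splitlines():
        fields = line.split()
        # fields[4] is the state column; running states like 'r'/'Rr' are skipped
        if len(fields) >= 5 and fields[4] == "qw":
            counts[fields[3]] += 1
    return counts
```

Feeding it the full qstat output would reproduce the split above: 772 queued jobs for tools.veblen and 14 for everyone else.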
[20:52:59] Wikibugs: WikiBugs stopped reporting 2 hours, 12 minutes, 37 seconds - https://phabricator.wikimedia.org/T161856#3145277 (demon)
[20:53:01] chasemp: cool :)
[20:53:26] i'll look into submitting jobs with qsub
[20:55:10] /usr/bin/jsub -N wb2-grrrrit -l release=trusty -mem 1G -once -v PYTHONIOENCODING="utf8:backslashreplace" -continuous /data/project/wikibugs/py-wikibugs2/bin/python /data/project/wikibugs/wikibugs2/grrrrit.py --logfile /data/project/wikibugs/grrrrit.log
[20:58:03] paladox: done
[20:58:18] madhuvishy thanks :)
[20:58:19] works
[21:00:23] Wikibugs: WikiBugs stopped reporting 2 hours, 12 minutes, 37 seconds - https://phabricator.wikimedia.org/T161856#3145290 (Paladox) Open→Resolved a: Paladox Works now :)
[21:00:27] Wikibugs: WikiBugs stopped reporting 2 hours, 12 minutes, 37 seconds - https://phabricator.wikimedia.org/T161856#3145293 (Paladox) a: Paladox→None
[21:04:26] Labs, wikitech.wikimedia.org: Make Wikitech an SUL wiki - https://phabricator.wikimedia.org/T161859#3145305 (Andrew)
[21:04:41] Labs, MediaWiki-extensions-OpenStackManager, wikitech.wikimedia.org: Stop using OpenStackManager, make Wikitech an SUL wiki - https://phabricator.wikimedia.org/T161553#3145319 (Andrew)
[21:04:43] Labs, wikitech.wikimedia.org: Make Wikitech an SUL wiki - https://phabricator.wikimedia.org/T161859#3145318 (Andrew)
[21:04:54] Labs, MediaWiki-extensions-OpenStackManager, wikitech.wikimedia.org: Remove OpenStackManager from Wikitech - https://phabricator.wikimedia.org/T161553#3135041 (Andrew)
[21:08:23] Labs, Tool-Labs: Virtualenvs slow on tool labs NFS - https://phabricator.wikimedia.org/T136712#3145329 (madhuvishy) I've now rolled out lookupcache:all to all of tools.