[00:00:58] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Czar was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=733712 edit summary: [05:06:04] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Dingruogu was created, changed by Dingruogu link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Dingruogu edit summary: Created page with "{{Tools Access Request |Justification=copy-paste translation of wikipedia articles |Completed=false |User Name=Dingruogu }}" [07:25:29] 06Labs, 10Deployment-Systems, 10wikitech.wikimedia.org: /etc/mediawiki/WikitechPrivateSettings.php not found on tin - https://phabricator.wikimedia.org/T139917#2446552 (10greg) [08:57:25] 06Labs, 10Tool-Labs, 10DBA, 10Wikidata: Petscan is being used with excesive parallelism by a user on Wikidata - https://phabricator.wikimedia.org/T139618#2446675 (10jcrespo) 05Resolved>03Open [08:57:31] 06Labs, 10Tool-Labs, 10DBA, 10Wikidata: Petscan is being used with excesive parallelism by a user on Wikidata - https://phabricator.wikimedia.org/T139618#2437677 (10jcrespo) This is still ongoing. [09:04:37] 06Labs, 10Tool-Labs, 10DBA, 10Wikidata: Petscan is being used with excesive parallelism by a user on Wikidata - https://phabricator.wikimedia.org/T139618#2446683 (10Bugreporter) Now only one tab is running. [09:07:48] 06Labs, 10Tool-Labs, 10DBA, 10Wikidata: Petscan is being used with excesive parallelism by a user on Wikidata - https://phabricator.wikimedia.org/T139618#2446687 (10Bugreporter) As there're issue even if there're only one tab, code must be modified. [09:19:22] Hey everyone, any tool labs admin here??? [09:19:42] I want some lab intances to be deleted please :) [09:34:09] 06Labs, 10Tool-Labs, 10DBA, 10Wikidata: Petscan is being used with excesive parallelism by a user on Wikidata - https://phabricator.wikimedia.org/T139618#2446737 (10Bugreporter) 05Open>03stalled There're nothing I can do other than stop creating items (and probably other semi-automatic work) until http... [10:48:34] 06Labs, 10Tool-Labs, 10DBA, 10Wikidata: Petscan is being used with excesive parallelism by a user on Wikidata - https://phabricator.wikimedia.org/T139618#2446838 (10Magnus) 05stalled>03Resolved Added option for bot-account users to set concurrent threads (1-5). https://bitbucket.org/magnusmanske/petsca... [12:54:42] zhuyifei1999_ (venv)tools.video2commons-test@interact:~$ time pip -V [12:54:43] pip 1.5.6 from /data/project/video2commons-test/www/python/venv/local/lib/python2.7/site-packages (python 2.7) [12:54:44] real 0m0.900s [12:54:45] user 0m0.516s [12:54:45] sys 0m0.104s [12:54:46] on k8s :D [12:54:52] am moving video2commons-test over now, let's see [12:55:52] ok [12:55:58] zhuyifei1999_ https://tools.wmflabs.org/video2commons-test/ works now! \o/ [12:56:03] and is on k8s [12:56:13] nice [12:57:20] zhuyifei1999_ in that tool's home folder, you'll find an interact.bash. Running that should give you a shell on k8s that's got the faster NFS stuff going [12:57:36] zhuyifei1999_ can you confirm that video2commons-test actually does work? [12:58:27] wait [12:58:40] ok [12:59:48] yeah nothing weird so far [12:59:59] tried creating a tack then aborting it [13:00:01] thx [13:01:08] 06Labs, 10Labs-Kubernetes, 10Tool-Labs: Make webservice restart more efficient on kubernetes backend - https://phabricator.wikimedia.org/T139932#2447106 (10yuvipanda) [13:01:17] zhuyifei1999_ ok! 
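A rough sketch of the webservice move just described (video2commons-test going from gridengine to the kubernetes backend). The command order and flags follow the `webservice python2 stop --backend=kubernetes` invocation quoted later in this log; `kubectl` availability for the tool account is an assumption, and exact syntax may have differed in the 2016 CLI.

```bash
# Hedged sketch: moving a uwsgi tool's webservice to the kubernetes backend.
become video2commons-test                        # switch to the tool account (tool name from the log)
webservice python2 stop                          # stop the gridengine-backed webservice
webservice python2 start --backend=kubernetes    # start it again on k8s (flag form as quoted at 20:24)
kubectl get pods                                 # assumes kubectl is configured for the tool; confirm pod is Running
```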
[13:01:33] zhuyifei1999_ can you try the 'interact.bash' script as well? [13:01:47] that's the guts of what'll become 'webservice shell' [13:02:54] $ ./interact.bash [13:02:54] Waiting for pod video2commons-test/interact to be running, status is Pending, pod ready: false [13:02:55] Waiting for pod video2commons-test/interact to be running, status is Pending, pod ready: false [13:02:55] Waiting for pod video2commons-test/interact to be running, status is Pending, pod ready: false [13:03:11] ah nvm [13:03:37] awesome :) [13:03:41] it's a very bare-bones setup - only thing you can really do there is to setup a virtualenv [13:04:06] I'm wondering if that's ok or if I should setup an equivalent of 'dev_environ' - editors and stuff [13:04:53] hmm I thought k8s containers down know about the existence of nfs [13:05:07] but tools.video2commons-test@interact:~$ mount | grep nfs [13:05:07] labstore.svc.eqiad.wmnet:/project/tools/project/video2commons-test on /data/project/video2commons-test type nfs4 (rw,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.68.19.251,local_lock=none,addr=10.64.37.10) [13:05:09] they do [13:05:26] this is the webservice compatibility layer, so it has to depend on NFS [13:05:31] ok [13:05:37] they just don't know about gridengine [13:08:53] valhallasw: ^ one solution to the virtualenv slow problem [13:15:27] 06Labs, 10Tool-Labs, 13Patch-For-Review: Setup running uwsgi webservices on k8s - https://phabricator.wikimedia.org/T139783#2447143 (10yuvipanda) This works, but needs a nice way to create virtualenvs that target debian jessie. I'll file a bug for the 'webservice shell' command which formalizes the kubectl b... [13:25:19] YuviPanda: is the venv still on nfs here, but with different mount options? [13:26:01] I am trying to host a Flask app on Labs [13:26:08] I followed the instructions in https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Web#Python_2_.28uwsgi.29 [13:26:16] What exactly do they mean by "Place your application in $HOME/www/python/src/app.py" [13:26:21] Currently say my app folder where all my code resides is called 'codebase'. 'codebase' contains app.py and a templates and static folder [13:26:26] should I place the entire 'codebase' folder in $HOME/www/python/src/?? [13:26:43] curry: no, your app.py should be in $HOME/www/python/src [13:26:57] valhallasw: yup, I moved all the jessie (k8s hosts) to not have lookupcache=none last week [13:27:22] YuviPanda: mmmm. [13:27:43] YuviPanda: could that cause caching issues when a pod is restarted? [13:29:14] 10Labs-project-wikistats: rewrite/fix versioncheck for wikistats table - https://phabricator.wikimedia.org/T38292#2447178 (10Danny_B) [13:29:14] if the premise was that the reason we turned on lookupcache=none was because of a bug in trusty's nfs implementation, I'm hopefully it'll have gone away in jessie. [13:29:37] My thinking on this is mostly 'let us do it and see what happens' and if problems arise use a more precise hammer than making virutalenv unusable everywhere... [13:30:40] Ah. Yes, that's fair. [13:49:26] 06Labs, 10Tool-Labs, 13Patch-For-Review: Figure out a way to keep MerlBot running when the HTTP POST loophole is closed - https://phabricator.wikimedia.org/T121279#2447206 (10BBlack) The general insecure access cutoff date is coming up tomorrow. We had exempted Merlbot from the 10% random failure rate a mon... 
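For the Flask/uwsgi question above ("Place your application in $HOME/www/python/src/app.py"): a minimal sketch of what curry's hypothetical 'codebase' folder could look like once laid out the way the Help:Tool_Labs/Web uwsgi instructions expect. As valhallasw notes just below, uwsgi imports app.py and uses the module-level callable named `app`. The tool name here is made up.

```bash
# Hypothetical layout for the 'codebase' folder under the uwsgi webservice conventions.
become mytool                                   # hypothetical tool name
mkdir -p ~/www/python/src
cp -r codebase/templates codebase/static ~/www/python/src/   # Flask looks for these next to app.py
cat > ~/www/python/src/app.py <<'EOF'
from flask import Flask

# uwsgi imports this module and serves the module-level callable named "app"
app = Flask(__name__)

@app.route('/')
def index():
    return 'Hello from Tool Labs'
EOF
webservice python2 start    # per Help:Tool_Labs/Web#Python_2_.28uwsgi.29
```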
[14:10:42] 06Labs, 06Operations: Failed drive in labstore2001 array - https://phabricator.wikimedia.org/T139937#2447267 (10chasemp) [14:10:49] 06Labs, 06Operations: Failed drive in labstore2001 array - https://phabricator.wikimedia.org/T139937#2447280 (10chasemp) p:05Triage>03High [14:13:13] 06Labs, 06Operations: Failed drive in labstore2001 array - https://phabricator.wikimedia.org/T139937#2447290 (10chasemp) [14:13:29] : What do I do with the rest of the files related to the app? [14:17:27] curry (IRC): place it somewhere your app can find it? [14:18:04] curry (IRC): uwsgi will look for a callable 'app' in app.py [14:41:44] 06Labs, 10Labs-Infrastructure, 06Operations: investigate slapd memory leak - https://phabricator.wikimedia.org/T130593#2447400 (10MoritzMuehlenhoff) [15:05:30] 06Labs, 06Operations: Failed drive in labstore2001 array - https://phabricator.wikimedia.org/T139937#2447554 (10Papaul) @chasemp yes i have 3*2TB SAS disks in spare [15:18:24] 06Labs, 06Operations: Failed drive in labstore2001 array - https://phabricator.wikimedia.org/T139937#2447615 (10Papaul) a:05Papaul>03chasemp Disk replacement complete. [15:31:49] 06Labs, 10Labs-Kubernetes, 10Tool-Labs: Add ability to get a shell to webservice - https://phabricator.wikimedia.org/T139952#2447684 (10yuvipanda) [15:51:30] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Dingruogu was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=735044 edit summary: [16:12:32] 06Labs, 10Labs-Infrastructure: Strategies to avoid OOM on labvirt hosts - https://phabricator.wikimedia.org/T139954#2447863 (10Andrew) [16:37:24] 06Labs, 06Operations: Failed drive in labstore2001 array - https://phabricator.wikimedia.org/T139937#2447968 (10chasemp) this array is resyncing now, I used my notes from https://phabricator.wikimedia.org/T127076#2067539. a few pointers here. Find present adapters: megacli -CfgDsply -a0 | grep Adapter this... [17:14:57] 06Labs, 06Operations: Don't forget to clean the new_install key off of iron - https://phabricator.wikimedia.org/T139967#2448421 (10Andrew) [17:19:40] 06Labs, 06Operations: access_new_install role vs. Labs vs. the future - https://phabricator.wikimedia.org/T139971#2448488 (10Andrew) [17:20:12] 06Labs, 06Operations: Don't forget to clean the new_install key off of iron - https://phabricator.wikimedia.org/T139967#2448501 (10Andrew) [17:20:14] 06Labs, 06Operations: access_new_install role vs. Labs vs. the future - https://phabricator.wikimedia.org/T139971#2448500 (10Andrew) [18:21:27] matanya: matanya, zhuyifei1999_, if you are around can we discuss https://phabricator.wikimedia.org/T139802 ? [18:21:55] * zhuyifei1999_ is here [18:22:32] I think I a m1.gigantic for encoding03 is okay [18:22:35] So, I have three goals here: 1) Get you some faster performance, 2) move your rendering instances to a host with more disk space, 3) find a test case to try out a couple of new servers [18:23:35] zhuyifei1999_: I'm wondering if it makes sense to /move/ your rendering instances, or if it would be possible to just drain, delete, and rebuild them on the new hosts. [18:23:47] andrewbogott: so you you need to rebuild, feel free to cut the disk space by half [18:23:51] hmm [18:23:56] * zhuyifei1999_ checks [18:23:57] (I don't understand the project enough to know what's reasonable and what will cause disruption) [18:26:32] andrewbogott: looks quite unused today http://tools.wmflabs.org/nagf/?project=video [18:27:07] ok… and rebuilding a fresh node is easy? 
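The labstore2001 rebuild notes quoted at 16:37 give one megacli command ("Find present adapters"). A hedged sketch of the related status checks one might run while the array resyncs; only the first line is from the task, the rest assume standard LSI MegaCli syntax on adapter 0.

```bash
# Hedged sketch: checking a MegaRAID array after swapping a failed disk.
megacli -CfgDsply -a0 | grep Adapter                    # find present adapters (from chasemp's notes)
megacli -PDList -a0 | grep -E 'Slot|Firmware state'     # per-disk state (Online, Rebuild, Failed, ...)
megacli -LDInfo -Lall -a0 | grep -E 'State|Size'        # logical drive state (Optimal / Degraded)
megacli -AdpAllInfo -a0 | grep -i 'rebuild rate'        # how aggressively the rebuild is running
```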
[18:27:16] Is it something you'd be available to do in 10-20 minutes? [18:27:29] drained 03, checked server-side-upload queue, all okay [18:27:45] um without puppet life is hard [18:28:10] zhuyifei1999_: Hi, thanks for your help on the ticket :) [18:28:21] which ticket? [18:28:27] zhuyifei1999_: I don't know but the uwsgi service is getting stable by the day :) [18:28:53] zhuyifei1999_: https://phabricator.wikimedia.org/T139020 [18:29:23] andrewbogott: checked the rest of stuffs in /srv/, should be good [18:29:33] d3r1ck: np [18:29:52] * zhuyifei1999_ looks to 02 [18:29:57] zhuyifei1999_: hey, you TL admin??? [18:30:08] I need some lab instances to be deleted [18:30:16] zhuyifei1999_: can you do that? [18:30:20] d3r1ck: no, but old user [18:30:40] zhuyifei1999_: can you point me to one?? I need the favor [18:30:47] you can delete them if you're projectadmin of the owning instance [18:31:10] I don't think so, I would have deleted them myself if that was the case [18:31:25] Or maybe there is a way that I don't know??? [18:31:53] which instance? [18:32:25] andrewbogott: 02 drained, /srv should have nothing useful left [18:32:51] 'gsociftttdev', 'gsoc-ifttt-dev', and 'gsoc-dev'. Those are them zhuyifei1999_ [18:33:33] https://wikitech.wikimedia.org/w/index.php?search=gsoc-ifttt-dev&title=Special:Search&go=Go&searchToken=62vegzhiqd2d71u3sno7b2u54 no results [18:33:44] do you mean tool or instance? [18:34:05] zhuyifei1999_: sorry, tool. [18:34:29] zhuyifei1999_: I have been calling it instance althe time [18:34:43] zhuyifei1999_: thanks! I'm sidetracked now, shouldn't take all that long :/ [18:35:06] d3r1ck: https://phabricator.wikimedia.org/T133777 it's never done before [18:36:10] zhuyifei1999_: Ok [18:36:16] thanks [18:36:50] andrewbogott: encoding01 has 2 uploaded files belonging to matanya (but uploaded by yann) in /srv/workspace [18:38:07] nothing else seems useful, but it's the only not-drained instance right now [18:38:44] d3r1ck: feel free to file a subtask there so they'll know about it and don't forget [18:39:28] 06Labs, 10Labs-Kubernetes, 10Tool-Labs, 07Tracking: Initial Deployment of Kubernetes to Tool Labs (Tracking) - https://phabricator.wikimedia.org/T111885#2449103 (10Danny_B) [18:49:41] 06Labs, 10Labs-Other-Projects: video project: move rendering instances to SSD servers - https://phabricator.wikimedia.org/T139802#2449192 (10zhuyifei1999) encoding02 & 03 are drained and I don't see anything useful left. encoding01 has two looks-like-already-uploaded-to-commons file in `/srv/workspace` belon... [18:50:19] 06Labs, 10Labs-Infrastructure: Upgrade OpenStack to the Folsom release - https://phabricator.wikimedia.org/T48817#2449196 (10Danny_B) [18:54:53] 06Labs, 10Deployment-Systems, 10wikitech.wikimedia.org: /etc/mediawiki/WikitechPrivateSettings.php not found on tin - https://phabricator.wikimedia.org/T139917#2446552 (10Krenair) It's not supposed to exist there. I suppose we could make a dummy version of the file to put there... [19:00:00] I created a new tool, but it seems that I did not get write permission to create files in it: https://phabricator.wikimedia.org/P3368 [19:00:24] how can I fix this? [19:02:34] dalba can you open a bug? [19:02:35] I'll fix it manually in the meantime [19:05:51] andrewbogott: If I create some puppet code and run them with puppet apply, would it break future puppet agent runs? [19:06:08] zhuyifei1999_: hard to predict, it all depends on the specifics. 
[19:06:31] only if it conflicts w/ already managed resources [19:06:41] zhuyifei1999_: so, you gave a lot of details about those instances which I didn't quite follow. I think what I need to know is — is it ok if I delete them and then recreate fresh ones? [19:07:14] andrewbogott: I think matanya should give an okay on it [19:07:16] yuvipanda, thanks! sure, i'll open a bug later. [19:07:33] zhuyifei1999_: for all three? (That's fine, just clarifying) [19:07:41] leave one working instance [19:07:59] (so that's 01) [19:08:20] ok, so I can delete/rebuild 02 and 03 now? [19:08:44] and I think 01 can be rebuilt after 02 & 03 are rebuilt [19:08:44] or was that the 'ask matanya first' bit? Sorry if I'm being slow [19:09:18] yeah it's included in that bit [19:12:48] (I'll try to build some puppet code with one of the instances and apply it to the other, after the instances are rebuilt) [19:13:30] zhuyifei1999_: I updated https://phabricator.wikimedia.org/T139802 accordingly, feel free to update if I've misunderstood. [19:13:54] matanya: ^ same to you :) [19:14:57] andrewbogott: 01 needs to be drained just in case before #3 [19:15:09] true [19:16:02] updated [19:17:02] k [19:24:25] ah! [19:24:31] what do I do if labs host dns is all busted again? [19:24:42] my host is resolving to the wrong IP address. [19:24:59] and, the IP address reverse resolves to like 10 different hosts [19:25:03] 06Labs, 10Tool-Labs, 06Community-Tech-Tool-Labs, 07Epic: Tools web interface for tool authors (Brainstorming ticket) - https://phabricator.wikimedia.org/T128158#2065676 (10bd808) [19:27:18] hm, maybe its not a problem, actually the IP is correct [19:27:25] just weird that it reverse resolves to so many hosts [19:27:28] ottomata: contact andrewbogott I imagine, I"m not sure how common this is or what we have been doing to fix it [19:27:33] but I think he's very busy today [19:27:36] a task? [19:27:50] yuvipanda, can you give the ircredirector tool a copy of that interact.bash script please? [19:27:51] yeah, this happened like a month ago and i made a task, i think it was resolved for that one time [19:27:59] buuut, maybe its not a actually problem.... [19:28:39] tom29739 I actually have the 'shell' for webservice working, and can probably roll it out in ~30mins... :) [19:29:08] Thanks. :) [19:29:26] tom29739 I'll ping you when I have it [19:41:26] ottomata: there's a dns entry leak that appears no and then. I continue to have no idea when/why it happens. If you have something that's immediately broken/keeping you from working then file a bug for me, otherwise just tack your data on the end of https://phabricator.wikimedia.org/T115194 [19:44:57] 06Labs, 10Labs-Infrastructure, 06Operations: Some labs instances IP have multiple PTR entries in DNS - https://phabricator.wikimedia.org/T115194#2449556 (10Ottomata) Ok, except @Andrew just told me to tack these on... :) ``` otto@deployment-kafka03:~$ host 10.68.16.138 138.16.68.10.in-addr.arpa domain name... [19:45:07] andrewbogott: i think my stuff is working, so i guess its fine [19:45:15] added mine anyway [19:45:27] ok! [19:52:33] 06Labs, 10Tool-Labs: No permission after create a new tool - https://phabricator.wikimedia.org/T140004#2449622 (10Dalba) [20:04:16] 06Labs, 10Labs-Infrastructure, 06Operations, 07IPv6: Enable ipv6 on labs - https://phabricator.wikimedia.org/T37947#399081 (10FastLizard4) Bumping this task. We could use IPv6 connectivity for the account-creation-assistance project. Since IPv6 addresses are starting to show up on Wikipedia now, it becom... 
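For ottomata's stale reverse-DNS issue above (T115194, where one IP "reverse resolves to like 10 different hosts"): a quick sanity check of forward and reverse resolution. The IP and host are the ones mentioned in the log; the fully qualified `.eqiad.wmflabs` name is an assumption about the instance's FQDN.

```bash
# Forward/reverse DNS check for a labs instance; more than one PTR answer
# for a single IP indicates leaked records (T115194).
host deployment-kafka03.deployment-prep.eqiad.wmflabs   # forward: should return one A record
host 10.68.16.138                                        # reverse: lists every PTR entry for the IP
dig -x 10.68.16.138 +short                               # each output line is one PTR record
```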
[20:16:26] tom29739 still around? [20:18:36] Yep. [20:19:15] yuvipanda, is it working now? [20:19:21] tom29739 I've setup a patched webservice for you. You can now run /tmp/tools/bin/webservice shell as ircredirector and get a shell. try? [20:23:31] yuvipanda, it errors out because my webservice is stopped, but I stopped the webservice because it kept erroring out because of the lack of a virtualenv. [20:23:51] tom29739 that shouldn't happen - what error are you getting? [20:24:06] webservice shell only supported for kubernetes backend [20:24:28] The command I stopped it with was webservice python2 stop --backend=kubernetes [20:24:31] ah [20:26:14] It works when I start the webservice, but it gives me an SNIMissingWarning and an InsecurePlatformWarning. [20:27:08] tom29739 yup, that'll go away just now - I deployed the 'webservice' command so you can just use regular webservice now [20:27:38] tom29739 so you do it and then setup your virtualenv (delete the previous one, create a new one) and then your webservice can run [20:30:46] zhuyifei1999_ ^ webservice shell is available now, do you want to move video2commons? [20:31:26] valhallasw: am gonna move contact now [20:33:09] yuvipanda: do you have a how-to guide? [20:33:25] not yet actually [20:33:31] I'll have one by tomorrow [20:33:33] (it's like 10pm here) [20:33:42] ok [20:34:21] andrewbogott: impressive how fast new servers can be setup thanks to puppet :) [20:34:31] It's 4am here and I'm still awake when I'm not supposed to ;P [20:36:35] * zhuyifei1999_ tries hard to sleep [20:37:20] 4am you have two hours till day light [20:37:29] It's 9:37 pm here. [20:37:40] And here. [20:38:02] Oh your on bst. [20:39:12] tom28739 ^^ [20:39:37] Woops spelled name wrong it's [20:39:50] tom29739 ^^ [20:48:33] 06Labs, 10Labs-Infrastructure, 06Operations: Some labs instances IP have multiple PTR entries in DNS - https://phabricator.wikimedia.org/T115194#2449987 (10hashar) @Ottomata Fair call sorry :-) Nodepool spawns instances with an incremental ID to give some indication about the progress: | Time (UTC) | ID |-... [20:49:16] yuvipanda, webservice shell acts strangely with autocomplete and then when using backspace. It spreads the autocomplete over 2 lines when it shouldn't and then when backspacing it goes up 2 lines. [20:49:38] tom29739 right, I think it's because the width of the terminal is confused. [20:51:08] tom29739 try stty rows 50 cols 150 [20:52:21] yuvipanda, it worked [20:52:29] \o/ cool [20:52:47] tom29739 https://github.com/kubernetes/kubernetes/issues/13585 is upstream issue for it [20:54:59] valhallasw: so contact is a pita because it's using ldapsupportlib.... [20:55:24] and my feelings towards it are summarized in https://phabricator.wikimedia.org/T114063 [20:55:28] ...which isn't pip installable [20:55:31] indeed [20:55:37] so I just copied it to src :D [20:55:43] fair enough [20:57:13] valhallasw: and I can't install ldap inside the venv because it requires some native library... [20:57:13] that isn't in the container [20:58:15] dumdumdum [20:58:17] right [20:58:31] and there's no apt-get infra yet [20:58:41] so it's actually a very difficult one [20:59:02] (although you can probably make the venv on a jessie bastion) [20:59:32] unless it's a dynamic lib, in which case... meh, never mind. [21:01:34] valhallasw: right [21:06:27] valhallasw: I think the policy should be that we'll add -dev packages but nothing else? 
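A sketch of the venv rebuild yuvipanda describes above ("delete the previous one, create a new one") so the compiled packages target the jessie-based k8s container rather than trusty. Paths follow the uwsgi layout seen earlier in this log; the requirements.txt location and the restart flags are assumptions.

```bash
# Hedged sketch: rebuild a tool's virtualenv from inside the k8s shell.
webservice shell                                   # drops you into the jessie-based pod
# --- the following are typed inside that shell ---
rm -rf ~/www/python/venv                           # throw away the trusty-built venv
virtualenv ~/www/python/venv                       # recreate it (python2 here)
source ~/www/python/venv/bin/activate
pip install -r ~/www/python/src/requirements.txt   # assumes a requirements.txt exists
exit
webservice python2 restart --backend=kubernetes    # pick up the new venv
```

If the shell's line wrapping misbehaves, the `stty rows 50 cols 150` workaround quoted above resets the terminal size (upstream kubernetes issue 13585).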
[21:06:58] hashar: getting them to register with nova is easy, trusting them to not collapse a week later is harder :/ [21:07:12] every kernel manages to have a new virtualization bug [21:08:43] valhallasw: I don't think we'll ever allow users to apt-get themselves [21:09:32] andrewbogott: looked for me ? [21:10:25] YuviPanda: I thought we were going to do arbitrary pods at some point? [21:10:44] valhallasw: that should come with the PaaS, not with this. [21:10:50] Yes, fair enough [21:10:55] so err, I guess we'll never allow users to apt-get with the webservice stuff [21:11:07] since container building is a big pita to do securely [21:17:15] 06Labs, 10Labs-Other-Projects: video project: move rendering instances to SSD servers - https://phabricator.wikimedia.org/T139802#2450156 (10Matanya) It makes sense, but a few points: 1. I don't need the files in /srv, you can safely ignore them. 2. SSD would give us better performance but probably a GPU wou... [21:17:30] matanya: yes! regarding https://phabricator.wikimedia.org/T139802 [21:17:38] commented there andrewbogott [21:18:11] matanya: looks good. Is now an OK time for me to kill those instances? [21:18:26] yes, the pooling is done by me anyway [21:18:42] andrewbogott: will i loose the dns associations and proxies ? [21:18:55] and security groups ? [21:19:10] also — part of this process is that you'll be an early adopter on this new hardware and kernel. I don't expect issues, but — do you mind being a test case? [21:19:58] matanya: I think we can get dns/proxy/security groups back the way they were easily enough [21:20:34] andrewbogott: it will be my 3rd time as a beta tester on labs :) [21:20:50] looks like both instances only use default security groups, so that's simple... [21:20:57] or second time beta, once was alpha :D [21:21:07] no, they have redis [21:21:09] the proxies… I'm not sure if they're set by IP or by name. If by IP you may need to recreate them [21:21:16] ok [21:21:32] (the proxies are set by IP now) [21:21:46] matanya: ah, only doing the encoding* instances [21:21:49] which don't use redis [21:21:55] ah, ok [21:22:12] !log video deleting instances encoding02 and encoding03, soon to recreate [21:22:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Video/SAL, dummy [21:22:43] would be nice if you can preserve the ip [21:22:52] (i know, dhcp...) [21:23:05] would be nice, indeed, but would require heroic measures :) [21:23:33] andrewbogott: yeah I can imagine them crashing mysteriously :] [21:28:07] valhallasw: now your requirements.txt is incomplete :D requires bs apparently [21:28:32] it interfaces with phabricator, so probably ;D [21:28:43] ok :D [21:28:59] valhallasw: I just added the -dev packages just now, I'll do them in a more orderly fashion soon [21:29:14] valhallasw: I guess I should add git and stuff in there as well [21:29:56] nod [21:30:05] any chance we can just use the original puppet manifest for that? [21:30:19] install puppet, puppet apply dev_environ.pp? 
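On the incomplete requirements.txt noted a few messages up (the contact tool apparently also needs bs): one way to keep the file honest is to freeze what the working virtualenv actually has installed. Paths follow the uwsgi layout assumed earlier; this is a generic pip workflow, not the tool's actual setup.

```bash
# Capture the venv's real dependency set so requirements.txt stays complete.
source ~/www/python/venv/bin/activate
pip freeze > ~/www/python/src/requirements.txt      # pins versions, includes transitive deps
pip install -r ~/www/python/src/requirements.txt    # later: reproduce the same environment elsewhere
```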
[21:30:36] valhallasw: you lost me at 'install puppet' [21:30:40] :P [21:34:36] matanya: ok, encoding02 and encoding03 are back and ready for you to configure [21:34:45] I moved the proxies already but you might want to check my work [21:34:56] (And, no GPUs, alas :) ) [21:35:16] and of course, expect host-key warnings [21:37:22] 06Labs, 10Labs-Other-Projects: video project: move rendering instances to SSD servers - https://phabricator.wikimedia.org/T139802#2450234 (10Andrew) encoding02 and encoding03 are now rebuilt on labvirt1012 and labvirt1013, respectively. We're no longer using the m1.moregigantic flavor so I'll remove that for... [21:43:37] 06Labs, 10wikitech.wikimedia.org: Labs front-page statistics are very wrong - https://phabricator.wikimedia.org/T139773#2450272 (10Andrew) At least some of these are caused by a race in page maintenance. I can see cases where the page is cleared and the instance marked deleted, and right after the page is ref... [21:52:05] 06Labs, 06Operations, 13Patch-For-Review: access_new_install role vs. Labs vs. the future - https://phabricator.wikimedia.org/T139971#2450322 (10Andrew) [21:52:07] 06Labs, 06Operations, 13Patch-For-Review: Don't forget to clean the new_install key off of iron - https://phabricator.wikimedia.org/T139967#2450320 (10Andrew) 05Open>03Resolved Patch reverted, keys shredded and removed, script removed. [22:11:20] 06Labs, 06Operations, 13Patch-For-Review: access_new_install role vs. Labs vs. the future - https://phabricator.wikimedia.org/T139971#2448488 (10RobH) All of the below is my understanding of things, and it could be wrong! Our historical install process is we used to install a root password via the installer... [22:18:04] heya labs opsen, can anyone tell me if labvirt1012 is under use yet or if i can reboot it to fix? https://phabricator.wikimedia.org/T138509#2450414 [22:18:23] chasemp or yuvipanda maybe (i pinged andrew already ;) [22:19:05] it's not officialy in use but I think a few things may have been moved to it hold on [22:19:05] i doubt its under use just yet, but perhaps andrew was very fast in pushing stuff into production due to labs being at capacity, dunno and i dont wanna assume =] [22:19:10] cool [22:19:22] figured better to fix now before more stuff gets moved [22:19:58] so yeah the vid encoding stuff was moved there [22:20:00] | dd13a2cf-d4c7-48d5-86b6-7990256a19ff | encoding02 | video | ACTIVE | - | Running | pub [22:20:05] I didn't want to just hope someone saw the task, I know you guys are busy. ahh. [22:20:12] zhuyifei1999_: can we cause a small downtiem for encoding02? [22:20:40] basically less than 10 minutes. i'll just reboot into bios, flip some bits, and bring it back up. [22:21:58] robh: it's just the one VM there but I don' thave context on how harmful to downtime [22:22:33] zhuyifei1999_: or matanya would know iiuc [22:22:48] oh, i can see its not a cpu issue in the system without reboot [22:22:54] so its indeed just ht not being enabled. [22:24:20] that's not per cpu is it? [22:24:23] just a switch? [22:24:37] usually its a switch in bios for all cpus on the host [22:24:57] much like the virtualization flag [22:25:09] though HT is default ON for everything these days [22:25:18] so this was likely just missed by accident. [22:25:24] robh: I guess file a task eh they were working on a bunch of vid project stuff this morning and I don't want ot pull the rug out atm [22:26:00] its on the install task, its reopened. 
but can make a new sub task if needed [22:26:05] nah [22:26:27] just wanted to make sure you guys were aware so more stuff is moved, complicating downtime, etc =] [22:27:18] robh: are 13 and 14 ok? [22:28:25] they show twice the core count [22:28:26] so yep [22:28:45] i can see if the CPU is detected in the ilom [22:28:46] great we'll get that rebooted then probably tomorrow morn when folks are around [22:28:49] and see how many cores via OS [22:29:03] andrew should be comfortable booting into bios, he did it to set these up [22:29:17] well, he booted into bios for the hwraid stuff not cpu [22:29:21] but its just another bios screen =] [22:30:31] yeah no worries I have a general idea [22:42:17] 06Labs: Access needed to mwui.wmflabs.org - https://phabricator.wikimedia.org/T123316#2450555 (10Mattflaschen-WMF) 05Open>03Resolved a:03Halfak [22:52:57] 06Labs, 15User-bd808: Staging environment for Reading Web - https://phabricator.wikimedia.org/T104994#2450595 (10Danny_B) [23:24:36] !log deployment-prep Unmounted /data/project (NFS) on all active hosts (mediawiki0[1-3], jobrunner01, tmh01), leaving just deployment-upload (shutoff, to schedule for deletion soon) - T64835 [23:24:36] Please !log in #wikimedia-releng for beta cluster SAL [23:24:37] T64835: Setup a Swift cluster on beta-cluster to match production - https://phabricator.wikimedia.org/T64835 [23:24:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL, Master [23:28:24] Krenair: \o/ that's awesome [23:29:30] bd808, I also have a patch to merge the two filebackend mw-conf files [23:30:27] does your new setup get rid of the funky thumbs handler script that beta clsuter had too? [23:30:39] yes, we use the production setup now [23:30:53] we have a custom python script that swift uses to do a bit of rewriting [23:31:04] and it hits the swift backends instead of NFS [23:31:18] I still need to figure out the cirrus/temp URL keys, get the puppet change merged (right now it's cherry-picked), write and send an announcement, then schedule deletion of deployment-upload and its hiera page on wikitech [23:31:58] right now deployment-upload is shutoff, and is the only instance which still has the NFS mount. I may need to recover a file that looks like it went missing during the migration from there still [23:37:05] 06Labs, 10Beta-Cluster-Infrastructure: Completely remove Beta Cluster dependency on NFS - https://phabricator.wikimedia.org/T102953#2450818 (10AlexMonk-WMF) a:03AlexMonk-WMF I'm far enough into {T64835} now that I think we can almost call this done. [23:38:20] ugh, there may be a couple of other things I need to rescue out of the NFS mount [23:50:51] I seem to be on an alert list for this, even though I probably shouldn't be: "puppet failure on deployment-ms-fe01.deployment-prep.eqiad" [23:56:39] the alert list is "any project admin" afaik [23:57:03] robla, do you have project adminship of deployment-prep? [23:57:43] I might, but it would be from several years ago, and I haven't touched it in the past year or two [23:57:47] but yes the ms-fe01 one earlier was because I left puppet disabled for a while, it was being worked on. it's back to normal now and the change is puppetised in a cherry-picked commit on deployment-puppetmaster [23:58:01] yeah it'll still count you because you still have the rights [23:59:09] I would be happy to be removed from deployment-prep; I can request it later if I need it
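For the labvirt1012 hyper-threading question above (robh: the fixed hosts "show twice the core count"): a rough way to check from a running host, without rebooting into the BIOS, whether HT is enabled. Standard Linux tools only; no host-specific assumptions beyond what the log states.

```bash
# With hyper-threading on, "Thread(s) per core" is 2 and the logical CPU count
# is twice the physical core count.
lscpu | grep -E '^(CPU\(s\)|Thread\(s\) per core|Core\(s\) per socket|Socket\(s\))'
nproc                                    # logical CPUs seen by the OS
grep -c ^processor /proc/cpuinfo         # same number, read from /proc
```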