[00:58:10] !log mobile created instance mobile-puppetmaster and set it up as a standalone puppetmaster with itself as the client. (https://wikitech.wikimedia.org/wiki/Standalone_puppetmaster)
[00:58:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Mobile/SAL
[06:43:20] Labs, Labs-Infrastructure, DBA: Data integrity issue with enwiki_p user_groups on Wikimedia Tool Labs (missing rows) - https://phabricator.wikimedia.org/T159493#3069402 (Marostegui) Hi, Indeed, the new labs servers (running ROW based replication are fine) and that drift is probably coming from mult...
[06:50:11] PROBLEM - Puppet run on tools-cron-01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[07:30:11] RECOVERY - Puppet run on tools-cron-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:37:18] Labs, Labs-Infrastructure, DBA: Data integrity issue with enwiki_p user_groups on Wikimedia Tool Labs (missing rows) - https://phabricator.wikimedia.org/T159493#3070071 (bd808) > Since you're continuously curious about Labs frustrations, the replicas are definitely top-five. Noted. We are really hop...
[08:06:34] Labs, Labs-Infrastructure, DBA: Data integrity issue with enwiki_p user_groups on Wikimedia Tool Labs (missing rows) - https://phabricator.wikimedia.org/T159493#3070099 (Marostegui) >>! In T159493#3070071, @bd808 wrote: >> Since you're continuously curious about Labs frustrations, the replicas are de...
[08:37:41] Labs, Labs-Infrastructure, DBA: Data integrity issue with enwiki_p user_groups on Wikimedia Tool Labs (missing rows) - https://phabricator.wikimedia.org/T159493#3070129 (jcrespo) @MZMcBride , why wait when you can go NOW and test the new servers? As many told you, enwiki is there now and fixed.
:-)
[08:59:12] Labs, Labs-Infrastructure, DBA: Data integrity issue with enwiki_p user_groups on Wikimedia Tool Labs (missing rows) - https://phabricator.wikimedia.org/T159493#3070156 (jcrespo) Open>Resolved a:jcrespo ``` root@labsdb1001[enwiki_p]> select ug_group from user_groups join user on user_id =...
[09:11:04] Wikibugs: Separate wikibugs username, into grrrrit and wikibugs - https://phabricator.wikimedia.org/T153732#3070161 (Nemo_bis)
[09:43:54] tgr: thanks for the replies at Meta :)
[09:45:46] Labs, DBA, Operations, Tracking: Database replication problems - production and labs (tracking) - https://phabricator.wikimedia.org/T50930#3070184 (jcrespo)
[09:45:49] Labs, DBA: Discrepancy between labsdb replicas of arwiki_p.user_groups - https://phabricator.wikimedia.org/T133469#3070182 (jcrespo) Open>Invalid This may have been true at some point, but I do not see this difference with the given query- labsdb1001 and labsdb1003 are identical, and all of them...
[09:49:12] Labs, Labs-Infrastructure, DBA: LabsDB replica service for tools and labs - issues and missing available views (tracking) - https://phabricator.wikimedia.org/T150767#3070201 (jcrespo)
[09:49:19] Labs, DBA, Operations, Tracking: Database replication problems - production and labs (tracking) - https://phabricator.wikimedia.org/T50930#3070202 (jcrespo)
[09:49:22] Labs, Tool-Labs, DBA, wikitech.wikimedia.org: labswiki isn't replicated on Labs - https://phabricator.wikimedia.org/T89548#3070199 (jcrespo) Open>declined I do not think this is going to happen soon due to how special labswiki is- feel free to reopen (or open a new one with a feature requ...
[09:52:02] Labs, Tool-Labs, Community-Tech-Tool-Labs, Tracking: Missing Toolserver features in Tools (tracking) - https://phabricator.wikimedia.org/T60791#3070209 (jcrespo)
[09:52:07] Labs, DBA, Operations, Tracking: Database replication problems - production and labs (tracking) - https://phabricator.wikimedia.org/T50930#3070210 (jcrespo)
[09:52:12] Labs, DBA: Make watchlist table available as curated foo_p.watchlist_count on labsdb - https://phabricator.wikimedia.org/T59617#3070203 (jcrespo) Open>stalled This is stalled- this is definitely going to happen, but we cannot find the time to do it. The scripts are done, we just need to puppetize...
[09:58:45] Labs, DBA, Operations, Tracking: Database replication problems - production and labs (tracking) - https://phabricator.wikimedia.org/T50930#3070219 (jcrespo)
[09:58:47] Labs: Replication issue with Fa WP replica - https://phabricator.wikimedia.org/T108032#3070216 (jcrespo) Open>Resolved a:jcrespo This was indeed incorrect on labsdb1003 only, it has now been solved: ``` labsdb1003[fawiki_p]> select * from page WHERE page_namespace = 0 and page_title IN ('محیط_عمل...
[10:02:44] Labs: `pr_index`to be replicated to Labs public databases - https://phabricator.wikimedia.org/T113842#1677380 (jcrespo) Can you provide more context of what it is missing, and test it has not been added yet? BTW, Labs is a very wide task- you should add it to the tracking ticket T150767 and not T50930, or o...
[10:06:12] Labs, DBA, Operations, Tracking: Database replication problems - production and labs (tracking) - https://phabricator.wikimedia.org/T50930#3070267 (jcrespo)
[10:06:14] Labs, ContentTranslation, DBA, WorkType-NewFunctionality: Replicate ContentTranslation databases on Labs - https://phabricator.wikimedia.org/T119847#3070264 (jcrespo) Open>stalled We had been requested many times for x1 to be replicated to labs.
This is very dangerous, as many tables cont...
[10:12:10] Labs, Labs-Infrastructure, DBA: LabsDB replica service for tools and labs - issues and missing available views (tracking) - https://phabricator.wikimedia.org/T150767#3070270 (Tpt)
[10:12:15] Labs: `pr_index`to be replicated to Labs public databases - https://phabricator.wikimedia.org/T113842#3070269 (Tpt)
[10:12:17] Labs, DBA, Operations, Tracking: Database replication problems - production and labs (tracking) - https://phabricator.wikimedia.org/T50930#3070271 (Tpt)
[10:12:46] Labs: `pr_index`to be replicated to Labs public databases - https://phabricator.wikimedia.org/T113842#1677380 (Tpt) > Can you provide more context of what it is missing The `pr_index` table managed by the ProofreadPage extension and present on all Wikisources should be replicated on labs. > and test it has...
[10:16:25] Labs, DBA, Operations, Tracking: Database replication problems - production and labs (tracking) - https://phabricator.wikimedia.org/T50930#3070277 (jcrespo)
[10:16:27] Labs, DBA: Wrong page title in labs database replica enwiki page table - https://phabricator.wikimedia.org/T136618#3070274 (jcrespo) Open>Resolved a:jcrespo Fixed: ``` labsdb1001[enwiki]> SELECT page_id, page_namespace, page_title FROM page where page_id IN (50274778,1272531,976991,50274777...
[10:17:58] Labs, Labs-Infrastructure, DBA: LabsDB replica service for tools and labs - issues and missing available views (tracking) - https://phabricator.wikimedia.org/T150767#3070279 (jcrespo)
[10:18:04] Labs, DBA, Operations, Tracking: Database replication problems - production and labs (tracking) - https://phabricator.wikimedia.org/T50930#1118490 (jcrespo)
[10:18:06] Labs, DBA: page_lang column of the page table is not replicated to Labs - https://phabricator.wikimedia.org/T154355#3070278 (jcrespo)
[11:21:15] Labs, DBA: page_lang column of the page table is not replicated to Labs - https://phabricator.wikimedia.org/T154355#3070335 (TTO) This column remains unavailable from Labs. For another similar task (T155605) @jcrespo suggested modifying [[https://phabricator.wikimedia.org/diffusion/OPUP/browse/production...
[11:26:34] Labs: Lost Wikitech 2FA details, recovery needed - https://phabricator.wikimedia.org/T159521#3070340 (mschwarzer)
[11:26:46] Labs, DBA: page_lang column of the page table is not replicated to Labs - https://phabricator.wikimedia.org/T154355#3070353 (Marostegui) The column is available in labs, what is not available is the view indeed. ``` mysql:root@localhost [enwiki]> select @@hostname; +------------+ | @@hostname | +--------...
[13:51:24] Labs: Lost Wikitech 2FA details, recovery needed - https://phabricator.wikimedia.org/T159521#3070618 (Aklapper) > this LDAP account is attached to this Phabricator account. [[ https://phabricator.wikimedia.org/p/mschwarzer/ | No, it is not. ]]
[13:55:43] Labs: Lost Wikitech 2FA details, recovery needed - https://phabricator.wikimedia.org/T159521#3070340 (EddieGP) Service: Though it's not, the user page https://wikitech.wikimedia.org/w/index.php?title=User:Mschwarzer&oldid=1418394 has the section "currently working on" (created by the user himself) and this l...
[14:42:39] Labs, Operations: openstack instance creation sometimes takes >480s - https://phabricator.wikimedia.org/T159459#3070723 (chasemp) So, this seems like partially SSH timeouts. I have no problem upping that for now while we are still figuring out baselines. The puppet run and setup variance is the most under...
[14:52:14] Labs, Operations: openstack instance creation sometimes takes >480s - https://phabricator.wikimedia.org/T159459#3070728 (chasemp) So I just caught an instance that had initial issues. > 2017-03-03 14:38:46,717 INFO Creating fullstackd-1488551924 > 2017-03-03 14:44:32,354 INFO servers.labnet1001.nova.ve...
[14:53:34] Labs: Problems accessing Labs via PuTTY and WinSCP - https://phabricator.wikimedia.org/T159533#3070746 (AlvaroMolina)
[14:53:37] Labs, Operations, User-Elukey: labtestcontrol2001: cron-spam from invoke-rc.d atop _cron - https://phabricator.wikimedia.org/T159532#3070758 (ema)
[15:02:33] Labs, Operations: openstack instance creation sometimes takes >480s - https://phabricator.wikimedia.org/T159459#3070764 (chasemp) load across active labvirts ```labvirt1001 15:01:30 up 128 days, 20:30, 1 user, load average: 52.25, 47.77, 48.21 labvirt1004 15:01:32 up 121 days, 19:37, 0 users, load...
[15:03:35] Labs, Operations: openstack instance creation sometimes takes >480s - https://phabricator.wikimedia.org/T159459#3070765 (chasemp) labvirt1001 is handling more than its share of load here, and I'm wondering if the scheduler is fairly weighted across these nodes that are at this moment unfairly allocated....
[15:06:10] !log wikispeech Deploy latest from Git master: e2fbe6a (T148622)
[15:06:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikispeech/SAL
[15:06:15] T148622: Highlight recited sentence - https://phabricator.wikimedia.org/T148622
[15:14:34] Labs, Operations: openstack instance creation sometimes takes >480s - https://phabricator.wikimedia.org/T159459#3070787 (chasemp) >>! In T159459#3070765, @chasemp wrote: > labvirt1001 is handling more than its share of load here, and I'm wondering if the scheduler is fairly weighted across these nodes th...
[15:20:11] Labs, Tool-Labs: Problems accessing Labs via PuTTY and WinSCP - https://phabricator.wikimedia.org/T159533#3070799 (AlvaroMolina)
[15:22:55] Labs, Tool-Labs: Problems accessing Labs via PuTTY and WinSCP - https://phabricator.wikimedia.org/T159533#3070746 (chasemp) @AlvaroMolina can you provide more details? What is the configuration of your setup? What VM are you trying to get into? Can you SSH into the bastion directly? What bastion do y...
[15:23:19] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Alaa was created, changed by Alaa link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Alaa edit summary: Created page with "{{Tools Access Request |Justification=To do some maintenance work in the Arabic Wikipedia |Completed=false |User Name=Alaa }}"
[15:26:36] Labs, Tool-Labs: Problems accessing Labs via PuTTY and WinSCP - https://phabricator.wikimedia.org/T159533#3070819 (AlvaroMolina) >>! In T159533#3070803, @chasemp wrote: > @AlvaroMolina can you provide more details? > > What is the configuration of your setup? What VM are you trying to get into? Can yo...
[15:31:46] Labs: Lost Wikitech 2FA details, recovery needed - https://phabricator.wikimedia.org/T159521#3070821 (mschwarzer) I mixed something up. The phab account is linked to my MediaWiki profile.
But yes, the wikitech user page refers to a ticket of this phab account. Sorry for the confusion.
[15:36:11] Labs, Tool-Labs: Problems accessing Labs via PuTTY and WinSCP - https://phabricator.wikimedia.org/T159533#3070834 (AlvaroMolina) Open>Resolved a:AlvaroMolina I was able to access, WinSCP had deleted the SSH public key file I had set in the settings. Thank you anyway.
[15:50:35] chasemp: I am back around. Related to the long time to boot an instance, the load on labvirt1001 started rising at about the same time
[16:02:23] Labs, DBA, Patch-For-Review: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743#3070918 (Marostegui) Shall we go for s5 or s2 next? I am not sure we can do both (because of disk space issues). s2 has more wikis, but s5 has wikidata :)
[16:03:37] Labs, DBA, Patch-For-Review: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743#3070931 (jcrespo) s5 for me
[16:11:12] Labs, Operations, Release-Engineering-Team, Nodepool: Investigate why nodepool keeps leaking instances and why it stops for no reason sometimes - https://phabricator.wikimedia.org/T159543#3070970 (Paladox)
[16:11:54] Labs, Operations, Release-Engineering-Team, Nodepool: Investigate why nodepool keeps leaking instances and why it stops for no reason sometimes - https://phabricator.wikimedia.org/T159543#3070982 (Paladox) p:Triage>High
[16:25:12] Labs, DBA, Patch-For-Review: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts - https://phabricator.wikimedia.org/T153743#3071026 (Marostegui) >>! In T153743#3070931, @jcrespo wrote: > s5 for me Agreed. If that is the case, I believe db1070 can be a good option to be sanit...
[16:27:23] Labs, Operations, Release-Engineering-Team, Nodepool: Investigate why nodepool keeps leaking instances and why it stops for no reason sometimes - https://phabricator.wikimedia.org/T159543#3071028 (chasemp) a:Andrew we merged https://gerrit.wikimedia.org/r/#/c/340986/ causing nova services to...
[16:57:12] PROBLEM - Puppet run on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[17:23:35] Labs, Operations, Release-Engineering-Team, Nodepool: Investigate why nodepool keeps leaking instances and why it stops for no reason sometimes - https://phabricator.wikimedia.org/T159543#3071222 (Paladox) p:High>Unbreak! Guessing unbreak as ci is down?
[17:37:10] RECOVERY - Puppet run on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:14:25] Labs, Operations, Release-Engineering-Team, Nodepool: Investigate why nodepool keeps leaking instances and why it stops for no reason sometimes - https://phabricator.wikimedia.org/T159543#3071508 (Paladox) p:Unbreak!>High
[19:00:28] Labs, The-Wikipedia-Library: Requesting /data/project NFS share for Nova_Resource:Twl - https://phabricator.wikimedia.org/T159407#3071624 (jsn.sherman) You are correct, Aklapper. Sorry for completely mistagging this, and thanks for correcting it.
[19:43:14] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Ranjithsiji was created, changed by Ranjithsiji link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Ranjithsiji edit summary: Created page with "{{Tools Access Request |Justification=Creating a tool for tracking a category. Especially articles in a category. Used for tracking an Edit a thon in a Wiki. Like Asian Month...."
[20:11:44] Labs, DBA: labsdb1004 MySQL crash - https://phabricator.wikimedia.org/T159572#3071865 (Marostegui)
[20:19:45] Labs, DBA: labsdb1004 MySQL crash - https://phabricator.wikimedia.org/T159572#3071897 (Marostegui) Just to be clear, the server is UP, but I have left replication stopped so it doesn't crash the whole server when it comes to that transaction :-)
[20:24:28] Labs, DBA: labsdb1004 MySQL crash - https://phabricator.wikimedia.org/T159572#3071901 (chasemp) I honestly have no idea, I don't think I've had any contact with this setup. I hope @jynus or @yuvipanda has some wisdom. This seems super weird.
[20:42:26] Labs, Labs-Infrastructure, DBA: Data integrity issue with enwiki_p user_groups on Wikimedia Tool Labs (missing rows) - https://phabricator.wikimedia.org/T159493#3071968 (MZMcBride) >>! In T159493#3070129, @jcrespo wrote: > @MZMcBride , why wait when you can go NOW and test the new servers? As many to...
[20:43:57] Labs, Labs-Infrastructure, DBA: Data integrity issue with enwiki_p user_groups on Wikimedia Tool Labs (missing rows) - https://phabricator.wikimedia.org/T159493#3071970 (MZMcBride) Related question: in scripts, I connect to `enwiki_p` for the database name and `enwiki.labsdb` for the database host na...
[20:54:34] Hi, I'm a student working with Dr. Cohl and would like to gain access to the Wikitech Math Project
[20:56:37] Can someone add me?
[21:08:48] andrewbogott: Can you help me?
[21:11:39] Mao: the list of admins is on https://wikitech.wikimedia.org/wiki/Nova_Resource:Math (assuming that is the project you want access to)
[21:11:53] don't know who is online though
[21:12:20] potentially you can try filing a task in https://phabricator.wikimedia.org/
[21:12:35] so people can notice it when they get online
[21:13:39] chasemp: good news!
After labvirt pool got changed some hours ago, Nodepool time to launch an instance looks way better :}
[21:14:00] and full stack metric on https://grafana.wikimedia.org/dashboard/db/labs-nova-fullstack as well
[21:14:10] nice, I suspected that may happen
[21:14:24] we are getting filled up more and more in capacity
[21:14:50] what I don't get is labvirt1001 / 1002 are overloaded, but then it is only a slice of the capacity
[21:14:52] mobrovac: Can you help me with gaining access to the Math Project?
[21:14:59] so I am not sure why it happened so often
[21:16:04] chasemp: there are also a bunch of instances with high system CPU, and a lot of tools one with 100% user CPU. Top 15 for each of user/system on https://grafana-labs.wikimedia.org/dashboard/db/top-instances
[21:16:42] hashar: my suspicion is those two as early in the list were somehow weighted
[21:16:53] I don't know that there is state kept on scheduling across requests
[21:17:09] so it's possibly going to fill from front to back all things being equal but I'm not entirely certain
[21:17:40] needs more investigation
[21:24:11] andrewbogott: Are you an admin of the Math Project?
[21:27:46] TIME []
[21:37:15] chasemp: maybe I will dig in the scheduler code over the weekend :}
[21:42:02] I'll look at it as well
[21:42:33] It's in integration/config or jenkins?
[21:42:53] Nevermind, I managed to convince an admin to add me. Thank you.
[21:43:42] Mao: no
[21:43:44] Problem
[22:21:55] (CR) Paladox: [V: 2 C: 2] Adding pywikibot-core so the bot development can offically start. [labs/tools/quarrybot-enwiki] - https://gerrit.wikimedia.org/r/339829 (owner: Zppix)
[23:00:08] Labs, Operations, Release-Engineering-Team, Nodepool: Investigate why nodepool keeps leaking instances and why it stops for no reason sometimes - https://phabricator.wikimedia.org/T159543#3072330 (hashar) Open>Resolved Nova / OpenStack recovered. Thus instances managed to get deleted and...
[23:05:38] Hi
[23:05:41] Someone here?
[23:05:42] Hi Zenith4237, I am here, if you need anything, please ask, otherwise no one is going to help you... Thank you
[23:06:01] I'd need some help with cron jobs on tools