[00:18:42] PROBLEM - Puppet run on tools-worker-1029 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [00:21:49] PROBLEM - Puppet run on tools-worker-1028 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [00:47:14] 10Tool-Labs-tools-Xtools: XTools: Top edits - 'All' option - https://phabricator.wikimedia.org/T160720#3108792 (10Stevietheman) [00:50:26] 10Tool-Labs-tools-Xtools, 06Community-Tech: XTools: Top edits - 'All' option - https://phabricator.wikimedia.org/T160721#3108804 (10Stevietheman) [00:54:51] 10Tool-Labs-tools-Xtools: XTools: Top edits - 'All' option - https://phabricator.wikimedia.org/T160720#3108824 (10Stevietheman) Please close this ticket as a duplicate of T160721. I had meant to create this as a subtask. [01:57:00] PROBLEM - Puppet run on tools-worker-1008 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [01:58:06] PROBLEM - Free space - all mounts on tools-proxy-01 is CRITICAL: CRITICAL: tools.tools-proxy-01.diskspace._public_dumps.byte_percentfree (No valid datapoints found)tools.tools-proxy-01.diskspace.root.byte_percentfree (<22.22%) [02:18:09] RECOVERY - Free space - all mounts on tools-proxy-01 is OK: OK: tools.tools-proxy-01.diskspace._public_dumps.byte_percentfree (No valid datapoints found) [02:31:58] RECOVERY - Puppet run on tools-worker-1008 is OK: OK: Less than 1.00% above the threshold [0.0] [06:47:42] PROBLEM - Puppet run on tools-exec-gift-trusty-01 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0] [06:50:04] PROBLEM - Puppet run on tools-exec-1422 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [07:27:42] RECOVERY - Puppet run on tools-exec-gift-trusty-01 is OK: OK: Less than 1.00% above the threshold [0.0] [07:30:02] RECOVERY - Puppet run on tools-exec-1422 is OK: OK: Less than 1.00% above the threshold [0.0] [08:13:28] How well does mediawiki work together with nginx? 
[08:19:13] 06Labs, 10DBA, 10MediaWiki-extensions-Babel: Replicate babel db table on Labs - https://phabricator.wikimedia.org/T160713#3109132 (10jcrespo) I've checked and babel table and it is being replicated to labs, just not exposed (needs view changes). I would suggest to labs team to ask for the ok from legal and/o... [10:00:40] 06Labs, 10Labs-Infrastructure, 10DBA: ug_expiry column of the user_groups table is not present on Labs - https://phabricator.wikimedia.org/T160686#3109297 (10Marostegui) Just to clarify: Moved it in our internal DBA dashboard to the "not db team" as this is normally handled by Labs. [12:25:10] 06Labs, 06Operations: Mount /public/dumps for osmit project - https://phabricator.wikimedia.org/T156586#3109497 (10chasemp) 05Open>03Resolved a:03chasemp > labstore1003.eqiad.wmnet:/dumps nfs4 28T 18T 11T 64% /public/dumps [12:31:12] 06Labs, 10Labs-Infrastructure, 10DBA, 06Operations, 13Patch-For-Review: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3109518 (10jcrespo) Unless anyone else says so, I will reimage the old server on Monday. Last chance to check data and functionality works on the ne... [12:34:03] 06Labs: Clean up data in /data/scratch/mwoffliner - https://phabricator.wikimedia.org/T144025#3109534 (10chasemp) >>! In T144025#2920185, @Kelson wrote: > @chasemp I'll clean it piece by piece to see if it works fine without. Until now I have only verified other aspects (than storage) of the full dumping. root@... 
[12:36:03] 06Labs, 10Labs-Infrastructure: labvirt1001 can't launch new VMs - https://phabricator.wikimedia.org/T159721#3109535 (10chasemp) p:05Triage>03High [12:36:10] 06Labs, 10Labs-Infrastructure, 06Operations, 10ops-eqiad: Labvirt1001 has insanely slow IO - https://phabricator.wikimedia.org/T159835#3109536 (10chasemp) p:05Triage>03High [12:40:51] 06Labs, 10Tool-Labs: Virtualenvs slow on tool labs NFS - https://phabricator.wikimedia.org/T136712#3109538 (10chasemp) a:03madhuvishy We want to experiment with enabling lookupcache=all everywhere. This is currently set on the k8s-workers and the bastions afaict. Passing to @madhuvishy from conversations y... [12:47:09] 06Labs, 10Labs-Infrastructure, 10DBA: ug_expiry column of the user_groups table is not present on Labs - https://phabricator.wikimedia.org/T160686#3109542 (10chasemp) 05Open>03Resolved should be good to go, let me know if not [12:54:17] 06Labs, 06Operations: openstack instance creation sometimes takes >480s - https://phabricator.wikimedia.org/T159459#3109559 (10hashar) p:05High>03Normal Since labvirt1001 and labvirt1002 have been removed from the scheduler pool, the time to get ssh access has significantly dropped and seems to be rather s... [12:54:21] 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Deprecate precise instances in Labs by 2017-03-31 - https://phabricator.wikimedia.org/T143349#3109558 (10chasemp) A note that the appointed time grows nigh, and this is quickly becoming the most mysterious item left on the list: > wildcat dannyb no Andrew w... 
[12:59:31] 06Labs, 07Wikimedia-Incident: Monitor labs new instance creation - https://phabricator.wikimedia.org/T123590#3109566 (10chasemp) 05Open>03Resolved a:03chasemp We have this running as of 99ef86ae0e2b74370e543d3fe22a46e8b0928df3 and have found several issues from the normalized and ongoing testing [13:02:17] 06Labs, 10Labs-Sprint-107, 10Labs-Sprint-108, 13Patch-For-Review: Make continuous backups of NFS data to codfw - https://phabricator.wikimedia.org/T106474#3109578 (10chasemp) [13:02:20] 06Labs, 06Operations: paramiko (python SSH implementation) needs older hashes for host authentication - https://phabricator.wikimedia.org/T106871#3109576 (10chasemp) 05Open>03Invalid We removed paramiko from the backup pipeline [13:06:12] 06Labs, 10Tool-Labs, 07Tracking: Useful graphite metrics to be tracked for Tool labs (tracking) - https://phabricator.wikimedia.org/T69879#3109586 (10chasemp) [13:06:14] 06Labs, 10Tool-Labs: Track gridengine stats on Graphite - https://phabricator.wikimedia.org/T69881#3109583 (10chasemp) 05Open>03Resolved a:03chasemp We have the basics of this now: https://graphite-labs.wikimedia.org/render?title=Tools&yMin=0&width=800&height=400&target=cactiStyle(alias(sumSeries(tools.... [13:11:18] 06Labs, 10Labs-Infrastructure: virt host reboots sometimes breaks puppet on instances - https://phabricator.wikimedia.org/T127698#3109588 (10chasemp) 05Open>03declined Since this is >1yr old and we haven't updated it at all I'm going to close in favor of resurfacing the issue if needed [13:13:11] 06Labs, 10Tool-Labs, 05Security: proxylistener does not verify that request comes from Tools project - https://phabricator.wikimedia.org/T124731#3109593 (10chasemp) [13:13:24] 06Labs, 10Tool-Labs, 05Security: proxylistener does not verify that request comes from Tools project - https://phabricator.wikimedia.org/T124731#1964556 (10chasemp) 05Open>03Invalid >>! In T124731#2987981, @scfc wrote: > I don't know why I did not try a simple `curl` :-). 
When I `webservice php5.6 shell... [13:14:42] 06Labs, 10Tool-Labs, 13Patch-For-Review: bigbrother doesn't stop - https://phabricator.wikimedia.org/T94500#3109598 (10chasemp) >>! In T94500#2932730, @gerritbot wrote: > Change 330265 merged by Andrew Bogott: > toollabs: bigbrother: stop tracking jobs when rcfile is deleted > > [[https://gerrit.wikimedia.o... [13:16:36] 06Labs, 10Tool-Labs: Investigate alternatives to dedicated exec node for gifti's tools - https://phabricator.wikimedia.org/T99130#3109601 (10chasemp) 05Open>03declined Eventually this workload moves to k8s with all others but for now I'm marking this declined with https://phabricator.wikimedia.org/T156981#... [13:19:09] chasemp: working on it, but not enough big contiguous timeframe available to do the switch yet :-/ [13:19:25] 06Labs: Instances spontaneously suspended - https://phabricator.wikimedia.org/T113646#3109604 (10chasemp) 05Open>03declined considering age and no activity I'm bouncing this task [13:19:38] Danny_B: ok thanks, can you comment on that task to the effect? [13:21:36] 06Labs, 10Tool-Labs: Some grid jobs are in odd state - https://phabricator.wikimedia.org/T95094#1180258 (10chasemp) >>! In T95094#1657187, @scfc wrote: > There are still some jobs in that state; I have changed http://tools.wmflabs.org/?status to display "n/a" for jobs with no information, so this is an easy wa... [13:24:45] 06Labs, 10Tool-Labs, 07Tracking: Packages to be added to toollabs puppet - https://phabricator.wikimedia.org/T55704#3109615 (10chasemp) [13:24:48] 06Labs, 10Tool-Labs, 10Pywikibot-core: Install python-enum34 on toollabs - https://phabricator.wikimedia.org/T111602#3109612 (10chasemp) 05Open>03Resolved a:03chasemp >>! In T111602#1616257, @jayvdb wrote: > So it is available on trusty now, but not precise. > This should be fairly easy to package for... 
[13:26:07] 06Labs, 10Labs-Infrastructure, 10Labs-Sprint-114: Ironic on Labs - https://phabricator.wikimedia.org/T110556#3109617 (10chasemp) 05Open>03declined https://wikitech.wikimedia.org/wiki/Labs_labs_labs/Bare_Metal [13:30:21] 06Labs, 10Tool-Labs: Rewrite the meta_p table populating code to python and have it run on a cron - https://phabricator.wikimedia.org/T107094#3109622 (10chasemp) 05Open>03Resolved a:03chasemp as of efcac33f8a5d00427a0593e9e7b6e8a020c86f40 this is hopefully at least viable and further work should be track... [13:33:38] 06Labs: High load on idle machines - https://phabricator.wikimedia.org/T104416#3109630 (10chasemp) 05Open>03declined I'm closing for age and lack of activity [13:39:07] 06Labs, 10Tool-Labs, 07Tracking: Useful graphite metrics to be tracked for Tool labs (tracking) - https://phabricator.wikimedia.org/T69879#3109662 (10chasemp) [13:39:09] 06Labs, 10Tool-Labs: Track labsdb stats on Labs Graphite - https://phabricator.wikimedia.org/T69884#3109659 (10chasemp) 05Open>03Resolved a:03chasemp Since these are production servers it seems most appropriate they would appear in prod graphite/promethius https://grafana-admin.wikimedia.org/dashboard/d... 
[13:42:26] 06Labs, 10Tool-Labs: enable hba on tools-precise-dev - https://phabricator.wikimedia.org/T103058#3109674 (10chasemp) 05Open>03Invalid precise is no longer supported [13:47:47] 06Labs, 10Labs-Sprint-101: Provide all labs users with username / passwords for the Postgres database - https://phabricator.wikimedia.org/T101661#3109679 (10chasemp) 05Open>03declined this seems to not be an issue and I'm not inclined to worry about it with the current demand [13:50:35] 06Labs: Puppet errors on newly created instances - https://phabricator.wikimedia.org/T100108#3109681 (10chasemp) 05Open>03Resolved a:03chasemp this is long since old I believe [13:52:08] 06Labs, 10Labs-Infrastructure: Weird state of /data/project for dumps (semi-missing files) - https://phabricator.wikimedia.org/T87224#3109684 (10chasemp) 05Open>03Invalid closing this due to age and activity [13:59:56] 06Labs, 10Labs-Infrastructure: Database is slow. Load times abnormally high at times. - https://phabricator.wikimedia.org/T71326#3109711 (10chasemp) 05Open>03Invalid closing due to age and activity [14:01:51] 06Labs, 10Labs-Infrastructure, 07Documentation: add Central Logging Service documentation - https://phabricator.wikimedia.org/T56702#3109722 (10chasemp) 05Open>03Invalid no longer accurate [14:05:12] 06Labs, 10Labs-Infrastructure: Nagios checks needed for labs-ns0/labs-ns1 - https://phabricator.wikimedia.org/T45028#3109732 (10chasemp) 05Open>03Resolved a:03chasemp this [[ https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=labs-ns1.wikimedia.org&service=Check+for+gridmaster+host+resol... 
[14:15:40] 06Labs, 10Labs-Infrastructure: Invalidate the nscd group cache of instances in a project when a user is added or removed - https://phabricator.wikimedia.org/T45526#3109754 (10chasemp) 05Open>03declined closing due to age and activity, I don't think this has been an issue [14:17:31] 06Labs, 10Wikimedia-Labs-General: Report when an instance has finished its initial Puppet run - https://phabricator.wikimedia.org/T70508#3109757 (10chasemp) 05Open>03declined [14:21:09] 06Labs, 13Patch-For-Review, 07Puppet: Labs: Could not find dependency File[/usr/lib/ganglia/python_modules] for File[/usr/lib/ganglia/python_modules/gmond_memcached.py] - https://phabricator.wikimedia.org/T95107#3109763 (10chasemp) 05Open>03Resolved a:03chasemp closing due to age and activity (seems fi... [15:21:59] 06Labs, 10Tool-Labs, 13Patch-For-Review: bigbrother doesn't stop - https://phabricator.wikimedia.org/T94500#3109948 (10Andrew) 05Open>03Resolved a:03Andrew Yes, I think this is resolved. [16:07:15] 06Labs, 10Tool-Labs: Virtualenvs slow on tool labs NFS - https://phabricator.wikimedia.org/T136712#3110080 (10madhuvishy) I tested the lookupcache enabling yesterday on tools-bastion-05 (it wasn't enabled there) and an exec node(tools-exec-1402), through instance hiera. One puppet run - which was clean, and th... [16:23:41] RECOVERY - Puppet run on tools-worker-1029 is OK: OK: Less than 1.00% above the threshold [0.0] [16:26:46] RECOVERY - Puppet run on tools-worker-1028 is OK: OK: Less than 1.00% above the threshold [0.0] [16:29:07] 06Labs, 10Tool-Labs: Cannot access replica databases - access denied - https://phabricator.wikimedia.org/T151296#3110161 (10MnemonicFlow) How can I contact an admin so that I can have a working replica.my.cnf file? I still cannot connect to the replicas... As a simple user I don't have the necessary rights to... [16:30:41] Can any admin help with https://phabricator.wikimedia.org/T151296#2823897 ? 
I need a regenerated replica.my.cnf file for my shell user so that I can access the database replicas [16:32:50] 06Labs, 10Wikimedia-Site-requests, 10wikitech.wikimedia.org, 13Patch-For-Review: Wikitech: Switch over from using extension SemanticForms to PageForms - https://phabricator.wikimedia.org/T149749#3110165 (10demon) 05Open>03declined {T53642} [16:34:16] mflow: Hi! I'll look into it in a little bit :) [16:37:41] 06Labs, 06Operations, 13Patch-For-Review: Phase out the 'puppet' module with fire, make self hosted puppetmasters use the puppetmaster module - https://phabricator.wikimedia.org/T120159#3110177 (10yuvipanda) The puppet module is still present - although documentaiton has been updated to point people to the p... [16:37:53] madhuvishy: thanks, I'll by afk for some time on IRC, but I'll be back later on. If you need anything from me, let me know by responding to that issue [16:38:05] mflow: sure, will do [16:38:44] 06Labs, 10Wikimedia-Site-requests, 10wikitech.wikimedia.org, 13Patch-For-Review: Wikitech: Switch over from using extension SemanticForms to PageForms - https://phabricator.wikimedia.org/T149749#3110182 (10Reedy) Depending on how long that ends up taking... I may suggest we jfdi this for the security stuff... [16:42:03] 06Labs, 10wikitech.wikimedia.org: Get rid of SemanticMediaWiki/SRF/SF from wikitech.wikimedia.org - https://phabricator.wikimedia.org/T53642#3110190 (10demon) [16:44:03] 10Tool-Labs-tools-Pageviews: Upgrade Chart.js to use new logarithmic axis - https://phabricator.wikimedia.org/T160765#3110195 (10MusikAnimal) [16:51:11] Matthew_: Hi! following up on T133321 - it seems like xtools usage is climbing on labsdb1003 again and is up at 118G. The disk usage on the server is close to full, and cleaning this up would help a lot [16:51:11] T133321: `s51187__xtools_tmp` database using 272G on labsdb1001 - https://phabricator.wikimedia.org/T133321 [16:52:46] madhuvishy: Hm. Let me check the logs for my cleanup, i wonder if there's a bug. 
[16:53:42] 06Labs, 10Tool-Labs, 10DBA: labsdb1001 and labsdb1003 short on available space - https://phabricator.wikimedia.org/T132431#3110246 (10jcrespo) [16:53:46] 06Labs, 10Tool-Labs, 10Tool-Labs-tools-Xtools, 10DBA: `s51187__xtools_tmp` database using 272G on labsdb1001 and 118G on labsdb1003 - https://phabricator.wikimedia.org/T133321#3110244 (10jcrespo) 05Resolved>03Open [17:00:21] 06Labs, 10Tool-Labs, 10Tool-Labs-tools-Xtools, 10DBA: `s51187__xtools_tmp` database using 272G on labsdb1001 and 118G on labsdb1003 - https://phabricator.wikimedia.org/T133321#3110269 (10Matthewrbowker) The cleanup job has been running successfully. I ran it manually, here is the output. ``` 16:54 [xtoo... [17:01:13] 06Labs, 10Tool-Labs, 10DBA: u3532__ (=marcmiquel) table using 64G on labsdb1001 and 108 GB on labsdb1003 - https://phabricator.wikimedia.org/T133322#3110275 (10jcrespo) [17:02:38] 06Labs, 10Tool-Labs, 10DBA: labsdb1001 and labsdb1003 short on available space - https://phabricator.wikimedia.org/T132431#3110283 (10jcrespo) [17:02:42] 06Labs, 10Tool-Labs, 10DBA: u3532__ (=marcmiquel) table using 64G on labsdb1001 and 108 GB on labsdb1003 - https://phabricator.wikimedia.org/T133322#2228158 (10jcrespo) 05Resolved>03Open labsdb1003 is now constraineed, and one of your databases have >100GB in space. They look like simple copies of produc... [17:02:44] 06Labs, 10wikitech.wikimedia.org: Get rid of SemanticMediaWiki/SRF/SF from wikitech.wikimedia.org - https://phabricator.wikimedia.org/T53642#3110284 (10demon) [17:06:08] 06Labs, 10wikitech.wikimedia.org: Get rid of SemanticMediaWiki/SRF/SF from wikitech.wikimedia.org - https://phabricator.wikimedia.org/T53642#3110290 (10demon) We already [[ https://gerrit.wikimedia.org/r/#/c/340354/ | banned the creation of new forms ]], I wonder if we should do something with AbuseFilter to s... 
[17:09:01] 06Labs, 10Labs-Infrastructure: Strategies to avoid OOM on labvirt hosts - https://phabricator.wikimedia.org/T139954#3110321 (10Andrew) 05Open>03Resolved a:03Andrew With changes to our provision ration this hasn't been an issue anymore. [17:10:31] 06Labs, 06Discovery, 06Operations, 03Interactive-Sprint, 06Maps (Maps-data): PostgreSQL query planner bug on labsdb1006 - https://phabricator.wikimedia.org/T145599#3110325 (10MaxSem) 05Open>03Resolved a:03MaxSem Works after the servers were upgraded to PG 9.4. [17:11:12] 06Labs, 10Labs-Infrastructure, 10DBA, 06Operations, 13Patch-For-Review: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3110330 (10MaxSem) The new server works for me. The upgrade also resolved T145599. Thank you! [17:11:33] 06Labs, 06Discovery, 06Operations, 03Interactive-Sprint, 06Maps (Maps-data): PostgreSQL query planner bug on labsdb1006 - https://phabricator.wikimedia.org/T145599#2635546 (10jcrespo) \o/ [17:15:26] !log tools moving tools-exec-1424 to labvirt1012 to ease load on labvirt1004 [17:15:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [17:18:54] 06Labs, 10Tool-Labs, 10DBA: labsdb1001 and labsdb1003 short on available space - https://phabricator.wikimedia.org/T132431#3110371 (10jcrespo) [17:18:58] 06Labs, 10Tool-Labs, 10Tool-Labs-tools-Xtools, 10DBA: `s51187__xtools_tmp` database using 272G on labsdb1001 and 118G on labsdb1003 - https://phabricator.wikimedia.org/T133321#3110369 (10jcrespo) 05Open>03Resolved > What is the difference between labsdb1001 and labsdb1003? Does labsdb1001 correlate to... 
[17:19:53] PROBLEM - Host tools-exec-1424 is DOWN: CRITICAL - Host Unreachable (10.68.19.159) [17:22:45] RECOVERY - Host tools-exec-1424 is UP: PING OK - Packet loss = 0%, RTA = 2.98 ms [17:24:57] !log tools moving tools-webgrid-lighttpd-1416 to labvirt1013 to reduce load on labvirt1004 [17:25:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [17:28:14] PROBLEM - Puppet run on tools-exec-1424 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [17:30:02] PROBLEM - Host tools-webgrid-lighttpd-1416 is DOWN: CRITICAL - Host Unreachable (10.68.19.50) [17:33:13] RECOVERY - Puppet run on tools-exec-1424 is OK: OK: Less than 1.00% above the threshold [0.0] [17:37:35] 06Labs, 10Tool-Labs: Cannot access replica databases - access denied - https://phabricator.wikimedia.org/T151296#3110430 (10Urbanecm) @MnemonicFlow You can contact an admin using IRC, connect to #wikimedia-labs at freenode and bring on your problem, Also include the Tnumber you can see in the URL as the admin... [17:38:33] 06Labs, 10Tool-Labs: Cannot access replica databases - access denied - https://phabricator.wikimedia.org/T151296#3110431 (10Urbanecm) p:05Normal>03High This seems to be breaking problem so raising the priority to HIGH. [17:39:53] Hi, any idea when this task might get looked at? Thanks. https://phabricator.wikimedia.org/T159407 [17:40:16] Just curious about timescale :) [17:42:55] 06Labs, 10The-Wikipedia-Library: Requesting /data/project NFS share for Nova_Resource:Twl - https://phabricator.wikimedia.org/T159407#3066662 (10chasemp) I'm not sure if this is the right solution, almost certainly it's not a good solution. How large are the backups expected to be? 
[17:42:59] Samwalton9: commented, but I'm not sure taht's the thing to do [17:44:14] 06Labs, 10Tool-Labs, 07Tracking: Make maintain-dbusers.py create replica.my.cnf files for user accounts as well - https://phabricator.wikimedia.org/T158420#3110447 (10chasemp) a:03madhuvishy [17:44:23] Thanks chasemp [17:44:47] 06Labs, 10Tool-Labs: Cannot access replica databases - access denied - https://phabricator.wikimedia.org/T151296#3110454 (10chasemp) This is a known issue and T158420 will resolve it but at present there is no mechanism for maintainer per-user replica creds, only per tool. It's in progress though. [17:51:07] 06Labs, 10Tool-Labs: Cannot access replica databases - access denied - https://phabricator.wikimedia.org/T151296#3110460 (10Urbanecm) @chasemp Thank you for the infos [17:55:28] 06Labs, 10The-Wikipedia-Library: Requesting /data/project NFS share for Nova_Resource:Twl - https://phabricator.wikimedia.org/T159407#3110464 (10bd808) @Samwalton9 can you give us some estimates of the space you need for these backups? The related ticket mentions 30 days of daily backups. Are we talking about... [17:57:26] RECOVERY - Host tools-webgrid-lighttpd-1416 is UP: PING OK - Packet loss = 0%, RTA = 1.50 ms [18:23:56] PROBLEM - Puppet run on tools-webgrid-generic-1402 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [18:36:10] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Info-farmer was created, changed by Info-farmer link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Info-farmer edit summary: Created page with "{{Tools Access Request |Justification=I am running info-farmerBot for Tamil Wikimedia projects. I need a server to run the bot. 
|Completed=false |User Name=Info-farmer }}" [18:58:56] RECOVERY - Puppet run on tools-webgrid-generic-1402 is OK: OK: Less than 1.00% above the threshold [0.0] [21:17:22] 06Labs, 10Horizon: User dschwen unable to log into horizon.wikimedia.org (An error occurred authenticating. Please try again later.) - https://phabricator.wikimedia.org/T154860#3111067 (10dschwen) Ok, up till now I had no pressure to get on horizon, but I need to rebuild an instance now, and being unable to lo... [21:20:47] andrewbogott, do you have any idea, why I'm unable to log into horizon? [21:22:16] dschwen: not really — do the same creds work on wikitech [21:22:16] ? [21:23:21] let me try [21:23:33] I think my 2fa was off again [21:23:42] at some point I disabled and reanabled it [21:23:48] give me a minute [21:24:05] andrewbogott: we should really change horizon to use the 2fa api endpoint I built for Striker [21:24:19] I wonder if I created a bug for that [21:24:45] dschwen: if your 2fa was turned off then horizon will definitely not work at all [21:25:26] dang [21:25:39] it is enabled now, but I still get the same error [21:25:58] on wikitech? [21:26:20] Yeah, I can log in with the token on wikitech.wikimedia.org [21:27:30] same username and password on horizon gives me An error occurred authenticating. Please try again later. [21:27:48] if I enter a wrong password teh error is "Invalid credentials" instead [21:27:55] dschwen: your username and your shell name are the same, right? [21:28:04] yeah, "dschwen" [21:29:57] dschwen: I turned on some debug lines, can you try again? 
[21:31:33] 06Labs, 10The-Wikipedia-Library: Requesting /data/project NFS share for Nova_Resource:Twl - https://phabricator.wikimedia.org/T159407#3111125 (10Samwalton9) @jsn.sherman should be able to provide details :) [21:33:55] Ok, I see some stuff :-) [21:34:26] bah, yeah, i think I broke something… probably unrelated [21:34:27] sorry, hang on [21:34:43] https://gist.github.com/dschwen/19045e40466a19e766eeacdefc523271 [21:34:46] oh, ok [21:34:49] who knew debug=true would be so messy [21:34:59] literally everyone [21:35:14] nobody knew healthcare would be so complicated... [21:40:36] * andrewbogott shakes fist [21:40:39] ok, try again? [21:41:00] same [21:41:16] did you see anything? [21:42:03] not really, just "DEBUG:keystoneauth.session:Request returned failure status: 400" [21:42:13] which really comes as no surprise [21:43:19] https://bugs.launchpad.net/horizon/+bug/1661456 related? [21:43:59] I don't think so. Given that it works for everyone else... [21:44:26] I feel really special now [21:45:17] You're visiting a completely unembellished url, right? Just https://horizon.wikimedia.org ? [21:45:19] clear cookies / history / new browser? [21:45:28] what browser is it? [21:45:31] yeah, new browser is probably a good thing to try [21:47:37] chrome [21:47:45] let my try in incognito mode! [21:48:33] same deal in incognito. trying FF now [21:49:09] same deal [21:50:09] All the logs say is [21:50:12] [Fri Mar 17 21:49:01.558742 2017] [:error] [pid 27349] DEBUG:keystoneauth.session:Request returned failure status: 400 [21:50:13] [Fri Mar 17 21:49:01.559150 2017] [:error] [pid 27349] Login failed for user "dschwen". [21:50:39] I'm going to try to log myself in with a bad password and/or bad token, see if I can get the same messages... [21:50:40] I tried lynx, but horizon apparently needs JS :-D [21:50:59] bad token gets 401 [21:51:30] bad password also 401 [21:51:55] aaaaaaand fake username, also 401 [21:52:02] so 400 means something special... 
[21:52:06] yay [21:52:14] chippy: you're 401'ing as well [21:52:29] oops, sorry chippy, mistyped [21:55:04] HTTP 401 == Unauthorized [21:55:23] yeah, as it should be... [21:55:29] dschwen: try once more please? [21:55:32] I'm "Bad request" [21:55:34] ok [21:55:36] one sec [21:56:06] same [21:56:11] isn't 400 like request is invalid? [21:56:15] um, I was trying to capture a log and I missed it [21:56:18] once more? [21:56:32] ok.. [21:56:34] ah yeah [21:56:34] HTTP Status Code 400: The server cannot or will not process the request due to something that is perceived to be a client error [21:56:56] Ok, logging in now... [21:57:01] done [21:57:05] same error [21:57:24] ahh, so the server is blaming me basically? ;-) [21:57:39] ok, something interesting is happening in keystone [21:57:50] "interesting" [21:58:01] never a good word when debugging :) [21:58:05] https://www.irccloud.com/pastebin/gHNLRpn3/ [21:58:24] that implies that you don't have 2fa [21:58:31] so, let me dig in the db and the code... [21:59:36] thx. Yeah, I swear that 2fa is set up and working :-) [22:00:06] for wikitech.wikimedia.org [22:01:16] dschwen: humor me and try logging into horizon as Dschwen (capital D) instead of as dschwen? [22:02:59] sure, one sec, phone charging [22:03:31] another thing to check would be if the login works at https://toolsadmin.wikimedia.org/ and if it prompts you for a 2fa token [22:03:55] same error [22:04:03] well, that's a small mercy [22:04:41] https://toolsadmin.wikimedia.org/ prompts for 2fa token and lets me log in just fine [22:06:19] dschwen: ok, I'm going to log the actual sql query that is failing. Let me know when you're ready to try again? [22:06:32] let me populate the form [22:07:01] ready, entering 2fa token... [22:07:08] go ahead [22:07:12] click [22:07:19] Hmmm... if toolsadmin works and horizon doesn't then it is certainly a bug in the funky way that horizon checks tokens [22:07:26] yep [22:07:30] in the keystone plugin [22:07:36] did you get it? 
[22:08:40] everything but the interesting part :( [22:08:45] once more, let me know when you're ready [22:09:03] entering token... [22:09:10] ok, have at [22:09:13] click [22:10:37] ok, now I can see the issue in mysql at least… I think [22:12:06] :) [22:12:45] dschwen: sorry, can you try again as Dschwen? I still feel like this has something to do with case [22:12:57] (even though I know that doesn't work, I want to see where the 'D' is dropped) [22:13:41] yeah, [22:14:10] done [22:14:40] and once more [22:16:40] dschwen: hm… does it work now? [22:17:58] OMG!!!! [22:18:02] i'm in [22:18:10] daaaaaaaaaaaaaaang [22:18:36] * andrewbogott makes a bug [22:19:24] Ok, is this going to last? I.e. will I be able to logout and in? [22:19:31] or does this need more work? [22:21:46] 06Labs, 10Horizon: User dschwen unable to log into horizon.wikimedia.org (An error occurred authenticating. Please try again later.) - https://phabricator.wikimedia.org/T154860#3111211 (10Andrew) This is because of a case mismatch between ldap and mediawiki. The mediawiki user_name table had the username 'Dsc... [22:21:47] I need to rebuild fastcci-worker1 (from scratch), but my quota does not allow me to create a m1.medium instance [22:21:54] dschwen: it should last [22:21:59] Explanation is in ^ [22:22:01] ok, thx [22:22:23] dschwen: do you want me to adjust your quota so you can create the new instance? I can just drop it again as soon as you delete the old one [22:22:37] yeah, that'd be great [22:22:48] 06Labs, 10Horizon: User dschwen unable to log into horizon.wikimedia.org (An error occurred authenticating. Please try again later.) - https://phabricator.wikimedia.org/T154860#3111215 (10Andrew) a:03Andrew [22:22:54] ok, stay tuned… what project is this? 
[22:22:58] it may be a few days until I can drop the old one [22:23:01] fastcci [22:24:47] 06Labs, 07Tracking: Temporary quota increase for fastcci project - https://phabricator.wikimedia.org/T160798#3111225 (10Andrew) [22:25:53] dschwen: is that enough headroom? [22:30:29] 06Labs, 07Tracking: Temporary quota increase for fastcci project - https://phabricator.wikimedia.org/T160798#3111240 (10Andrew) a:03Andrew [22:30:54] 06Labs, 07Tracking: Temporary quota increase for fastcci project - https://phabricator.wikimedia.org/T160798#3111225 (10Andrew) @dschwen, I think I increased your quota enough for what you need... let me know if it's not enough. Otherwise, let me know when you're cleaned up and I'll drop the quota back down. [22:37:21] ok, btw. My instances are still labeled ubuntu-12.04-precise (deprecated 2014-04-17) [22:37:28] but I in-place-upgraded them [22:38:07] is there a way to change the "Image Name" entry? [22:38:25] I'm a little worried that some admin might delete them accidentally [22:39:44] dschwen: I have it in mind, I don't think they'll be in too much danger [22:39:55] the name refers to the base qcow2 image that they're running off of, so there's really no way to change that [22:40:07] This is part of why I discourage people from upgrading in place, but you're not the only one [22:40:45] dschwen: I made a bug and subscribed you in case you run into quota problems. Need to go for now though [22:41:56] ok, thanks again [22:57:51] 06Labs, 10The-Wikipedia-Library: Requesting /data/project NFS share for Nova_Resource:Twl - https://phabricator.wikimedia.org/T159407#3111337 (10jsn.sherman) @chasemp @bd808 We're talking 10GB range for 30 days dumps for the foreseeable future. Our goals are to be able to recover from a lost instance, and to b... [23:21:44] Hello all, I hopefully have a quick question. [23:22:22] Since I am using wikipedia's slave copy and this is a remote database, how can I perform a sql join command [23:22:24] ? 
[23:27:01] hello [23:32:00] tomthirteen: joins should work as usual - what are you trying to join with? here's an example joining page and page_links on arwiki_p https://quarry.wmflabs.org/query/9972 [23:33:09] when I try to use a join, I'm told I don't have auth to do that [23:34:35] I want to join edits done in 2016 from "revision" and join from "page" table I want to get namespace 0 [23:35:29] tomthirteen: can you paste query and error message? also do queries to both those tables individually work? [23:36:04] yes, but I have to use sql simplewiki_p -e "select rev_page, rev_user_text, rev_timestamp from revision where rev_timestamp > 20160101000000 and rev_timestamp < 20170101000000 and rev_user_text > 0" > XXX_revision.txt [23:36:36] if i try without -e "" I get error messages [23:36:55] im using only read databases; i have no auth rights [23:37:16] what error messages are these? [23:37:31] tomthirteen: can you past the exact sql query you are trying to run? [23:37:36] *paste [23:37:55] sure one moment [23:41:57] I want something like this: [23:41:58] use simplewiki_p; select page_id, page_namespace, page_title from page left join revision on page_id.page = rev_page.revision; [23:42:47] use automatically gives "command not found" [23:43:08] tomthirteen: where are you running this? [23:43:29] in slave copy of wikipedia [23:43:38] i know this is a remote database [23:43:52] so how do I run the equivalent of this? [23:44:17] tomthirteen: do you want the output to a file or to your screen? [23:44:29] output to file [23:45:19] and `sql simeplewiki_p 'select page_id, page_namespace, page_title from page left join revision on page_id.page = rev_page.revision' > query.out` is basically what is blowing up for you? [23:45:50] yes sir [23:46:06] (sorry, I assumed you were sir, not ma'am) [23:46:09] the "command not found" part sounds like the shell telling you that you didn't quote something... 
[23:46:29] or that you aren't on a machine with `sql` installed [23:47:33] what happens when you just run: sql simplewiki_p [23:49:12] actually wait i think i got it! [23:49:29] $ sql simplewiki_p -e "select page_id, page_namespace, page_title from page left join revision on page_id = rev_page" > test.txt [23:49:42] I tried something like that and I think it worked [23:50:03] im used to write mysql so this remote database work around is throwing me a bit [23:51:54] you are using mysql, it's just that INTO OUTFILE won't work [23:52:14] yes, but before the join was throwing an error [23:52:22] tomthirteen: just curious, have you seen quarry.wmflabs.org? [23:52:32] i forgot to save it as a file with ">" [23:53:18] @yuvipanda - Thanks! that looks very useful [23:53:46] tomthirteen: yeah, it's a nice and simple way to write SQL, execute it, and share your results with people that persists over time [23:54:57] PROBLEM - Puppet run on tools-webgrid-generic-1402 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [23:55:08] thank you [23:56:07] tomthirteen: yw! [23:58:04] you guys are very helpful. I appreciate it.
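
Side note on the join from the end of the log: the version that worked uses `on page_id = rev_page` (column names), not `page_id.page = rev_page.revision`. Below is a rough sketch of the query tomthirteen described (2016 edits to namespace 0 pages), run against an in-memory SQLite stand-in rather than the real replica. The toy rows, the `rev_id` column, and both helper function names are made up for illustration; only the table and column names quoted in the chat come from the log. It also swaps in an inner join and `>=` on the lower bound, where the chat used a left join and `>`.

```python
import sqlite3


def build_toy_db():
    """Create a tiny in-memory stand-in for simplewiki_p (hypothetical rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_namespace INTEGER,
            page_title TEXT
        );
        CREATE TABLE revision (
            rev_id INTEGER PRIMARY KEY,
            rev_page INTEGER,
            rev_user_text TEXT,
            rev_timestamp TEXT  -- 14-digit YYYYMMDDHHMMSS string
        );
        """
    )
    conn.executemany(
        "INSERT INTO page VALUES (?, ?, ?)",
        [(1, 0, "Apple"), (2, 1, "Apple"), (3, 0, "Pear")],
    )
    conn.executemany(
        "INSERT INTO revision VALUES (?, ?, ?, ?)",
        [
            (10, 1, "Alice", "20160315120000"),  # 2016, namespace 0: kept
            (11, 1, "Bob", "20150101000000"),    # wrong year: dropped
            (12, 2, "Carol", "20160601000000"),  # namespace 1: dropped
            (13, 3, "Dave", "20161231235959"),   # 2016, namespace 0: kept
        ],
    )
    return conn


def edits_2016_in_main_namespace(conn):
    """Join page to revision on page_id = rev_page and filter as in the chat."""
    query = """
        SELECT page_id, page_title, rev_user_text, rev_timestamp
        FROM page
        JOIN revision ON page_id = rev_page
        WHERE page_namespace = 0
          AND rev_timestamp >= '20160101000000'
          AND rev_timestamp <  '20170101000000'
        ORDER BY rev_timestamp
    """
    return conn.execute(query).fetchall()


rows = edits_2016_in_main_namespace(build_toy_db())
for row in rows:
    print(row)
```

On the replica itself the same SELECT would presumably go inside the `sql simplewiki_p -e "..." > file.txt` form shown above, since INTO OUTFILE is not available there.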