[00:02:34] Okay, so someone had deleted https://gerrit.wikimedia.org/r/#/c/247587/2 from the puppetmaster
[00:06:51] We should use letsencrypt instead.
[00:09:26] Release-Engineering-Team, Operations, User-greg, Wikimedia-Incident: Institute a weekly review of all UBN! tasks - https://phabricator.wikimedia.org/T141130#2513777 (greg)
[00:11:10] ostriches it seems that https://gerrit.wikimedia.org/r/#/c/299164/ is failing to be built
[00:11:17] https://gerrit.wikimedia.org/r/#/c/302371/
[00:11:20] which is strange
[00:11:33] since it works here https://gerrit.wikimedia.org/r/#/c/301841
[00:11:58] 00:03:09 gbp:error: upstream/2.12.2-wmf is not a valid treeish
[00:12:32] yep
[00:12:39] but it works in https://gerrit.wikimedia.org/r/#/c/301841
[00:12:49] but breaks in https://gerrit.wikimedia.org/r/#/c/299164/
[00:12:52] ostriches ^^
[00:13:20] Because your patch introduces gbp.conf
[00:13:22] Mine doesn't have it
[00:13:43] Yep
[00:13:53] Which should fix 00:03:09 gbp:error: upstream/2.12.2-wmf is not a valid treeish
[00:14:06] since yours fails too
[00:14:58] I know it fails
[00:15:04] It never worked, so I don't pay attention ;-)
[00:15:14] oh
[00:15:16] but it did
[00:15:38] ostriches it built https://integration.wikimedia.org/ci/job/debian-glue-non-voting/44/artifact/gerrit_2.12.2+0~20160801235111.44+jessie+wikimedia~1.gbp0d41ac_all.deb
[00:16:28] PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[00:20:00] ostriches i got it working
[00:20:01] https://integration.wikimedia.org/ci/job/debian-glue-non-voting/49/console
[00:20:02] yay
[00:21:46] ostriches i got the deb now
[00:21:47] https://integration.wikimedia.org/ci/job/debian-glue-non-voting/49/artifact/gerrit_2.12.2-wmf.1+0~20160802001844.49+jessie+wikimedia~1.gbp8a56c5_all.deb
[00:21:55] That is based on your change :)
[00:22:13] all we need now is gerrit 2.12.3 :)
[00:27:07] ostriches i think the deb will work now
[00:46:43] RECOVERY - Puppet run on deployment-cache-upload04 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:52:44] PROBLEM - Puppet run on deployment-cache-upload04 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[00:58:16] tada: https://upload.beta.wmflabs.org/wikisource/en/thumb/6/62/Wind_in_the_Willows_%281913%29.djvu/page7-1024px-Wind_in_the_Willows_%281913%29.djvu.jpg
[00:58:18] note the cert
[00:58:19] greg-g, ^
[00:59:12] Beta-Cluster-Infrastructure, Flow, Collab-Team-Q1-July-Sep-2016, Patch-For-Review, Performance: Beta Cluster Special:Contributions lags by a long time and notes slow Flow queries - https://phabricator.wikimedia.org/T78671#2513846 (Mattflaschen-WMF)
[01:07:43] RECOVERY - Puppet run on deployment-cache-upload04 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:21:28] RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:29:15] RECOVERY - SSH on deployment-elastic06 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0)
[01:35:07] PROBLEM - SSH on deployment-elastic06 is CRITICAL: Server answer
[01:47:58] Deployment-Systems, Collaboration-Team-Triage, Flow, I18n: Message Mediawiki:Flow-terms-of-use-edit on nowp is English - https://phabricator.wikimedia.org/T133571#2514026 (Mattflaschen-WMF)
[01:55:16] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 301 TLS Redirect - string 'Wikipedia' not found on 'http://en.m.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 569 bytes in 0.002 second response time
[01:57:28] PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[01:58:44] PROBLEM - Puppet run on deployment-cache-upload04 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[01:59:08] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 301 TLS Redirect - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 567 bytes in 0.003 second response time
[02:00:08] RECOVERY - SSH on deployment-elastic06 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0)
[02:06:09] PROBLEM - SSH on deployment-elastic06 is CRITICAL: Server answer
[02:12:52] RECOVERY - SSH on deployment-elastic06 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0)
[02:13:03] Project selenium-QuickSurveys » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #103: FAILURE in 2.7 sec: https://integration.wikimedia.org/ci/job/selenium-QuickSurveys/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/103/
[02:13:43] RECOVERY - Puppet run on deployment-cache-upload04 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:30:20] I'll deal with the other domains of the text part of that tomorrow
[02:30:29] certainly the ones affecting monitoring
[02:32:31] RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:41:08] Project selenium-CirrusSearch » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #103: FAILURE in 7.6 sec: https://integration.wikimedia.org/ci/job/selenium-CirrusSearch/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/103/
[02:44:59] hm... I can fiddle with the https-forcing to leave those domains for now I guess
[02:49:12] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 44821 bytes in 1.919 second response time
[02:50:17] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 32491 bytes in 0.983 second response time
[03:05:07] Beta-Cluster-Infrastructure, Labs, Labs-Infrastructure, Operations: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#2514079 (Krenair) a:Krenair
[03:06:04] Beta-Cluster-Infrastructure, Labs, Labs-Infrastructure, Operations: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#527800 (Krenair) This is now working for meta.wikimedia.beta.wmflabs.org and deployment.wikimedia.beta.wmflabs.org (and their...
[03:43:28] PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[06:46:53] PROBLEM - Puppet run on integration-slave-jessie-1002 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[06:58:16] PROBLEM - Puppet run on phab-beta is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[07:26:51] RECOVERY - Puppet run on integration-slave-jessie-1002 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:05:10] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 301 TLS Redirect - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 567 bytes in 0.003 second response time
[08:06:18] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 301 TLS Redirect - string 'Wikipedia' not found on 'http://en.m.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 569 bytes in 0.004 second response time
[08:09:47] PROBLEM - Puppet run on deployment-cache-upload04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[08:21:29] Deployment-Systems, Scap3 (Scap3-Adoption-Phase1), scap, Analytics-Cluster, and 2 others: Deploy analytics-refinery with scap3 - https://phabricator.wikimedia.org/T129151#2514384 (MoritzMuehlenhoff) @elukey: I've dropped Yuvi's expired key from pwstore, so new entries can be added now.
[08:23:32] PROBLEM - Puppet run on integration-slave-trusty-1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[08:24:26] PROBLEM - Puppet run on deployment-parsoid07 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[08:58:31] RECOVERY - Puppet run on integration-slave-trusty-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:59:23] RECOVERY - Puppet run on deployment-parsoid07 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:03:29] RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:14:43] RECOVERY - Puppet run on deployment-cache-upload04 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:44:30] PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[09:53:43] RECOVERY - Puppet run on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[09:57:49] RECOVERY - Puppet run on deployment-changeprop is OK: OK: Less than 1.00% above the threshold [0.0]
[09:59:27] RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:03:34] RECOVERY - Puppet run on deployment-mediawiki02 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:22:11] Beta-Cluster-Infrastructure, ContentTranslation-Deployments, Parsoid, Services, and 2 others: Migrate BetaCluster Node.JS services to Jessie and Node 4.3 - https://phabricator.wikimedia.org/T125003#2514761 (mobrovac)
[10:41:51] Deployment-Systems, Scap3: Update Debian Package for Scap3 - https://phabricator.wikimedia.org/T127762#2514822 (fgiunchedi) Open>Resolved this is completed!
[11:20:23] Browser-Tests, Reading-Web-Backlog, Reading-Web-Sprint-78-Terminal-Velocity, Regression, Unplanned-Sprint-Work: [Regression] Fix browser tests for language switching on the beta cluster - https://phabricator.wikimedia.org/T141647#2514959 (phuedx) @dr0ptp4kt: [selenium-MobileFrontend](https://...
[11:35:07] PROBLEM - SSH on deployment-elastic06 is CRITICAL: Server answer
[11:50:29] PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[11:58:07] (PS2) Hashar: debian-glue: fix zuul-cloner detached branch [integration/config] - https://gerrit.wikimedia.org/r/301763 (https://phabricator.wikimedia.org/T141607)
[11:58:16] (CR) jenkins-bot: [V: -1] debian-glue: fix zuul-cloner detached branch [integration/config] - https://gerrit.wikimedia.org/r/301763 (https://phabricator.wikimedia.org/T141607) (owner: Hashar)
[11:58:27] (PS3) Hashar: debian-glue: fix zuul-cloner detached branch [integration/config] - https://gerrit.wikimedia.org/r/301763 (https://phabricator.wikimedia.org/T141607)
[11:58:36] (CR) jenkins-bot: [V: -1] debian-glue: fix zuul-cloner detached branch [integration/config] - https://gerrit.wikimedia.org/r/301763 (https://phabricator.wikimedia.org/T141607) (owner: Hashar)
[11:59:24] (PS4) Paladox: debian-glue: fix zuul-cloner detached branch [integration/config] - https://gerrit.wikimedia.org/r/301763 (https://phabricator.wikimedia.org/T141607) (owner: Hashar)
[11:59:33] (CR) jenkins-bot: [V: -1] debian-glue: fix zuul-cloner detached branch [integration/config] - https://gerrit.wikimedia.org/r/301763 (https://phabricator.wikimedia.org/T141607) (owner: Hashar)
[11:59:59] (PS5) Hashar: debian-glue: fix zuul-cloner detached branch [integration/config] - https://gerrit.wikimedia.org/r/301763 (https://phabricator.wikimedia.org/T141607)
[12:00:08] (CR) jenkins-bot: [V: -1] debian-glue: fix zuul-cloner detached branch [integration/config] - https://gerrit.wikimedia.org/r/301763 (https://phabricator.wikimedia.org/T141607) (owner: Hashar)
[12:01:16] Project selenium-RelatedArticles » chrome,beta-mobile,Linux,contintLabsSlave && UbuntuTrusty build #99: FAILURE in 14 sec: https://integration.wikimedia.org/ci/job/selenium-RelatedArticles/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta-mobile,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/99/
[12:01:20] Project selenium-RelatedArticles » chrome,beta-desktop,Linux,contintLabsSlave && UbuntuTrusty build #99: FAILURE in 19 sec: https://integration.wikimedia.org/ci/job/selenium-RelatedArticles/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta-desktop,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/99/
[12:03:23] (PS6) Paladox: debian-glue: fix zuul-cloner detached branch [integration/config] - https://gerrit.wikimedia.org/r/301763 (https://phabricator.wikimedia.org/T141607) (owner: Hashar)
[12:03:57] (CR) Paladox: "Rebased." [integration/config] - https://gerrit.wikimedia.org/r/301763 (https://phabricator.wikimedia.org/T141607) (owner: Hashar)
[12:25:29] Release-Engineering-Team, Operations, User-greg, Wikimedia-Incident: Institute a weekly review of all UBN! tasks - https://phabricator.wikimedia.org/T141130#2515119 (Aklapper) I appreciate this as I considered this my work so far. :P (I tried more like every other week, plus only pinging / naggin...
[13:10:06] RECOVERY - SSH on deployment-elastic06 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0)
[13:16:07] PROBLEM - SSH on deployment-elastic06 is CRITICAL: Server answer
[13:44:25] Project selenium-VisualEditor » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #99: FAILURE in 24 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/99/
[13:46:08] RECOVERY - SSH on deployment-elastic06 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0)
[13:47:22] ^ I'm having a look at deployment-elastic06 ... strange...
[14:02:54] !log deployment-prep rebooting deployment-elastic06 (unresponsive to SSH and Salt)
[14:09:20] PROBLEM - Puppet staleness on deployment-elastic06 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [43200.0]
[14:19:19] RECOVERY - Puppet staleness on deployment-elastic06 is OK: OK: Less than 1.00% above the threshold [3600.0]
[14:34:58] ostriches hi, would it be correct to do this https://gerrit.wikimedia.org/r/#/c/302416/2 in the grrrit-wm bot
[14:35:06] morning
[14:35:14] possible bug in scap3 on deployment-tin
[14:35:31] File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 608, in checks_setup
[14:35:31] if not os.path.exists(checks_path):
[14:35:31] File "/usr/lib/python2.7/genericpath.py", line 18, in exists
[14:35:31] os.stat(path)
[14:35:32] TypeError: coercing to Unicode: need string or buffer, NoneType found
[14:35:32] 14:34:55 deploy failed: coercing to Unicode: need string or buffer, NoneType found
[14:35:32] Deployment-Systems, Scap3 (Scap3-Adoption-Phase1), scap, Analytics-Cluster, and 2 others: Deploy analytics-refinery with scap3 - https://phabricator.wikimedia.org/T129151#2515412 (elukey) Created the keys in the private repo and encrypted them with the pass stored in pwstore under analytics-deplo...
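The traceback above shows os.path.exists() being handed None as checks_path. A minimal sketch of one way to guard against that, assuming a missing path should simply count as "no checks file"; the helper name checks_file_exists is hypothetical and this is not the actual scap patch:

```python
import os


def checks_file_exists(checks_path):
    """Return True only if checks_path is set and exists on disk.

    Hypothetical helper, not scap's real code. os.path.exists(None)
    raises TypeError (as in the traceback), so None is handled first.
    """
    if checks_path is None:
        return False
    return os.path.exists(checks_path)
```

Whether a missing path should be skipped quietly or reported as a configuration error is a design choice the log does not settle.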
[14:35:39] Since it seems message.uploader.name can't be used in the gerrit edit in browser
[14:35:44] ?
[14:46:56] ottomata: which repo are you seeing that on?
[14:47:27] thcipriani: i saw it on both eventlogging/eventbus and eventlogging/analytics
[14:47:32] brb...
[14:47:35] ack, checking
[14:49:18] * paladox brb restarting my pc due to high disk usage slowing my pc down, plus a windows 10 update to install. Bash gets released to all windows 10 users today :) 400+ million users will have access to bash in a few hours once microsoft starts rolling out the windows 10 update :)
[14:54:34] I'm back
[14:54:44] PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:55:12] PROBLEM - App Server Main HTTP Response on deployment-mediawiki01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:58:43] ostriches i changed message.uploader.name to message.author.name here https://gerrit.wikimedia.org/r/#/c/302416/ would that work?
[14:58:55] I don't know :)
[14:59:54] Oh ok
[15:00:13] ostriches but i did a test today and found that uploading through ssh or http worked
[15:00:22] but doing it through the web ui and editing files didn't
[15:00:30] Probably not, actually.
[15:00:36] Oh
[15:00:42] Eh, author.name might work.
[15:00:58] Oh yeh
[15:01:03] patchSet.*.uploader.name might also work
[15:01:05] since that is what patch comment uses
[15:01:08] Where * is the latest patch set.
[15:01:08] Oh
[15:01:10] :)
[15:01:20] message.patchSet.*.uploader.name
[15:01:27] something like ^^ ostriches
[15:01:47] Well, you'd need to iterate over the patchSet array items and figure out which the latest is.
[15:01:54] * could be 1, or it could be 45 :)
[15:02:08] Oh but wouldn't message.author.name do that
[15:02:08] (back)
[15:02:19] Oh wait....
[15:02:25] It's always just the latest in stream-events
[15:02:29] Yeh
[15:03:00] It's probably just that the web ui doesn't set off message.uploader.name so that just uses the author's name
[15:03:13] patchSet.uploader.name.
[15:03:18] Yeh
[15:03:24] That is what openstack uses too
[15:03:25] Or patchSet.author.nae
[15:03:27] *name
[15:03:30] Oh
[15:03:33] Either way, we want the patchSet one
[15:03:37] Ok
[15:03:41] I will do that now
[15:03:41] Not the top-level author :)
[15:03:47] thanks
[15:04:12] I've never once looked at this code, so I'm just guessing really ;-)
[15:04:34] RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 44501 bytes in 1.038 second response time
[15:04:38] ottomata: I see the problem, I have a patch https://phabricator.wikimedia.org/D302, this patch is also live on deployment-tin if you'd like to try it.
[15:04:38] thcipriani: one possibly unique fact, I am trying to deploy an in review change from gerrit
[15:04:43] i fetched and checked out the change
[15:04:49] oh ok
[15:04:50] trying
[15:04:51] ostriches ok, yeh but either one of those could fix it, we're not really sure
[15:04:57] since it works
[15:04:59] for ssh and http
[15:05:02] but not web ui
[15:05:03] RECOVERY - App Server Main HTTP Response on deployment-mediawiki01 is OK: HTTP OK: HTTP/1.1 200 OK - 44499 bytes in 0.973 second response time
[15:05:11] Author is probably better than uploader.
[15:05:24] Oh ok
[15:05:27] i will change it now
[15:05:30] thanks
[15:05:51] Although, I wonder if *both* would be best :)
[15:06:05] thcipriani: looks good!
[15:06:18] ottomata: thank you so much for that catch!
[15:06:31] Prolly harder though :)
[15:06:59] So you'd get something like....
[15:07:07] (CR) Ottomata: [C: 2] Hieraize eventlogging_kafka_handler to allow selection of different kafka clients [puppet] - https://gerrit.wikimedia.org/r/301126 (owner: Ottomata, author: Chad)
[15:07:11] thanks for the fast fix! :)
[15:07:14] (Sorry for the ping ottomata, just needed an example)
[15:07:59] Oh
[15:08:00] ottomata: deployment-tin has the latest scap package that is pretty close to going out live! Close call.
[15:08:38] feature list: https://www.mediawiki.org/wiki/Deployment_tooling/Cabal/2016-08-01#Scap_v.3.2.1_tagged_.28live_on_beta.29
[15:08:41] ostriches so doing message.patchSet.uploader.name would do what you did in the example
[15:08:50] and put both owner and author
[15:09:01] Yeah :)
[15:09:05] Ok
[15:09:09] Thanks for explaining
[15:09:19] I've changed it to message.patchSet.author.name now
[15:09:23] Then it's just super clear to everyone who all is involved :)
[15:09:32] Yep
[15:09:57] Now we need to find one of the authors to deploy it
[15:10:00] so we can test
[15:10:08] whether we fixed it or broke more things lol
[15:11:40] ostriches do you have to build the war for gerrit 2.12.3 or can you use the one they bult?
[15:11:44] bult = built
[15:12:13] I build them myself. I also have to build the its-* plugins so it's no big deal.
[15:12:33] Oh
[15:12:38] I tried building it
[15:12:40] but failed
[15:12:49] LOL
[15:13:24] ostriches do you know when they will be built ?
[15:14:38] Soon :)
[15:15:55] Ok, soon as in today, a week or next week.
[15:15:56] ?
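The "(owner: Ottomata, author: Chad)" idea from the discussion above could be sketched roughly as follows. This is an illustration only, not grrrit-wm's code (the bot is not written in Python), and the event layout used here (change.owner, patchSet.uploader, patchSet.author, each with a name) is an assumption to check against real stream-events output:

```python
def format_involved(event):
    """Render the people attached to a stream-events style dict.

    Assumed layout (hypothetical, verify against real events):
    event["change"]["owner"], event["patchSet"]["uploader"] and
    event["patchSet"]["author"], each a dict with a "name" key.
    Roles that are missing from the event are skipped.
    """
    change = event.get("change", {})
    patch_set = event.get("patchSet", {})
    roles = (
        ("owner", change.get("owner")),
        ("uploader", patch_set.get("uploader")),
        ("author", patch_set.get("author")),
    )
    parts = []
    for role, person in roles:
        name = (person or {}).get("name")
        if name:
            parts.append(f"{role}: {name}")
    return "(" + ", ".join(parts) + ")"
```

Reading the names from the patchSet entry rather than from a top-level field follows the conclusion reached in the chat; which of uploader and author the web UI actually fills in is still untested at this point in the log.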
[15:16:06] ostriches ^^
[15:17:54] :)
[15:19:14] I now get error
[15:19:15] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find data item network::external in any Hiera data file and no default supplied at /etc/puppet/modules/network/manifests/constants.pp:5 on node gerrit-test3.git.eqiad.wmflabs
[15:19:15] Warning: Not using cache on failed catalog
[15:19:15] Error: Could not retrieve catalog; skipping run
[15:19:28] when running sudo puppet agent -tv
[15:19:31] ostriches ^^
[15:33:08] Project selenium-MobileFrontend » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #100: FAILURE in 11 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/100/
[15:35:07] PROBLEM - Puppet run on deployment-tin is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[15:38:27] Project selenium-MobileFrontend » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #100: FAILURE in 16 min: https://integration.wikimedia.org/ci/job/selenium-MobileFrontend/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/100/
[15:43:17] ostriches: hey, we already deploy from gerrit. We have an ores submodule which is mirrored from github but the real repo is https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/services/ores/deploy
[15:44:13] Apologies if I misunderstood then. Github deploys scare meh :p
[15:44:18] Release-Engineering-Team, User-greg, Wikimedia-Incident: Identify "first responders" for "all" "components" deployed on Wikimedia servers - https://phabricator.wikimedia.org/T141066#2515594 (greg) >>! In T141066#2513427, @greg wrote: >>>! In T141066#2496346, @faidon wrote: >> "First responders" or "f...
[15:44:26] updating submodule requires a commit which means it needs to be done in gerrit
[15:44:39] https://gerrit.wikimedia.org/r/#/q/project:mediawiki/services/ores/deploy
[15:44:40] You can do it from gerrit gui
[15:44:42] :)
[15:44:42] PROBLEM - Puppet run on deployment-mathoid is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[15:44:58] Krenair: !!!!!!!!!!!!!!!!!!
[15:45:06] Krenair: re the LE cert on beta
[15:45:14] Hi greg-g
[15:45:14] Amir1: Ah, so it's a submodule pointing at Github but the submodule is Gerrit-maintained? Eh, better. But still not best :)
[15:45:42] PROBLEM - Puppet run on deployment-cache-upload04 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[15:45:43] yeah
[15:47:02] (I'm not a huge fan of this current complex gerrit-diffusion-github-based system either)
[15:48:50] greg-g: One of the things Aaron and I came to agreement on is that we will do deployments in the "Services" window only from now on. Is that okay, or should we get another window?
[15:49:18] Amir1: that is correct, which is what you did yesterday, yes?
[15:49:56] that wasn't intentional
[15:50:05] but it was at the same time
[15:50:23] Amir1: what window were you using?
[15:50:28] I don't see it on the calendar
[15:50:38] did you just pick a random time? that's a very bad idea
[15:51:07] greg-g: not randomly, when there were no other windows
[15:51:13] Amir1: that's not how it works
[15:51:19] yeah and I know it
[15:51:30] that's why I suggested to get a window
[15:51:59] well, now I'm even less happy about what happened yesterday
[15:52:02] to be fully honest
[15:52:45] PROBLEM - Puppet run on deployment-sca03 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[15:52:53] I'm not going to be responsive on IRC for a bit (a meeting) but, no more picking a window yourself, schedule *with me* or use the services window (in coordination with them)
[15:52:57] PROBLEM - Puppet run on deployment-eventlogging03 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[15:53:05] PROBLEM - Puppet run on deployment-zotero01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[15:53:49] PROBLEM - Puppet run on deployment-changeprop is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[15:55:02] greg-g: let's talk once you're done. Please ping me when you have some free time
[15:56:09] or continue the thread on the ops list and/or the talk page
[15:57:02] but I'll try to ping when I can
[15:57:07] thanks
[15:59:15] hm, puppet repo on tin seems a little borked
[15:59:19] trying to run
[15:59:25] /usr/local/bin/git-sync-upstream
[15:59:28] Yeh
[15:59:30] error: could not apply 2218144... puppetization for thumbor
[15:59:38] See -operations
[16:00:35] hm paladox not sure i followed in my quick skim there, but that doesn't seem related
[16:00:43] sounds like they are talking about a labs wide hiera lookup problem?
[16:00:49] Oh yeh sorry
[16:00:50] i'm just having problems doing the rebase
[16:00:54] on deployment-puppetmaster
[16:00:58] Oh
[16:01:01] (oh sorry, not tin, deployment-puppetmaster
[16:01:01] ottomata: hmm, looks like that patch merged, but may also be cherry-picked?
[16:01:01] )
[16:01:09] https://gerrit.wikimedia.org/r/#/c/300827/
[16:01:10] i think someone merged patches related to your error
[16:01:19] thcipriani: should I remove the thing from thumbor?
[16:01:24] the commit from the rebase list?
[16:02:32] yeah, I'd remove that cherry-pick from the rebase list
[16:02:37] should work then, I reckon.
[16:02:45] (unless there are other conflicts)
[16:03:44] k
[16:04:23] it's probably an old version of a now-merged commit, highly likely to conflict
[16:05:08] RECOVERY - Puppet run on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0]
[16:09:44] ok, y'all, I'm going to break CI in about 30 seconds.
[16:12:06] andrewbogott: ack.
[16:15:01] oh yeah sorry about thumbor ottomata thcipriani, I merged the change in production so that likely broke it
[16:15:26] aye, i am in standup but haven't been able to fix it yet
[16:15:34] godog: got a sec to see if you can make the rebase happy?
[16:15:44] even if i remove that commit from the rebase list, i get Could not apply 2218144cb26430d971f4b546889477588a583fe2... puppetization for thumbor
[16:15:50] error: could not apply 2218144... puppetization for thumbor
[16:16:01] okay, everyone stop for a moment.
[16:16:55] ok!
[16:17:08] I'm looking
[16:19:25] godog, ottomata: Ok
[16:19:31] It should be fixed now
[16:19:48] I removed these two:
[16:19:52] 9d13910 puppetization for thumbor
[16:19:54] 2218144 puppetization for thumbor
[16:20:09] using `git rebase -i HEAD~20`
[16:20:16] then ran `git pull --rebase origin production`
[16:20:35] root@deployment-puppetmaster:/var/lib/git/operations/puppet# git log --oneline | grep "puppetization for thumbor"
[16:20:35] e959e6a puppetization for thumbor
[16:20:35] root@deployment-puppetmaster:/var/lib/git/operations/puppet#
[16:21:11] Beta-Cluster-Infrastructure: Admin permissions for cawiki at beta-cluster - https://phabricator.wikimedia.org/T141890#2515732 (Toniher)
[16:21:40] thanks Krenair !
[16:21:55] godog, is that the only "puppetization for thumbor" commit you expect?
[16:21:58] Beta-Cluster-Infrastructure: Admin permissions for cawiki at beta-cluster - https://phabricator.wikimedia.org/T141890#2515744 (Toniher)
[16:22:12] Krenair: the only one I cherry-picked yeah iirc
[16:22:28] okay then I think we're good
[16:22:50] hi
[16:23:16] hi sachins301
[16:23:30] I'm sure this has been discussed forever, I wonder if there's a way to make git-sync-upstream DTRT in cases like this, where the change has been merged but another PS was previously cherry-picked
[16:24:01] maybe using Change-Ids
[16:24:12] it has "# TODO: rewrite in python?" near the top
[16:24:13] git-cherry is for this iirc
[16:24:17] may want to do that first
[16:24:37] https://git-scm.com/docs/git-cherry
[16:26:19] git pull --rebase :)
[16:26:30] ostriches the deb is being built
[16:26:33] and uploaded
[16:26:36] I know.
[16:26:39] thcipriani: nice! though I'm not sure it'd work if the diffs are different like in different PSes
[16:26:39] Ok
[16:27:09] ostriches do you know if gerrit 2.12.3 will be built today?
[16:27:16] No, I don't.
[16:27:19] ok
[16:27:21] I said soon.
[16:27:25] ok
[16:27:45] RECOVERY - Puppet run on deployment-sca03 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:28:05] RECOVERY - Puppet run on deployment-zotero01 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:28:47] RECOVERY - Puppet run on deployment-changeprop is OK: OK: Less than 1.00% above the threshold [0.0]
[16:29:45] PROBLEM - Puppet run on integration-slave-jessie-1003 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[16:30:11] PROBLEM - Puppet run on integration-slave-trusty-1023 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[16:30:17] PROBLEM - Puppet run on integration-puppetmaster is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[16:31:40] hrm, yeah, different diffs would be problematic... Krenair is probably right about Change-Ids. Something like doing a --theirs in cases where there are conflicts with the same change-id. Although a switch to differential would break that.
[16:32:21] PROBLEM - Puppet run on integration-slave-trusty-1006 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[16:32:33] PROBLEM - Puppet run on integration-publisher is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[16:32:57] RECOVERY - Puppet run on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:34:32] should probably kill shinken during the labs upgrade window?
[16:36:48] PROBLEM - Puppet run on deployment-poolcounter01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[16:37:24] PROBLEM - Puppet run on integration-saltmaster is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[16:38:42] these are all "Failed to determined $::labsproject" ?
[16:38:51] yeh
[16:38:55] I'm getting that too
[16:40:08] PROBLEM - Puppet run on integration-slave-trusty-1018 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[16:40:24] Beta-Cluster-Infrastructure, User-Luke081515: Admin permissions for cawiki at beta-cluster - https://phabricator.wikimedia.org/T141890#2515777 (Luke081515) a:Luke081515
[16:41:13] PROBLEM - Puppet run on integration-jessie-lego-test01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[16:42:09] PROBLEM - Puppet run on integration-slave-trusty-1011 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[16:42:49] PROBLEM - Puppet run on integration-slave-trusty-1014 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[16:44:01] PROBLEM - Puppet run on integration-slave-jessie-1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[16:44:05] Beta-Cluster-Infrastructure, MediaWiki-extensions-ORES, ORES, Revision-Scoring-As-A-Service, Wikimedia-Incident: Config beta ORES extension to use the beta ORES service - https://phabricator.wikimedia.org/T141825#2515795 (greg)
[16:45:05] PROBLEM - Puppet run on deployment-db1 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[16:46:10] PROBLEM - Puppet run on integration-aptly01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[16:46:36] PROBLEM - Puppet run on zuul-dev-jessie is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[16:46:41] Release-Engineering-Team, Operations, User-greg, Wikimedia-Incident: Institute a weekly review of all UBN! tasks - https://phabricator.wikimedia.org/T141130#2515799 (greg) >>! In T141130#2515119, @Aklapper wrote: > I appreciate this as I considered this my work so far. :P (I tried more like every...
[16:47:52] PROBLEM - Puppet run on integration-slave-jessie-1002 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[16:48:35] PROBLEM - Puppet run on integration-raita is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[16:49:36] Beta-Cluster-Infrastructure, MediaWiki-extensions-ORES, ORES, Revision-Scoring-As-A-Service, Wikimedia-Incident: Config beta ORES extension to use the beta ORES service - https://phabricator.wikimedia.org/T141825#2515807 (Ladsgroup) There were two bugs in the yesterday deployment. We couldn't...
[16:50:31] I just muted shinken-wm, will unmute after the labs stuff is done
[16:53:12] ah there were two!
[16:53:13] ok
[16:53:21] after the labs stuff is done I might be able to get it to show a useful hostname instead of that labs IP
[16:53:38] :)
[16:53:52] I think nodepool may also need to be restarted after labs upgrade
[16:54:04] not sure if it will resolve itself or is stuck.
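The Change-Id idea raised earlier for git-sync-upstream (spot local cherry-picks whose change has since merged upstream, even when the diffs differ between patch sets) could be sketched like this. It is a sketch under assumptions, not the script's behaviour: commit messages are passed in as plain strings, and the function names are hypothetical.

```python
import re

# Gerrit Change-Id trailer: "I" followed by 40 hex digits on its own line.
CHANGE_ID_RE = re.compile(r"^Change-Id:\s*(I[0-9a-f]{40})\s*$", re.MULTILINE)


def change_id(message):
    """Return the last Change-Id trailer in a commit message, or None."""
    matches = CHANGE_ID_RE.findall(message)
    return matches[-1] if matches else None


def already_upstream(local_messages, upstream_messages):
    """Return the local commit messages whose Change-Id appears upstream.

    Local commits without a Change-Id are never reported, since there
    is nothing to match them on.
    """
    upstream_ids = {cid for cid in map(change_id, upstream_messages) if cid}
    return [msg for msg in local_messages if change_id(msg) in upstream_ids]
```

Commits reported this way are candidates for dropping from the rebase list, which is what was done by hand for the two "puppetization for thumbor" commits. As noted in the chat, matching on Change-Id would stop working for changes that do not come through Gerrit.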
[16:54:43] Beta-Cluster-Infrastructure, MediaWiki-extensions-ORES, ORES, Revision-Scoring-As-A-Service, Wikimedia-Incident: Config beta ORES extension to use the beta ORES service - https://phabricator.wikimedia.org/T141825#2515821 (Halfak) @greg, I'm not sure what you're looking for here. We obviously... [16:58:03] ostriches: it seems this https://phabricator.wikimedia.org/T141887 has popped up [16:58:31] Meh [16:59:12] ostriches yep, not sure why it has problems. Do you see any errors in the error log, even though you may get lost in there? [17:02:44] ostriches the its-base plugin builds against the gerrit 2.11.5 api. [17:02:49] it should be 2.12.* [17:02:55] https://gerrit.googlesource.com/plugins/its-base/ [17:04:23] Beta-Cluster-Infrastructure, User-Luke081515: Admin permissions for cawiki at beta-cluster - https://phabricator.wikimedia.org/T141890#2515840 (Luke081515) Open>Resolved Done: (change visibility) 16:45, 2 August 2016 Luke081515 (talk | contribs | block) changed group membership for Tonihe... [17:07:31] Browser-Tests, Reading-Web-Backlog, Reading-Web-Sprint-78-Terminal-Velocity, Regression, Unplanned-Sprint-Work: [Regression] Fix browser tests for language switching on the beta cluster - https://phabricator.wikimedia.org/T141647#2515870 (dr0ptp4kt) @phuedx, @jhobs, @bmansurov for your engine... [17:14:16] * paladox yay Windows 10 update has begun rolling out https://blogs.windows.com/windowsexperience/2016/08/02/how-to-get-the-windows-10-anniversary-update/ [17:14:23] * paladox bash now available worldwide [17:19:49] Beta-Cluster-Infrastructure, MediaWiki-extensions-ORES, ORES, Revision-Scoring-As-A-Service, Wikimedia-Incident: Config beta ORES extension to use the beta ORES service - https://phabricator.wikimedia.org/T141825#2515946 (greg) >>! In T141825#2515821, @Halfak wrote: > If you are asking how we... 
[17:25:40] ostriches hi, i still get [17:25:40] root@gerrit-test3:/home/paladox# systemctl status gerrit.service [17:25:40] ● gerrit.service - LSB: Start/stop Gerrit Code Review [17:25:40] Loaded: loaded (/etc/init.d/gerrit) [17:25:40] Active: failed (Result: exit-code) since Tue 2016-08-02 17:25:02 UTC; 23s ago [17:25:40] Process: 27252 ExecStart=/etc/init.d/gerrit start (code=exited, status=1/FAILURE) [17:25:42] Aug 02 17:25:02 gerrit-test3 gerrit[27252]: ** ERROR: GERRIT_SITE not set [17:25:44] Aug 02 17:25:02 gerrit-test3 systemd[1]: gerrit.service: control process exited, code=exited status=1 [17:25:47] Aug 02 17:25:02 gerrit-test3 systemd[1]: Failed to start LSB: Start/stop Gerrit Code Review. [17:25:49] Aug 02 17:25:02 gerrit-test3 systemd[1]: Unit gerrit.service entered failed state. [17:25:51] even after updating gerrit [17:29:02] 07Browser-Tests, 06Reading-Web-Backlog, 03Reading-Web-Sprint-78-Terminal-Velocity, 07Regression, 07Unplanned-Sprint-Work: [Regression] Fix browser tests for language switching on the beta cluster - https://phabricator.wikimedia.org/T141647#2515984 (10jhobs) a:05dr0ptp4kt>03jhobs A new build is runnin... [17:29:07] 06Release-Engineering-Team, 06Operations, 06Services, 07Wikimedia-Incident: Review new service 'pre-deployment to production' checklist - https://phabricator.wikimedia.org/T141897#2515987 (10greg) [17:30:00] 10Beta-Cluster-Infrastructure, 10MediaWiki-extensions-ORES, 10ORES, 06Revision-Scoring-As-A-Service, 07Wikimedia-Incident: Config beta ORES extension to use the beta ORES service - https://phabricator.wikimedia.org/T141825#2516001 (10greg) >>! In T141825#2515946, @greg wrote: > I think a separate task ab... [17:34:50] ostriches ? [17:39:27] ostriches im not sure why it is not setting GERRIT_SITE [17:39:28] ? [18:15:43] greg-g: do have some free time to talk about the deployment window? 
[18:17:33] Amir1: sure, sorry, was catching up with other things [18:18:28] greg-g: okay, either we get a dedicated deployment window every day or we piggyback on services [18:18:54] I need to mention that ORES can't be deployed by the services team [18:20:15] depends on you and Ops obviously [18:22:01] why every day? [18:23:35] we don't deploy every day [18:23:51] but maybe we need to do a urgent fix [18:23:59] *an [18:24:08] urgent fixes are urgent fixes and don't need a window [18:24:20] you just need to coordinate with me/ops when you do need to [18:24:39] hmm, okay [18:24:39] and if it's really urgent and you discovered it 23 hours before your next window, would you wait? [18:24:46] the answer should be no [18:25:00] yes, you're right [18:25:24] so, anyways, I think for now ORES should be a part of the Services window (which is every day, just different times) [18:25:25] we are learning and trying to adapt a production-like deployment system [18:25:46] yes, thanks for learning :) it's all we all can do [18:26:21] okay. Sounds good. Just one thing. What should we do as coordination with the services team? [18:27:08] like, asking if they are doing anything at that time. [18:27:42] good question. When the window starts jouncebot will ping you (and everyone else in the window), those who are planning to deploy something will then say "I plan to do X" or "no X deploy today". Based on that information you all self-organize and do the needful. [18:28:17] honestly, I haven't taken much active management of it (maybe a mistake) as initially it didn't need it and my active management wasn't desired (back when it was just parsoid) [18:29:07] okay. So probably I need to add my name and Aaron to that section [18:29:18] I hope that's okay for you. [18:29:19] exactly, that's how you join the window :) [18:29:26] https://wikitech.wikimedia.org/wiki/Deployments [18:29:59] Is there any other place we should add it? Documentation somewhere? 
[18:30:52] all of the services windows [18:31:05] I copy/paste the last week for the next week so it'll stay in there [18:31:47] sure [18:42:44] Amir1: (just to be clear, are you adding your names to that wiki page or am I? I'd appreciate you :) ) [18:42:58] greg-g: editing right now [18:43:01] it's super big [18:44:08] https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=815996&oldid=815934 [18:44:12] greg-g: ^ [18:46:39] yeah, I added a lot of future weeks to get the new european swat window listed [18:47:11] oh, great [18:47:20] nice, it'll be in 20 days [18:47:26] yup :) [18:50:26] CI should be back running tests again. Let me know if you discover otherwise. [18:51:08] will do, looks like some tests are starting to trickle through now [18:52:05] I just saw a ci-jessie-wikimedia-blah spawn on the jenkins ui [18:52:12] (a nodepool instance, that is) [18:52:20] and a bunch more now [18:52:22] \o/ [18:52:49] :) [19:12:03] ostriches yes could you fix it in the package please [19:12:04] ? [19:13:42] ostriches could you make Letsencrypt optional on labs please [19:13:47] since im hitting a error with it [19:14:02] root@gerrit-test3:/var/lib/gerrit2/review_site/logs# journalctl -xn [19:14:02] -- Logs begin at Mon 2016-08-01 18:08:54 UTC, end at Tue 2016-08-02 18:49:43 UTC. 
-- [19:14:02] Aug 02 18:49:25 gerrit-test3 puppet-agent[1386]: (/Stage[main]/Apache/Service[apache2]) Dependency Service[gerrit] has failures: true [19:14:02] Aug 02 18:49:25 gerrit-test3 puppet-agent[1386]: (/Stage[main]/Apache/Service[apache2]) Skipping because of failed dependencies [19:14:02] Aug 02 18:49:25 gerrit-test3 puppet-agent[1386]: (/Stage[main]/Gerrit::Proxy/Letsencrypt::Cert::Integrated[gerrit]/Exec[acme-setup-acme-gerrit]) Dependency Service[gerrit] has failures: true [19:14:05] Aug 02 18:49:25 gerrit-test3 puppet-agent[1386]: (/Stage[main]/Gerrit::Proxy/Letsencrypt::Cert::Integrated[gerrit]/Exec[acme-setup-acme-gerrit]) Skipping because of failed dependencies [19:14:08] Aug 02 18:49:25 gerrit-test3 puppet-agent[1386]: Finished catalog run in 11.10 seconds [19:14:10] Aug 02 18:49:26 gerrit-test3 sudo[1385]: pam_unix(sudo:session): session closed for user root [19:14:12] Aug 02 18:49:42 gerrit-test3 sudo[2153]: diamond : TTY=unknown ; PWD=/ ; USER=puppet ; COMMAND=list /bin/cat /var/lib/puppet/state/last_run_summary.yaml [19:14:15] Aug 02 18:49:43 gerrit-test3 sudo[2154]: diamond : TTY=unknown ; PWD=/ ; USER=puppet ; COMMAND=/bin/cat /var/lib/puppet/state/last_run_summary.yaml [19:14:18] Aug 02 18:49:43 gerrit-test3 sudo[2154]: pam_unix(sudo:session): session opened for user puppet by (uid=0) [19:14:21] Aug 02 18:49:43 gerrit-test3 sudo[2154]: pam_unix(sudo:session): session closed for user puppe [19:14:22] paladox: pastebin [19:14:26] oh sorry [19:14:58] https://phabricator.wikimedia.org/T141803#2516448 [19:15:02] ostriches ^^ [19:16:57] 10Deployment-Systems, 03Scap3: More atomic directory operations - https://phabricator.wikimedia.org/T141913#2516467 (10Dereckson) [19:20:03] 10Deployment-Systems, 03Scap3: More atomic directory operations - https://phabricator.wikimedia.org/T141913#2516495 (10Dereckson) [19:21:25] paladox: gotta find out why exactly acme-setup fails [19:21:36] there was a change yesterday [19:21:37] oh [19:21:53] like when Krenair 
was using it to get certs for beta [19:22:10] he pasted an error that happened because LE changed something on August 1 [19:22:13] mutante but it seems to fail because of gerrit failing [19:22:54] oh, that is right [19:23:13] well, one by one. [19:23:19] there is this now https://gerrit.wikimedia.org/r/#/c/302491 [19:23:32] Oh [19:23:34] that got us to the next step which you have now [19:23:44] Yep [19:24:05] there must be more in some logs [19:24:16] Oh, I'm not sure, it doesn't seem to show any more [19:24:20] maybe apache2 [19:24:35] try running the init script manually [19:24:38] No nothing in apt [19:24:42] and how do i do that [19:24:55] which system is it failing on? [19:25:09] how about /etc/init.d/gerrit start [19:25:10] root@gerrit-test3:/var/lib/gerrit2/review_site/logs# /etc/init.d/gerrit start [19:25:10] Starting Gerrit Code Review: FAILED [19:25:10] root@gerrit-test3:/var/lib/gerrit2/review_site/logs# [19:25:24] Krenair: a new instance where we use strictly the prod class [19:25:29] which? [19:25:43] gerrit-test3 [19:25:47] in the project called git [19:25:48] is this in labs? [19:25:50] git, ok [19:25:51] yes [19:25:57] Oh [19:26:20] mutante https://phabricator.wikimedia.org/T141803#2516511 [19:26:40] I'm not in that project [19:27:51] oh, i guess. it's because [19:27:53] RUN_ARGS=' -Xmx28g [19:28:03] and we don't have that much memory there [19:28:30] well, it's trying to start it, and then it kills it again [19:28:30] ah ha [19:28:35] mutante there is no file [19:28:37] /var/lib/gerrit2/review_site/bin/gerrit.war [19:28:38] + start-stop-daemon -S -b -c gerrit2 -p /var/lib/gerrit2/review_site/logs/gerrit.pid -m -d /var/lib/gerrit2/review_site -a /usr/bin/perl -- -e '$x=$ENV{JAVA};exec $x @ARGV;die $!' [19:28:40] called that [19:28:42] see the "die" [19:28:49] yep but it calls [19:28:50] /var/lib/gerrit2/review_site/bin/gerrit.war [19:28:55] which doesn't exist [19:28:56] paladox: wanna add him to project? 
[19:29:03] Yep [19:29:24] oh, well,, if there is no .war ...but .. you have the package installed [19:29:35] the new one [19:29:48] Yeh but you need the war [19:29:53] It runs on the war [19:30:21] ok, for some reason i thought the package drops the .war in place [19:30:29] Me too [19:30:46] ostriches ^^ [19:31:02] mutante ok I've added Alex Monk to the project [19:31:05] aka Krenair [19:31:31] paladox: the "dpkg -L gerrit" we did earlier [19:31:35] It drops the war in place in /var/lib/gerrit2/ [19:31:36] see the .war in there? [19:31:41] Oh [19:31:46] Maybe we need to change it [19:31:47] You have to run it with init to get a fully provisioned gerrit install. [19:31:52] Puppet will never get it right [19:31:57] Oh [19:31:58] Because gerrit's init is braindead. [19:32:04] How do we do it through init [19:32:06] So you gotta run it a few times, whack it with a hammer. [19:32:14] Then puppet takes over [19:32:15] LOL [19:32:19] (it's *always* been like this) [19:32:22] Oh [19:32:24] (and it's annoying as fuckballs) [19:32:29] I'm not sure how to start it [19:32:30] ? [19:32:37] /etc/init.d/gerrit [19:32:40] Just have to keep trying and debugging. [19:32:45] Ok [19:33:01] Maybe we can add a new check [19:33:03] to init [19:33:05] called [19:33:06] Okay I don't see any acme errors [19:33:06] install [19:33:13] Yay I made it :D [19:33:15] that does it for us [19:33:18] Oh [19:33:49] ostriches do i just keep doing /etc/init.d/gerrit start [19:33:56] alright, which part did you mean to run multiple times [19:33:57] Or how about run [19:34:12] Whee I win :D [19:34:20] lol [19:34:48] krenair@gerrit-test3:~$ sudo /etc/init.d/gerrit start [19:34:50] Starting Gerrit Code Review: FAILED [19:34:51] 2 things gotta happen to make this *better* [19:34:52] useless [19:34:54] where's the error logs? 
[19:34:56] ostriches: on eventbrite ?:) [19:35:17] Krenair i think /var/log/ or /var/lib/gerrit2/review_site/logs [19:35:20] Krenair: /var/lib/gerrit2/review_site/logs/ [19:35:23] but theres nothing in there [19:35:42] More useful though, run `/var/lib/gerrit2/review_site/bin/gerrit.sh run` [19:35:44] Oh brilliant, so it doesn't actually log errors to the error logs? [19:35:46] Which dumps it all to stderr :D [19:36:00] Anyway, I dunno what all the fuss is about :) [19:36:04] krenair@gerrit-test3:~$ sudo /var/lib/gerrit2/review_site/bin/gerrit.sh run [19:36:04] sudo: /var/lib/gerrit2/review_site/bin/gerrit.sh: command not found [19:36:31] /var/lib/gerrit2/review_site/bin is empty [19:36:33] Like I told paladox, running init is brain-dead and puppet has a hard time getting it right [19:36:44] Oh [19:36:54] yeh [19:37:00] java -jar /var/lib/gerrit2/gerrit.war -d /var/lib/gerrit2/review_site/ init --batch --no-auto-start [19:37:04] Then deal with any problems [19:37:05] I know of 2. [19:37:15] Oh maybe we should add that to gerrit [19:37:18] and call it [19:37:23] gerrit install [19:37:28] krenair@gerrit-test3:~$ sudo java -jar /var/lib/gerrit2/gerrit.war -d /var/lib/gerrit2/review_site/ init --batch --no-auto-start [19:37:28] fatal: unknown command -d [19:37:28] (no com.google.gerrit.pgm.-d) [19:37:35] Remove the -d [19:37:54] I always find that fails using the -d [19:38:01] where does it go without the -d? [19:38:16] It goes /var/lib/gerrit2/review_site/ [19:38:25] Oh wait [19:38:32] Not sure [19:38:33] now [19:38:40] I was looking at it wrong [19:39:35] Krenair: gerrit.war init -d [19:39:39] Order matters :\ [19:39:44] Yep lol [19:39:48] whatever you do, dont fix it manually on this specific instance.. 
just on the other ones :) [19:40:28] mutante: https://gerrit.wikimedia.org/r/#/c/302497/ helps [19:40:32] ostriches great idea we create a file that run sudo java -jar /var/lib/gerrit2/gerrit.war init -d /var/lib/gerrit2/review_site/ init --batch [19:40:56] oh [19:41:19] ostriches: 'k, thanks! [19:41:19] We already do that in puppet. [19:41:43] Yay lets try it [19:42:14] mutante ^^ [19:47:19] mutante could you merge https://gerrit.wikimedia.org/r/#/c/302497/ please [19:47:20] ? [19:47:23] so we can test [19:54:34] 06Release-Engineering-Team, 10LDAP-Access-Requests, 06Operations, 10Ops-Access-Requests, and 3 others: Determine a core set or a checklist of permissions for deployment purpose - https://phabricator.wikimedia.org/T140270#2458739 (10dpatrick) Because there are a number of Phabricator tickets in the Security... [19:57:14] mutante ? [20:04:32] brb [20:04:56] 10Deployment-Systems, 03Scap3: More atomic directory operations - https://phabricator.wikimedia.org/T141913#2516619 (10bd808) This will be one of the many benefits of implementing {T114313}. I think this task can honestly be merged into that one. [20:06:12] 06Release-Engineering-Team, 10LDAP-Access-Requests, 06Operations, 10Ops-Access-Requests, and 3 others: Determine a core set or a checklist of permissions for deployment purpose - https://phabricator.wikimedia.org/T140270#2458739 (10MaxSem) Cluster access means much greater degree of trust, and by running a... [20:06:45] 10Beta-Cluster-Infrastructure: Reenable $wgMWOAuthSecureTokenTransfer=true; on the beta cluster - https://phabricator.wikimedia.org/T67421#2516623 (10AlexMonk-WMF) @csteipp, @tgr: How do you reset all consumer secrets? Once we find that out we can unstall this and do it. [20:07:38] that change makes me wonder what happens if you run the reindex command on the existing prod install and if it runs on every single puppet run [20:07:44] Why is https://phabricator.wikimedia.org/T38891 sitting with noone currently handling it? 
Task number is in the 30ks and there's a trivial patch [20:13:02] hi ! we cannot deploy , we are getting issues accessing deployment-prep [20:15:14] 06Release-Engineering-Team: Cannot deploy using scap from deployment-tin - https://phabricator.wikimedia.org/T141920#2516655 (10Nuria) [20:16:19] 06Release-Engineering-Team: Cannot deploy using scap from deployment-tin - https://phabricator.wikimedia.org/T141920#2516672 (10Nuria) [20:16:36] 06Release-Engineering-Team: Cannot deploy using scap from deployment-tin - https://phabricator.wikimedia.org/T141920#2516655 (10Nuria) p:05Triage>03High [20:17:04] nuria_: hi! what user are you trying to deploy as? [20:17:14] thcipriani: myself [20:17:18] I recognize the error message and I *think* I see what's wrong. [20:17:53] thcipriani: aham.. [20:20:32] nuria_: I think this is a keyholder permissions thing [20:20:38] can you try: SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l deploy-service deployment-aqs01.deployment-prep.eqiad.wmflabs [20:20:52] see if that lets you ssh or if it gives you an error [20:21:02] thcipriani: on tin? [20:21:19] yep, on deployment-tin in beta [20:21:38] thcipriani: no, that worked [20:22:27] awesome, then deploy should work for you now. You weren't in the deploy-service group, so the agent admitted failure message was just keyholder rejecting you trying to use the key. [20:22:35] I added you to that group [20:22:53] thcipriani: would you be so kind to add milimetric too? [20:22:53] 10Beta-Cluster-Infrastructure: Cannot deploy using scap from deployment-tin - https://phabricator.wikimedia.org/T141920#2516702 (10greg) [20:23:00] thcipriani: he is cc-ed on ticket [20:23:26] thcipriani: many thanks for the fast response [20:23:52] absolutely, I'll add milimetric, lemme know if it works for you now. [20:24:35] milimetric added as well. 
[20:25:12] thx :) [20:28:44] Im back [20:41:53] Project selenium-Echo » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #104: 04FAILURE in 52 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/104/ [20:41:55] Project selenium-Echo » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #104: 04FAILURE in 54 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/104/ [20:46:13] milimetric: not that I'm deployment stalking you...but I did see some errors in the aqs/deploy on deployment-tin :) ssh-host-key stuff, *should* be fixed now. Just had to accept the host keys for ssh in /etc/ssh/ssh_known_hosts or your user known_hosts file. I did the former, just now. [20:47:50] thx thcipriani, I appreciate it :) [20:48:51] I see, so if we had a new deployment-aqs2 or something, it would need the same treatment. Got it [20:49:23] thx for stalking^Wlooking out :) [20:49:49] :D [20:50:07] yeah, in prod this is taken care of in some fancy way. [20:50:15] in beta: less fancy :( [21:00:11] thcipriani: puppet resource collection which nobody has figured out how to setup on the self hosted puppetmasters [21:00:28] ostriches i now get error https://phabricator.wikimedia.org/T141803#2516775 [21:00:31] 10Beta-Cluster-Infrastructure: Cannot deploy using scap from deployment-tin - https://phabricator.wikimedia.org/T141920#2516777 (10greg) 05Open>03Resolved a:03thcipriani ```lang=irc 20:13 < nuria_> hi ! we cannot deploy , we are getting issues accessing deployment-prep 20:17 <+thciprian> nuria_: hi! w... [21:00:37] * bd808 stalks the stalkers [21:00:51] :) [21:00:52] paladox: You have to install mysql before setting up gerrit. 
[21:00:58] Oh [21:01:09] The role doesn't do that for you, it assumes you have a working mysql with a valid $db_host, $db_user and $db_pass set. [21:01:20] Oh [21:01:21] ah [21:02:51] Beta-Cluster-Infrastructure: Reenable $wgMWOAuthSecureTokenTransfer=true; on the beta cluster - https://phabricator.wikimedia.org/T67421#2516782 (Tgr) Set `oarc_secret_key` to `MWCryptRand::generateHex( 32 )`, send notification to the user that they should go to `https://meta.wikimedia.org/wiki/Special:OAuth... [21:03:23] ostriches do i set password manually in the secure.config [21:03:24] file [21:03:25] ? [21:03:46] You set it via hiera. There's already one in labs/private.git, just need to tweak it further probably :) [21:03:57] Ok [21:04:37] ostriches oh, i thought doing that will expose the password [21:04:48] Well it does, but it's labs :p [21:05:00] oh lol [21:05:01] ok [21:05:06] what do i put in hiera [21:05:13] ostriches ^^ [21:05:36] You can put it on wiki in hiera: [21:05:43] Or in the appropriate place in the hiera/ directory [21:06:04] ostriches I mean what do i put in there please? config names? [21:09:57] ostriches i don't know how to set the password in hiera? [21:28:26] !log disabling puppet on scap targets briefly to test scap_3.2.2-1_all.deb [21:32:01] !log re-enabling puppet on scap targets [21:32:20] huh, no logbot. [21:36:19] ostriches i can't seem to set the db password [21:36:28] i get a random password that i didn't set [21:36:29] ? [21:37:06] paladox: is it already in labs/private repo? [21:37:16] I don't think so [21:37:20] it is in secure.config [21:37:22] and what is the problem if you already get a password [21:37:29] the puppet run finishes? [21:37:38] Oh, i thought you would want to choose your own password [21:37:47] i guess i will just use that password [21:38:22] paladox: if you remove it, and run puppet again, it re-adds it, right? 
[21:38:36] Oh let me try that [21:39:46] mutante nope [21:39:53] still installs a random password [21:39:58] im just going to use that [21:39:59] for now [21:40:13] i doubt it's actually random [21:40:38] the point was if puppet sets it back [21:40:43] to whatever [21:40:56] Yep [21:41:20] paladox: a fake password should be in labs/private [21:41:34] Oh [21:41:39] it is a fake password then [21:41:48] in general if you have secrets that are in private repo in prod [21:41:55] then you add a fake version of the same thing [21:41:56] in labs/private [21:42:00] so that puppet works [21:42:09] Oh [21:42:14] in prod private it's in modules/passwords [21:42:18] it works just need to create the user in the db now [21:42:23] which i also wrote on task [21:42:23] ok, yea [21:42:27] we have to do that [21:42:38] yep [21:43:05] 10Deployment-Systems, 03Scap3: Update Debian Package for Scap3 - https://phabricator.wikimedia.org/T127762#2516893 (10thcipriani) 05Resolved>03Open Thanks for the upload @fgiunchedi Bug was discovered on Beta this morning :( I fixed and bumped the minor version—new version on Beta seems to have fixed th... [21:44:30] paladox: i merged the thing with the symlink to /etc/default/gerritcodereview. so that manual step should be gone [21:44:36] Oh yay [21:44:38] thanks [21:44:44] i have to go for a little while. 
be back soon [21:45:21] mutante i got furthur [21:45:21] root@gerrit-test3:/var/lib/gerrit2# /usr/bin/java -jar gerrit.war reindex -d review_site --threads 4 [21:45:21] [2016-08-02 21:45:05,230] [main] INFO com.google.gerrit.server.cache.h2.H2CacheFactory : Enabling disk cache /var/lib/gerrit2/review_site/cache [21:45:21] [2016-08-02 21:45:05,885] [main] INFO com.google.gerrit.server.project.ProjectCacheWarmer : Loading project cache [21:45:22] fatal: fetch failure on changes [21:45:24] root@gerrit-test3:/var/lib/gerrit2# [21:48:02] ostriches ^^ [21:58:48] Project selenium-Core » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #106: 04FAILURE in 6 min 47 sec: https://integration.wikimedia.org/ci/job/selenium-Core/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/106/ [22:21:11] Project selenium-CentralAuth » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #103: 04FAILURE in 1 min 10 sec: https://integration.wikimedia.org/ci/job/selenium-CentralAuth/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/103/ [22:55:07] 10Beta-Cluster-Infrastructure: Reenable $wgMWOAuthSecureTokenTransfer=true; on the beta cluster - https://phabricator.wikimedia.org/T67421#2517366 (10AlexMonk-WMF) Let's enable secure token transfer, then if someone decides we really need to do that in beta, it can be done afterwards? [23:05:38] somebody can explain why https://gerrit.wikimedia.org/r/#/c/302424/ is not being merged despite having +2? [23:05:55] Try re doing c+1 [23:05:58] c+2 [23:06:04] Please [23:06:20] ok will do [23:06:20] ok will do [23:06:25] SMalyshev, maybe it was +2'd while CI was down or something? [23:06:50] I was looking [23:06:59] at there was no c+2 till now [23:07:07] at 00:06 my time [23:07:09] bst [23:07:14] hm, nope [23:07:43] I had +2ed it at 2:46... about 1:15 ago. But I've retried now [23:08:15] Oh [23:08:22] ah [23:08:22] doesn't seem to work... 
maybe it depends on some other patch? I'm not sure I understand new gerrit interface [23:08:24] now i know [23:08:26] why [23:08:34] you can't write a comment and do +2 [23:08:40] at the same time [23:08:43] uh [23:08:44] it is a known problem [23:08:48] that's unexpected [23:08:51] but why isn't it working now [23:09:12] Not sure [23:09:47] there it is [23:10:05] Yep, i did a recheck [23:10:18] looks like there's some dependency but I'm not sure where I look for it in new UI [23:10:25] Oh yeh [23:10:33] if it has dependencies that haven't merged [23:10:35] it won't merge [23:10:44] until they are merged [23:10:47] so where do I find the dependencies? [23:11:01] By looking at parent [23:11:15] parent has just commit ID [23:11:23] no link to a gerrit patch [23:11:44] Yeh [23:11:49] but it is linked to those [23:11:50] https://gerrit.wikimedia.org/r/#/c/300954/ [23:11:55] That's one of them [23:12:12] and https://gerrit.wikimedia.org/r/#/c/300953/ [23:12:14] paladox: wait so "related changes" are actually parents? [23:12:27] Nope [23:12:29] parent is [23:12:31] ah no they are the same for all... Now I'm confused [23:12:38] so how do I know which one is the parent? [23:12:42] yeh i think gerrit removed that support [23:12:45] making it harder [23:12:51] ugh that's bad [23:12:53] But parent is the commits now [23:14:02] https://gerrit.wikimedia.org/r/#/c/302424/ requires https://gerrit.wikimedia.org/r/#/c/300954/ [23:14:18] which then requires https://gerrit.wikimedia.org/r/#/c/300953/ [23:14:30] which then requires [23:14:31] https://gerrit.wikimedia.org/r/#/c/298112/ [23:14:54] SMalyshev ^^ [23:14:58] well, old gerrit told me that... I guess now I have to investigate [23:15:01] paladox: thanks [23:15:06] You're welcome [23:15:08] I'll submit a task about it [23:15:14] Ok [23:15:20] You can leave your +2 [23:15:30] which will then merge whenever all those patches [23:15:34] are merged [23:17:01] ok, thanks, I filed T141947 [23:18:39] ok [23:18:40] thanks
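
Editor's note on the 23:14 exchange: the chain of changes that had to merge before 302424 can be written down as a small lookup. This is only a sketch in Python. The `REQUIRES` map is transcribed by hand from the messages above, not fetched from Gerrit, and `dependency_chain` is a hypothetical helper name, not part of any Gerrit tooling.

```python
# Sketch only: REQUIRES is hand-transcribed from the chat, not queried from
# Gerrit. Keys are change numbers; each value is the change that key was
# said to require.
REQUIRES = {
    302424: 300954,
    300954: 300953,
    300953: 298112,
}


def dependency_chain(change, requires):
    """Return the changes that must merge before `change`, nearest first."""
    chain = []
    seen = {change}
    while change in requires:
        change = requires[change]
        if change in seen:  # guard against a loop in a hand-typed map
            raise ValueError(f"dependency cycle at change {change}")
        seen.add(change)
        chain.append(change)
    return chain


if __name__ == "__main__":
    # Merge order is the reverse of the chain: the deepest ancestor goes first.
    for number in reversed(dependency_chain(302424, REQUIRES)):
        print(f"https://gerrit.wikimedia.org/r/#/c/{number}/")
```

Read this way, the +2 on 302424 can stay in place, as suggested at 23:15:20, and takes effect once the three ancestors listed by the helper have merged.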