[00:59:27] PROBLEM - Puppet errors on deployment-tin is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[01:20:09] (Abandoned) Legoktm: docker: Rename php images to php70 [integration/config] - https://gerrit.wikimedia.org/r/381377 (owner: Legoktm)
[01:25:04] PROBLEM - Host integration-slave-jessie-1004 is DOWN: CRITICAL - Host Unreachable (10.68.21.22)
[01:25:20] PROBLEM - Host integration-slave-jessie-1003 is DOWN: CRITICAL - Host Unreachable (10.68.17.164)
[01:39:26] RECOVERY - Puppet errors on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0]
[01:43:13] Continuous-Integration-Infrastructure, Release-Engineering-Team: Give legoktm access to push docker images for CI - https://phabricator.wikimedia.org/T177158#3648914 (Legoktm)
[01:43:53] paladox: what are private changes?
[03:33:44] (PS2) Legoktm: Use Debian stretch for php images [integration/config] - https://gerrit.wikimedia.org/r/381392
[03:40:32] (PS4) Legoktm: Add experimental "composer-package-php70-docker" job [integration/config] - https://gerrit.wikimedia.org/r/381378
[03:40:38] Continuous-Integration-Infrastructure, Release-Engineering-Team: Give legoktm access to push docker images for CI - https://phabricator.wikimedia.org/T177158#3648973 (thcipriani) what's your account name on dockerhub? I can add you to the wmfrleng is a docker hub team.
[03:42:33] Continuous-Integration-Infrastructure, Release-Engineering-Team: Give legoktm access to push docker images for CI - https://phabricator.wikimedia.org/T177158#3648974 (Legoktm) It's `legoktm` :-)
[03:50:17] Continuous-Integration-Infrastructure, Release-Engineering-Team: Give legoktm access to push docker images for CI - https://phabricator.wikimedia.org/T177158#3648975 (thcipriani) Open>Resolved a:thcipriani >>! In T177158#3648974, @Legoktm wrote: > It's `legoktm` :-) makes sense :) done!
[04:46:51] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[04:56:50] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[05:06:49] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[06:57:19] PROBLEM - Puppet errors on deployment-kafka01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[08:01:39] legoktm they are the new drafts. they are more private so that only the reviewers and the owner of the change can see it.
[08:02:03] the change can also be easily switched to private or back to open.
[08:02:33] interesting
[08:02:39] will jenkins-bot still be able to see them?
[08:04:37] not by default
[08:04:45] we can grant the bot to see them
[08:04:46] though
[08:05:22] legoktm the wikibugs bot will need some updates though, removal of drafts and to ignore wip changes.
[08:07:57] well, whenever we get to it
[08:08:13] anyways, time to sleep :)
[08:09:35] ok :)
[08:32:31] legoktm there is a a new ignore / mute button for changes.
[08:32:52] though mute is being renamed to reviewed / unreviewed.
[08:51:50] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0]
[09:02:02] PROBLEM - Puppet errors on deployment-trending01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[11:37:57] PROBLEM - App Server Main HTTP Response on deployment-mediawiki07 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 392 bytes in 0.004 second response time
[15:04:37] PROBLEM - Puppet errors on integration-r-lang-01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[15:34:04] PROBLEM - Free space - all mounts on deployment-kafka01 is CRITICAL: CRITICAL: deployment-prep.deployment-kafka01.diskspace.root.byte_percentfree (<100.00%)
[15:39:34] RECOVERY - Puppet errors on integration-r-lang-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:13:06] (PS1) EddieGP: Add Jayprakash12345 to CI whitelist [integration/config] - https://gerrit.wikimedia.org/r/381639
[17:13:38] (PS2) EddieGP: Add Jayprakash12345 to CI whitelist [integration/config] - https://gerrit.wikimedia.org/r/381639
[17:18:18] (CR) EddieGP: "Note that the user asked me via mail whether I could add them. As >10 patches are already merged, I agree that adding them would be sensef" [integration/config] - https://gerrit.wikimedia.org/r/381639 (owner: EddieGP)
[18:24:33] Gerrit: Switch to mariadb java connector once we upgrade to gerrit 2.14 - https://phabricator.wikimedia.org/T176164#3649340 (Paladox)
[18:43:01] Project beta-code-update-eqiad build #175022: FAILURE in 0.86 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/175022/
[18:44:16] is jenkins out of space?
[18:44:24] i get
[18:44:25] org.apache.commons.jelly.JellyTagException: jar:file:/var/cache/jenkins/war/WEB-INF/lib/jenkins-core-2.46.2.jar!/hudson/model/Run/console.jelly:65:27: PermGen space
[18:45:23] jenkins is down
[18:47:13] Continuous-Integration-Infrastructure, Release-Engineering-Team, Operations: CI is down (jenkins) - https://phabricator.wikimedia.org/T177174#3649345 (Paladox)
[18:47:33] Continuous-Integration-Infrastructure, Release-Engineering-Team, Operations: CI is down (jenkins) - https://phabricator.wikimedia.org/T177174#3649358 (Paladox) p:Triage>Unbreak!
[18:53:37] hashar hi
[18:53:38] ci seems to be down
[18:57:02] i have a stack trace on https://phabricator.wikimedia.org/T177174#3649358
[18:58:51] paladox: yeah that is why I joined. Out of memory
[18:59:07] Continuous-Integration-Infrastructure, Release-Engineering-Team, Operations: CI is down (jenkins) - https://phabricator.wikimedia.org/T177174#3649345 (Legoktm) Disk space looks OK to me: ``` legoktm@contint1001:~$ df -h Filesystem Size Used Avail Use% Mounted on udev...
[18:59:38] paladox: I have restarted it
[18:59:42] ah
[18:59:43] thanks
[18:59:54] it deadlocked last friday as well. Similar issue
[19:03:39] Yippee, build fixed!
[19:03:40] Project beta-code-update-eqiad build #175023: FIXED in 38 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/175023/
[19:05:55] Continuous-Integration-Infrastructure, Release-Engineering-Team, Operations: CI is down (jenkins) - https://phabricator.wikimedia.org/T177174#3649378 (Paladox) i think Caused by: java.lang.OutOfMemoryError: PermGen space could be the ram.
[19:06:09] hashar legoktm it's the heap
[19:06:17] ram
[19:06:19] https://plumbr.eu/outofmemoryerror/java-heap-space
[19:07:02] Gerrit: Error: Non UTF-8 code starting with \x90 - https://phabricator.wikimedia.org/T177176#3649381 (Zoranzoki21)
[19:08:38] Continuous-Integration-Infrastructure, Release-Engineering-Team, Operations: CI is down (jenkins) - https://phabricator.wikimedia.org/T177174#3649393 (Paladox) p:Unbreak!>High Hashar restarted it and it came back on, he also said it happened last friday too. so maybe we should investigate why...
[19:09:36] Gerrit: Error: Non UTF-8 code starting with \x90 - https://phabricator.wikimedia.org/T177176#3649396 (Zoranzoki21) Complete command line (I wanted to test, but I thinked to this test patch abandon when I upload) P6063
[19:14:06] https://wiki.jenkins.io/display/JENKINS/Builds+failing+with+OutOfMemoryErrors#BuildsfailingwithOutOfMemoryErrors-HeaporPermgen?
[19:14:19] Continuous-Integration-Infrastructure, Release-Engineering-Team, Operations: CI is down (jenkins) - https://phabricator.wikimedia.org/T177174#3649398 (Paladox) See https://wiki.jenkins.io/display/JENKINS/Builds+failing+with+OutOfMemoryErrors#BuildsfailingwithOutOfMemoryErrors-HeaporPermgen?
[19:26:14] Continuous-Integration-Infrastructure, Release-Engineering-Team, Operations: CI is down (jenkins) - https://phabricator.wikimedia.org/T177174#3649400 (Paladox) https://jenkins.io/blog/2016/11/21/gc-tuning/ java 8 may improve things.
[19:50:14] Gerrit: Error: Non UTF-8 code starting with \x90 - https://phabricator.wikimedia.org/T177176#3649409 (Zoranzoki21) I have not idea.. I need push rights (???)
[19:58:46] Gerrit: Error: Non UTF-8 code starting with \x90 - https://phabricator.wikimedia.org/T177176#3649414 (Jayprakash12345) @Zoranzoki21 Try to follow https://www.mediawiki.org/wiki/Gerrit/Tutorial Steps. I have little Knowlege in git. But I follow MediaWiki Step. And get Good Results.
[20:13:12] Gerrit: Error: Non UTF-8 code starting with \x90 - https://phabricator.wikimedia.org/T177176#3649447 (Zoranzoki21) >>! In T177176#3649414, @Jayprakash12345 wrote: > @Zoranzoki21 Try to follow https://www.mediawiki.org/wiki/Gerrit/Tutorial Steps. I have little Knowlege in git. But I follow MediaWiki Step. And...
[20:27:48] Continuous-Integration-Config, Continuous-Integration-Infrastructure (Little Steps Sprint), Release-Engineering-Team (Backlog): Get rid of zend tests for wmf branches - https://phabricator.wikimedia.org/T94149#1156282 (EddieGP) Should this be declined due to T176370?
[20:48:32] Continuous-Integration-Config, Continuous-Integration-Infrastructure (Little Steps Sprint), Release-Engineering-Team (Backlog): Get rid of zend tests for wmf branches - https://phabricator.wikimedia.org/T94149#3649484 (Paladox) Yeh, one should be filled for getting rid of the hhvm tests.
[21:11:25] Gerrit: Error: Non UTF-8 code starting with \x90 - https://phabricator.wikimedia.org/T177176#3649381 (Legoktm) >>! In T177176#3649409, @Zoranzoki21 wrote: > I have not idea.. I need push rights (???) > > ``` > ~\Documents\GitHub\mediawiki-config [master ↑1]> git push > Warning: Permanently added '[gerrit.wi...
[21:16:36] Gerrit: Error: Non UTF-8 code starting with \x90 - https://phabricator.wikimedia.org/T177176#3649492 (Aklapper) Open>Invalid This is a [[ https://www.mediawiki.org/wiki/How_to_become_a_MediaWiki_hacker#Feedback.2C_questions_and_support | support request ]] about `git-review`, hence closing this task...
[21:21:21] Gerrit: Error: Non UTF-8 code starting with \x90 - https://phabricator.wikimedia.org/T177176#3649495 (Zoranzoki21) >>! In T177176#3649492, @Aklapper wrote: > This is a [[ https://www.mediawiki.org/wiki/How_to_become_a_MediaWiki_hacker#Feedback.2C_questions_and_support | support request ]] about `git-review`,...
[21:27:43] Gerrit: Error: Non UTF-8 code starting with \x90 - https://phabricator.wikimedia.org/T177176#3649496 (Zoranzoki21) >>! In T177176#3649492, @Aklapper wrote: > This is a [[ https://www.mediawiki.org/wiki/How_to_become_a_MediaWiki_hacker#Feedback.2C_questions_and_support | support request ]] about `git-review`,...
[21:35:18] (PS7) Umherirrender: Update test config [integration/config] - https://gerrit.wikimedia.org/r/380790
[21:35:22] (PS8) Umherirrender: Update test config [integration/config] - https://gerrit.wikimedia.org/r/380790
[21:37:56] (CR) jerkins-bot: [V: -1] Update test config [integration/config] - https://gerrit.wikimedia.org/r/380790 (owner: Umherirrender)
[21:40:14] (CR) Umherirrender: "I still have no idea what flake8 is, and why "flake8: commands failed"" [integration/config] - https://gerrit.wikimedia.org/r/380790 (owner: Umherirrender)
[22:33:58] RECOVERY - Host integration-slave-jessie-1004 is UP: PING OK - Packet loss = 0%, RTA = 0.68 ms
[22:34:04] PROBLEM - Puppet staleness on integration-slave-jessie-1003 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [43200.0]
[22:39:07] RECOVERY - Puppet staleness on integration-slave-jessie-1003 is OK: OK: Less than 1.00% above the threshold [3600.0]
[22:40:41] PROBLEM - Puppet staleness on integration-slave-jessie-1004 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [43200.0]
[22:45:40] RECOVERY - Puppet staleness on integration-slave-jessie-1004 is OK: OK: Less than 1.00% above the threshold [3600.0]
[22:50:05] (PS9) Umherirrender: Update test config [integration/config] - https://gerrit.wikimedia.org/r/380790
[22:51:55] (CR) jerkins-bot: [V: -1] Update test config [integration/config] - https://gerrit.wikimedia.org/r/380790 (owner: Umherirrender)
[23:42:24] (CR) Melos: "> I still have no idea what flake8 is, and why "flake8: commands" [integration/config] - https://gerrit.wikimedia.org/r/380790 (owner: Umherirrender)
[23:54:50] (PS10) Umherirrender: Update test config [integration/config] - https://gerrit.wikimedia.org/r/380790
[23:55:34] (CR) Umherirrender: "Could catch! I was looking at the end of the report, because there the failing is printed, but at the begin is also a print, with this err" [integration/config] - https://gerrit.wikimedia.org/r/380790 (owner: Umherirrender)