[07:03:53] goood morning :) [07:04:56] joal: later on whenever you have time we could upgrade node on AQS. I can see 0.10, 4.2.4 and 4.3.0 available to install [07:31:12] (brb commuting) [08:25:11] Analytics-Cluster: logrotate kafkaServer-gc.log on kafka brokers - https://phabricator.wikimedia.org/T118421#2177088 (elukey) Open>Resolved a:elukey ``` elukey@kafka1022:/var/log/kafka$ ls -lh total 1.3G -rw-r--r-- 1 kafka kafka 17M Apr 4 08:23 kafka.log -rw-r--r-- 1 kafka kafka 257M Apr 4 08:22... [08:35:26] Hi elukey [08:50:02] hellooooo [08:50:08] Hey :) [08:50:13] Had a good weekend ? [08:50:35] yep! and you?? [08:50:55] Yeah, beautiful weather (not always the case in britanny) [08:51:09] :) [08:51:40] if you remember Ireland, britanny is kinda the same :D [08:55:57] I do remember Ireland, very weird weather.. everything looks grey and anonymous but when the sun comes out a million colors and shades appear [08:56:15] elukey: you have it :) [09:06:16] elukey: You tell me when you want to upgrade node :) [09:06:35] joal: all right! [09:48:33] joal: I am ready if you are [09:48:38] I am ! [09:49:02] ok so tracking phab https://phabricator.wikimedia.org/T123629 [09:49:02] Currently testing why the uniques job fails, but I have time for you :) [09:49:44] elukey: How shall we proceed ? [09:50:18] ahhh okok so would you prefer to wait for the upgrade? [09:50:29] no sir, no biggy [09:50:50] the plan is written in the phab task.. basically one node at a time, de-pool, drain, stop everything, upgrade, etc.. [09:51:01] okey [09:51:13] all right starting with aqs1001 [09:51:27] elukey: can we start with 1003 ? [09:51:32] and do it in reverse order ? [09:52:33] mmmm already de-pooled aqs1001.. what would change? [09:53:55] !log de-pooled aqs1001.eqiad.wmnet as pre-step for nodejs upgrade [09:53:55] elukey: nevermind ! [09:53:56] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master [09:54:02] all righhttt [09:57:13] joal: do you know where aqs logs are? Related to node [09:57:24] elukey: absolutely not [09:58:10] I found only cassandra stuff [09:58:12] mmmm [09:58:25] :( [10:01:35] so I found only https://logstash.wikimedia.org/#/dashboard/elasticsearch/analytics-cassandra that is nice, no mention of nodejs even in the config files [10:02:57] mobrovac: hi! Quick question for you: where can I find the nodejs logs (if any) on AQS hosts? [10:15:34] elukey: doing ok ? [10:15:41] cd [10:15:43] oops [10:16:02] joal: yep yep I am taking time to check metrics just in case :) [10:16:06] elukey: AQS seems to be working on aqs1001 [10:16:14] np, sounds good :) [10:16:24] I haven't stopped anything, just de-pooled it :) [10:16:35] Ah, right [10:40:48] joal: there are too many ? in my head [10:40:58] I am going to re-pool 1001 and study a bit [10:41:11] elukey: too many ? [10:41:12] maybe I'll ask milimetric too [10:41:17] questions :) [10:41:51] hm, let me know if I can help [10:42:14] !log re-pooled aqs1001.eqiad (no node upgrade, need more info about restbase) [10:42:15] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master [10:45:28] joal: really sorry but I am over-paranoid when I don't know a service/host [10:45:45] maybe it is something super trivial but I want to have everything cleared out [10:45:48] and documented [10:45:50] elukey: no problem for me, just let me know if there is anything on which I can help [10:45:57] sure!
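For reference, one way to see which node versions apt offers, as elukey presumably did before this conversation. This is a hedged sketch: the package name `nodejs` and the repo layout on the AQS hosts are assumptions not shown in the log.

```
# List the installed and candidate versions of the node package:
apt-cache policy nodejs
# List every version available from the configured repositories
# (should show the 0.10, 4.2.4 and 4.3.0 candidates mentioned above):
apt-cache madison nodejs
```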
[10:46:12] elukey: you say it yourself: better safe than sorry :) [10:46:20] in other news, https://github.com/wikimedia/varnishkafka has been updated with all the last patches for Varnish 4 [10:46:23] \0/ [10:46:27] ottomata: ---^ [10:46:40] ah still not online [10:46:40] elukey: that's super good news ! [10:46:44] yessss [10:46:51] Well done mate :) [10:46:55] also it has been running on the maps cluster for days, all good :) [10:48:26] Analytics-Kanban, Operations, Traffic, Patch-For-Review: varnishkafka integration with Varnish 4 for analytics - https://phabricator.wikimedia.org/T124278#2177313 (elukey) Open>Resolved Code merged by ema, plus the varnish maps cluster has been running with vk for days without triggering any... [11:01:26] elukey: if you have a minute I could do with some help [11:08:34] joal: sure! [11:08:38] just found out this [11:08:38] elukey@aqs1001:~$ sudo service aqs status [11:08:39] ● aqs.service - "aqs service" Loaded: loaded (/lib/systemd/system/aqs.service; enabled) Active: active (running) since Fri 2016-04-01 13:26:49 UTC; 2 days ago [11:08:50] I was looking for restbase :D [11:09:14] anyhoww [11:09:19] How can I help? [11:09:54] elukey: I'm fighting with a hadoop job failing, and can't access logs [11:10:45] do you want me to get to a specific node or on the master? [11:11:41] elukey: node IP is 10.64.21.112 [11:12:07] analytics1051.eqiad.wmnet. [11:12:23] folder: /var/lib/hadoop/data/m/yarn/logs/application_1458129362008_56648/container_e17_1458129362008_56648_01_000003/ [11:12:32] should be stdout and stderr [11:13:42] is it on hdfs or file-system? [11:13:53] file system elukey [11:16:08] joal: I can only see [11:16:09] application_1458129362008_56643/ application_1458129362008_56645/ application_1458129362008_56646/ application_1458129362008_56647/ application_1458129362008_56649/ [11:16:41] elukey: even as root? [11:16:58] yep [11:17:34] rmmmm [11:17:38] Not good [11:18:41] ooooh elukey actually, compared to oozie logs, I can access those using yarn logs function [11:18:47] Sorry for having disturbed ! [11:19:29] nahhh it's fine! Did you find the log? I don't see it :( [11:19:54] elukey: It's because they have been archived, therefore accessible by yarn logs CLI [11:20:28] ahhh okok [11:21:38] brb lunch! [11:55:41] AFK for a moment [12:12:01] Analytics-Tech-community-metrics: Make GrimoireLib display *one* consistent name for one user - https://phabricator.wikimedia.org/T118169#2177384 (Lcanasdiaz) I've isolated the bug. Basically it is due to the way the current version of the library calculates the profiles to be included in people.json file.... [12:54:05] 14:18 PROBLEM - RAID on stat1002 is CRITICAL: CRITICAL: 1 failed LD(s) (Partially Degraded) [12:54:10] :( [13:03:06] so it seems a single disk with multiple LVM volumes on top [13:03:16] probably hw RAID? [13:11:30] yes confirmed, will cut a phab task [13:31:00] https://phabricator.wikimedia.org/T131758 [13:31:14] so we have raid 6 and 12 disks, only one has failed
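The takeaway from the exchange above: once YARN's log aggregation archives a finished application's container logs, they disappear from the NodeManager's local disk and have to be read back through the CLI. A minimal sketch using the application ID from the chat:

```
# Fetch the aggregated stdout/stderr of a finished application:
yarn logs -applicationId application_1458129362008_56648
# Some Hadoop 2.x versions can narrow this to a single container
# (older releases also require -nodeAddress alongside -containerId):
yarn logs -applicationId application_1458129362008_56648 \
    -containerId container_e17_1458129362008_56648_01_000003
```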
[14:42:02] sorry [14:42:06] https://phabricator.wikimedia.org/T123629#2143751 [14:42:40] looks good, I can test manually after each one if you like [14:42:59] woa, endpoints are sloow :) [14:44:04] Hi milimetric :) [14:44:09] everyhtin [14:44:15] good on aqs side?> [14:44:22] man ... heavy fingers ! [14:44:30] just one of the endpoints was slow when i pinged it [14:44:33] it seems fine tho [14:44:48] joal: btw, I've been restarting those jobs here and there over the weekend [14:44:58] I slacked a bit yesterday so I restarted a bunch more just now [14:45:02] all right, depooling aqs1001 [14:45:03] I've seen that, monitoring the errors [14:45:19] the one weird one, that keeps failing over and over is Jan 19 [14:45:25] milimetric: I have spent a long time this morning trying to understand what's wrong with the uniques job [14:45:30] It fails erratically [14:45:38] the Jan 19th fails consistently [14:45:38] and I can't get why [14:45:47] it's failed at least 4 times so far [14:46:03] !log de-pooled aqs1001.eqiad from the confd pool for nodejs upgrade [14:46:04] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master [14:46:19] I have pinpointed the error, but it's a very unexpected one given what we're doing, so root cause is even further :( [14:47:11] joal: sudo httpry -i eth0 tcp port = 7232 is veeeery nice [14:47:35] elukey: can't sudo :( [14:47:45] What does it say ? [14:48:22] HTTP traffic flowing, really useful to see if only LVS/pybal stuff are flowing [14:48:28] finally I found what I wanted :P [14:48:33] awesome :) [14:48:40] Ahhh, I get it ! [14:48:56] elukey: You were looking for metrics to make sure depooling was really done, right ? [14:49:39] yeah.. but didn't really find them :( [14:49:58] elukey: if you ended using httpry, I can guess so [14:52:55] what the hell https://logstash.wikimedia.org/#/dashboard/elasticsearch/analytics-cassandra [14:53:09] a-team, need to AFK again, will be back for standup and be here late tonight [14:53:17] milimetric, joal: just ran nodetool drain + cassandra stop [14:53:23] as the procedure needs [14:53:26] k [14:53:29] but aqs didn't like it [14:53:39] which aqs node? [14:53:54] elukey: it's down on aqs1001 [14:54:07] yep exactly (it is depooled) [14:54:17] all right no more errors [14:54:26] but weird [14:54:34] it's down everywhere, elukey [14:54:38] so that procedure must be wrong [14:54:41] lemme look closer [14:54:43] com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.UnavailableException: Cannot achieve consistency level QUORUM [14:54:56] PROBLEM - Analytics Cassanda CQL query interface on aqs1001 is CRITICAL: Connection refused [14:55:15] milimetric: bat-cave?
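For reference, the two commands at play in this exchange, roughly as elukey ran them. Note that `tcp port 7232` is the standard BPF filter syntax; the chat's `port = 7232` form looks like a typo.

```
# Watch live HTTP requests on the AQS service port to verify that only
# LVS/pybal health checks keep flowing after a de-pool:
sudo httpry -i eth0 'tcp port 7232'
# The restart steps from the upgrade plan that triggered the quorum errors:
nodetool drain               # flush memtables, stop accepting new writes
sudo service cassandra stop
```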
[14:55:20] sure [14:58:37] RECOVERY - Analytics Cassanda CQL query interface on aqs1001 is OK: TCP OK - 0.007 second response time on port 9042 [15:04:31] ottomata: eventbussss [15:04:40] nuria: he is sick :) [15:04:41] nuria: he's sick today [15:05:06] k [15:18:29] milimetric: we should have LOCAL_QUORUM, so from http://www.ibm.com/support/knowledgecenter/SS3JSW_5.2.0/com.ibm.help.gdha_tuning.doc/com.ibm.help.gdha_administering.doc/gdha_planning_cassandra_maintenance.html it should have been fine [15:18:33] really weird [15:19:58] ah no but the error msg says "consistency level QUORUM" [15:20:27] that should be even better [15:23:21] Analytics, Hovercards, Reading Web Sprint 70 L: Capture hovercards fetches as previews in analytics - https://phabricator.wikimedia.org/T129425#2177827 (dr0ptp4kt) [15:30:09] aah, I have to miss standup - minor non-emergency outside [15:44:16] hm, I'm back, standup over? [15:44:31] milimetric: it starts in 15! :) [15:44:33] Is it not at 9? [15:44:53] doh, *I'm* the one that changed it [15:44:54] lol [15:45:00] :D [15:45:10] joal: want to talk about CR? [15:45:17] !log aqs1001 re-added to the aqs pool (nodejs NOT upgraded) [15:45:18] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log, Master [15:59:59] nuria: just came back [16:00:09] nuria: talk after grooming ? [16:00:22] joal: sure [16:20:50] Analytics, Hovercards, Reading Web Sprint 70 L: Capture hovercards fetches as previews in analytics - https://phabricator.wikimedia.org/T129425#2178078 (dr0ptp4kt) [16:25:09] Analytics, Hovercards, Reading Web Sprint 70 L: Capture hovercards fetches as previews in analytics - https://phabricator.wikimedia.org/T129425#2178081 (dr0ptp4kt) [16:31:59] Analytics, Hovercards, Reading Web Sprint 70 L: Capture hovercards fetches as previews in analytics - https://phabricator.wikimedia.org/T129425#2178090 (phuedx) @GWicke: We're hoping to use the #RESTBase summary service as the primary backing service for #Hovercards. Does RESTBase's logging pipeline m... [16:34:13] Analytics-Kanban: End date not included - https://phabricator.wikimedia.org/T131641#2178093 (Milimetric) p:Triage>High [16:36:20] Analytics, Pageviews-API: AQS: add option to include all redirects for a page - https://phabricator.wikimedia.org/T131566#2178099 (Milimetric) [16:37:48] Analytics, Pageviews-API: AQS: add option to include all redirects for a page - https://phabricator.wikimedia.org/T131566#2171087 (Milimetric) To answer the second question, @kaldari, it depends. We don't have enough data in our system to figure out how to properly report statistics on redirects. We're... [16:40:51] Analytics: Making geowiki data public - https://phabricator.wikimedia.org/T131280#2178135 (Milimetric) p:Triage>Normal [16:43:16] Analytics, Pageviews-API: AQS: add option to include all redirects for a page - https://phabricator.wikimedia.org/T131566#2178147 (kaldari) [16:43:18] Analytics: Better redirect handling for pageview API - https://phabricator.wikimedia.org/T121912#1891561 (kaldari) [16:44:53] Analytics: Making geowiki data public - https://phabricator.wikimedia.org/T131280#2178152 (Nuria) >The bucket size requested is 10 for editing data. I think analytics team did some work in the past that proved that this data size is too coarse to be released to the public. > Asaf mentioned that legal had sig...
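The puzzle in this exchange: with a replication factor of 3, a QUORUM operation needs floor(3/2)+1 = 2 live replicas, so losing a single node should have been survivable. A hedged way to check the keyspaces' replication settings; the keyspace names aren't shown in the log, and the system table below is the Cassandra 2.x one:

```
# Quorum = floor(RF/2) + 1, so RF=3 tolerates one node down.
# Inspect the replication strategy of each keyspace (Cassandra 2.x):
cqlsh aqs1001.eqiad.wmnet 9042 -e \
  "SELECT keyspace_name, strategy_class, strategy_options
   FROM system.schema_keyspaces;"
```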
[16:50:25] Analytics: Make geowiki data safely public - https://phabricator.wikimedia.org/T127409#2178177 (Milimetric) [16:50:27] Analytics: Making geowiki data public - https://phabricator.wikimedia.org/T131280#2178178 (Milimetric) [16:50:29] scuse me a-team, got disconnected from google, and looking for my phone for 2-step auth [16:51:24] Analytics: Traffic Breakdown Report - Browser Major Minor Version {lama} - https://phabricator.wikimedia.org/T115590#2178186 (Milimetric) Open>Resolved done in: https://browser-reports.wmflabs.org/#all-sites-by-browser/browser-family-and-major-hierarchical-view [16:58:58] Analytics: Making tests environment for pageview API deployments - https://phabricator.wikimedia.org/T131773#2178243 (Nuria) [17:01:06] Analytics: Traffic Breakdown Report - Client OS Major Minor Version {lama} - https://phabricator.wikimedia.org/T115591#2178257 (Milimetric) done here: https://browser-reports.wmflabs.org/#all-sites-by-os/os-family-and-major-hierarchical-view [17:01:16] Analytics: Traffic Breakdown Report - Client OS Major Minor Version {lama} - https://phabricator.wikimedia.org/T115591#2178258 (Milimetric) Open>Resolved [17:01:22] Analytics: Create fake data for beta AQS deployment - https://phabricator.wikimedia.org/T120841#2178260 (Nuria) [17:01:24] Analytics: Making tests environment for pageview API deployments - https://phabricator.wikimedia.org/T131773#2178259 (Nuria) [17:01:35] Analytics: Traffic Breakdown Report - User Agent Overview {lama} - https://phabricator.wikimedia.org/T115599#2178262 (Milimetric) done in general by this whole dashboard: https://browser-reports.wmflabs.org/ [17:01:52] Analytics: Traffic Breakdown Report - User Agent Overview {lama} - https://phabricator.wikimedia.org/T115599#2178263 (Milimetric) Open>Resolved [17:02:21] Analytics: Making tests environment for pageview API deployments - https://phabricator.wikimedia.org/T131773#2178243 (Nuria) [17:03:18] Analytics: Traffic Breakdown Report - User Agents Trend {lama} - https://phabricator.wikimedia.org/T115601#2178267 (Milimetric) Open>Resolved done by the trend graphs of this dashboard: https://browser-reports.wmflabs.org/ [17:03:43] Analytics: Traffic Breakdown Report - Visiting Country {lama} - https://phabricator.wikimedia.org/T115605#2178269 (Milimetric) Open>Resolved done last quarter by providing geo-coded data and Erik updating the data source for the original reports [17:03:50] Analytics: Traffic Breakdown Report - Visiting Country per Wiki {lama} - https://phabricator.wikimedia.org/T115607#2178271 (Milimetric) Open>Resolved done last quarter by providing geo-coded data and Erik updating the data source for the original reports [17:09:12] milimetric: I was reading through https://phabricator.wikimedia.org/T116206 [17:09:22] and it sounds like the AQS environment already exists [17:09:33] on beta [17:09:55] so it was really just about getting fake data to test it [17:09:57] cool, nuria: maybe update the parent task you just made ^ [17:10:36] Analytics: Making tests environment for pageview API deployments - https://phabricator.wikimedia.org/T131773#2178293 (Nuria) mmm.. some testing environment exists: https://phabricator.wikimedia.org/T116206 [17:11:50] joal: http://conferences.oreilly.com/strata/hadoop-big-data-eu/public/schedule/detail/49760 :O [17:16:59] hi a-team [17:17:16] milimetric: deployment-aqs01.deployment-prep.eqiad.wmflabs [17:17:31] hi mforns: how's being back in Spain :) [17:18:34] madhuvishy, hi! 
it's comfy, but not so fun as India :] [17:18:39] awww [17:18:44] less mosquitoes I hope :D [17:18:49] hehe yes [17:19:05] holaaa [17:19:19] hi nuria :] [17:26:59] Analytics: Implement Pages Created & Count of Edits full vertical slice - https://phabricator.wikimedia.org/T131779#2178389 (Nuria) [17:29:40] Analytics: Read mw databases AND dumps (separately) to fill the revision_create schema - https://phabricator.wikimedia.org/T131781#2178419 (Nuria) [17:31:00] Analytics: Put all historical data needed for Pages created and count of edits metrics through Event Bus into HDFS - https://phabricator.wikimedia.org/T131782#2178434 (Nuria) [17:31:41] Analytics: Examine wikistats reports, make a summary of the most granular data needed that would serve all reports - https://phabricator.wikimedia.org/T131783#2178448 (Nuria) [17:53:38] elukey: still around? [17:54:35] joal: for 10 minutes, but yes :) [17:55:56] Arf, nevermind then elukey :) [17:58:18] joal: if I can help it would be great, I'll be out tomorrow :) [17:59:55] elukey: don't worry, it'll wait :) [18:00:57] joal: all right :) [18:01:19] elukey: Have a good end of day ! [18:01:21] I was investigating why /me removing one cassandra node caused quorum failures on AWS [18:01:26] *AQS [18:01:26] See you on wednesday ! [18:01:37] all right thanks! [18:01:38] elukey: We can discuss it if you want, I have ideas [18:02:19] I'll dig a bit more tomorrow if I have time, after that I'll be glad to chat with you.. I want to make sure that we are tolerant to one node going down [18:02:31] we should theoretically [18:03:10] going offline, byyeeee team!! [18:03:12] o/ [18:06:23] bye elukey [18:09:22] Analytics: Load into Druid when data is ready - https://phabricator.wikimedia.org/T131786#2178517 (Nuria) [18:11:11] milimetric: tasks created for loading of edit data, they sprang from this one: https://phabricator.wikimedia.org/T130256 [18:21:41] thx nuria --^ [18:22:09] nuria: gwicke also mentioned that there was a raid sync ongoing on aqs [18:22:23] nuria: I'll wait for that to have finished before restarting some load tests [18:22:56] milimetric: Do we schedule some time tomorrow to go over some wikistats edit pages? [18:23:37] hmmm, actually, already a lot of meetings tomorrow milimetric [18:23:41] milimetric: wednesday?
[18:47:52] sorry - was out [18:48:02] wednesday works great [19:03:52] nuria: good afternoon :-) I am around for a bit [19:04:27] hashar: I was wondering what was the best way to work with you on jenkins changes we would like to do [19:04:43] beer comes to mind [19:04:47] hashar: to build & deploy java jars to ease our cluster deployment [19:04:57] but that is just because I am currently swallowing a beer :D [19:04:59] ah [19:05:07] cc madhuvishy [19:05:13] yeah there is a task about it and I believe madhuvishy poked us about it last week [19:05:23] jaja [19:05:42] yessir [19:05:52] with the timezone difference, I am not the best point of contact though :( [19:06:02] hashar: I pinged the team on the task [19:06:06] but that should be straightforward assuming maven is used [19:06:11] https://phabricator.wikimedia.org/T130576 [19:06:35] will poke our list about it [19:06:48] thank you :) [19:06:50] the Jenkins part should not be too hard [19:06:52] hashar: who could be the best point of contact for NY tz (we mostly work on that one rather than PST) [19:06:59] signing is a bit concerning since the CI Jenkins is not really secure [19:07:27] anyone from the team but Zeljko and I (we are the only one in Europe) [19:07:29] rest are us [19:07:29] US [19:08:00] hashar: we don't sign the jars as far as i know - giving jenkins archiva creds i'm not sure how [19:08:48] hashar: and best way to ping team is irc or e-mail list? [19:10:57] madhuvishy: signing can be figured out later I guess [19:11:03] hashar: yeah [19:11:07] nuria: there is the QA list but it does not have that much attention [19:11:12] right now it doesn't even attempt to release [19:11:14] I am poking the internal list [19:11:25] hashar: ajajam [19:13:45] milimetric: Wednesday before standup, or after?> [19:14:00] nuria: madhuvishy poked list and CC ed you both :-} [19:14:02] joal: anytime [19:14:08] :) [19:14:10] hashar: thank you :) [19:14:18] joal: I can be up earlier too if you need, it's *no* problem, I wake up at like 6 [19:14:20] milimetric: before standup then :) [19:14:22] hashar: super thanks [19:14:37] nuria: madhuvishy: we are trying to scale CI out of me to the rest of the team. There are too many requests nowadays and I can no more keep up :-D [19:14:46] milimetric: That's way too early for me to accept a meeting like that! [19:14:47] hashar: ya, no wonder [19:14:50] hashar: of course! [19:15:05] hashar: you cannot be CI man for the foundation, understood completely [19:15:06] nuria: madhuvishy this maven / release / jar thing looks like a good opportunity for cross team training and I believe it would be of interest to the #releng people working on the scap3 deployment tool [19:15:18] hashar: sure [19:15:30] madhuvishy: kudos on having the Jenkins job created :-) [19:15:31] cool [19:15:41] hashar: ha ha if only it worked :D [19:17:03] madhuvishy: well it seems to work just fine, it is just that some of the modules fail for a random reason. One gotta look at the full console log and look up for errors / stack traces etc [19:17:17] https://integration.wikimedia.org/ci/job/analytics-release-test/7/org.wikimedia.analytics.refinery.job$refinery-job/consoleFull < warning 15 Mbytes [19:17:22] hashar: yeah - i tried looking at it a bit [19:17:36] [JENKINS] Recording test results [19:17:36] hudson.AbortException: Test reports were found but none of them are new. Did tests run?
[19:17:37] For example, /mnt/jenkins-workspace/workspace/analytics-release-test/refinery-job/target/surefire-reports/TEST-AppSessionSuite.xml is 7 min 52 sec old [19:18:02] hashar: hmmm - if i submit a maven job normally they all build [19:18:08] seems some tests haven't run or written down test results in a .xml file [19:18:17] hashar: hmmm [19:18:37] ah [19:18:39] hmmm [19:18:40] may be [19:18:51] Jenkins also has a few plugins to enhance the console output which we use on almost all other jobs [19:18:52] i'm not asking it to actually run tests [19:18:56] i can do that [19:19:08] a) transform ansi escaping to have nice colors [19:19:18] b) prefix lines with the time [19:19:25] ooh [19:19:42] apparently they are enabled doh [19:20:17] ho you even figured out the Zuul parameters \O/ [19:20:24] * hashar looks for a CI barnstar [19:21:09] :D [19:21:30] adding clean package to goals and rebuilding [19:22:01] hashar: I defined it as a maven job without goals - may be not having goals was not running the tests [19:22:27] Analytics-Kanban, Release-Engineering-Team: [Spike] Figure out how to automate releases with jenkins {hawk} - https://phabricator.wikimedia.org/T130576#2178752 (hashar) From a quick conversation with Nuria / Madhumitha , I have poked the Release Engineering internal mailing list to raise attention to this.... [19:22:42] but - it was attempting to do it based on the job already defined for the repo I thought. [19:22:47] not sure [19:22:51] the build states it is a success [19:22:54] from https://integration.wikimedia.org/ci/job/analytics-release-test/7/console [19:23:02] yeah [19:23:12] which i thought would make it attempt to release [19:23:18] because i defined that in the config [19:23:28] but it doesn't [19:24:04] if it did, it would fail - but it can successfully run release:prepare at least. release:perform would need archiva creds [19:24:31] ah build #8 has all modules passing [19:24:36] yes! [19:24:44] adding the goals seems to have helped [19:25:00] looks like "clean package" was the magic thing :D [19:25:23] ha ha [19:25:27] but no attempts to release [19:25:37] for the M2 release plugin, honestly I have zero clue :( [19:25:41] yeah [19:25:46] that's the hard part [19:25:46] one sure thing: don't put passwords / tokens etc in the job [19:25:53] i haven't [19:26:13] the jobs configs are readable by anyone in the wmf / wmde / nda LDAP groups [19:26:35] https://wiki.jenkins-ci.org/display/JENKINS/M2+Release+Plugin the docs are laughable [19:26:46] right [19:26:59] yeah plugin docs are not always very accurate nor kept up to date [19:27:12] the good news, is that it is a wiki ;-} [19:27:28] so in theory fixable / amendable. Still have to figure out how the plugin works though [19:27:33] yeah [19:28:22] I sometimes have to dig into the source code .. Then I don't know java at all so that is really the last attempt for me [19:29:01] hashar: aah - okay let me look - if it tried to do something - it would show up in logs right?
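In local terms, what the Jenkins job converged on: explicit goals make Maven actually compile and run the tests, and the M2 Release plugin then drives the standard maven-release-plugin on top. A sketch of the equivalent commands:

```
# What the job runs once goals are set -- compile plus tests for all modules:
mvn clean package
# What "Perform maven release" drives underneath; with -DdryRun the version
# bumps are computed but nothing is committed, tagged or pushed:
mvn release:prepare -DdryRun=true
# The real release would follow with (needs archiva credentials):
mvn release:prepare release:perform
```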
[19:29:10] well [19:29:13] who knows [19:29:15] maybe :-) [19:29:59] you know how devs love to do stuff like { catch( Exception e) { /* happens randomly, annoying */ } [19:30:07] or 2> /dev/null [19:30:38] yes yes [19:30:48] i realized that as soon as i asked the question [19:31:23] I abuse those tricks myself, so it is usually one of the first things I look for :D [19:32:13] I see their code has good logging [19:32:22] ahh [19:32:46] so something needs to happen to trigger a release [19:32:53] i'm not setting it off right [19:33:10] have you triggered build #8 with the "Perform maven release" https://integration.wikimedia.org/ci/job/analytics-release-test/m2release [19:33:15] seems to use a different goal [19:33:24] (PS5) Nuria: Evaluating Pageview tagging only for apps requests [analytics/refinery/source] - https://gerrit.wikimedia.org/r/279447 (https://phabricator.wikimedia.org/T128612) [19:33:43] based on https://wiki.jenkins-ci.org/display/JENKINS/M2+Release+Plugin "Project configuration" / first picture [19:33:43] hashar: woah didn't know this page existed [19:33:53] aaah [19:33:56] interestinggg [19:34:04] okay that may work [19:34:09] guess you can copy paste ZUUL parameters :D [19:34:10] trick [19:34:11] let me type in all the zuul things [19:34:13] yes [19:34:17] oh there's a trick? [19:34:27] you can have the ZUUL_ parameters to point to Gerrit [19:34:29] ie [19:34:34] ZUUL_PROJECT=analytics/whatever [19:34:46] ZUUL_URL=https://gerrit.wikimedia.org/r/p [19:34:50] ZUUL_COMMIT=master [19:34:57] ZUUL_REF=master [19:35:00] ZUUL_BRANCH=master [19:35:18] those parameters are usually forged by Zuul [19:35:24] oh cool [19:35:36] but in the end they are just passed straight to the Jenkins git plugin [19:35:56] right [19:35:57] which is instructed to clone "$ZUUL_URL/$ZUUL_PROJECT" [19:35:58] etc [19:36:12] (CR) Joal: "Hopefully the last one !" (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/279447 (https://phabricator.wikimedia.org/T128612) (owner: Nuria) [19:36:33] right [19:36:35] also in Zuul there is the "experimental" pipeline, it triggers jobs whenever someone comments in Gerrit "check experimental" [19:36:48] ha [19:36:51] so in theory we could add your experimental job analytics-release-test in there :D [19:36:58] and you can just "recheck" to build it [19:37:23] but for a release, I guess it should be done when a tag is cut or maybe when a change is merged in the branch [19:37:32] yeah [19:37:45] i was hoping i could configure it in a way that [19:37:50] if you push to a release branch [19:37:56] it would trigger a release [19:37:59] totally doable [19:38:30] what we would do is get Zuul to trigger analytics-release-test whenever a change is merged [19:38:57] and have the job to only trigger on release branches (using a regex like ^v\d+\.\d\.*) [19:39:05] or ^REL.* for mediawiki ;-D [19:39:15] right [19:39:20] ya i saw those [19:39:21] https://integration.wikimedia.org/ci/plugin/m2release/img/releasebadge.png !!!
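A shell approximation of what the Jenkins git plugin does with those ZUUL_ parameters when they are pointed straight at Gerrit, per hashar's trick. The project name below is an example; the plugin's real behaviour also involves fetching ZUUL_REF and checking out ZUUL_COMMIT.

```
# Forged ZUUL_ parameters pointing directly at Gerrit, as listed above:
ZUUL_URL=https://gerrit.wikimedia.org/r/p
ZUUL_PROJECT=analytics/refinery/source   # example project name
ZUUL_BRANCH=master
# Roughly what the git plugin is instructed to do with them:
git clone "$ZUUL_URL/$ZUUL_PROJECT" workspace
cd workspace && git checkout "$ZUUL_BRANCH"
```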
[19:39:35] so all that glue is not rocket science [19:39:52] but is a bit intimidating cause it is all hidden in Jenkins Job Builder / Zuul yaml configuration files [19:40:02] yeah [19:40:06] i read all those [19:40:07] which are really Domain Specific Languages to respectively describe a job in Jenkins and a workflow [19:40:15] tried to do some branch specific config [19:40:19] so there is a bit of a learning curve at start [19:40:23] which i didn't succeed with much [19:40:39] then a few attempts / a bunch of reviews and reading the upstream doc should get you on par with the CI gurus :D [19:41:14] :D the perform maven release stuff needs to be translated to the jjb config though [19:41:17] anyway for POC / experiment, the Jenkins web interface is easier [19:41:21] (PS6) Nuria: Evaluating Pageview tagging only for apps requests [analytics/refinery/source] - https://gerrit.wikimedia.org/r/279447 (https://phabricator.wikimedia.org/T128612) [19:41:22] yup [19:41:22] we use it all the time myself included [19:41:58] at some point when i was pushing my config from the cli it stopped showing up in the job list - that's when i gave up and switched to the UI [19:42:05] Failed to execute goal org.apache.maven.plugins:maven-release-plugin:2.5.1:prepare (default-cli) on project refinery: An error is occurred in the checkin process: Exception while executing SCM command. Detecting the current branch failed: fatal: ref HEAD is not a symbolic ref -> [Help 1] [19:42:07] booooooo [19:42:15] that's fineee [19:42:18] it tried [19:42:20] i'm happy :D [19:42:26] now there's a path [19:42:27] \O/ [19:42:56] too late to remember why HEAD is not around from time to time :( [19:43:28] will try giving it a commit number [19:43:34] uhhh [19:43:37] SHA [19:43:37] at least at the top the Jenkins git plugin claims to have 00:00:00.111 Checking out Revision 6687e01eec6f38ffee2197a9e9953f7f1ad51de5 (origin/release) [19:44:14] yeah [19:44:52] not sure that is the sha1 that was meant to be tried [19:46:17] hashar: that's the latest commit on the branch [19:46:33] so that is correct when i just gave it the branch i think [19:49:15] success! [19:49:16] /mnt/jenkins-workspace/workspace/analytics-release-test [19:49:19] 00:03:29.901 [M2Release] its only a dryRun, no need to mark it for keep [19:49:26] yes [19:49:32] i marked dry run true [19:49:44] cool now to try without dry run [19:50:19] it has only built the first module though [19:50:24] and skipped Core/Tools/Hive etc [19:50:47] oh [19:50:57] the preparation simulation...
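The merge-triggered workflow hashar sketches would gate the release job on the branch name. Rendered as a shell check over the same patterns (the real job would express this as a Jenkins branch specifier rather than a script; this is just the logic):

```
# Only run the maven release when the merged branch looks like a release
# branch, using the patterns suggested above:
if [[ "$ZUUL_BRANCH" =~ ^v[0-9]+\.[0-9] ]]; then
    echo "release branch $ZUUL_BRANCH -> run release:prepare release:perform"
elif [[ "$ZUUL_BRANCH" =~ ^REL ]]; then
    echo "MediaWiki-style release branch"
fi
```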
sorry [19:53:54] hmmm it failed with the symbolic head thingy [19:55:58] http://stackoverflow.com/questions/20351051/git-fatal-ref-head-is-not-a-symbolic-ref-while-using-maven-release-plugin asks to not use the shallow clone option [19:57:57] ( [19:58:39] trying that although i don't know if it's a good idea [19:58:55] it looks like it's associated with not using the master branch [19:59:26] maybe because the repo is not fully cloned [19:59:37] the job is set to use a shallow clone (should clone only one commit) [19:59:43] yeah [19:59:48] i unchecked it to test [20:00:36] also sometimes the console output is mangled with long lines of random characters [20:00:39] hmmm - same [20:00:57] there is a link on the side bar to show the raw text version [20:00:58] https://integration.wikimedia.org/ci/job/analytics-release-test/11/consoleText [20:01:26] and there you can see the actual git commands passed [20:01:28] ie [20:01:32] it does a commit [20:01:42] and then git symbolic-ref HEAD [20:03:12] ah yes [20:03:29] madhuvishy: let me nuke the workspace entirely [20:03:38] done [20:03:42] then retrigger and see what happens [20:03:58] the files are kept between runs [20:04:16] cool, trying [20:04:19] I am not sure how the Jenkins git plugin can migrate from "shallow clone" to full clone. That is probably not handled [20:04:35] ah - it's there in the config [20:04:40] and i unchecked it [20:04:52] at least in the UI [20:05:00] may be it does nothing [20:07:09] try again ? [20:07:19] yup just launched a job [20:08:18] then if all fails Google has a bunch of related messages https://www.google.fr/#safe=active&q=jenkins+git+plugin+symbolic-ref+head :D [20:08:47] https://issues.jenkins-ci.org/browse/JENKINS-5856 "Unable to Release a Git Project" [20:08:50] (resolved) [20:09:02] sometimes the issues reported have good workarounds / ways to fix etc [20:09:37] aah looking [20:09:57] so apparently it is in detached head [20:10:07] same i guess - i'd try with master - i don't want it to commit the release message there though [20:10:17] there are a few hints in the comments [20:10:19] oh [20:10:22] reading [20:10:26] the maven plugin apparently crafts the commit for you [20:10:33] yeah [20:10:42] but i don't want it in master [20:10:48] [maven-release-plugin] prepare release v0.0.28 [20:10:48] - 0.0.28-SNAPSHOT [20:10:48] + 0.0.28 [20:10:49] etc [20:11:34] ah [20:11:52] git plugin does checkout commits by RefID, not branches. You can force it to use a local branch (advanced section) so that maven release process will be able to commit, tag and push to remote repository [20:12:29] yeah [20:12:46] in the job git section, you can add extra behavior via an [Add] button [20:12:56] right [20:12:59] doing now [20:12:59] one being that "check out to specific local branch" [20:13:07] maybe ZUUL_BRANCH would do [20:13:16] err [20:13:18] $ZUUL_BRANCH [20:13:48] fun [20:13:57] I have learned something tonight \O/ [20:14:10] :D [20:16:51] so [20:17:01] Jenkins is really all about trial and error :( [20:17:05] yeahh [20:17:14] once you get a job working fine [20:17:26] you can think about the workflow you want for the repository [20:17:30] yup [20:17:39] i.e. what to do when a patch is proposed to master , on release branch [20:17:48] what happens when change is merged , a tag pushed etc [20:18:07] where would the code to make release plugin work with jjb go?
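The failure mode being debugged here is reproducible outside Jenkins: checking out a commit by sha1 leaves git on a detached HEAD, and the maven-release-plugin's SCM step runs `git symbolic-ref HEAD`, which only works on a real branch. A minimal demonstration, using the sha1 from the console output above:

```
# Checking out by sha1, as the Jenkins git plugin does, detaches HEAD:
git checkout 6687e01eec6f38ffee2197a9e9953f7f1ad51de5
git symbolic-ref HEAD     # fatal: ref HEAD is not a symbolic ref
# The "check out to specific local branch" fix, done by hand:
git checkout -B release   # symbolic-ref now prints refs/heads/release
git symbolic-ref HEAD
```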
[20:18:15] bah can't find the symbolic ref again grr [20:19:15] yeah i only tried changing the branch specifier [20:19:30] added the advanced setting now - Checkout/merge to local branch (optional) [20:19:35] $ git symbolic-ref HEAD [20:19:36] refs/heads/release [20:19:36] $ [20:19:38] better :-} [20:20:09] for JJB I guess it is time to poke #wikimedia-releng :-} [20:20:37] in a nutshell the upstream doc is at http://docs.openstack.org/infra/jenkins-job-builder/ [20:20:59] we run a stale copy / fork which is on our Gerrit in integration/jenkins-job-builder.git which we sync with upstream from time to time [20:21:17] yeah i'll ask there once this works :) we add some xml to the .m2/settings.xml with username/password info for archiva [20:21:27] do you know where such a thing would go [20:21:35] i haven't really looked yet so i can [20:22:12] so if a plugin is missing, we send a patch proposal to JJB upstream authors ( openstack ) [20:22:17] hashar: yay it went past symbolic ref [20:22:26] \O/ [20:22:28] Host key verification failed. :( [20:22:29] no rights to push [20:22:38] guess it tries to push something [20:22:46] Unable to commit files [20:22:47] yes [20:22:56] it does git commit and git push [20:23:03] the jenkins user should have rights [20:23:34] is there a way to do this except manually configure the gerrit repo? [20:24:42] Jenkins has a credential storage [20:25:03] so you can put api tokens in there / ssh keys etc [20:25:21] then for example have the job to launch a ssh-agent and load the ssh key from the credential store [20:25:34] that would give the job the ability to push over git+ssh [20:25:40] hmmm [20:25:46] (assuming the push-url is properly set) [20:26:09] ya i think so [20:26:10] or for Gerrit you can push over https given a token is generated for the user in the Gerrit settings [20:27:57] the user here is jenkins-bot right? [20:29:56] there is no CI job pushing back to Gerrit :D [20:30:14] the releases are all done manually [20:31:10] https://gerrit.wikimedia.org/r/#/admin/groups/833,members has the people that can push [20:31:15] the jenkins-bot user in Gerrit is for Zuul to be able to submit changes in Gerrit, vote CR/Verified on changes [20:31:17] ahh of course [20:32:21] and that is where we reach the lands of deployment :-D on which other releng folks can assist ;-} [20:32:31] I don't have a good idea [20:32:31] coool [20:32:52] thanks for all the help - this was a lot of progress :D [20:33:04] potentially have a release user with a ssh key [20:33:08] yeah [20:33:12] we would add the ssh key to Jenkins [20:33:41] get the ssh key injected in the job ssh-agent when the job is triggered on a change merged on a release branch [20:33:56] figure out the push url to use with those credentials [20:34:16] and in theory, the maven release plugin should be able to push and maybe even tag ;D [20:35:06] all that are really unexplored areas. That is the first use case I hear of so far [20:35:23] beside releasing mediawiki tarball a couple years ago, but we haven't gone far on that front [20:35:56] right [20:36:00] madhuvishy: I am going to rest / sleep etc. May you write a quick summary on the task ? [20:36:01] it would be really cool [20:36:03] yes [20:36:05] i will do that [20:36:07] a couple lines about the job passing [20:36:29] thanks again! good night :) [20:36:29] and stuff to do like figuring out the actual workflow and way to inject credentials [20:36:31] ;_D [20:36:32] yes [20:36:38] at least the build is green !
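Two hedged sketches of the credential plumbing discussed above. The settings.xml server id and the ssh user/key names are assumptions (the server id must match the repository id in the pom's distributionManagement); port 29418 is Gerrit's conventional ssh port.

```
# ~/.m2/settings.xml entry holding the archiva credentials, kept out of
# the job config as hashar insists (ids/values here are placeholders):
cat > ~/.m2/settings.xml <<'EOF'
<settings>
  <servers>
    <server>
      <id>archiva.releases</id>
      <username>release-user</username>
      <password>stored-in-jenkins-credentials</password>
    </server>
  </servers>
</settings>
EOF
# Pushing the release commit and tag back over git+ssh with a dedicated key:
eval "$(ssh-agent)" && ssh-add ~/.ssh/release_key
git push ssh://release-user@gerrit.wikimedia.org:29418/analytics/refinery \
    HEAD:refs/heads/release --tags
```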
[20:36:42] :D [20:36:43] yes [20:36:56] and do poke #wikimedia-releng about it -;)) [20:37:00] i will :) [20:37:14] JJB part, if you know python it is usually not too hard ;-) [20:37:20] that i do [20:37:30] I can probably rush a rough patch in a couple of hours [20:37:57] i would love to try, and i'll bother you when things don't work :) [20:38:09] a recent example of adding support for a plugin https://review.openstack.org/#/c/286689/ [20:38:22] and follow up with tests https://review.openstack.org/#/c/286690/ [20:38:31] nice! thanks [20:38:44] they are using gerrit so it should be straightforward [20:38:47] there is a CLA [20:38:51] awesome [20:39:04] if you are a WMF employee the Wikimedia Foundation has signed the corporate CLA [20:39:17] you might have to sign an individual one as well. Legal team is your friend ;-} [20:39:44] then it is all about cloning / running tests, looking at patch history to see how plugin support is added [20:39:51] feel free to add me as a reviewer there ( hashar ) [20:40:23] once we have something passing tests upstream, we can cherry pick on our fork and call it done (more or less) [20:40:40] kudos again. I am talking too much and should sleep ;) [20:41:27] nuria: well madhuvishy is a Jenkins pro really. Beside mumbling I have not done much ;-] Half solved imho, refinery release build is green at least! [20:44:59] hashar: i will do that! [20:47:01] thank you :) [22:10:18] Analytics: Making geowiki data public - https://phabricator.wikimedia.org/T131280#2179296 (Milimetric)