From: Lars Wirzenius
To: ick-discuss@ick.liw.fi
Date: Sat, 16 Feb 2019 16:40:09 +0200
Subject: Re: Plan for using Muck for the Ick controller
(Sorry about taking so long to answer: I was travelling for work.)

On Sat, Feb 09, 2019 at 05:07:55AM +0000, Steve Kemp wrote:
> Assume you're building a kernel, which might spit out 400,000 lines
> of build-log. Assume each new update, due to buffering, or sizing,
> is 10 lines.
>
> To get the whole build-log you're going to need to:
>
> * Get the first piece.
> * Get the next child.
> * Get the next child.
> * Get the next child, 39,997 times more.
>
> That seems like it will scale terribly. The only potential win I can
> see here is imagining that you might want to view the output via a
> browser and you'll probably only care about the LAST 100 lines, not the
> FIRST. (i.e. The bit which will typically contain "built blah", or
> "build failed".)

You're right, this scales badly to long build logs, because of the large number of HTTP requests required. (The size of the logs is so much less of an issue that I'm going to ignore it for now.)

For now, I think I'm OK with that, if it's fast enough for not-very-long build logs. On my personal Ick instance, the longest log files are about 14000 lines.

I added a little benchmark script, benchmark-log, to the muck-poc repository to test this. It generates N snippets, stores them in Muck, searches for the snippets, retrieves each one, and reconstructs the whole log by catenating the snippets in order (i.e., as outlined in my previous mail). It also reports how long each stage takes. Note that muck-poc and the benchmark are both running sequentially.
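The following is a minimal sketch of the three stages such a benchmark measures; it is not the actual benchmark-log script. `FakeMuck` and its `create`, `search`, and `get` methods are hypothetical in-memory stand-ins for Muck's HTTP API, and every method call is counted as one request. What it shows is that reassembly costs one search plus one GET per snippet, so its time grows linearly with N.

```python
import time


class FakeMuck:
    """Hypothetical in-memory stand-in for Muck (not the muck-poc API).

    Each method call counts as one HTTP request; `latency` seconds are
    slept per call to mimic a network round trip.
    """

    def __init__(self, latency=0.0):
        self.latency = latency
        self.objects = {}
        self.next_id = 0
        self.requests = 0

    def _request(self):
        self.requests += 1
        if self.latency:
            time.sleep(self.latency)

    def create(self, obj):
        self._request()
        obj_id = str(self.next_id)
        self.next_id += 1
        self.objects[obj_id] = dict(obj)
        return obj_id

    def search(self, **conditions):
        """Return ids of objects whose fields equal all given conditions."""
        self._request()
        return [oid for oid, obj in self.objects.items()
                if all(obj.get(k) == v for k, v in conditions.items())]

    def get(self, obj_id):
        self._request()
        return dict(self.objects[obj_id])


def benchmark_log(store, n, build_id="build-1"):
    """Create n one-line snippets, list them, then fetch and reassemble."""
    timings = {}

    start = time.monotonic()
    for seq in range(n):
        store.create({"kind": "snippet", "build_id": build_id,
                      "seq": seq, "text": f"line {seq}\n"})
    timings["creation time"] = time.monotonic() - start

    start = time.monotonic()
    ids = store.search(kind="snippet", build_id=build_id)
    timings["list snippets"] = time.monotonic() - start

    start = time.monotonic()
    snippets = [store.get(oid) for oid in ids]  # one request per snippet
    snippets.sort(key=lambda s: s["seq"])       # restore log order
    log = "".join(s["text"] for s in snippets)
    timings["assemble log"] = time.monotonic() - start

    return log, timings


if __name__ == "__main__":
    store = FakeMuck(latency=0.001)
    log, timings = benchmark_log(store, 500)
    for name, seconds in timings.items():
        print(f"{seconds:.2f} {name}")
    print(f"{store.requests} requests")
```

With a nonzero `latency`, the "assemble log" time is dominated by the N sequential GETs, which is the scaling problem Steve describes.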
Running with N=14000, on my laptop, the script outputs:

    creating snippets
    getting list of snippets
    reconstructing full log
    OK
    68 creation time
    0 list snippets
    67 assemble log

This is not excellent. Is it tolerable? Not very, for interactive use.

I'm not much worried, for now, about the time it takes to create the log file snippets: that's going to be drowned out by the actual build. The time to get the list of snippets is fast enough, I think, even in the current prototype, which is written in Python without indexes.

The full log assembly time is bad. As you say, it's because there are a lot of HTTP requests. It might be tolerable for a short while, while we improve things, just to get the Ick controller to use Muck instead of having local state, but let's discuss how we can improve it.

I don't like the idea of keeping the log file locally, either on the worker-manager or on the controller, as it means that one can't do the equivalent of "tail -f" while the build is running. Updating the controller's view of the log while the build is running, without lagging much behind the actual output, is something I'd really like to keep.

I also don't want to rely on websockets or similar, for now. The Muck architecture doesn't currently suit that, and I don't want to change that, at least for now.

The controller could coalesce small snippets while the build is running. It could notice that there are a lot of small snippets, combine them into a bigger snippet, and delete the small ones. This would radically reduce the number of snippets. If, say, the controller combined every 1000 small snippets into one larger one, that would reduce the number of snippets so much that reconstructing a complete log would be tolerably fast: effectively zero seconds, at the moment, on my laptop. It'd mean some duplication of log data in the Muck changelog, but I think that'd also be acceptable. There's a fair bit of back and forth of log data, of course.
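A back-of-the-envelope request count shows why coalescing helps. As a simplification, it assumes that fetching a whole log takes one search request plus one GET per stored snippet object, and that every full group of `chunk_size` short snippets has already been merged; `requests_to_assemble` is an illustrative helper, not part of muck-poc.

```python
def requests_to_assemble(num_snippets, chunk_size=None):
    """Count requests to fetch a whole log: one search plus one GET per object.

    chunk_size=None means no coalescing. Otherwise every full group of
    chunk_size short snippets is assumed to be one combined snippet, and
    any remainder is still stored as individual short snippets.
    """
    if chunk_size is None:
        objects = num_snippets
    else:
        combined, leftover = divmod(num_snippets, chunk_size)
        objects = combined + leftover
    return 1 + objects


for n, chunk in [(14000, None), (14000, 1000), (40000, 1000), (14500, 1000)]:
    print(n, chunk, requests_to_assemble(n, chunk))
```

For N=14000 that is 14001 requests without coalescing versus 15 with groups of 1000, and 41 for Steve's kernel example of 40,000 ten-line updates. If the 67 in the "assemble log" line is seconds and is dominated by per-request overhead (roughly 5 ms per request), 15 requests would take about 0.07 seconds. The 14500 case (515 requests) shows why leftover short snippets should also be folded in at the end of the build.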
Thus, the controller would do this:

* incoming short snippets would arrive from the worker-manager as outlined in my previous email
* the controller would create a short snippet object in Muck for each incoming snippet
* new step: if there are more than M short snippet objects for a build, the controller would construct a combined snippet object, and delete the short snippet objects
* new step: at the end of the build, the controller would combine all remaining short snippet objects into a combined snippet object
* new step: when returning the whole log, the controller would first catenate all the combined snippets, and then the short snippets

Does this seem reasonable? I think it'd have tolerably good performance.

Another approach, some day in the future, might be to have a "log server" do this instead of the controller, but that is a whole new component, and I don't want to go there yet.
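The steps above can be sketched in a single process as follows. This is a minimal sketch under assumptions: a plain dict stands in for Muck, `SnippetLog`, `add_snippet`, `finish`, and `full_log` are hypothetical names, and `max_short` plays the role of M. Each object records an arrival sequence number, so catenating the combined snippets first and then the short ones, each in sequence order, reproduces the original log.

```python
class SnippetLog:
    """Hypothetical per-build log handling; a dict stands in for Muck."""

    def __init__(self, max_short=1000):
        self.max_short = max_short  # M: threshold for coalescing
        self.objects = {}           # object id -> {"kind", "seq", "text"}
        self.next_id = 0
        self.next_seq = 0           # arrival order of created objects

    def _create(self, kind, text):
        obj_id = self.next_id
        self.next_id += 1
        self.objects[obj_id] = {"kind": kind, "seq": self.next_seq, "text": text}
        self.next_seq += 1
        return obj_id

    def _of_kind(self, kind):
        """All objects of one kind, sorted by arrival order."""
        found = [(oid, o) for oid, o in self.objects.items() if o["kind"] == kind]
        return sorted(found, key=lambda pair: pair[1]["seq"])

    def add_snippet(self, text):
        """Store an incoming short snippet; coalesce once there are more than M."""
        self._create("short", text)
        if len(self._of_kind("short")) > self.max_short:
            self._combine_short()

    def _combine_short(self):
        """Create one combined object, then delete the short ones it replaces."""
        short = self._of_kind("short")
        if not short:
            return
        self._create("combined", "".join(o["text"] for _, o in short))
        for oid, _ in short:
            del self.objects[oid]

    def finish(self):
        """At the end of the build, fold any remaining short snippets."""
        self._combine_short()

    def full_log(self):
        """Combined snippets first, then short ones, each in arrival order."""
        combined = self._of_kind("combined")
        short = self._of_kind("short")
        return "".join(o["text"] for _, o in combined + short)
```

Because every combine step absorbs all current short snippets, any short snippet still present arrived after the newest combined one, which is what makes "combined first, then short" the correct order. The sketch ignores concurrency: with Muck, a reader listing objects between creating a combined object and deleting its short snippets could briefly see some lines twice.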
_______________________________________________
ick-discuss mailing list
ick-discuss@ick.liw.fi
https://listmaster.pepperfish.net/cgi-bin/mailman/listinfo/ick-discuss-ick.liw.fi