IRC Logs for #crux-devel Thursday, 2010-04-15

*** mike_k has joined #crux-devel01:32
*** mike_k has quit IRC01:37
*** mike_k has joined #crux-devel01:45
*** mike_k_ has joined #crux-devel03:55
*** mike_k has quit IRC03:58
*** jue_ has joined #crux-devel06:52
*** deus_ex has joined #crux-devel08:29
*** deus_ex is now known as pedja10:57
jue_tilman: do you know why Charly's login at crux.nu might have been disabled11:45
tilmanhuh!?11:45
jue_:)11:46
*** jue_ is now known as jue11:50
*** _mavrick61 has joined #crux-devel13:26
_mavrick61Hi... Now I finally found how to join this channel.13:27
_mavrick61So the CRUX server has some problem.13:27
tilmanhi!13:28
tilmanyeah13:28
_mavrick61First, my password has not worked for 2 years.13:28
_mavrick61Login/password.13:28
tilmanlocal login or ssh login?13:28
_mavrick61So what is the suggestion?13:29
tilmancan you trigger a reboot remotely?13:30
_mavrick61Shut down, start in rescue state, mount the disk/partitions and delete things?13:30
tilmanjue's suspicion was that /tmp was filled up13:31
_mavrick61No, I must go down to the server. The data center is 125 meters from me so that is not a big deal.13:31
tilmanbut yeah, having a look via rescue system would be good13:31
tilmanah, phew13:31
_mavrick61I can boot it up using PXE boot, then it will boot up debian. Then I should be able to mount the soft raid partitions13:32
_mavrick61RAID13:32
tilmanok13:33
_mavrick61It is 3 disks, and I created a RAID level 5 when I set it up.13:33
_mavrick61I can see what I can do.13:33
tilmanthanks!13:34
_mavrick61Be back in a while with some status info.13:34
teK_mavrick rocks ;-)13:38
tilmanteK_: one of these days we should talk about system maintenance re. crux.nu again13:40
teK_yes13:40
teK_I'd still like to see a mirror host (which I could provide in short)13:41
tilmana fallback system?13:41
tilmanor something to try stuff on?13:41
tilman(before deploying $stuff on the real machine)13:41
teK_both, kinda13:41
teK_since no on-the-fly upgrade from 2.x -> 2.x+1 is supported this is sort of 'critical' to difficult to do without remote hands13:42
teK_the host would be a kvm-virtualised instance so we'd have all options13:42
tilmanafk, will check later14:02
juere14:08
juehmm, we should think about splitting out some services to other places14:15
juee.g. moving/mirroring our repos to gitorious/github14:15
teK_If everything works as I like it I can provide a mirror.14:16
juewell, I'm happy with everything that works and until now we have no reason to complain about our current solution14:20
teK_my offer is totally optional14:20
jueit just works :)14:20
jueyeah, would be great to have some kind of backup/mirror14:21
juejust in case :)14:21
teK_rsync is cheap14:21
teK_so this should not be too much of a big deal14:21
teK_I even transferred a SLES 10 to a different server this way. Linux rocks!!1114:22
_mavrick61I may use a simple word, "SHIT", then there is a problem.14:34
_mavrick61It seems there is a disk error. The Crux server is now on my service workbench.14:35
teK_oh.14:35
_mavrick61I don't know the type of problem yet. But during boot I got kernel panic error can't read .........14:36
teK_fsck (either way)14:36
_mavrick61Tried to start mdadm. Not enough devices14:39
_mavrick61That means more than 1 disk has crashed. No one has checked the RAID status for a while?14:40
teK_I didn't.14:40
_mavrick61I did that from time to time when I had access. But after my login and pass stopped working I have not done it for more the a year14:41
_mavrick61then*14:42
_mavrick61Anyone who has great experience with mdadm...14:42
_mavrick61There are some force options, but I don't remember them. All 3 hard disks are showing up while booting.14:44
teK_which command exactly fails?14:46
teK_mdadm -A?14:46
_mavrick61I'll check some more, but we need plan B.14:46
_mavrick61mdadm-startall in debian. We have PXE boot function in our network.14:46
_mavrick61We use that to be able to boot a Linux system for "rescue" work14:47
teK_I guess jue or tilman will have to investigate. I don't have *any* real-world experience with mdadm14:47
_mavrick61tilman EU and jaeger US, anyone online now? My time is very limited. I have an important meeting tomorrow.14:49
_mavrick61I could let one of the old core members get ssh access through our firewall. But that info will be sent in a more secure way.14:50
teK_tilman is afk14:51
tilmani'm back14:55
tilmani'm clueless about mdadm, too14:57
_mavrick61I have now checked the disk14:58
_mavrick61cfdisk /dev/sda-c: all the partitions are still there. RAID autostart14:58
_mavrick61But the partitions can be corrupted. Any tols for checking that.14:59
_mavrick61tools14:59
_mavrick61So what to do.15:01
tilmanask jaeger for help =)15:01
tilmancan you give more detailed error messages?15:01
_mavrick61Any backup done recent?15:02
juemy knowledge about mdadm is very limited too15:02
tilmani think there are backups, but i don't know where15:02
tilmanteK_: did you create a copy of the crux.nu system last year?15:03
teK_nope (was uncertain due to bandwidth issues for the provider) ...15:03
jue_mavrick61: do you have a running system, are you able to run some mdadm commands?15:03
teK_and yes we have to create a 'concept' urgently..15:03
jue_mavrick61: if your partition types are of type raid autodetect, you should be able to assemble the raid with 'mdadm --assemble --scan'15:06
jueif not something like mdadm --assemble /dev/md0 /dev/sdaX /dev/sdaY15:07
tilman_mavrick61: if you're willing to give me access to the rescue system, you can send me the credentials via gpg15:09
tilmankey ID is B4517D8815:09
tilmanavailable at the key servers and at http://files.code-monkey.de/tsauerbeck-public-key.asc15:10
_mavrick61I have now activated the sshd in to the rescue mode15:12
_mavrick61gpg ?15:12
_mavrick61IRC is just a black clud for me15:13
_mavrick61cloud15:13
tilmani meant gnu privacy guard15:13
tilmanencrypted email15:13
tilman(because you mentioned a "more secure way")15:14
_mavrick61mount /dev/md1 /mnt15:14
_mavrick61mount: you must specify the filesystem type15:14
_mavrick61SMS?15:14
_mavrick61Send your phone number on eMail, I think you got it.15:15
tilmannot sure i understood correctly15:18
tilmani just gave you my number here though15:18
tilmanfrinnst: awake? :p15:19
tilman_mavrick61: did you run the mdadm --assemble yet?15:26
tilman_mavrick61: what is /dev/md0 in the rescue system?15:30
tilmanmaybe it's the rescue system itself15:30
*** mike_k_ has quit IRC15:36
tilmanso i sent a cry for help to jaeger15:58
tilmanand then decided i was brave15:59
tilmanand did the forceful assembly (mdadm -Asf)15:59
tilmanwhich seems to have worked15:59
jueyou've ssh access to our box?16:00
tilmanyes, i have ssh access to a rescue system running on 'crux.nu'16:00
juea 'mdadm --detail /dev/mdx' works?16:01
tilmani get output16:02
tilmanbut i don't know what to look for16:02
tilmani think md1-md3 are in degraded state, because one disc is broken (sdc)16:03
tilmanbut i could mount md1-md316:03
tilmanmmh16:06
jaegertilman: can you do something like mdadm -As /dev/md1 ?16:06
jaegernm, I see you did that16:06
tilmanyay!16:06
jaegerjust read back in her e:)16:06
jaegerer, here16:06
jaegerdegraded is a step up from inactive, at least16:07
tilmanjaeger: -As didn't work ("too few devices"), but -Asf did work16:07
jaegerthat was gonna be my second suggestion16:07
jaegerfor that exact reason :)16:07
jaegerI had the same problem on morpheus.net once16:07
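The fallback described here (plain `-As` refused with "too few devices", `-Asf` succeeded) can be sketched in Python. This is only a minimal sketch, not what was actually run on crux.nu: the function name `assemble_arrays` and the injectable `run` parameter are hypothetical, and it assumes mdadm reports a failed assembly through a non-zero exit status.

```python
import subprocess


def assemble_arrays(run=subprocess.run):
    """Try a normal scan-assemble first; retry with --force only if that fails.

    `run` is injectable so the logic can be exercised without touching real
    disks (hypothetical helper, not part of mdadm).
    """
    normal = ["mdadm", "--assemble", "--scan"]             # same as: mdadm -As
    forced = ["mdadm", "--assemble", "--scan", "--force"]  # same as: mdadm -Asf

    result = run(normal, capture_output=True, text=True)
    if result.returncode == 0:
        return "assembled", normal

    # --force lets mdadm start an array even though some members look out of
    # date, so on real hardware check the disks before falling back to it.
    result = run(forced, capture_output=True, text=True)
    if result.returncode == 0:
        return "assembled (forced)", forced
    return "failed", forced
```

Calling `assemble_arrays()` with no arguments would invoke the real mdadm binary and needs root; passing a fake `run` keeps it a dry run.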
jaegerwhat does the output from mdadm -D /dev/mdX look like for any of them?16:07
tilman"Raid Devices: 3    Total Devices : 2    Preferred Minor: 1"16:08
tilman"State : clean, degraded"16:08
tilmanDevice number 0 has State removed16:08
tilmanDevice number 1: "active sync /dev/sdbX"16:09
tilmanDevice number 2: "active sync /dev/sdaX"16:09
jaegerok, that looks like what I'd expect16:09
tilmanthat information is the same for md1/md2/md316:09
jaegergood, it's consistent16:09
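The fields tilman quotes from `mdadm -D` are enough to decide whether an array is degraded. A minimal sketch follows, assuming the output uses `Key : Value` lines as in the quoted fragments; the `SAMPLE` text is reconstructed from those fragments and is not the real output from the server, and `summarize_detail` is a hypothetical helper.

```python
def summarize_detail(text):
    """Pull the device counts and the state flags out of `mdadm -D` style text."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:  # skip lines that are not "Key : Value"
            fields[key.strip()] = value.strip()

    states = [s.strip() for s in fields["State"].split(",")]
    return {
        "raid_devices": int(fields["Raid Devices"]),
        "total_devices": int(fields["Total Devices"]),
        "degraded": "degraded" in states,
    }


# Reconstructed from the lines quoted in the channel (illustrative only).
SAMPLE = """\
   Raid Devices : 3
  Total Devices : 2
Preferred Minor : 1
          State : clean, degraded
"""

print(summarize_detail(SAMPLE))
```

Running the same check over md1, md2 and md3 is one way to confirm the "consistent" picture jaeger mentions.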
tilmanphew16:10
jaegerIt should be easy to fix, too, once the bad drive is replaced16:10
jaegerI'm guessing you already ran something like "mdadm /dev/mdX --manage -r /dev/sdxy" since it reports state removed16:11
jaegerbut generally the process looks like this:16:11
jaeger1. mdadm --manage <array> -r <failed device>16:12
jaeger1a. (repeat 1 for all necessary arrays)16:12
jaeger2. replace hardware16:12
jaeger3. fdisk new drive to match other drives16:12
jaeger4. mdadm --manage <array> -a <new clean device>16:12
jaeger4a. (repeat 4 for all necessary arrays)16:12
jaegerthen you can watch /proc/mdstat while it rebuilds or whatever you like16:13
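jaeger's steps can be written down as a command plan before anything is run. The sketch below only builds the command strings and executes nothing; the function name `replacement_plan` and the array-to-partition mapping are hypothetical (the log names sdc as the broken disc but does not give the full layout). Note that `mdadm --manage -r` only removes members that are already failed or spare, so a still-active member would need `--fail` first.

```python
def replacement_plan(arrays, failed_disk, new_disk):
    """Return the commands for jaeger's procedure as strings (dry run only).

    `arrays` maps each md device to the partition number it uses on every
    disk, e.g. {"/dev/md1": 5}. The mapping is an assumption for illustration.
    """
    plan = []
    # Steps 1 and 1a: remove the failed member from every array.
    for md, part in arrays.items():
        plan.append(f"mdadm --manage {md} -r {failed_disk}{part}")
    # Steps 2 and 3 are manual and stay as reminders in the plan.
    plan.append("# replace hardware")
    plan.append(f"# partition {new_disk} to match the other drives (e.g. fdisk)")
    # Steps 4 and 4a: add the new, clean member to every array.
    for md, part in arrays.items():
        plan.append(f"mdadm --manage {md} -a {new_disk}{part}")
    # Finally, watch the rebuild.
    plan.append("cat /proc/mdstat")
    return plan


# Hypothetical layout: three arrays, one partition each on the replaced disc.
for command in replacement_plan(
    {"/dev/md1": 5, "/dev/md2": 6, "/dev/md3": 7}, "/dev/sdc", "/dev/sdc"
):
    print(command)
```

Printing the plan first makes it easy to review every array and partition pairing before typing the real commands as root.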
tilmani suspect debian's mdadm-startall tool did the removing16:14
tilmantrying to remove sdc7 again says "hot remove failed for /dev/sdc7: no such device or address"16:14
tilmanwait16:15
tilmando i run --manage in assembled state?16:15
jaegerIt might have, not familiar with that particular tool16:20
jaegerIt should work in either state, I believe16:20
tilmanokay16:21
jaegerwould you privmsg me the output from mdadm -D on one of them now that they're assembled?16:22
tilmansure16:23
jaegerwould you mind also the output of /proc/mdstat now?16:24
tilmannot minding anything. so glad you help us out :)16:25
jaegerGlad to help. :) It looks good, just wanted to verify it after seeing the output in your email with the S flags16:26
jaegeralso, wanted to make sure I'd seen properly that there was a raid1 among the raid5s16:26
tilmanfinal question: do i run "mdadm --stop <array>" before shutting down the system?16:28
tilmanto make sure it's brought down cleanly16:28
jaegerYou shouldn't need to do it manually, no16:29
tilmanso the kernel will do it for me?16:29
jaegerIt should, yeah16:29
tilmanbut stopping it manually wouldn't hurt? =)16:30
jaegerIt would fail if there are open file handles and the fs can't be unmounted or whatever but it should happen automagically16:30
tilmanright16:30
jaegerNope, should be fine :)16:30
tilmani made sure it's not mounted anymore16:30
jaegerif you want to be extra safe it probably wouldn't hurt to sync and stop them16:30
jaegerif you already unmounted then sync won't be useful but you get the idea :)16:31
tilmanyeah16:31
tilmanso, for the record, i stopped all of the arrays (md0 - md3)16:32
tilmanwe need to ask _mavrick61 to replace sdc16:32
tilmanafterwards, i will run the mdadm commands to add the replacement disc to the arrays16:32
tilmanand we should be good \o/16:32
tilmanteK_ jue ^^^^^16:33
jueyeah, we will be :)16:35
juehopefully16:36
jaegertilman: sounds good :)16:36
jaegerjust make sure sdc is partitioned properly first16:36
juejaeger: many thanks for your help16:36
jaegerNo problem at all16:36
_mavrick61I think I have a similar SATA disk.16:37
tilmanjaeger: oh, right16:37
jaegerIf the disc is different hardware the partition sizes need to be at least as big as the original16:37
jaegereasy to duplicate by copying the numbers from fdisk -l output16:38
tilmanstarting sectors of the partitions need to match the existing discs, right?16:38
jaegerI don't think they do16:38
jaegerjust have to have enough room in each partition to fit the same amount of data16:38
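The rule jaeger gives (each replacement partition must be at least as big as the original, starting sectors need not match) can be checked mechanically. A minimal sketch, assuming the sizes have been copied by hand from `fdisk -l` output into dictionaries; the helper name `undersized`, the partition labels and the sector counts are all hypothetical.

```python
def undersized(original, replacement):
    """Return the partitions whose replacement is smaller than the original.

    Both arguments map a partition label to its size (any unit, as long as
    both use the same one). A partition missing on the new disk counts as
    size 0 and is therefore reported.
    """
    return [
        name
        for name, size in original.items()
        if replacement.get(name, 0) < size
    ]


# Hypothetical sizes in sectors, for illustration only.
old_disk = {"part5": 1_000_000, "part6": 4_000_000, "part7": 8_000_000}
new_disk = {"part5": 1_000_000, "part6": 4_200_000, "part7": 7_900_000}

print(undersized(old_disk, new_disk))
```

An empty result means every new partition has room for the data that will be resynced onto it; larger partitions are fine, the extra space simply stays unused by the array.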
tilmanoh, heh16:38
jaegerfor example, I replaced my 2x160GB drives with 2x320GB in morpheus.net16:39
jaegerso I failed 1 160GB, took it out16:39
jaegerpartitioned the new 320GB with basically the same swap and /boot and / and a much larger /home16:39
jaegerthen readded it and rebuilt the array16:39
tilmanokay, i think i get it16:40
jaegerit resynced the data into the bigger partitions but the filesystem on /home ended up smaller, consequently16:40
jaegernothing to do about that until both drives were upgraded, so I went ahead and did the same with the other 160GB16:40
jaegerafter they were both in and resynced, I resized the FS16:40
jaegerto the max partition size16:40
_mavrick61I have a 160 GB SATA. the original is 82 GB but that should work ok16:40
jaegerthen it was happy16:40
jaegeryes, that should work16:41
jaegerthere will just be about 78GB unused16:41
tilmanwe could turn those unused bytes into an extra partition, right?16:41
jaegersure, though not in the raid config16:41
tilmanwhich we could mount separately from the raid array?16:42
jaegeryeah :)16:42
tilmanright16:42
tilmani really need to get to bed16:43
jaegerbest would be to create that extra partition at the same time you do the raid partitions16:43
tilmanyeah16:43
tilmanmmh16:44
_mavrick61Jaeger, will you be around. tilman will go to sleep. I can change the disk now. But I have not time to do any other maintenance16:44
_mavrick61Local time is 11.45 pm, so I have to sleep in 2-3 hours, must be up before 9 am.16:45
jaegerI am leaving work now and have to run an errand, I expect to be home in about an hour. I will reconnect when I get home and say something here if you like16:46
tilmanotherwise i can try to do it in about 8 hours (ie 7:45 am local time)16:47
_mavrick61OK.. I can give same access as tilman. But let me know when you are home again. I replace the disk now.16:47
jaegerI will let you know. I'm leaving work now. :)16:47
tilman_mavrick61: i can send jaeger the login information by encrypted email if you want16:47
tilmanthat's probably safest. if you are okay with that?16:48
jaegertilman: not sure if I have any up-to-date pgp keys =/16:48
tilmanoh okay16:48
tilmannevermind then ;)16:48
jaegerI'll work on getting an updated one so we can use that in the future but for now I don't have any recently16:48
jaegermaybe encrypted IM or a password protect zip or something? anyway, I need to head out16:49
jaegerback soon16:49
tilman_mavrick61: can you please boot into the rescue system again after you replaced the disc?16:54
_mavrick61Hmmm /dev/md1 is now rechecking16:55
_mavrick61So it seems it has been fixed. I have added an extra 250 GB SATA so we can copy over the system there and then create a new RAID with new disks16:55
_mavrick61Need password for root to the crux server. It's up and running now.17:12
_mavrick61But I am on another network. So it is not accessible17:13
frinnstim awake, why?17:34
_mavrick61Asking me?17:38
frinnstno, tilman asked :)17:39
_mavrick61I think he is sleeping. He told me that 1 hour ago.17:39
frinnstah17:39
jaegertilman, _mavrick61: I'm home now17:40
_mavrick61Hi17:43
_mavrick61I don't know what tilman did but the server booted up normally after I removed sdc17:44
jaegerok17:45
_mavrick61And I have added an extra 250 GB hdd, sdd. I think if we can back up the whole system to that disc, then recreate the system with 3 new 160 GB, it would be great.17:45
jaegerThat shouldn't be difficult, I'd think17:45
_mavrick61But shall we recreate that using our rescu pxe boot or?17:46
_mavrick61rescue*17:46
jaegerthe backup and new raid would probably be best done in the rescue environment if it has the right mdadm tools17:46
_mavrick61Rescue booting a debian system17:47
jaegerIf that's what you and Tilman used the first time it should be fine17:47
_mavrick61Yes.  Shall I restart in rescue mode and then give you ssh access17:48
_mavrick61But I need to send you all info in a more secure way. I sent tilman an sms. Can I do the same to you?17:49
_mavrick61You and tilman are long-time members of the core team so I trust you.17:49
jaegerDo we want to do all this right now or schedule it for another time? It will take quite a while17:50
jaegerI don't mind SMS but it would be an international one17:50
*** jue has quit IRC18:41
*** jue has joined #crux-devel18:57
_mavrick61SUCCESS19:31
_mavrick61An international teamwork19:31
teK_\o/19:33
jaegeryay :)19:34
jaegerthe raid resync in md2 is going along happily19:35
jaegers/in/on/19:35
_mavrick61Check the backup script in Cron.19:36
_mavrick61Daily19:36
jaegerI'm looking at it19:37
jaegerncftpput: cannot open etc-bak.dnsnoc.net: username and/or password was not accepted for login.19:37
_mavrick61send me the syntax.19:38
_mavrick61That is wrong?19:38
_mavrick61etc-bak.dnsnoc.net19:38
frinnstis it working?19:41
frinnstwebsite looks ok19:41
jaegerthe server is, yes19:41
jaegerIt's still resyncing the RAID array on /home19:42
frinnstyou are a fighter, _mavrick61 :)19:42
_mavrick61One does the best one can19:44
*** _mavrick61 has quit IRC21:29
*** _mavrick61 has joined #crux-devel21:30
