Sys Admin Pocket Survival Guide

CoreOS 101

A DevOps approach to Linux. Minimalist OS. Dual boot partitions: run live on one, upgrade the other, then a quick reboot brings up a completely upgraded system. Easy revert to the previous known state.
CoreOS has 3 main building blocks:

etcd, which provides a centralized configuration mechanism for a cluster of CoreOS machines.

Docker. Applications/services are provided via containers instead of yum/apt.

systemd
DevOps is taking over the role of traditional sys admins, and CoreOS may just be the ticket for their world dominance :) An InfoWorld article argues that CoreOS poses an existential threat to Linux vendors.
So, if you can't beat 'em, join 'em :)

CoreOS Characteristics and Technologies

CoreOS Architecture Diagram by Offnfopt - Own work. Licensed under Public Domain via Commons - https://commons.wikimedia.org/wiki/File:CoreOS_Architecture_Diagram.svg#/media/File:CoreOS_Architecture_Diagram.svg

FastPatch updates the second partition of the dual-partition boot system, for consistent upgrades and stable rollback.
kexec (kernel exec) can be used for fast booting. Essentially, the kernel from the standby partition is loaded into memory and starts executing right away. This bypasses BIOS hardware initialization and the bootloader.
/usr is mounted read-only, since an update installs a whole new OS on the passive partition.
etcd and fleetd are written in the Go programming language.
Intra-node communication uses standard Unix sockets.
Inter-node communication uses SSH tunnels and auth (fleetctl).
Originally used Docker exclusively; now also supports Rocket (rkt).
fleet and locksmith may make CoreOS into a very interesting HPC system...

Initial install

CoreOS is built using the ChromeOS SDK.  
The docs all point to running CoreOS as a VM in the cloud, but 
it can actually run on bare metal.
The install ISO is 211 MB.  https://coreos.com/os/docs/latest/booting-with-iso.html

It provides only the essentials needed to run containers (e.g. docker), 
utilizing cgroups and namespaces for resource management and security.  
There is minimal additional functionality for service discovery and configuration sharing (via etcd).

Disk partitioning uses the newer GUID partition table (GPT).  
fdisk doesn't fully work.  CoreOS doesn't ship with gparted.  Use cgpt instead.

Partitions 3 and 4 are the active/standby pair, USR-A and USR-B, respectively.  
The GPT is flagged to boot from either one of these, and that boot partition is mounted read-only.
/usr in the AWS CoreOS image uses about 500 MB.
There are 482 unique commands in /usr/bin, 260 in /usr/sbin 
(my cygwin has about 800 and 16 commands in /bin and /sbin)

Partition 9 is the stateful / partition (only one of these, no standby.  Upgrades don't change anything here?)
/ used about 666 MB

sudo cgpt show /dev/xvda	# show the partition table; /usr boots from the partition with priority=1
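
To check the boot flags of just the two /usr partitions without reading the whole table, cgpt can print single attributes. A minimal sketch, assuming the cgpt shipped with CoreOS takes the same -i/-P/-T/-S options as the ChromeOS tool, and that the disk is /dev/xvda as on the AWS image above. usr_boot_flags is a made-up helper name, not something CoreOS provides.

```shell
# DISK is an assumption: /dev/xvda matches the AWS instance in these notes.
DISK="${DISK:-/dev/xvda}"

# Print priority/tries/successful for USR-A (partition 3) and USR-B (partition 4).
usr_boot_flags() {
    for part in 3 4; do
        prio=$(sudo cgpt show -i "$part" -P "$DISK")    # boot priority
        tries=$(sudo cgpt show -i "$part" -T "$DISK")   # boot attempts left
        ok=$(sudo cgpt show -i "$part" -S "$DISK")      # marked successful?
        echo "part=$part priority=$prio tries=$tries successful=$ok"
    done
}
```

Run usr_boot_flags before and after an update to see the flags change on the standby partition.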

CoreOS partition doc says:

Due to the unique disk layout of CoreOS, an rm -rf / is an un-supported but valid operation to do a "factory reset". The machine should boot and operate normally afterwards.

etcd

etcd provides a replicated key-value store for all system configuration.
Each node in the cluster runs a copy of etcd, thus HA, and all data is available to all nodes.
Utilizes the Raft algorithm for leader election and consensus.
etcd also doubles as service discovery. Applications announce themselves to etcd.
Communication with etcd is through REST (JSON on top of HTTP) at http://127.0.0.1:4001/v2/keys/. This is accessible inside containers as well.
etcdctl is the CLI. It can read and set key-value pairs.
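
The REST access described above can be sketched with curl. This assumes the etcd v2 API on the 127.0.0.1:4001 endpoint mentioned in these notes. The helper names and the /demo/message key are made up for illustration.

```shell
# Endpoint taken from the notes above; override ETCD_URL for other setups.
ETCD_URL="${ETCD_URL:-http://127.0.0.1:4001}"

# Build the v2 keys URL for a key such as /demo/message.
etcd_key_url() { echo "$ETCD_URL/v2/keys/${1#/}"; }

# Set a key: the v2 API takes a PUT with a form field named "value".
etcd_put() { curl -s -L -X PUT "$(etcd_key_url "$1")" --data-urlencode "value=$2"; }

# Read a key: the reply is JSON describing the node and its value.
etcd_get() { curl -s -L "$(etcd_key_url "$1")"; }
```

For example, etcd_put /demo/message hello followed by etcd_get /demo/message. The etcdctl equivalents would be etcdctl set /demo/message hello and etcdctl get /demo/message.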

fleetd

Cluster manager that controls the systemd of each CoreOS node, i.e. a distributed init system.
fleetctl provides a single point to manage processes across the cluster, which is especially useful for managing HA apps.
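
Since fleet schedules ordinary systemd units, a rough sketch of handing it one looks like this. hello.service, the busybox image and both helper names are hypothetical; fleetctl start and fleetctl list-units are the only fleet commands assumed here.

```shell
# Write a hypothetical systemd unit that runs a throwaway docker container.
write_hello_unit() {
    cat > "$1" <<'EOF'
[Unit]
Description=Hello World (hypothetical demo unit)
After=docker.service
Requires=docker.service

[Service]
ExecStart=/usr/bin/docker run --rm --name hello busybox /bin/sh -c "while true; do echo Hello World; sleep 1; done"
ExecStop=/usr/bin/docker stop hello
EOF
}

# Hand the unit to fleet, then list units to see which machine picked it up.
run_hello_on_cluster() {
    unit="${1:-hello.service}"
    write_hello_unit "$unit" || return 1
    fleetctl start "$unit"
    fleetctl list-units
}
```

fleetctl list-machines shows the nodes fleet knows about, which is a handy first check when a unit never gets scheduled.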

locksmithd

locksmith is a mechanism to control the rebooting of a cluster of CoreOS machines, so that less than half the cluster is being upgraded at any point in time, ensuring some basic level of service functionality.
Each node runs a locksmithd daemon.
locksmithctl is the CLI to control this.
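
A small sketch of the "less than half" rule above, turned into a number for locksmithctl set-max. The arithmetic and the helper names are my own illustration, not locksmith's built-in policy. The floor of 1 is an assumption so that a tiny cluster can still update.

```shell
# Largest number of simultaneous reboots that keeps fewer than half of an
# N-node cluster down; never below 1, or updates would never proceed.
max_reboot_slots() {
    slots=$(( ($1 - 1) / 2 ))
    if [ "$slots" -lt 1 ]; then
        slots=1
    fi
    echo "$slots"
}

# Apply the limit for a cluster of the given size, then show lock holders.
set_cluster_reboot_slots() {
    locksmithctl set-max "$(max_reboot_slots "$1")" &&
    locksmithctl status
}
```

For a 5-node cluster, set_cluster_reboot_slots 5 would allow 2 nodes to hold the reboot lock at once. locksmithctl unlock releases a lock left behind by a node that never came back.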

Update/Reboot

CoreOS performs updates automatically. It writes the upgraded OS into the standby /usr partition, then reboots when the system is idle. Idle is determined via etcd: if etcd-lock can take the lock, the node can invoke a reboot. In the cluster environment CoreOS is intended for, services are migrated to other nodes, so eventually etcd-lock will succeed and the node will be able to reboot.
For details, see CoreOS update strategies and the forum thread on
manually updating CoreOS.
So, not too sure about stand-alone systems (e.g. one in lab/test use)...

Pros and Cons of CoreOS vs traditional Linux

Commands/Troubleshooting

sudo update_engine_client -update		# manually run an upgrade, for debug/troubleshooting (updates are normally automatic)
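
To see where the updater stands before forcing anything, the status output can be reduced to its CURRENT_OP line. A sketch, assuming update_engine_client -status prints KEY=VALUE lines such as CURRENT_OP=UPDATE_STATUS_IDLE. The helper names are made up, and the status call may need sudo on some setups.

```shell
# Print the current update_engine state, e.g. UPDATE_STATUS_IDLE.
update_state() {
    update_engine_client -status 2>/dev/null | sed -n 's/^CURRENT_OP=//p'
}

# Succeeds when an update is staged on the standby /usr and waits for a reboot.
update_needs_reboot() {
    [ "$(update_state)" = "UPDATE_STATUS_UPDATED_NEED_REBOOT" ]
}
```

For example: update_needs_reboot && echo "reboot pending".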

Reference

...

Copyright info about this work

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. Pocket Sys Admin Survival Guide: for content that I wrote, (CC) some rights reserved. 2005,2012 Tin Ho [ tin6150 (at) gmail.com ]
Some contents are "cached" here for easy reference. Sources include man pages, vendor documents, online references, discussion groups, etc. Copyright of those are obviously those of the vendor and original authors. I am merely caching them here for quick reference and avoid broken URL problems.

Where is PSG hosted these days?

tiny.cc/coreOS
http://tin6150.github.io/psg/psg2.html This new home page at github
http://tiny.cc/tin6150/ New home in 2011.06.
http://tin6150.s3-website-us-west-1.amazonaws.com/psg.html (coming soon)
ftp://sn.is-a-geek.com/psg/psg.html My home "server". Up sporadically.
http://tin6150.github.io/psg/psg.html
http://www.fiu.edu/~tho01/psg/psg.html (no longer updated as of 2007-05)