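Learning Objectives
✅ Master the commands for viewing and managing processes
✅ Learn to monitor and analyze system resources
✅ Master scheduled task configuration
✅ Understand systemd service management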
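Part 1: Process Basics

```bash
# Create a working directory for today's exercises
mkdir -p ~/linux-learning/day4/process-management
cd ~/linux-learning/day4/process-management

# $$ is the PID of the current shell; $PPID is the PID of its parent
echo "PID of the current shell: $$"
echo "Parent PID of the current shell: $PPID"

# Start two background processes (& runs a command in the background) to observe later
ping baidu.com > /dev/null &
while true; do echo "running" > /dev/null; sleep 5; done &
# Stop them afterwards with: kill %1 %2
```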
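Everything ps and top display ultimately comes from the /proc filesystem. Here is a minimal sketch that reads a process's name, state and parent PID straight from /proc/<PID>/status. The function name `show_proc` is our own and is not a standard command.

```bash
#!/bin/bash
# show_proc PID: print the Name, State and PPid lines from /proc/PID/status.
# Hypothetical helper for illustration; ps reads the same kernel data.
show_proc() {
    pid=$1
    if [ ! -r "/proc/$pid/status" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    grep -E '^(Name|State|PPid):' "/proc/$pid/status"
}

# Usage: the PPid line should match the $PPID value printed above
show_proc $$
```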
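Part 2: Process Viewing Commands

1. ps - process snapshot

```bash
ps                          # processes in the current terminal
ps aux                      # all processes, BSD-style output
ps -ef                      # all processes, System V-style output (includes PPID)
ps aux | grep '[n]ginx'     # find nginx; the [n] keeps the grep process itself out of the results
ps aux --sort=-%cpu         # sort by CPU usage, descending
ps aux --sort=-%mem         # sort by memory usage, descending
ps -u root                  # processes owned by root
ps -p 1                     # details of PID 1
ps -eo pid,ppid,cmd,%cpu,%mem --sort=-%cpu | head -10   # custom columns

echo "=== Top 5 CPU-consuming processes ==="
ps aux --sort=-%cpu | head -6   # 6 lines = header + 5 processes

echo "=== Top 5 memory-consuming processes ==="
ps aux --sort=-%mem | head -6
```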
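The STAT column of `ps aux` starts with a one-letter state code. The common ones are R (running), S (sleeping), D (uninterruptible I/O wait), T (stopped), I (idle kernel thread) and Z (zombie). The kernel exposes the same letter as the third field of /proc/<PID>/stat. As a hedged sketch, the helper below uses it for a quick per-state process count. The name `count_states` is hypothetical, and the result only approximates the first letter of ps's STAT column.

```bash
#!/bin/bash
# count_states: count processes by one-letter state, read from /proc/<PID>/stat.
# Hypothetical helper; prints lines of the form "STATE COUNT".
count_states() {
    for f in /proc/[0-9]*/stat; do
        # A process may exit between the glob and the read; skip it if so
        line=$(cat "$f" 2>/dev/null) || continue
        # Format: "PID (comm) STATE ..."; comm may contain spaces, so cut after the last ") "
        state=${line##*") "}
        echo "${state%% *}"
    done | sort | uniq -c | awk '{print $2, $1}'
}

count_states
```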
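2. top - real-time process monitoring

```bash
top                              # interactive: P sorts by CPU, M by memory, k kills, q quits
top -b -n 1 > top_snapshot.txt   # batch mode: save one snapshot to a file
top -b -d 2 -n 5 > top_log.txt   # 5 samples, 2 seconds apart
top -p $(pgrep -d ',' -f "nginx|mysql")   # watch only nginx/mysql (at least one must be running)
```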
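3. Viewing the process tree

```bash
sudo dnf install -y psmisc       # provides pstree (and killall)
pstree                           # tree of all processes
pstree -p                        # include PIDs
pstree -u                        # show the user name where it differs from the parent's
pstree -ap                       # include command-line arguments and PIDs
pstree -p | grep -A 5 -B 5 sshd  # 5 lines of context around sshd
ps -p $$ -o ppid=                # parent PID of the current shell
```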
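pstree draws the parent/child links that `ps -o ppid=` reports one process at a time. You can walk that chain yourself by following the PPid field in /proc until there is no parent left, which resembles what `pstree -s PID` shows for a single branch. The helper `ancestry` is hypothetical and is only a small sketch.

```bash
#!/bin/bash
# ancestry [PID]: print PID, its parent, grandparent, ... up to the top of the tree.
# Hypothetical helper; it follows the PPid field of /proc/<pid>/status until PPid is 0.
ancestry() {
    pid=${1:-$$}
    while [ -n "$pid" ] && [ "$pid" -gt 0 ] && [ -r "/proc/$pid/status" ]; do
        name=$(awk '/^Name:/ {print $2}' "/proc/$pid/status")
        printf '%s(%s)\n' "$name" "$pid"
        pid=$(awk '/^PPid:/ {print $2}' "/proc/$pid/status")
    done
}

ancestry $$    # the current shell first, then each ancestor in turn
```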
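Part 3: Process Control

1. Signals and kill

```bash
kill -l    # list all signals

# A test script that loops forever
cat > test_process.sh << 'EOF'
#!/bin/bash
echo "Test process started, PID: $$"
count=0
while true; do
    echo "Running... $count"
    ((count++))
    sleep 2
done
EOF
chmod +x test_process.sh
./test_process.sh &
TEST_PID=$!
echo "Test process PID: $TEST_PID"

# Common signals (numbers as on x86 Linux; the names are portable).
# Try them one at a time: HUP, TERM and KILL all end this script, so restart it before the next one.
kill -19 $TEST_PID    # SIGSTOP: pause the process (same as kill -STOP)
kill -18 $TEST_PID    # SIGCONT: resume a paused process (same as kill -CONT)
kill -1  $TEST_PID    # SIGHUP: many daemons reload their configuration on HUP
kill -15 $TEST_PID    # SIGTERM: polite termination, the default signal of kill
kill -9  $TEST_PID    # SIGKILL: forced termination; cannot be caught or ignored

# Kill by name instead of by PID
pkill -f "test_process"       # match against the full command line
killall test_process.sh       # match against the process name

# Reload nginx; the two commands are equivalent
sudo kill -HUP $(cat /var/run/nginx.pid)
sudo nginx -s reload
```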
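The difference between SIGTERM and SIGKILL is easiest to see from the receiving side. A script can `trap` HUP or TERM and react to them, for example by reloading configuration or cleaning up before it exits. SIGKILL and SIGSTOP can never be trapped. The sketch below uses hypothetical helpers `worker` and `run_trap_demo` and relies on short sleeps to order the signals. It is a demonstration, not a robust supervisor.

```bash
#!/bin/bash
# worker: loops forever, but handles HUP (keep running) and TERM (clean up, exit 0).
# SIGKILL (9) and SIGSTOP (19) cannot be trapped, so kill -9 allows no cleanup.
worker() {
    trap 'echo "HUP received: reloading config"' HUP
    trap 'echo "TERM received: cleaning up"; exit 0' TERM
    while true; do sleep 0.2; done
}

# run_trap_demo: start a worker, send HUP then TERM, and show what it printed
run_trap_demo() {
    out=$(mktemp)
    worker > "$out" &
    wpid=$!
    sleep 1                 # give the worker time to install its traps
    kill -HUP "$wpid"
    sleep 0.5
    kill -TERM "$wpid"
    wait "$wpid"
    echo "worker exit status: $?"
    cat "$out"
    rm -f "$out"
}

run_trap_demo
```

The shell runs a trap only after the current foreground command (here the short `sleep`) finishes, which is why the loop sleeps in small steps.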
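2. Process priority (nice)

```bash
ps -eo pid,pri,ni,cmd | head -10     # NI ranges from -20 (highest priority) to 19 (lowest)

nice -n 10 ./test_process.sh &       # start a process with nice value 10
TEST_PID=$!
sudo renice -n 5 -p $TEST_PID        # 10 -> 5: decreasing a nice value requires root
sudo renice -n -5 -p $TEST_PID       # negative values (higher priority) also require root

# CPU-bound test script: computes pi repeatedly with bc (requires the bc package)
cat > cpu_test.sh << 'EOF'
#!/bin/bash
echo "CPU test started - PID: $$"
for i in {1..10000}; do
    echo "scale=5000; 4*a(1)" | bc -l > /dev/null
done
echo "Test finished"
EOF
chmod +x cpu_test.sh
./cpu_test.sh &
./cpu_test.sh &

# Lower the priority of the first instance, then compare the two in top
renice -n 10 -p $(pgrep -f cpu_test.sh | head -1)
top -p $(pgrep -f cpu_test.sh | tr '\n' ',' | sed 's/,$//')
# Clean up afterwards: pkill -f cpu_test.sh
```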
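Why does a nice value of 10 make such a visible difference in top? Under the Linux CFS scheduler, a task at nice 0 has weight 1024, and each nice step changes the weight by roughly 1.25x. The kernel actually uses a precomputed table (`sched_prio_to_weight`), so the formula below is only an approximation. This sketch estimates how two CPU-bound processes, such as the two cpu_test.sh instances, split one core when they compete on it. The helpers `nice_weight` and `cpu_share` are hypothetical.

```bash
#!/bin/bash
# nice_weight N: approximate CFS weight for nice value N (1024 at nice 0, ~1.25x per step).
nice_weight() {
    awk -v n="$1" 'BEGIN { printf "%.0f\n", 1024 / (1.25 ^ n) }'
}

# cpu_share A B: approximate % of one core a nice-A process gets
# while competing with a nice-B process on the same core.
cpu_share() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        wa = 1024 / (1.25 ^ a); wb = 1024 / (1.25 ^ b)
        printf "%.1f\n", 100 * wa / (wa + wb)
    }'
}

nice_weight 10    # prints 110 (the kernel table also uses 110)
cpu_share 0 10    # prints 90.3: the nice-0 process gets about 90% of the core
```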
Part 4: System Resource Monitoring

1. Memory monitoring

```bash
free -h                     # human-readable units
free -m                     # values in MiB
free -s 2                   # refresh every 2 seconds (Ctrl+C to stop)
head -10 /proc/meminfo      # the raw data that free reads

# Memory alert script: above the threshold, log a warning and the top memory consumers
cat > memory_monitor.sh << 'EOF'
#!/bin/bash
THRESHOLD=90
LOG_FILE="memory_alert.log"
while true; do
    # Used memory as a percentage of total memory
    MEM_USAGE=$(free | awk '/^Mem:/ {printf "%.1f", $3/$2*100}')
    MEM_USAGE_INT=$(printf "%.0f" "$MEM_USAGE")
    if [ "$MEM_USAGE_INT" -gt "$THRESHOLD" ]; then
        echo "$(date): WARNING! Memory usage: $MEM_USAGE%" >> "$LOG_FILE"
        echo "Top 5 memory processes:" >> "$LOG_FILE"
        ps aux --sort=-%mem | head -6 >> "$LOG_FILE"
        echo "---" >> "$LOG_FILE"
    fi
    echo "$(date): current memory usage: $MEM_USAGE%"
    sleep 10
done
EOF
chmod +x memory_monitor.sh
```
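The script above compares free's "used" column against the total. Many tools prefer a calculation based on MemAvailable from /proc/meminfo instead. MemAvailable is the kernel's estimate of how much memory new programs can get without swapping, and it counts reclaimable cache as available. Below is a minimal sketch. `mem_usage_pct` is a hypothetical helper that takes an optional file argument so it can be checked against sample data.

```bash
#!/bin/bash
# mem_usage_pct [FILE]: (MemTotal - MemAvailable) / MemTotal * 100, rounded.
# FILE defaults to /proc/meminfo, whose values are in kB.
mem_usage_pct() {
    awk '/^MemTotal:/ {t = $2} /^MemAvailable:/ {a = $2}
         END { if (t > 0) printf "%.0f\n", (t - a) * 100 / t }' "${1:-/proc/meminfo}"
}

mem_usage_pct    # current usage on this machine
```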
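2. CPU monitoring

```bash
sudo dnf install -y sysstat      # provides mpstat, iostat and sar
mpstat 2 5                       # CPU statistics: 5 samples, 2 seconds apart
grep "model name" /proc/cpuinfo | head -1   # CPU model
nproc                            # number of available CPU cores
uptime                           # load averages over 1, 5 and 15 minutes; compare them with nproc
cat /proc/loadavg

# Generate CPU load and watch it
sudo dnf install -y epel-release && sudo dnf install -y stress
stress --cpu 4 --timeout 60 &    # 4 CPU-bound workers for 60 seconds
top -b -n 5 -d 1 | grep "Cpu"
```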
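top, mpstat and vmstat all derive CPU usage from the cumulative tick counters on the first line of /proc/stat. They take two samples and compute what fraction of the elapsed ticks was not idle. Here is a sketch of that calculation, with hypothetical helper names `cpu_busy_pct` and `sample_cpu`. Idle time is taken as idle + iowait. The guest columns are ignored because the kernel already includes guest time in user/nice.

```bash
#!/bin/bash
# cpu_busy_pct LINE1 LINE2: % of non-idle CPU time between two "cpu" lines of /proc/stat.
# Fields after "cpu": user nice system idle iowait irq softirq steal ...
cpu_busy_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x); split(b, y)
        for (i = 2; i <= 9; i++) { t1 += x[i]; t2 += y[i] }
        idle1 = x[5] + x[6]; idle2 = y[5] + y[6]    # idle + iowait
        dt = t2 - t1
        if (dt <= 0) { print 0; exit }
        printf "%.0f\n", 100 * (dt - (idle2 - idle1)) / dt
    }'
}

# sample_cpu [SECONDS]: sample /proc/stat twice and report the busy percentage
sample_cpu() {
    s1=$(grep '^cpu ' /proc/stat)
    sleep "${1:-1}"
    s2=$(grep '^cpu ' /proc/stat)
    cpu_busy_pct "$s1" "$s2"
}

sample_cpu 1    # run it while stress is active to see the value climb
```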
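3. Disk I/O monitoring

```bash
iostat -x 2 5                    # extended device statistics (sysstat): 5 samples, 2 s apart
sudo dnf install -y iotop
sudo iotop -o                    # show only processes currently doing I/O (requires root)

# Generate write load, then record 5 iterations of iotop in batch mode
dd if=/dev/zero of=testfile bs=1M count=1000 &
sudo iotop -b -n 5 > iotop.log
wait                             # let dd finish
rm -f testfile                   # remove the 1 GB test file
```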
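iostat computes its throughput figures from /proc/diskstats. In that file, field 6 is sectors read and field 10 is sectors written, and both are always counted in 512-byte units. Below is a minimal sketch that turns two samples into kB/s. `disk_rate` and `sample_disk` are hypothetical helpers, and the device name you pass (sda, nvme0n1, vda, ...) depends on your machine.

```bash
#!/bin/bash
# disk_rate LINE1 LINE2 SECONDS: read/write throughput in kB/s between two
# /proc/diskstats lines of the same device (1 sector = 512 bytes = 0.5 kB).
disk_rate() {
    awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN {
        split(a, x); split(b, y)
        printf "read_kBps=%.0f write_kBps=%.0f\n", (y[6] - x[6]) / 2 / s, (y[10] - x[10]) / 2 / s
    }'
}

# sample_disk DEV [SECONDS]: measure one device over an interval
sample_disk() {
    dev=$1; secs=${2:-1}
    l1=$(awk -v d="$dev" '$3 == d' /proc/diskstats)
    [ -n "$l1" ] || { echo "device not found: $dev" >&2; return 1; }
    sleep "$secs"
    l2=$(awk -v d="$dev" '$3 == d' /proc/diskstats)
    disk_rate "$l1" "$l2" "$secs"
}

# Usage while the dd test is running: sample_disk sda 2
```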
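4. Network monitoring

```bash
netstat -tuln                    # listening TCP/UDP ports (netstat comes from the net-tools package)
ss -tuln                         # the same with ss, the modern replacement
netstat -an | grep ESTABLISHED | wc -l    # number of established connections
ss -s                            # socket summary statistics
sudo dnf install -y nload        # nload comes from EPEL, enabled above
nload eth0                       # live bandwidth; replace eth0 with your interface (see: ip link)
watch -n 1 'netstat -i'          # interface counters, refreshed every second
```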
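Counting only ESTABLISHED lines hides the rest of the picture. A large number of TIME-WAIT or CLOSE-WAIT sockets is often more telling. The first column of `ss -tan` is the TCP state, so a single awk pass can summarize it. `count_tcp_states` is a hypothetical helper and this is only a sketch.

```bash
#!/bin/bash
# count_tcp_states: read `ss -tan` output on stdin and count sockets per TCP state.
# The first line of ss output is a header and is skipped.
count_tcp_states() {
    awk 'NR > 1 { count[$1]++ } END { for (s in count) print s, count[s] }' | sort
}

# Usage:
#   ss -tan | count_tcp_states
# ss can also filter by state directly:
#   ss -tan state established
```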
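Part 5: Scheduled Tasks

1. crontab - scheduled jobs

crontab -e opens your crontab in an editor. Each line has five time fields (minute, hour, day of month, month, day of week) followed by the command.

```bash
crontab -e
```

Example entries to add in the editor:

```bash
# Every minute
* * * * * echo "$(date): cron job running" >> ~/cron_test.log
# Every day at 02:00 (root's crontab: writes to /backup and reads all of /home); % must be escaped as \%
0 2 * * * tar -czf /backup/home_$(date +\%Y\%m\%d).tar.gz /home/
# Every Monday at 03:00 (root's crontab): delete .log files older than 7 days
0 3 * * 1 find /var/log -name "*.log" -mtime +7 -delete
```

```bash
crontab -l        # list the current jobs

# System monitoring script; reports go under $HOME, which needs no root access
cat > /home/$USER/system_monitor.sh << 'EOF'
#!/bin/bash
LOG_DIR="$HOME/system_monitor"
mkdir -p "$LOG_DIR"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
REPORT="$LOG_DIR/report_$TIMESTAMP.log"
{
    echo "=== System monitoring report $TIMESTAMP ==="
    echo "Load: $(uptime)"
    echo "Memory:"; free -h
    echo "Disk:"; df -h /
    echo "Top CPU processes:"
    ps aux --sort=-%cpu | head -5
    echo "Connections: $(ss -s | grep estab)"
} >> "$REPORT"
# Keep only the last 7 days of reports
find "$LOG_DIR" -name "report_*.log" -mtime +7 -delete
EOF
chmod +x /home/$USER/system_monitor.sh

# Append a job that runs every 5 minutes, keeping existing entries
(crontab -l 2>/dev/null; echo "*/5 * * * * /home/$USER/system_monitor.sh") | crontab -
```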
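The `(crontab -l; echo ...) | crontab -` pattern appends the line every time it runs, so running it twice schedules the job twice. Below is a hedged sketch of an idempotent version. `merge_cron_line` is a hypothetical helper: it reads the current crontab on stdin and appends the line only if an identical line is not already present. `escape_cron_percent`, also hypothetical, adds the `\%` escaping that crontab requires, because cron turns a bare % into a newline.

```bash
#!/bin/bash
# merge_cron_line LINE: copy the crontab text on stdin to stdout,
# appending LINE only if it is not already present as an exact line.
merge_cron_line() {
    line=$1
    current=$(cat)
    [ -n "$current" ] && printf '%s\n' "$current"
    if ! printf '%s\n' "$current" | grep -Fqx -- "$line"; then
        printf '%s\n' "$line"
    fi
}

# escape_cron_percent: turn every % on stdin into \%
escape_cron_percent() {
    sed 's/%/\\%/g'
}

# Usage (safe to run repeatedly):
#   crontab -l 2>/dev/null | merge_cron_line "*/5 * * * * $HOME/system_monitor.sh" | crontab -
```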
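2. systemd timers

```bash
# "sudo cat > file" fails: the redirection runs in your own shell, without root. Use sudo tee instead.
# Replace your_username with your actual user name.
sudo tee /etc/systemd/system/monitor.service > /dev/null << 'EOF'
[Unit]
Description=System Monitor Service

[Service]
Type=oneshot
ExecStart=/home/your_username/system_monitor.sh
User=your_username
EOF

# Persistent=true runs a missed job at the next start if the machine was off at the scheduled time
sudo tee /etc/systemd/system/monitor.timer > /dev/null << 'EOF'
[Unit]
Description=Run monitor every hour

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target
EOF

sudo systemctl daemon-reload         # reload unit files
sudo systemctl enable monitor.timer  # start the timer at boot
sudo systemctl start monitor.timer   # start it now
systemctl list-timers                # check the next scheduled run
```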
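Hand-editing `your_username` in two unit files is error-prone. As a sketch, the hypothetical helper `make_timer_units` generates a matching .service/.timer pair for any script, user and OnCalendar expression. It writes them into a directory of your choice, and you then copy them into /etc/systemd/system.

```bash
#!/bin/bash
# make_timer_units NAME SCRIPT USER ONCALENDAR OUTDIR
# Writes OUTDIR/NAME.service (a oneshot job) and OUTDIR/NAME.timer (its schedule).
make_timer_units() {
    name=$1; script=$2; user=$3; calendar=$4; outdir=$5
    mkdir -p "$outdir"
    cat > "$outdir/$name.service" << EOF
[Unit]
Description=$name job

[Service]
Type=oneshot
ExecStart=$script
User=$user
EOF
    cat > "$outdir/$name.timer" << EOF
[Unit]
Description=Run $name on schedule: $calendar

[Timer]
OnCalendar=$calendar
Persistent=true

[Install]
WantedBy=timers.target
EOF
}

# Usage:
#   make_timer_units monitor "$HOME/system_monitor.sh" "$USER" hourly ./units
#   sudo cp ./units/monitor.* /etc/systemd/system/
#   sudo systemctl daemon-reload && sudo systemctl enable --now monitor.timer
```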
Part 6: systemd Service Management

1. Basic service operations

```bash
systemctl status sshd            # state, main PID and recent log lines
systemctl status nginx
sudo systemctl start nginx
sudo systemctl stop nginx
sudo systemctl restart nginx     # stop, then start
sudo systemctl reload nginx      # reload the configuration without stopping
sudo systemctl enable nginx      # start at boot
sudo systemctl disable nginx     # do not start at boot
systemctl is-active sshd
systemctl is-enabled sshd
systemctl list-units --type=service --all
systemctl list-units --type=service --state=running
```
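`is-active` and `is-enabled` print a single word, which makes them easy to use in scripts. The sketch below prints a one-line summary for each service name it receives. The function `service_report` is hypothetical.

```bash
#!/bin/bash
# service_report SERVICE...: show whether each service is running now and enabled at boot.
service_report() {
    for svc in "$@"; do
        active=$(systemctl is-active "$svc" 2>/dev/null)
        enabled=$(systemctl is-enabled "$svc" 2>/dev/null)
        printf '%-12s active=%-10s enabled=%s\n' "$svc" "${active:-unknown}" "${enabled:-unknown}"
    done
}

# Usage: service_report sshd nginx crond
```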
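2. Creating a custom service

```bash
# A tiny web server on port 8080 built on netcat.
# On dnf-based systems nc is Ncat (package nmap-ncat), which has no -q option;
# --send-only makes it close the connection once the response has been sent.
# With the traditional netcat, use: nc -l -p 8080 -q 1
sudo dnf install -y nmap-ncat
cat > /home/$USER/myweb.sh << 'EOF'
#!/bin/bash
while true; do
    printf 'HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nConnection: close\r\n\r\nHello World\n' | nc -l 8080 --send-only
done
EOF
chmod +x /home/$USER/myweb.sh

# Replace your_username with your actual user name
sudo tee /etc/systemd/system/myweb.service > /dev/null << 'EOF'
[Unit]
Description=My Simple Web Server
After=network.target

[Service]
Type=simple
User=your_username
ExecStart=/home/your_username/myweb.sh
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl start myweb
sudo systemctl enable myweb
sudo systemctl status myweb
curl http://localhost:8080
```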
Part 7: Hands-on Project - A System Monitoring Platform

Now let's combine what we learned today into a simple system monitoring platform:

```bash
mkdir -p ~/linux-learning/day4/monitoring-platform
cd ~/linux-learning/day4/monitoring-platform

# Collector: gathers CPU, memory, disk, network and process data and renders an HTML report
cat > monitor_collector.sh << 'EOF'
#!/bin/bash
CONFIG_DIR="$HOME/linux-learning/day4/monitoring-platform"
DATA_DIR="$CONFIG_DIR/data"
mkdir -p "$DATA_DIR"

collect_cpu() {
    echo "=== CPU ===" > "$DATA_DIR/cpu.txt"
    top -bn1 | head -3 >> "$DATA_DIR/cpu.txt"
    echo "Top 5 by CPU:" >> "$DATA_DIR/cpu.txt"
    ps aux --sort=-%cpu | head -6 >> "$DATA_DIR/cpu.txt"
}

collect_memory() {
    echo "=== Memory ===" > "$DATA_DIR/memory.txt"
    free -h >> "$DATA_DIR/memory.txt"
    echo >> "$DATA_DIR/memory.txt"
    echo "Top 5 by memory:" >> "$DATA_DIR/memory.txt"
    ps aux --sort=-%mem | head -6 >> "$DATA_DIR/memory.txt"
}

collect_disk() {
    echo "=== Disk ===" > "$DATA_DIR/disk.txt"
    df -h >> "$DATA_DIR/disk.txt"
    echo >> "$DATA_DIR/disk.txt"
    echo "I/O statistics:" >> "$DATA_DIR/disk.txt"
    iostat -x 1 1 | grep -v "^$" >> "$DATA_DIR/disk.txt"
}

collect_network() {
    echo "=== Network ===" > "$DATA_DIR/network.txt"
    echo "Connection summary:" >> "$DATA_DIR/network.txt"
    ss -s >> "$DATA_DIR/network.txt"
    echo >> "$DATA_DIR/network.txt"
    echo "Listening ports:" >> "$DATA_DIR/network.txt"
    ss -tuln >> "$DATA_DIR/network.txt"
}

collect_process() {
    echo "=== Processes ===" > "$DATA_DIR/process.txt"
    # --no-headers keeps the header line out of the count
    echo "Total processes: $(ps -e --no-headers | wc -l)" >> "$DATA_DIR/process.txt"
    # Count only states starting with Z; grep 'Z' on ps aux would match unrelated lines
    echo "Zombie processes: $(ps -eo stat= | grep -c '^Z')" >> "$DATA_DIR/process.txt"
    echo >> "$DATA_DIR/process.txt"
    echo "systemd process tree:" >> "$DATA_DIR/process.txt"
    pstree -p | grep systemd -A 5 >> "$DATA_DIR/process.txt"
}

generate_html() {
    HTML_FILE="$DATA_DIR/report_$(date +%Y%m%d_%H%M%S).html"
    cat > "$HTML_FILE" << HTMLHEAD
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>System Monitoring Report</title>
<style>
body { font-family: Arial; margin: 20px; }
h1 { color: #333; }
h2 { color: #666; border-bottom: 1px solid #ccc; }
pre { background: #f5f5f5; padding: 10px; border-radius: 5px; }
.warning { color: red; }
.normal { color: green; }
</style>
</head>
<body>
<h1>System Monitoring Report - $(date)</h1>
HTMLHEAD
    for section in cpu memory disk network process; do
        echo "<h2>${section^^}</h2>" >> "$HTML_FILE"    # ${var^^} upper-cases (bash 4+)
        echo "<pre>" >> "$HTML_FILE"
        cat "$DATA_DIR/$section.txt" >> "$HTML_FILE"
        echo "</pre>" >> "$HTML_FILE"
    done
    echo "</body></html>" >> "$HTML_FILE"
    echo "HTML report generated: $HTML_FILE"
}

main() {
    echo "Collecting system information..."
    collect_cpu
    collect_memory
    collect_disk
    collect_network
    collect_process
    generate_html
    echo "Done!"
}

main
EOF
chmod +x monitor_collector.sh

# Alert checker: compares CPU, memory, disk and zombie counts against thresholds
cat > alert_checker.sh << 'EOF'
#!/bin/bash
CPU_THRESHOLD=80
MEM_THRESHOLD=90
DISK_THRESHOLD=85
# Absolute path: cron starts jobs in $HOME, so a relative path would put the log there
ALERT_LOG="$(cd "$(dirname "$0")" && pwd)/alerts.log"

check_cpu() {
    # Busy % = 100 - idle; the second vmstat sample covers the last 1-second interval
    CPU_USAGE=$(vmstat 1 2 | tail -1 | awk '{print 100 - $15}')
    if [ "$CPU_USAGE" -gt "$CPU_THRESHOLD" ]; then
        echo "$(date): CPU alert - usage $CPU_USAGE%" >> "$ALERT_LOG"
        return 1
    fi
    return 0
}

check_memory() {
    MEM_USAGE=$(free | awk '/^Mem:/ {printf "%.0f", $3/$2*100}')
    if [ "$MEM_USAGE" -gt "$MEM_THRESHOLD" ]; then
        echo "$(date): Memory alert - usage $MEM_USAGE%" >> "$ALERT_LOG"
        return 1
    fi
    return 0
}

check_disk() {
    # -P keeps each filesystem on one line even with long device names
    DISK_USAGE=$(df -P / | tail -1 | awk '{print $5}' | sed 's/%//')
    if [ "$DISK_USAGE" -gt "$DISK_THRESHOLD" ]; then
        echo "$(date): Disk alert - usage $DISK_USAGE%" >> "$ALERT_LOG"
        return 1
    fi
    return 0
}

check_process() {
    ZOMBIE_COUNT=$(ps -eo stat= | grep -c '^Z')
    if [ "$ZOMBIE_COUNT" -gt 5 ]; then
        echo "$(date): Process alert - $ZOMBIE_COUNT zombie processes" >> "$ALERT_LOG"
        return 1
    fi
    return 0
}

echo "=== Check started $(date) ==="
ALERTS=0
check_cpu     || ALERTS=$((ALERTS + 1))
check_memory  || ALERTS=$((ALERTS + 1))
check_disk    || ALERTS=$((ALERTS + 1))
check_process || ALERTS=$((ALERTS + 1))

# Report on this run only; the log file may still hold alerts from earlier runs
if [ "$ALERTS" -gt 0 ]; then
    echo "⚠️ $ALERTS alert(s) found, see $ALERT_LOG"
    tail -n "$ALERTS" "$ALERT_LOG"
else
    echo "✅ All checks passed"
fi
EOF
chmod +x alert_checker.sh

# Schedule: collect every 10 minutes, check every 5 minutes
(crontab -l 2>/dev/null; echo "*/10 * * * * $PWD/monitor_collector.sh") | crontab -
(crontab -l 2>/dev/null; echo "*/5 * * * * $PWD/alert_checker.sh") | crontab -

# Run both once by hand and inspect the results
./monitor_collector.sh
./alert_checker.sh
ls -la data/
```
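One caveat about generate_html: command output is copied into `<pre>` unchanged. ps shows zombie processes with a literal `<defunct>` marker, and a browser would treat that as an unknown tag and hide it. Below is a minimal sketch of an escaping filter that could be applied to each data file before it is appended. `html_escape` is a hypothetical helper.

```bash
#!/bin/bash
# html_escape: replace &, < and > on stdin with HTML entities.
# & is replaced first so that the entities created afterwards are not escaped again.
html_escape() {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Usage inside generate_html, instead of the plain cat:
#   html_escape < "$DATA_DIR/$section.txt" >> "$HTML_FILE"
```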
Part 8: Today's Challenge Tasks

```bash
# Challenge 1: lower the priority of every process using more than 50% CPU
ps aux --sort=-%cpu | awk 'NR > 1 && $3 > 50 {print $2}' | while read -r pid; do
    echo "Lowering the priority of process $pid"
    renice -n 10 -p "$pid"      # use sudo for other users' processes
done

# Challenge 2: delete logs older than 30 days, but keep those dated on the 1st of a month
# (names like app-20240101.log); run as root
cat > cleanup_logs.sh << 'EOF'
#!/bin/bash
LOG_DIR="/var/log"
KEEP_PATTERN="*-??????01*"      # <name>-YYYYMM01<anything>
find "$LOG_DIR" -name "*.log" -type f -mtime +30 ! -name "$KEEP_PATTERN" -delete
find "$LOG_DIR" -name "*.log.*" -type f -mtime +30 ! -name "$KEEP_PATTERN" -delete
echo "$(date): cleanup finished" >> /var/log/cleanup.log
EOF

# Challenge 3: a watchdog that starts nginx whenever it is not running (run as root)
cat > process_watchdog.sh << 'EOF'
#!/bin/bash
PROCESS_NAME="nginx"
CHECK_INTERVAL=10
while true; do
    if ! pgrep -x "$PROCESS_NAME" > /dev/null; then
        echo "$(date): $PROCESS_NAME is not running, starting it..."
        systemctl start "$PROCESS_NAME"
        echo "$(date): started $PROCESS_NAME" >> watchdog.log
    fi
    sleep "$CHECK_INTERVAL"
done
EOF

# Challenge 4: record the load average every hour and, at midnight, report on the previous day
# (schedule it hourly with cron, e.g.: 0 * * * * /path/to/load_stat.sh)
cat > load_stat.sh << 'EOF'
#!/bin/bash
STATS_DIR="/var/log/loadstats"
mkdir -p "$STATS_DIR"
HOUR=$(date +%H)
# /proc/loadavg starts with the 1-, 5- and 15-minute load averages
LOAD=$(cut -d' ' -f1-3 /proc/loadavg)
echo "$(date +%Y-%m-%d_%H:%M) $LOAD" >> "$STATS_DIR/load_$(date +%Y%m%d).log"

if [ "$HOUR" = "00" ]; then
    YESTERDAY=$(date -d "yesterday" +%Y%m%d)
    LOG="$STATS_DIR/load_$YESTERDAY.log"
    REPORT="$STATS_DIR/report_$YESTERDAY.txt"
    # Column 2 of each log line is the 1-minute load average
    echo "=== Load report for $YESTERDAY ===" > "$REPORT"
    echo "Max load: $(sort -k2,2 -rn "$LOG" | head -1)" >> "$REPORT"
    echo "Min load: $(sort -k2,2 -n "$LOG" | head -1)" >> "$REPORT"
    echo "Average load: $(awk '{sum += $2} END {if (NR) print sum/NR}' "$LOG")" >> "$REPORT"
fi
EOF
```
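The midnight report in load_stat.sh needs the maximum, minimum and average of the 1-minute load in column 2. A single awk pass can compute all three at once. `load_summary` is a hypothetical helper that assumes the log format written by load_stat.sh: a timestamp followed by the three load averages.

```bash
#!/bin/bash
# load_summary FILE: max, min and average of column 2 (1-minute load) in one pass.
load_summary() {
    awk 'NF >= 2 {
            v = $2 + 0
            if (n == 0 || v > max) max = v
            if (n == 0 || v < min) min = v
            sum += v; n++
         }
         END { if (n) printf "max=%.2f min=%.2f avg=%.2f\n", max, min, sum / n }' "$1"
}

# Usage: load_summary /var/log/loadstats/load_$(date -d yesterday +%Y%m%d).log
```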
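Part 9: Selected Interview Questions

System load: check it with uptime, top or w. To troubleshoot high load, use top to find CPU-heavy processes, iostat for disk I/O and netstat (or ss) for network activity.
Which process uses port 80: lsof -i:80 or netstat -tulnp | grep :80 (ss -tulnp | grep :80 also works).
Keeping a process running after logout: nohup script.sh &, or run it inside screen/tmux.
Zombie processes: a child that has exited but whose parent has not yet collected (reaped) its exit status; its state is Z. Fix: kill the parent, so the zombie is re-parented to init (PID 1, systemd on modern systems), which reaps it.
Limiting a process's CPU usage: use the cpulimit tool or cgroups.
% in crontab: it must be escaped, written as \%.
systemctl vs service: systemctl is the systemd command; service comes from the older SysVinit, and on systemd systems it simply redirects to systemctl.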
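Part 10: Today's Summary

Today you learned:
✅ Process viewing: ps, top, pstree
✅ Process control: kill, nice, renice
✅ Resource monitoring: free, iostat, mpstat
✅ Scheduled tasks: crontab, systemd timers
✅ Service management: systemctl
Hands-on experience:
Built a system monitoring platform
Implemented automatic alerting
Configured scheduled tasks
Managed system services
Quote of the day: "Processes are the heart of Linux; monitoring is the eyes of operations."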