110 changes: 110 additions & 0 deletions REPRODUCTION_GUIDE.md
@@ -0,0 +1,110 @@
# Reproduction Guide: Simulating the Time-Varying Packet Path Between Two Communicating Nodes

This project (TinyLEO) provides the `network_orchestrator` component, which can simulate a low-Earth-orbit satellite network on a server with full Linux network namespace privileges and track how the communication path between two nodes changes over time.

Because sandboxes and ordinary Docker containers lack the necessary privileges (full `ip netns` creation and low-level network configuration are not supported), this guide demonstrates how to configure and run the simulation on a real physical machine or a fully privileged virtual machine (root privileges & multi-core).

## Prerequisites

1. **Prepare a server**:
   - At least one Ubuntu 20.04+ server with root privileges. Recommended: >= 120 GB of RAM and >= 32 CPU cores (needed to spawn thousands of satellite network nodes).
   - Make sure Python 3.10 or newer is installed.

2. **Install dependencies**:
   In a terminal with root privileges, run the following commands to install the required packages (since the simulation modifies low-level system networking, using the global environment or `--break-system-packages` is recommended):

```bash
sudo apt-get update
sudo apt-get install -y libnetfilter-queue-dev python3-pip openssh-server
sudo service ssh start

# Install Python dependencies
sudo python3 -m pip install argparse numpy requests paramiko tqdm networkx \
grpcio grpcio-tools NetfilterQueue pyroute2 scapy python-iptables watchdog scipy
```
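Before launching anything, it is worth confirming that all of these packages are importable. A minimal sketch (note that several pip package names differ from their Python import names; the mapping below is our assumption):

```python
import importlib.util

# pip name -> import name differs for several packages (assumed mapping):
# NetfilterQueue -> netfilterqueue, python-iptables -> iptc, grpcio -> grpc
REQUIRED = ["numpy", "requests", "paramiko", "tqdm", "networkx", "grpc",
            "netfilterqueue", "pyroute2", "scapy", "iptc", "watchdog", "scipy"]

def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("all dependencies found" if not missing else f"missing: {missing}")
```

If anything is listed as missing, re-run the corresponding `pip install` as root before proceeding.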

3. **Configure passwordless SSH login (used by the simulation)**:
   **⚠️ Strong recommendation:** This project frequently manipulates low-level network namespaces and must run as root. **Only run it on a dedicated, isolated virtual machine or physical test machine**; never try this on a production system.

   The simulator interacts with each container and namespace over SSH, so every machine listed in the config file must allow root SSH login. SSH key authentication is recommended:

```bash
# 1. Generate an SSH key first if you don't already have one
ssh-keygen -t rsa -b 4096 -C "tinyleo_test"

# 2. Copy the public key to the target machine (or to the local root account)
# For a local test:
sudo mkdir -p /root/.ssh
sudo cp ~/.ssh/id_rsa.pub /root/.ssh/authorized_keys
sudo chmod 700 /root/.ssh
sudo chmod 600 /root/.ssh/authorized_keys
```
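After copying the key, it helps to verify that a non-interactive root login actually works before the simulator (which connects the same way) depends on it. A minimal sketch using the system `ssh` client:

```python
import subprocess

def check_root_ssh(host: str = "127.0.0.1", port: int = 22) -> bool:
    """Return True if a non-interactive (key-based) root SSH login succeeds."""
    try:
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
             "-o", "StrictHostKeyChecking=accept-new",
             "-p", str(port), f"root@{host}", "true"],
            capture_output=True,
        )
    except FileNotFoundError:  # ssh client not installed
        return False
    return result.returncode == 0

if __name__ == "__main__":
    print("root SSH OK" if check_root_ssh() else "root SSH login failed")
```

`BatchMode=yes` forbids password prompts, so this only succeeds if key authentication is correctly configured.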

## Code Configuration and Preparation

### 1. Generate gRPC Code

Run the following commands in the project directory to generate the gRPC communication code:

```bash
cd network_orchestrator/southbound/
python3 -m grpc_tools.protoc -I. --python_out=./ --grpc_python_out=./ ./link_failure_grpc/link_failure.proto
```
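The Python protoc plugins follow a fixed naming convention, so you can sanity-check that both generated modules exist. A sketch (run from `southbound/`; the paths are assumptions based on the command above):

```python
import os

def expected_outputs(proto_path: str = "link_failure_grpc/link_failure.proto"):
    """File names the python/grpc_python protoc plugins emit for a .proto."""
    stem = os.path.splitext(proto_path)[0]
    return [f"{stem}_pb2.py", f"{stem}_pb2_grpc.py"]

if __name__ == "__main__":
    missing = [p for p in expected_outputs() if not os.path.exists(p)]
    print("gRPC code generated" if not missing else f"missing: {missing}")
```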

### 2. Configure Server Connections (tinyleo_config.json)

Edit the `Machines` array in `network_orchestrator/test/config/tinyleo_config.json`, filling in the IP and SSH details of the local or target host (if you log in with a key, keep the corresponding `password` value from the config as-is; paramiko can also connect automatically via key, depending on your environment):

```json
{
"IP": "101.6.21.153", // 替换为你的服务器 IP
"port": 22,
"username": "root",
"password": "xxx" // 若有密码在此填入,若配置了免密通常可忽略
}
```
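Since a malformed entry only fails midway through a long run, validating the `Machines` array up front can save time. A minimal sketch (the required key set is assumed from the snippet above; the real schema may contain more fields):

```python
import json

REQUIRED_KEYS = {"IP", "port", "username", "password"}

def validate_machines(config: dict) -> list:
    """Return a list of human-readable problems found in the Machines array."""
    problems = []
    machines = config.get("Machines", [])
    if not machines:
        problems.append("Machines array is empty or missing")
    for i, machine in enumerate(machines):
        missing = REQUIRED_KEYS - machine.keys()
        if missing:
            problems.append(f"machine {i} is missing keys: {sorted(missing)}")
    return problems

if __name__ == "__main__":
    with open("config/tinyleo_config.json") as f:
        print(validate_machines(json.load(f)) or "config looks OK")
```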

### 3. Configure the Path-Tracing Script (example_orchestrator.py)

We have already adapted `network_orchestrator/test/example_orchestrator.py`: exception handling was added, and `TEST = True` is enabled.

The key tracing logic sits in the `for timestamp in range(0, sn.duration):` loop:
```python
if TEST:
    time.sleep(2)
    try:
        # record the traceroute (path data) at this moment
        sn.set_traceroute("GS1", "GS2", f"ts{timestamp}")
    except Exception as e:
        print(f"Skipping traceroute at timestamp {timestamp} due to error: {e}")
```

In this loop, the simulator updates the topology once per timestamp (`sn.update_tinyleo_topology(timestamp)`) and sends a command into the corresponding container to run `traceroute GS1 GS2`.
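The loop also paces itself so that each timestamp slot lasts `update_time` wall-clock seconds; clamping the remaining sleep at zero (as the adapted script does) avoids passing a negative value to `time.sleep` when an iteration overruns. That pacing logic can be sketched as:

```python
import time

def remaining_sleep(update_time, start_time, now=None):
    """Seconds left in the current timestamp slot; never negative."""
    now = time.time() if now is None else now
    return max(0.0, update_time - (now - start_time))

# usage inside the loop (sketch):
#   start_time = time.time()
#   ... update topology, run traceroute ...
#   time.sleep(remaining_sleep(update_time, start_time))
```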

## Running the Simulation and Analyzing Results

Run in a terminal:

```bash
cd network_orchestrator/test/
sudo python3 example_orchestrator.py
```

### Execution Flow
1. **Topology prediction and network creation**: The script first loads all satellite data and computes the optimal topology for each timestamp.
2. **Namespace creation**: Through Paramiko SSH connections, the backend creates several thousand network namespaces on the system, prefixed with `SH1SAT...`.
3. **Data collection (traceroute)**: At each configured timestamp (by default, each step simulates a 20-second advance of time), the system issues traceroute commands between the two nodes; results are saved in a newly generated folder (for example, under `tinyleo-Arbitrary-LeastDelay/result/`).

### Inspecting Path Changes

After the run finishes, you can inspect how the packet forwarding path changes over time in the following ways:

1. **Auto-generated result files**:
   The `network_orchestrator/test/tinyleo-Arbitrary-LeastDelay/result/` folder will contain log files such as `ts0_traceroute.txt`, `ts1_traceroute.txt`, and so on; compare them by timestamp to see differences in hop count and traversed nodes.
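To compare the logs programmatically rather than by eye, a small parser can extract the hop column from each file. A sketch (the exact output layout depends on the traceroute variant inside the containers, so the regex is an assumption):

```python
import re

def parse_hops(traceroute_text):
    """Extract the first node/IP of each numbered hop line."""
    hops = []
    for line in traceroute_text.splitlines():
        m = re.match(r"\s*(\d+)\s+(\S+)", line)
        if m:
            hops.append(m.group(2))
    return hops

def path_changed(text_a, text_b):
    """True if the hop sequences of two traceroute captures differ."""
    return parse_hops(text_a) != parse_hops(text_b)
```

Reading `ts0_traceroute.txt`, `ts1_traceroute.txt`, ... in order and printing the timestamps where `path_changed` is true yields the route-change history directly.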

2. **Manual verification inside a container**:
   Remove the cleanup code from `example_orchestrator.py`, then run the following while the simulation is live:
```bash
python3 get_container.py
```
It prints `nsenter` commands (such as `nsenter -m -u -i -n -p -t <PID> bash`) that let you enter a satellite or ground-station container directly and run `traceroute <target node IP>` manually to trace the current route.
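If you prefer not to run `get_container.py`, the `container_pid.txt` file it reads uses a simple space-separated `NODE:PID` token format (as written by `sn_remote.py` in this PR; the format is inferred from that code), and the same lookup can be sketched in a few lines:

```python
def parse_pid_file(text):
    """Parse space-separated 'NODE:PID' tokens, possibly spread over lines."""
    pid_map = {}
    for token in text.split():
        node, _, pid = token.partition(":")
        # skip tokens whose PID is not a valid positive integer
        # (failed container starts may record a non-positive value)
        if pid.isdigit() and int(pid) > 0:
            pid_map[node] = int(pid)
    return pid_map

def nsenter_cmd(pid):
    """Command to enter all namespaces of the container owning `pid`."""
    return f"nsenter -m -u -i -n -p -t {pid} bash"
```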
@@ -17,8 +17,20 @@
import subprocess
from tqdm import tqdm

dir = '/root/tinyleo-Arbitrary-LeastDelay'
import sys
import os

# Allow overriding dir via a command-line argument; fall back to the script's parent directory
if len(sys.argv) > 1:
dir = sys.argv[1]
else:
dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

pid_path = f"{dir}/container_pid.txt"
if not os.path.exists(pid_path):
print(f"Error: PID file not found at {pid_path}")
sys.exit(1)

pid_maps = {}
with open(pid_path, 'r') as f:
for line in f:


@@ -0,0 +1,97 @@
# Generated by the gRPC Python protocol compiler plugin. DO NOT EDIT!
"""Client and server classes corresponding to protobuf-defined services."""
import grpc
import warnings

from link_failure_grpc import link_failure_pb2 as link__failure__grpc_dot_link__failure__pb2

GRPC_GENERATED_VERSION = '1.78.0'
GRPC_VERSION = grpc.__version__
_version_not_supported = False

try:
from grpc._utilities import first_version_is_lower
_version_not_supported = first_version_is_lower(GRPC_VERSION, GRPC_GENERATED_VERSION)
except ImportError:
_version_not_supported = True

if _version_not_supported:
raise RuntimeError(
f'The grpc package installed is at version {GRPC_VERSION},'
+ ' but the generated code in link_failure_grpc/link_failure_pb2_grpc.py depends on'
+ f' grpcio>={GRPC_GENERATED_VERSION}.'
+ f' Please upgrade your grpc module to grpcio>={GRPC_GENERATED_VERSION}'
+ f' or downgrade your generated code using grpcio-tools<={GRPC_VERSION}.'
)


class LinkFailureServiceStub(object):
"""Missing associated documentation comment in .proto file."""

def __init__(self, channel):
"""Constructor.

Args:
channel: A grpc.Channel.
"""
self.HandleLinkFailure = channel.unary_unary(
'/LinkFailureService/HandleLinkFailure',
request_serializer=link__failure__grpc_dot_link__failure__pb2.LinkFailureRequest.SerializeToString,
response_deserializer=link__failure__grpc_dot_link__failure__pb2.LinkFailureResponse.FromString,
_registered_method=True)


class LinkFailureServiceServicer(object):
"""Missing associated documentation comment in .proto file."""

def HandleLinkFailure(self, request, context):
"""Missing associated documentation comment in .proto file."""
context.set_code(grpc.StatusCode.UNIMPLEMENTED)
context.set_details('Method not implemented!')
raise NotImplementedError('Method not implemented!')


def add_LinkFailureServiceServicer_to_server(servicer, server):
rpc_method_handlers = {
'HandleLinkFailure': grpc.unary_unary_rpc_method_handler(
servicer.HandleLinkFailure,
request_deserializer=link__failure__grpc_dot_link__failure__pb2.LinkFailureRequest.FromString,
response_serializer=link__failure__grpc_dot_link__failure__pb2.LinkFailureResponse.SerializeToString,
),
}
generic_handler = grpc.method_handlers_generic_handler(
'LinkFailureService', rpc_method_handlers)
server.add_generic_rpc_handlers((generic_handler,))
server.add_registered_method_handlers('LinkFailureService', rpc_method_handlers)


# This class is part of an EXPERIMENTAL API.
class LinkFailureService(object):
"""Missing associated documentation comment in .proto file."""

@staticmethod
def HandleLinkFailure(request,
target,
options=(),
channel_credentials=None,
call_credentials=None,
insecure=False,
compression=None,
wait_for_ready=None,
timeout=None,
metadata=None):
return grpc.experimental.unary_unary(
request,
target,
'/LinkFailureService/HandleLinkFailure',
link__failure__grpc_dot_link__failure__pb2.LinkFailureRequest.SerializeToString,
link__failure__grpc_dot_link__failure__pb2.LinkFailureResponse.FromString,
options,
channel_credentials,
insecure,
call_credentials,
compression,
wait_for_ready,
timeout,
metadata,
_registered_method=True)
20 changes: 19 additions & 1 deletion network_orchestrator/southbound/pyctr.c
@@ -84,7 +84,25 @@ static int container_init(
}

char controller_src[PATH_MAX+add_len];
snprintf(controller_src, sizeof(controller_src), "/root/tinyleo-Arbitrary-LeastDelay/controller");
// Infer the experiment directory from base_dir
// (e.g. /home/user/tinyleo-Arbitrary-LeastDelay/shellX/overlay/SATY) by
// truncating at the "/shell" or "/GS-" marker; the controller directory sits
// next to it, which avoids hardcoding the path or changing the C interface
// (and thus every Python call site).
const char *exp_marker1 = "/shell";
const char *exp_marker2 = "/GS-";
char exp_dir[PATH_MAX] = {0};
strncpy(exp_dir, base_dir, PATH_MAX - 1);
char *ptr = strstr(exp_dir, exp_marker1);
if (!ptr) ptr = strstr(exp_dir, exp_marker2);
if (ptr) {
*ptr = '\0';
} else {
// Fallback
strncpy(exp_dir, "/root/tinyleo-Arbitrary-LeastDelay", PATH_MAX - 1);
}

snprintf(controller_src, sizeof(controller_src), "%s/controller", exp_dir);
if(mount(controller_src, controller_dst, NULL, MS_BIND|MS_REC, NULL) != 0) {
return child_err("mount --bind controller failed: ", err_fd);
}
2 changes: 1 addition & 1 deletion network_orchestrator/southbound/sn_controller.py
@@ -749,7 +749,7 @@ def deploy_tinyleo_srv6_agent(self):
"""
sn_remote_wait_output(
self.ssh,
f"python3 {self.dir}/controller/geographic_srv6_anycast/deploy_srv6_agent.py"
f"python3 {self.dir}/controller/geographic_srv6_anycast/deploy_srv6_agent.py {self.dir}"
)

def ping_async(self, res_path, src, dst):
17 changes: 15 additions & 2 deletions network_orchestrator/southbound/sn_remote.py
@@ -501,6 +501,7 @@ def _load_netns(pid, name):

pid_file = open(dir + '/' + PID_FILENAME, 'w', encoding='utf-8')
sat_cnt = 0
os.makedirs("/run/netns", exist_ok=True) # Ensure /run/netns exists
for shell_id, mid_dict in enumerate(sat_mid_dict_shell):
for node, mid in mid_dict.items():
if mid != machine_id:
@@ -509,7 +510,13 @@
node_dir = f"{dir}/shell{shell_id}/overlay/{node}"
sat_cnt += 1
os.makedirs(node_dir, exist_ok=True)
pid_file.write(node+':'+str(pyctr.container_run(node_dir, node))+' ')
try:
c_pid = pyctr.container_run(node_dir, node)
if c_pid <= 0:
print(f"Error starting container for {node}")
pid_file.write(f"{node}:{c_pid} ")
except Exception as e:
print(f"Exception starting container for {node}: {e}")
pid_file.write('\n')
print(f'[{machine_id}] shell {shell_id}: {sat_cnt} satellites initialized')

@@ -522,7 +529,13 @@
gs_lst.append(node)
node_dir = f'{overlay_dir}/{node}'
os.makedirs(node_dir, exist_ok=True)
pid_file.write(node+':'+str(pyctr.container_run(node_dir, node))+' ')
try:
c_pid = pyctr.container_run(node_dir, node)
if c_pid <= 0:
print(f"Error starting container for {node}")
pid_file.write(f"{node}:{c_pid} ")
except Exception as e:
print(f"Exception starting container for {node}: {e}")
pid_file.write('\n')
print(f'[{machine_id}] GS:', ','.join(gs_lst))

37 changes: 23 additions & 14 deletions network_orchestrator/test/example_orchestrator.py
@@ -73,22 +73,31 @@
sn.update_tinyleo_topology(timestamp)
if TEST:
time.sleep(2)
sn.set_traceroute("GS1", "GS2",f"ts{timestamp}")
try:
sn.set_traceroute("GS1", "GS2",f"ts{timestamp}")
except Exception as e:
print(f"Skipping traceroute at timestamp {timestamp} due to error: {e}")
if timestamp == 0:
if TEST:
time.sleep(1)
sn.set_traceroute("GS1", "GS2",f"ts{timestamp}_before_link_failure")
sn.set_ping("GS1", "GS2", f"ts{timestamp}_when_link_failure")
sn.set_iperf("GS1", "GS2", f"ts{timestamp}_when_link_failure")
time.sleep(1)
sn.tinyleo_fault_test()
time.sleep(2)
sn.set_traceroute("GS1", "GS2",f"ts{timestamp}_after_link_failure")
time.sleep(update_time-0.5 - (time.time() - start_time))
try:
sn.set_traceroute("GS1", "GS2",f"ts{timestamp}_before_link_failure")
sn.set_ping("GS1", "GS2", f"ts{timestamp}_when_link_failure")
sn.set_iperf("GS1", "GS2", f"ts{timestamp}_when_link_failure")
time.sleep(1)
sn.tinyleo_fault_test()
time.sleep(2)
sn.set_traceroute("GS1", "GS2",f"ts{timestamp}_after_link_failure")
except Exception as e:
print(f"Skipping tests at timestamp 0 due to error: {e}")
time.sleep(max(0, update_time-0.5 - (time.time() - start_time)))
if timestamp != sn.duration - 1:
sn.set_ping("GS1", "GS2", f"ts{timestamp}-{timestamp+1}_topo_change")
sn.set_iperf("GS1", "GS2", f"ts{timestamp}-{timestamp+1}_topo_change")
time.sleep(update_time - (time.time() - start_time))
try:
sn.set_ping("GS1", "GS2", f"ts{timestamp}-{timestamp+1}_topo_change")
sn.set_iperf("GS1", "GS2", f"ts{timestamp}-{timestamp+1}_topo_change")
except Exception as e:
print(f"Skipping cross-timestamp tests due to error: {e}")
time.sleep(max(0, update_time - (time.time() - start_time)))

if input('clear environment?[y/n]').strip().lower()[:1] == 'y':
sn.clean()
# Automatically clean up environment without waiting for user input
sn.clean()