Amir Shehata, Whamcloud DDN
Amir Shehata has been working on Lustre Networking for the past 7 years. He has implemented multiple features, including dynamic configuration and the Multi-Rail feature set.
Lustre Networking (LNet) can have multiple interfaces available for sending messages. These interfaces can be of the same or different types (e.g., eth, mlx, opa). It is important for Lustre to use all of the interfaces to increase both bandwidth and resiliency. To use the interfaces effectively in this way, the LNet Multi-Rail and Health features were implemented. The Multi-Rail feature allows the use of all available interfaces regardless of the underlying wire protocol. The Health feature adds resiliency so that the healthiest interfaces are always used. A single Lustre network must have homogeneous interfaces, for example all Mellanox or all OPA. However, if two Lustre peers can be reached on both networks, there is no reason not to use both to communicate between the peers. If one of the networks becomes unreachable, Lustre traffic should switch to the other network without dropping messages. This is achieved by maintaining the health status of each interface and resending messages over other, healthier interfaces. A set of parameters such as timeout, retry count, and sensitivity, along with traffic control policies, provides fine-tuned traffic control, allowing admins to configure Lustre in a way that fits their network.
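The health-based selection and resend behavior described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not LNet's actual implementation: the `Interface` class, the `pick_healthiest` and `send_with_retry` functions, the `up` flag that simulates link failure, and the numeric values are all hypothetical. The `retry_count` and `sensitivity` arguments are loosely modeled on the tunables named in the abstract, and their semantics here are assumed for illustration.

```python
from dataclasses import dataclass

MAX_HEALTH = 1000  # illustrative ceiling for an interface's health value


@dataclass
class Interface:
    name: str
    network: str               # e.g. "o2ib" or "tcp"; one type per network
    health: int = MAX_HEALTH
    up: bool = True            # toy stand-in for "the link delivers messages"


def pick_healthiest(interfaces, exclude=()):
    """Return the interface with the highest health that was not yet tried."""
    candidates = [i for i in interfaces if i.name not in exclude]
    if not candidates:
        return None
    # On a tie, max() keeps the first candidate in list order.
    return max(candidates, key=lambda i: i.health)


def send_with_retry(interfaces, message, retry_count=2, sensitivity=100):
    """Try to send `message`, resending over other interfaces on failure.

    Returns the name of the interface that delivered the message, or None
    if every attempt failed. A failed attempt lowers that interface's
    health by `sensitivity`; a successful one raises it by the same amount.
    The payload itself is not inspected in this toy model.
    """
    tried = []
    for _ in range(retry_count + 1):
        nic = pick_healthiest(interfaces, exclude=tried)
        if nic is None:
            break  # no untried interface left
        tried.append(nic.name)
        if nic.up:
            nic.health = min(MAX_HEALTH, nic.health + sensitivity)
            return nic.name
        nic.health = max(0, nic.health - sensitivity)
    return None


if __name__ == "__main__":
    nics = [Interface("ib0", "o2ib"), Interface("eth0", "tcp")]
    print(send_with_retry(nics, "hello"))   # both healthy: first one is used
    nics[0].up = False                      # simulate the o2ib network failing
    print(send_with_retry(nics, "hello"))   # resent over the other network
    print([(n.name, n.health) for n in nics])
```

In this model the failed interface keeps a lower health value after the resend, so later messages go to the healthier interface first without being dropped, which is the behavior the abstract attributes to the Health feature.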