BRIEF NOTES
UDC 007.51
DOI: 10.17586/0021-3454-2024-67-10-893-898
VARIABLE IMPEDANCE LEARNING CONTROL FOR ROBOTIC ARMS FROM GMR-ENCODED BEHAVIOR PRIORS
Waddah Ali, S. A. Kolyubin*
ITMO University, St. Petersburg, Russia * [email protected]
Abstract. This study presents a control approach in which the parameters of a Cartesian variable impedance controller are tuned online by quadratic programming (QP) optimization, which dynamically modulates the stiffness and damping coefficients according to the desired sensory-motor skill encoded by a Gaussian mixture regression (GMR) behavior prior model.
Keywords: learning from demonstration, variable impedance control, skill transfer, contact manipulation
For citation: Waddah Ali, Kolyubin S. A. Variable impedance learning control for robotic arms from GMR-encoded behavior priors. Journal of Instrument Engineering. 2024. Vol. 67, N 10. P. 893-898. DOI: 10.17586/0021-3454-2024-67-10-893-898.
CONTROL OF ROBOTIC MANIPULATORS WITH IMPEDANCE CONTROLLER TUNING BASED ON SENSORY-MOTOR SKILL MODELS
Waddah Ali, S. A. Kolyubin*
ITMO University, St. Petersburg, Russia * [email protected]
Abstract. A control method is presented in which the parameters of an impedance controller in Cartesian space are tuned in real time by optimization based on quadratic programming. The parameters are tuned according to the desired velocity profiles and the desired interaction forces between the robot tool and the environment, as generated by sensory-motor skill models.
Keywords: learning from demonstration, variable impedance control, skill transfer, contact manipulation
For citation: Waddah Ali, Kolyubin S. A. Control of robotic manipulators with impedance controller tuning based on sensory-motor skill models // Izv. vuzov. Priborostroenie. 2024. Vol. 67, N 10. P. 893-898. DOI: 10.17586/0021-3454-2024-67-10-893-898.
© Waddah Ali, Kolyubin S. A., 2024
Introduction. This work aims to develop a variable impedance learning control (VIC) strategy from collected behavior priors, extending our previous work [1]. Imitation learning (IL), or learning from demonstration (LfD), comprises techniques that enable machines to imitate human behavior in order to perform a task [2, 3]. Standard LfD approaches have focused on path-following problems, but recent developments have extended robot learning to the impedance domain [4]. In [5], Gaussian clustering and the subsequent Gaussian Mixture Model/Gaussian Mixture Regression (GMM/GMR) pipeline were optimized so that LfD-enabled cobots can carry out a variety of complex manufacturing tasks effectively; there, stiffness matrices were estimated from the regression residuals instead of computing the optimal stiffness subject to generic constraints. In [6], kinesthetic demonstrations were used to teach a robot to change its stiffness based on tactile sensations: only the desired trajectory was given as input to the manipulator, and the appropriate stiffness of the Cartesian/joint impedance controller was learned by perturbing the robot while it executed the trajectory. A similar approach was followed in [7], but with the constraints derived in [8] to ensure convergence of the trajectory obtained with GMR.
In [9], the damping term was excluded from the interaction model and a GMM was used to encode the end-effector position. In [10], LfD was used to learn the motion and impedance parameters of two manipulators performing a dual-arm assembly. The authors show that adapting the impedance of both robots, in both rotation and feed, is beneficial because the assembly task is then completed faster and with fewer joint movements.
Problem Statement. In this work, the GMR behavior prior model (BP model) developed in [1] is adopted as a generator that takes time and material category $U = [\mathbf{t}^{\top}, \mathbf{M}^{\top}] \in \mathbb{R}^{2\times N}$ as input and outputs the desired trajectory/twist and wrench data $V = [P_d, W_d] \in \mathbb{R}^{18\times N}$.
The case study focuses on learning impedance parameters for straight cuts. The orientation of the scalpel about the X-axis is fixed while the other angles are allowed to adjust, which simplifies the problem by disregarding the learned angular velocities and torques.
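The dimensionality of the BP model output [1] is then reduced to
$$V = [X_d, \dot{X}_d, F_d] \in \mathbb{R}^{9\times N},$$
where $X_d = [x_d, y_d, z_d] \in \mathbb{R}^{3\times N}$, $\dot{X}_d = [\dot{x}_d, \dot{y}_d, \dot{z}_d] \in \mathbb{R}^{3\times N}$ and $F_d = [F_d^{x}, F_d^{y}, F_d^{z}] \in \mathbb{R}^{3\times N}$ are the desired linear trajectory, velocity and forces learned by the behavior prior model, respectively.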
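The reduction from the 18-row BP-model output to the 9-row vector $V$ amounts to selecting the linear position, linear velocity and force rows. The sketch below assumes a hypothetical row ordering of that output (position, orientation, linear velocity, angular velocity, force, torque); the actual layout used in [1] may differ, so the slice indices are placeholders rather than the authors' convention.

```python
import numpy as np

# Hypothetical row layout of the 18 x N BP-model output (NOT taken from [1]):
# 0-2 position, 3-5 orientation, 6-8 linear velocity,
# 9-11 angular velocity, 12-14 force, 15-17 torque.
POS_ROWS = slice(0, 3)
LIN_VEL_ROWS = slice(6, 9)
FORCE_ROWS = slice(12, 15)


def reduce_bp_output(V):
    """Return the 9 x N matrix [X_d; dX_d; F_d] from an 18 x N BP-model output."""
    V = np.asarray(V, dtype=float)
    if V.shape[0] != 18:
        raise ValueError(f"expected 18 rows, got {V.shape[0]}")
    # Drop orientation, angular velocity and torque rows (straight-cut case study)
    return np.vstack([V[POS_ROWS], V[LIN_VEL_ROWS], V[FORCE_ROWS]])


V = np.arange(18 * 4, dtype=float).reshape(18, 4)   # dummy output, N = 4
V_red = reduce_bp_output(V)
print(V_red.shape)  # (9, 4)
```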
This reduced output is fed to a quadratic programming (QP) optimizer that modulates online the stiffness of the Cartesian whole-body controller illustrated in Fig. 1.
Fig. 1
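The QP is formulated as follows:
$$\min_{K_d,\,\zeta_d \in \mathbb{R}^{m\times m}} \ \frac{1}{2}\sum_{i=1}^{N}\left(\|F^{i}_{ext} - F^{i}_{d}\|_{Q} + \|K_d - K_{min}\|_{R}\right)$$
subject to
$$K_{min} \le K_d \le K_{max},\quad \zeta_{min} \le \zeta_d \le \zeta_{max},\quad F_{min} \le F^{i}_{ext} \le F_{max},\quad i \in \{1, \dots, N\},$$
$$T(x_t) \ge \varepsilon,$$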
where the inequality constraints are considered element-wise; $m$ is the number of Cartesian DoF; $K_d, K_{min}, K_{max} \in \mathbb{R}^{m\times m}$ are the optimized, minimum and maximum stiffness matrices, respectively; $\zeta_d, \zeta_{min}, \zeta_{max} \in \mathbb{R}^{m\times m}$ are the optimized, minimum and maximum damping ratio matrices, respectively; $\varepsilon$ is the initial minimum tank energy that should be stored to maintain system passivity; and $F^{i}_{ext}$ is the estimated external force generated on the scalpel blade during cutting at time step $i$,
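$$F^{i}_{ext} = K_d \tilde{X}_i + D_d \dot{\tilde{X}}_i,\qquad \tilde{X} = X - X_d,\qquad \dot{\tilde{X}} = \dot{X} - \dot{X}_d,\qquad D_d = 2\zeta_d\sqrt{K_d},$$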
where $\tilde{X}_i$ is the position error between the actual and desired scalpel positions at time step $i$, $\dot{\tilde{X}}_i$ is the corresponding linear velocity error, $T(x_t)$ is the tank that stores the energy, and $x_t$ is the state of the tank:
$$T(x_t) = \frac{1}{2}x_t^{2};$$
the tank energy is initialized so that $T(x_t(0)) \ge \varepsilon$.
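As a minimal sketch, the interaction force model and the energy tank above can be evaluated per axis with diagonal gain matrices. The sketch assumes $D_d = 2\zeta_d\sqrt{K_d}$ with unit apparent mass; the helper names, the threshold `eps` and all numerical values are hypothetical.

```python
import numpy as np


def damping_from_ratio(k, zeta):
    """Diagonal damping d = 2 * zeta * sqrt(k), assuming unit apparent mass."""
    return 2.0 * np.asarray(zeta) * np.sqrt(k)


def interaction_force(k, zeta, x, x_d, v, v_d):
    """F_ext = K_d (X - X_d) + D_d (dX - dX_d) with diagonal K_d and D_d."""
    x_err = np.asarray(x) - np.asarray(x_d)   # position error
    v_err = np.asarray(v) - np.asarray(v_d)   # velocity error
    return k * x_err + damping_from_ratio(k, zeta) * v_err


def tank_energy(x_t):
    """Energy stored in the tank, T(x_t) = x_t**2 / 2."""
    return 0.5 * x_t ** 2


def tank_ok(x_t, eps):
    """Passivity-preserving condition T(x_t) >= eps."""
    return tank_energy(x_t) >= eps


k = np.array([500.0, 500.0, 500.0])       # stiffness [N/m]
zeta = np.array([0.7, 0.7, 0.7])          # damping ratios
F = interaction_force(k, zeta, x=[0.01, 0.0, -0.002], x_d=np.zeros(3),
                      v=np.zeros(3), v_d=np.zeros(3))
print(F, tank_ok(2.0, eps=0.5))            # eps is a hypothetical threshold
```

With zero velocity error only the stiffness term contributes, so a 1 cm error against 500 N/m gives 5 N on that axis.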
At each time step $i$, the robot is controlled by the VIC controller, which adjusts the stiffness so as to reach the desired cutting force given the instantaneous position error $\tilde{X}_i$. Since this is done online, only the current time step is considered in this approach. The problem statement therefore becomes:
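$$\min_{k_d,\,\zeta_d \in \mathbb{R}^{3}} \ \frac{1}{2}\sum_{i=1}^{N}\left(\|F^{i}_{ext} - F^{i}_{d}\|_{Q} + \|\mathrm{diag}(k_d - k_{min})\|_{R}\right)$$
subject to
$$k_{min} \le k_d \le k_{max},\quad \zeta_{min} \le \zeta_d \le \zeta_{max},\quad F_{min} \le F_{ext} \le F_{max},\quad T(x_t) \ge \varepsilon,$$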
where $k_d, k_{min}, k_{max} \in \mathbb{R}^{3}$ are the vectors of the diagonal elements of the desired, minimum and maximum stiffness, respectively; $\zeta_{min}, \zeta_d, \zeta_{max} \in \mathbb{R}^{3}$ are the vectors of the diagonal elements of the minimum, desired and maximum damping ratio, respectively; and $Q, R \in \mathbb{R}^{3\times 3}$ are weighting matrices that set how much attention the optimizer pays to each term.
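The per-step problem can be sketched with SciPy's `minimize` using `method="L-BFGS-B"`, the algorithm named later in the text. Because L-BFGS-B handles only box bounds, this sketch enforces the stiffness and damping-ratio bounds directly, adds the force bounds as a quadratic penalty with a hypothetical weight `rho`, reads $\|e\|_Q$ as the squared form $e^{\top}Qe$, assumes $D_d = 2\zeta_d\sqrt{k_d}$, and does not enforce the tank constraint. These are assumptions for illustration, not the authors' implementation, and `optimize_impedance` is a hypothetical name.

```python
import numpy as np
from scipy.optimize import minimize


def optimize_impedance(x_err, v_err, f_des, k_lim, zeta_lim, f_lim, Q, R, rho=1e3):
    """One online step of the stiffness/damping-ratio optimization (sketch).

    Decision vector z = [k_x, k_y, k_z, zeta_x, zeta_y, zeta_z].
    Assumptions: weighted norms are squared forms e^T W e, D_d = 2*zeta*sqrt(k)
    (unit mass), force bounds enter as a quadratic penalty with weight rho,
    and the tank constraint T(x_t) >= eps is not handled here.
    """
    k_min, k_max = k_lim
    zeta_min, zeta_max = zeta_lim
    f_min, f_max = f_lim

    def force(z):
        k, zeta = z[:3], z[3:]
        # F_ext = K_d * position error + D_d * velocity error (diagonal gains)
        return k * x_err + 2.0 * zeta * np.sqrt(k) * v_err

    def cost(z):
        f = force(z)
        e_f = f - f_des                 # force-tracking error
        e_k = z[:3] - k_min             # stiffness regularization
        viol = np.maximum(f_min - f, 0.0) + np.maximum(f - f_max, 0.0)
        return 0.5 * (e_f @ Q @ e_f + e_k @ R @ e_k) + rho * (viol @ viol)

    lower = np.r_[k_min, zeta_min]
    upper = np.r_[k_max, zeta_max]
    # Start from the softest admissible impedance; L-BFGS-B enforces the box bounds.
    res = minimize(cost, lower, method="L-BFGS-B", bounds=list(zip(lower, upper)))
    return res.x[:3], res.x[3:], force(res.x)


# Usage with the bounds and weights reported in the text; errors and forces are made up.
k_lim = (np.full(3, 10.0), np.full(3, 5000.0))
zeta_lim = (np.zeros(3), np.array([1.0, 1.0, 0.1]))
f_lim = (np.full(3, -50.0), np.full(3, 50.0))   # hypothetical force bounds [N]
x_err = np.array([0.01, 0.01, 0.01])             # 1 cm position error per axis
f_des = np.array([5.0, 10.0, 20.0])              # hypothetical desired force [N]
k, zeta, f = optimize_impedance(x_err, np.zeros(3), f_des, k_lim, zeta_lim,
                                f_lim, np.eye(3), 1e-9 * np.eye(3))
print(np.round(k, 1), np.round(f, 3))
```

With $Q = I$ and $R = 10^{-9}I$ the stiffness term acts only as a weak regularizer, so in this sketch the optimizer essentially picks the stiffness that reproduces the desired force from the current position error.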
Experimental setup and technical details. This study employed the KUKA LBR iiwa 14 robotic platform for two key experiments:
1. Force Estimation Calibration: This experiment aimed to verify the reliability of the external forces estimated by the robot's inverse dynamics. A Force/Torque (FT) sensor was integrated to compare the measured forces with the robot's estimated forces at the TCP (Tool Center Point) frame.
The robot was programmed to follow a predefined path on the plate surface while applying different desired forces along the X-axis of the TCP frame: [15, 20, 25, 30, 35] N. Both the FT sensor and the robot's sensors recorded force data, which were filtered with an exponential moving average (EMA) filter (α = 0.1) to remove noise. The error between the measured and estimated forces was calculated as
$$e = F_{FT} - F_{TCP},$$
where $e$ is the error signal, and $F_{FT}$ and $F_{TCP}$ are the forces recorded by the FT sensor and at the robot end-effector (TCP frame), respectively.
The mean and variance of the error were computed; the mean error was 4.29 N, which was deemed negligible. This small error was attributed to partial synchronization issues between the real FT sensor data and the robot's ROS-based readings, confirming the reliability of the robot's force estimation (Fig. 2: a, force measured by the FT sensor ($F_{FT}$) vs. force estimated from the robot inverse dynamics in the TCP frame ($F_{TCP}$); b, error value $e$).
Fig. 2
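The filtering and error statistics of the calibration experiment can be sketched as follows. The EMA recursion $y_n = \alpha x_n + (1-\alpha)y_{n-1}$ initialized with the first sample is an assumed convention, both recordings are assumed to be resampled onto common timestamps beforehand, and the signals in the usage example are synthetic, not the recorded data.

```python
import numpy as np


def ema(signal, alpha=0.1):
    """Exponential moving average y[n] = alpha*x[n] + (1 - alpha)*y[n-1], y[0] = x[0]."""
    x = np.asarray(signal, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]
    for n in range(1, x.size):
        y[n] = alpha * x[n] + (1.0 - alpha) * y[n - 1]
    return y


def force_error_stats(f_ft, f_tcp, alpha=0.1):
    """Mean and variance of e = F_FT - F_TCP after EMA filtering of both signals.

    Both recordings are assumed to be sampled on the same timestamps.
    """
    e = ema(f_ft, alpha) - ema(f_tcp, alpha)
    return e.mean(), e.var()


# Synthetic example: a 25 N contact force seen by two noisy sources.
rng = np.random.default_rng(0)
f_true = np.full(1000, 25.0)
f_ft = f_true + rng.normal(0.0, 1.0, f_true.size)
f_tcp = f_true + rng.normal(0.0, 1.5, f_true.size)
mean_e, var_e = force_error_stats(f_ft, f_tcp)
print(f"mean error {mean_e:.3f} N, variance {var_e:.3f} N^2")
```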
2. Cutting experiment setup and stiffness optimization: As in the previous experiment, the robot (Fig. 3, with penoplex as an example material) was commanded by a ROS script written in Python with the rospy library. In this setup, however, the impedance control mode was used as the base controller. The desired stiffness was applied in three scenarios: a constant stiffness k = [500, 500, 500] N/m, the maximum allowed stiffness $k_{max}$, and the stiffness $k_d$ optimized online by the QP optimizer. These scenarios were tested on three materials: cork, PVC, and penoplex.
The optimizer parameters were chosen as follows: $k_{min}$ = [10, 10, 10] N/m, $k_{max}$ = [5000, 5000, 5000] N/m, $\zeta_{min}$ = [0.0, 0.0, 0.0] and $\zeta_{max}$ = [1.0, 1.0, 0.1]. The force bounds were set, separately for each material and each direction, to the minimum and maximum forces generated by the BP model: $F_{min} = \min(F_d)$, $F_{max} = \max(F_d)$. Applying forces with the constant stiffness k, by contrast, failed to achieve the required task.
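The per-direction force bounds $F_{min} = \min(F_d)$ and $F_{max} = \max(F_d)$ reduce to row-wise extrema over the $N$ time steps of the $3\times N$ force block. A minimal sketch, using a hypothetical force profile for one material (the helper name `force_bounds` is also hypothetical):

```python
import numpy as np


def force_bounds(F_d):
    """Per-direction bounds (F_min, F_max) over the N time steps of a 3 x N force profile."""
    F_d = np.asarray(F_d, dtype=float)
    return F_d.min(axis=1), F_d.max(axis=1)


# Hypothetical 3 x 4 desired-force profile for one material [N]
F_d = np.array([[1.0, 4.0, 2.0, 3.0],
                [0.5, 0.2, 0.9, 0.1],
                [10.0, 12.0, 11.0, 9.0]])
f_min, f_max = force_bounds(F_d)
# Every desired force lies inside its own bounds by construction
inside = np.all((F_d >= f_min[:, None]) & (F_d <= f_max[:, None]))
print(f_min, f_max, inside)
```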
The weighting matrices were chosen experimentally as the diagonal matrices $Q = I$ and $R = 10^{-9} I$.
Fig. 3
The QP optimizer was implemented in Python with the SciPy library and relies on the bound-constrained optimization algorithm L-BFGS-B [11]. The position and velocity errors $\tilde{X}_i$, $\dot{\tilde{X}}_i$, together with the dynamic force constraints, play an essential role in the optimization behavior (Fig. 4: optimization results of the online QP stiffness optimizer for PVC; a–c, position error along the X, Y, Z axes; d–f, how the force $F_{ext}$ applied by the robot with the optimal impedance at each time step fits the desired force $F_d$ produced by the GMR model along X, Y, Z; g–l, values of the optimal impedance computed by the optimizer along X, Y, Z).
Fig. 4 (panels a–c: Position X, Y, Z; d–f: Desired Force [N] vs. Actual Force [N] along X, Y, Z; g–l: Gain k and Gain ζ along X, Y, Z)
Conclusions and future work. This study developed a variable impedance learning control (VIC) strategy based on a desired human sensory-motor skill encoded by a GMR behavior prior model. Initial experiments, using a dedicated setup with a pointing tool and an FT sensor for reference measurements, confirmed the reliability of the external force estimates obtained from the robot inverse dynamics. Subsequent experiments involved cutting different materials, with the behavior priors guiding a quadratic programming (QP) optimizer that tunes stiffness and damping in real time. The method was tested under three impedance scenarios: optimal, constant, and maximum. The main novelty is a variable impedance control strategy in which the motion dynamics are learned from encoded human sensory-motor skills instead of being set by classical tuning algorithms.
REFERENCES
1. Ali Waddah, Kolyubin S. A. Journal of Instrument Engineering, 2024, vol. 67, no. 6, pp. 500-510.
2. Hussein A., Gaber M. M., Elyan E., and Jayne C. ACM Computing Surveys, 2017, vol. 50, no. 2, art. 21, https://doi.org/10.1145/3071073.
3. Ravichandar H., Polydoros A. S., Chernova S., and Billard A. Annual Review of Control, Robotics, and Autonomous Systems, 2020, vol. 3, no. 1, pp. 297-330, DOI: 10.1146/annurev-control-100819-063206.
4. Abu-Dakka F. J. and Kyrki V. IEEE International Conference on Robotics and Automation, Paris, France, 2020, pp. 4421-4426.
5. Wang Y. Q., Hu Y. D., El Zaatari S., Li W. D., and Zhou Y. Robotics and Computer-Integrated Manufacturing, 2021, vol. 71, no. 9, art. 102169, DOI: 10.1016/j.rcim.2021.102169.
6. Kronander K. and Billard A. IEEE Transactions on Haptics, 2014, vol. 7, pp. 367-380.
7. Saveriano M. and Lee D. IEEE International Conference on Ubiquitous Robots and Ambient Intelligence, Kuala Lumpur, Malaysia, 2014, pp. 368-373.
8. Khansari-Zadeh S. M. and Billard A. IEEE Transactions on Robotics, 2011, vol. 27, pp. 943-957.
9. Li M., Yin H., Tahara K., and Billard A. IEEE International Conference on Robotics and Automation, Hong Kong, China, 2014, pp. 6784-6791.
10. Suomalainen M., Calinon S., Pignat E., and Kyrki V. IEEE International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 2019, pp. 8676-8682.
11. Byrd R. H., Lu P., and Nocedal J. SIAM Journal on Scientific and Statistical Computing, 1995, vol. 16, no. 5, pp. 1190-1208.
DATA ON AUTHORS
Waddah Ali, Post-Graduate Student; ITMO University, Faculty of Control Systems and Robotics, International Laboratory of Biomechatronics and Energy-Efficient Robotics; Engineer; E-mail: [email protected]
Sergey A. Kolyubin, Dr. Sci.; ITMO University, Faculty of Control Systems and Robotics, International Laboratory of Biomechatronics and Energy-Efficient Robotics; Professor; Chief Researcher; E-mail: [email protected]
Received 27.05.2024; approved after reviewing 26.06.2024; accepted for publication 23.08.2024
REFERENCES
1. Ali Waddah, Kolyubin S. A. Training behavior priors models for programming robotic contact-rich manipulation // Journal of Instrument Engineering. 2024. Vol. 67, N 6. P. 500-510.
2. Hussein A., Gaber M. M., Elyan E., and Jayne C. Imitation learning: A survey of learning // ACM Computing Surveys. 2017. Vol. 50, N 2. Art. no. 21. https://doi.org/10.1145/3071073.
3. Ravichandar H., Polydoros A. S., Chernova S., and Billard A. Recent advances in robot learning from demonstration // Annual Review of Control, Robotics, and Autonomous Systems. 2020. Vol. 3, N 1. P. 297-330. DOI: 10.1146/annurev-control-100819-063206.
4. Abu-Dakka F. J. and Kyrki V. Geometry-aware dynamic movement primitives // IEEE International Conference on Robotics and Automation. Paris, France, 2020. P. 4421-4426.
5. Wang Y. Q., Hu Y. D., El Zaatari S., Li W. D., Zhou Y. Optimised Learning from Demonstrations for Collaborative Robots // Robotics and Computer-Integrated Manufacturing. 2021. Vol. 71, N 9. P. 102169. DOI: 10.1016/j.rcim.2021.102169.
6. Kronander K. and Billard A. Learning compliant manipulation through kinesthetic and tactile human-robot interaction // IEEE Transactions on Haptics. 2014. Vol. 7. P. 367-380.
7. Saveriano M. and Lee D. Learning motion and impedance behaviors from human demonstration // IEEE International Conference on Ubiquitous Robots and Ambient Intelligence. Kuala Lumpur, Malaysia, 2014. P. 368-373.
8. Khansari-Zadeh S. M. and Billard A. Learning stable non-linear dynamical systems with Gaussian mixture models // IEEE Transactions on Robotics. 2011. Vol. 27. P. 943-957.
9. Li M., Yin H., Tahara K., and Billard A. Learning object-level impedance control for robust grasping and dexterous manipulation // IEEE International Conference on Robotics and Automation. Hong Kong, China, 2014. P. 6784-6791.
10. Suomalainen M., Calinon S., Pignat E., and Kyrki V. Improving dual-arm assembly by master-slave compliance // IEEE International Conference on Robotics and Automation (ICRA). Montreal, QC, Canada, 2019. P. 8676-8682.
11. Byrd R. H., Lu P., and Nocedal J. A Limited Memory Algorithm for Bound Constrained Optimization // SIAM Journal on Scientific and Statistical Computing. 1995. Vol. 16, N 5. P. 1190-1208.
DATA ON AUTHORS
Waddah Ali, Post-Graduate Student; ITMO University, Faculty of Control Systems and Robotics, International Laboratory of Biomechatronics and Energy-Efficient Robotics; Engineer; E-mail: [email protected]
Sergey Alekseevich Kolyubin, Dr. Sci. (Eng.); ITMO University, Faculty of Control Systems and Robotics, International Laboratory of Biomechatronics and Energy-Efficient Robotics; Professor, Chief Researcher; E-mail: [email protected]
Received by the editors 27.05.2024; approved after reviewing 26.06.2024; accepted for publication 23.08.2024.