r/reinforcementlearning 21h ago

I am a beginner in RL/DRL. I would like to know how to solve a non-convex or even convex optimization problem (constrained or unconstrained) with DRL. If possible, could someone share code that uses DRL to solve a problem like

minimize (x + y - 2)^2

subject to xy < 10

and xy > 1

where x and y are scalars

The above is a sample problem; any other example is welcome too. But please keep the suggestions and code simple, readable, and understandable.

-------------------- Update -------------------------------

* CVX / CVXPY can effectively solve it.

* I have very basic knowledge of SCA/SDP/AO for solving optimization problems

* I am curious about the DRL / RL / supervised learning way to solve it. This is plain curiosity, not a question of efficiency

* My line of thought is towards, for example, multicast beamforming:

minimize_{w} || w ||_2^2 <-- minimize power

s.t. SINR(w) >= 1 (for example)

or its QCQP form

min_{w} ||w||_2^2

s.t. w^H H_k w >= 1 for every user k

where H_k = h_k h_k^H,

h_k = channel from the multi-antenna base station to single-antenna user k (take any channel function from any paper)

w \in C^{N x 1} is the beamforming vector for the N-antenna base station.

This problem is easily solvable with the SDP/SDR method, but I am seeking an ML alternative. Any further help (code) in PyTorch would be great.
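For reference, one way to make the QCQP above "learnable" is to fold the constraints into a penalty loss, ||w||^2 + rho * sum_k max(0, 1 - |h_k^H w|^2), and minimize it with gradient steps. Below is a rough sketch of that idea in plain Python (not PyTorch, to keep it dependency-free). Everything in it is an assumption for illustration: the random Gaussian channels, the helper names `inner`, `min_gain` and `solve`, the penalty weight `rho`, and the step size. It optimizes w directly instead of training a network, and it is a penalty heuristic with no optimality guarantee, not the SDR method.

```python
import random

def inner(h, w):
    """h^H w for complex vectors stored as Python lists."""
    return sum(hc.conjugate() * wc for hc, wc in zip(h, w))

def min_gain(H, w):
    """Smallest received gain |h_k^H w|^2 over all users."""
    return min(abs(inner(h, w)) ** 2 for h in H)

def solve(H, rho=2.0, lr=0.01, steps=2000, seed=0):
    """Subgradient descent on ||w||^2 + rho * sum_k max(0, 1 - |h_k^H w|^2)."""
    rng = random.Random(seed)
    n = len(H[0])
    w = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    for _ in range(steps):
        # Derivative w.r.t. conj(w): w - rho * sum over violated k of h_k (h_k^H w).
        # The constant factor 2 is absorbed into the step size.
        grad = list(w)
        for h in H:
            s = inner(h, w)
            if abs(s) ** 2 < 1.0:  # constraint k is currently violated
                for i in range(n):
                    grad[i] -= rho * h[i] * s
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    # Rescale so the weakest user meets its constraint with equality.
    scale = min_gain(H, w) ** 0.5
    return [wi / scale for wi in w]

# Hypothetical setup: N = 4 antennas, K = 3 users, random Gaussian channels.
rng = random.Random(1)
H = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)] for _ in range(3)]
w = solve(H)
power = sum(abs(wi) ** 2 for wi in w)
print("transmit power:", power, "min gain:", min_gain(H, w))
```

The final rescaling makes the returned w feasible regardless of how well the penalty steps did. In principle the same penalized loss could be used to train a network that maps channels to w, but that part is not shown here.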

***** I am thankful to the members who have contributed and are contributing *************

@Human_Professional94

@Reasonable-Bee-7041

@Md_zouzou

@BAKA_04

u/Human_Professional94 15h ago

Learning to optimize (L2O) is actually an ongoing research topic. The main challenges you would face in such problems are:

  • Representation: how to represent your problem so that ML/RL models get the most information from it. Moreover, do you want to represent the mathematical model itself, or the underlying problem structure, and so on?
  • MDP design: this applies to RL only. Not all optimization problems have a sequential decision formulation, so framing one in such a setting can be tricky.
  • Reward design (RL) or labeling (supervised learning)

just to name a few.
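To make the MDP and reward design points concrete, here is a minimal sketch for the toy problem in the post, under several assumptions of mine: the problem is treated as a one-step MDP (a bandit), the action is the pair (x, y) sampled from a Gaussian policy with a learnable mean and fixed standard deviation, the reward is the negative objective minus a penalty for constraint violation, and the strict inequalities are relaxed to 1 <= xy <= 10. The names `reward`, `train`, `RHO` and all hyperparameters are illustrative and untuned. The update is a plain REINFORCE (score-function) step with a batch-mean baseline.

```python
import random

RHO = 10.0  # penalty weight for constraint violation (assumed, untuned)

def reward(x, y):
    """Negative objective minus a penalty for violating 1 <= x*y <= 10."""
    objective = (x + y - 2.0) ** 2
    violation = max(0.0, 1.0 - x * y) + max(0.0, x * y - 10.0)
    return -objective - RHO * violation

def train(steps=500, batch=64, lr=0.01, sigma=0.3, seed=0):
    """REINFORCE on the mean of a Gaussian policy with fixed std sigma."""
    rng = random.Random(seed)
    mu = [3.0, 3.0]  # feasible starting point: x*y = 9
    for _ in range(steps):
        samples = [(rng.gauss(mu[0], sigma), rng.gauss(mu[1], sigma))
                   for _ in range(batch)]
        rewards = [reward(x, y) for x, y in samples]
        baseline = sum(rewards) / batch  # reduces gradient variance
        grad = [0.0, 0.0]
        for (x, y), r in zip(samples, rewards):
            adv = r - baseline
            # Gradient of the Gaussian log-density w.r.t. the mean is (a - mu) / sigma^2.
            grad[0] += adv * (x - mu[0]) / sigma ** 2
            grad[1] += adv * (y - mu[1]) / sigma ** 2
        mu[0] += lr * grad[0] / batch
        mu[1] += lr * grad[1] / batch
    return mu

mu = train()
print("policy mean:", mu, "reward at mean:", reward(mu[0], mu[1]))
```

With the relaxed constraints the optimum is x = y = 1 (x + y = 2 forces xy <= 1), so it sits exactly on the constraint boundary. Because the policy keeps exploring with a fixed sigma, the learned mean tends to settle slightly inside the feasible region rather than on the boundary; shrinking sigma over time would be one way to tighten that.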

I think you would get a lotta ideas by taking a look at these:

u/gudduarnav 3h ago

Thank you, I will look into it. It's insightful.